Upload: geraldine-hall

Post on 24-Dec-2015

Reminder

• On December 1 we make up the cancelled lecture (from Nov. 24) in room A/1 228, from 16:00 to 18:00.

• On December 8 we hold the make-up midterm (pótZH) for those who failed the midterm (ZH); the lecture therefore starts half an hour later.

Common Gateway Interface

Need for CGI

• HTML/XHTML is static; it is not parameterized

• using only HTML/XHTML, CSS and JS one cannot write dynamic web pages: pages that look different depending on the user who visits them (client, administrator etc.), pages that display different products depending on what is in a database, pages that are displayed depending on the value of some parameters

• using only HTML/XHTML, CSS and JS one cannot develop distributed web applications (e-commerce sites, hotel booking, web search applications etc.)

What is CGI?

• a standard protocol for interfacing external application software with the web server

• developed in 1993 at NCSA (National Center for Supercomputing Applications)

• CGI 1.1 is specified in RFC 3875 (2004)

• allows an external executable file to respond to an HTTP request from the browser

• CGI defines how information is passed from the web server to the executable program and how information is passed from the program back to the server

What is CGI?

CGI is an acronym for Common Gateway Interface, a standard for interfacing external applications with information servers, such as HTTP or Web servers.

This interface provides a means for browsers and the server where the document resides to communicate and pass information back and forth.

Primarily, this is done through the <FORM> tag, but there are other ways to use CGI effectively, such as Server Side Includes (SSI).

Common Gateway Interface

• CGI is a standard mechanism for:

– Associating URLs with programs that can be run by a web server.

– A protocol (of sorts) for how the request is passed to the external program.

– How the external program sends the response to the client.

What is CGI?

CGI permits interactivity between a client and a host operating system through the World Wide Web via the Hyper Text Transfer Protocol (HTTP).

CGI programs can be written in C or C++, Perl, ASP, PHP, Python, TCL, shells, and many other languages and scripts.

Drawbacks of CGI

• because no special web-oriented language is used for writing CGI scripts (they are written in shell, Perl, C/C++, Python etc.), errors are highly probable, and so are the security vulnerabilities these errors cause

• usually a new process is created for each run of a CGI script; this increases the load on the server

• CGI scripts are executable files; they can write to and delete from the local disk, so this is a security vulnerability

CGI URLs

• There is some mapping between URLs and CGI programs provided by a web server. The exact mapping is not standardized (the web server admin can set it up).

• Typically:

– requests that start with /CGI-BIN/, /cgi-bin/ or /cgi/, etc. refer to CGI programs (not to static documents).

Examples of uses for CGI

Forms – forms on web sites allow the user to enter information which is processed by CGI and mailed to an administrator or logged.

On-the-Fly Pages – web pages can be created dynamically (as needed) with up-to-date information.

Database Interaction – an application of on-the-fly page creation. Web pages can be created using information read from a database, or a web site form can allow a user to update database entries.

Logging / Counters – a log file can record traffic data, updated with information on each visitor. A counter can be included on the web page to advertise traffic.

Animation – "server-push" programs can be used to feed the client successive images in an animated sequence.

Catalogs, Search engines

Requirements

Web server (NCSA, Apache, IIS, Microsoft Personal Web server etc.)

Compiler (C/C++) or interpreter (Perl, PHP, ASP)

Web browser (Netscape Navigator, Internet Explorer etc.)

Writing CGI programs involves

Obtaining input from a user or from a data file.

Storing that input in program variables.

Manipulating those variables to achieve some desired purpose, and

Sending the results to a file or video display.

CGI Programming

[Diagram: the CLIENT sends an http request to the HTTP SERVER and receives an http response; the HTTP SERVER starts the CGI Program using setenv(), dup(), fork(), exec(), ...]

First CGI example (in shell)

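#!/bin/bash
# Header lines come first; the empty echo ends the header block.
echo "Status: 200 OK"
echo "Content-Type: text/html"
echo
# Everything after the empty line is the body sent to the browser.
echo "<html><head></head>"
echo "<body>"
echo "Hello world."
echo "</body></html>"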

Getting parameters from the client/browser

• parameters can be passed from the user to the CGI script through an HTML <form>:

<form action="script.cgi" method="GET | POST">
<input type="…" name="input1" />
<input type="…" name="input2" />
<input type="…" name="inputN" />
</form>

• script.cgi will get the parameters as: input1=val1&input2=val2& … &inputN=valN

Getting parameters from the client/browser (2)

• parameters can be sent through the GET method (as part of the URL in the HTTP request line) => the CGI script receives the parameters from the web server in the environment variable QUERY_STRING ($QUERY_STRING in a shell script)

• or they can be passed through the POST method (in the body of the HTTP request) => the CGI script receives the parameters from the web server on its standard input

Request CGI program

• The web server sets some environment variables with information about the request.

• The web server fork()s and the child process exec()s the CGI program.

• The CGI program gets information about the request from environment variables.

[Diagram: the HTTP SERVER passes Environment Variables to the CGI Program and is connected to its stdin and stdout]

STDIN, STDOUT

• Before calling exec(), the child process sets up pipes so that stdin comes from the web server and stdout goes to the web server.

• In some cases part of the request is read from stdin.

• Anything written to stdout is forwarded by the web server to the client.
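The server side of these steps can be sketched in C. This is a minimal illustration for a GET request on a POSIX system, not how any particular web server does it; the helper name run_cgi and its parameters are hypothetical. It sets the environment in the child, connects the child's stdout to a pipe, calls exec, and lets the parent collect the output.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` as a CGI program for a GET request with
 * the given query string and copy what it writes to stdout into `out`.
 * Returns the number of bytes captured, or -1 on error. */
static long run_cgi(const char *path, char *const argv[], const char *query,
                    char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: describe the request in the environment ... */
        setenv("REQUEST_METHOD", "GET", 1);
        setenv("QUERY_STRING", query, 1);
        /* ... make the pipe its stdout, then become the CGI program. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execv(path, argv);
        _exit(127);                 /* only reached if execv() failed */
    }

    /* Parent (the server): read everything the child prints. */
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < cap - 1 &&
           (n = read(fds[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (long)used;
}
```

A real server would also connect the child's stdin for POST bodies and forward the captured output to the client; both are left out here to keep the sketch short.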

Environment Variables (What are they used for?)

In order to pass data from the server to the script, the server uses command line arguments along with environment variables.

The Environment Variables are set when the server executes a CGI Script.

Environment Variables allow the CGI Script to reference variables that might be wanted for the Script output.

There are two types of environment variables:

Non-request-specific variables - those set for every request

Request-specific variables - those that depend on the request being fulfilled by the CGI script

Data are obtained in ENVIRONMENT variables.

The ENVIRONMENT variables are shown in the table below.

ENVIRONMENT VARIABLE   DESCRIPTION

SERVER_NAME            The server's host name or IP address.
SERVER_SOFTWARE        The name and version of the server software that is answering the client request.
SERVER_PROTOCOL        The name and revision of the information protocol the request came in with.
REQUEST_METHOD         The method with which the information request was issued.
QUERY_STRING           The query information passed to the program. It is appended to the URL after a "?".
DOCUMENT_ROOT          The server's document root directory.
CONTENT_TYPE           The MIME type of the query data, such as "text/html".
CONTENT_LENGTH         The length in bytes of the data passed to the CGI program through standard input.
GATEWAY_INTERFACE      The revision of the CGI that the server uses.
HTTP_USER_AGENT        The browser the client is using to issue the request.
HTTP_REFERER           The URL of the document that the client was viewing before accessing the CGI program.

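As a small illustration of reading these variables from C, the sketch below prints a few of them as a plain-text CGI response. The helper names env_or and print_request_info are made up for this example, and the choice of variables is arbitrary; a variable the server did not set is reported as "(not set)".

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the value of an environment variable, or `fallback` if unset. */
static const char *env_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Write a plain-text summary of a few request variables to `out`. */
static void print_request_info(FILE *out)
{
    static const char *const names[] = {
        "SERVER_NAME", "SERVER_PROTOCOL", "REQUEST_METHOD",
        "QUERY_STRING", "CONTENT_LENGTH", "HTTP_USER_AGENT"
    };
    /* Header block first, ended by an empty line. */
    fputs("Content-Type: text/plain\r\n\r\n", out);
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        fprintf(out, "%s=%s\n", names[i], env_or(names[i], "(not set)"));
}
```

A CGI program would simply call print_request_info(stdout) from main().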

Where does the data for the CGI Script come from?

The most common way for data to be sent to CGI scripts is through HTML forms. HTML forms use a multitude of input types to get data to a CGI script. Some of these input types are radio buttons, check boxes, text input and pull-down menus.

Once it has been decided what input the script needs and which input types will be used, there are two main ways to receive the information from the form: the GET method and the POST method. The information is delivered differently depending on which method is used.

GET Method

The form data is encoded and then appended to the URL after a ? mark.

The part of the URL after the ? mark is called the query string; it consists of name=value pairs separated by ampersands (&) and reaches the program in the QUERY_STRING environment variable.

GET http://www.ncsi.iisc.ernet.in/cgi-bin/example/simple.pl?first=Jason&last=Nugent

Example 3

GET Method

All the form data is appended to the URL.

QUERY_STRING contains the query information passed to the program.

When the user clicks the submit button of an HTML form, the browser generates an HTTP request

GET /Scripts/Workshop/simple2.pl?uname=Rani&service=CAS&entrydate=26%2F11%2F1999 HTTP/1.0

and sends it to the web server.

GET Method Cont…

The continuous string of text that follows the question mark represents the query string.

In response to this request from the browser, the server executes the script simple2.pl and places the string

uname=Rani&service=CAS&entrydate=26%2F11%2F1999

in the QUERY_STRING environment variable and HTTP/1.0 in SERVER_PROTOCOL.

The CGI program reads these environment variables, processes them, and passes its results to the web server.

Request Method: Get

• GET requests can include a query string as part of the URL:

GET /cgi-bin/finger?hollingd HTTP/1.0

Request method: GET

Resource name: /cgi-bin/finger

Delimiter: ?

Query string: hollingd

• The web server treats everything before the ‘?’ delimiter as the resource name

• In this case the resource name is the name of a program.

• Everything after the ‘?’ is a string that is passed to the CGI program.
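The split at the ‘?’ delimiter can be sketched as a small C function. The name split_url and its buffer parameters are assumptions made for this example; it copies the resource name and the query string into caller-supplied buffers, truncating if they are too small.

```c
#include <stdio.h>
#include <string.h>

/* Split `url` at the first '?': the part before it is the resource name,
 * the part after it is the query string (empty if there is no '?'). */
static void split_url(const char *url, char *resource, size_t rcap,
                      char *query, size_t qcap)
{
    const char *mark = strchr(url, '?');
    size_t rlen = mark ? (size_t)(mark - url) : strlen(url);
    snprintf(resource, rcap, "%.*s", (int)rlen, url);
    snprintf(query, qcap, "%s", mark ? mark + 1 : "");
}
```

For the request above, split_url("/cgi-bin/finger?hollingd", ...) yields the resource name /cgi-bin/finger and the query string hollingd.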

Simple GET queries - ISINDEX

• You can put an <ISINDEX> tag inside an HTML document.

• The browser will create a text box that allows the user to enter a single string.

• If an ACTION is specified in the ISINDEX tag, when the user presses Enter, a request will be sent to the server specified as the ACTION.

ISINDEX Example

Enter a string:
<ISINDEX ACTION="http://foo.com/search.cgi">
Press Enter to submit your query.

If you enter the string “blahblah”, the browser will send a request to the http server at foo.com that looks like this:

GET /search.cgi?blahblah HTTP/1.1

What the CGI sees

• The CGI Program gets REQUEST_METHOD using getenv:

char *method;
method = getenv("REQUEST_METHOD");
if (method == NULL) … /* error! */

Getting the GET

• If the request method is GET:

if (strcasecmp(method, "get") == 0)

• The next step is to get the query string from the environment variable QUERY_STRING

char *query;
query = getenv("QUERY_STRING");

Send back http Response and Headers:

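• The CGI program can send back a status. A regular CGI program does this with a Status header field (only non-parsed-header scripts write the full HTTP status line themselves):

printf("Status: 200 OK\r\n");

• and headers:

printf("Content-type: text/html\r\n");
printf("\r\n");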

Important!

• A CGI program doesn’t have to send a status line (the http server will do this for you if you don’t).

• A CGI program must always send back at least one header line indicating the data type of the content (usually text/html).

• The web server will typically throw in a few header lines of its own (Date, Server, Connection).
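A minimal sketch of a response writer that follows these rules: it sends only the Content-Type header, the empty line that ends the header block, and then the body, leaving the status line to the server. The function name write_html_response is hypothetical.

```c
#include <stdio.h>

/* Write a complete CGI response to `out`: one header line, the empty
 * line that ends the headers, then the HTML body.
 * Returns 0 on success, -1 if writing failed. */
static int write_html_response(FILE *out, const char *body)
{
    if (fputs("Content-Type: text/html\r\n\r\n", out) < 0)
        return -1;
    return fputs(body, out) < 0 ? -1 : 0;
}
```

In a CGI program the stream would be stdout, for example write_html_response(stdout, "<p>hi</p>").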

Simple GET handler

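#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *method, *query;
    method = getenv("REQUEST_METHOD");
    if (method == NULL) return 1;      /* error: not started by a web server */
    query = getenv("QUERY_STRING");
    if (query == NULL) query = "";     /* no query string was sent */
    printf("Content-type: text/html\r\n\r\n");
    printf("<H1>Your query was %s</H1>\n", query);
    return 0;
}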

URL-encoding

• Browsers use an encoding when sending query strings that include special characters.

– Most nonalphanumeric characters are encoded as a ‘%’ followed by 2 ASCII-encoded hex digits.

– ‘=’ (which is hex 3D) becomes “%3D”

– ‘&’ becomes “%26”

More URL encoding

• The space character ‘ ‘ is replaced by ‘+’.– Why? (think about project 2 parsing…)

• The ‘+’ character is replaced by “%2B”

Example: “foo=6 + 7” becomes “foo%3D6+%2B+7”
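The encoding rules above can be sketched in C as follows. The function name url_encode is made up for this example, and the set of characters left unencoded (letters, digits, ‘-’, ‘_’, ‘.’ and ‘*’) is an assumption; browsers may differ in which punctuation they leave alone.

```c
#include <ctype.h>
#include <stddef.h>

/* URL-encode `in` into `out` (capacity `cap`, including the final '\0').
 * Letters, digits and - _ . * are copied, a space becomes '+', and every
 * other byte becomes '%' followed by two hex digits.
 * Returns 0 on success, -1 if `out` is too small. */
static int url_encode(const char *in, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '*') {
            if (o + 1 >= cap) return -1;
            out[o++] = (char)c;
        } else if (c == ' ') {
            if (o + 1 >= cap) return -1;
            out[o++] = '+';
        } else {
            if (o + 3 >= cap) return -1;
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
    }
    if (cap == 0) return -1;
    out[o] = '\0';
    return 0;
}
```

With these rules the input foo=6 + 7 is encoded as foo%3D6+%2B+7, matching the example above.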

Security!!!

• It is a very bad idea to build a command line containing user input!

• What if the user submits: “ ; rm -r *;”

The command line that gets executed then becomes:

grep ; rm -r *; /usr/dict/words
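One defensive habit is to accept only input that matches a strict whitelist before it gets anywhere near a command. The sketch below is one possible check, not a complete defense; the name is_safe_word and the letters-and-digits rule are assumptions chosen for a dictionary lookup like the grep example.

```c
#include <ctype.h>
#include <stddef.h>

/* Accept only non-empty strings made of letters and digits.
 * Returns 1 if `s` is acceptable, 0 otherwise. */
static int is_safe_word(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s))
            return 0;
    return 1;
}
```

The input “ ; rm -r *;” is rejected because of its spaces, semicolons and asterisk. Avoiding the shell altogether, by passing the word as a separate argument to an exec-family call, removes the problem at its root.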

Beyond ISINDEX - Forms

• Many Web services require more than a simple ISINDEX.

• HTML includes support for forms:

– lots of field types

– user answers all kinds of annoying questions

– entire contents of form must be stuck together and put in QUERY_STRING by the Web server.

Form Fields

• Each field within a form has a name and a value.

• The browser creates a query that includes a sequence of “name=value” substrings and sticks them together separated by the ‘&’ character.

Form fields and encoding

• 2 fields - name and occupation.

• If user types in “Dave H.” as the name and “none” for occupation, the query would look like this:

“name=Dave+H%2E&occupation=none”

HTML Forms

• Each form includes a METHOD that determines what http method is used to submit the request.

• Each form includes an ACTION that determines where the request is made.

What a CGI will get

• The query (from the environment variable QUERY_STRING) will be a URL-encoded string containing the name,value pairs of all form fields.

• The CGI must decode the query and separate the individual fields.
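Decoding and separating the fields can be sketched like this. Both helper names (url_decode, get_field) are hypothetical, and the sketch assumes that field names themselves contain no encoded characters; values are decoded into a caller-supplied buffer and truncated if they do not fit.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode `len` bytes of URL-encoded text into `out`:
 * '+' becomes a space and %XX becomes the byte with hex value XX. */
static void url_decode(const char *in, size_t len, char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < len && o + 1 < cap; i++) {
        if (in[i] == '+') {
            out[o++] = ' ';
        } else if (in[i] == '%' && i + 2 < len &&
                   isxdigit((unsigned char)in[i + 1]) &&
                   isxdigit((unsigned char)in[i + 2])) {
            char pair[3] = { in[i + 1], in[i + 2], '\0' };
            out[o++] = (char)strtol(pair, NULL, 16);
            i += 2;
        } else {
            out[o++] = in[i];
        }
    }
    if (cap > 0)
        out[o] = '\0';
}

/* Look up field `name` in a query such as "a=1&b=2" and store its decoded
 * value in `out`. Returns 1 if the field was found, 0 otherwise. */
static int get_field(const char *query, const char *name,
                     char *out, size_t cap)
{
    size_t nlen = strlen(name);
    const char *p = query;
    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t plen = end ? (size_t)(end - p) : strlen(p);
        if (plen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            url_decode(p + nlen + 1, plen - nlen - 1, out, cap);
            return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}
```

Applied to the query name=Dave+H%2E&occupation=none, looking up name gives Dave H. and looking up occupation gives none.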

Form example

<html>

<head></head>

<body>

<form action="cgi-bin/post_ex.cgi" method="POST">

User: <input type="text" size="20" name="user" /><br />

Password: <input type="text" size="20" name="pass" /><br />

<input type="submit" value="Submit" name="submit" />

</form>

</body>

</html>

Getting parameters through GET

#!/bin/bash
echo "Content-Type: text/html"
echo
echo "<html><head></head>"
echo "<body>"
echo "Parameters are:<br />"
# QUERY_STRING looks like user=...&pass=...; take the value of each pair
user=$(echo "$QUERY_STRING" | cut -d"&" -f 1 | cut -d"=" -f 2)
pass=$(echo "$QUERY_STRING" | cut -d"&" -f 2 | cut -d"=" -f 2)
echo "$user $pass"
echo "</body></html>"

POST Method

The difference between the GET and POST methods lies primarily in how the form data is transmitted.

With POST, the information is sent after all request headers have been sent to the server.

With the POST method, the server passes the information contained in the submitted form as standard input (STDIN) to the CGI program.

The length of the information (in bytes) is also sent to the server, to let the CGI script know how much information it has to read.

The environment variable CONTENT_LENGTH contains the amount of data transferred from the HTML form.

Example 4

POST Method

Data from the form is encoded as a string of NAME/VALUE pairs separated by &.

With the POST method, the same HTML form generates the request

POST /Scripts/simple2.pl HTTP/1.0
Accept: text/html
Accept: text/plain
User-Agent:
Content-type: application/x-www-form-urlencoded
Content-length: 47

uname=Rani&service=CAS&entrydate=26%2F11%2F1999


HTTP Method: POST

• The HTTP POST method delivers data from the browser as the content of the request.

• The GET method delivers data (query) as part of the URI.

GET vs. POST

• When using forms it’s generally better to use POST:

– there are limits on the maximum size of a GET query string (environment variable)

– a post query string doesn’t show up in the browser as part of the current URL.

HTML Form using POST

Set the form method to POST instead of GET.

<FORM METHOD=POST ACTION=…>

The browser will take care of the details...

CGI reading POST

• If REQUEST_METHOD is a POST, the query is coming in STDIN.

• The environment variable CONTENT_LENGTH tells us how much data to read.

Getting parameters through POST

#include <stdio.h>
#include <string.h>

int main(void) {
    char line[255], *userline, *passline, *s;
    char user[20], pass[20];

    printf("Content-Type: text/html\n\n");
    printf("<html><head></head>");
    printf("<body>");
    /* the POST body arrives on standard input */
    if (fgets(line, sizeof line, stdin) == NULL) line[0] = 0;
    printf("Parameters are: <br />");

    /* split "user=...&pass=..." into its two name=value pairs */
    userline = strtok(line, "&");
    passline = strtok(0, "&");

    user[0] = 0;
    if (userline) {
        s = strtok(userline, "=");
        s = strtok(0, "=");
        /* bounded copy: a long value must not overflow user[] */
        if (s) snprintf(user, sizeof user, "%s", s);
    }

    pass[0] = 0;
    if (passline) {
        s = strtok(passline, "=");
        s = strtok(0, "=");
        if (s) snprintf(pass, sizeof pass, "%s", s);
    }
    printf("%s, %s", user, pass);

    printf("</body>");
    printf("</html>");
    return 0;
}

Possible Problem

char buff[100];
char *clen = getenv("CONTENT_LENGTH");

if (clen == NULL) /* handle error */

int len = atoi(clen);

/* nothing checks len against sizeof(buff): a large CONTENT_LENGTH overflows buff */
if (read(0, buff, len) < 0) … /* handle error */
pray_for(!hacker);
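A bounds-checked variant of this read might look like the sketch below. The function name read_post_body is hypothetical, and refusing oversized bodies outright (rather than truncating them) is a design choice made for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a POST body whose size is given by the CONTENT_LENGTH string.
 * The body is stored in `buf` (capacity `cap`) and NUL-terminated.
 * Returns the number of bytes read, or -1 if the length is missing,
 * malformed, too large for `buf`, or the body is shorter than announced. */
static long read_post_body(FILE *in, const char *content_length,
                           char *buf, size_t cap)
{
    if (content_length == NULL || cap == 0)
        return -1;
    char *end;
    long len = strtol(content_length, &end, 10);
    if (end == content_length || *end != '\0' || len < 0)
        return -1;                  /* not a plain non-negative number */
    if ((unsigned long)len >= cap)
        return -1;                  /* refuse instead of overflowing buf */
    size_t got = fread(buf, 1, (size_t)len, in);
    if (got != (size_t)len)
        return -1;                  /* body shorter than announced */
    buf[got] = '\0';
    return (long)got;
}
```

In a CGI program the call would be read_post_body(stdin, getenv("CONTENT_LENGTH"), buff, sizeof buff).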

CGI Method summary

• GET:

– REQUEST_METHOD is “GET”

– QUERY_STRING is the query

• POST:

– REQUEST_METHOD is “POST”

– CONTENT_LENGTH is the size of the query (in bytes)

– query can be read from STDIN
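The summary translates into a small dispatch step at the start of a CGI program. The enum and the function name query_source_for are invented for this sketch; it compares the method exactly as “GET” or “POST”, which is how servers normally set REQUEST_METHOD.

```c
#include <stddef.h>
#include <string.h>

/* Where a CGI program should look for the query. */
enum query_source { QUERY_FROM_ENV, QUERY_FROM_STDIN, QUERY_UNSUPPORTED };

/* Map the value of REQUEST_METHOD to the place the query comes from:
 * GET -> QUERY_STRING environment variable, POST -> standard input. */
static enum query_source query_source_for(const char *method)
{
    if (method == NULL)
        return QUERY_UNSUPPORTED;
    if (strcmp(method, "GET") == 0)
        return QUERY_FROM_ENV;
    if (strcmp(method, "POST") == 0)
        return QUERY_FROM_STDIN;
    return QUERY_UNSUPPORTED;
}
```

A program would call query_source_for(getenv("REQUEST_METHOD")) and then either read QUERY_STRING or read CONTENT_LENGTH bytes from standard input.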

What are the Drawbacks of using CGI?

CGI applications can be slowed down considerably if the network is slow.

If your script is long or has to do a lot of processing, your visitor will have to wait until your script has finished running.

The biggest concern with CGI programs is security.

Client Side Scripting

Client-side programming is based on the idea that the computer which the client is using to browse the web has quite a bit of CPU power sitting there doing nothing.

Meanwhile, web servers are being tasked to death handling hundreds of CGI requests above and beyond their regular duties.

Thus, it makes sense to share some of that burden between the client and server by taking some of the processing load off the server and giving it to the client.

Disadvantages of Client Side Scripting

Browser-Dependent Client-Side Scripts

– Different sets of code for the different browsers

Secure Source Code of Client-Side Scripts

Pages Take Longer to Download

Program Scope Is Limited to a Single HTML Page

No Direct Access to System Objects

Which Should I Use? Client- or Server-Side?

If you want to have dynamic client forms with client-side validation, you must use client-side scripting.

If you want your site to have highly interactive pages, you should use client-side scripting.

If you need to provide your client with advanced functionality that can be created only using ActiveX controls, you must use client-side scripting.

Which Should I Use? Client- or Server-Side? Cont…

If you want to control the user's browser (that is, you want to turn off the menus and place the browser in kiosk mode), you must use client-side scripting

If your Web site must work with every browser on the market, and you do not want to create several different versions for different browsers, you should avoid client-side scripting

If you want to protect your source code, you must use only server-side scripting. All client-side source code is transferred to the browser.

Which Should I Use? Client- or Server-Side? Cont…

If you need to track user information across several Web pages to create a "Web application," you must use server-side scripting

If you need to interact with server-side databases, you must use server-side scripting.

If you need to use HTTP server variables or check the capabilities of the user's browser, you must use server-side scripting