HTTP – HyperText Transfer Protocol, Part 1

Posted on 20-Dec-2015



Common Protocols

• In order for two remote machines to “understand” each other, they should:
  – “speak the same language”
  – coordinate their “talk”
• The solution is to use protocols
• Examples:
  – FTP – File Transfer Protocol
  – SMTP – Simple Mail Transfer Protocol
  – NNTP – Network News Transfer Protocol
  – HTTP – HyperText Transfer Protocol

Why Was HTTP Needed?

• According to Tim Berners-Lee (1991), a protocol was needed with the following features:
  – A subset of the file transfer protocol
  – The ability to request an index search
  – Automatic format negotiation
  – The ability to refer the client to another server

[Figure: HTTP requests and HTTP responses exchanged among the client, a proxy server, and the Web server www.cs.huji.ac.il:80 with its file system, for the URL http://www.cs.huji.ac.il/~dbi]

[Figure: a chain of proxy servers (department, university, Israel) between the client and the Web server www.w3.org:80]

Terminology

• User agent: the client that initiates a request (browser, editor, Web robot, …)
• Origin server: the server on which a given resource resides (a Web server, a.k.a. HTTP server)
• Proxy: acts as both a server and a client
• Gateway: a server that acts as an intermediary for other servers
• Tunnel: acts as a blind relay between two applications; we can implement a custom protocol using HTTP tunneling

Resources

• A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator)

• A resource can be:
  – A file
  – A dynamically created page

• What we see in the browser can be a combination of several resources

Uniform Resource Locator

protocol://host:port/path?parameters#anchor

http://www.cs.huji.ac.il/~dbi/index.html#info

http://www.google.com/search?hl=en&q=blabla

• The :port part may be omitted (the default for http is 80); the #anchor, if present, always comes after the ?parameters
• There are other types of URLs:
  – mailto:<account@site>
  – news:<newsgroup-name>
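The protocol://host:port/path?parameters#anchor layout can be checked with Python's standard urllib.parse.urlsplit. The sketch below splits the two example URLs into those parts. `describe` is an illustrative helper. It assumes http URLs, which is why it falls back to port 80.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split an http URL into protocol, host, port, path, parameters and anchor."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # No explicit port in the URL: assume the http default, 80.
        "port": parts.port or 80,
        "path": parts.path,
        "parameters": parts.query,
        "anchor": parts.fragment,
    }


print(describe("http://www.cs.huji.ac.il/~dbi/index.html#info"))
# {'protocol': 'http', 'host': 'www.cs.huji.ac.il', 'port': 80, 'path': '/~dbi/index.html', 'parameters': '', 'anchor': 'info'}
print(describe("http://www.google.com/search?hl=en&q=blabla"))
```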


In a URL

• In the parameters part, spaces are represented by “+”
• Reserved characters such as &, + and % are encoded in the form “%xx”, where xx is the character's ASCII value in hexadecimal; for example, “&” = “%26”
• The inputs are given as a list of parameter=value pairs separated by “&”: var1=value1&var2=value2&var3=value3

Example: a Google search for “war&peace Tolstoy” is sent as the request URL

http://www.google.com/search?hl=en&q=war%26peace+Tolstoy
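Python's standard urllib.parse.urlencode implements these encoding rules. It joins name=value pairs with "&", writes spaces as "+", and percent-encodes reserved characters. The following minimal sketch reproduces the Google example. The parameter list is read off the URL above.

```python
from urllib.parse import urlencode

# Parameters as (name, value) pairs; the search text contains "&" and a space.
params = [("hl", "en"), ("q", "war&peace Tolstoy")]

# "&" becomes %26, the space becomes "+", and the pairs are joined with "&".
query = urlencode(params)
url = "http://www.google.com/search?" + query
print(url)  # http://www.google.com/search?hl=en&q=war%26peace+Tolstoy
```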

An HTTP Session

• A basic HTTP session has four phases:
  1. The client opens the connection (a TCP connection)
  2. The client makes a request
  3. The server sends a response
  4. The server closes the connection
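The four phases map directly onto socket calls. This is a minimal Python sketch that assumes a plain (non-TLS) HTTP server at the given host and port. `http_session` is a hypothetical helper name. It sends an HTTP/1.0 request so that the server closes the connection after responding (phase 4).

```python
import socket


def http_session(host: str, port: int = 80, path: str = "/") -> bytes:
    """Run one session: connect, send a GET, read the response, stop at close."""
    # Phase 1: the client opens a TCP connection.
    with socket.create_connection((host, port)) as sock:
        # Phase 2: the client sends the request (initial line, header, blank line).
        request = f"GET {path} HTTP/1.0\r\nHost: {host}:{port}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Phase 3: the server sends its response; collect it chunk by chunk.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                # Phase 4: an empty read means the server closed the connection.
                break
            chunks.append(data)
    return b"".join(chunks)
```

A call such as `http_session("www.cs.huji.ac.il", 80, "/~dbi")` returns the raw status line, headers and page as bytes. The host here is only illustrative.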

Nesting in Page

[Figure: Index.html with a left frame and a right frame; embedded objects: jumping fish, fairy icon, HUJI icon]

What we see in the browser can be a combination of several resources

Nested Objects

• Suppose a client accesses a page containing 10 inline images. How many sessions will be required to display the page completely?

• The answer is 11 HTTP sessions. Why?

• Some browsers/servers support a feature called keep-alive, which keeps the connection open until it is explicitly closed

• How can this help?
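Keep-alive lets all 11 requests (the page plus its 10 images) travel over one TCP connection instead of 11. The sketch below uses Python's standard http.client module. That module speaks HTTP/1.1 and keeps the connection open between requests when the server allows it. `fetch_over_one_connection` and the example paths are hypothetical.

```python
import http.client
from typing import List


def fetch_over_one_connection(host: str, port: int, paths: List[str]) -> List[bytes]:
    """Fetch several resources over a single persistent (keep-alive) connection."""
    conn = http.client.HTTPConnection(host, port)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read completely before the next request
            # can be sent on the same connection.
            bodies.append(response.read())
    finally:
        conn.close()
    return bodies
```

For the page in the question, `paths` would hold the page and its 10 image paths. If the server honors keep-alive, the 11 request/response pairs share one TCP connection. Otherwise http.client silently reconnects.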

Stateless Protocol

• HTTP is a stateless protocol, which means that once a server has delivered the requested data to a client, the server retains no memory of what has just taken place (even if the connection is keep-alive)

• What are the difficulties in working with a stateless protocol?

• How would you implement a site for buying some items?

• So why doesn't HTTP keep state?
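A common answer to the shopping-site question keeps the state on the server. The client holds only an identifier, usually in a cookie (compare the Cookie request header). The following Python sketch shows the idea. The in-memory `carts` store, the `add_to_cart` helper and the cookie name `session_id` are all hypothetical.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# Hypothetical server-side store: session id -> items in that visitor's cart.
carts: Dict[str, List[str]] = {}


def add_to_cart(cookie_header: Optional[str], item: str) -> Tuple[str, List[str]]:
    """Handle one 'add item' request; the cart lives on the server, keyed by a cookie."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    session_id = morsel.value if morsel is not None else None
    if session_id not in carts:
        # No cookie (or an unknown id): start a new session.
        session_id = secrets.token_hex(8)
        carts[session_id] = []
    carts[session_id].append(item)
    # The server would send this back to the browser in a Set-Cookie header.
    return f"session_id={session_id}", carts[session_id]


set_cookie, cart = add_to_cart(None, "book")
_, cart = add_to_cart(set_cookie, "pen")  # the browser echoes the cookie back
print(cart)  # ['book', 'pen']
```

Each request on its own is still stateless. Continuity comes only from the identifier that the client sends back every time.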

The Format of HTTP Requests and Responses

• An initial line
• Zero or more header lines
• A blank line (i.e., a CRLF by itself)
• An optional message body (e.g., a file, query data, or query output)

Note: CRLF = “\r\n” (ASCII 13 followed by ASCII 10)
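Every message follows this layout, so splitting one apart takes only a few lines. The following minimal Python sketch does this for a form-submission request. `parse_message` is a hypothetical helper. It keeps only the last value of a repeated header and ignores obsolete folded header lines.

```python
from typing import Dict, Tuple


def parse_message(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw HTTP message into (initial line, headers, body)."""
    # The header section ends at the first blank line, i.e. CRLF CRLF.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


raw = (
    b"POST /path/register.cgi HTTP/1.0\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 35\r\n"
    b"\r\n"
    b"home=Ross+109&favorite+flavor=flies"
)
line, headers, body = parse_message(raw)
print(line)                       # POST /path/register.cgi HTTP/1.0
print(headers["content-length"])  # 35
```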

Headers

• HTTP 1.0 defines 16 headers
  – None are required

• HTTP 1.1 defines 46 headers
  – One header (Host:) is required in every request
  – A request sent to a proxy also names the host in its request URL, which is an absolute URL (e.g., http://www.cs.huji.ac.il/index.html)
  – A response does not have to include any header

How do we know who the host is when there is no Host header?
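One answer lies in the request line itself. A request to a proxy carries the absolute URL, so the host can be read from it. A request to an origin server carries only the path, and the host goes in the Host header. The following Python sketch builds both forms. `request_line` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def request_line(method: str, url: str, via_proxy: bool) -> str:
    """Build the initial line of a request sent to a proxy or to the origin server."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # The #anchor is never sent; the browser interprets it locally.
    if via_proxy:
        # To a proxy: the absolute URL, so the proxy learns which host to contact.
        target = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
    else:
        # To the origin server: only path and parameters; the host goes in Host:.
        target = path + ("?" + parts.query if parts.query else "")
    return f"{method} {target} HTTP/1.1"


print(request_line("GET", "http://www.cs.huji.ac.il/index.html", via_proxy=False))
# GET /index.html HTTP/1.1
print(request_line("GET", "http://www.cs.huji.ac.il/index.html", via_proxy=True))
# GET http://www.cs.huji.ac.il/index.html HTTP/1.1
```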

HTTP Requests

The Format of a Request

method sp URL sp version cr lf
header : value cr lf
header : value cr lf
cr lf
Entity Body

(sp = space, cr lf = CRLF; the “header : value” lines are the header lines)

Request Example

GET /index.html HTTP/1.1 [CRLF]
Accept: image/gif, image/jpeg [CRLF]
User-Agent: Mozilla/4.0 [CRLF]
Host: www.cs.huji.ac.il:80 [CRLF]
Connection: Keep-Alive [CRLF]
[CRLF]

• GET is the method, /index.html is the request URL, and HTTP/1.1 is the version
• The lines that follow, up to the blank line, are the headers
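Assembling a request amounts to reading the format line by line. The following minimal Python sketch reproduces the request example byte for byte. `build_request` is a hypothetical helper.

```python
from typing import List, Tuple


def build_request(method: str, url: str, version: str,
                  headers: List[Tuple[str, str]], body: bytes = b"") -> bytes:
    """Assemble a request: initial line, header lines, blank line, optional body."""
    crlf = "\r\n"
    lines = [f"{method} {url} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends with CRLF, and one more CRLF marks the end of the headers.
    head = crlf.join(lines) + crlf + crlf
    return head.encode("ascii") + body


request = build_request("GET", "/index.html", "HTTP/1.1", [
    ("Accept", "image/gif, image/jpeg"),
    ("User-Agent", "Mozilla/4.0"),
    ("Host", "www.cs.huji.ac.il:80"),
    ("Connection", "Keep-Alive"),
])
print(request)
```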

Request Methods

Common Request Methods

• GET returns the contents of the indicated document
• HEAD returns the header information for the indicated document
  – Useful for finding out information about a resource without retrieving it
• POST treats the document as an application and sends some data to it

More Request Methods

• PUT replaces the content of the document with some data
• DELETE deletes the indicated document
• TRACE invokes a remote loop-back of the request; the final recipient SHOULD reflect the message back to the client
• Servers usually do not allow these methods

GET Request

• A request to get a resource from the Web
• The most frequently used method
• The request has no message body, but parameters can be sent in the request URL (i.e., the URL without the host part)
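On the server side, the parameters of a GET request are recovered from the request URL. The following Python sketch uses urllib.parse from the standard library. `get_parameters` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit
from typing import Dict, List


def get_parameters(request_url: str) -> Dict[str, List[str]]:
    """Decode the parameters carried in a GET request URL."""
    query = urlsplit(request_url).query
    # parse_qs undoes "+" and %xx encoding; each name maps to a list,
    # because a parameter name may appear more than once.
    return parse_qs(query)


print(get_parameters("/search?hl=en&q=war%26peace+Tolstoy"))
# {'hl': ['en'], 'q': ['war&peace Tolstoy']}
```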

HEAD Request

• A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body)

• This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth

• Used for testing hypertext links for validity, accessibility and recent modification
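Python's standard http.client can send a HEAD request directly. In the sketch below, the response carries the headers, such as Content-Length and possibly Last-Modified, but reading the body yields nothing. `head` is a hypothetical helper.

```python
import http.client


def head(host: str, port: int, path: str):
    """Send a HEAD request and return (status, headers, body); the body is empty."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        body = response.read()  # a HEAD response never has a message body
        return response.status, dict(response.getheaders()), body
    finally:
        conn.close()
```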


POST Request

• A POST request can send data to the server

• POST is mostly used for form filling
  – The data filled into the form are translated by the browser into a special format and sent to a program on the server using the POST method

POST Request (cont.)

• There is a block of data sent with the request, in the message body

• There are usually extra headers to describe this message body, like Content-Type: and Content-Length:

• The request URL is a URL of a program to handle the sent data, not a file

• The HTTP response is normally the output of a program, not a static file

POST Example

• Here's a typical form submission using POST:

POST /path/register.cgi HTTP/1.0
From: <sender's email address>
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 35

home=Ross+109&favorite+flavor=flies
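The body in this example uses ordinary form encoding, the same as URL parameters. Content-Length is its size in bytes. The following Python sketch rebuilds the body and the two header values from the form fields. The field names and values come from the example; the variable names are illustrative.

```python
from urllib.parse import urlencode

# The two form fields as the browser would collect them.
fields = [("home", "Ross 109"), ("favorite flavor", "flies")]

# application/x-www-form-urlencoded: spaces become "+", pairs are joined by "&".
body = urlencode(fields).encode("ascii")

headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    # Content-Length counts the bytes of the encoded body.
    "Content-Length": str(len(body)),
}
print(body)                       # b'home=Ross+109&favorite+flavor=flies'
print(headers["Content-Length"])  # 35
```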


Request Headers

HTTP 1.1 Request Headers

• The common request headers of HTTP 1.1 are described in the following slides:
  – Accept
  – Accept-Encoding
  – Authorization
  – Connection
  – Cookie
  – Host
  – If-Modified-Since
  – Referer
  – User-Agent

Accept Request Headers

• Accept
  – Specifies the MIME types that the client can handle (e.g., text/html, image/gif)
  – The server can send different content to different clients

• Accept-Encoding
  – Indicates the encodings (e.g., gzip) that the client can handle

More Accept Request Headers

• Accept-Charset
• Accept-Language
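A server can use Accept to choose among several representations of the same resource. The sketch below is a simplified, hypothetical `choose_type` helper in Python. It honors q-values and the */* and type/* wildcards. It ignores other media-type parameters and the specification's more detailed precedence rules.

```python
from typing import List, Optional


def choose_type(accept: str, available: List[str]) -> Optional[str]:
    """Pick one of the server's MIME types according to an Accept header (simplified)."""
    preferences = []
    for item in accept.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_range, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        preferences.append((q, media_range))
    # Highest q first; the sort is stable, so ties keep the header's order.
    preferences.sort(key=lambda pref: pref[0], reverse=True)
    for q, media_range in preferences:
        if q == 0:
            continue  # q=0 means "not acceptable"
        for mime in available:
            if media_range in ("*/*", mime) or (
                media_range.endswith("/*") and mime.startswith(media_range[:-1])
            ):
                return mime
    return None  # nothing acceptable; a server might answer 406 Not Acceptable


print(choose_type("image/gif, image/jpeg", ["image/jpeg", "image/png"]))     # image/jpeg
print(choose_type("text/html;q=0.5, image/*", ["text/html", "image/png"]))  # image/png
```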

Authorization Request Header

• Authorization
  – User identification for password-protected pages
  – Instead of HTTP authorization, you can use HTML forms to send the username/password and store them in state (e.g., a session object)
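With HTTP authorization, the Authorization header carries the credentials. The slide does not name a scheme. Assuming the common Basic scheme, the value is "Basic " followed by the Base64 encoding of username:password. The following Python sketch builds that value. `basic_authorization` is a hypothetical helper, and the credentials are the well-known example from the Basic-scheme specification (RFC 2617).

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Value of an Authorization header using the HTTP Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    # Base64 is an encoding, not encryption: anyone who sees the header can decode it.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_authorization("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```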

Connection Request Header

• Connection
  – Connection: keep-alive means that the browser can handle a persistent connection
  – Keep-alive is the default in HTTP 1.1
  – In a persistent connection, the server can reuse the same socket for requests that are very close together from the same client
  – Connection: close means that the connection is closed once the current request has been answered

Content-Length Request Header

• In requests, this header is mainly used with POST (and PUT), since those requests carry a message body

• It specifies the size of the message body (e.g., the POST data) in bytes

Cookie Request Header

• Sends back cookies that the server previously sent to the client

• Not in the HTTP 1.1 specification, but widely supported (originally a Netscape extension)

Host Request Header

• Indicates the host and port as given in the original URL
  – Required in HTTP 1.1

• Needed due to request forwarding and machines that have multiple hostnames
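One machine may serve several host names, so the server reads Host to decide which site a request is for. This is usually called virtual hosting. The following Python sketch shows the lookup. The `SITES` table, its paths and the `document_root` helper are hypothetical, and IPv6 literal hosts are not handled.

```python
from typing import Dict, Optional

# Hypothetical mapping from host names served by one machine to document roots.
SITES: Dict[str, str] = {
    "www.cs.huji.ac.il": "/var/www/cs",
    "cs.huji.ac.il": "/var/www/cs",
    "www.example.org": "/var/www/example",
}


def document_root(host_header: str) -> Optional[str]:
    """Pick the site to serve from a Host value of the form 'name' or 'name:port'."""
    name = host_header.split(":")[0].strip().lower()  # host names are case-insensitive
    return SITES.get(name)
```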

If-Modified-Since Request Header

• This header indicates that the client wants the page only if it has been changed after the specified date; otherwise the server answers with 304 (Not Modified) and no message body

• If-Unmodified-Since is the reverse of If-Modified-Since
  – It is used for PUT requests (“update this document only if nobody else has changed it since I generated it”)

The Format of the Date in If-Modified-Since and in If-Unmodified-Since

• Greenwich Mean Time (GMT) should be used, in the same format as other date headers such as Last-Modified:

Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
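Python's email.utils module formats and parses dates in this GMT format, so the server's If-Modified-Since check is short. The sketch below uses hypothetical helpers `http_date` and `status_for`. It assumes the client sends a well-formed GMT date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def http_date(moment: datetime) -> str:
    """Format a datetime in the HTTP date format, always expressed in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def status_for(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 200 if the client needs the page, or 304 (Not Modified) if not."""
    if if_modified_since is None:
        return 200
    # Assumes a GMT date such as "Fri, 31 Dec 1999 23:59:59 GMT".
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


page_changed = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
print(http_date(page_changed))                                     # Fri, 31 Dec 1999 23:59:59 GMT
print(status_for(page_changed, "Fri, 31 Dec 1999 23:59:59 GMT"))  # 304
print(status_for(page_changed, "Thu, 30 Dec 1999 12:00:00 GMT"))  # 200
```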

Referer Request Header

• URL of the referring Web page
• Useful for tracking traffic
• It is logged by many servers
• Can be easily spoofed
• Note the spelling error: the correct spelling is Referrer, but the header is spelled Referer

User-Agent Request Header

• The value of this header is a string identifying the browser making the request

• Rely on it sparingly
• Again, can be easily spoofed