HTTP – HyperText Transfer Protocol, Part 1
Post on 20-Dec-2015
Common Protocols
• In order for two remote machines to “understand” each other they should
  – “speak the same language”
  – coordinate their “talk”
• The solution is to use protocols
• Examples:
  – FTP – File Transfer Protocol
  – SMTP – Simple Mail Transfer Protocol
  – NNTP – Network News Transfer Protocol
  – HTTP – HyperText Transfer Protocol
Why Was HTTP Needed?
• According to Tim Berners-Lee (1991), a protocol was needed with the following features:
  – A subset of the file transfer protocol
  – The ability to request an index search
  – Automatic format negotiation
  – The ability to refer the client to another server
[Figure: an HTTP request for http://www.cs.huji.ac.il/~dbi travels through a proxy server to the Web server at www.cs.huji.ac.il:80, which is backed by a file system; an HTTP response is returned at each hop]
Terminology
• User agent: client which initiates a request (browser, editor, Web robot, …)
• Origin server: the server on which a given resource resides (Web server a.k.a. HTTP server)
• Proxy: acts as both a server and a client
• Gateway: server which acts as intermediary for other servers
• Tunnel: acts as a blind relay between two applications – we can implement a custom protocol using HTTP tunneling
Resources
• A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator)
• A resource can be
  – A file
  – A dynamically created page
• What we see in the browser can be a combination of several resources
Uniform Resource Locator
protocol://host:port/path?parameters#anchor
http://www.cs.huji.ac.il/~dbi/index.html#info
http://www.google.com/search?hl=en&q=blabla
• There are other types of URLs
  – mailto:<account@site>
  – news:<newsgroup-name>
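The parts of a URL can also be pulled apart programmatically. The following is a minimal sketch, assuming Python and its standard `urllib.parse` module (the slides themselves name no language); `describe` is a hypothetical helper that maps the library's field names onto the slide's vocabulary. Note that in a URL carrying both, the parameters (query) come before the anchor (fragment).

```python
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into the parts named on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,          # None when the URL gives no explicit port
        "path": parts.path,
        "parameters": parts.query,   # the text after "?"
        "anchor": parts.fragment,    # the text after "#"
    }

info = describe("http://www.cs.huji.ac.il/~dbi/index.html#info")
search = describe("http://www.google.com/search?hl=en&q=blabla")
```

The anchor is interpreted by the browser and is not sent to the server in the request line, whereas the parameters are.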
In a URL
• Spaces are represented by “+”
• Characters such as &, +, % are encoded in the form “%xx”, where xx is the ASCII value in hexadecimal; for example, “&” = “%26”
• The inputs to the parameters are given as a list of pairs of a parameter and a value:
  var1=value1&var2=value2&var3=value3
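These encoding rules are what form-style encoding functions apply. A minimal sketch, assuming Python's standard `urllib.parse`; the parameter names and values are made up for illustration:

```python
from urllib.parse import urlencode, parse_qs, quote_plus

# Hypothetical form inputs containing a space and an ampersand.
params = {"q": "fish & chips", "hl": "en"}

query = urlencode(params)      # spaces become "+", "&" becomes "%26"
decoded = parse_qs(query)      # reverses the encoding; each value is a list
ampersand = quote_plus("&")    # encode a single reserved character
plus_sign = quote_plus("a+b")  # a literal "+" must itself be escaped
```

Here `query` is the string that would follow the “?” in the request URL.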
An HTTP Session
• A basic HTTP session has four phases:
  1. Client opens the connection (a TCP connection)
  2. Client makes a request
  3. Server sends a response
  4. Server closes the connection
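The four phases can be traced with plain TCP sockets. This is only a sketch under stated assumptions: it uses Python, and `serve_once` is a hypothetical toy server on the local machine that stands in for a real Web server so the example does not depend on the network.

```python
import socket
import threading

def serve_once(listener):
    """Hypothetical toy server: answer one request, then close the connection."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:       # read up to the blank line ending the headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        conn.sendall(b"HTTP/1.0 200 OK\r\n"
                     b"Content-Type: text/plain\r\n"
                     b"Content-Length: 5\r\n"
                     b"\r\n"
                     b"hello")
    # leaving the "with" block closes the connection (phase 4)

def fetch(host, port, path):
    """Client side of one session."""
    with socket.create_connection((host, port), timeout=5) as sock:  # phase 1: open
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))                        # phase 2: request
        response = b""
        while True:                                                  # phase 3: response
            chunk = sock.recv(1024)
            if not chunk:                                            # phase 4: server closed
                break
            response += chunk
    return response

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))              # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

response = fetch("127.0.0.1", port, "/index.html")
listener.close()
```

The client knows the response is complete only because the server closes the connection, which is exactly the behaviour that persistent connections change.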
Nesting in Page
[Figure: Index.html contains a left frame and a right frame; the embedded images are a jumping fish, a fairy icon and the HUJI icon]
What we see in the browser can be a combination of several resources
Nested Objects
• Suppose a client accesses a page containing 10 inline images. How many sessions are required to display the page completely?
• The answer is 11 HTTP sessions – why?
• Some browsers/servers support a feature called keep-alive, which keeps the connection open until it is explicitly closed
• How can this help?
Stateless Protocol
• HTTP is a stateless protocol, which means that once a server has delivered the requested data to a client, the server retains no memory of what has just taken place (even if the connection is keep-alive)
• What are the difficulties in working with a stateless protocol?
• How would you implement a site for buying some items?
• So why don’t we have states in HTTP?
The Format of HTTP Requests and Responses
• An initial line
• Zero or more header lines
• A blank line (i.e., a CRLF by itself), and
• An optional message body (e.g., a file, query data, or query output)
Note: CRLF = “\r\n” (usually ASCII 13 followed by ASCII 10)
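Because the layout is the same for requests and responses, one small routine can split either kind of message. A minimal sketch in Python; `parse_message` is a hypothetical helper, and it ignores details a real parser must handle (repeated headers, folded header lines, chunked bodies):

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (initial line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:                    # zero or more header lines
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
initial_line, headers, body = parse_message(raw)
```

Header names are lowercased here because they are case-insensitive in HTTP.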
Headers
• HTTP 1.0 defines 16 headers
  – None are required
• HTTP 1.1 defines 46 headers
  – One header (Host:) is required in requests that are sent to Web servers
  – A request that is sent to a proxy does not have to include any header
  – A response does not have to include any header
How do we know who is the host when there is no host header?
The Format of a Request
method sp URL sp version cr lf
header : value cr lf        (header lines)
…
header : value cr lf
cr lf
Entity Body
Request Example
GET /index.html HTTP/1.1 [CRLF]
Accept: image/gif, image/jpeg [CRLF]
User-Agent: Mozilla/4.0 [CRLF]
Host: www.cs.huji.ac.il:80 [CRLF]
Connection: Keep-Alive [CRLF]
[CRLF]
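Request Example
GET /index.html HTTP/1.1
Accept: image/gif, image/jpeg
User-Agent: Mozilla/4.0
Host: www.cs.huji.ac.il:80
Connection: Keep-Alive
[blank line here]
[Figure labels: the first line holds the method (GET), the request URL (/index.html) and the version (HTTP/1.1); the lines that follow are the headers]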
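The example request can be assembled mechanically from its parts. A minimal sketch in Python; `build_request` is a hypothetical helper, not a library function, and it handles only requests without a body:

```python
def build_request(method, url, headers, version="HTTP/1.1"):
    """Assemble a body-less HTTP request as bytes."""
    lines = [f"{method} {url} {version}"]                      # request line
    lines += [f"{name}: {value}" for name, value in headers]   # header lines
    # Every line ends with CRLF, and one extra CRLF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("GET", "/index.html", [
    ("Accept", "image/gif, image/jpeg"),
    ("User-Agent", "Mozilla/4.0"),
    ("Host", "www.cs.huji.ac.il:80"),
    ("Connection", "Keep-Alive"),
])
```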
Common Request Methods
• GET returns the contents of the indicated document
• HEAD returns the header information for the indicated document
  – Useful for finding out info about a resource without retrieving it
• POST treats the document as an application and sends some data to it
More Request Methods
• PUT replaces the content of the document with some data
• DELETE deletes the indicated document
• TRACE invokes a remote loop-back of the request. The final recipient SHOULD reflect the message back to the client
• Usually these methods are not allowed
GET Request
• A request to get a resource from the Web
• The most frequently used method
• The request has no message body, but parameters can be sent in the request URL (i.e., the URL without the host part)
HEAD Request
• A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body)
• This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth
• Used for testing hypertext links for validity, accessibility and recent modification
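The difference between HEAD and GET can be observed directly. The sketch below assumes Python's standard `http.server` and `http.client` modules; the page content, the `Handler` class and the `request` helper are hypothetical, and the server runs locally only so that the example is self-contained.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"    # hypothetical resource

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()                 # same headers as GET, but no body

    def log_message(self, *args):            # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def request(method):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request(method, "/index.html")
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, resp.getheader("Content-Length"), body

head_result = request("HEAD")
get_result = request("GET")
server.shutdown()
server.server_close()
```

Both responses report the same Content-Length, but only the GET response carries the bytes, which is why a link checker can use HEAD to save bandwidth.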
POST Request
• A POST request can send data to the server
• POST is mostly used in form-filling
  – The data filled into the form are translated by the browser into some special format and sent to a program on the server using the POST command
POST Request (cont.)
• There is a block of data sent with the request, in the message body
• There are usually extra headers to describe this message body, like Content-Type: and Content-Length:
• The request URL is a URL of a program to handle the sent data, not a file
• The HTTP response is normally the output of a program, not a static file
POST Example
• Here's a typical form submission, using POST:
POST /path/register.cgi HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 35
[blank line here]
home=Ross+109&favorite+flavor=flies
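The Content-Length value is simply the number of bytes in the encoded body. A minimal sketch in Python that rebuilds a request like the one above; `build_post` is a hypothetical helper, and the From: header is left out:

```python
from urllib.parse import urlencode

def build_post(path, fields, extra_headers=()):
    """Assemble a form-submission POST request as bytes."""
    body = urlencode(fields).encode("ascii")       # form data in URL-encoded format
    lines = [f"POST {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in extra_headers]
    lines += ["Content-Type: application/x-www-form-urlencoded",
              f"Content-Length: {len(body)}"]      # size of the body in bytes
    # A blank line separates the headers from the message body.
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body

post = build_post("/path/register.cgi",
                  {"home": "Ross 109", "favorite flavor": "flies"},
                  [("User-Agent", "HTTPTool/1.0")])
```

The spaces in both the field name and the value are encoded as “+”, which gives the 35-byte body shown in the example.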
HTTP 1.1 Request Headers
• The common request headers of HTTP 1.1 are described in the following slides
  – Accept
  – Accept-Encoding
  – Authorization
  – Connection
  – Cookie
  – Host
  – If-Modified-Since
  – Referer
  – User-Agent
Accept Request Headers
• Accept
  – Specifies the MIME types that the client can handle (e.g., text/html, image/gif)
  – Server can send different content to different clients
• Accept-Encoding
  – Indicates encodings (e.g., gzip) client can handle
Authorization Request Header
• Authorization
  – User identification for password-protected pages
  – Instead of HTTP authorization, a site can use HTML forms to send the username/password and store them in state (e.g., a session object)
Connection Request Header
• Connection
  – Connection: keep-alive means that the browser can handle persistent connections
  – Keep-alive is the default in HTTP 1.1
  – In a persistent connection, the server can reuse the same socket over again for requests that are very close together from the same client
  – Connection: close means that the connection is closed after each request
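Socket reuse on a persistent connection can be made visible by recording which client socket each request arrives on. This is a sketch under stated assumptions: Python's standard `http.server` and `http.client`, a hypothetical `Handler` class, made-up paths, and a local server so the example is self-contained.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_clients = []    # (address, port) of the socket each request arrived on

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"            # lets http.server keep connections open

    def do_GET(self):
        seen_clients.append(self.client_address)
        body = b"ok"
        self.send_response(200)
        # Without Content-Length the client could not tell where the body ends
        # on a connection that stays open.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):            # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
for path in ("/index.html", "/logo.gif"):    # hypothetical page and inline image
    conn.request("GET", path)
    conn.getresponse().read()                # finish reading before reusing the socket
conn.close()
server.shutdown()
server.server_close()
```

Both requests arrive from the same client address and port, so only one TCP connection was opened for the two resources.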
Content-Length Request Header
• This header applies only to requests that carry a message body, such as POST requests
• It specifies the size of the message body (e.g., the POST data) in bytes
Cookie Request Header
• Returns to the server the cookies that were previously sent to the client
• Not in the HTTP 1.1 specification, but is widely supported (originally, a Netscape extension)
Host Request Header
• Indicates host and port as given in the original URL
  – Required in HTTP 1.1
• Needed due to request forwarding and machines that have multiple hostnames
If-Modified-Since Request Header
• This header indicates that the client wants the page only if it has been changed after the specified date
• If-Unmodified-Since is the reverse of If-Modified-Since
  – It is used for PUT requests (“update this document only if nobody else has changed it since I generated it”)
The Format of the Date in If-Modified-Since and in If-Unmodified-Since
• Greenwich Mean Time should be used and the format is:
Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
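Dates in this format need not be written by hand. A minimal sketch, assuming Python's standard `email.utils` module, which produces and parses the same date layout:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# The instant from the example above, expressed in UTC (GMT).
moment = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

http_date = format_datetime(moment, usegmt=True)   # text for the header value
parsed = parsedate_to_datetime(http_date)          # back to a datetime object
```

A client could send `http_date` as the value of If-Modified-Since, and a server could compare `parsed` with the resource's modification time.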
Referer Request Header
• URL of referring Web page
• Useful for tracking traffic
• It is logged by many servers
• Can be easily spoofed
• Note the spelling error – correct spelling is Referrer, but use Referer