Post on 19-Dec-2015


World Wide Web

Basics

Original version by Carolyn Watters (Dalhousie U. Computer Science)

2

The Web…

• …is a distributed document delivery system that uses Internet protocols

• …links documents stored in computers communicating by the Internet

• Main authority is the W3 Consortium (www.w3.org)

3

Basic Definitions

• Web server – machine that services Internet requests

• Web client – machine that initiates Internet requests

• Browser – software to interact with Internet data at the web client

• TCP/IP – Internet data protocol

• FTP – Internet file transfer protocol

• HTTP – hypertext transfer protocol

• HTML – hypertext markup language

4

Servers and Clients

• Servers – computer systems at the end of a network that store files and provide other services

• Clients – computer systems that are end points for users of the data

5

Client-Server Model & WWW

• Cloud model

• TCP/IP

• HTTP and MIME types

• FTP

• Protocol stacks

6

Client-Server Model

7

Internet Model Layers

Application layer: communication services (FTP, telnet, e-mail)

Transport layer: transmission of messages end-to-end

Network services layer: transmission of messages across a sequence of links

Data Link layer: transmission of a packet across one link

Physical layer: where the signals move

8

Internet Layer Model

Application layer: HTTP, FTP, SMTP, telnet, rlogin

Transport layer: TCP, UDP

Network services layer: IP

Data Link layer: LAN link

Physical layer: physical connection

9

Application Layer

• FTP

• HTTP

• SMTP

• Telnet

• Etc.

10

TCP/IP

• Suite of protocols adopted as the standard for the Internet

• facilitates communication between heterogeneous and similar networks that are connected together

• TCP itself is a reliable, connection-oriented, byte-stream protocol

11

Transport layer: TCP & UDP

TCP – transmission control protocol
– full-duplex byte stream
– virtual path (connected)
– error free
– uses acknowledgements
– 16-bit port addresses

UDP – user datagram protocol
– connectionless
– no acknowledgements
– no flow control
– no resending of erroneous packets
– some error detection
– 16-bit port addresses
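The 16-bit port fields can be made concrete by packing a UDP header by hand. This is only a sketch: the port numbers and payload are made-up values, and the checksum is left at 0, which over IPv4 means "not computed".

```python
import struct

def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the 8-byte UDP header: source port, destination port, length, checksum."""
    length = 8 + len(payload)   # header plus data, in bytes
    checksum = 0                # 0 = checksum not computed (allowed over IPv4)
    # "!" selects network byte order, each "H" is one unsigned 16-bit field
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

# Illustrative values only: 5000 is an arbitrary client port.
header = build_udp_header(5000, 80, b"hello")
src, dst, length, _ = struct.unpack("!HHHH", header)
print(src, dst, length)
```

Because each port is a 16-bit field, struct.pack rejects any port above 65535.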

12

Data Flow and Headers

13

TCP and IP

14

Network Layer: IP

• Delivers packets of up to 64 KB, one at a time

• Each packet has a header
– sending host and intended host network addresses
– 32-bit addresses (IPv4)

• IP layer (like UDP)
– unreliable
– connectionless
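A short sketch of what a 32-bit address means in practice, using Python's standard ipaddress module. The address 192.0.2.1 is a documentation address chosen for illustration, and the helper names are hypothetical.

```python
import ipaddress

def to_32bit(dotted: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))

def from_32bit(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))

# The IPv4 total-length field is 16 bits wide, which gives the 64 KB packet limit.
MAX_PACKET_BYTES = 2**16 - 1

print(to_32bit("192.0.2.1"))    # 3221225985
print(from_32bit(3221225985))   # 192.0.2.1
```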

15

Data Encapsulation

16

TCP/IP apps

TCP/IP software usually includes:
– remote terminal client using the TELNET protocol for remote login
– electronic mail client using the SMTP protocol to transfer e-mail to a remote system
– file transfer client using the FTP protocol to transfer files between two machines

17

HTTP: HyperText Transfer Protocol

• Native protocol for WWW

• Sits on top of the Internet’s TCP/IP protocols

• HTTP is a 4 step process per transaction

• Uses a predefined set of document formats from MIME

18

MIME

Multipurpose Internet Mail Extensions
– defines file formats (images, video, text, etc.)
– e.g. Content-type: text/html
– Data type/subtype
» text/html
» text/plain
» image/gif
» video/mpeg
» application/msword
» etc.
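As a rough sketch of how a server might pick a Content-type, the lookup below maps file extensions to the types listed above. The table and function names are assumptions made for this example; real servers use much larger tables.

```python
import os

# Hypothetical table covering only the types named on the slide.
MIME_BY_EXTENSION = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".mpeg": "video/mpeg",
    ".doc": "application/msword",
}

def content_type_for(filename: str) -> str:
    """Return the MIME type for a file name, or a generic binary type if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return MIME_BY_EXTENSION.get(extension, "application/octet-stream")

def split_mime(mime: str) -> tuple[str, str]:
    """Split 'type/subtype' into its two parts."""
    main_type, _, subtype = mime.partition("/")
    return main_type, subtype

print("Content-type:", content_type_for("index.html"))
print(split_mime("image/gif"))
```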

19

HTTP Connection

• 1. Client
– Makes a TCP/IP connection
– Makes an HTTP request for a web page

• 2. Server accepts request
– Sends page as HTTP

• 3. Client downloads page

• 4. Server breaks the connection

20

HTTP is Stateless!

• Each operation or transaction makes a new connection

• each operation is unaware of any other connection

• each click is a new connection

• So how do they do those shopping carts?

21

What does it look like?

• Header + object file

• Header
– plain text
– info about the object (MIME, etc.)
– methods allowed
– etc.
– browser sends a header to server each time you ask for information
– server sends a header and possibly content
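The "header + object" layout can be sketched as a split at the first blank line. The function name and the sample message below are made up for illustration; the header block is plain text, while the object is kept as raw bytes.

```python
def split_message(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split an HTTP message into its header fields and the object that follows."""
    head, _, body = raw.partition(b"\r\n\r\n")   # a blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:                       # lines[0] is the request or status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body

# Hypothetical server reply used only to exercise the function.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Server: CERN/3.0\r\n"
          b"\r\n"
          b"<html></html>")
headers, body = split_message(sample)
print(headers["content-type"], len(body))
```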

22

HTTP Transaction Example

GET /catalog/ip/ip.htm HTTP/1.0

Accept: text/plain

Accept: text/html

Referer: http://www.june.com/catalog.html

User-Agent: Mozilla/2.0 CRLF
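The request above can be assembled in a few lines of Python. This is a sketch with a hypothetical helper name. Headers are passed as a list of pairs so that a field such as Accept can appear twice, and every line ends in CRLF, with one extra CRLF closing the header.

```python
def build_request(method: str, path: str, headers: list[tuple[str, str]],
                  version: str = "HTTP/1.0") -> bytes:
    """Assemble a full HTTP request: request line, header lines, blank line."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("GET", "/catalog/ip/ip.htm", [
    ("Accept", "text/plain"),
    ("Accept", "text/html"),
    ("Referer", "http://www.june.com/catalog.html"),
    ("User-Agent", "Mozilla/2.0"),
])
print(request.decode("ascii"))
```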

23

HTTP REQUEST PROTOCOL

Request = Simple | Full
Simple = GET <URI> CRLF
Full = Method URI ProtVersion CRLF [<HTRQ Header>*] [CRLF <data>]
Method = GET | POST | HEAD | ….
<HTRQ Header> = <Fieldname>:<Value> CRLF
<data> = MIME conforming message

w.w3.org/Protocols/HTTP/
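A minimal sketch of how a server could tell the two request forms apart when reading the first line. The function name and the returned dictionary layout are assumptions for this example, and only the three methods named in the grammar are accepted.

```python
KNOWN_METHODS = {"GET", "POST", "HEAD"}

def parse_request_line(line: str) -> dict:
    """Classify a request line as Simple (GET <URI>) or Full (Method URI ProtVersion)."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) == 2 and parts[0] == "GET":
        return {"kind": "simple", "method": "GET", "uri": parts[1]}
    if len(parts) == 3 and parts[0] in KNOWN_METHODS and parts[2].startswith("HTTP/"):
        return {"kind": "full", "method": parts[0], "uri": parts[1], "version": parts[2]}
    raise ValueError(f"not a valid request line: {line!r}")

print(parse_request_line("GET /catalog/ip.html\r\n"))
print(parse_request_line("GET /catalog/ip.html HTTP/1.0\r\n"))
```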

24

HTTP Header fields

• General-header fields
– used for both requests and responses

• Request-header fields
– used for requests
– extra client information for use by server
– optional

25

General-header fields

• Date: Mon, 11 Jan 1999 08:14:32 GMT

• MIME-version: 1.0

• Pragma: no-cache
– directives
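The Date field follows the Internet mail date format, which Python's standard email.utils module can produce and parse. A small sketch, reusing the timestamp from the slide:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# The timestamp from the slide, expressed as an aware UTC datetime.
stamp = datetime(1999, 1, 11, 8, 14, 32, tzinfo=timezone.utc)

# usegmt=True makes the zone read "GMT" rather than "+0000".
date_header = "Date: " + format_datetime(stamp, usegmt=True)
print(date_header)   # Date: Mon, 11 Jan 1999 08:14:32 GMT

# Parsing the value back yields the same instant.
parsed = parsedate_to_datetime("Mon, 11 Jan 1999 08:14:32 GMT")
```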

26

Request-header fields

• acceptable MIME types for response
– Accept: text/html
– Accept: */*

• client’s reply to a 401 response
– Authorization: Basic abcdef (base64-encoded username and password)

• From: client-email-addr
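For the Basic scheme, the value after "Basic" is the string username:password encoded with base64. The sketch below uses hypothetical helper names and the sample credentials from the HTTP specification. Note that this is an encoding, not encryption: anyone who sees the header can decode it.

```python
import base64

def basic_authorization(username: str, password: str) -> str:
    """Build an Authorization header line for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def decode_basic(token: str) -> tuple[str, str]:
    """Recover (username, password) from the token, as a server would."""
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

print(basic_authorization("Aladdin", "open sesame"))
print(decode_basic("QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
```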

27

More Request-header fields

• If-Modified-Since: date
– conditional get

• source of current requested URL
– Referer: URL

• robot/browser identification
– User-Agent: Mozilla/2.0

28

Examining HTTP Header Values

• In perl
– $ENV{"HTTP_FROM"}

• In Netscape
– www.cs.dal.ca/~jamie/cgi-bin/4173/about/env.cgi
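Under the CGI convention, a request header such as From or User-Agent reaches the script as an environment variable named HTTP_ followed by the upper-cased header name, with hyphens turned into underscores. A Python sketch of the same lookup, with hypothetical function names and a fake environment standing in for what a web server would set:

```python
import os

def cgi_variable_name(header_name: str) -> str:
    """Map a request header name to its CGI environment variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")

def header_value(header_name: str, environ=os.environ):
    """Look up a request header in a CGI environment; None if it was not sent."""
    return environ.get(cgi_variable_name(header_name))

# A fake environment, standing in for the one a server passes to a CGI script.
fake_environ = {"HTTP_USER_AGENT": "Mozilla/2.0", "HTTP_FROM": "user@example.org"}
print(header_value("User-Agent", fake_environ))
print(header_value("From", fake_environ))
```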

29

HTTP Methods

• Client requests either
– simple request
– full request

Request-line = method Request-URI HTTP-version CRLF

GET /catalog/ip.html HTTP/1.0

30

Simple requests

• Only for HTTP/0.9

• only uses the GET method

• causes the server to locate and transfer the object specified

• client responsible for handling the object

GET <uri> CRLF

31

Full Request

• Uses HTTP version and more methods

• method tells server what to do to the resource requested

• Methods
– GET
– POST
– HEAD

32

GET Method

• Request server to retrieve object specified

• conditional GET
– request message includes If-Modified-Since in header
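A sketch of the decision a server might make for a conditional GET. It assumes the usual behaviour of answering 304 (Not Modified, with no object) when the resource has not changed since the given date, and 200 with the object otherwise. The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200                      # ordinary GET: always send the object
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                      # unparseable date: treat as an ordinary GET
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= since else 200

older = datetime(1999, 1, 10, tzinfo=timezone.utc)
newer = datetime(1999, 1, 12, tzinfo=timezone.utc)
print(conditional_get_status(older, "Mon, 11 Jan 1999 08:14:32 GMT"))
print(conditional_get_status(newer, "Mon, 11 Jan 1999 08:14:32 GMT"))
```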

33

HEAD Method

• Like GET but does not return the object

• returns a header about the resource requested (meta information)

• good way to test link validity
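Link checking with HEAD can be sketched with Python's standard http.client. To keep the example self-contained it starts a throwaway server on the local machine. That server, its single page /index.html, and all function names are assumptions made for this demonstration.

```python
import http.client
import http.server
import threading

class _DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server that knows exactly one page, /index.html."""
    def do_HEAD(self):
        if self.path == "/index.html":
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()          # header only, no object follows
        else:
            self.send_error(404)
    def log_message(self, *args):       # keep the demo quiet
        pass

def link_status(host: str, port: int, path: str) -> int:
    """Send a HEAD request and return the 3-digit status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()

def demo() -> tuple[int, int]:
    server = http.server.HTTPServer(("127.0.0.1", 0), _DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return (link_status("127.0.0.1", port, "/index.html"),
                link_status("127.0.0.1", port, "/missing.html"))
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() checks one valid and one broken link without transferring either page.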

34

POST Method

• Include an object in the request

• server should use that object in processing the request

• must include a Content-Length in header
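A minimal sketch of a POST request with its Content-Length computed from the object being sent. The path, the form data, and the default content type are illustrative assumptions.

```python
def build_post(path: str, body: bytes,
               content_type: str = "application/x-www-form-urlencoded") -> bytes:
    """Assemble a POST request whose Content-Length matches the enclosed object."""
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"   # length of the object in bytes
        "\r\n"
    )
    return head.encode("ascii") + body

# Hypothetical form submission.
form_data = b"item=book&qty=2"
post = build_post("/cgi-bin/order", form_data)
print(post.decode("ascii"))
```

The server relies on Content-Length to know how many bytes to read after the blank line.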

35

HTTP Response Message

• HTTP protocol version

• 3 digit status code

• reason phrase

• CRLF

• optional header fields

• CRLF
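The first line of the response carries the version, status code, and reason phrase. A sketch of how a client could pull them apart (the function name is hypothetical):

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split 'HTTP-version status-code reason-phrase' into its three parts."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if not (len(code) == 3 and code.isdigit()):
        raise ValueError(f"status code must be 3 digits: {code!r}")
    return version, int(code), reason

print(parse_status_line("HTTP/1.0 200 OK\r\n"))
print(parse_status_line("HTTP/1.0 404 Not Found\r\n"))
```

Splitting at most twice keeps a multi-word reason phrase such as "Not Found" intact.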

36

HTTP Response Header Fields

• Additional information about the server

• such as:
– LOCATION: exact URI address
– SERVER: server software (CERN/3.0)
– WWW-AUTHENTICATE:
» status 401 responses (unauthorized request)
» server challenges client
» client may use to send authorization info to server

37

Understanding STATUS Codes

• 1xx – for information only

• 2xx – action successful

• 3xx – further action needed (redirect)

• 4xx – client request error

• 5xx – server error
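Since the class is carried by the first digit, a client can classify any status code with integer division. A small sketch, with names chosen for this example:

```python
STATUS_CLASSES = {
    1: "information only",
    2: "action successful",
    3: "further action needed (redirect)",
    4: "client request error",
    5: "server error",
}

def status_class(code: int) -> str:
    """Return the class of a 3-digit status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return STATUS_CLASSES[code // 100]

for code in (200, 304, 401, 404, 500):
    print(code, status_class(code))
```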

38

HTTP Transaction

1. Client and server establish a connection

2. Client makes a request

3. Server makes a response

4. Server terminates connection

39

• Step 1 establish connection
– TCP/IP connection set up
– uses a port number as application reference
– usually port 80
– ports < 1024 are privileged (≥ 1024 are open)

• Step 2 client request
– HTTP message sent with a request line
– request-line = method URL HTTP version

40

• Step 3 Server response
– server sends HTTP message and optionally requested data
– resp-message = HTTP version status code reason-phrase [optional stuff]

• Step 4 connection terminated
– usually the server
– sometimes the client “stops” it
– anything else, whoever notices terminates
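The four steps can be watched end to end with a raw socket. This is a sketch rather than a production client: it starts a throwaway HTTP/1.0 server on the local machine so that nothing outside the program is needed. The page content, handler, and function names are assumptions for the demonstration.

```python
import http.server
import socket
import threading

PAGE = b"<html><body>hello</body></html>"   # hypothetical page

class _PageHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: answers every GET with the same page."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)
    def log_message(self, *args):           # keep the demo quiet
        pass

def one_transaction() -> bytes:
    server = http.server.HTTPServer(("127.0.0.1", 0), _PageHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # Step 1: establish the TCP/IP connection
        with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
            # Step 2: client sends the request line and a blank line
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            # Step 3: server responds; read until nothing more arrives
            chunks = []
            while True:
                chunk = sock.recv(4096)
                if not chunk:               # Step 4: empty read means the server closed
                    break
                chunks.append(chunk)
        return b"".join(chunks)
    finally:
        server.shutdown()
        server.server_close()
```

Calling one_transaction() returns the status line, the headers, and the page in one byte string. The read loop ends only because the server closes the connection after its response.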

41

Some Port Assignments

• 21 FTP

• 23 Telnet

• 25 SMTP (mail)

• 70 Gopher

• 79 Finger

• 80 HTTP
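A sketch of how a client might use these assignments when a user gives a host name with or without an explicit port. The table simply restates the slide, and the function names are made up for this example.

```python
WELL_KNOWN_PORTS = {
    "ftp": 21,
    "telnet": 23,
    "smtp": 25,
    "gopher": 70,
    "finger": 79,
    "http": 80,
}

def port_for(service: str) -> int:
    """Return the assigned port for a service name."""
    return WELL_KNOWN_PORTS[service.lower()]

def endpoint(host_and_port: str, service: str = "http") -> tuple[str, int]:
    """Split 'host' or 'host:port', falling back to the service's assigned port."""
    host, _, port = host_and_port.partition(":")
    return host, int(port) if port else port_for(service)

print(endpoint("www.w3.org"))
print(endpoint("www.cs.dal.ca:8080"))
```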