
Post on 21-Dec-2015


5/3/05 CS118/Spring051

Plan Ahead

• 5th week: congestion control, TCP delay modeling; network protocols: IPv4, IPv6
• 6th week: network routing, routing in the Internet
• 7th week: midterm; broadcast and multicast routing
• Before final: data link layer, Ethernet, switches, wireless networking

5/3/05 CS118/Spring052

Congestion Control

Congestion: “too many sources sending too much data too fast for network to handle”

Scenario 1: 2 identical senders, 2 receivers, one router with infinite buffers, no retransmission when congested:
• large delays
• maximum achievable throughput

5/3/05 CS118/Spring053

Congestion: scenario 2

One router, finite buffers; senders retransmit on timeout.

• Without loss: λ_in = λ_out
• Retransmission of a delayed (not lost) packet makes λ'_in much larger than λ_out
• Data losses lead to λ'_in > λ_out

[Figure: three plots of λ_out against the offered load (λ_in or λ'_in), both axes running to R/2; one plot marks the level R/4.]

5/3/05 CS118/Spring054

Congestion: scenario 3

Q: what happens as λ_in and λ'_in increase?

• Long delays
• Superfluous retransmissions
• When a packet is dropped, any “upstream transmission capacity” used for that packet was wasted!

[Figure: Host A and Host B send across routers with finite shared output link buffers. λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out: output rate.]

5/3/05 CS118/Spring055

Approaches towards congestion control

Network-assisted congestion control: routers provide feedback to end hosts
• Single bit congestion indication
• Explicit rate the sender should send at

End-end congestion control: no explicit feedback from network
• Congestion inferred from end-system observed loss, delay
• Approach taken by TCP

5/3/05 CS118/Spring056

TCP Congestion Control

• Add a “congestion control window” CongWin on top of the flow-control window
• Sender limits: LastByteSent - LastByteAcked ≤ CongWin (in addition to the flow-control limit RcvWin)
• How to adjust CongWin:
  - CongWin initialized to 1 MSS, increased quickly until loss (= congestion)
  - Upon loss: decrease CongWin, then begin probing (increasing) again
  - Two “phases”: (1) slow start, (2) congestion avoidance; a threshold defines the boundary between the two
• How the sender infers congestion: timeout, or 3 duplicate ACKs

[Figure labels: CongWin, RcvWin]

5/3/05 CS118/Spring057

Basic idea: learn from observations

• When CongWin < threshold, increase CongWin exponentially
• When CongWin ≥ threshold, increase CongWin linearly
• If a packet is lost, we have gone too far: threshold = CongWin / 2
  - If 3 dup. ACKs: network capable of delivering some packets, CongWin cut in half
  - If timeout: slow-start again (CongWin = 1 MSS)

Additive Increase, Multiplicative Decrease (AIMD)

5/3/05 CS118/Spring058

TCP SlowStart & Congestion Avoidance

initialize:
    CongWin = 1
    threshold = RcvWindow

if (CongWin < threshold) {
    for every segment ACKed
        CongWin++
} until (loss event)

/* slowstart is over */
{
    for every w segments ACKed:
        CongWin++
} until (loss event)

/* loss detected */
threshold = CongWin/2
if (3 dup. ACKs)
    CongWin = threshold
else
    CongWin = 1 MSS

[Figure: timeline with one RTT per round: one segment, then two segments, then four segments.]

5/3/05 CS118/Spring059

TCP sender congestion control

State: Slow Start (SS)
Event: Received ACK for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS. If (CongWin > Threshold), set state to “Congestion Avoidance”
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: Received ACK for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by 3 duplicate ACKs
TCP Sender Action: Threshold = CongWin/2, CongWin = Threshold, set state to “Congestion Avoidance”
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin/2, CongWin = 1 MSS, set state to “Slow Start”
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
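The sender rules above can be sketched as a small state machine. This is only an illustration under stated assumptions: the window is counted in whole segments (MSS = 1), the initial threshold of 64 is a made-up value, and the class and method names are hypothetical rather than taken from any real TCP stack.

```python
from dataclasses import dataclass

MSS = 1  # assumption: count the window in segments, so one MSS is 1


@dataclass
class TcpSender:
    cong_win: float = 1 * MSS
    threshold: float = 64 * MSS  # hypothetical initial threshold
    state: str = "SS"            # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                 # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # about +1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"
            self.dup_acks = 0

    def on_timeout(self):
        """Timeout: halve the threshold and restart slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Feeding four new ACKs to a sender with threshold 4 takes CongWin from 1 to 5 and switches it to congestion avoidance; a timeout then drops CongWin back to 1 MSS with the threshold at 2.5.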

5/3/05 CS118/Spring0510

Is TCP fair?

Fairness: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity.

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Example: 2 competing connections, same RTT
• Additive increase gives slope of 1
• Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the “equal bandwidth share” line. The trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”.]

5/3/05 CS118/Spring0511

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
• Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
• Research area: TCP friendly

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP connection, gets rate R/10
  - new app asks for 11 TCP connections, gets about R/2 (11 of the 20 connections)!

5/3/05 CS118/Spring0512

Delay modeling

Q: How long does it take to receive an object from a Web server after sending a request?

Ignoring congestion, delay is influenced by:
• TCP connection establishment
• data transmission delay
• slow start

Assumptions:
• one link between client and server, of rate R
• no retransmissions (no loss, no corruption)

Window size:
• First assume: fixed congestion window, W segments
• Then dynamic window, modeling slow start

5/3/05 CS118/Spring0513

Fixed congestion window (1)

First case: WS/R > S/R + RTT

The ACK for the first segment in the window returns before a window’s worth of data has been sent.

delay = 2RTT + O/R

Notations:
• S: # bits in one segment
• O: # bits in one object
• R: bandwidth
• W: window size (# segments)
• K: O/WS (number of windows that cover the object)
• Q: # times the server idles if O = ∞
• P = min(Q, K-1)

5/3/05 CS118/Spring0514

Fixed congestion window (2)

Second case: WS/R < RTT + S/R

The server must wait for an ACK after sending a window’s worth of data.

delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

The bracketed term [S/R + RTT - WS/R] is the server's waiting time after each window.
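The two fixed-window cases can be folded into one function. The sketch below is an illustration only: the function name is hypothetical, K is rounded up to a whole number of windows (an assumption the slides do not state), and all quantities are in bits, bits per second, and seconds.

```python
import math


def fixed_window_delay(O, S, R, RTT, W):
    """Delay to fetch an O-bit object with a fixed window of W segments."""
    K = math.ceil(O / (W * S))            # windows needed to cover the object
    stall = S / R + RTT - W * S / R       # server's waiting time per window
    if stall <= 0:
        # First case: ACKs return before the window is exhausted, no idling.
        return 2 * RTT + O / R
    # Second case: the server idles after each of the first K-1 windows.
    return 2 * RTT + O / R + (K - 1) * stall
```

With O = 8000 bits, S = 1000 bits, R = 1000 bps and RTT = 3 s, a window of W = 2 gives 6 + 8 + 3 * 2 = 20 s, while W = 4 removes the stall and gives 14 s.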

5/3/05 CS118/Spring0515

TCP Delay Modeling: Slow Start (1)

[Figure: time at client against time at server. The client initiates the TCP connection and requests the object (one RTT each); the server sends a first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, until transmission is complete and the object is delivered.]

Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1, Q} = 2

Server idles P = 2 times.

Delay components:
• 2 RTTs for connection establishment and request
• O/R to transmit the object
• Server's idle time

Server idles: P = min{K-1, Q} times

5/3/05 CS118/Spring0516

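TCP Delay Modeling: Slow Start (2)

Now suppose the window grows according to slow start.

The delay for one object is:

\[
\begin{aligned}
\text{delay} &= \frac{O}{R} + 2\,RTT + \sum_{p=1}^{P} \text{idleTime}_p \\
&= \frac{O}{R} + 2\,RTT + \sum_{k=1}^{P} \left[ \frac{S}{R} + RTT - 2^{k-1}\frac{S}{R} \right] \\
&= \frac{O}{R} + 2\,RTT + P\left[ RTT + \frac{S}{R} \right] - \left(2^{P} - 1\right)\frac{S}{R}
\end{aligned}
\]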
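A short numeric check of the slow-start delay formula may help. The sketch below assumes that window k carries 2^(k-1) segments, finds K and Q by direct counting rather than by a closed form, and uses a hypothetical function name; it is an illustration, not the course's reference code.

```python
import math


def slow_start_delay(O, S, R, RTT):
    """Return (delay, K, Q, P) for an O-bit object under the slow-start model."""
    segments = math.ceil(O / S)
    # K: number of windows that cover the object (1 + 2 + ... + 2**(K-1) segments)
    K = 1
    while 2**K - 1 < segments:
        K += 1
    # Q: number of windows after which the server would idle if O were infinite,
    # i.e. windows k with a positive idle time S/R + RTT - 2**(k-1) * S/R
    Q = 0
    while S / R + RTT - 2**Q * S / R > 0:
        Q += 1
    P = min(Q, K - 1)
    delay = O / R + 2 * RTT + P * (RTT + S / R) - (2**P - 1) * S / R
    return delay, K, Q, P
```

For O/S = 15 segments with S/R = 1 s and RTT = 2 s this reproduces K = 4, Q = 2 and P = 2 from the earlier example, with delay = 15 + 4 + 2 * 3 - 3 = 22 s.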

5/3/05 CS118/Spring0517

HTTP Modeling

Assume a Web page consists of:
• 1 base HTML page (of size O bits)
• M images (each of size O bits)

Non-persistent HTTP:
• M+1 TCP connections in series
• Response time = (M+1)O/R + (M+1)2RTT + sum of idle times

Persistent HTTP:
• 2 RTT to request and receive the base HTML file
• 1 RTT to request and receive the M images
• Response time = (M+1)O/R + 3RTT + sum of idle times

Non-persistent HTTP with X parallel connections:
• Suppose M/X is an integer
• 1 TCP connection for the base file
• M/X sets of parallel connections for the images
• Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times
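The three response-time formulas can be compared side by side. In this sketch the function names are hypothetical and the sum of idle times is passed in as a parameter (defaulting to zero) instead of being derived from the slow-start model.

```python
def non_persistent(M, O, R, RTT, idle=0.0):
    """M+1 TCP connections in series."""
    return (M + 1) * O / R + (M + 1) * 2 * RTT + idle


def persistent(M, O, R, RTT, idle=0.0):
    """One connection: 2 RTT for the base file, 1 RTT for the M images."""
    return (M + 1) * O / R + 3 * RTT + idle


def parallel_non_persistent(M, X, O, R, RTT, idle=0.0):
    """One connection for the base file, then M/X rounds of X parallel connections."""
    if M % X != 0:
        raise ValueError("the model assumes M/X is an integer")
    return (M + 1) * O / R + (M // X + 1) * 2 * RTT + idle
```

With M = 10, X = 5, O/R = 1 s and RTT = 1 s, and idle times ignored, the three variants give 33 s, 14 s and 17 s respectively.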

5/3/05 CS118/Spring0518

HTTP Response time (in seconds)

[Figure: chart of response time (axis 0 to 20 s) for non-persistent, persistent, and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps.]

RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5

For low bandwidth, connection and response time are dominated by transmission time. Persistent connections give only a minor improvement over parallel connections.

5/3/05 CS118/Spring0519

HTTP Response time (in seconds)

[Figure: chart of response time (axis 0 to 70 s) for non-persistent, persistent, and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps.]

RTT = 1 sec, O = 5 Kbytes, M = 10 and X = 5

For larger RTT, response time is dominated by TCP establishment and slow start delays. Persistent connections now give an important improvement, particularly in high delay-bandwidth networks.

5/3/05 CS118/Spring0520

Network layer

• Transport segment from sending to receiving host
  - Source host: encapsulates segments into packets
  - Destination host: delivers segments to transport layer
• Network layer protocols in every host and router
• Each router examines header fields in all packets passing through it
  - Routing: calculate the best path to each destination
  - Forwarding: move packets from input to output

[Figure: source S adds a network protocol header to the segment; the packet crosses routers R; destination D hands the segment to the transport protocol.]

5/3/05 CS118/Spring0521

Makeup lectures on Monday June 6

There will be no class on Thursday June 9. To make it up:
• 8-9:50am Boelter 5422, or
• 6-7:50pm Boelter 5419

Pick the lesser evil.

Additional office hours on the final exam day, Saturday June 11: 10:00AM - 1:00PM

And the Final exam is: 3:00 - 6:00PM

5/3/05 CS118/Spring0522

Always keep the big picture in mind

[Figure: two hosts, each with an HTTP / TCP / IP / Ethernet interface stack, connected through two routers that run IP over Ethernet and SONET interfaces. An HTTP message is carried in a TCP segment, which is carried in an IP packet on every hop.]

5/3/05 CS118/Spring0523

Network layer: Connection vs. connection-less service

• Virtual Circuit network provides connection-oriented service
  - source-to-dest path works much like a telephone circuit
• Datagram network provides connectionless service
• The two services are analogous to TCP vs. UDP at the transport layer, but:
  - Network delivery service: host-to-host
  - No choice: a given network provides one or the other, but not both (as in the transport layer)

5/3/05 CS118/Spring0524

Virtual circuit Network

• Use a signaling protocol to set up the connection before data can flow
• Every router on the source-dest path maintains “state” for each passing connection
• Link, router resources (bandwidth, buffers) allocated to each VC
• Each packet carries a VC identifier (not the destination host address)
• VC number must be changed on each link
  - New VC number comes from the forwarding table

[Figure: two hosts, each with an application / transport / network / data link / physical stack, exchange: 1. Initiate call, 2. Incoming call, 3. Accept call, 4. Call connected, 5. Data flow begins, 6. Receive data.]

5/3/05 CS118/Spring0525

Forwarding table

[Figure: a VC crosses a router with interface numbers 1, 2, 3; the VC numbers along the path are 12, 22, 32.]

Forwarding table in northwest router:

Incoming interface   Incoming VC #   Outgoing interface   Outgoing VC #
1                    12              2                    22
2                    63              1                    18
3                    7               2                    17
1                    97              3                    87
…                    …               …                    …

Routers maintain connection state information!
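The VC forwarding step amounts to a table lookup keyed on the incoming interface and VC number. The sketch below copies the four rows of the table above; the names `VC_TABLE` and `forward` are hypothetical.

```python
# (incoming interface, incoming VC #) -> (outgoing interface, outgoing VC #)
VC_TABLE = {
    (1, 12): (2, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}


def forward(in_interface, in_vc):
    """Return the outgoing interface and the rewritten VC number."""
    return VC_TABLE[(in_interface, in_vc)]
```

A packet arriving on interface 1 with VC number 12 leaves on interface 2 carrying VC number 22; an unknown pair raises `KeyError`, which mirrors the fact that no state was set up for that connection.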

5/3/05 CS118/Spring0526

Internet: A Datagram Network

[Figure: host H1 reaches host H8 through routers R1 (ETH / FDDI), R2 (FDDI / WLAN), and R3 (WLAN / ETH); every node runs IP above its link layers.]

• Hosts are connected to subnets
• Subnets are interconnected by IP routers
• All hosts and routers speak IP
  - routers also “speak” many different link layer protocols
• IP provides two basic functions:
  - globally unique address for all connected points
  - best effort datagram delivery from source to destination hosts, with fragmentation/reassembly of packets whenever needed

5/3/05 CS118/Spring0527

The Internet Network layer

Host, router network layer functions:

Transport layer: TCP, UDP

Network layer:
• Routing protocols (router function): RIP, OSPF, BGP …; these fill in the forwarding table
• IP protocol: addressing conventions, datagram format, packet handling conventions
• ICMP protocol: error reporting, router “signaling”

Link layer

Physical layer

5/3/05 CS118/Spring0528

IP datagram format

The header is laid out in 32-bit rows:
• ver: IP version number
• head. len: header length
• type of service
• total length
• 16-bit identifier, flgs, fragment offset: 3 fields used for packet fragmentation/reassembly
• time to live: max number of remaining hops
• protocol: upper layer protocol to deliver payload to
• IP header checksum
• source IP address
• destination IP address
• Options (if any): e.g. timestamp, route recording, specify list of routers to visit
• data (variable length, typically a TCP or UDP segment)

The fields up to and including the destination IP address form the basic header.

How much overhead for a TCP segment?
20 bytes of TCP + 20 bytes of IP = 40 bytes
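To make the field layout concrete, the 20-byte basic header can be unpacked with Python's standard `struct` module. This is a sketch only: the function name, the dictionary keys, and the sample values (TTL 64, protocol 6, zero checksum, addresses borrowed from other slides) are illustrative assumptions.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the 20-byte basic IPv4 header (options are ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,   # head. len counts 32-bit words
        "tos": tos,
        "total_length": total_len,
        "identifier": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,     # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hypothetical sample: a 532-byte fragment with MF set and offset 64.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 532, 7394, 0x2000 | 64, 64, 6, 0,
                     socket.inet_aton("173.1.1.1"), socket.inet_aton("173.1.2.2"))
header = parse_ipv4_header(sample)
```

The first byte 0x45 decodes to version 4 and a header length of 5 words, that is 20 bytes, which is the IP share of the 40-byte overhead computed above.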

5/3/05 CS118/Spring0529

IP Address structure

• 32 bits (4 bytes), uniquely identifies a host or router interface
  - interface: connection between host/router and physical link
• IP address space: 2-level hierarchy, Network-ID followed by host-ID
• 173.1.1.1 = 10101101 00000001 00000001 00000001 (173 . 1 . 1 . 1)

What’s a network? (from IP address perspective)
• device interfaces with the same network part of the IP address
• can physically reach each other without going thru a router

[Figure: three LANs joined by a router, with interfaces 173.1.1.1, 173.1.1.2, 173.1.1.3, 173.1.1.4; 173.1.2.1, 173.1.2.2, 173.1.2.9; 173.1.3.1, 173.1.3.2, 173.1.3.27.]

5/3/05 CS118/Spring0530

IP Address: how many bits for net-ID

Original IP design: class-based address (Network ID, then Host ID)

Class   Leading bits   Remaining layout    Address range
A       0              network, host       1.0.0.0 to 127.255.255.255
B       10             network, host       128.0.0.0 to 191.255.255.255
C       110            network, host       192.0.0.0 to 223.255.255.255
D       1110           multicast address   224.0.0.0 to 239.255.255.255

Two changes added over the last 25 years:
• Subnetting: add a hidden level to the address hierarchy
  - An organization gets one address block, then splits the host part into two parts: subnet and host parts
• CIDR: Classless InterDomain Routing (today)
  - network portion of address of arbitrary length

5/3/05 CS118/Spring0531

Classless InterDomain Routing

• Address format: a.b.c.d/x, where x is the # of bits in the network portion
• Internet Service Providers get blocks of IP addresses from the Internet address authority
• Internet customers get a portion of their ISP’s addr. block

ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20
Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
...
Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23

Example: in 200.23.16.0/23 = 11001000 00010111 00010000 00000000, the first 23 bits are the network part and the last 9 bits are the host part.
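The split of the ISP's /20 block into eight /23 blocks can be reproduced with Python's standard `ipaddress` module. The variable names are illustrative; the addresses are the ones in the table above.

```python
import ipaddress

# The ISP's block from the table above.
isp_block = ipaddress.ip_network("200.23.16.0/20")

# Carve it into /23 blocks, one per organization (Organization 0 to 7).
org_blocks = list(isp_block.subnets(new_prefix=23))

for number, block in enumerate(org_blocks):
    print(f"Organization {number}: {block} ({block.num_addresses} addresses)")
```

Each /23 block leaves 9 host bits, so it holds 512 addresses, and eight such blocks exactly fill the /20.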

5/3/05 CS118/Spring0532

Hierarchical addressing: route aggregation

Hierarchical addressing allows efficient advertisement of routing information:

[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) connect to Fly-By-Night-ISP, which tells the Internet “Send me anything with addresses beginning 200.23.16.0/20”. ISPs-R-Us tells the Internet “Send me anything with addresses beginning 199.31.0.0/16”.]

5/3/05 CS118/Spring0533

Hierarchical addressing: route aggregation

Multi-homing

• Route aggregation helps reduce routing table size
• Multi-homing defeats address aggregation
• ISPs-R-Us has a more specific route to Org. 7

[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ... connect to Fly-By-Night-ISP, which advertises “Send me anything with addresses beginning 200.23.16.0/20”. Organization 7 (200.23.30.0/23) also connects to ISPs-R-Us, which advertises “Send me anything with addresses beginning 199.31.0.0/16, or 200.23.30.0/23”.]
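Why the more specific advertisement wins can be seen with a small longest-prefix-match lookup, the standard rule routers use to choose among overlapping prefixes. The route list mirrors the three advertisements in the figure; the names `ROUTES` and `next_hop` are hypothetical.

```python
import ipaddress

# Advertisements seen by a router elsewhere in the Internet.
ROUTES = [
    (ipaddress.ip_network("200.23.16.0/20"), "Fly-By-Night-ISP"),
    (ipaddress.ip_network("199.31.0.0/16"), "ISPs-R-Us"),
    (ipaddress.ip_network("200.23.30.0/23"), "ISPs-R-Us"),
]


def next_hop(dest):
    """Pick the matching route with the longest prefix, or None if none match."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        return None
    return max(matches)[1]
```

An address in Organization 7's block, such as 200.23.30.5, matches both the /20 and the /23 and follows the /23 to ISPs-R-Us, while 200.23.18.1 matches only the /20 and goes to Fly-By-Night-ISP. The cost is the extra /23 entry that every router now has to carry.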

5/3/05 CS118/Spring0534

IP Subnet

• Subnet mask: indicates the portion of the address that is considered as “network ID” by the local site
  - the subnet mask does not need to align with a byte boundary
• Each host must be configured with both an IP address and a subnet mask
• Subnets are invisible outside of the local site
  - backbone routers only know how to forward packets to the network ID
  - within the organization, routers store: [subnet, mask, next hop]
• Subnet advantages: aggregate local info., keep backbone routers' table size small

Example mask: 11111111 11111111 11111100 00000000
• Viewed from outside: Network ID, Host ID
• Viewed from inside: the mask leaves a 10-bit host ID

5/3/05 CS118/Spring0535

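An example

Look up IP addr. 131.179.96.15

Table without subnet information (as seen from the Global Internet):
Network#   next-hop
131.179    B

Table with subnet masks (inside the site):
Network#     mask            next-hop
131.179.96   255.255.255.0   C
……           ……              ……

How 131.179.96.15 is read:
• as a class-B address: Network ID 131.179, host ID 96.15
• as a subnetted address with subnet mask 255.255.255.0 (11111111 11111111 11111111 00000000): subnet 131.179.96, host 15

[Figure: routers A, B, and C between the Global Internet and the UCLA CS subnet 131.179.96.0.]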

5/3/05 CS118/Spring0536

Getting an IP packet from source to dest.

[Figure: three subnets joined by a router, with addresses 173.1.1.1, 173.1.1.2, 173.1.1.3, 173.1.1.4; 173.1.2.1, 173.1.2.2, 173.1.2.9; 173.1.3.1, 173.1.3.2, 173.1.3.27. Hosts A, B, and E are marked.]

Source host A to destination B:
• Host A checks: [A’s addr & subnet mask] = [B’s addr & subnet mask]?
• Yes: B is on the same net, use the link layer to send the pkt directly to B

Source host A to destination E:
• Host A checks: [A’s addr & subnet mask] = [E’s addr & subnet mask]?
• No: send the pkt to the default router 173.1.1.4
• Router: is E on any of my directly connected subnets?
  - Yes: send the pkt directly to E
  - No: forward to another router according to the routing table
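The host-side decision above is a masked comparison. In the sketch below the function names are hypothetical, and both the mask 255.255.255.0 and the concrete addresses used for hosts A, B, and E in the usage note are assumptions, since the slide does not say which address belongs to which host.

```python
import ipaddress


def same_subnet(addr_a, addr_b, mask):
    """True if both addresses share the same network part under the mask."""
    to_int = lambda text: int(ipaddress.IPv4Address(text))
    m = to_int(mask)
    return (to_int(addr_a) & m) == (to_int(addr_b) & m)


def first_hop(src, dst, mask, default_router):
    """Deliver directly on the local net, otherwise hand off to the default router."""
    return dst if same_subnet(src, dst, mask) else default_router
```

Taking A = 173.1.1.1, B = 173.1.1.3 and E = 173.1.2.2, `first_hop` returns B's own address for the A-to-B case and the default router 173.1.1.4 for the A-to-E case.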

5/3/05 CS118/Spring0537

IP Fragmentation & Reassembly

• Different subnets have different MTUs (Maximum Transmission Unit)
• Sender host always uses its max MTU size
• Routers “fragment” IP packets if the next link has a smaller MTU
  - chop packets to the MTU size of the next link
  - further fragmentation down the path is possible
• Packet reassembled at dest. host

H1 sending an IP packet of 1300 bytes of data to H2:

[Figure: path H1, R1, R2, R3, H2. The 1300B packet leaves H1 over links with MTU=1500B; where the next link has MTU=532B it is split into fragments of 512B, 512B, and 276B of data; reassembly happens at H2.]

5/3/05 CS118/Spring0538

IP Fragmentation: An example

[Figure: path H1, R2, R3, H2 with one link of MTU=532B; the 1300B packet is split into 512B, 512B, and 276B fragments; reassembly at H2.]

Header fields of the original packet and of its three fragments (each followed by the rest of the IP header and the data):

Packet                ver   hlen   TOS   total length   identifier   flags (reserved, DF, MF)   offset
data (1300 bytes)     4     5      TOS   1320           7394         0 0 0                      0
data (512 bytes)      4     5      TOS   532            7394         0 0 1                      0
data (512 bytes)      4     5      TOS   532            7394         0 0 1                      64
data (276 bytes)      4     5      TOS   296            7394         0 0 0                      128

At destination:
- identifier: tells which pieces belong to the same packet
- the last fragment: MF=0
- the offsets tell whether there are holes in the middle
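The numbers in this example follow from two rules: each fragment carries its own 20-byte header, and the offset field counts 8-byte units, so every fragment except the last must carry a multiple of 8 data bytes. A minimal sketch, with a hypothetical function name and dictionary keys:

```python
def fragment(data_len, mtu, header_len=20):
    """Split data_len bytes of IP payload into fragments that fit the MTU."""
    max_payload = (mtu - header_len) // 8 * 8   # round down to a multiple of 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < data_len:
        payload = min(max_payload, data_len - sent)
        more = sent + payload < data_len        # MF = 1 on all but the last
        fragments.append({
            "total_length": payload + header_len,
            "mf": int(more),
            "offset": sent // 8,                # offset field is in 8-byte units
        })
        sent += payload
    return fragments
```

Calling `fragment(1300, 532)` yields total lengths 532, 532, 296 with MF bits 1, 1, 0 and offsets 0, 64, 128, matching the headers listed above.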

5/3/05 CS118/Spring0539

ICMP: Internet Control Message Protocol

• Used by hosts & routers to communicate network-level information
  - error reporting: unreachable host, network, port, protocol
  - echo request/reply
• ICMP msgs carried in IP packets

Type   Code   Description
0      0      echo reply (ping)
3      0      dest. network unreachable
3      1      dest host unreachable
3      2      dest protocol unreachable
3      3      dest port unreachable
3      6      dest network unknown
3      7      dest host unknown
4      0      source quench (congestion control - not used)
8      0      echo request (ping)
9      0      route advertisement
10     0      router discovery
11     0      TTL expired
12     0      bad IP header

ICMP message format (following the IP header):
• type, code, checksum
• unused (or used by certain ICMP types)
• IP header and first 64 bits of data, or data (according to ICMP types)

5/3/05 CS118/Spring0542

NAT: Network Address Translation

[Figure: local network (e.g., home network) 10.0.0/24 with hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 and the NAT router's inside interface 10.0.0.4; the router's outside address 138.76.29.7 faces the rest of the Internet.]

• Datagrams with source or destination in this network have a 10.0.0/24 address for source, destination (as usual)
• All datagrams leaving the local network have the same single source NAT IP address, 138.76.29.7, and different source port numbers


5/3/05 CS118/Spring0544

NAT: Network Address Translation

[Figure: hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 behind a NAT router with inside interface 10.0.0.4 and outside address 138.76.29.7.]

NAT translation table:
WAN side addr        LAN side addr
138.76.29.7, 5001    10.0.0.1, 3345
……                   ……

1: Host 10.0.0.1 sends a datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345   D: 128.119.40.186, 80
2: NAT router changes the datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, and updates the table
   S: 138.76.29.7, 5001   D: 128.119.40.186, 80
3: Reply arrives with dest. address 138.76.29.7, 5001
   S: 128.119.40.186, 80   D: 138.76.29.7, 5001
4: NAT router changes the datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80   D: 10.0.0.1, 3345

5/3/05 CS118/Spring0545

NAT implementation

NAT router must do the following:
• Outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #)
  - remote clients/servers will respond using (NAT IP address, new port #) as destination addr.
• Remember (in the NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
• Incoming datagrams: replace (NAT IP address, new port #) in the destination fields of every incoming datagram with the corresponding (source IP address, port #) stored in the NAT table

Problems due to NAT:
• Increased network complexity, reduced robustness
• Cannot run services from inside a NAT box
• Address shortage should instead be solved by IPv6
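The three NAT router duties listed above map onto two dictionary lookups. The sketch below is a toy model under stated assumptions: the class and method names are hypothetical, new ports are handed out sequentially starting at 5001 (chosen to match the earlier example), and entries never expire.

```python
import itertools


class Nat:
    def __init__(self, public_ip, first_port=5001):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # assumed allocation policy
        self.lan_to_wan = {}
        self.wan_to_lan = {}

    def outgoing(self, src, dst):
        """Rewrite the source (ip, port) of a datagram leaving the local network."""
        if src not in self.lan_to_wan:
            wan = (self.public_ip, next(self._ports))
            self.lan_to_wan[src] = wan      # remember the translation pair
            self.wan_to_lan[wan] = src
        return self.lan_to_wan[src], dst

    def incoming(self, src, dst):
        """Rewrite the destination of a reply, or return None if no entry exists."""
        lan = self.wan_to_lan.get(dst)
        if lan is None:
            return None
        return src, lan
```

With `Nat("138.76.29.7")`, an outgoing datagram from (10.0.0.1, 3345) leaves with source (138.76.29.7, 5001), and the reply to that pair is mapped back to (10.0.0.1, 3345). An unsolicited datagram to an unmapped port returns `None`, which is the "cannot run services from inside a NAT box" problem in miniature.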

5/3/05 CS118/Spring0546

IPv6

• Motivation: 32-bit address space exhaustion
• Take the opportunity for some clean-up

IPv6 datagram format:
• Address length changed from 32 bits to 128 bits
• Fragmentation fields moved out of the base header
• IP options moved out of the base header
  - Header Length field eliminated
• Header Checksum eliminated
• Type of Service field eliminated
• Time to Live becomes Hop Limit, Protocol becomes Next Header
• Precedence becomes Priority, added Flow Label field
• Length field excludes the IPv6 header

5/3/05 CS118/Spring0547

IPv6 header format

IPv6 header (rows are 32 bits wide):
• Version, Priority, Flow Label
• Payload Length, Next Header, Hop Limit
• Source Address (16 bytes, 128 bits)
• Destination Address (16 bytes)

IPv4 header (rows are 32 bits wide), for comparison:
• Version, Hdr Len, Prec, TOS, Total Length
• Identification, Flags, Fragment Offset
• Time to Live, Protocol, Header Checksum
• Source Address
• Destination Address
• Options, Padding

5/3/05 CS118/Spring0548

Changes from IPv4

• Priority: identify priority among datagrams in a flow
• Flow Label: identify datagrams in the same “flow” (concept of “flow” not well defined)
• Next header: identify upper layer protocol for data
• Options: allowed, but outside of the basic header, indicated by the “Next Header” field
• Checksum: removed entirely to reduce processing time at each hop
• ICMPv6: new version of ICMP
  - additional message types, e.g. “Packet Too Big”
  - multicast group management functions

5/3/05 CS118/Spring0549

Transition From IPv4 To IPv6

Not all routers can be upgraded simultaneously. To allow the Internet to operate with mixed IPv4 and IPv6 routers: tunneling.

Logical view: A, B, E, F are IPv6 routers; B and E are joined by a tunnel.

Physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6).

• A-to-B: IPv6. Datagram carries Flow: X, Src: A, Dest: F, data.
• B-to-C and on to E: IPv6 inside IPv4. The IPv4 datagram has Src: B, Dest: E and carries the whole IPv6 datagram (Flow: X, Src: A, Dest: F, data).
• E-to-F: IPv6. Datagram carries Flow: X, Src: A, Dest: F, data.

5/3/05 CS118/Spring0550

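Interplay between routing and forwarding

The routing algorithm fills in the local forwarding table:

header value   output link
0100           3
0101           2
0111           2
1001           1

[Figure: a packet arrives with value 0111 in its header at a router with output links 1, 2, and 3; the table sends it to link 2.]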

5/3/05 CS118/Spring0551

Router Architecture Overview

Two key router functions:
• run routing algorithms/protocols (RIP, OSPF, BGP)
• forward datagrams from incoming to outgoing link

5/3/05 CS118/Spring0552

Input Port Functions

• Physical layer: bit-level reception
• Data link layer: e.g., Ethernet (see chapter 5)
• Decentralized switching:
  - given datagram dest., look up the output port using the forwarding table in input port memory
  - goal: complete input port processing at ‘line speed’
  - queuing: if datagrams arrive faster than the forwarding rate into the switch fabric

5/3/05 CS118/Spring0553

Output Ports

Buffering required when datagrams arrive from fabric faster than the transmission rate

Scheduling discipline chooses among queued datagrams for transmission