Content Delivery: HTTP, Web Caching, and Content Distribution Networks
3035/GZ01 Networked Systems, Lecture 17
Kyle Jamieson
Department of Computer Science
University College London
Outline
• Background and HTTP
  – Cookies: client-side state
  – Limitations of HTTP/1.0
  – HTTP range requests
• Interaction between applications and TCP
• Web caching
• Content distribution networks
HTTP: the Hypertext Transfer Protocol
• The web's application-layer protocol
• Client/server model over TCP
  – Client: browser that requests and receives web objects
  – Server: web server that responds to requests with data
• The server accepts connections on TCP port 80
• HTTP itself is "stateless": the protocol maintains no information about past client requests
  – Design goal: don't burden web servers with state maintenance
  – Cookies were added later to carry state client-side (scalable)
• Metadata plays a key role in HTTP's design and operation
An HTTP/1.0 exchange
Request:

GET /index.html HTTP/1.0

Response (HTTP header):

HTTP/1.0 200 OK
MIME-Version: 1.0
Server: CERN/3.0
Date: Thu, 03 Dec 2009 16:15:53 GMT
Content-Type: text/html
Content-Length: 22631
Last-Modified: Wed, 11 Nov 2009 09:10:45 GMT

Response (HTTP body):

<p class="contact_new">Computer Science Department University College London - Gower Street - London - WC1E 6BT - <img src="/images/phone.gif" width="13" height="9" id="phone" alt="Telephone:"/> +44 (0)20 7679 7214 - Copyright © 1999-2007 UCL</p>
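The same exchange is easy to reproduce with a few lines of Python. A minimal sketch, with www.example.com standing in for the UCL server above:

import socket

# Minimal HTTP/1.0 client: one TCP connection per request.
# Host and path are placeholders, not the server from the trace above.
HOST, PATH = "www.example.com", "/index.html"

s = socket.create_connection((HOST, 80))
s.sendall(f"GET {PATH} HTTP/1.0\r\nHost: {HOST}\r\n\r\n".encode())

chunks = []
while (data := s.recv(4096)):   # HTTP/1.0: server closes when done,
    chunks.append(data)         # so read until EOF
s.close()

header, _, body = b"".join(chunks).partition(b"\r\n\r\n")
print(header.decode("latin-1"))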
Cookies: Client-side state
• Recall: the HTTP protocol is stateless
• However: state is useful for session authentication and online sessions (online shopping, banking, etc.)
• Idea: the client keeps small pieces of state ("cookies")
  – Received from the server in responses, sent back to the server in requests
  – At least two possibilities:
    • All state may be encoded into the cookie stored at the client
    • The cookie indexes state held in a database on the server (online shopping)
• Privacy issue: often the user is unaware of cookies, e.g. a search engine returns the link http://ad.doubleclick.net/x26362d2/www.foo.com/foo.html
Cookies allow content personalization

[Timeline: client and server, with the client's cookies.txt file and the server's backend database]
1. The client sends an HTTP GET to ucl.ac.uk; the server creates id 1678 in its backend database
2. The server replies HTTP/1.0 200 OK with Set-cookie: 1678; the client records "ucl.ac.uk: 1678" in cookies.txt
3. The client's later requests carry Cookie: 1678; the server looks up id 1678 and replies HTTP/1.0 200 OK with personalized content
4. One week later, the same Cookie: 1678 still identifies the client's session; the server again looks up id 1678
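A minimal sketch of the server's side of this exchange, using Python's standard library; the in-memory dict and the starting id 1678 stand in for the backend database in the diagram:

from http.server import BaseHTTPRequestHandler, HTTPServer
from http.cookies import SimpleCookie
import itertools

sessions = {}                      # stand-in for the backend database
next_id = itertools.count(1678)    # first client gets id 1678, as above

class CookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        cookie = SimpleCookie(self.headers.get("Cookie", ""))
        new_client = "id" not in cookie
        sid = str(next(next_id)) if new_client else cookie["id"].value
        state = sessions.setdefault(sid, {})   # per-client state lives here
        self.send_response(200)
        if new_client:
            self.send_header("Set-Cookie", f"id={sid}")  # client stores this
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(f"session {sid}, state {state}\n".encode())

HTTPServer(("", 8080), CookieHandler).serve_forever()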
Problems with HTTP/1.0
1. Poor use of TCP, especially for short responses
   – Recall: three-way handshake to open a TCP connection, four packets to close it
   – An HTTP message often fits in ≈10 packets, so connection overhead dominates
   – Transfers don't get past slow start, so they don't use the available bandwidth
   – For each embedded image: open a new TCP connection and send another HTTP GET
2. Lack of support for web caching
3. Lack of bandwidth optimization
   – Range requests
   – Data compression
HTTP/1.1 bandwidth optimization
• Data compression
  – Image files are pre-compressed, but much other content is not
  – [1997]: compression would save 40% of the bytes sent via HTTP
  – The client uses the Accept-Encoding header in its HTTP GET to signal which encodings it can decompress, and which it prefers; the server labels the compressed response with a Content-Encoding header
• Range requests
  – Problem: a client wants to resume an incomplete download, or view certain pages of a long document
  – Need to retrieve just a small section of a resource, as in the exchange below
GET /bigfile.html HTTP/1.1
Range: bytes=2000-3999

HTTP/1.1 206 Partial Content
Date: Thu, 10 Feb 2006 20:02:06 GMT
Last-Modified: Wed, 9 Feb 2006 14:58:08 GMT
Content-Range: bytes 2000-3999/100000
Content-Length: 2000
Content-Type: text/html
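A range request is easy to issue from Python's http.client; a sketch, with a placeholder host, assuming the server supports ranges:

import http.client

# Resume-style fetch: ask for bytes 2000-3999 only.
conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/bigfile.html", headers={"Range": "bytes=2000-3999"})
resp = conn.getresponse()

print(resp.status)                        # 206 if the server honoured the range
print(resp.getheader("Content-Range"))    # e.g. "bytes 2000-3999/100000"
chunk = resp.read()                       # just the 2000 requested bytes
conn.close()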
Outline
• Background and HTTP
• Interaction between applications and TCP
  – Connection establishment and slow start
  – Web transfers, HTTP/1.1 persistent connections
  – Interactive applications: Nagle's algorithm
• Web caching
• Content distribution networks
TCP connection establishment delay
• Recall the TCP retransmit timeout: RTO = RTT estimate + 4 × RTT variance
  – Initial RTO = 3 seconds
  – Double the RTO on each timeout
• This avoids unnecessary SYN retransmissions if the RTT is large
• But frustrated users hit the stop button (terminating TCP connections) and then the reload button
• Ongoing issue: lowering the initial RTO is a topic of active discussion in the IETF (May '11)
[Timeline: the client's first SYN is lost; it retransmits after 3 s, and again 6 s later (RTO doubling), before finally receiving a SYN-ACK]
Effect of lowering initial RTO
• Today on the Internet: ≈2% of TCP flows lose their first SYN
• Current proposal: retransmit the SYN after 1 second
  – Lowers the penalty for SYN loss from 3 seconds to 1 second
• But ≈2% of TCP flows have RTT > 1 second
  – Why? 802.11 MAC layer, slow dialup lines, slow VPN connections
  – Result: spurious SYN retransmissions
• Tradeoff between spurious SYN retransmission and user wait time
[Timelines: with a 1 s initial RTO, a lost SYN is retransmitted after only 1 s; but on a connection whose RTT exceeds 1 s, the client retransmits the SYN spuriously even though the original SYN-ACK is already on its way]
Delay in the middle of a web transfer
• Now the TCP connection is open and in slow start
• Recall TCP slow start: the sender's cwnd increases by one packet size for each received acknowledgement
  – Short HTTP transfers spend most of their time in slow start
• A small cwnd means that after a loss, the receiver is unlikely to generate three duplicate ACKs
  – Retransmission timeouts are more likely
[Figure: startup behavior of TCP with slow start, reproduced from V. Jacobson, "Congestion Avoidance and Control" (Figure 4). Axes: send time (seconds) vs. packet sequence number (KB); the annotation marks the first ≈10 packets, sent while cwnd is still small. Caption: same conditions as the previous figure (same time of day, same Suns, same network path, same buffer and window sizes), except the machines were running the 4.3+TCP with slow start. No bandwidth is wasted on retransmits, but two seconds are spent on slow start, so the effective bandwidth of this part of the trace is 16 KBps, two times better than Figure 3.]
An HTTP/1.0 page fetch (obsolete)
• Fetch an 8.5 Kbyte page with 10 embedded objects, most < 10 Kbyte
• One HTTP transaction per fetch; open and close a new TCP connection each time
• Every transfer stays in slow start, except for the large object
[Graph: bytes received vs. time (milliseconds) for the whole page fetch]
Improvement: Persistent HTTP connections
• Keep the TCP connection open for many HTTP transactions
  – Reduce TCP connection setup and teardown overhead
  – Remove slow-start performance problems (cwnd keeps growing)
• Keep-Alive mechanism in HTTP/1.0; a persistent TCP connection is the default in HTTP/1.1

HTTP/1.0 with Keep-Alive:

GET /index.html HTTP/1.0
Connection: Keep-Alive

(reply from server)

GET /image1.jpg HTTP/1.0
Connection: Keep-Alive

(reply from server)

GET /image2.jpg HTTP/1.0
Connection: Keep-Alive

HTTP/1.1 (persistent by default):

GET /index.html HTTP/1.1

(reply from server)

GET /image1.jpg HTTP/1.1

(reply from server)

GET /image2.jpg HTTP/1.1
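One way to see persistent connections in action is Python's http.client, which reuses a single TCP connection for successive requests on the same HTTPConnection object (hostname and paths are placeholders):

import http.client

# One TCP connection, three HTTP/1.1 requests: the setup/teardown and
# extra slow starts of one-connection-per-object are avoided.
conn = http.client.HTTPConnection("www.example.com")
for path in ("/index.html", "/image1.jpg", "/image2.jpg"):
    conn.request("GET", path)       # HTTP/1.1, so persistent by default
    resp = conn.getresponse()
    body = resp.read()              # must drain the body before reusing
    print(path, resp.status, len(body))
conn.close()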
Persistent connections avoid unnecessary slow starts
• Fetch an 8.5 Kbyte page with 10 embedded objects, most < 10 Kbyte
• Leave the TCP connection open after the server's response; the next HTTP request reuses it
• Only incur one slow start, but it still takes an RTT to issue each next request
[Graph: bytes received vs. time (milliseconds); after the first slow start, HTTP/1.0 with Keep-Alive finishes well ahead of plain HTTP/1.0, which pays a fresh slow start per object]
HTTP request pipelining
• Normally a client issues an HTTP request only after the server's previous response arrives
• Clients using HTTP pipelining issue multiple requests without waiting for responses, as in the sketch below
• Advantages
  – Reduces latency of page loading
  – Can reduce overhead by packing multiple requests into one packet

GET /index.html HTTP/1.1
GET /image1.jpg HTTP/1.1
GET /image2.jpg HTTP/1.1

[Diagram: without pipelining, the client waits an RTT between requests; with pipelining, all three GETs leave back-to-back and the server returns the responses in order]
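Python's http.client will not pipeline, so this sketch writes all three GETs to a raw socket before reading anything back; the hostname is a placeholder. (In practice many servers and middleboxes handle pipelining poorly, which limited its deployment.)

import socket

HOST = "www.example.com"                      # placeholder host
paths = ("/index.html", "/image1.jpg", "/image2.jpg")

# Build all three requests up front; ask the server to close after the
# last one so that reading until EOF collects all three responses in order.
reqs = []
for i, path in enumerate(paths):
    close = "Connection: close\r\n" if i == len(paths) - 1 else ""
    reqs.append(f"GET {path} HTTP/1.1\r\nHost: {HOST}\r\n{close}\r\n")

s = socket.create_connection((HOST, 80))
s.sendall("".join(reqs).encode())             # all requests leave together

chunks = []
while (data := s.recv(4096)):
    chunks.append(data)
s.close()
print(b"".join(chunks).count(b"HTTP/1.1"))    # expect 3 status lines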
Pipelined requests overlap RTTs
• Fetch an 8.5 Kbyte page with 10 embedded objects, most < 10 Kbyte
• Send multiple HTTP requests simultaneously
• May or may not use the same TCP connection (fairness issues)
[Graph: bytes received vs. time (milliseconds); HTTP/1.1 with pipelined requests finishes ahead of HTTP/1.0 with Keep-Alive, which in turn beats plain HTTP/1.0]
Outline
• Background and HTTP
• Interaction between applications and TCP
  – Connection establishment and slow start
  – Web transfers, HTTP/1.1 persistent connections
  – Interactive applications: Nagle's algorithm
• Web caching
• Content distribution networks
Interactive applications: When exactly should TCP send?
• A server writes data in pieces of size < MSS into a TCP sender socket buffer
• Common case: cwnd ≥ maximum segment size (MSS)
• Transport-layer problem: when should the sender transmit a segment?
  – Early TCP: send immediately when data arrives
    • Wastes capacity by sending small packets ("tinygrams")
  – Send only when a full MSS accumulates? Breaks ssh, remote desktop, and other interactive apps
[Diagram: sender application writing small pieces into the TCP sender's socket buffer, which drains in MSS-sized segments toward the TCP receiver]
Nagle's algorithm
• Proposed in RFC 896, in 1984
• Delay sending until the sender has MSS bytes of data, but
• Use the ACK clock to trigger small-packet transmission
  – Low-RTT connections: the ACK comes back before the next keystroke
  – High-RTT connections: amortize keystrokes, avoid tinygrams

When the application generates new data:
    if available data ≥ MSS and cwnd ≥ MSS:
        send a full segment
    else if there is unacked data in flight:
        buffer the new data until an ACK arrives
    else:
        send immediately
On ACK receipt:
    send pending data (subject to cwnd)
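The pseudocode above, transcribed into a small Python predicate; the argument names are this sketch's own:

def nagle_should_send(buffered_bytes, mss, cwnd, unacked_bytes):
    """Decide whether a TCP sender running Nagle's algorithm
    transmits now or waits for an ACK."""
    if buffered_bytes >= mss and cwnd >= mss:
        return True        # a full segment is ready: send it
    if unacked_bytes > 0:
        return False       # small data + data in flight: wait for the ACK
    return True            # idle connection: send the tinygram immediately

# A fast typist on a high-RTT link: the first keystroke goes out alone,
# later keystrokes are coalesced until the ACK returns.
print(nagle_should_send(1, 1460, 5840, 0))     # True  (nothing in flight)
print(nagle_should_send(1, 1460, 5840, 1))     # False (coalesce until ACK)
print(nagle_should_send(2000, 1460, 5840, 1))  # True  (full segment ready)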
Nagle's algorithm on a WAN
• Various amounts of data flow from slip (the client) to vangogh: 1, 1, 2, 1, 2, 2, 3… bytes
• Data is coalesced until the previously sent data is acknowledged
• Segments 14 and 15 appear to contradict Nagle
  – 14 is in response to 12
  – 15 is in response to 13
From Stevens, TCP/IP Illustrated, vol. 1: things change, however, when the round-trip time (RTT) increases, typically across a WAN. Consider an rlogin connection between the host slip and vangogh.cs.berkeley.edu: to get out of the local network, SLIP links must be traversed, and then the Internet is used, so much longer round-trip times are expected. Figure 19.4 shows the time line of some data flow while characters were being typed quickly on the client (similar to a fast typist); type-of-service information is removed, but the window size advertisements are left in.

Figure 19.4: Data flow using rlogin between slip and vangogh.cs.berkeley.edu [Stevens, TCP/IP Illustrated, vol. 1]

Seg  Time (delta)        Direction          Segment contents
 1   0.000000            slip → vangogh     PSH 5:6(1) ack 47, win 4096
 2   0.197694 (0.1977)   vangogh → slip     PSH 47:48(1) ack 6, win 8192
 3   0.232457 (0.0348)   slip → vangogh     PSH 6:7(1) ack 48, win 4096
 4   0.437593 (0.2051)   vangogh → slip     PSH 48:49(1) ack 7, win 8192
 5   0.464257 (0.0267)   slip → vangogh     PSH 7:9(2) ack 49, win 4095
 6   0.677658 (0.2134)   vangogh → slip     PSH 49:51(2) ack 9, win 8192
 7   0.707709 (0.0301)   slip → vangogh     PSH 9:10(1) ack 51, win 4094
 8   0.917762 (0.2101)   vangogh → slip     PSH 51:52(1) ack 10, win 8192
 9   0.945862 (0.0281)   slip → vangogh     PSH 10:12(2) ack 52, win 4095
10   1.157640 (0.2118)   vangogh → slip     PSH 52:54(2) ack 12, win 8192
11   1.187501 (0.0299)   slip → vangogh     PSH 12:14(2) ack 54, win 4094
12   1.427852 (0.2404)   vangogh → slip     ack 14, win 8190
13   1.428025 (0.0002)   vangogh → slip     PSH 54:56(2) ack 14, win 8192
14   1.457191 (0.0292)   slip → vangogh     PSH 14:17(3) ack 54, win 4096
15   1.478429 (0.0212)   slip → vangogh     PSH 17:18(1) ack 56, win 4096
16   1.727608 (0.2492)   vangogh → slip     PSH 56:59(3) ack 18, win 8191
17   1.762913 (0.0353)   slip → vangogh     PSH 18:21(3) ack 59, win 4093
18   1.997900 (0.2350)   vangogh → slip     PSH 59:60(1) ack 21, win 8189
Nagle’s algorithm at a web server
• An HTTP server should disable Nagle's algorithm (setsockopt(…, TCP_NODELAY, …)) and send each response in a single write call, as in the sketch below
• Suppose MSS = 1460 bytes, the HTTP header is 25 bytes, the HTTP response body is 1000 bytes, and cwnd is large

[Timelines: two servers with Nagle left on.
One write call, write(1:2925): segments 1:1460 and 1461:2920 leave immediately, but the 5-byte tail 2921:2925 is held back until the first ACK returns (wasted time).
Separate write calls, write(1:25) then write(26:1025): the 25-byte header 1:25 leaves alone, and the 1000-byte body 26:1025 is buffered until the header's ACK arrives (wasted time).]
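A minimal sketch of the two fixes, assuming an already-connected socket: disable Nagle with TCP_NODELAY, and hand the kernel the header and body in a single write.

import socket

def send_http_response(conn: socket.socket, header: bytes, body: bytes) -> None:
    # Disable Nagle so no short tail of the reply waits on an ACK...
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # ...and write header + body together, so the header never ships alone.
    conn.sendall(header + body)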
Outline
• Background and HTTP
• Interaction between applications and TCP
• Web caching
  – Conditional GET
  – Cache coherency
  – Web cache hierarchies
• Content distribution networks
Web caching: Introduction
• Why cache web pages?
  1. Reduce load on web servers ("origin" servers)
     • Persistent load
     • Flash crowds (the "Slashdot effect")
  2. Reduce Internet congestion
  3. Reduce bandwidth consumption in the Internet
     • Incentive: reduce ISP peering costs
  4. Improve client fetch latency
  5. Improve availability of web pages for clients

[Graph: requests per second at a web server over one day, peaking near 7000 during a flash crowd. Jung, Krishnamurthy, Rabinovich in WWW '02]
Web caching: Mechanism
• A web proxy sits somewhere on the path from client to origin server
• The user configures the browser to access the web via the proxy
• The browser sends all HTTP requests to the cache
  – Object in cache: the cache returns the object in an HTTP response
  – Otherwise: the cache requests the object from the origin server, then returns it to the client
• Proxies may be chained: requests go from client to server via more than one proxy

[Diagram: clients, web proxy server (aka web cache), origin server]
More about web caching
• The cache acts as both client and server
• Typically the cache is installed by an ISP (university, company, residential ISP)
• The Internet is dense with caches: this enables "poor" content providers to effectively deliver content (but so does peer-to-peer file sharing)

[Diagram: clients, web proxy server (aka web cache), web server]
Caching example
Assumptions
• Average object size: 100 Kbits
• Average request rate from the institution's browsers to origin servers: 15/sec
• Institutional router to origin server RTT: 2 sec

Consequences
• LAN utilization: 15%
• Access link utilization: 100%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds

[Topology: institutional network with a 10 Mbit/s LAN and a 1.5 Mbit/s access link to the public Internet and origin servers]
Caching example: Provisioning
• Increase the bandwidth of the access link to, say, 10 Mbit/s

Consequences
• LAN utilization: 15%
• Access link utilization: 15%
• Total delay = Internet delay + access delay + LAN delay = 2 sec + milliseconds + milliseconds
• Often a costly upgrade

[Topology: as before, but with a 10 Mbit/s access link]
Caching example: Web cache
• Install a web cache; suppose the hit rate is 0.4

Consequences
• 40% of requests are satisfied almost immediately
• 60% of requests are satisfied by the origin server
• Utilization of the access link drops to 60%, resulting in negligible queueing delays (say 10 milliseconds)
• Total average delay = Internet delay + access delay + LAN delay = 0.6 × (2.01 sec) + 0.4 × milliseconds < 1.4 sec (see the calculation below)

[Topology: as in the first example, with the 1.5 Mbit/s access link, plus a web cache on the institutional network]
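The arithmetic behind the three scenarios, using the slide's own numbers:

# Back-of-envelope numbers from the three scenarios above.
object_bits = 100e3        # 100 Kbit average object
req_per_sec = 15
demand = object_bits * req_per_sec             # 1.5 Mbit/s of traffic

lan_util    = demand / 10e6                    # 10 Mbit/s LAN   -> 0.15
access_util = demand / 1.5e6                   # 1.5 Mbit/s link -> 1.0 (saturated!)

# With a cache of hit rate 0.4, only misses cross the access link:
miss_rate = 0.6
access_util_cached = miss_rate * demand / 1.5e6   # -> 0.6, small queueing delay
avg_delay = miss_rate * 2.01 + 0.4 * 0.01         # ~1.21 s average fetch delay
print(lan_util, access_util, access_util_cached, avg_delay)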
Placing web caches
Where to place caches? ☞ Where are the Internet bottlenecks?
1. Network link from hosting AS to server
2. Access line from client to ISP
3. Client's ISP to backbone AS
4. Client's internal AS
5. Peering points
6. The web server itself

Cache placements and the bottlenecks they relieve:
• Hosting service's AS edge: (1.) ✔, (6.) ✔
• Client's premises: (2.) ✔
• An ISP's AS: (3.) ✔
[Diagram: client, access line, client's ISP AS, backbone AS, hosting AS, and web server, with the numbered bottlenecks marked along the path]
Cache consistency
• Definition: cached objects match the corresponding objects on the origin server
• HTTP/1.1 has mechanisms for ensuring freshness:
  – Expires header on server responses
    Expires: Fri, 30 Oct 1998 14:19:41 GMT
  – Cache-control header on server responses
    Cache-control: no-cache
      • Forces caches to resubmit the request to the origin server every time
    Cache-control: no-store
      • Forbids caches from storing content under any conditions
    Cache-control: max-age=[seconds]
      • Similar to the Expires header, but a relative time
Conditional GET
• Goal: don't send the object if the cache has an up-to-date cached version
• Cache: specify the date of the cached copy in the HTTP request
  – If-Modified-Since: <date>
• Server: the response contains no object if the cached copy is up to date:
  – HTTP/1.0 304 Not Modified

[Diagram: two exchanges between cache and server.
Object not modified: request carries If-Modified-Since: <date>; response is HTTP/1.0 304 Not Modified, with no body.
Object modified: request carries If-Modified-Since: <date>; response is HTTP/1.0 200 OK followed by <data>.]
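A conditional GET from Python's http.client; the host, path, and date are placeholders for the values a real cache would have stored:

import http.client

# Revalidate a cached copy: send If-Modified-Since with the date we
# stored alongside the object.
conn = http.client.HTTPConnection("www.example.com")
conn.request("GET", "/index.html", headers={
    "If-Modified-Since": "Wed, 11 Nov 2009 09:10:45 GMT",
})
resp = conn.getresponse()
if resp.status == 304:
    print("cached copy is fresh, no body sent")   # serve from cache
else:
    body = resp.read()                            # object changed: update cache
conn.close()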
Cooperative caching
• Caches can be arranged together in a cluster
• How to locate an item in a cluster of caches?
  1. Always local (caches in the cluster don't cooperate); results in duplication
  2. Primary plus multicast; uses an inter-cache protocol such as the Internet Cache Protocol (ICP)
  3. Hashing: direct access to the resource, but changing the hash function can result in objects moving about
  4. Consistent hashing
Outline
• Background and HTTP
• Interaction between applications and TCP
• Web caching
• Content distribution networks
  – Mechanism: client DNS redirection
  – Consistent hashing for server selection
  – Operation under failure
Content distribution networks
• Challenge ca. 2002: how to reliably deliver large amounts of content to users worldwide?
  – Flash crowds overwhelm the web server, its access link, or the back-end database infrastructure
  – Content is becoming richer: audio, video
• Possible solutions
  – Web caching: dynamic content and content diversity cause low proxy hit rates (25−40%)
  – Multihoming the server: BGP takes minutes to converge after a route failure
Content distribution networks
• Replicate content at the edge
  – Deploy thousands of content distribution network (CDN) edge servers at many ISPs
  – Push content to edge servers ahead of time
  – Avoid the impairments (loss, delay) of long paths
  – Solve the "flash crowds" problem
• Which CDN server should a client use?
  1. The nearest server to the client
  2. One likely to have the content
  3. One that is available (load, bandwidth)
  4. Adapt the choice quickly (within seconds)

[Diagram: a content server, with CDN edge servers placed near groups of clients]
Sending clients to the CDN for content
• The web client fetches the web page via HTTP GET, as usual
• The content provider rewrites content links to point to the CDN
  – The browser then issues HTTP GETs for rich content to CDN edge servers instead of the content server
  – CDN companies provide automated tools to rewrite links

[Diagram: clients fetch HTML from the content provider's server (news.bbc.co.uk), then fetch the embedded objects from the CDN]
An "Akamaized" URL (ARL)

http://a764.g.akamai.net/f/764/16742/1h/www.1800flowers.com/800f_assets/jet/website/images/flowers/flowers/banners/hp/hpb/btn_findagift.gif

• Serial number (a764, repeated as /764): defines content served from the same set of servers
  – Note: the serial number is the only input to DNS name resolution
• Type code (f): f = static TTL, 6 = If-Modified-Since, 7 = MD5 hash
  – Various ways of ensuring freshness of the object served
• Content provider code (CPC, 16742): unique ID for the content provider
• Object data (1h): f = time, 6 = "000", 7 = MD5 hash value
• Absolute URL (www.1800flowers.com/…): used to retrieve the content from the provider
Mapping requests to CDN servers
• Goal: find the CDN server nearest to the client
  – Establish BGP peering sessions with Internet border routers → a coarse-grained AS-level map of the Internet
  – Combine with live traceroute and loss measurement data between CDN servers
  – Result: a map of the Internet
• Finding an available CDN server
  – Considers server health, the service requested, server load balancing (CPU, disk, network), and network conditions
  – Agents simulate user behavior, downloading objects and reporting failure rates and service times
DNS-based redirection
• Two levels of DNS indirection
  1. Akamai top-level nameservers (TLNSs)
     • Locations: US (4), Europe (4), Asia (1)
     • TLNSs return eight LLNSs in three different regions
       – Chosen to be close to the requesting client (using the map)
       – Handles the complete failure of any two particular regions
  2. Akamai low-level nameservers (LLNSs)
     • Point to the Akamai edge servers, which serve the content
     • Do most of the load balancing
[Diagram: numbered DNS resolution steps. The end user's OS queries its local name server; the query walks through the generic top-level domain nameserver and xyz.com's nameserver (10.10.123.5), is referred to the Akamai TLNS for g.akamai.net (which selects a cluster), and then to an Akamai LLNS at 20.20.123.55 (which selects servers within the cluster), finally yielding a212.g.akamai.net → 30.30.123.5. Slide: Bruce Maggs]
Akamai DNS redirection

[Diagram: the client fetches HTML from the content server (news.bbc.co.uk) on the content provider's network. The TLNS, whose answers carry a 30−60 minute TTL, selects an LLNS in a network close to the client; the LLNS, whose answers carry a TTL of about 20 seconds, selects a server within a nearby edge server cluster.]
Akamai cluster
• Servers pool resources:
  – RAM
  – Disk
  – Throughput
Hashing
• Universe U of all possible objects, set S of servers
  – Objects: web content objects
  – Servers: edge content servers
• A hash function h: U → S assigns objects to servers
  – E.g., h(x) = [ax + b (mod p)] (mod |S|), where
    • p is a prime integer, p > |S|
    • a, b are constant integers chosen uniformly at random from [0, p − 1]
    • x is an object's serial number
Difficulty: Changing the number of servers

With four servers and h(x) = x + 1 (mod 4), the objects with serial numbers 7, 10, 11, 27, 29, 36, 38, 40 map to servers 0, 3, 0, 0, 2, 1, 3, 1. Add one machine, so that h(x) = x + 1 (mod 5), and nearly every object moves to a different server, as the sketch below shows.
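Running the slide's example confirms how badly modular hashing behaves when |S| changes:

# The slide's example: h(x) = x + 1 (mod |S|), growing from 4 to 5 servers.
objects = [7, 10, 11, 27, 29, 36, 38, 40]

before = {x: (x + 1) % 4 for x in objects}
after  = {x: (x + 1) % 5 for x in objects}

moved = [x for x in objects if before[x] != after[x]]
print(f"{len(moved)}/{len(objects)} objects change server")   # 7/8 move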
Consistent hashing
• Low-level DNS servers act independently and may have different ideas about how many and which servers are alive
• Who uses it?
  – Akamai's CDN
  – amazon.com: the Dynamo data store
  – The Last.fm music website
  – The Chord P2P network

D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, R. Panigrahy. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. Proceedings of ACM STOC, 1997.
Consistent hashing
Map both objects and servers onto the unit circle [0, 1).
• A server with id number x′ maps to the point h(x′) = [ax′ + b (mod p)] / p
• An object with serial number x maps to the server h(x) = successor([ax + b (mod p)] / p), where successor finds the next server clockwise along the ring

[Diagram: unit circle with server points (e.g., server 0) and content-object points placed around the ring]
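A minimal consistent-hash ring in Python; SHA-1 stands in for the slide's h(x) = [ax + b (mod p)] / p, since both simply place keys on the unit circle:

import bisect, hashlib

def point(key: str) -> float:
    # Position of a key on the unit circle [0, 1).
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return h / 2**64

class Ring:
    def __init__(self, servers):
        self.ring = sorted((point(s), s) for s in servers)

    def successor(self, obj: str) -> str:
        """First server clockwise from the object's point."""
        i = bisect.bisect(self.ring, (point(obj),))
        return self.ring[i % len(self.ring)][1]   # wrap past 1.0 to 0

ring = Ring(["server0", "server1", "server2"])
print(ring.successor("btn_findagift.gif"))

# Adding a server only re-homes the objects between it and its predecessor:
ring = Ring(["server0", "server1", "server2", "server3"])
print(ring.successor("btn_findagift.gif"))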
Properties of consistent hashing
• Balance: objects are assigned to buckets "randomly"
• Locality: when a server is added or removed, the only objects affected are those that are/were mapped to that bucket
• Load: objects are assigned to buckets evenly, even over a set of views
  – Can be improved by mapping each server to multiple places on the unit circle
• Spread: an object should be mapped to a small number of servers over a set of views
  – Can choose different a, b to make a new hash function
  – Use multiple hash functions to map an object to multiple servers
Adding and removing servers
• Each server is responsible for a different section of the ring
• Adding a server
  – Only need to move content between the predecessor server and the new server (segment AC in the diagram)
• Removing a server
  – Clean shutdown: the server pushes its content to its successor
  – Failures: look up a different server, and push content to the successor in the background

[Diagram: ring with servers and content objects; points A, B, C mark a new server and its neighbors, with segment AC the affected span]
Hardware/server failures
• Linux and Windows servers with large RAM and disk capacity
• Example failures:
  1. Memory modules jumping out of their sockets
  2. Network cards screwed down but not seated in their slots
  3. Hardware component failures: HDD, motherboard, etc.
  4. Others
Operation under failure

[Diagram: a TLNS and LLNSs in networks X, A and C; edge-server clusters A and C; transit networks B and E on the path from the client]

• Network X fails
  – No impact on already-selected LLNSs
  – A different TLNS is chosen next time; the mapping fails over
• A server in cluster A fails
  – Detection using heartbeats
  – Another server (a "buddy") proxy-ARPs for its address (immediately)
  – The LLNS removes the failed server (30−60 min)
• Transit network B fails
  – Route via an intermediate server in transit network E
• Network A or LLNS A fails
  – The client's local nameserver will use another LLNS (30−60 min)
Low-level DNS load balancing
• The LLNS answers each query with a random permutation of the servers. Why? To spread load for one serial number.

Name    Answer (server IPs, in returned order)
a212    10.10.10.1  10.10.10.4  10.10.10.3  10.10.10.2
a213    10.10.10.3  10.10.10.4  10.10.10.2  10.10.10.1
a214    10.10.10.1  10.10.10.2  10.10.10.3  10.10.10.4
a215    10.10.10.2  10.10.10.1  10.10.10.4  10.10.10.3

[Slide: Bruce Maggs]
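A sketch of how an LLNS might generate the answers in the table above: each query gets a fresh random permutation of that serial number's server IPs, since clients tend to try addresses in the order returned.

import random

servers = ["10.10.10.1", "10.10.10.2", "10.10.10.3", "10.10.10.4"]

def dns_answer():
    # Return the same A records in a fresh random order on every query.
    return random.sample(servers, k=len(servers))

for name in ("a212", "a213", "a214", "a215"):
    print(name, dns_answer())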
LLNS DNS-based load balancing
• Input: which servers have which content (from consistent hashing), server load, and proximity to the client (from the DNS request)
• Consider edge server A; two thresholds:
  1. Load > low threshold: add servers to distribute load (apply more hash functions to the consistent hash ring)
  2. Load > high threshold: shed load on that server (remove that hash function from the consistent hash ring)