CS 4700 / CS 5700 Network Fundamentals Lecture 16: IXPs and DCNs (The Underbelly of the Internet) Revised 3/15/2014


Page 1: CS 4700 / CS 5700 Network Fundamentals, Lecture 16: IXPs and DCNs (The Underbelly of the Internet). Revised 3/15/2014

Page 2

Outline
• Internet connectivity and IXPs
• Data center networks

Page 3

The Internet as a Natural System
You've learned about the TCP/IP Internet:
• Simple abstraction: unreliable datagram transmission
• Various layers
• Ancillary services (DNS)
• Extra in-network support
So what is the Internet actually being used for?
• Emergent properties are impossible to predict from the protocols
• Requires measuring the network
• Constant evolution makes it a moving target

Page 4

Measuring the capital-I Internet*
Measuring the Internet is hard. There is significant previous work on:
• Router- and AS-level topologies
• Individual link / ISP traffic studies
• Synthetic traffic demands
But there is limited "ground truth" on inter-domain traffic:
• Most commercial arrangements are under NDA
• Significant lack of uniform instrumentation

*Mainly borrowed (okay, stolen) from Labovitz 2010

Page 5

Conventional Wisdom (i.e., lies)
The Internet is a global-scale end-to-end network:
• Packets transit (mostly) unmolested
• The value of the network is global addressability/reachability
• Broad distribution of traffic sources/sinks
An Internet "core" exists:
• Dominated by a dozen global transit providers (tier 1)
• Interconnecting content, consumer, and regional providers

Page 6

Traditional view

Page 7

How do we validate/improve this picture?
• Measure from 110+ ISPs / content providers, including 3,000 edge routers and 100,000 interfaces, and an estimated ~25% of all inter-domain traffic
• Do some other validation: extrapolate estimates with a fit from the ground-truth data
• Talk with operators

Page 8

Where is traffic going?
• Increasingly: Google and Comcast
• Tier 1 still carries a large fraction, but a large portion of that is to Google! Why?
• Consolidation of traffic: fewer ASes are responsible for more of the traffic

Page 9

Why is this happening?

Page 10

Transit is dead! Long live the eyeball!
Commoditization of IP and hosting / CDN:
• Drop in the price of wholesale transit
• Drop in the price of video / CDN
• Economics and scale drive enterprises to the "cloud"
Consolidation:
• The big get bigger (economies of scale), e.g., Google, Yahoo, MSFT acquisitions
• Success of bundling / higher-value services: triple and quad play, etc.
New economic models:
• Paid content (ESPN3), paid peering, etc.
• Difficult to quantify due to NDAs / commercial privacy
Disintermediation:
• Direct interconnection of content and consumer
• Driven by cost and, increasingly, performance

Page 11

A more accurate model?

Page 12

How do ASes connect?
Point of Presence (PoP):
• Usually a room or a building (windowless)
• One router from one AS is physically connected to the other
• Often in big cities
• Establishing a new connection at PoPs can be expensive
Internet eXchange Points (IXPs):
• Facilities dedicated to providing presence and connectivity for large numbers of ASes
• Many fewer IXPs than PoPs
• Economies of scale

Page 13

IXPs worldwide

https://prefix.pch.net/applications/ixpdir/

Page 14

Inside an IXP
Connection fabric:
• Can provide the illusion of all-to-all connectivity
• Lots of routers and cables
Also a route server:
• Collects and distributes routes from participants
Example routing policies:
• Open: peer with anyone (often used by content providers)
• Case-by-case: give us a reason to peer (mutual incentives, $$)
• Closed: don't call us, we'll call you
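As a toy illustration of these policies, the route server's decision about whether to pass routes between two participants can be sketched as follows. The participant names, policy table, and agreement set are all hypothetical, not real IXP members:

```python
# Toy model of an IXP route server applying per-participant peering
# policies. Names and agreements below are invented for illustration.

POLICIES = {
    "ContentCDN": "open",            # peer with anyone
    "RegionalISP": "case-by-case",   # peer only where an agreement exists
    "Tier1Carrier": "closed",        # no peering via the route server
}

# Negotiated case-by-case peerings (unordered pairs)
AGREEMENTS = {("RegionalISP", "ContentCDN")}

def will_exchange_routes(a: str, b: str) -> bool:
    """Return True if the route server should pass routes between a and b."""
    for x, y in ((a, b), (b, a)):
        policy = POLICIES[x]
        if policy == "closed":
            return False
        if policy == "case-by-case" and (x, y) not in AGREEMENTS \
                and (y, x) not in AGREEMENTS:
            return False
    return True

print(will_exchange_routes("ContentCDN", "RegionalISP"))    # True: agreed
print(will_exchange_routes("RegionalISP", "Tier1Carrier"))  # False: closed
```

Real route servers (e.g., BIRD deployments at large IXPs) implement this with per-peer BGP filters, but the decision logic is this simple at its core.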

Page 15

Inside an IXP

Page 16

How much traffic is at IXPs?*
• We don't know for sure! It seems to be a lot, though.
• One estimate: 43% of exchanged bytes are not visible to us
• Also, 70% of peerings are invisible

*Mainly borrowed (okay, stolen) from Feldmann 2012

Page 17

Revised model 2012+

Page 18

Outline
• Internet connectivity and IXPs
• Data center networks

Page 19

"The Network is the Computer"
Network computing has been around forever:
• Grid computing
• High-performance computing
• Clusters (Beowulf)
Highly specialized:
• Nuclear simulation
• Stock trading
• Weather prediction
All of a sudden, datacenters/the cloud are HOT. Why?

Page 20

The Internet Made Me Do It
Everyone wants to operate at Internet scale:
• Millions of users: can your website survive a flash mob?
• Zettabytes of data to analyze: webserver logs, advertisement clicks, social networks, blogs, Twitter, video…
Not everyone has the expertise to build a cluster
The Internet is the symptom and the cure: let someone else do it for you!

Page 21

The Nebulous Cloud
What is "the cloud"? Everything as a service:
• Hardware
• Storage
• Platform
• Software
Anyone can rent computing resources:
• Cheaply
• At large scale
• On demand

Page 22

Example: Amazon EC2
Amazon's Elastic Compute Cloud:
• Rent any number of virtual machines
• For as long as you want
• Hardware and storage as a service

Page 23

Example: Google App Engine
Platform for deploying applications
From the developer's perspective:
• Write an application
• Use Google's Java/Python APIs
• Deploy the app to App Engine
From Google's perspective:
• Manage a datacenter full of machines and storage
• All machines execute the App Engine runtime
• Deploy apps from customers
• Load balance incoming requests
• Scale up customer apps as needed by executing multiple instances of the app

Pages 24-26: (figure slides, no text)

Page 27

Typical Datacenter Topology
• 20-40 machines per rack
• Top of Rack (ToR) switches, with 1 Gbps Ethernet to the machines
• Aggregation routers, connected to the ToR switches over 10 Gbps Ethernet
• Core routers, connecting up to the Internet
• Redundant links between layers

Page 28

Advantages of Current Designs
• Cheap, off-the-shelf, commodity parts: no need for custom servers or networking kit
• (Sort of) easy to scale horizontally
• Runs standard software: no need for "cluster" or "grid" OSs, stock networking protocols
• Ideal for VMs
• Highly redundant
• Homogeneous

Page 29

Lots of Problems
Datacenters mix customers and applications:
• Heterogeneous, unpredictable traffic patterns
• Competition over resources
• How to achieve high reliability?
• Privacy
Heat and power:
• 30 billion watts consumed worldwide
• May cost more than the machines
• Not environmentally friendly
All actively being researched

Page 30

Today's Topic: Network Problems
Datacenters are data intensive. Most hardware can handle this:
• CPUs scale with Moore's Law
• RAM is fast and cheap
• RAID and SSDs are pretty fast
Current networks cannot handle it:
• Slow, not keeping pace over time
• Expensive
• Wiring is a nightmare
• Hard to manage
• Non-optimal protocols

Page 31

Outline
• Intro
• Network Topology and Routing
  • Fat Tree
  • 60 GHz Wireless
  • Helios
  • CamCube
• Transport Protocols

Page 32

Problem: Oversubscription
• Machines to ToR ports: 40 machines at 1 Gbps each onto 40x1 Gbps ports (1:1)
• ToR uplink: 40x1 Gbps down, 1x10 Gbps up (1:4)
• Aggregation: #racks x 40x1 Gbps down, 1x10 Gbps up (1:80-240)
• Bandwidth gets scarce as you move up the tree
• Locality is key to performance
• All-to-all communication is a very bad idea

Page 33

Consequences of Oversubscription
Oversubscription cripples your datacenter:
• Limits application scalability
• Bounds the size of your network
The problem is about to get worse:
• 10 GigE servers are becoming more affordable
• 128-port 10 GigE routers are not
Oversubscription is a core router issue: bottlenecking racks of GigE into 10 GigE links
What if we get rid of the core routers?
• Only use cheap switches
• Maintain a 1:1 oversubscription ratio

Page 34

Fat Tree Topology
To build a K-ary fat tree:
• K-port switches
• K^3/4 servers
• (K/2)^2 core switches
• K pods, each with K switches
In this example K=4:
• 4-port switches
• K^3/4 = 16 servers
• (K/2)^2 = 4 core switches
• 4 pods, each with 4 switches
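The counts above can be computed for any even K. This sketch uses the formulas quoted on these slides, including the total-switch and wire counts given on the next slide:

```python
# Sketch: element counts for a K-ary fat tree, using the formulas
# quoted on the slides (servers = K^3/4, core switches = (K/2)^2,
# total switches = 3K^2/2, wires = (K^3 + 2K^2)/4).

def fat_tree(k: int) -> dict:
    assert k % 2 == 0, "K must be even"
    return {
        "servers": k**3 // 4,
        "core_switches": (k // 2) ** 2,
        "pods": k,
        "switches_per_pod": k,
        "total_switches": 3 * k**2 // 2,   # cost figure from the slides
        "wires": (k**3 + 2 * k**2) // 4,   # likewise
    }

print(fat_tree(4))                         # the slide's K=4 example
print(fat_tree(48)["total_switches"])      # 3456 with 48-port switches
print(fat_tree(48)["wires"])               # 28800 wires
```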

Page 35

Fat Tree at a Glance
The good:
• Full bisection bandwidth
• Lots of redundancy for failover
The bad:
• Needs custom routing (the paper uses NetFPGA)
• Cost: 3K^2/2 switches; with 48-port switches, that is 3456
The ugly:
• OMG THE WIRES!!!! (K^3+2K^2)/4 wires; with 48-port switches, 28800

Page 36

Is Oversubscription so Bad?
• Oversubscription is a worst-case scenario
• If traffic is localized, or short, there is no problem
• How bad is the problem?

Page 37

Idea: Flyways
Challenges:
• Additional wiring
• Route switching

Page 38

Wireless Flyways
Why use wires at all? Connect ToR switches wirelessly
Why can't we use Wifi? Massive interference
Key issue: Wifi is not directed

Page 39

Directional 60 GHz Wireless

Page 40

Implementing 60 GHz Flyways
Pre-compute routes:
• Measure the point-to-point bandwidth/interference
• Calculate antenna angles
Measure traffic:
• Instrument the network stack per host
• Leverage existing schedulers
Reroute:
• Encapsulate (tunnel) packets via the flyway
• No need to modify static routes

Page 41

Results for 60 GHz Flyways
• Hotspot fan-out is low; you don't need that many antennas per rack
• Prediction/scheduling is super important
• Better schedulers could show more improvement
• Traffic-aware schedulers?

Page 42

Random Thoughts
Getting radical ideas published is not easy, especially at top conferences
This is a good example of a paper that almost didn't make it:
• Section 7 (Discussion) and Appendix A are all about fighting fires
• The authors still got tons of flak at SIGCOMM
Takeaways:
• Don't be afraid to be 'out there'
• But always think defensively!

Page 43

Problems with Wireless Flyways
• Directed antennas still cause directed interference
• Objects may block the point-to-point signal

Page 44

3D Wireless Flyways
• Prior work assumes a 2D wireless topology
• Reduce interference by using 3D beams: bounce the signal off the ceiling!
• Stainless steel mirrors on the ceiling reflect the 60 GHz directional wireless beams

Page 45

Comparing Interference
A 2D beam expands as it travels, creating a cone of interference
A 3D beam focuses into a parabola:
• Short distances = small footprint
• Long distances = longer footprint

Page 46

Scheduling Wireless Flyways
Problem: connections are point-to-point
• Antennas must be mechanically angled to form a connection
• Each rack can only talk to one other rack at a time
How to schedule the links?
Proposed solution: a centralized scheduler that monitors traffic. Based on demand (i.e., hotspots), choose links that:
• Minimize interference
• Minimize antenna rotations (i.e., prefer smaller angles)
• Maximize throughput (i.e., prefer heavily loaded links)
This is an NP-hard scheduling problem; a greedy algorithm gives an approximate solution
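The greedy idea can be sketched in a few lines. This is my own illustration of "prefer heavily loaded links, one flyway per rack," not the paper's actual algorithm (which also scores interference and antenna rotation):

```python
# Sketch of a greedy flyway scheduler: pick the heaviest feasible demand
# first, skipping links whose racks are already committed this round.

def schedule_flyways(demands):
    """demands: list of (src_rack, dst_rack, load_gbps).
    Each rack may participate in at most one flyway at a time."""
    chosen, busy = [], set()
    # Greedy: heaviest demand first ("prefer heavily loaded links")
    for src, dst, load in sorted(demands, key=lambda d: -d[2]):
        if src in busy or dst in busy:
            continue  # an antenna at this rack is already committed
        chosen.append((src, dst, load))
        busy.update((src, dst))
    return chosen

demands = [("r1", "r2", 9.0), ("r2", "r3", 7.0), ("r3", "r4", 5.0)]
print(schedule_flyways(demands))
# [('r1', 'r2', 9.0), ('r3', 'r4', 5.0)] -- r2-r3 loses because r2 is busy
```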

Page 47

Other issues
• Ceiling height
• Antenna targeting errors
• Antenna rotational delay

Page 48

3D Flyway Performance

Page 49

Modular Datacenters
Shipping container "datacenter in a box":
• 1,204 hosts per container
• However many containers you want
How do you connect the containers?
• Oversubscription, power, heat…
• Physical distance matters (10 GigE only reaches ~10 meters over copper)

Page 50

Possible Solution: Optical Networks
Idea: connect containers using optical networks
• Distance is irrelevant
• Extremely high bandwidth
But optical routers are expensive:
• Each port needs a transceiver (to convert between light and packets)
• Cost per port: $10 for 10 GigE, $200 for optical

Page 51

Helios: Datacenters at Light Speed
Idea: use optical circuit switches, not routers
• Uses mirrors to bounce light from port to port
• No decoding!
Tradeoffs:
• A router can forward from any port to any other port
• A circuit switch is point to point: the mirror must be mechanically angled to make a connection
(Figure: an optical router with a transceiver per in/out port, versus an optical switch with a steerable mirror between in port and out port)

Page 52

Dual Optical Networks
Typical packet-switched network:
• Connects all containers
• Oversubscribed
• Optical routers
Fiber-optic flyway:
• Optical circuit switch
• Direct container-to-container links, on demand

Page 53

Circuit Scheduling and Performance
Centralized topology manager:
• Receives traffic measurements from containers
• Analyzes the traffic matrix
• Reconfigures the circuit switch
• Notifies in-container routers to change routes
Circuit switching speed:
• ~100 ms for analysis
• ~200 ms to move the mirrors

Page 54

Datacenters in 4D
Why do datacenters have to be trees?
CamCube:
• 3x3x3 hyper-cube of servers
• Each host directly connects to 6 neighbors
• Routing is now hop-by-hop: no monolithic routers
• Borrows P2P techniques
• New opportunities for applications
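Hop-by-hop routing on a cube of servers is coordinate-space routing of the kind P2P systems use. This sketch routes greedily one axis at a time on a 3x3x3 torus with wrap-around links; it is an illustration of the idea, not CamCube's actual API:

```python
# Sketch: greedy hop-by-hop routing on a 3x3x3 torus of servers, with
# wrap-around links, in the spirit of CamCube's coordinate routing.

SIZE = 3

def next_hop(cur, dst):
    """Move one step along the first axis that differs, taking the
    shorter direction around the torus."""
    for axis in range(3):
        if cur[axis] != dst[axis]:
            fwd = (dst[axis] - cur[axis]) % SIZE
            step = 1 if fwd <= SIZE - fwd else -1
            hop = list(cur)
            hop[axis] = (cur[axis] + step) % SIZE
            return tuple(hop)
    return cur  # already at the destination

def route(src, dst):
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst))
    return path

print(route((0, 0, 0), (2, 1, 0)))
# [(0, 0, 0), (2, 0, 0), (2, 1, 0)] -- 0 -> 2 is one hop via wrap-around
```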

Page 55

Outline
• Intro
• Network Topology and Routing
• Transport Protocols
  • Google and Facebook (Actually Deployed)
  • DCTCP
  • D3 (Never Gonna Happen)

Page 56

Transport on the Internet
TCP is optimized for the WAN:
• Fairness: slow start, AIMD convergence
• Defense against network failures: three-way handshake, reordering
• Zero-knowledge congestion control: self-induces congestion, loss always equals congestion
• Delay tolerance: Ethernet, fiber, Wi-Fi, cellular, satellite, etc.

Page 57

The Datacenter is not the Internet
The good: the possibility to make unilateral changes
• Homogeneous hardware/software
• Single administrative domain
• Low error rates
The bad:
• Latencies are very small (250µs); agility is key!
• Little statistical multiplexing: one long flow may dominate a path
• Cheap switches have queuing issues
• Incast

Page 58

Partition/Aggregate Pattern
Common pattern for web applications:
• Search
• E-mail
Responses are under a deadline: ~250 ms
(Figure: a user request arrives at a web server, which fans out to aggregators, which fan out to workers; responses flow back up)
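The pattern can be sketched as an aggregator that fans a query out and keeps only the answers that return within the deadline. The worker latencies below are made up for illustration:

```python
# Sketch of the partition/aggregate pattern: fan out to workers, drop
# answers that miss the deadline. Latencies are invented numbers.

DEADLINE_MS = 250

def worker(i: int, latency_ms: float):
    """Stand-in for a worker RPC; returns (worker_id, response_latency)."""
    return (i, latency_ms)

def aggregate(latencies_ms):
    """Fan out one query per worker; count answers inside the deadline."""
    answers = [worker(i, l) for i, l in enumerate(latencies_ms)]
    on_time = [a for a in answers if a[1] <= DEADLINE_MS]
    return len(on_time), len(answers)

got, total = aggregate([80, 120, 90, 400, 110])  # one straggler at 400 ms
print(f"{got}/{total} responses made the {DEADLINE_MS} ms deadline")
# 4/5 responses made the 250 ms deadline
```

The straggler's answer is simply omitted from the page, which is why tail latency, not mean latency, drives these systems.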

Page 59

Problem: Incast
The aggregator sends out queries to a rack of workers:
• 1 aggregator, 39 workers
• Each query takes the same time to complete
• All workers answer at the same time: 39 flows into 1 port
• Limited switch memory, limited buffer at the aggregator
• Packet losses :(
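A back-of-the-envelope calculation shows why simultaneous answers overflow a shallow buffer. The response and buffer sizes below are assumptions for illustration, not measurements:

```python
# Incast sketch: 39 workers answer at once through one switch port.
# Response size and port buffer size are assumed values.

workers = 39
response_bytes = 32 * 1024        # assume 32 KB per worker response
port_buffer_bytes = 256 * 1024    # assume a shallow 256 KB port buffer

offered = workers * response_bytes
overflow = max(0, offered - port_buffer_bytes)
print(f"offered {offered // 1024} KB into a {port_buffer_bytes // 1024} KB "
      f"buffer -> {overflow // 1024} KB dropped if it all arrives at once")
# offered 1248 KB into a 256 KB buffer -> 992 KB dropped if it all arrives at once
```

In practice the drain rate spreads some of this out, but synchronized bursts still routinely exceed what a commodity switch can absorb.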

Page 60

Problem: Buffer Pressure
• In theory, each port on a switch should have its own dedicated memory buffer
• Cheap switches share buffer memory across ports
• The fat flow can congest the thin flow!

Page 61

Problem: Queue Buildup
Long TCP flows congest the network:
• Ramp up, past slow start
• Don't stop until they induce queuing + loss
• Oscillate around max utilization
Short flows can't compete:
• Never get out of slow start
• Deadline sensitive!
• But there is queuing on arrival

Page 62

Industry Solutions Hacks
Google:
• Limits search worker responses to one TCP packet
• Uses heavy compression to maximize data
Facebook:
• Largest memcached instance on the planet
• Custom engineered to use UDP
• Connectionless responses
• Connection pooling, one-packet queries

Page 63

Dirty Slate Approach: DCTCP
Goals:
• Alter TCP to achieve low latency and no queue buildup
• Work with shallow-buffered switches
• Do not modify applications, switches, or routers
Idea:
• Scale the window in proportion to congestion
• Use existing ECN functionality
• Turn the single-bit scheme into a multi-bit one

Page 64

Explicit Congestion Notification
Use TCP/IP headers to send ECN signals:
• The router sets the ECN bit in the header if there is congestion
• The receiver echoes the ECN bit in its ACKs, so the sender receives the feedback
• The host TCP treats ECN-marked packets the same as packet drops (i.e., a congestion signal), but no packets are dropped :)

Page 65

ECN and ECN++
Problem with ECN: the feedback is binary
• No concept of proportionality
• Things are either fine or disastrous
DCTCP scheme:
• The receiver echoes the actual ECN bits
• Each RTT, the sender estimates congestion (0 ≤ α ≤ 1) based on the fraction of marked packets
• cwnd = cwnd * (1 - α/2)

Page 66

DCTCP vs. TCP+RED

Page 67

Flow/Query Completion Times

Page 68

Shortcomings of DCTCP
Benefits of DCTCP:
• Better performance than TCP
• Alleviates losses due to buffer pressure
• Actually deployable
But…
• No scheduling, so it cannot solve incast
• Competition between mice and elephants
• Queries may still miss deadlines
Network throughput is not the right metric; application goodput is:
• Flows don't help if they miss the deadline
• Zombie flows actually hurt performance!

Page 69

Poor Decision Making
• Two flows, two deadlines: fair share causes both to fail; unfairness enables both to succeed
• Many flows, untenable deadline: if they all go, they all fail; quenching one flow results in higher goodput

Page 70

Clean Slate Approach: D3
Combine XCP with deadline information:
• Hosts use flow size and deadline to request bandwidth
• Routers measure utilization and make soft reservations
RCP ensures low queuing and almost zero drops:
• Guaranteed to perform better than DCTCP
• High utilization
Use soft state for rate reservations:
• IntServ/DiffServ are too slow / heavyweight
Deadline flows are small (< 10 packets, with 250µs RTT):
• rate = flow_size / deadline
• Routers greedily assign bandwidth

Page 71

Soft Rate Reservations
Motivation:
• Don't want per-flow state in the router
• Use a malloc/free approach
Once per RTT, hosts send RRQ packets:
• Include the desired rate
• Routers insert feedback into the packet header
Vector fields:
• Previous feedback
• Vector of new feedback
• ACKed feedback

Page 72

Soft Rate Reservations (cont.)
The router keeps track of:
• Number of active flows
• Available capacity
At each hop, the router:
• Frees the bandwidth already in use
• Adjusts available capacity based on the new rate request
• If no deadline, gives fair share; if deadline, gives fair share + requested rate
• If bandwidth isn't available, goes into header-only mode
• Inserts new feedback, increments the index
Why give fair share + requested rate?
• You want rate requests to be falling
• Failsafe against future changes in demand
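The allocation rule above (deadline flows get their requested rate plus fair share; non-deadline flows get fair share only) can be sketched for one RTT's worth of requests. The capacities and flows here are invented for illustration, and a real D3 router works incrementally rather than over a batch:

```python
# Sketch of D3-style greedy rate allocation for one RTT. Deadline flows
# request rate = flow_size / deadline; desired = 0 means no deadline.

def allocate(capacity_mbps, requests):
    """requests: list of (flow_id, desired_mbps).
    Returns {flow_id: granted_mbps}."""
    n = len(requests)
    demand = sum(r for _, r in requests)
    # Fair share of whatever is left after deadline demand is satisfied
    fair = max(0.0, (capacity_mbps - demand) / n)
    granted, remaining = {}, capacity_mbps
    for fid, desired in requests:
        give = min(desired + fair, remaining)  # greedy, in arrival order
        granted[fid] = give
        remaining -= give
    return granted

reqs = [("deadline-a", 20.0), ("deadline-b", 30.0), ("background", 0.0)]
print(allocate(100.0, reqs))
# deadline flows get their requested rate plus fair share;
# the background flow gets fair share only
```

Granting the extra fair share on top of the request is the "falling rate requests" failsafe the slide describes: a flow that got more than it asked for asks for less next RTT.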

Page 73

D3 Example
(Figure: the D3 header carries the flow's previous rates and a desired rate of 20 Mbps; routers along the path insert rate feedback into the header, with the example showing values such as 30, 45, 40, 28, and 23 Mbps; the feedback is copied into the ACK, and the sender sets cwnd = cwnd + feedback)
• This example is for a deadline flow; non-deadline flows have desired rate = 0
• This process occurs every RTT

Page 74

Router Internals
Separate capacity from demand:
• Demand increases irrespective of utilization
• The fair-share rate is based on demand
• As deadline flows arrive, demand increases even if all bandwidth is in use
• During the next RTT, fair share is reduced, freeing bandwidth for satisfying deadlines
Capacity is virtual:
• Like XCP, this handles multi-hop bottleneck scenarios
(Figure: three flows, each with desired rate = 10 Mbps, sharing a 10 Mbps link)

Page 75

D3 vs. TCP

Flows that make 99% of deadlines

Page 76

Results Under Heavy Load
• With just short flows, 4x as many flows achieve 99% goodput
• With background flows, the results are even better

Page 77

Flow Level Behavior
(Figure: TCP vs. RCP vs. D3)

Page 78

Flow Quenching
Idea: kill useless flows rather than letting them run:
• The deadline has been missed
• The needed rate exceeds capacity
Prevents the performance drop-off under insane load

Page 79

Benefits of D3
Higher goodput under heavy load:
• TCP cannot operate at 99% utilization
• Network provisioning can be tighter
More freedom for application designers:
• Recall Google and Facebook: current networks can barely support tight deadlines
• Deadline-bound apps can send more packets and operate under tighter constraints

Page 80

Challenges with D3
All-or-nothing deployment:
• Not designed for incremental deployment
• May not play nice with TCP
Significant complexity in the switch?
• XCP ran in an FPGA in 2005
• NetFPGAs are still expensive
Application-level changes:
• For most apps, just switch the socket type
• Deadline-aware apps need to know the flow size (current deadline apps do this!)
Periodic state flushes at the router

Page 81

Conclusions
Datacenters are a super hot topic right now. Lots of angles for research/improvement:
• Heat, power
• Topology, routing
• Network stack
• VM migration, management
• Applications: NoSQL, Hadoop, Cassandra, Dynamo
The space is getting crowded, and it's tough to bootstrap research:
• All but one of today's papers were from Microsoft
• Who else has the hardware to do the research?

Page 82

Big Open Problem
Measurement data:
• Real datacenter traces are very hard to get
• Are they representative?
Really, what is a 'canonical' datacenter?
• Application dependent
• Makes results very hard to quantify: cross-paper comparisons, reproducibility