ICTCP: Incast Congestion Control for TCP in Data Center Networks∗

Page 1: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

ICTCP: INCAST CONGESTION CONTROL FOR TCP IN DATA CENTER NETWORKS∗

Haitao Wu⋆, Zhenqian Feng⋆†, Chuanxiong Guo⋆, Yongguang Zhang⋆
{hwu, v-zhfe, chguo, ygz}@microsoft.com

⋆ Microsoft Research Asia, China
† School of Computer, National University of Defense Technology, China

Presenter: 謝宗昊 (student ID B99106017, Library and Information Science, third year)

Page 2: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Outline
• Background
• Design Rationale
• Algorithm
• Implementation
• Experimental results
• Discussion and conclusion

Page 3: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Outline
• Background
• Design Rationale
• Algorithm
• Implementation
• Experimental results
• Discussions, related work and conclusion

Page 4: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• In distributed file systems, files are stored across multiple servers.
• TCP does not work well for the many-to-one traffic pattern on high-bandwidth, low-latency networks.

Page 5: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• Three preconditions for TCP incast congestion in data centers
1) The network is well structured and layered to achieve high bandwidth and low latency, and the buffer size of ToR (top-of-rack) switches is usually small.
2) The barrier-synchronized many-to-one traffic pattern is common in data center networks.
3) The transmission data volume for such a traffic pattern is usually small.

Page 6: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• TCP incast collapse
  • Caused by multiple connections overflowing the Ethernet switch buffer in a short period of time.
  • Results in intense packet loss, and thus TCP retransmissions and timeouts.
• Previous solutions
  • Reduce the waiting time after packet loss
  • Control switch buffer occupation to avoid overflow, using ECN and a modified TCP at both the sender and receiver sides

Page 7: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• This paper focuses on:
  • Avoiding packet losses before incast congestion occurs
  • Modifying the TCP receiver only
    • The receiver side knows the throughput of all TCP connections and the available bandwidth

Page 8: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• Controlling the receive window well is challenging
  • The receive window should be small enough to avoid incast congestion
  • It should also be large enough for good performance, including in non-incast cases
  • A setting that works well for one scenario may not fit others

Page 9: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• The technical novelties of this paper:
1) The available bandwidth is used as a quota to coordinate receive window increases
2) Per-flow congestion control is performed independently on each connection, in time slots of one RTT
3) Receive window adjustment is based on the ratio of the difference between measured and expected throughput to the expected throughput

Page 10: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• TCP incast congestion
  • Happens when multiple sending servers under the same ToR switch send to one receiving server simultaneously
  • TCP throughput is severely degraded under incast congestion
  • "Goodput" is the throughput obtained and observed at the application layer

Page 11: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• TCP goodput, receive window and RTT
  • A small static TCP receive buffer may prevent TCP incast congestion collapse → but a static setting cannot adapt dynamically
  • Requires either losses or ECN marks to trigger a window decrease

Page 12: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Background
• TCP goodput, receive window and RTT
  • TCP Vegas assumes that an increase in RTT is caused only by packet queuing at the bottleneck buffer.
  • Unfortunately, the increase of TCP RTT in high-bandwidth, low-latency networks does not follow this assumption.

Page 13: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Outline
• Background
• Design Rationale
• Algorithm
• Implementation
• Experimental results
• Discussion and conclusion

Page 14: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Design Rationale
• Goal
  • Improve TCP performance under incast congestion.
  • No new TCP option or modification to the TCP header.

Page 15: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Design Rationale
• Three observations that form the basis for ICTCP
1) The available bandwidth at the receiver side is the signal for the receiver to do congestion control.
2) The frequency of receive-window-based congestion control should be set independently for each flow, according to its own feedback loop.
3) A receive-window-based scheme should adjust the window according to both the link congestion status and the application requirement.
• Set a proper receive window for all TCP connections sharing the same last hop
  • This is because the parallel TCP connections may belong to the same job

Page 16: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Outline
• Background
• Design Rationale
• Algorithm
• Implementation
• Experimental results
• Discussion and conclusion

Page 17: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Algorithm
• Available bandwidth: $BW_A = \max(0, \alpha \cdot C - BW_T)$
  • $C$: the link capacity of the interface on the receiver server
  • $BW_T$: the bandwidth of the total incoming traffic observed on that interface
  • $\alpha \in [0, 1]$: a parameter to absorb potential oversubscription during window adjustment
  • $BW_A$: the quota for all incoming connections to increase their receive windows for higher throughput
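As a rough sketch of the quota computation, assuming the form BW_A = max(0, α·C − BW_T) from the ICTCP paper, the calculation could look like the following. The function name and the Mbps units are illustrative choices; the default α = 0.9 is the value the paper reports using, not something stated on the slide.

```python
def available_bandwidth(link_capacity, total_incoming, alpha=0.9):
    """Return the quota BW_A = max(0, alpha * C - BW_T).

    link_capacity  -- C, capacity of the receiver's interface
    total_incoming -- BW_T, total incoming traffic observed on it
    alpha          -- headroom factor in [0, 1] (default is an assumption)
    All bandwidths must use the same unit (Mbps in the example below).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return max(0.0, alpha * link_capacity - total_incoming)


# A 1000 Mbps interface currently receiving 700 Mbps leaves roughly
# 0.9 * 1000 - 700 = 200 Mbps of quota for window increases.
quota = available_bandwidth(1000.0, 700.0)

# Once incoming traffic exceeds alpha * C, the quota is clamped to zero.
no_quota = available_bandwidth(1000.0, 950.0)
```

Clamping at zero means that an oversubscribed interface simply grants no further window increases; it does not by itself force any window to shrink.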

Page 18: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Algorithm
• Available bandwidth

Page 19: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Algorithm
• Window adjustment on single connection
  • $b_i^m$: measured incoming throughput (on connection $i$)
  • $b_i^s$: sample of current throughput (on connection $i$)
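The slide only names the two quantities. A minimal sketch of how the measured throughput might be updated from each new sample, assuming the exponential-filter-with-max update described in the ICTCP paper (new b_m = max(b_s, β·old b_m + (1 − β)·b_s)), is given below. The function name and the default β = 0.75 are illustrative assumptions.

```python
def update_measured_throughput(b_m_old, b_s, beta=0.75):
    """Update the measured throughput b_m of one connection.

    b_m_old -- previous measured throughput
    b_s     -- throughput sample from the latest RTT
    beta    -- smoothing weight (illustrative default, an assumption)

    Taking the max with the raw sample lets the estimate follow a
    sudden throughput increase immediately, while decreases are
    smoothed by the exponential filter.
    """
    smoothed = beta * b_m_old + (1.0 - beta) * b_s
    return max(b_s, smoothed)


# A rising sample is adopted at once; a falling one is smoothed.
rising = update_measured_throughput(100.0, 200.0)   # sample wins
falling = update_measured_throughput(100.0, 20.0)   # filter wins
```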

Page 20: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Algorithm
• Window adjustment on single connection
  • $b_i^e = \max(b_i^m, rwnd_i / RTT_i)$: expected throughput of connection $i$
  • $rwnd_i$: receive window of connection $i$
  • The max procedure ensures $b_i^m \le b_i^e$

Page 21: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Algorithm
• Window adjustment on single connection
  • $d_i^b = (b_i^e - b_i^m) / b_i^e$: the ratio of throughput difference of connection $i$
  • $b_i^m \le b_i^e$, thus $d_i^b \in [0, 1]$
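To make the two definitions concrete, here is a small sketch that computes the expected throughput and the throughput-difference ratio for one connection. It assumes the forms b_e = max(b_m, rwnd / RTT) and d = (b_e − b_m) / b_e from the ICTCP paper; the function names and the bytes-per-second units are illustrative.

```python
def expected_throughput(b_m, rwnd_bytes, rtt_seconds):
    """Expected throughput b_e = max(b_m, rwnd / RTT), in bytes/s."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return max(b_m, rwnd_bytes / rtt_seconds)


def throughput_difference_ratio(b_m, b_e):
    """Ratio d = (b_e - b_m) / b_e, which lies in [0, 1] when b_m <= b_e."""
    if b_e <= 0:
        return 0.0  # no traffic expected, so nothing to compare
    return (b_e - b_m) / b_e


# A 64 KiB window with a 1 ms RTT allows about 65.5 MB/s. If only half
# of that is measured, the ratio is about 0.5.
b_e = expected_throughput(32_768_000.0, 64 * 1024, 0.001)
d = throughput_difference_ratio(32_768_000.0, b_e)
```

A ratio near 0 means the connection is using nearly all of what its window allows, while a ratio near 1 means the window is far larger than the connection needs.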

Page 22: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Algorithm
• Window adjustment on single connection
  • Two thresholds $\gamma_1$ and $\gamma_2$ ($\gamma_2 > \gamma_1$) differentiate three cases:
1) $d_i^b \le \gamma_1$ or $d_i^b \le MSS_i / rwnd_i$ → increase the receive window if the connection is in the global second sub-slot and there is enough quota of available bandwidth
2) $d_i^b > \gamma_2$ → decrease the receive window by one MSS if this condition holds for three consecutive RTTs
3) Otherwise, keep the current receive window
  • Newly established or long-idle connections start in slow start
  • A connection enters congestion avoidance when the second or third case is met, or when the first case is met but there is not enough quota
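The three cases can be sketched as a per-RTT update for one connection. This is only an illustration under several assumptions that are not on the slide: the thresholds default to 0.1 and 0.5, the window never drops below 2 MSS, slow start doubles the window while congestion avoidance adds one MSS, and an increase of `step` bytes is charged `step / rtt` against the quota. All names are hypothetical.

```python
from dataclasses import dataclass

GAMMA1 = 0.1       # lower threshold (assumed value)
GAMMA2 = 0.5       # upper threshold (assumed value)
MIN_RWND_MSS = 2   # assumed floor for the receive window, in MSS


@dataclass
class Connection:
    rwnd: int                 # receive window in bytes
    mss: int                  # maximum segment size in bytes
    slow_start: bool = True   # new or long-idle connections start here
    over_count: int = 0       # consecutive RTTs with d > GAMMA2


def adjust_window(conn, d, in_second_subslot, quota, rtt):
    """Apply one per-RTT adjustment and return the remaining quota.

    d     -- throughput-difference ratio of this connection
    quota -- available bandwidth BW_A in bytes/s
    rtt   -- this connection's RTT in seconds
    """
    if d <= GAMMA1 or d <= conn.mss / conn.rwnd:
        # Case 1: the window is the limit, so try to increase it.
        conn.over_count = 0
        if not in_second_subslot:
            return quota
        step = conn.rwnd if conn.slow_start else conn.mss
        needed = step / rtt  # assumed cost of the increase in bytes/s
        if quota >= needed:
            conn.rwnd += step
            return quota - needed
        conn.slow_start = False  # not enough quota: congestion avoidance
        return quota
    if d > GAMMA2:
        # Case 2: decrease by one MSS after three consecutive RTTs.
        conn.slow_start = False
        conn.over_count += 1
        if conn.over_count >= 3:
            conn.rwnd = max(MIN_RWND_MSS * conn.mss, conn.rwnd - conn.mss)
            conn.over_count = 0
        return quota
    # Case 3: keep the current window.
    conn.slow_start = False
    conn.over_count = 0
    return quota
```

For example, a connection with a 4 MSS window that sees `d = 0.8` for three RTTs in a row ends up with a 3 MSS window, while a single such RTT changes nothing.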

Page 23: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Algorithm
• Fairness controller for multiple connections
  • Fairness is only considered among low-latency flows
  • For window decrease, cut the receive window by one MSS for connections whose receive window is larger than the average
  • For window increase, fairness is achieved automatically by the window adjustment algorithm described above
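The decrease rule can be illustrated with a short sketch. The function name, the list-of-windows representation, and the 2 MSS floor are assumptions made for this example; the condition that triggers a fairness decrease is not given on the slide, so it is left to the caller.

```python
def fairness_decrease(rwnds, mss, min_rwnd=None):
    """Cut by one MSS every receive window that is above the average.

    rwnds    -- receive windows (bytes) of the low-latency connections
    mss      -- maximum segment size in bytes
    min_rwnd -- floor for a window; defaults to 2 * mss (an assumption)
    Returns a new list; windows at or below the average are unchanged.
    """
    if not rwnds:
        return []
    if min_rwnd is None:
        min_rwnd = 2 * mss
    average = sum(rwnds) / len(rwnds)
    return [max(min_rwnd, w - mss) if w > average else w for w in rwnds]


# Three connections with 2, 4 and 10 MSS windows: only the last one is
# above the average, so only it is reduced, by one MSS.
windows = fairness_decrease([2920, 5840, 14600], mss=1460)
```

Repeated rounds of this rule pull the largest windows toward the average without touching the smaller ones.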

Page 24: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Outline
• Background
• Design Rationale
• Algorithm
• Implementation
• Experimental results
• Discussion and conclusion

Page 25: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Implementation
• ICTCP is developed as an NDIS driver on Windows.
1) It naturally supports virtual machines.
2) The incoming throughput on a very short time scale can be easily obtained.
3) It does not touch the TCP/IP implementation in the Windows kernel.

Page 26: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Implementation
1) Packets are redirected to the header parser module
2) The packet header is parsed and the information in the flow table is updated
3) The algorithm module is responsible for receive window calculation
4) When a TCP ACK packet is sent out, the header modifier changes the receive window field in the TCP header if needed
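Step 4 amounts to overwriting the 16-bit window field of an outgoing TCP header and keeping the checksum valid. The sketch below is not the NDIS driver code; it is a plain Python illustration that assumes a raw TCP header as bytes, uses the incremental checksum update from RFC 1624, and ignores the window scale option (with scaling negotiated, the field holds the scaled value). The function name is hypothetical.

```python
import struct

WINDOW_OFFSET = 14   # byte offset of the window field in a TCP header


def rewrite_window(tcp_header: bytes, new_window: int) -> bytes:
    """Return a copy of the header with the window field replaced.

    The checksum, which directly follows the window field, is patched
    incrementally (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m').
    """
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    if not 0 <= new_window <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    old_window, old_csum = struct.unpack_from("!HH", tcp_header, WINDOW_OFFSET)
    total = (~old_csum & 0xFFFF) + (~old_window & 0xFFFF) + new_window
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    new_csum = ~total & 0xFFFF
    patched = bytearray(tcp_header)
    struct.pack_into("!HH", patched, WINDOW_OFFSET, new_window, new_csum)
    return bytes(patched)
```

Updating the checksum incrementally avoids touching the payload or the pseudo-header, which suits a component that only sees and edits headers in flight.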

Page 27: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Implementation
• Support for virtual machines
  • The total capacity of the virtual NICs is typically configured higher than that of the physical NIC, as most virtual machines won't be busy at the same time
  • The observed virtual link capacity and available bandwidth therefore do not represent the real values
  • There are two solutions
1) Change the settings so that the total capacity of the virtual NICs equals that of the physical NIC
2) Deploy an ICTCP driver on the virtual machine host server

Page 28: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Implementation
• Obtaining a fine-grained RTT at the receiver
  • Define the reverse RTT as the RTT after an exponential filter at the TCP receiver side.
  • The reverse RTT can be obtained when there is data traffic in both directions.
  • The data traffic in the reverse direction may not be enough to keep obtaining a live reverse RTT → use the TCP timestamp option
  • In the implementation, the timestamp counter is modified to 100 ns granularity
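One way to picture the timestamp-based measurement: the receiver stamps each outgoing ACK with its own counter value, the sender echoes that value back on later data packets, and the receiver subtracts the echoed value from its current counter. The sketch below assumes a 32-bit counter in 100 ns ticks and a filter weight of 0.875; the class name, method name and weight are illustrative assumptions, not details from the slides.

```python
TICK_SECONDS = 100e-9      # one timestamp tick = 100 ns
TS_MASK = 0xFFFFFFFF       # TCP timestamp values are 32 bits wide


class ReverseRttEstimator:
    """Exponentially filtered RTT measured at the TCP receiver."""

    def __init__(self, weight=0.875):
        self.weight = weight   # share kept from the previous estimate
        self.rtt = None        # filtered RTT in seconds, None until set

    def on_data_packet(self, now_ticks, echoed_ticks):
        """Feed one sample from the echoed timestamp on a data packet.

        now_ticks    -- receiver's current timestamp counter
        echoed_ticks -- value the receiver sent earlier, echoed back
        """
        # Masking keeps the difference correct across counter wraparound.
        delta = (now_ticks - echoed_ticks) & TS_MASK
        sample = delta * TICK_SECONDS
        if self.rtt is None:
            self.rtt = sample
        else:
            self.rtt = self.weight * self.rtt + (1.0 - self.weight) * sample
        return self.rtt


est = ReverseRttEstimator()
first = est.on_data_packet(now_ticks=2000, echoed_ticks=1000)   # 1000 ticks
second = est.on_data_packet(now_ticks=6000, echoed_ticks=3000)  # 3000 ticks
```

At 100 ns per tick a 32-bit counter wraps in a little over seven minutes, which is why the subtraction is masked; telling a fresh echo from a stale one is outside the scope of this sketch.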

Page 29: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Outline
• Background
• Design Rationale
• Algorithm
• Implementation
• Experimental results
• Discussion and conclusion

Page 30: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 31: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 32: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 33: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 34: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 35: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 36: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 37: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Experimental results

Page 38: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Outline
• Background
• Design Rationale
• Algorithm
• Implementation
• Experimental results
• Discussion and conclusion

Page 39: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Discussion and Conclusion
• Three issues for discussion
1) Scalability: what if the number of connections becomes extremely large?
   → Switch the receive window between several values
2) How to handle congestion when the sender and receiver are not under the same switch
   → Use ECN to obtain congestion information
3) Whether ICTCP works for future high-bandwidth, low-latency networks
   ① The switch buffer should be enlarged correspondingly
   ② The MSS should be enlarged

Page 40: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Discussion and Conclusion
• Conclusion
  • Focus on receiver-based congestion control to prevent packet loss
  • Adjust the TCP receive window based on the ratio of the difference between achieved and expected per-connection throughput
  • Experimental results show that ICTCP is effective at avoiding congestion

Page 41: ICTCP:  Incast Congestion Control for  TCP in  Data Center Networks∗

Thanks for listening