Network and TCP performance relationship workshop


Upload: kae-hsu

Post on 19-Dec-2014


DESCRIPTION

Slides from the TWNOG Workshop at the TWNIC 14th OPM. Date: July 2, 2010.

TRANSCRIPT

Page 1: Network and TCP performance relationship workshop

TWNOG WORKSHOP 2010/7/2, Taipei

Network Operations: Common Problem Causes and Troubleshooting Techniques

Exploring the Relationship Between Network and TCP Performance

智匯亞洲有限公司, 許至凱, CCIE/JNCIE

kaeatforum [at] gmail.com

Page 2: Network and TCP performance relationship workshop

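Objectives

• Audience: network equipment operators and network operations staff
• Understand which network environment factors affect TCP performance, so that network operations can be linked to network application performance and used as a reference for improving the network environment.
– Understand how TCP works
– Which network events affect TCP performance?
– Countermeasures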

Page 3: Network and TCP performance relationship workshop

Agenda

• TCP Briefing
• TCP Performance Factors
• Network Event Impact
• Improvement – Network approach
• Improvement – Appliance approach
• Reference

Page 4: Network and TCP performance relationship workshop

TCP Briefing

• TCP/IP stack in a computer system
– Linux

[Figure: Linux network stack, top to bottom]
Application
Socket Layer (net/socket.c)
Inet Layer (net/ipv4/af_inet.c)
TCP Layer (net/ipv4/tcp.c) | UDP Layer (net/ipv4/udp.c)
IP Layer (various ip files in net/ipv4)
Ethernet Device Driver | Other Drivers
Ethernet Card | Parallel/Serial/Other Interface Drivers

Page 5: Network and TCP performance relationship workshop

TCP Briefing

• TCP/IP stack in a computer system
– Windows

[Figure: Windows TCP/IP stack (Tcpip.sys)]
User: Windows Sockets Applications; Windows Sockets
Kernel: AFD; WSK Clients, WSK; NetBT and other TDI clients, TDI, TDX
TCP, UDP, RAW
IPv4, IPv6
802.3, PPP, 802.11, Loopback, IPv4 Tunnel
NDIS

Page 6: Network and TCP performance relationship workshop

TCP Briefing

• TCP/IP position in computer and network environment

Page 7: Network and TCP performance relationship workshop

TCP Briefing

• TCP header format (RFC793)

Page 8: Network and TCP performance relationship workshop

TCP Briefing

• TCP header format (updated by RFC3168)
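
The header figures did not survive in this transcript. As a rough stand-in, the sketch below decodes the fixed 20-byte TCP header, including the CWR and ECE flag bits that RFC3168 took from the formerly reserved field. The function name and the returned dictionary layout are illustrative assumptions, and TCP options are ignored.

```python
import struct

# Bit masks inside the 16-bit "data offset / reserved / flags" word.
# CWR and ECE are the two flags defined by RFC 3168 (ECN).
FLAGS = {
    "FIN": 0x001, "SYN": 0x002, "RST": 0x004, "PSH": 0x008,
    "ACK": 0x010, "URG": 0x020, "ECE": 0x040, "CWR": 0x080,
}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # data offset is in 32-bit words
        "flags": sorted(n for n, bit in FLAGS.items() if off_flags & bit),
        "window": window,                     # receiver's advertised window
        "checksum": checksum,
        "urgent": urgent,
    }
```

For example, a header packed with `struct.pack("!HHIIHHHH", 80, 51000, 1000, 2001, (5 << 12) | 0x012, 65535, 0, 0)` decodes to source port 80, a 20-byte header, and the flags `["ACK", "SYN"]`.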

Page 9: Network and TCP performance relationship workshop

TCP Performance Factors

• TCP Performance Factors
– Monitoring Tools
– Flow control
– Congestion control

Page 10: Network and TCP performance relationship workshop

TCP Performance Factors

– Measurement tools
• Monitoring tools
– tcpdump
» On Windows platform - Wireshark
– tcpstat
• Benchmarking tools
– ttcp
– Netperf
– NetPIPE
– DBS (Distributed Benchmark System)

Page 11: Network and TCP performance relationship workshop

TCP Performance Factors

– Flow control
• Sliding Window (window size = 6 in the example)

[Figure: sliding window over segments 0 to 13, shown at Step 1 through Step 4 along a time axis. Legend: ACK received; awaiting ACK; sendable range; non-sendable range.]
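
The four regions in the figure legend can be sketched as a small classifier. This is an illustration only: the function name, the label strings, and the idea of tracking the window with two counters (`acked` and `sent`) are assumptions, not something taken from the slides.

```python
def classify_segments(total, window_size, acked, sent):
    """Label each segment number with its sliding-window region.

    acked: number of segments already acknowledged (left edge of the window)
    sent:  number of segments transmitted so far
    """
    if not acked <= sent <= acked + window_size:
        raise ValueError("sent must lie between acked and acked + window_size")
    right_edge = acked + window_size
    labels = []
    for seq in range(total):
        if seq < acked:
            labels.append("acked")          # ACK already received
        elif seq < sent:
            labels.append("awaiting ACK")   # sent, not yet acknowledged
        elif seq < right_edge:
            labels.append("sendable")       # inside the window, may be sent
        else:
            labels.append("not sendable")   # beyond the window
    return labels
```

With 14 segments, a window of 6, 2 segments acknowledged and 5 sent, segments 0-1 are acked, 2-4 are awaiting ACK, 5-7 are sendable, and 8-13 are not sendable. Each new ACK moves the left edge right, which also moves the right edge and frees more segments for sending.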

Page 12: Network and TCP performance relationship workshop

TCP Performance Factors

– Flow control
• Window Size Adjustment
– “Receiver window size field” in TCP header

Page 13: Network and TCP performance relationship workshop

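TCP Performance Factors

– Congestion Control
• Flow control lets the receiver control the incoming traffic and prevents buffer overflow at the receiver
– The sender's window size is adjusted through AdvertisedWindow
– It cannot reflect the state of the network path
» It cannot prevent buffer-overflow-like conditions inside the networks the traffic crosses
• To detect possible network congestion, TCP uses congestion control.
– Adjustment is made through CongestionWindow (cwnd)
• Congestion control consists of four main mechanisms (RFC5681):
– Slow start
– Congestion avoidance
– Fast retransmit
– Fast recovery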
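
How the two windows combine can be sketched as follows: the sender is limited by whichever of AdvertisedWindow and cwnd is smaller. The function name and the byte-based units are assumptions made for the illustration.

```python
def usable_window(cwnd, advertised_window, bytes_in_flight):
    """Bytes the sender may still transmit right now.

    The sender obeys both limits: the receiver's advertised window
    (flow control) and its own congestion window (congestion control).
    """
    limit = min(cwnd, advertised_window)
    return max(0, limit - bytes_in_flight)
```

For instance, with `cwnd=8000`, `advertised_window=65535` and 3000 bytes in flight, the network-side limit dominates and 5000 more bytes may be sent; with `cwnd=64000`, `advertised_window=16000` and 16000 bytes in flight, the receiver-side limit is already used up and the result is 0.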

Page 14: Network and TCP performance relationship workshop

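TCP Performance Factors

• Slow start
– When a TCP connection is first established, a small window size is used; it is increased gradually as ACKs arrive.
» cwnd starts at 1
» The purpose is to probe the available network bandwidth
– cwnd is increased by 1 for every ACK received
» As a result, after each round-trip time (RTT) cwnd is twice its value in the previous RTT
» Exponential growth
– To keep cwnd from growing too fast, once cwnd exceeds the “slow start threshold” (ssthresh), it grows by only 1 per RTT
» Linear growth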
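
A minimal sketch of why adding 1 to cwnd per ACK doubles cwnd every RTT. It assumes cwnd is counted in segments, every segment sent in a round is acknowledged individually (no delayed ACKs), and nothing is lost; the function names are illustrative.

```python
def slow_start_round(cwnd):
    """Return cwnd after one RTT of slow start.

    Assumes cwnd segments are sent and each one is ACKed separately.
    """
    acks_this_round = cwnd           # one ACK per segment sent this RTT
    for _ in range(acks_this_round):
        cwnd += 1                    # cwnd + 1 for every ACK received
    return cwnd


def slow_start_trace(rounds, initial_cwnd=1):
    """List cwnd at the start of each RTT during slow start."""
    trace = []
    cwnd = initial_cwnd
    for _ in range(rounds):
        trace.append(cwnd)
        cwnd = slow_start_round(cwnd)
    return trace
```

`slow_start_trace(5)` returns `[1, 2, 4, 8, 16]`: the per-ACK increment is linear, but the number of ACKs per RTT grows with cwnd, so the per-RTT growth is exponential.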

Page 15: Network and TCP performance relationship workshop

TCP Performance Factors

• Congestion avoidance
– In this phase:
» cwnd > ssthresh
» cwnd + 1 for each RTT
– When packet loss occurs:
» ssthresh -> cwnd/2
» cwnd -> 1
» packet retransmission
– Once packet loss occurs, TCP performance is severely affected.
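
The rules on this slide can be put together in a per-RTT simulation. This is a simplified sketch, not a TCP implementation: cwnd is counted in segments, a loss is assumed to be detected within the RTT in which it is listed, doubling is capped at ssthresh, and the floor of 2 on ssthresh as well as the names `simulate_cwnd` and `loss_rtts` are assumptions.

```python
def simulate_cwnd(rtts, ssthresh, loss_rtts):
    """Trace cwnd per RTT with slow start, congestion avoidance and loss.

    loss_rtts: set of RTT indexes in which a packet loss is detected.
    """
    cwnd = 1
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_rtts:
            ssthresh = max(cwnd // 2, 2)    # ssthresh -> cwnd/2
            cwnd = 1                        # cwnd -> 1, slow start again
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 per RTT
    return trace
```

`simulate_cwnd(12, 8, {6})` returns `[1, 2, 4, 8, 9, 10, 11, 1, 2, 4, 5, 6]`. A single loss at RTT 6 throws the window from 11 back to 1, and several RTTs later it has still not recovered, which is why loss hurts throughput so much.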

Page 16: Network and TCP performance relationship workshop

TCP Performance Factors

• Slow start & Congestion avoidance characteristic

Page 17: Network and TCP performance relationship workshop

TCP Performance Factors

• Fast retransmit (Tahoe)
– Still uses slow start + congestion avoidance
– The sender retransmits the packet as soon as it has received 3 duplicate ACKs
» This avoids waiting for a sender timeout, after which ssthresh/cwnd must be adjusted and TCP performance drops sharply
• Fast recovery (Reno)
– Fast retransmit is applied first
» After the duplicate ACKs are received, the sender goes into congestion avoidance
– Fast recovery is then performed
» ssthresh -> cwnd/2
» Retransmit the packet
» cwnd -> ssthresh + 3
• NewReno, SACK, Vegas…
– All of these improve performance at the TCP endpoints
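
The duplicate-ACK trigger and the Reno window adjustment can be sketched as below. The class and attribute names are hypothetical, cwnd is counted in segments, and the sketch covers only the moment fast retransmit fires; window inflation on further duplicate ACKs and the exit from fast recovery are left out.

```python
DUP_ACK_THRESHOLD = 3


class RenoSender:
    """Tracks duplicate ACKs and applies the Reno reaction on the third one."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.last_ack = None
        self.dup_acks = 0
        self.retransmitted = []   # ACK numbers that triggered a retransmit

    def on_ack(self, ack_no):
        if ack_no != self.last_ack:
            # New data acknowledged: reset the duplicate counter.
            self.last_ack = ack_no
            self.dup_acks = 0
            return
        self.dup_acks += 1
        if self.dup_acks == DUP_ACK_THRESHOLD:
            self.ssthresh = max(self.cwnd // 2, 2)   # ssthresh -> cwnd/2
            self.retransmitted.append(ack_no)        # resend the missing segment
            self.cwnd = self.ssthresh + 3            # cwnd -> ssthresh + 3
```

Feeding the ACK sequence 100, 200, 200, 200, 200 into `RenoSender(cwnd=16, ssthresh=64)` leaves `ssthresh == 8`, `cwnd == 11` and one retransmission for ACK number 200, without waiting for a timeout and without dropping cwnd to 1.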

Page 18: Network and TCP performance relationship workshop

Network Event Impact

• Packet loss
– Under TCP congestion control, packet loss triggers TCP retransmission
• No matter how well TCP congestion control performs, packet loss always degrades TCP performance

Page 19: Network and TCP performance relationship workshop

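Network Event Impact

• Packet out-of-order
– When packets arrive out of order, TCP can still reassemble them, but if TCP fast recovery kicks in, resources may be wasted
• Reno starts retransmitting as soon as it receives duplicate ACKs and stops only when a Partial ACK arrives.
– If a packet is merely late rather than lost, the sender inevitably retransmits packets that did not need retransmission, which wastes resources.
• To improve on Reno's efficiency, NewReno stops retransmitting lost packets only after it receives the Final ACK.
– NewReno may therefore resend more duplicate packets than Reno.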
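
To see how pure reordering can look like loss to the sender, the sketch below generates the cumulative ACKs a receiver would send for a given arrival order and checks whether three duplicate ACKs in a row occur. Segment numbering from 0, one ACK per arriving segment, and both function names are assumptions for the illustration.

```python
def cumulative_acks(arrivals):
    """Return the cumulative ACK (next expected segment) after each arrival."""
    received = set()
    next_expected = 0
    acks = []
    for seg in arrivals:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def triggers_fast_retransmit(arrivals, threshold=3):
    """True if the ACK stream contains `threshold` duplicate ACKs in a row."""
    acks = cumulative_acks(arrivals)
    dup = 0
    for prev, cur in zip(acks, acks[1:]):
        dup = dup + 1 if cur == prev else 0
        if dup >= threshold:
            return True
    return False
```

For the arrival order `[0, 2, 3, 4, 1, 5]` the ACK stream is `[1, 1, 1, 1, 5, 6]`, so the sender sees three duplicate ACKs and retransmits segment 1 even though it was only late. For `[0, 2, 1, 3, 4, 5]` the stream is `[1, 1, 3, 4, 5, 6]` and nothing is retransmitted.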

Page 20: Network and TCP performance relationship workshop

Improvement – Network approach

• Reduce packet loss
– Packet loss has a large impact on TCP performance; every source of packet loss in the network should be eliminated as far as possible.
– Layer 1, layer 2 error
• Unqualified physical media
– CRC, P3 error etc…
– Layer 3
• Router/Switch hardware or software error
– Congestion
– Reduce congestion impact by QoS deployment
– Avoid packet drops for highly sensitive TCP applications

Page 21: Network and TCP performance relationship workshop

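Improvement – Network approach

– Packet forward process without QoS
• Tail-drop
– When a link is congested, the network device's hardware queue fills up; once it cannot hold any more packets awaiting transmission, those packets are simply dropped.
– The hardware queue cannot tell packet priority, so once the queue is full, packets are dropped indiscriminately.
» This situation is called Tail-drop
– Tail-drop should be avoided as much as possible.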
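
A toy model of this behaviour, assuming a fixed-capacity FIFO and packets represented as `(priority, id)` tuples; the class name and the packet representation are made up for the example and do not model any particular device.

```python
from collections import deque


class TailDropQueue:
    """Fixed-size FIFO that drops any packet arriving while it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.dropped = []

    def enqueue(self, packet):
        if len(self.packets) >= self.capacity:
            self.dropped.append(packet)   # priority is never inspected
            return False
        self.packets.append(packet)
        return True
```

With a capacity of 2, enqueuing `("low", 1)`, `("low", 2)` and then `("high", 3)` drops the high-priority packet, because the queue only looks at whether it is full.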

Page 22: Network and TCP performance relationship workshop

Improvement – Network approach

– Packet forward process with QoS
• Packets of different priority are first placed in separate logical queues and then moved into the h/w queue. Before the H/W queue fills up, some packets held in the low priority queues are dropped proactively, which prevents Tail-drop.
– RED – Random Early Detection
– WRED – Weighted Random Early Detection
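
The usual RED drop curve can be sketched as a function of the average queue depth: no drops below a minimum threshold, a linearly rising drop probability up to a maximum threshold, and certain drop beyond it. WRED applies a different curve per priority class. The threshold values and the `WRED_PROFILES` table below are hypothetical numbers chosen for the illustration, not vendor defaults.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability for a given average queue depth under RED."""
    if avg_queue < min_th:
        return 0.0                    # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0                    # queue is too long: always drop
    # Between the thresholds the probability rises linearly up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical WRED profiles: (min_th, max_th, max_p) per priority class.
WRED_PROFILES = {
    "high": (30, 40, 0.05),
    "low": (10, 40, 0.20),
}


def wred_drop_probability(avg_queue, priority):
    """Apply the RED curve that belongs to the packet's priority class."""
    min_th, max_th, max_p = WRED_PROFILES[priority]
    return red_drop_probability(avg_queue, min_th, max_th, max_p)
```

At an average depth of 25 packets, low-priority traffic is dropped with probability 0.1 under these example profiles, while high-priority traffic is not dropped at all. Early, random drops make a few TCP senders back off before the queue overflows.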

Page 23: Network and TCP performance relationship workshop

Improvement – Network approach

• Reduce out-of-order packets
– Avoid sending the same TCP session over different paths
• Per-packet load-sharing
– Load-sharing by destination IP only
• Per-flow load-sharing
– Load-sharing by IP packet hash value. Hash index includes:
» Source IP, Destination IP
» Protocol
» Source Port, Destination Port
– Packets with the same hash value take the same next-hop interface, which prevents packet out-of-order.
– Implement Selective Acknowledgements in TCP
• RFC2018
• RFC2883
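
Per-flow load-sharing can be illustrated with a hash over the five fields listed above. Real routers use their own vendor-specific hash functions; the CRC32 used here, the key format, and the function name are stand-ins chosen only to make the idea concrete.

```python
import zlib


def pick_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Choose a next hop from the flow's 5-tuple.

    All packets of one flow produce the same key, hence the same hash
    and the same next hop, so they cannot overtake each other on
    parallel paths.
    """
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]
```

Calling `pick_next_hop("10.0.0.1", "192.0.2.10", 6, 51000, 80, ["ge-0/0/0", "ge-0/0/1"])` repeatedly always yields the same interface, while a flow with a different source port may land on the other one.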

Page 24: Network and TCP performance relationship workshop

Improvement – Appliance approach

• The operating system has to handle TCP session processing
– It's CPU/Memory dependent
• A large number of TCP sessions consumes system resources such as CPU cycles and memory, leaving less CPU/memory for the real service processes
• Reduce system resource consumption in TCP session handling
– TCP Offload
– TCP Optimization

Page 25: Network and TCP performance relationship workshop

Improvement – Appliance approach

• TCP Offload
– Migrate TCP handling out of the kernel
• Use dedicated hardware to handle TCP
• Save system resources for real service processes
– TOE (TCP Offload Engine) NIC
• Handle TCP/IP on the NIC

Page 26: Network and TCP performance relationship workshop

Improvement – Appliance approach

• TCP Offload
– NIC w/o TOE and NIC w/ TOE comparison

Page 27: Network and TCP performance relationship workshop

Improvement – Appliance approach

• TCP Offload
– TOE is widely deployed in iSCSI environments
• iSCSI:

Page 28: Network and TCP performance relationship workshop

Improvement – Appliance approach

• TCP Optimization
– Migrate the large volume of TCP sessions off the system
– For any TCP session, 3-way handshaking and 4-way handshaking are necessary
• 3-way handshaking for TCP connection establishment
• 4-way handshaking for TCP connection termination
– Reducing the number of TCP connections reduces connection “overhead”
• Deploy dedicated hardware in front of the servers

Page 29: Network and TCP performance relationship workshop

Improvement – Appliance approach

• TCP Optimization
– Regular TCP connection

[Figure: message sequence between Client and Server: SYN, SYN+ACK, ACK; GET; Data, Data, Data; FIN, ACK, FIN, ACK.]

Page 30: Network and TCP performance relationship workshop

Improvement – Appliance approach

• TCP Optimization
– Reduce server TCP connection number
• Only ONE 3-way handshaking is necessary in early stage

[Figure: message sequence between Client, TCP Proxy and Server: SYN, SYN+ACK, ACK; GET; Data, Data, Data; GET; Data, Data, Data; FIN, ACK, FIN.]
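
A back-of-the-envelope count of the connection overhead the server is spared. It assumes one request per client connection, 3 segments for the handshake and 4 for termination, and a proxy that keeps a small pool of long-lived server-side connections; the function names and the pool size are hypothetical.

```python
HANDSHAKE_SEGMENTS = 3   # SYN, SYN+ACK, ACK
TEARDOWN_SEGMENTS = 4    # FIN, ACK, FIN, ACK
PER_CONNECTION = HANDSHAKE_SEGMENTS + TEARDOWN_SEGMENTS


def direct_overhead(requests):
    """Control segments the server handles when every request opens
    and closes its own TCP connection."""
    return requests * PER_CONNECTION


def proxied_overhead(server_side_connections=1):
    """Control segments the server handles when a TCP proxy reuses a
    fixed pool of server-side connections for all client requests."""
    return server_side_connections * PER_CONNECTION
```

For 1000 requests, `direct_overhead(1000)` is 7000 control segments on the server, while `proxied_overhead(1)` is 7: the client-side handshakes still happen, but they terminate on the proxy instead of the server.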

Page 31: Network and TCP performance relationship workshop

Improvement – Appliance approach

• TCP Optimization
– In practice it is rarely deployed solely to improve TCP performance
• It is usually combined with other functions
• L4~L7 load-balance
– Because the client's end-to-end TCP connection terminates on the TCP Proxy, further functions can be added
• SSL acceleration
• Reverse cache

Page 32: Network and TCP performance relationship workshop

Reference

• Books
– High-Speed Networks and Internets – Performance and Quality of Service, 2nd Ed.
• By William Stallings; Prentice Hall
– High Performance TCP/IP Networking – Concepts, Issues and Solutions
• By Mahbub Hassan and Raj Jain; Pearson Prentice Hall
– TCP/IP Illustrated, Volume 1
• By W. Richard Stevens; Addison Wesley
• Articles
– TCP Performance
• By Geoff Huston; The Internet Protocol Journal - Volume 3, No. 2
– A very good “sliding window” description
• http://www.it.uu.se/edu/course/homepage/datakom/civinght04/schema/sliding_window.pps

Page 33: Network and TCP performance relationship workshop

Q & A