Lagopus presentation at the 14th Annual ON*VECTOR International Photonics Workshop
TRANSCRIPT
Copyright © 2015 NTT corp. All Rights Reserved.
Lagopus:
SDN software switch
Yoshihiro Nakajima
NTT Labs
Feb 26th 2015
What is Lagopus (雷鳥属)?
Lagopus is a small genus of birds in
the grouse subfamily, commonly known as ptarmigans.
All live in tundra or cold upland areas.
Reference: http://en.wikipedia.org/wiki/Lagopus
Agenda
1. Background & motivation
2. Lagopus SDN switch
  I. vSwitch
  II. 40GbE FPGA NIC
  III. DPDK extension
IV. OSS community
3. What’s next?
Trend shift in networking
Closed (vendor lock-in) → Open (lock-in free)
Yearly dev cycle → Monthly dev cycle
Waterfall dev → Agile dev
Standardization → De facto standard
Protocol → API
Special-purpose HW / appliance → Commodity HW / server
Distributed cntrl → Logically centralized cntrl
Custom ASIC / FPGA → Merchant chip
What is Software-Defined Networking?
Innovate services and applications at software-development speed.
Reference: http://opennetsummit.org/talks/ONS2012/pitt-mon-ons.pdf
• Decouple control plane and data plane: free the control plane from the box (OpenFlow is one of the APIs)
• Logically centralized view: hide and abstract the complexity of networks; provide an entire view of the network
• Programmability via an abstraction layer: enables flexible and rapid service/application development
SDN conceptual model
OpenFlow vs conventional NW node
[Figure: a conventional NW node combines the control plane (routing/switching) and the data plane (ASIC, FPGA) in one box; an OpenFlow switch keeps only the data plane (flow tables #1–#3 of match/action/counter entries plus an OpenFlow switch agent), driven by an OpenFlow controller over the OpenFlow protocol.]
• Flexible flow match pattern (port #, VLAN ID, MAC addr, IP addr, TCP port #)
• Action (frame processing): output port, drop, modify/pop/push header
• Flow statistics (# of packets, byte size, flow duration, …)
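The match/action/counter structure can be modeled concretely. The sketch below is a deliberately simplified, hypothetical representation of a single OpenFlow flow entry for illustration only, not Lagopus's internal data structure:

```python
# Hypothetical, simplified flow entry: a match pattern, an action list,
# and per-flow counters (packets, bytes).
flow_entry = {
    "match": {"in_port": 1, "vlan_id": 100, "ipv4_dst": "10.0.0.2", "tcp_dst": 80},
    "actions": [("push_vlan", 200), ("output", 3)],
    "counters": {"packets": 0, "bytes": 0},
}

def matches(entry, pkt):
    """A packet matches when every field in the match pattern agrees;
    fields absent from the pattern are wildcarded."""
    return all(pkt.get(k) == v for k, v in entry["match"].items())

pkt = {"in_port": 1, "vlan_id": 100, "ipv4_dst": "10.0.0.2",
       "tcp_dst": 80, "len": 64}
if matches(flow_entry, pkt):
    flow_entry["counters"]["packets"] += 1   # flow statistics updated per packet
    flow_entry["counters"]["bytes"] += pkt["len"]
```

A real switch holds many such entries per table, ordered by priority, and falls through to the next table or a table-miss entry when nothing matches.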
Why SDN?
Differentiate services (innovation)
• Increase user experience
• Provide unique services that are not in the market
Time-to-market (velocity)
• Do not depend on vendors' feature roadmaps
• Develop necessary features when you need them
Cost efficiency
• Reduce OPEX by automated workflows
• Reduce CAPEX with COTS hardware
Evaluate the benefits of SDN by implementing
our control plane and switch
Lagopus switch
• Lagopus overview
• Xilinx FPGA 40GbE NIC
• DPDK extension
• OSS Community
Goal of Lagopus project
Provide an NFV/SDN-aware switch software stack
• OpenFlow switch agent
• 100Gbps-capable software dataplane and I/O libs
• Extensible switch configuration data store
Expand software-based packet processing to carrier networks
• Hardware acceleration, processing offload
# of packets to be processed for 10Gbps with 1 CPU core
[Figure: packets per second vs. packet size (bytes), from ~14.88M pps at 64 bytes down to ~1.2M pps at 1 KByte]
Short packet, 64 Byte: 14.88 Mpps, 67.2 ns per packet
• 2 GHz: 134 clocks
• 3 GHz: 201 clocks
1 KByte packet: 1.2 Mpps, 835 ns per packet
• 2 GHz: 1670 clocks
• 3 GHz: 2505 clocks
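These per-packet clock budgets follow directly from the line rate and the 20 bytes of on-wire overhead per Ethernet frame (8-byte preamble + 12-byte inter-frame gap). A quick sanity check of the arithmetic:

```python
def cycle_budget(payload_bytes, line_rate_gbps=10, cpu_ghz=2.0):
    """Cycles available per packet at a given line rate and CPU clock.

    Each Ethernet frame carries 20 extra bytes on the wire
    (8-byte preamble + 12-byte inter-frame gap).
    """
    wire_bits = (payload_bytes + 20) * 8
    pps = line_rate_gbps * 1e9 / wire_bits      # packets per second
    ns_per_pkt = 1e9 / pps                      # time budget per packet
    cycles = ns_per_pkt * cpu_ghz               # clocks per packet
    return pps, ns_per_pkt, cycles

pps, ns, clk = cycle_budget(64)   # 64-byte frames at 10Gbps, 2 GHz
# ~14.88 Mpps, ~67.2 ns, ~134 clocks per packet
```

At 64 bytes a single 2 GHz core has roughly 134 clocks per packet, which is why every memory access and branch in the fast path matters.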
# of CPU cycles for typical packet processing (2 GHz CPU)
[Figure: required CPU cycles (0–6000) at 10Gbps and 1Gbps for L2 forwarding, IP routing, L2–L4 classification, TCP termination, simple OF processing, and DPI]
Lessons learned from existing vSwitches
Not enough performance
• Packet processing speed < 1Gbps
• Adding 10K flow entries takes > two hours
• Flow management processing is too heavy
Development & extension difficulties
• Many abstraction layers cause confusion: interface abstraction, switch abstraction, protocol abstraction, packet abstraction, …
• Many packet-processing code paths exist for the same processing: userspace, kernelspace, …
• Invisible flow entries make debugging chaotic
vSwitch requirement from user side
Commodity x86 PC server and NIC support
Gateway function to connect different NW domains (MPLS, access NW)
10Gbps wire rate with >= 1M flow entries
Flow-aware control
• Multiple flow table configuration
• High-performance flow entry operation
Userland-based dataplane
• Easy software upgrade and deployment
Strong management plane support
• Co-operable with upper management systems
Approach for vSwitch development
Simple is better for everything
• Packet processing, for speed
• Protocol processing, for implementation
Straightforward approach
• Full scratch (no reuse of existing vSwitch code)
• No OpenFlow 1.0 support
• Userland packet processing as much as possible
• Keep kernel code small
Every component & algorithm can be replaced
• Flow lookup, libraries, protocol handling
Lagopus switch design
Simple modular design
• Unified configuration data store
• Multiple agent support
• Support for multiple dataplanes
• Dataplane abstraction layer
Multi-threaded dataplane
• Flexible parallel pipeline
• Multi-core CPU-aware design
• Intel DPDK and lock-free libraries for I/O
High-performance user-space packet processing with Intel DPDK
[Figure: in conventional event-based userspace packet processing, a packet is (1) delivered by interrupt & DMA into a kernel skb_buf via the Ethernet driver, (2) read by the vswitch through a read() system call, (3) sent back through a write() system call, and (4) DMA'd out by the NIC. A polling-based DPDK dataplane instead (1) DMA-writes and (2) DMA-reads packets directly between the NIC and a userspace packet buffer, bypassing the kernel network stack.]
Packet processing using multi core CPUs
• Exploit many-core CPUs
• Reduce data copies & moves (reference access)
• Simple packet classifier for parallel processing in I/O RX
• Decouple I/O processing and flow processing to improve D-cache efficiency
• Explicit thread assignment to CPU cores
[Figure: I/O RX threads (CPU0, CPU1) poll the NIC 1–4 RX buffers and classify packets into ring buffers; flow-lookup/packet-processing workers (CPU2–CPU5) consume the rings; I/O TX threads (CPU6, CPU7) drain them to the NIC 1–4 TX buffers.]
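The RX-side classifier can be sketched as a hash over the flow key that pins each flow to one worker ring, so a flow's packets are always handled by the same core and per-flow state stays core-local. This is a simplified illustration, not the Lagopus classifier itself:

```python
from collections import deque

N_WORKERS = 4
rings = [deque() for _ in range(N_WORKERS)]   # stand-ins for lock-free rings

def classify(pkt):
    """Hash the 5-tuple so all packets of a flow land on the same worker."""
    key = (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
    return hash(key) % N_WORKERS

def io_rx(batch):
    """I/O RX thread: classify a received batch into per-worker rings."""
    for pkt in batch:
        rings[classify(pkt)].append(pkt)

batch = [{"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6,
          "sport": 1000 + i, "dport": 80} for i in range(8)]
io_rx(batch)
```

Because the mapping is deterministic, workers need no locks on per-flow state; the rings are the only shared structures, and in the real dataplane those are lock-free DPDK rings.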
Implementation techniques for speed
Reduce overhead
• Polling-based packet receiving
• Reduce the number of I/O and memory accesses
• Huge pages (DTLB) to avoid cache misses in the memory controller
Bypasses in packet processing
• Direct data copy from NIC buffer to CPU cache
• Kernel network-stack bypass
• Fast flow lookup and flow-cache mechanism
Reduce memory access and leverage locality
• Packet batching
• Novel flow lookup algorithm
• Thread-local storage, lockless queues, RCU
Explicit CPU and thread assignment
• Hardware-topology-aware resource assignment
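One of these techniques, the flow cache, can be illustrated as an exact-match table consulted before the slower rule lookup, so only the first packet of a flow pays the full lookup cost. This is a hedged sketch of the idea, not the Lagopus implementation:

```python
flow_cache = {}   # exact-match cache: flow key -> resolved action

def slow_rule_lookup(key):
    """Stand-in for the full priority-ordered, wildcard-capable table walk."""
    return ("output", 1)   # hypothetical resolved action

def lookup(key):
    """Consult the exact-match flow cache first; fall back to the rule walk."""
    action = flow_cache.get(key)
    if action is None:
        action = slow_rule_lookup(key)
        flow_cache[key] = action   # subsequent packets of this flow hit the cache
    return action

key = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
lookup(key)   # first packet: full rule walk, result cached
lookup(key)   # later packets: O(1) cache hit
```

In practice a real flow cache also needs invalidation when flow rules change, which is part of why flow-mod handling interacts with dataplane performance.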
Performance Evaluation
Summary
• Throughput: 10Gbps wire rate
• Flow entries: > 1M flow entries, > 4000 flow entry mods/sec
Evaluation models
• WAN-DC gateway: MPLS-VLAN mapping
• L2 switch: MAC address switching
WAN-DC Gateway
[Figure (left): throughput (Gbps, 0–10) vs. packet size (0–1600 bytes) for 10, 100, 1k, 10k, 100k, and 1M flow rules — throughput vs. packet size, 1 flow, flow-cache. Figure (right): throughput (Gbps, 0–10) vs. number of flows (1–1,000,000) for 10k, 100k, and 1M flow rules — throughput vs. flows, 1518-byte packets.]
L2 switch performance (Mbps), 10GbE x 2 (RFC2889 test)
[Figure: throughput (Mbps, 0–10000) for LINC, OVS (netdev), OVS (kernel), and Lagopus (software) at packet sizes 72, 128, 256, 512, 1024, 1280, and 1518 bytes]
Intel Xeon E5-2660 (8 cores, 16 threads), Intel X520-DA2, DDR3-1600 64GB
OpenFlow 1.3 Conformance Status
Each cell shows the number of test scenarios passed as total (mandatory, optional).

Type                     | Action     | Set field    | Match          | Group      | Meter      | Total
# of test scenarios      | 56 (3, 53) | 161 (0, 161) | 714 (108, 606) | 15 (3, 12) | 36 (0, 36) | 991 (114, 877)
Lagopus (2014.11.09)     | 56 (3, 56) | 161 (0, 161) | 714 (108, 606) | 15 (3, 12) | 34 (0, 34) | 980 (114, 866)
OVS kernel (2014.08.08)  | 34 (3, 31) | 96 (0, 96)   | 534 (108, 426) | 6 (3, 3)   | 0 (0, 0)   | 670 (114, 556)
OVS netdev (2014.11.05)  | 34 (3, 31) | 102 (0, 102) | 467 (93, 374)  | 8 (3, 5)   | 0 (0, 0)   | 611 (99, 556)
IVS (2015.02.11)         | 17 (3, 14) | 46 (0, 46)   | 323 (108, 229) | 3 (0, 2)   | 0 (0, 0)   | 402 (111, 291)
ofswitch (2015.01.08)    | 50 (3, 47) | 100 (0, 100) | 708 (108, 600) | 15 (3, 12) | 30 (0, 30) | 962 (114, 848)
LINC (2015.01.29)        | 24 (3, 21) | 68 (0, 68)   | 428 (108, 320) | 3 (3, 0)   | 4 (0, 4)   | 523 (114, 409)
Trema (2014.11.28)       | 50 (3, 47) | 159 (0, 159) | 708 (108, 600) | 15 (3, 12) | 34 (0, 34) | 966 (114, 854)

http://osrg.github.io/ryu/certification.html
Lessons learned from development
vSwitch needs hardware acceleration for NFV
• Focus CPU cores on upper-layer packet processing
• Encryption and packet scheduling are CPU-heavy processing
vSwitch has several improvement points (performance, flexibility, availability)
• vSwitch cannot leverage multicore CPU power
• A software-based packet scheduler is too heavy
• vSwitch cannot protect itself from ingress DoS attacks
[Figure: an existing SDN hardware switch plus a HW accelerator: high performance & scalability with HW, advanced features with HW.]
Lagopus FPGA NIC
[Figure: Xilinx 40G/80G QoS NIC (XNIC) in an IA server at the DC. HW: an FPGA with two QSFP+ 40G transceivers and PARSE, LOOK-UP, LABELING, SR-IOV DEMUX, TAP-CTRL, QoS traffic manager (TM), and DISPATCH blocks, attached over PCIe Gen3 x8 with 32 duplex DMA queues and a traffic-escalation path. SW: x86 (Xeon) + DPDK + frontend API running the vSwitch and VNF1–VNF3 in VMs, bridging inter-DC (WAN) and intra-DC (east-west) traffic.]
• Flow Director function for many cores
• Packet mirroring function (NFV tap)
Throughput
[Figure: throughput (Gbps, 0–40) on each frame size (64–1512 bytes) for core assignments (1)–(5); Xeon(R) E5-2680 v2 @ 2.80GHz, 10C/20T, DPDK-1.7.0]
Measured results of Xilinx NIC and Lagopus interwork: RX → TX throughput (Gbps). Core assignments (1) and (2) run with Flow Director = OFF; (3)–(5) with Flow Director = ON.

Frame size (Bytes) | Line rate 40G (Gbps) | (1) RX1, TX1, W2 | (2) RX1, TX1, W6 | (3) RX1, TX1, W2 | (4) RX1, TX1, W6 | (5) RX2, TX1, W5
64   | 30.47 | 2.05  | 4.96  | 4.20  | 5.00  | 8.15
68   | 30.90 | 2.17  | 5.62  | 4.41  | 5.69  | 8.53
256  | 37.10 | 8.18  | 21.68 | 16.70 | 21.88 | 30.32
512  | 38.49 | 16.26 | 38.46 | 33.50 | 38.46 | 38.47
1024 | 39.23 | 32.28 | 39.23 | 39.23 | 39.23 | 39.23
1512 | 39.47 | 39.46 | 39.47 | 39.46 | 39.47 | 39.46
High performance virtual NIC
for DPDK-enabled NFV apps
Concept
• DPDK is good for performance, but not for operation and management
• Performance acceleration for DPDK apps and legacy apps
• Simultaneous restart support for DPDK apps and the DPDK vswitch
• Isolation for security and trust between guest VM and vswitch
  - Trusted guest VM: zero-copy
  - Non-trusted guest VM: one-copy
[Figure: Guest1 and Guest2 (QEMU) each run a DPDK app over a virtio-net PMD; the Lagopus vswitch maps memory in the guest VMs into Lagopus memory and serves the virtio queues through PMD vNICs, alongside a DPDK PMD for the physical NIC.]
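The trust-dependent copy policy can be sketched as follows. This is an illustrative model, not the Lagopus code: a trusted guest's buffer is handed over by reference (zero-copy), while an untrusted guest's buffer is copied once into switch-owned memory so the guest cannot mutate it mid-processing.

```python
def receive_from_guest(buf, trusted):
    """Zero-copy for trusted guests (shared-memory reference);
    one-copy for untrusted guests (defensive copy into switch memory)."""
    if trusted:
        return buf            # zero-copy: vswitch works on the guest's buffer
    return bytearray(buf)     # one-copy: detached from guest-writable memory

guest_buf = bytearray(b"packet-data")
shared = receive_from_guest(guest_buf, trusted=True)
copied = receive_from_guest(guest_buf, trusted=False)
guest_buf[0:6] = b"mutate"    # a misbehaving guest rewrites its buffer
```

After the guest's rewrite, the zero-copy reference sees the mutated data while the one-copy buffer is unaffected, which is exactly the isolation trade-off the slide describes.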
DPDK VNIC extension (virtq-pmd)
• 4.8 times faster than the existing vhost PMD
• Supports simultaneous restart of the VM guest and host DPDK apps
[Figure (left): transmission overhead per packet — required # of CPU clocks (0–3500) vs. packet size (0–2000 bytes) for copy overhead, hole PMD, virtq PMD, vhost app, and kernel driver. Figure (right): throughput (Gbps, 0–70) vs. packet size with COPY OFF and COPY ON.]
Released as OSS on github
Activity
Hands-on events in Tokyo & Taiwan
• In total, over 400 engineers have taken part
• United States next?
[Photos: Tokyo hands-on, Taiwan hands-on]
PoC at SDN Japan 2014
Location-aware, bandwidth-controlled Internet WiFi with Ryu and Lagopus
• Audience members in the front-area seats connect to the Internet with guaranteed BW: good connection
• Audience members in the back-area seats: poor connection
M-plane needs innovation!
• No sufficient abstraction in the M-plane: proprietary CLIs and scripts
• Invisibility leads to inefficiency: still legacy (SNMP is neither SIMPLE nor realtime)
[Figure: a plan → provision & control → monitor → analyze & visualize cycle built around an NW abstraction model: plan the NW on the abstraction model, provision on it, feed monitoring back into it, and think about the NW based on it.]
Lagopus development (Y2013 – Y2015)
Control plane
• OF1.3 agent → OF1.5 agent, legacy protocol ctrl support
Management plane
• Switch config/mngmt DB prototype (OF-CONFIG, CLI, SNMP prototypes) → extensible switch resource management/configuration datastore (CLI, OF-CONFIG, SNMP) → M-plane extension (OpenConfig)
Data plane
• High-performance software dataplane with DPDK (static pipeline configuration) → flexible reconfigurable pipeline, tunnel protocols (VxLAN, GRE, MPLS pseudo-wire), 40Gbps-capable soft dataplane