lagopus presentation on 14th annual on*vector international photonics workshop

38
0 Copyright©2015 NTT corp. All Rights Reserved. Lagopus: SDN software switch Yoshihiro Nakajima NTT Labs Feb 26 th 2015

Upload: lagopus-sdnopenflow-switch

Post on 14-Jul-2015

394 views

Category:

Software


1 download

TRANSCRIPT

0Copyright©2015 NTT corp. All Rights Reserved.

Lagopus:

SDN software switch

Yoshihiro Nakajima

NTT Labs

Feb 26th 2015

1Copyright©2015 NTT corp. All Rights Reserved.

What is Lagopus (雷鳥属)?

Lagopus is a small genus of birds in

the grouse subfamily, commonly known as ptarmigans.

All living in tundra or cold upland areas.

Reference: http://en.wikipedia.org/wiki/Lagopus

2Copyright©2015 NTT corp. All Rights Reserved.

Agenda

1. Background & motivation

2. Lagopus SDN switchI. vSwitch

II. 40GbE FPGA NIC

III.DPDK extension

IV. OSS community

3. What’s next?

3Copyright©2015 NTT corp. All Rights Reserved.

Trend shift in networking

Closed (Vender lock-in)

Yearly dev cycle

Waterfall dev

Standardization

Protocol

Special purpose HW / appliance

Distributed cntrl

Custom ASIC / FPGA

Open (lock-in free)

Monthly dev cycle

Agile dev

DE fact standard

API

Commodity HW/ Server

Logically centralized cntrl

Merchant Chip

4Copyright©2015 NTT corp. All Rights Reserved.

What is Software-Defined Networking?4

Innovate services and applications in software development speed.

Reference: http://opennetsummit.org/talks/ONS2012/pitt-mon-ons.pdf

Decouple control plane and data planeFree control plane out of the box(OpenFlow is one of APIs)

Logically centralized viewHide and abstract complexity of networks, provide entire view of the network

Programmability via abstraction layerEnables flexible and rapid service/application development

SDN conceptual model

5Copyright©2015 NTT corp. All Rights Reserved.

OpenFlow vs conventional NW node

OpenFlow controller

OpenFlow switch

Data-plane

Flow Table

Control plane(Routing /

swithching)

Data-plane(ASIC, FPGA)

Control plane(routing/ switching)

Flow match action

Flow match action counter

OpenFlow Protocol

Flexible flow match pattern(Port #, VLAN ID, MAC addr, IP addr, TCP port #)

Action (frame processing)(output port, drop, modify/pop/push header)

Flow statistics(# of packet, byte size, flow duration,… )

counter

Conventional NW node OpenFlow

Flow Table#2

Flow Table#3

OpenFlow switch agent

6Copyright©2015 NTT corp. All Rights Reserved.

Why SDN?6

Differentiate services (Innovation)• Increase user experience• Provide unique service which is not in the

market

Time-to-Market(Velocity)•Not to depend on vendors’ feature roadmap•Develop necessary feature when you need

Cost-efficiency•Reduce OPEX by automated workflows•Reduce CAPEX by COTS hardware

7Copyright©2015 NTT corp. All Rights Reserved.

7

Evaluate the benefits of SDN by implementing

our control plane and switch

8Copyright©2015 NTT corp. All Rights Reserved.

Lagopus switch

• Lagopus overview

• Xilinx FPGA 40GbE NIC

• DPDK extension

• OSS Community

9Copyright©2015 NTT corp. All Rights Reserved.

Goal of Lagopus project

Provide NFV/SDN-aware switch software stackOpenFlow switch agent

100Gbps-capable software dataplane and I/O libs

Extensible switch configuration data store

Expand software-based packet processing to carrier networksHardware acceleration, processing offload

10Copyright©2015 NTT corp. All Rights Reserved.

# of packet to be proceeded for 10Gbps

with 1 CPU core

0

2,000,000

4,000,000

6,000,000

8,000,000

10,000,000

12,000,000

14,000,000

16,000,000

0 256 512 768 1024 1280

# o

f p

ac

ke

ts p

er

sec

on

ds

Packet size (Byte)

Short packet 64Byte

14.88 MPPS, 67.2 ns

• 2Ghz: 134 clocks

• 3Ghz: 201 clocks

Computer packet 1KByte

1.2MPPS, 835 ns

• 2Ghz: 1670 clocks

• 3Ghz: 2505 clocks

11Copyright©2015 NTT corp. All Rights Reserved.

# of CPU cycle for typical packet

processing (2Ghz CPU)

0 1000 2000 3000 4000 5000 6000

L2 forwarding

IP routing

L2-L4 classification

TCP termination

Simple OF processing

DPI

# of required CPU cycles

10Gbps 1Gbps

12Copyright©2015 NTT corp. All Rights Reserved.

Lagopus switch

13Copyright©2015 NTT corp. All Rights Reserved.

Lessons learned from existing vswitch

Not enough performance Packet processing speed < 1Gbps

10K flow entry add > two hours

Flow management processing is too heavy

Develop & extension difficulties Many abstraction layer cause confuse for me

• Interface abstraction, switch abstraction, protocol abstraction Packet abstraction…

Many packet processing codes exists for the same processing

• Userspace, Kernelspace,…

Invisible flow entries cause chaos debugging

14Copyright©2015 NTT corp. All Rights Reserved.

vSwitch requirement from user side

Commodity x86 PC server and NIC support

Gateway function to connect different NW domains (MPLS, Access NW)

10Gbps-wire rate with >= 1M flow entries Flow-aware control

Multi flow table configuration

High performance flow entry operation

Userland-based dataplane Easy software upgrade and deploy

Strong management plane supportsCo-operable upper management systems

15Copyright©2015 NTT corp. All Rights Reserved.

Approach for vSwitch development

Simple is better for everything Packet processing for speed

Protocol processing for implement

Straight forward approach Full scratch (No use of existing vSwitch code)

No OpenFlow 1.0 support

User land packet processing as much as possible

Keep kernel code small

Every component & algorithm can be replaced Flow lookup, library, protocol handling

16Copyright©2015 NTT corp. All Rights Reserved.

Lagopus switch design

Simple modular-based designUnified configuration data store

Multiple agent support

Support multiple data-planes

Dataplane abstraction layer

Multi-threaded dataplane Flexible parallel pipeline

• Multi-core CPU-aware design

• Intel DPDK and lock-free libraries

for I/O

17Copyright©2015 NTT corp. All Rights Reserved.

high-performance user-space

packet processing with Intel DPDK

NIC

skb_buf

Ethernet Driver API

Socket API

vswitch

packet

buffer

Dataplane

1. Interrupt

& DMA

2. system call

(read)

User

space

Kernel

space

Driver

4. DMA

3. system call (write)

NIC

Ethernet Driver API

Socket API

vswitch

packet

buffer

agentagent

1. DMA

Write 2. DMA

READ

DPDKdataplane

Userspacepacket processing

(Event-based)

DPDK apps(polling-based)

18Copyright©2015 NTT corp. All Rights Reserved.

Packet processing using multi core CPUs

Exploit many core CPUs Reduce data copy & move (reference access)

Simple packet classifier for parallel processing in I/O RX

Decouple I/O processing and flow processing Improve D-cache efficiency

Explicit thread assign to CPU core

NIC 1RX

NIC 2RX

I/O RX

CPU0

I/O RXCPU1

NIC 1TX

NIC 2TX

I/O TXCPU6

I/O TXCPU7

Flow lookuppacket processing

CPU2

Flow lookuppacket processing

CPU4

Flow lookuppacket processing

CPU3

Flow lookuppacket processing

CPU5

NIC 3RX

NIC 4RX

NIC 3TX

NIC 4

TX

NIC RX buffer

Ring buffer

Ring buffer NIC TX buffer

19Copyright©2015 NTT corp. All Rights Reserved.

Implementation techniques for speed

Reduce overhead Polling-based packet receiving Reduction of # of access in I/O and memory Huge DTLB for $ miss of memory controller

Bypass in packet processing Direct data copy from NIC buffer to CPU $ Kernel stack bypass of network Fast flow lookup and flow $ mechanism

Reduce memory access and leverage locality Packet batching Novel flow lookup algorithm Thread local storage and lockless-queue, RCU

Explicit CPU and thread assignment Hardware-topology aware resource assignment

20Copyright©2015 NTT corp. All Rights Reserved.

Performance Evaluation

Summary Throughput: 10Gbps wire-rate

Flow entries: > 1M flow entires> 4000 flow entries mods /sec

Evaluation modelsWAN-DC gateway

• MPLS-VLAN mapping

L2 switch

• Mac address switching

21Copyright©2015 NTT corp. All Rights Reserved.

WAN-DC Gateway

0

1

2

3

4

5

6

7

8

9

10

0 200 400 600 800 1000 1200 1400 1600

Throughput(Gbps)

Packetsize(byte)

10flowrules

100flowrules

1kflowrules

10kflowrules

100kflowrules

1Mflowrules

Throughput vs packet size, 1 flow, flow-cache

0

1

2

3

4

5

6

7

8

9

10

1 10 100 1000 10000 100000 1000000

Throughput(Gbps)

flows

10kflowrules

100kflowrules

1Mflowrules

Throughput vs flows, 1518 bytes packet

22Copyright©2015 NTT corp. All Rights Reserved.

L2 switch performance (Mbps)

10GbE x 2 (RFC2889 test)

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

LINC OVS

(netdev)

OVS

(kernel)

Lagopus

(software)

Mb

ps

72

128

256

512

1024

1280

1518

Intel xeon E5-2660 (8cores, 16threads),

Intel X520-DA2, DDR3-1600 64GB

Packet size

23Copyright©2015 NTT corp. All Rights Reserved.

OpenFlow 1.3 Conformance Status

Type Action Set field Match Group Meter Total

# of test scenario(mandatory,

optional)

56(3 , 53)

161(0 , 161)

714(108 , 606)

15(3 , 12)

36(0 , 36)

991(114 , 877)

Lagopus

2014.11.09

56

(3, 56)

161

(0, 161)

714

(108, 606)

15

(3, 12)

34

(0, 34)

980

(114, 866)

OVS (kernel)

2014.08.08

34

(3, 31)

96

(0, 96)

534

(108, 426)

6

(3, 3)

0

(0, 0)

670

(114, 556)

OVS (netdev)

2014.11.05

34

(3, 31)

102

(0, 102)

467

(93, 374)

8

(3, 5)

0

(0, 0)

611

(99, 556)

IVS

2015.02.11

17

(3, 14)

46

(0, 46)

323

(108, 229)

3

(0, 2)

0

(0, 0)

402

(111, 291)

ofswitch

2015.01.08

50

(3, 47)

100

(0, 100)

708

(108, 600)

15

(3, 12)

30

(0, 30)

962

(114, 848)

LINC

2015.01.29

24

(3, 21)

68

(0, 68)

428

(108, 320)

3

(3, 0)

4

(0, 4)

523

(114, 409)

Trema

2014.11.28

50

(3, 47)

159

(0 , 159)

708

(108, 600)

15

(3, 12)

34

(0, 34)

966

(114, 854)

http://osrg.github.io/ryu/certification.html

24Copyright©2015 NTT corp. All Rights Reserved.

40GbE FPGA NIC

25Copyright©2015 NTT corp. All Rights Reserved.

Lessoned learned from development

vSwitch needs hardware acceleration for NFV Focus CPU cores for upper-layer packet processing

Encryption, Packet scheduler are CPU heavy processing

vSwitch has several improvement points vSwitch cannot leverage multicore CPU power

Software-based packet scheduler is too heavy

vSwitch cannot protect from ingress DoS attackPerformance

Flexiblity Availability

Existing SDN hardware switch

+ HW

accelerator

High-performance & scalability with HW

advance features with HW

26Copyright©2015 NTT corp. All Rights Reserved.

Lagopus FPGA NIC

HW

SWx86 (Xeon) + DPDK + Frontend API

VNF1@VM

QSFP+

QSFP+

40G TRX

[LOOK-UP]

40G TRX

vSwitch

VNF2@VM

PCIe Gen3 x8 PCIe Gen3 x8

VNF3@VM

Inter-DC (WAN)Intra-DC (East-West)

[SR-IOV DEMUX][LABELING]

[PARSE]

[TAP-CTRL]

・・・

QoS[TM]

FPGA

[DISPATCH]

DMA Queues x32 (duplex)

Frontend

Xilinx 40G/80G QoS NIC (XNIC)

IA server@DC

Traf

fic

Esca

lati

on

Flow Director function for many core

Packet Mirroring function(NFV tap)

27Copyright©2015 NTT corp. All Rights Reserved.

Throughput

0.00

5.00

10.00

15.00

20.00

25.00

30.00

35.00

40.00

64 68 256 512 1024 1512

Thro

ug

hp

ut

(Gp

bs)

Frame Size (Bytes)

Throughput on Each Frame Size

Xeon(R) E5-2680 v2 @ 2.80GHz, 10C/20T, DPDK-1.7.0

(1) RX1, TX1, W2

(2) RX1, TX1, W6

(3) RX1, TX1, W2

(4) RX1, TX1, W6

(5) RX2, TX1, W5

Frame Size

(Bytes)

Line Rate

40G (Gbps)

Measured Results of

Xilinx NIC and Lagopus Interwork

RX --> TX Throughput (Gbps)

Flow Director = OFF Flow Director = ON

Core assignment (1) RX1, TX1, W2 (2) RX1, TX1, W6 (3) RX1, TX1, W2 (4) RX1, TX1, W6 (5) RX2, TX1, W5

64 30.47 2.05 4.96 4.20 5.00 8.15

68 30.90 2.17 5.62 4.41 5.69 8.53

256 37.10 8.18 21.68 16.70 21.88 30.32

512 38.49 16.26 38.46 33.50 38.46 38.47

1024 39.23 32.28 39.23 39.23 39.23 39.23

1512 39.47 39.46 39.47 39.46 39.47 39.46

28Copyright©2015 NTT corp. All Rights Reserved.

High performance virtual NIC

for DPDK-enabled NFV apps

29Copyright©2015 NTT corp. All Rights Reserved.

Concept

DPDK is good for performance, but not for operation and management

Performance acceleration for DPDK apps and legacy apps

Simultaneous restart support of DPDK apps and DPDK vswitch

Isolation for security and trust for guest VM and vswitch Trusted guest VM: zero-copy

Non-trusted guest VM: one-copy

Guest1

QEMU

App

DPDK

Guest2

QEMU

App

DPDK

Virtio-net PMD Virtio-net PMD

Lagopusvswitch

DPDK

PMD

Map memory in guest VMto lagopus memory

Map memory in guest VMto lagopus memory

virtio virtio

virtio

qu

eu

e

virtio

qu

eu

e

PMDvNIC

PMDvNIC

30Copyright©2015 NTT corp. All Rights Reserved.

DPDK VNIC extension (virtq-pmd)

4.8 times faster than existing vhost pmd

Support of simultaneous restart of VM gust and

host DPDK apps

0

500

1000

1500

2000

2500

3000

3500

0 500 1000 1500 2000

Re

qu

ire

d #

of

CP

U C

loc

ks

Packet size (bytes)

Transmission overhead per

packet

copy overhead hole PMD

virtq PMD vhost app

Kernel driver

0.00

10.00

20.00

30.00

40.00

50.00

60.00

70.00

0 500 1000 1500 2000

Gbps

COPY OFF COPY ON

Released as OSS on github

31Copyright©2015 NTT corp. All Rights Reserved.

Expand SDN OSS community

32Copyright©2015 NTT corp. All Rights Reserved.

Activity

Hands-on in Tokyo & Taiwan Total over 400 engineers enjoyed United states?

Tokyo hands-on Taiwan hands-on

33Copyright©2015 NTT corp. All Rights Reserved.

PoC in SDN Japan 2014

Location-aware

bandwidth-controlled

Internet WiFi with Ryu

and Lagopus

Good audience can

connect to the Internet

with guaranteed BW Front area sheet, good

connection.

Back area sheet, poor connection

34Copyright©2015 NTT corp. All Rights Reserved.

What’s next?

35Copyright©2015 NTT corp. All Rights Reserved.

M-plane needs innovation!

No sufficient abstraction in M-plane Proprietary CLI, script

Invisibility leads to inefficiency Still legacy (SNMP is neither SIMPLE nor realtime)

plan

Provision & control

monitor

Analyze &

visualize

Plan NW based onNW abstraction model

Provisioning on NW abstraction model

Feedback to NW abstraction model

Think NW based-onNW abstraction model

36Copyright©2015 NTT corp. All Rights Reserved.

Y2013 Y2014 Y2015

Control plane

Management

plane

Data plane

Lagopus development

OF1.3 agent

High-performance

software dataplane

with DPDK

(static pipeline

configuration)

Switch config/mngmt

Db prototype

OF-CONFIG,

CLI, SNMP

prototypes

Flexible reconfigurable

pipeline in dataplane

Tunnel protocol

(VxLAN, GRE,

MPLS pseudo-wire)

40Gbps-capable

Soft dataplane

Extensible switch resource

Management/configuration

datastore

(CLI, OF-CONFIG, SNMP)

OF1.5 agent

Legacy protocol ctrl support

M-plane extension

(Openconfig)

37Copyright©2015 NTT corp. All Rights Reserved.

Thanks!

Web: http://lagopus.github.io/

Twitter: lagopusvswitch