PFQ @ 10th Italian Networking Workshop (Bormio)

Running Monitoring Applications on Accelerated Capture Engines

Nicola Bonelli

N. Bonelli, R.G. Garroppo, L. Gazzarrini, S. Giordano, G. Procissi, F. Russo, G. Volpi

Agenda

•  Capture engines overview
•  What’s new in PFQ (2.0)
•  Accelerated pcap library
   –  PF_RING, PF_RING+DNA, NETMAP, PFQ
•  pcap-perf: a tool for benchmarking pcap apps

•  Experimental results

Speed matters…

Accelerated Capture Engine

•  Linux is provided with a default capture engine
   –  the PF_PACKET socket
•  Because of speed, other capture engines emerged:
   –  2004: PF_RING
      •  designed for single core, better performance than the PF_PACKET of the time
   –  2011: PFQ
      •  first to address multi-core architectures and multi-queue NICs (Best Paper Award @PAM2012)
   –  2012: PF_RING-DNA
      •  accelerated drivers (Intel)
   –  2012: NetMap
      •  accelerated drivers (Intel, Broadcom) (Best Paper Award @Usenix ATC’12)

… but what happens on these tracks?

What’s new in PFQ 2.0

•  From capture engine to monitoring framework…
•  Improved performance
   –  ~14.8 Mpps with a single user-space thread
•  Improved features:
   –  compliant with a plethora of NICs: pfq-omatic
   –  monitoring groups and classes
   –  in-kernel extensible engine for packet steering: dispatching, copying, cloning, filtering
   –  native bindings: C, C++11, Haskell (more to come)
   –  per-group filtering: BPF, vlan (un-tagging)
   –  pcap library
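For context, the ~14.8 Mpps figure can be compared with the theoretical packet rate of a 10 Gb/s Ethernet link. The sketch below is only a back-of-the-envelope check, not part of PFQ: the function name is hypothetical, and it assumes the packet sizes used in the tests are Ethernet frame sizes including the FCS, with the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap).

```cpp
#include <cassert>
#include <cstdint>

// Per-frame bytes on the wire that are never delivered to software:
// 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
const std::uint64_t kWireOverheadBytes = 20;

// Upper bound on frames per second for a link of link_bps bits per second
// carrying back-to-back frames of frame_bytes (Ethernet header through FCS).
// Hypothetical helper, introduced here for illustration only.
std::uint64_t max_frames_per_second(std::uint64_t link_bps, std::uint64_t frame_bytes)
{
    assert(frame_bytes >= 64);  // 64 bytes is the minimum Ethernet frame size
    return link_bps / ((frame_bytes + kWireOverheadBytes) * 8);
}
```

Under these assumptions, `max_frames_per_second(10000000000ULL, 64)` gives 14,880,952 frames per second and `max_frames_per_second(10000000000ULL, 128)` gives 8,445,945, so a capture rate of ~14.8 Mpps is essentially line rate for minimum-size frames.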

Feature comparison

Feature          PF_PACKET   PF_RING 5.x                    PF_RING-DNA       NETMAP 0813             PFQ 2.0
NIC              *           *, PF-AWARE (Intel, Broadcom)  only Intel 1/10G  Intel 1/10G, forcedeth  * accelerated
Driver compat.   *           yes, non accel.                no                no                      yes, dynamic
multi-core       -           Hardware (RSS)                 Hardware (RSS)    Hardware (RSS)          Hw RSS + soft
multi-queue      yes (poor)  yes                            yes               yes                     yes
native binding   C           C                              C                 C                       C, C++11, Haskell, Java, Python
groups           -           -                              -                 -                       yes
class            -           -                              -                 -                       yes
concurrent mon.  yes         yes                            commercial ?      -                       yes
clustering       -           yes                            -                 -                       yes (MT, group)
steering         -           -                              commercial        -                       yes (MT, group)
STM state        -           -                              -                 -                       work in progress

Feature comparison

Feature           PF_PACKET  PF_RING 5.x  PF_RING-DNA       NETMAP 0813       PFQ 2.0
Pcap library      yes        yes          yes               buggy/incomplete  yes
BPF (filters)     yes (MT)   yes (MT)     yes (user-space)  -                 yes (MT, group)
vlan filters      -          yes          yes (hw Intel)    -                 yes (MT, group)
vlan untagging    -          -            -                 -                 yes (MT, soft.)
Intel hw filters  -          yes          yes               -                 no
bloom filters     -          -            -                 -                 work in progress

Accelerated PCAP library

•  The pcap library is the de facto standard interface for packet capture
•  Accelerated capture engines provide their own pcap library:
   –  Both PF_RING and PF_RING-DNA provide a complete accelerated version
   –  NetMap provides experimental and incomplete pcap support
      •  BPF is missing
•  PFQ provides a complete implementation
   –  PFQ C API mapped onto the pcap interface wherever possible, implemented as environment variables otherwise
   –  Clustering is enabled by specifying multiple NICs in colon-separated fashion, steering by means of the PFQ_STEER variable

PFQ_GROUP=10 PFQ_STEER=ipv4-addr tcpdump -n -i eth2:eth3
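To make the convention above concrete, the following sketch shows one way a pcap wrapper could split a colon-separated device string and read options from the environment. It is an illustration only, not PFQ's actual code: `split_devices` and `env_or` are hypothetical names, and the only identifiers taken from the slides are `PFQ_GROUP`, `PFQ_STEER` and the `eth2:eth3` device string.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Split a colon-separated device specification such as "eth2:eth3".
// Empty fields (for example from "eth2::eth3") are skipped.
// Hypothetical helper, not a PFQ function.
std::vector<std::string> split_devices(const std::string& spec)
{
    std::vector<std::string> devices;
    std::string::size_type start = 0;
    while (start <= spec.size()) {
        const std::string::size_type end = spec.find(':', start);
        const std::string::size_type stop =
            (end == std::string::npos) ? spec.size() : end;
        if (stop > start)
            devices.push_back(spec.substr(start, stop - start));
        if (end == std::string::npos)
            break;
        start = end + 1;
    }
    return devices;
}

// Read an environment variable, returning fallback when it is unset.
// Hypothetical helper, not a PFQ function.
std::string env_or(const char* name, const std::string& fallback)
{
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}
```

With the command line above, `split_devices("eth2:eth3")` yields `eth2` and `eth3`, and `env_or("PFQ_GROUP", "")` would return `"10"` inside the tcpdump process. How PFQ interprets these values internally is not described in the slides.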

pcap-perf

•  pcap-perf is a C++11 application designed for benchmarking capture engines through pcap interfaces
•  Support for multiple threads, BPF filters and plug-ins:

plug-in                  kind
Null                     packet counter
IP checksum              light CPU computation
MD5                      CPU computation
SHA256                   heavy CPU computation
Bloom Filter             memory (linear)
Protocol Classification  memory tree
TCP/UDP flow counter     memory (std::unordered_set)
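As an illustration of the last row, a memory-bound flow counter built on `std::unordered_set` might look like the sketch below. This is not the pcap-perf plug-in itself: the `FlowKey`, `FlowKeyHash` and `FlowCounter` types, the 5-tuple layout and the hash-combining step are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// 5-tuple identifying a TCP/UDP flow (hypothetical layout).
struct FlowKey {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint8_t  protocol;   // 6 = TCP, 17 = UDP

    bool operator==(const FlowKey& o) const
    {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port &&
               protocol == o.protocol;
    }
};

// Combine the five fields into one hash value (simple mixing step,
// chosen for brevity rather than hash quality).
struct FlowKeyHash {
    std::size_t operator()(const FlowKey& k) const
    {
        std::size_t h = 0;
        auto mix = [&h](std::size_t v) {
            h ^= v + 0x9e3779b9 + (h << 6) + (h >> 2);
        };
        mix(k.src_ip);
        mix(k.dst_ip);
        mix(k.src_port);
        mix(k.dst_port);
        mix(k.protocol);
        return h;
    }
};

// Counts distinct flows; each new 5-tuple costs one set insertion,
// so the workload is dominated by memory accesses rather than CPU.
class FlowCounter {
public:
    void observe(const FlowKey& key) { flows_.insert(key); }
    std::size_t flows() const { return flows_.size(); }

private:
    std::unordered_set<FlowKey, FlowKeyHash> flows_;
};
```

A per-packet callback would fill a `FlowKey` from the IP and TCP/UDP headers and call `observe`; with randomized addresses nearly every packet inserts a new element, which is what makes this kind of plug-in stress memory.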

Test-bed and measurements

•  Intel Xeon X5650 (6 cores) @ 2.67 GHz, 16 GB RAM + Intel 82599 10G (Debian Wheezy)
•  Accelerated drivers
   –  PF_RING: ixgbe 3.11.33 PF_RING-aware
   –  PF_RING-DNA: ixgbe 3.10.16-DNA driver
   –  Netmap: ixgbe driver shipped with the netmap package
   –  PFQ: Intel ixgbe 3.11.33 vanilla, recompiled through pfq-omatic
•  Best interrupt affinity (MSI-X)
   –  4 or 5 kernel threads (NAPI) bound to fixed cores (RSS), 1 or 2 user-space threads bound to other core(s)
•  Traffic is generated with randomized IP addresses, 64/128-byte UDP packets
   –  using both PF_DIRECT and PF_RING-DNA
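The affinity setup above binds user-space threads to dedicated cores. A minimal Linux-only sketch of pinning the calling thread is shown here; it is not taken from pcap-perf or PFQ, the helper names are hypothetical, and it relies on the glibc `sched_getaffinity` / `sched_setaffinity` calls with the `CPU_*` macros (available by default when compiling with g++).

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux, glibc)
#include <cassert>

// Return the lowest-numbered CPU the calling thread is allowed to run on,
// or -1 if the affinity mask cannot be read. Hypothetical helper.
int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

// Restrict the calling thread to a single CPU. Returns true on success.
// Hypothetical helper; pid 0 means "the calling thread".
bool pin_current_thread(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

In a set-up like the one described, each capture thread would call `pin_current_thread` with a core not used by the NAPI kernel threads. Binding the kernel side (interrupt affinity of the MSI-X vectors) is configured separately and is not covered by this sketch.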

10 Gb link


Counting packets is useless

(native speed)

uint64_t counter = 0;
for (;;)
{
    counter++;
}

1 thread user-‐space (Intel 10G)

pcap library

Pcap library, 1 thread counter

Pcap, 1 thread counter, BPF=udp

Pcap, 1 thread counter, BPF=http || udp

pcap-perf

pcap-perf with BPF = udp

pcap-perf (2 threads)

tcpdump

tcpdump -s 64 -i dev -w /ramdisk/dump.pcap (300M@14.8Mpps)

tcpdump -s 138 -i dev -w /ramdisk/dump.pcap (100M@~8Mpps)

tcpdump -i dev -w /ramdisk/dump.pcap vlan (5 Gbps)

tcpdump -i dev -w /ramdisk/dump.pcap ip host 192.168.0.10 (voip call)

Thanks for the attention!

nicola.bonelli@cnit.it
