DevConf 2017 – Can VMs networking benefit from DPDK?


Can VMs networking benefit from DPDK?

Virtio/Vhost-user status & updates

Maxime Coquelin – Victor Kaplansky, 2017-01-27

2

Agenda – Can VMs networking benefit from DPDK?

● Overview
● Challenges
● New & upcoming features

Overview

4

DPDK – project overview

DPDK is a set of userspace libraries aimed at fast packet processing.

● Data Plane Development Kit
● Goal:
  ● Benefit from software flexibility
  ● Achieve performance close to dedicated HW solutions

5

DPDK – project overview

● License: BSD
● CPU architectures: x86, Power8, TILE-Gx & ARM
● NICs: Intel, Mellanox, Broadcom, Cisco, ...
● Other HW: Crypto, SCSI for the SPDK project
● Operating systems: Linux, BSD

6

DPDK – project history

Timeline:
● 2012: 1st release by Intel as a Zip file
● 2013: 6WIND initiates the dpdk.org community
● 2014: Packaged in Fedora
● 2015: Power8 & TILE-Gx support
● 2016: ARM support / Crypto
● 2017: Moving to the Linux Foundation

Releases: v1.2, v1.3, v1.4, v1.5, v1.6, v1.7, v1.8, v2.0, v2.1, v2.2, v16.04, v16.07, v16.11, v17.02, v17.05, v17.08, v17.11

v16.11: ~750K LoC / ~6000 commits / ~350 contributors

7

DPDK – comparison

[Diagram: traditional path: NIC → Driver → Net stack → Socket (kernel) → Application (user); DPDK path: NIC → VFIO (kernel) → DPDK → Application (all in user space)]

8

DPDK – performance

DPDK uses:
● CPU isolation/partitioning & polling
  → Dedicated CPU cores poll the device (see the polling-loop sketch below)
● VFIO/UIO
  → Direct access to device registers from user space
● NUMA awareness
  → Resources allocated local to the Poll-Mode Driver's (PMD) CPU
● Hugepages
  → Fewer TLB misses, no swap
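To make the polling model concrete, here is a minimal sketch of a DPDK receive loop. It assumes port 0 has already been configured and started with the usual rte_eth_dev_configure()/rte_eth_rx_queue_setup()/rte_eth_dev_start() calls; error handling and real packet processing are omitted.

/* Minimal DPDK polling-loop sketch (assumes port 0 is configured & started). */
#include <stdint.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return -1;                       /* EAL init failed */

    /* The PMD core busy-polls the RX queue: no interrupts, no syscalls,
     * no kernel/user copies on the data path. */
    for (;;) {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t nb_rx = rte_eth_rx_burst(0 /* port */, 0 /* queue */,
                                          bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);   /* just drop packets in this sketch */
    }
    return 0;
}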

9

DPDK – performance

To avoid:
● Interrupt handling
  → The kernel's NAPI polling mode is not enough
● Context switching
● Kernel/user data copies
● Syscall overhead

→ More than the time budget for a 64B packet at 14.88 Mpps (see the arithmetic below)
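For reference, 14.88 Mpps is the theoretical maximum rate for 64-byte packets on a 10 Gbit/s link: each frame occupies 64 + 8 (preamble) + 12 (inter-frame gap) = 84 bytes = 672 bits on the wire. The per-packet time budget follows directly (a back-of-the-envelope check, not from the slides):

\[
  \frac{10 \times 10^{9}\ \text{bit/s}}{672\ \text{bit}} \approx 14.88\ \text{Mpps},
  \qquad
  \frac{1}{14.88 \times 10^{6}\ \text{pkt/s}} \approx 67.2\ \text{ns/pkt}
\]

A single syscall, context switch or cache miss can easily consume most of those ~67 ns, which is why the items above must be kept off the fast path.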

10

DPDK – components

[Component diagram; credits: Tim O'Driscoll, Intel]

11

Virtio/Vhost

● Device emulation, direct assignment, VirtIO
● Vhost: in-kernel virtio device emulation
● Allows device emulation code to call directly into kernel subsystems
● Vhost worker thread in the host kernel
● Bypasses system calls from user to kernel space on the host

12

Vhost driver model

[Diagram: the VM runs on QEMU in user space; kvm.ko and the vhost worker live in the host kernel, connected by ioeventfd (guest kick towards vhost) and irqfd (interrupt injection towards the guest)]

13

In-kernel device emulation

● In-kernel part restricted to virtqueue emulation
● QEMU handles the control plane: feature negotiation, migration, etc.
● File descriptor polling done by vhost in the kernel
● Buffers moved between the tap device and the virtqueues by a kernel worker thread

14

Vhost as a user-space interface

● The Vhost architecture is not tied to KVM
● Backend: a Vhost instance in user space
● An eventfd is set up to signal the backend when new buffers are placed by the guest (kickfd)
● An irqfd is set up to signal the guest about new buffers placed by the backend (callfd) (see the sketch below)
● The beauty: the backend only knows about the guest memory mapping, the kick eventfd and the call eventfd
● Vhost-user implemented in DPDK in v16.07
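A minimal sketch of the kick/call signalling described above, assuming the backend has already received the two eventfds over the vhost-user control channel; this covers only the notification path, not the full vhost-user protocol, and backend_poll_once() is an illustrative helper name:

#include <stdint.h>
#include <unistd.h>

/* Called when the kick eventfd becomes readable (e.g. from an epoll loop). */
static void backend_poll_once(int kick_fd, int call_fd)
{
    uint64_t n;

    /* The guest placed new buffers and kicked: consume the eventfd counter. */
    if (read(kick_fd, &n, sizeof(n)) == sizeof(n)) {
        /* ... process the virtqueue: read descriptors from shared guest memory ... */

        /* Signal the call eventfd; KVM's irqfd turns this into a guest interrupt. */
        uint64_t one = 1;
        (void)write(call_fd, &one, sizeof(one));
    }
}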

Challenges

16

Performance

DPDK is about performance, which is a trade-off between:
● Bandwidth → achieving line rate even for small packets
● Latency → as low as possible (of course)
● CPU utilization → $$$

→ Prefer bandwidth & latency at the expense of CPU utilization
→ Take into account HW architectures as much as possible

17

Reliability

0% packet loss
• Some use cases of Virtio cannot afford packet loss, e.g. NFV
• Hard to achieve maximum performance without loss, as Virtio is CPU intensive
  → Scheduling “glitches” may cause packet drops

Migration
• Requires restoration of internal state, including the backend's
• The interface exposed by QEMU must stay unchanged for cross-version migration
• The interface exposed to the guest depends on the capabilities of a third-party application
• Support from the management tool is required

18

Security

• Isolation of untrusted guests
• Direct access to the device from untrusted guests
• Current implementations require a mediator for guest-to-guest communication
• Zero-copy is problematic from a security point of view

New & upcoming features

20

Rx mergeable buffers

Pro:
• Allows receiving packets larger than the descriptors' buffer size

Con:
• Introduces an extra cache miss in the dequeue path

[Diagram: Rx descriptor ring (Desc 0 … Desc n); a packet that fits in one descriptor has num_buffers = 1, while a packet spanning three descriptors has num_buffers = 3; the header carrying this count is shown below]
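The num_buffers count is carried in the virtio-net header that prefixes each received packet when mergeable Rx buffers are negotiated; the layout below is reproduced from the VirtIO specification for reference:

#include <stdint.h>

/* virtio-net header used when VIRTIO_NET_F_MRG_RXBUF is negotiated. */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

struct virtio_net_hdr_mrg_rxbuf {
    struct virtio_net_hdr hdr;
    uint16_t num_buffers;   /* how many descriptor chains the packet spans */
};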

21

Indirect descriptors (DPDK v16.11)

[Diagram: direct descriptor chaining links buffers through the main ring (Desc 0 … Desc n), while with indirect descriptors a single ring entry (Desc 2) points to a separate indirect descriptor table (iDesc 2-0 … iDesc 2-3); the descriptor layout is shown below]
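For reference, the descriptor format involved is the standard VirtIO split-ring descriptor; with the indirect feature, a descriptor whose VRING_DESC_F_INDIRECT flag is set points to a table of these same descriptors instead of directly to packet data (layout as in the VirtIO specification):

#include <stdint.h>

/* VirtIO split-ring descriptor. */
struct vring_desc {
    uint64_t addr;    /* guest-physical address of the buffer (or of the indirect table) */
    uint32_t len;     /* buffer length (or size of the indirect table in bytes) */
    uint16_t flags;   /* VRING_DESC_F_NEXT, VRING_DESC_F_WRITE, VRING_DESC_F_INDIRECT */
    uint16_t next;    /* index of the next descriptor when F_NEXT is set */
};

#define VRING_DESC_F_NEXT      1   /* buffer continues in the 'next' descriptor */
#define VRING_DESC_F_WRITE     2   /* buffer is device write-only */
#define VRING_DESC_F_INDIRECT  4   /* 'addr' points to an indirect descriptor table */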

22

Indirect descriptors (DPDK v16.11)

Pros:
• Increases ring capacity
• Improves performance for a large number of large requests
• Improves 0% packet loss performance even for small requests
  → If the system is not fine-tuned
  → If the Virtio headers are in a dedicated descriptor

Cons:
• One more level of indirection
  → Impacts raw performance (~ -3%)

23

Vhost dequeue 0-copy (DPDK v16.11)

[Diagram: with default dequeuing, the descriptor's buffer is copied (memcpy) into the mbuf's buffer; with zero-copy dequeuing, the mbuf points directly at the descriptor's buffer, so no copy is made. Descriptor fields: addr, len, flags, next; mbuf fields: addr, paddr, len, ...]

24

Vhost dequeue 0-copy (DPDK v16.11)

Pros:
• Big performance improvement for standard & large packet sizes
  → More than +50% for VM-to-VM with iperf benchmarks
• Reduces memory footprint

Cons:
• Performance degradation for small packets
  → But disabled by default (see the registration sketch below)
• Only for VM-to-VM using the Vhost lib API (no PMD support)
• Does not work for VM-to-NIC
  → The mbuf lacks a release notification mechanism / no headroom
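Since the feature is opt-in, a backend enables it when registering its vhost-user socket. A minimal sketch against the DPDK 16.11 vhost library API is shown below; the socket path is just an example, and in releases after 16.11 the header was renamed from rte_virtio_net.h to rte_vhost.h:

#include <rte_virtio_net.h>   /* vhost library header in DPDK 16.11 */

/* Register a vhost-user socket with dequeue zero-copy enabled.
 * The path "/tmp/vhost-user0" is only an example. */
int register_zero_copy_socket(void)
{
    return rte_vhost_driver_register("/tmp/vhost-user0",
                                     RTE_VHOST_USER_DEQUEUE_ZERO_COPY);
}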

25

MTU feature (DPDK v17.05?)

A way for the host to share its maximum supported MTU
• Can be used to set consistent MTU values across the infrastructure
• Can improve performance for small packets (see the sketch below)
  → If the MTU fits in the Rx buffer size, Rx mergeable buffers can be disabled
  → Saves one cache miss when parsing the virtio-net header
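In the VirtIO device model this is exposed through a feature bit and a config-space field; the sketch below shows the relevant layout as defined in the VirtIO specification (config fields are little-endian on the wire):

#include <stdint.h>

#define VIRTIO_NET_F_MTU  3   /* device reports its maximum supported MTU */

/* virtio-net device configuration space; the mtu field is only valid
 * when VIRTIO_NET_F_MTU has been negotiated. */
struct virtio_net_config {
    uint8_t  mac[6];
    uint16_t status;
    uint16_t max_virtqueue_pairs;
    uint16_t mtu;
};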

26

Vhost-pci (DPDK v17.05?)

[Diagram: traditional VM-to-VM communication goes through the host vSwitch (VM1 Virtio ↔ Vhost ↔ vSwitch ↔ Vhost ↔ Virtio VM2); with Vhost-pci, VM1's Vhost-pci device talks directly to VM2's Virtio device]

27

Vhost-pci (DPDK v17.05?)

[Diagram: VM1 runs a Vhost-pci driver on a Vhost-pci device, VM2 runs a Virtio-pci driver on a Virtio-pci device; the two sides (Vhost-pci client / Vhost-pci server) are connected by the Vhost-pci protocol]

28

Vhost-pci (DPDK v17.05?)

Pros:
• Performance improvement
  → The two VMs share the same virtqueues
  → Packets don't go through the host's vSwitch
• No change needed in Virtio's guest drivers

29

Vhost-pci (DPDK v17.05?)

Cons:
• Security
  → The Vhost-pci VM maps the entire memory space of the Virtio-pci VM
  → Could be solved with IOTLB support
• Live migration
  → Not supported in the current version
  → Hard to implement, as the VMs are connected to each other through a socket

30

IOTLB in kernel

[Diagram: the guest's IOMMU driver programs the virtual IOMMU emulated by QEMU; the in-kernel vhost backend keeps a device IOTLB, reporting IOTLB misses (iova) and receiving IOTLB updates/invalidations and TLB invalidations (iova → hva) from QEMU over syscalls/eventfds before accessing guest memory]

31

IOTLB for vhost-user

[Diagram: same flow as the in-kernel case, but the vhost-user backend maintains an IOTLB cache and exchanges IOTLB misses (iova), updates/invalidations and TLB invalidations (iova → hva) with QEMU's virtual IOMMU over the vhost-user Unix socket; see the illustrative sketch below]
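To illustrate the shape of this exchange, here is a purely hypothetical message structure for the miss/update/invalidate flow shown above; the real interfaces are the kernel's vhost IOTLB messages and the vhost-user IOTLB extension, whose exact field names and message codes are not reproduced here:

#include <stdint.h>

/* Hypothetical IOTLB message, for illustration only; not the actual
 * kernel or vhost-user wire format. */
enum iotlb_msg_type {
    IOTLB_MISS,        /* backend -> QEMU: please translate this iova      */
    IOTLB_UPDATE,      /* QEMU -> backend: iova range maps to this hva     */
    IOTLB_INVALIDATE,  /* QEMU -> backend: drop this range from the cache  */
};

struct iotlb_msg {
    enum iotlb_msg_type type;
    uint64_t iova;     /* I/O virtual address seen in the descriptors      */
    uint64_t size;     /* length of the mapping                            */
    uint64_t hva;      /* host virtual address (valid for IOTLB_UPDATE)    */
    uint8_t  perm;     /* read/write permissions                           */
};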

32

Conclusions

• DPDK support for VMs is in active development
• 10 Mpps is real in a VM with DPDK
• New features to boost the performance of VM networking
• Accelerating the transition to NFV / SDN

33

Q / A

THANK YOU
