running neutron at scale - gal sagie & eran gampel - openstack day israel 2016

23
Eran Gampel Gal Sagie Running Neutron At Scale

Upload: openstack-israel

Post on 12-Jan-2017

171 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Eran GampelGal Sagie

Running Neutron At Scale

Page 2: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

OpenStack Challenges• Scalability• Networking does not scale (<500 compute nodes)

• Performance• Networking performance is low (namespace overhead, huge

control plane overhead, …)• Operability• Reference implementation has lots of maintenance problems (e.g.

thousands of concurrent DHCP servers, namespaces, etc.)

Page 3: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Network Node

Compute Node

DHCP Namespace

MetadataProxy

Metdata Agent

L2Agent

L3 Agent

OVS

DVRNamespaces

Linux BridgeLinux BridgeLinux Bridge

RPC

VM

Turning This…Controller Node

Page 4: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Compute Node

Network Node

OVS

VM

Controller Node

Into This

Dragonflow

Controller

Page 5: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Dragonflow Highlights • Integral “Big Tent” project in OpenStack• Designed to Scale with High Throughput and Low Latency• Lightweight and Simple• Easily Extensible• Distributed SDN Control Plane Architecture• Focused on advanced networking services• Distributes Policy Level Abstraction to the Compute Nodes

Page 6: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Neutron-Server

Dragonflow Plugin

DBOVSDragonflow

DBDriver

Compute Node

OVSDragonflow

DBDriver

Compute Node

OVSDragonflowDB

Driver

Compute Node

OVSDragonflowDB

Driver

Compute Node

DB

VM VM..VM VM..

VM VM.. VM VM..

Distributed SDN

Page 7: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

“Under The Hood”

Compute Node Compute Node Compute Node Dragonflow

Network DB

OVS

NeutronServer

Redis

OVSDB-Server

ETCD RethinkDBRAMCloud

Kernel Datapath Module

NIC

User Space

Kernel Space

DB DriversRedis ETCD RethinkDBRMC

Future

Dragonflow PluginRoute Core

API SG

vswitchd

Container

VM Dragonflow ControllerAbstraction Layer

L2 App L3 AppDHCP App

FWaaS LBaaS …FIP/DNAT

Pluggable DB Layer

NB D

B Dr

iver

s

SB DB Drivers

SmartNIC OVSDB

ZooKeeper

ETCD

RMC

Redis

OpenFlow

SG App

ZooKeeper

Zookeeper

Pub/Sub DriversRedis ØMQ

Page 8: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Current Release Features (Mitaka)L2 core API, IPv4, IPv6

GRE/VxLAN/STT/Geneve tunneling protocols Distributed L3 Virtual RouterDistributed DHCP Pluggable Distributed Database

ETCD, RethinkDB, RAMCloud, Redis, ZooKeeperPluggable Publish-Subscribe

ØMQ, RedisSecurity Groups

OVS Flows leveraging connection tracking integrationDistributed DNATSelective Proactive Distribution

Tenant Based

Page 9: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Pluggable Database FrameworkRequirements

HA + ScalabilityDifferent Environments have different requirements

Performance, Latency, Scalability, etc.

Why Pluggable?Long time to productizeMature Open Source alternativesAllow us to focus on the networking services only

Page 10: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Distributed DB

DB Data 3DB Data 2DB Data 1

Full Distribution

Compute Node 1

DragonflowLocal Cache

OVS

Compute Node NDragonflow

OVS

Local Cache

Dragonflow DB DriversRedis ETCD ZookeeperRMC

DB Data 3DB Data 2DB Data 1

DB Data 3DB Data 2DB Data 1

Page 11: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

DistributedDatabase

DB Data 3DB Data 2

DB Data 1

Selective Distribution

Compute Node 1

DragonflowLocal Cache

OVS

DB Data 1

Compute Node NDragonflow

OVS

Local Cache

DB Data 3DB Data 2

Dragonflow DB DriversRedis ETCD ZookeeperRMC

Page 12: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Compute Node

DragonflowLocal Controller

SubscriberRedis ØMQ

Compute Node

DragonflowLocal Controller

SubscriberRedis ØMQ

Compute Node

DragonflowLocal Controller

SubscriberRedis ØMQ

Compute Node

DragonflowLocal Controller

SubscriberRedis ØMQ

Neutron ServerDragonflow

PluginPublisher

Redis ØMQ

Neutron ServerDragonflow

PluginPublisher

Redis ØMQ

Neutron ServerDragonflow

PluginPublisher

Redis ØMQ. . .

DB

. . .

Pluggable Pub/Sub

Page 13: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Example Distributed DHCP

Page 13

Page 14: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Network Node

DHCP namespace

DHCP namespace

DHCP namespace

DHCP namespace

Neutron DHCP Implementation

DHCP namespace

dnsmasq

DHCPAgent

Neutron Server

Message QueueExample• 100 Tenants• 3 vNet / tenant= 300 DHCP Servers

Page 15: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

1 VM Send DHCP_DISCOVER

2 Classify Flow as DHCP, Forward to Controller

3 DHCP App sends DHCP_OFFER back to VM

4 VM Send DHCP_REQUEST

5 Classify Flow as DHCP, Forward to Controller

6 DHCP App populates DHCP_OPTIONS from DB/CFG and send DHCP_ACK

Dragonflow Distributed DHCP

DHCP DISCOVER

VM DHCP SERVER

DHCP OFFER DHCPREQUEST

DHCPACK

13

46

7

Compute Node

Dragonflow

VM

OVS

VM

1 2

br-intqvoXXX qvoXXX

OpenFlow

14

25

7

Dragonflow ControllerAbstraction Layer

L2App

L3App

DHCPApp SG

36

Pluggable DB Layer

DistributedDB

This haseverythingin it

Page 16: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Roadmap

Page 17: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Dragonflow ScalabilityScale(# Compute

Nodes)

today

10,000

2,500

Time to Market

n+2

1,000

Dragonflow& Redis

Dragonflow &RAMCloud &

ØMQOptimized Hybrid

(Reactive/Proactive)Dragonflow4,000

n+1soon

Page 18: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

RoadmapAdditional DB Drivers ZooKeeper, Redis…Selective Proactive DB Pluggable Pub/Sub Mechanism DB Consistency Distributed DNATSecurity GroupNew ApplicationsHierarchical Port Binding (SDN ToR) move to ML2Containers (Kuryr plugin and nested VM support)Topology Service Injection / Service ChainingInter Cloud Connectivity (Border Gateway / L2GW)Optimize Scale and Performance

Page 19: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Rack

ToR

Overlay Virtual Topology

UnderlayHardware

DB

CoreRouterODL

NeutronServer ML2 Mechanism Drivers

ODLDragonflow

Hierarchical Network Management

Page 20: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Newton Release New Applications IGMP Application Distributed Load Balancing (East/West) Brute Force prevention DNS service Distributed Metadata proxy Port Fault Detection

Page 21: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Dragonflow and Containers

Page 22: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Network Services

My Cloud

NATIPS

FW

DHCP

LBL3

L2VPN

Page 23: Running Neutron at Scale - Gal Sagie & Eran Gampel - OpenStack Day Israel 2016

Ride the Dragon!

• Documentation • https://wiki.openstack.org/wiki/Dragonflow

• Bugs & blueprints • https://launchpad.net/dragonflow

• DF IRC channel • #openstack-dragonflow• Weekly on Monday at 0900 UTC in #openstack-meeting-4

(IRC)