running neutron at scale - gal sagie & eran gampel - openstack day israel 2016
TRANSCRIPT
Eran GampelGal Sagie
Running Neutron At Scale
OpenStack Challenges• Scalability• Networking does not scale (<500 compute nodes)
• Performance• Networking performance is low (namespace overhead, huge
control plane overhead, …)• Operability• Reference implementation has lots of maintenance problems (e.g.
thousands of concurrent DHCP servers, namespaces, etc.)
Network Node
Compute Node
DHCP Namespace
MetadataProxy
Metdata Agent
L2Agent
L3 Agent
OVS
DVRNamespaces
Linux BridgeLinux BridgeLinux Bridge
RPC
VM
Turning This…Controller Node
Compute Node
Network Node
OVS
VM
Controller Node
Into This
Dragonflow
Controller
Dragonflow Highlights • Integral “Big Tent” project in OpenStack• Designed to Scale with High Throughput and Low Latency• Lightweight and Simple• Easily Extensible• Distributed SDN Control Plane Architecture• Focused on advanced networking services• Distributes Policy Level Abstraction to the Compute Nodes
Neutron-Server
Dragonflow Plugin
DBOVSDragonflow
DBDriver
Compute Node
OVSDragonflow
DBDriver
Compute Node
OVSDragonflowDB
Driver
Compute Node
OVSDragonflowDB
Driver
Compute Node
DB
VM VM..VM VM..
VM VM.. VM VM..
Distributed SDN
“Under The Hood”
Compute Node Compute Node Compute Node Dragonflow
Network DB
OVS
NeutronServer
Redis
OVSDB-Server
ETCD RethinkDBRAMCloud
Kernel Datapath Module
NIC
User Space
Kernel Space
DB DriversRedis ETCD RethinkDBRMC
Future
Dragonflow PluginRoute Core
API SG
vswitchd
Container
VM Dragonflow ControllerAbstraction Layer
L2 App L3 AppDHCP App
FWaaS LBaaS …FIP/DNAT
Pluggable DB Layer
NB D
B Dr
iver
s
SB DB Drivers
SmartNIC OVSDB
ZooKeeper
ETCD
RMC
Redis
OpenFlow
SG App
ZooKeeper
Zookeeper
Pub/Sub DriversRedis ØMQ
Current Release Features (Mitaka)L2 core API, IPv4, IPv6
GRE/VxLAN/STT/Geneve tunneling protocols Distributed L3 Virtual RouterDistributed DHCP Pluggable Distributed Database
ETCD, RethinkDB, RAMCloud, Redis, ZooKeeperPluggable Publish-Subscribe
ØMQ, RedisSecurity Groups
OVS Flows leveraging connection tracking integrationDistributed DNATSelective Proactive Distribution
Tenant Based
Pluggable Database FrameworkRequirements
HA + ScalabilityDifferent Environments have different requirements
Performance, Latency, Scalability, etc.
Why Pluggable?Long time to productizeMature Open Source alternativesAllow us to focus on the networking services only
Distributed DB
DB Data 3DB Data 2DB Data 1
Full Distribution
Compute Node 1
DragonflowLocal Cache
OVS
Compute Node NDragonflow
OVS
Local Cache
Dragonflow DB DriversRedis ETCD ZookeeperRMC
DB Data 3DB Data 2DB Data 1
DB Data 3DB Data 2DB Data 1
DistributedDatabase
DB Data 3DB Data 2
DB Data 1
Selective Distribution
Compute Node 1
DragonflowLocal Cache
OVS
DB Data 1
Compute Node NDragonflow
OVS
Local Cache
DB Data 3DB Data 2
Dragonflow DB DriversRedis ETCD ZookeeperRMC
Compute Node
DragonflowLocal Controller
SubscriberRedis ØMQ
Compute Node
DragonflowLocal Controller
SubscriberRedis ØMQ
Compute Node
DragonflowLocal Controller
SubscriberRedis ØMQ
Compute Node
DragonflowLocal Controller
SubscriberRedis ØMQ
Neutron ServerDragonflow
PluginPublisher
Redis ØMQ
Neutron ServerDragonflow
PluginPublisher
Redis ØMQ
Neutron ServerDragonflow
PluginPublisher
Redis ØMQ. . .
DB
. . .
Pluggable Pub/Sub
Example Distributed DHCP
Page 13
Network Node
DHCP namespace
DHCP namespace
DHCP namespace
DHCP namespace
Neutron DHCP Implementation
DHCP namespace
dnsmasq
DHCPAgent
Neutron Server
Message QueueExample• 100 Tenants• 3 vNet / tenant= 300 DHCP Servers
1 VM Send DHCP_DISCOVER
2 Classify Flow as DHCP, Forward to Controller
3 DHCP App sends DHCP_OFFER back to VM
4 VM Send DHCP_REQUEST
5 Classify Flow as DHCP, Forward to Controller
6 DHCP App populates DHCP_OPTIONS from DB/CFG and send DHCP_ACK
Dragonflow Distributed DHCP
DHCP DISCOVER
VM DHCP SERVER
DHCP OFFER DHCPREQUEST
DHCPACK
13
46
7
Compute Node
Dragonflow
VM
OVS
VM
1 2
br-intqvoXXX qvoXXX
OpenFlow
14
25
7
Dragonflow ControllerAbstraction Layer
L2App
L3App
DHCPApp SG
36
Pluggable DB Layer
DistributedDB
This haseverythingin it
Roadmap
Dragonflow ScalabilityScale(# Compute
Nodes)
today
10,000
2,500
Time to Market
n+2
1,000
Dragonflow& Redis
Dragonflow &RAMCloud &
ØMQOptimized Hybrid
(Reactive/Proactive)Dragonflow4,000
n+1soon
RoadmapAdditional DB Drivers ZooKeeper, Redis…Selective Proactive DB Pluggable Pub/Sub Mechanism DB Consistency Distributed DNATSecurity GroupNew ApplicationsHierarchical Port Binding (SDN ToR) move to ML2Containers (Kuryr plugin and nested VM support)Topology Service Injection / Service ChainingInter Cloud Connectivity (Border Gateway / L2GW)Optimize Scale and Performance
Rack
ToR
Overlay Virtual Topology
UnderlayHardware
DB
CoreRouterODL
NeutronServer ML2 Mechanism Drivers
ODLDragonflow
Hierarchical Network Management
Newton Release New Applications IGMP Application Distributed Load Balancing (East/West) Brute Force prevention DNS service Distributed Metadata proxy Port Fault Detection
Dragonflow and Containers
Network Services
My Cloud
NATIPS
FW
DHCP
LBL3
L2VPN
Ride the Dragon!
• Documentation • https://wiki.openstack.org/wiki/Dragonflow
• Bugs & blueprints • https://launchpad.net/dragonflow
• DF IRC channel • #openstack-dragonflow• Weekly on Monday at 0900 UTC in #openstack-meeting-4
(IRC)