open stack advanced_part

Network internals (advanced parts) Giuliano Santandrea – CIRI ICT University of Bologna


Post on 13-Jul-2015


Page 1: Open stack advanced_part

Network internals (advanced parts)

Giuliano Santandrea – CIRI ICT

University of Bologna

Page 2: Open stack advanced_part

● Internal-external VLAN translation

● packet captures

● Security groups

● routing

Page 3: Open stack advanced_part

During VM creation, these elements are created on the compute node:

◦ qbrZZZ: Linux bridge (LB) and its management interface

◦ qvbZZZ: end of the veth pair connected to the LB

◦ qvoZZZ: end of the veth pair connected to the OVS bridge “br-int”

◦ tapZZZ: tap interface, connected to the LB

ZZZ: first 11 characters of the Neutron port ID for the VM interface
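The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name `device_names` and the sample port ID are hypothetical, and only the first 11 characters of the ID (taken from the capture slides) matter.

```python
# Sketch: derive the per-VM device names from a Neutron port ID.
# PREFIXES lists the four element types named on the slide.
PREFIXES = ("qbr", "qvb", "qvo", "tap")


def device_names(port_id: str) -> dict:
    """Return the compute-node device names for one VM port."""
    suffix = port_id[:11]  # ZZZ: first 11 characters of the port ID
    return {prefix: prefix + suffix for prefix in PREFIXES}


# Hypothetical port ID; only its first 11 characters are real.
names = device_names("71cbe0bd-6f00-4000-8000-000000000000")
for role, name in names.items():
    print(role, name)
```

With this input the `qvb` entry is `qvb71cbe0bd-6f`, the interface captured with tcpdump later in the deck.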

Page 4: Open stack advanced_part

Subnet creation (network node):

◦ tap-YYY: tap interface connected to br-int, inside a network namespace (YYY is the first 11 characters of the port ID of the DHCP server)

Router creation (network node):

◦ tap-AAA: tap interface connected to br-ex, inside a network namespace (AAA is the first 11 characters of the port ID of the router gateway)

◦ tap-BBB: tap interface connected to br-int, inside a network namespace (BBB is the first 11 characters of the port ID of the router internal port)

Page 5: Open stack advanced_part

On the physical data network, many network virtualization technologies are possible (VLAN, VXLAN, GRE, ...).

Internally, OpenStack (OS) maps each virtual network to an internal VLAN.

The Cesena cluster uses VLANs. The bridges in the VNI are configured to translate between external and internal VLANs.

Another example is GRE encapsulation:

◦ for packets directed to the data network, the bridges remove the internal VLAN tag and encapsulate the packets with a tunnel_id

Page 6: Open stack advanced_part

[Figure: cluster topology. Nodes: Controller, Network node, CPU node 1 (hosting the VM). Networks: public net (with gateway to the Internet), External net, Mgmt net, Data net. Bridges: br-data, br-int, br-ex, Linux bridge. Link labels: untagged, internal VLAN tag, external VLAN tag.]

Page 7: Open stack advanced_part

[Figure labels: network namespaces, each with specific routing tables; two DHCP servers; one link marked "No traffic here".]

Page 8: Open stack advanced_part

[Figure labels: VM, eth0; port-based VLAN access ports (internal VLAN); trunk ports carrying all VLANs.]

Page 9: Open stack advanced_part

[Figure labels: trunk ports carrying all VLANs; port-based VLAN access ports (internal VLAN).]

Page 10: Open stack advanced_part

[Figure labels: VM, eth0]

TCAM (OpenFlow rules):

priority=4,in_port=8,dl_vlan=1 actions=mod_vlan_vid:1000,NORMAL
priority=2,in_port=8 actions=drop
priority=1 actions=NORMAL

• For all packets coming from phy-br-data with tag=1: change the tag to 1000, then do classic MAC Learning Switching (MLS)

• Discard all other packets coming from phy-br-data

• Otherwise MLS (lowest priority)

VLAN 1 => 1000

Page 11: Open stack advanced_part

[Figure labels: VM, eth0]

priority=3,in_port=17,dl_vlan=1000 actions=mod_vlan_vid:1,NORMAL
priority=2,in_port=17 actions=drop
priority=1 actions=NORMAL

VLAN 1000 => 1
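The priority-based matching behind these rules can be illustrated with a toy flow table in Python. This is a minimal sketch, not how OVS is implemented: the `Flow` class and `apply_flows` function are hypothetical, the NORMAL action is reduced to "forward with the current tag", and the assumption that the in_port=17 rules handle traffic arriving from the data network is mine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flow:
    priority: int
    in_port: Optional[int] = None   # None matches any port
    dl_vlan: Optional[int] = None   # None matches any VLAN (or untagged)
    set_vlan: Optional[int] = None  # target of mod_vlan_vid, if any
    drop: bool = False


# Rules copied from the two slides above.
TO_DATA_NET = [
    Flow(priority=4, in_port=8, dl_vlan=1, set_vlan=1000),
    Flow(priority=2, in_port=8, drop=True),
    Flow(priority=1),
]
# Assumption: these rules handle traffic arriving from the data network.
FROM_DATA_NET = [
    Flow(priority=3, in_port=17, dl_vlan=1000, set_vlan=1),
    Flow(priority=2, in_port=17, drop=True),
    Flow(priority=1),
]


def apply_flows(table: List[Flow], in_port: int, vlan: Optional[int]) -> Optional[int]:
    """Return the VLAN tag after the highest-priority matching flow,
    or None if the packet is dropped."""
    for flow in sorted(table, key=lambda f: f.priority, reverse=True):
        if flow.in_port not in (None, in_port):
            continue
        if flow.dl_vlan not in (None, vlan):
            continue
        if flow.drop:
            return None
        return flow.set_vlan if flow.set_vlan is not None else vlan
    return None  # no rule matched


print(apply_flows(TO_DATA_NET, in_port=8, vlan=1))       # internal 1 -> external 1000
print(apply_flows(FROM_DATA_NET, in_port=17, vlan=1000)) # external 1000 -> internal 1
print(apply_flows(TO_DATA_NET, in_port=8, vlan=2))       # unknown tag: dropped
```

The middle rule (priority 2) is what keeps traffic from unmapped VLANs from leaking between the internal and external tag spaces.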

Page 12: Open stack advanced_part
Page 13: Open stack advanced_part

[Figure labels: VM, eth0; untagged; internal VLAN tag; external VLAN tag (twice).]

Page 14: Open stack advanced_part

[Figure labels: internal VLAN tag; external VLAN tag (twice); "No traffic here!" (twice); untagged.]

Page 15: Open stack advanced_part

On the VM, we send a broadcast Ethernet frame (ARP):

◦ sudo arping -bI eth0 10.0.0.9

Broadcasting bypasses the MAC learning of the bridges: each bridge forwards the frame to every port!

On the cluster node:

◦ tcpdump -nnvei interface

Or, if the interface is in a netns:

◦ sudo ip netns exec <netns> bash #enter the netns

◦ tcpdump -nnvlei interface #-l makes the output line-buffered

Page 16: Open stack advanced_part
Page 17: Open stack advanced_part

stack@hc01:~/devstack$ sudo tcpdump -nnvei qvb71cbe0bd-6f
tcpdump: WARNING: qvb71cbe0bd-6f: no IPv4 address assigned
tcpdump: listening on qvb71cbe0bd-6f, link-type EN10MB (Ethernet), capture size 65535 bytes
18:44:23.752905 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
18:44:24.752998 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

NO VLAN TAG!!!

Page 18: Open stack advanced_part
Page 19: Open stack advanced_part

root@hc01:/opt/stack# sudo tcpdump -nnvei int-br-data
tcpdump: WARNING: int-br-data: no IPv4 address assigned
tcpdump: listening on int-br-data, link-type EN10MB (Ethernet), capture size 65535 bytes
18:46:41.212436 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
18:46:42.212633 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Page 20: Open stack advanced_part
Page 21: Open stack advanced_part

root@hc01:/opt/stack# sudo tcpdump -nnvei eth0
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:49:57.241431 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1000, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
18:49:58.020910 d0:7e:28:90:d9:4b > 01:80:c2:00:00:00, 802.3, length 64: LLC, dsap STP (0x42) Individual, ssap STP (0x42) Command, ctrl 0x03: STP 802.1w, Rapid STP, Flags [Forward], bridge-id 8000.d0:7e:28:90:d9:3d.800d, length 47
    message-age 0.00s, max-age 20.00s, hello-time 2.00s, forwarding-delay 15.00s
    root-id 8000.d0:7e:28:90:d9:3d, root-pathcost 0, port-role Designated
18:49:58.241620 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1000, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel

Page 22: Open stack advanced_part
Page 23: Open stack advanced_part

root@hc01:~# tcpdump -nnvei tap8356e24c-67
tcpdump: listening on tap8356e24c-67, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
11:39:28.424480 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
11:39:29.424638 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
11:39:30.424733 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.9 (ff:ff:ff:ff:ff:ff) tell 10.0.0.66, length 28
3 packets captured
3 packets received by filter
0 packets dropped by kernel

NO VLAN TAG!!!
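The captures above differ only in the 802.1Q header: no tag next to the Linux bridge and the tap, VLAN 1 on int-br-data, VLAN 1000 on eth0. A small Python sketch can pull the tag out of `tcpdump -e` lines when comparing many captures. The regular expression is an assumption based on the output format shown in these slides, and the helper name `vlan_of` is hypothetical.

```python
import re
from typing import Optional

# Matches the 802.1Q part of a `tcpdump -e` line as printed in the captures above.
VLAN_RE = re.compile(r"ethertype 802\.1Q \(0x8100\), length \d+: vlan (\d+),")


def vlan_of(line: str) -> Optional[int]:
    """Return the VLAN ID printed by tcpdump, or None for an untagged frame."""
    match = VLAN_RE.search(line)
    return int(match.group(1)) if match else None


# One line from each capture point (payload details shortened).
SAMPLES = {
    "qvb71cbe0bd-6f": "18:44:23.752905 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, "
                      "ethertype ARP (0x0806), length 42: Ethernet (len 6)",
    "int-br-data": "18:46:41.212436 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, "
                   "ethertype 802.1Q (0x8100), length 46: vlan 1, p 0, ethertype ARP",
    "eth0": "18:49:57.241431 fa:16:3e:e6:9e:f8 > ff:ff:ff:ff:ff:ff, "
            "ethertype 802.1Q (0x8100), length 46: vlan 1000, p 0, ethertype ARP",
}

for interface, line in SAMPLES.items():
    print(interface, vlan_of(line))
```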

Page 24: Open stack advanced_part
Page 25: Open stack advanced_part

Secgroup: contains firewall rules configured by the user (at the cloud platform level)

◦ During VM creation we associate one or more secgroups with the VM

It is «default deny»: we can add rules to allow ingress traffic

The default secgroup already contains rules allowing egress traffic

Implementation: iptables rules on the CPU (compute) node

Note: Neutron implements this by applying the native kernel filtering functions (netfilter) to bridged tap interfaces, and this works only with LBs. For this reason, an additional LB is needed as an intermediate element to connect the tap interface to the integration bridge.

Page 26: Open stack advanced_part

iptables rules (global namespace) on the Linux bridge port

[Figure labels: VM, eth0]

Page 27: Open stack advanced_part

We have enabled SSH and ping for ingress traffic

Page 28: Open stack advanced_part

For all packets entering the LB, passing through the tap (outbound VM traffic, EGRESS), use the following chains (iptables filter table in the global netns of the compute node):

neutron-openvswi-sg-chain

neutron-openvswi-oXXX

neutron-openvswi-FORWARD

FORWARD

Source: http://goo.gl/lD30Vl

[Figure labels: VM, eth0]

Page 29: Open stack advanced_part

… and for packets exiting the LB through the tap (inbound VM traffic, INGRESS):

neutron-openvswi-sg-chain

neutron-openvswi-iXXX

neutron-openvswi-FORWARD

FORWARD

Source: http://goo.gl/lD30Vl

[Figure labels: VM, eth0]

Page 30: Open stack advanced_part

We enabled SSH (TCP port 22) and ping (ICMP), so we can see these rules:

-A neutron-openvswi-sg-chain -m physdev --physdev-out tapb5d4535b-8f --physdev-is-bridged -j neutron-openvswi-ib5d4535b-8
-A neutron-openvswi-ib5d4535b-8 -m state --state INVALID -j DROP
-A neutron-openvswi-ib5d4535b-8 -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-ib5d4535b-8 -p tcp -m tcp --dport 22 -j RETURN
-A neutron-openvswi-ib5d4535b-8 -p icmp -j RETURN
-A neutron-openvswi-ib5d4535b-8 -s 192.168.101.2/32 -p udp -m udp --sport 67 --dport 68 -j RETURN
-A neutron-openvswi-ib5d4535b-8 -j neutron-openvswi-sg-fallback
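iptables evaluates the per-port ingress chain top to bottom, and the first matching rule decides. The Python sketch below mirrors the rules listed above for the chain `neutron-openvswi-ib5d4535b-8`. The `Packet` class and `ingress_verdict` function are hypothetical helpers, and the connection state is passed in by hand rather than tracked. The sketch only reports that unmatched traffic reaches `neutron-openvswi-sg-fallback`; that this chain then drops the packet is an assumption consistent with the «default deny» policy, not something shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    proto: str           # "tcp", "udp" or "icmp"
    src: str             # source IPv4 address
    sport: int = 0
    dport: int = 0
    state: str = "NEW"   # connection-tracking state, supplied by the caller


def ingress_verdict(pkt: Packet) -> str:
    """Walk the rules of the ingress chain in order; first match wins."""
    if pkt.state == "INVALID":
        return "DROP"
    if pkt.state in ("RELATED", "ESTABLISHED"):
        return "RETURN"                      # reply traffic is allowed
    if pkt.proto == "tcp" and pkt.dport == 22:
        return "RETURN"                      # SSH rule added by the user
    if pkt.proto == "icmp":
        return "RETURN"                      # ping rule added by the user
    if (pkt.proto == "udp" and pkt.src == "192.168.101.2"
            and pkt.sport == 67 and pkt.dport == 68):
        return "RETURN"                      # DHCP replies from the DHCP server
    return "neutron-openvswi-sg-fallback"    # nothing matched


print(ingress_verdict(Packet("tcp", "10.250.0.9", sport=40000, dport=22)))
print(ingress_verdict(Packet("tcp", "10.250.0.9", sport=40000, dport=80)))
```

RETURN hands the packet back to the calling chain, so in this model it stands for "allowed by the security group".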

Page 31: Open stack advanced_part
Page 32: Open stack advanced_part
Page 33: Open stack advanced_part
Page 34: Open stack advanced_part

The admin creates a provider network with the allocation pool 10.250.0.50-10.250.0.70 (21 addresses, counting both endpoints)

It is attached to a virtual router

The virtual router is attached to a user private network

The router:

◦ has an address on the provider network (10.250.0.50)

◦ has an address on the user network (192.168.101.1)

◦ performs NAT
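The size of an allocation pool is easy to get wrong by one, because both endpoints belong to the pool. A short check with Python's standard `ipaddress` module (the helper name `pool_size` is hypothetical):

```python
import ipaddress


def pool_size(first: str, last: str) -> int:
    """Number of addresses in an inclusive allocation pool."""
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    return end - start + 1


print(pool_size("10.250.0.50", "10.250.0.70"))  # → 21
```

The first address of the pool, 10.250.0.50, is the one taken by the router gateway port.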

Page 35: Open stack advanced_part
Page 36: Open stack advanced_part
Page 37: Open stack advanced_part
Page 38: Open stack advanced_part

sudo ip netns exec qrouter-XXX bash

ip address show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
27: qr-8110d0f8-64: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:98:f7:dd brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.1/24 brd 192.168.101.255 scope global qr-8110d0f8-64
    inet6 fe80::f816:3eff:fe98:f7dd/64 scope link
       valid_lft forever preferred_lft forever
29: qg-64643e7a-3e: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:e9:46:1a brd ff:ff:ff:ff:ff:ff
    inet 10.250.0.50/24 brd 10.250.0.255 scope global qg-64643e7a-3e
    inet6 fe80::f816:3eff:fee9:461a/64 scope link
       valid_lft forever preferred_lft forever

Page 39: Open stack advanced_part

ip route show

default via 10.250.0.3 dev qg-64643e7a-3e
10.250.0.0/24 dev qg-64643e7a-3e proto kernel scope link src 10.250.0.50
192.168.101.0/24 dev qr-8110d0f8-64 proto kernel scope link src 192.168.101.1

sudo iptables -t nat -nvL

Chain neutron-l3-agent-snat (1 references)
 pkts bytes target                       prot opt in  out  source            destination
   12   882 neutron-l3-agent-float-snat  all  --  *   *    0.0.0.0/0         0.0.0.0/0
    6   426 SNAT                         all  --  *   *    192.168.101.0/24  0.0.0.0/0     to:10.250.0.50
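What the router namespace does with an outgoing packet can be sketched in Python using the routing table and the SNAT rule shown above. This is a simplified model under stated assumptions: `lookup` and `snat` are hypothetical helpers, route selection is reduced to longest-prefix match, and floating IPs (the neutron-l3-agent-float-snat chain) are ignored.

```python
import ipaddress

# (prefix, output device, next hop) copied from `ip route show` above.
ROUTES = [
    ("0.0.0.0/0", "qg-64643e7a-3e", "10.250.0.3"),
    ("10.250.0.0/24", "qg-64643e7a-3e", None),
    ("192.168.101.0/24", "qr-8110d0f8-64", None),
]

USER_NET = ipaddress.ip_network("192.168.101.0/24")
GATEWAY_IP = "10.250.0.50"  # router address on the provider network


def lookup(dst: str):
    """Longest-prefix match: return (device, next hop) for a destination."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), dev, gw)
        for prefix, dev, gw in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The default route always matches, so `matches` is never empty.
    _, dev, gw = max(matches, key=lambda m: m[0].prefixlen)
    return dev, gw


def snat(src: str) -> str:
    """Rewrite the source address the way the SNAT rule does."""
    if ipaddress.ip_address(src) in USER_NET:
        return GATEWAY_IP
    return src


# A VM at 192.168.101.7 (hypothetical address) talks to an Internet host.
print(lookup("8.8.8.8"), snat("192.168.101.7"))
```

Traffic towards the Internet leaves through the qg- interface via 10.250.0.3 with source 10.250.0.50, while traffic for the user network stays on the qr- interface.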

Page 40: Open stack advanced_part

In some old OpenStack docs, «br-data» is called «br-ethX»

When GRE tunnels are used, the bridge br-data is called br-tun

Page 41: Open stack advanced_part

Provider networks can currently be created only via CLI

◦ Creating a provider network requires specifying the physical network (mapped to a virtual bridge, which is connected to a physical network)

The netns and DHCP server are not created when the network is defined, but only when a VM is created on that network

Page 42: Open stack advanced_part

To ensure connectivity to a VM:

● The tenant user that booted the VM must have enabled the access by inserting the appropriate rules in the secgroups and then attaching the secgroup to the VM

● neutron-plugin must have inserted the correct OpenFlow rules in the OVS bridges (br-int, br-data, br-ex)

● The “dnsmasq” linux process (managed by neutron-dhcp) must be working properly as DHCP server for the VM