Analysis and experimental evaluation of data plane virtualization with Xen
游清權
Outline
• Introduction
• Virtual Network with Xen
– Data path in Xen
– Routers data plane virtualization with Xen
– Performance problem statement
• Experiments and Analysis
• Related work
• Conclusion and perspectives
Introduction
• System virtualization provides:
– Isolation
– Mobility
– Dynamic reconfiguration
– Fault tolerance of distributed systems
– Increased security due to the isolation
Introduction
• Virtualization could potentially solve the main issues of today's Internet (security, mobility, reliability, configurability)
– at the cost of overhead due to the additional layers
• Virtual machines share resources such as:
– the network interfaces
– the processors
– the memory (buffer space)
– the switching fabric
• It is a challenge to obtain predictable, stable and optimal performance!
Virtual Network with Xen
• Data path in Xen
Data path in Xen
• VMs in Xen access the network hardware through the virtualization layer
• Each domU has a virtual interface for each physical network interface
• A virtual interface is accessed via a split device driver (a frontend driver in domU, a backend driver in dom0)
1.Data path in Xen:
1. Network packets are emitted on a VM
2. They are copied to a segment of shared memory by the Xen hypervisor and transmitted to dom0
3. Packets are bridged (path 1) or routed (path 2) between the virtual interfaces and the physical ones
• The dashed line shows the additional path a packet takes
• Overhead (see the toy sketch after this list):
– the copy through shared memory
– multiplexing and demultiplexing
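To make the split-driver handoff concrete, here is a toy Python model of this path: a shared queue stands in for the shared-memory segment, and a dom0 backend demultiplexes packets from several domUs onto one physical NIC queue. All names are invented for the example; the real netfront/netback drivers use grant tables and event channels, not Python lists.

```python
# Toy model of Xen's split network driver (illustrative only).
from collections import deque

shared_ring: deque = deque()  # stands in for the shared-memory segment

class Frontend:
    """Virtual interface in a domU: copies packets into shared memory."""
    def __init__(self, vif_id: int):
        self.vif_id = vif_id

    def send(self, payload: bytes) -> None:
        shared_ring.append((self.vif_id, payload))  # the extra copy

class Backend:
    """Driver in dom0: multiplexes all virtual interfaces onto the NIC."""
    def __init__(self):
        self.nic_queue: list[bytes] = []

    def drain(self) -> None:
        while shared_ring:
            vif_id, payload = shared_ring.popleft()
            # the bridging (path 1) or routing (path 2) decision between
            # the virtual and physical interfaces would happen here
            self.nic_queue.append(payload)

# Two domUs sharing one physical interface through dom0:
fe1, fe2, be = Frontend(1), Frontend(2), Backend()
fe1.send(b"packet from domU 1")
fe2.send(b"packet from domU 2")
be.drain()
print(len(be.nic_queue), "packets forwarded by dom0")
```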
2.Routers data plane virtualization with Xen
• Xen can be used for fully (i.e. control plane and data plane) virtualized software routers
• Figure 2: architecture with software routers loaded into two virtual machines to create virtual routers
• The VMs have no direct access to the physical hardware interfaces
• Packets are forwarded between each virtual interface and the corresponding physical interface (multiplexing and demultiplexing)
3.Performance problem statement
• Efficiency is defined in terms of throughput
• Fairness of the inter-virtual-machine resource sharing is derived from the classical Jain index [6]:
– n: the number of VMs sharing the physical resources
– Xi: the metric achieved by each virtual machine i
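The classical Jain index is:

\[
J(X_1,\dots,X_n) = \frac{\left(\sum_{i=1}^{n} X_i\right)^{2}}{n \sum_{i=1}^{n} X_i^{2}}
\]

It equals 1 when every VM achieves the same value of the metric and approaches 1/n in the most unfair case.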
Experiments and Analysis
• 1. Experimental setup
• All experiments are executed on the fully controlled, reservable and reconfigurable French national testbed Grid'5000 [4].
• End-hosts are IBM eServer 325 machines:
– two AMD Opteron 246 CPUs (2.0 GHz / 1 MB cache), one core each
– 2 GB of memory and a 1 Gb/s NIC
Experiments and Analysis
• Virtual routers are hosted on IBM eServer 326m machines:
– two AMD Opteron 246 CPUs (2.0 GHz / 1 MB cache), one core each
– 2 GB of memory and two 1 Gb/s NICs
• Xen 3.1.0 and 3.2.1, with the modified 2.6.18-3 and 2.6.18.8 Linux kernels respectively
• Measurement tools (a small driver sketch follows):
– iperf for TCP throughput
– netperf for UDP rate
– xentop for CPU utilization
– the classical ping utility for latency
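A minimal sketch of how these tools could be driven from one script; the hostname argument and durations are illustrative assumptions, and the matching servers (iperf -s, netserver) are assumed to already run on the target. xentop runs in dom0 and reports per-domain CPU utilization.

```python
# Sketch of a measurement driver around the slides' toolchain.
import subprocess

def tcp_throughput(server: str, seconds: int = 10) -> str:
    """TCP throughput with iperf (client side)."""
    return subprocess.run(["iperf", "-c", server, "-t", str(seconds)],
                          capture_output=True, text=True, check=True).stdout

def udp_rate(server: str) -> str:
    """UDP send/receive rate with netperf."""
    return subprocess.run(["netperf", "-H", server, "-t", "UDP_STREAM"],
                          capture_output=True, text=True, check=True).stdout

def latency(host: str, count: int = 10) -> str:
    """Round-trip latency with the classical ping utility."""
    return subprocess.run(["ping", "-c", str(count), host],
                          capture_output=True, text=True, check=True).stdout
```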
Experiments and Analysis
• Evaluation of virtual end-hosts
– Network performance on virtual end-hosts implemented with Xen 3.1 and Xen 3.2
– Some results with Xen 3.1 were not satisfying, dom0 being the bottleneck
– A second run on Xen 3.1, attributing more CPU time to dom0 (up to 32 times the share attributed to a domU), is called Xen 3.1a
Sending performance
• First experiment:
– TCP sending throughput on 1, 2, 4 and 8 virtual hosts
– Figure 3: throughput per VM and aggregate throughput
– With 3.1 and 3.2, throughput is close to the classical Linux throughput Rclassical(T/R) = 938 Mb/s
– With 3.1a and 3.2, the aggregate throughput obtained by the VMs is somewhat higher than with 3.1
Sending performance
• Conclusion in the three cases:
– the system is efficient and predictable (in throughput)
• The throughput per VM corresponds to the fair share of the available bandwidth of the link (Rtheoretical/N); a worked example follows.
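As a quick worked example (taking the classical rate from the previous slide as the theoretical rate of the 1 Gb/s link):

\[
\frac{R_{\text{theoretical}}}{N} \approx \frac{938\ \text{Mb/s}}{8} \approx 117\ \text{Mb/s per VM}\quad (N = 8)
\]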
Sending performance
• The average CPU utilization for each guest domain is shown in Figure 4
• For a single domU:
– the two CPUs are used at around 50% in all three setups (Xen 3.1, 3.1a and 3.2)
• On a Linux system without virtualization:
– only Cclassical(E) = 32% of both CPUs is in use
• With 8 domUs:
– both CPUs are used at over 70%
Sending performance
• 3.1a: increasing dom0's CPU weight
• Even though virtualization introduces a processing overhead, two processors allow achieving a throughput equivalent to the maximum theoretical throughput with 8 concurrent VMs on a 1 Gb/s link
• The fairness index is here close to 1 (bandwidth and CPU time are fairly shared); see the sketch below
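A minimal sketch of the two metrics in Python. The throughput vector is invented for illustration, and the exact efficiency definition (aggregate VM throughput over the classical Linux rate) is an assumption consistent with the numbers on the receiving-performance slides:

```python
# Sketch of the fairness/efficiency metrics from the problem statement.
# The example throughput vector is illustrative, not measured data.

def jain_index(x: list[float]) -> float:
    """Classical Jain fairness index: 1 = perfectly fair, 1/n = worst case."""
    n = len(x)
    return sum(x) ** 2 / (n * sum(v * v for v in x))

def throughput_efficiency(x: list[float], r_classical: float) -> float:
    """Assumed definition: aggregate VM throughput over the classical rate."""
    return sum(x) / r_classical

# Example: 8 VMs sharing a ~938 Mb/s link almost fairly (values in Mb/s).
vms = [118, 117, 116, 118, 117, 117, 116, 118]
print(f"Jain index: {jain_index(vms):.4f}")                  # ~1.0 -> fair
print(f"Efficiency: {throughput_efficiency(vms, 938):.3f}")  # ~1.0
```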
2.Receiving performance
• Figure 5
• Xen 3.1: the aggregate throughput is slightly reduced, depending on the number of VMs
– only 882 Mb/s on a single domU
– only 900 Mb/s on a set of 8 concurrent domUs
– which corresponds to around 95% of the throughput Rclassical(T/R) = 938 Mb/s of a classical Linux system
Receiving performance
• The efficiency Ethroughput varies between 0.94 for a single domU and 0.96 for 8 domUs (worked out below)
• By changing scheduler parameters (Xen 3.1a), the aggregate throughput improves to about 970 Mb/s on 8 virtual machines
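These values are consistent with the throughput figures above, assuming Ethroughput is the aggregate throughput divided by Rclassical:

\[
\frac{882}{938} \approx 0.94\ \ (1\ \text{domU}), \qquad \frac{900}{938} \approx 0.96\ \ (8\ \text{domUs})
\]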
Receiving performance
• With Xen 3.1, bandwidth sharing between the domUs is very unfair (and worsens with a growing number of domUs)
• This comes from an unfair treatment of the events and has been fixed in Xen 3.2
• Simply providing dom0 with more CPU time (3.1a) improves fairness in Xen 3.1 by giving dom0 enough time to treat all the events
Receiving performance
• Fair resource sharing makes performance much more predictable
• Xen 3.2 behaves similarly to Xen 3.1a
– throughput increases by about 6% (compared to the default 3.1 version)
Receiving performance
• The total CPU cost varies between 70% and 75% (Xen 3.1 and 3.2)
– an important overhead compared to a Linux system without virtualization, where network reception takes Cclassical(R) = 24%
• Notice that on default Xen 3.1:
– the efficiency in terms of throughput decreases, yet the available CPU time is not entirely consumed
– and the sharing is unfair
04/13/23 Distributed System Lab 26
Receiving performance
• The proposal (3.1a) improves fairness but increases CPU usage
• Xen 3.2:
– domU CPU sharing is fair (dom0's CPU usage decreases slightly)
– less total CPU overhead, while achieving better throughput
• Conclusion: important improvements have been implemented in Xen 3.2 to decrease the excessive dom0 CPU overhead
04/13/23 Distributed System Lab 27
Receiving performance
04/13/23 Distributed System Lab 28
3.Evaluation of virtual routers
• Forwarding performance of virtual routers with 2 NICs:
– UDP receiving throughput over the VMs
– maximum-sized packets are sent at maximum link speed over the virtual routers and the TCP throughput is measured
– latency over the virtual routers is also measured
• Xen 3.2a:
– Xen 3.2 in its default configuration, but with an increased weight parameter for dom0 in the CPU scheduler
3.Evaluation of virtual routers
• Forwarding performance
3.Evaluation of virtual routers
• Performance of virtual routers:
– UDP traffic is generated over one or several virtual routers (1 to 8) sharing a single physical machine
– with maximum-sized (1500-byte) and minimum-sized (64-byte) packets
• Figure 7: obtained UDP bit rate and TCP throughput
• Expected packet loss rate with maximum-sized packets on each VM: 1 − Rtheoretical/(N × Rtheoretical) = 1 − 1/N, since each of the N VMs sends at full rate but only a fair share gets through (see the example below)
• Classical Linux router: Rclassical(F) = 957 Mb/s
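For example, with N = 8 virtual routers all sending at full link rate, the expected loss per flow is:

\[
1 - \frac{1}{N} = 1 - \frac{1}{8} = 87.5\%
\]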
3.Evaluation of virtual routers
• Details of the UDP packet rates and loss rates per domU with maximum- and minimum-sized packets (see the packet-rate sketch below)
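A back-of-the-envelope sketch of why minimum-sized packets overload dom0 much faster: the per-packet cost in dom0 is roughly constant, while the packet rate at full link speed is about 18 times higher for 64-byte frames than for 1500-byte payloads. The Ethernet overhead constants are standard; the rest is illustrative:

```python
# Theoretical packet rates on a 1 Gb/s link for the two packet sizes
# used in the experiments (64 B minimum frames vs 1500 B payloads).

LINK_BPS = 1_000_000_000  # 1 Gb/s
OVERHEAD = 8 + 12         # preamble + inter-frame gap, in bytes

def max_packet_rate(frame_bytes: int) -> float:
    """Maximum packets per second at full link rate."""
    return LINK_BPS / (8 * (frame_bytes + OVERHEAD))

# 1500 B payload + 18 B Ethernet header/CRC; 64 B is already a full frame.
print(f"1500 B payload: {max_packet_rate(1500 + 18):>12,.0f} pps")  # ~81,274
print(f"64 B frames:    {max_packet_rate(64):>12,.0f} pps")        # ~1,488,095
```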
3.Evaluation of virtual routers
• The aggregate UDP rate is in some cases a bit higher than the theoretical value
– due to small variations in the start times of the different flows
• Resource sharing is fair
– the performance of this setup is predictable
• With minimum-sized packets on 4 or 8 virtual routers, dom0 becomes too overloaded
• Giving a bigger CPU share to dom0 (Xen 3.2a) increases the overall TCP throughput
3.Evaluation of virtual routers
• Virtual router (VR) latency:
– concurrent virtual routers sharing the same physical machine are either idle or stressed forwarding TCP flows at maximum rate
Related work
• Performance of virtual packet transmission in Xen is a crucial subject and has been treated in several papers
Conclusion and perspectives
• Virtualization mechanisms are costly:
– the additional copy
– the I/O scheduling of virtual machines sharing the physical devices
• Virtualizing the data plane by forwarding packets in domU is becoming an increasingly promising approach
Conclusion and perspectives
• End-host throughput improved in Xen 3.2 compared to 3.1
• Virtual routers behave similarly to classical Linux routers when forwarding big packets
• Latency is impacted by the number of concurrent virtual routers
• Our next goal is to evaluate the performance on 10 Gb/s links and to implement virtual routers on the Grid'5000 platform