Master Thesis
Graduate Institute of Networking and Multimedia
College of Electrical Engineering and Computer Science
National Taiwan University

Self-Monitoring SDN Switch Network Measurement

Yen-Chen Tien

Advisor: Cheng-Fu Chou, Ph.D.

July, 2015
Acknowledgements
Chinese Abstract

Software-defined networking makes the network programmable. A central controller with a global view can dynamically adjust the forwarding rules in any switch according to the network state. Existing monitoring approaches all use flow state together with the topology and flow paths to compute the utilization of each link in the network. Network traffic changes drastically, and to react more quickly some studies use TCP sequence numbers to compute flow rates and use port mirroring to reduce processing time. However, not every protocol's sequence numbers are equivalent to the number of bytes a flow has transmitted, and port mirroring generates excessive extra traffic.

In this work, we propose a two-tier network measurement that uses port counters together with the OpenFlow stats-request and stats-reply messages. We let each switch use its port counters to compute the utilization of the links attached to it and send the results to the central controller. Based on the utilization of each link in the network, the controller identifies congested links and sends stats-request messages to obtain flow information and change the forwarding rules.

Keywords: network measurement, software-defined networking, traffic engineering
Abstract
The software-defined networking (SDN) architecture with OpenFlow makes the network programmable: a centralized controller can dynamically decide the forwarding rules in every switch according to the network status. Existing measurement approaches gather flow-level information and use the topology and flow paths to construct a link utilization table. To react rapidly to network changes, some works use TCP sequence numbers to calculate flow rates quickly and leverage port mirroring to decrease latency. However, not every protocol carries sequence numbers that can serve as a byte counter for the flow, and port mirroring causes large overhead.

In this paper, we present a two-tier measurement architecture that uses both port counters and OpenFlow StatsReq (request) and StatsRes (response) messages to extract a global view of the network. We let each switch use its port counters to monitor itself and send link state messages to the controller. Based on the link utilization table, updated by these switch messages, the controller finds congested links and queries flow information to change the forwarding behavior.
Keywords
Networking Measurement; Software-Defined Networking; Traffic Engineering
Contents
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related Work
2.1 Sampling-based methods
2.1.1 sFlow
2.1.2 OpenSample
2.1.3 Planck
2.2 OpenFlow-based methods
2.2.1 Hedera
2.2.2 FlowSense
Chapter 3. Design
3.1 Collecting link state
3.1.1 Port counter
3.1.2 Push-based
3.2 Handling congestion
3.2.1 Flow-level information
3.2.2 Control loop
Chapter 4. Evaluations
4.1 Small testbed
4.1.1 Accuracy
4.1.2 Processing time (Delay)
4.1.3 Measurement Overhead
4.1.4 Oversubscribed port mirroring
4.2 Traffic engineering
4.2.1 Topology
4.2.2 Workload
4.2.3 Comparison
4.2.4 Results
Chapter 5. Conclusion
References
List of Figures
Figure 1: Switches send link utilization messages to controller
Figure 2: An example message created by switch
Figure 3: Controller queries the flow information on the link
Figure 4: Pseudocode of control loop
Figure 5: A small testbed created with Open vSwitch and Mininet
Figure 6: Throughput estimation by Planck (port mirroring) and port counter
Figure 7: Port mirroring measures bandwidth of each flow
Figure 8: Link bandwidth compared to the bandwidth sum of two flows
Figure 9: Traffic created by the measurement for 200Mbps flow
Figure 10: Traffic created by the measurement for 1Gbps flow
Figure 11: Three independent flow environment
Figure 12: Port mirroring calculates the rate of flow 1 (100 Mbps UDP)
Figure 13: Port mirroring calculates the rate of the link flow 1 traverses (100 Mbps UDP)
Figure 14: Port mirroring calculates the rate of flow 1 (400-500 Mbps UDP, oversubscribed)
Figure 15: Port mirroring calculates the rate of the link flow 1 traverses (400-500 Mbps UDP, oversubscribed)
Figure 16: Three-tier k=4 fat-tree topology
Figure 17: Results for traffic engineering
List of Tables
Table 1: Query delay
Table 2: Processing time between two messages
Chapter 1. Introduction
Traditional datacenter networks host an ever-increasing array of applications, ranging from scientific computing to financial services and big-data analytics. To support these services, data centers are built with large bisection bandwidth and use Equal-Cost MultiPath (ECMP) load balancing to avoid congestion [1, 2]. However, ECMP balances load poorly: it hashes flows to paths at random, so hash collisions can cause significant imbalance, and it splits traffic purely on local information without knowledge of the utilization of each path [3]. This design adds substantial cost and results in poorly utilized networks, so a new network architecture is needed.
Software-defined networking (SDN) replaces the distributed, per-switch control planes of traditional networks with a (logically) centralized control plane (the controller) that programs the forwarding behavior of the data plane (the switches) in a given network [4]. OpenFlow [5], the de facto standard for implementing the SDN architecture, provides common APIs that abstract the underlying infrastructure details and serve as the communication interface between the control layer and the data plane. Network operators and administrators can programmatically configure this simplified abstraction, which opens the opportunity to reduce the complexity of distributed configuration and ease network management tasks [6]. The idea is currently attracting significant attention from both academia and industry [7].
The SDN architecture makes it possible to change the forwarding behavior of each switch dynamically, for example by installing or modifying a rule in the switch's flow table, so requirements such as fast and reliable big-data delivery and deadline guarantees become more feasible. To satisfy these goals, the centralized controller continuously gathers the network state and computes which rules to install or change in the switches. All network control is based on network measurements, so monitoring techniques have to meet some requirements. First, the collected statistics need to be accurate enough to provide global visibility into the network. Second, data center traffic is very volatile and bursty [8, 9], and a rule can be installed by an SDN controller in tens of milliseconds [11], so the measurement data must be near-real-time, i.e., have latencies on the order of milliseconds.
Existing monitoring approaches measure the network at the flow level, since forwarding in SDN is defined at the granularity of a flow, such as a rule in a switch's flow table. Using all flow information together with knowledge of the topology and flow paths, the centralized controller can find which links are used by which flows. Link utilization can then be calculated by summing the rates of all flows traversing a given link. Unfortunately, using flow-level information to construct a link utilization table faces some problems. First, an estimation error for one flow affects the accuracy of the state of every link along that flow's path. Second, because a link's utilization is the sum of the throughputs of all flows on the link, waiting for the information of every flow traversing the link is unavoidable. Thus we ask: Can we obtain the utilization of a link more easily? Do we really need all flow information when the link is stable? Can we still have enough information to do traffic engineering?
In this paper, we present a two-tier measurement architecture that answers these questions. First, we use port counters, which track the number of bytes and packets sent and received on each port, to estimate the utilization of a link rather than reconstructing the link state from all flow-level information on that link. Each switch monitors the utilization of all of its links with its port counters every period t and pushes the link state to the controller, which uses these data to update the link utilization table. Second, if the controller finds that some links are congested, it needs flow-level information to make a decision. The controller uses OpenFlow StatsReq (request) messages to query the flow details on the congested link. With the utilization of all links and the flow throughputs on the congested link, the controller examines the alternate paths of these flows to check whether a path with lower utilization exists and reroutes the flow.
Chapter 2. Related Work
Network measurement is too broad a topic to discuss fully, so we focus on methods that are useful for discovering link utilization. These methods can be classified into sampling-based methods and OpenFlow-based methods. All of them collect flow-level information.
2.1 Sampling-based methods
A collector receives sampled packets from the switches and uses them to estimate flow status.
2.1.1 sFlow
sFlow [21] is an industry standard for packet sampling in which the switch captures one out of every N packets on each input port and sends an sFlow message to the collector. The sFlow message contains the sampled packet's header together with metadata, including the switch ID, a timestamp of the capture time, and forwarding information such as the input and output port.
After gathering samples, the collector estimates the number of packets and bytes in each flow by simply multiplying the sampled packet and byte counts by the sampling rate N. For example, suppose the collector receives 1000 sampled packets with a sampling rate of 50 and finds that 500 of them belong to flow A. The collector can then assume that there were 50000 packets in the network, of which 25000 belong to flow A, and it can use the average size of flow A's samples to compute how many bytes flow A transmitted.
However, using a binomial model and the central limit theorem, the expected relative error of sFlow can be bounded as error% ≤ 196 · √(1/s), where s is the number of samples obtained for the class of interest. To improve accuracy, one must either increase the sampling rate or lengthen the sampling period.
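The scaling step and the error bound above can be made concrete with a short sketch. This is illustrative only; the function names are ours, not part of sFlow or the thesis, and it simply replays the worked example (1000 samples, N = 50, 500 samples in flow A) and evaluates error% ≤ 196 · √(1/s).

```python
import math

def scale_up(sampled_packets, sampled_bytes, sampling_rate):
    """Estimate actual packet/byte counts by multiplying sampled counts by N."""
    return sampled_packets * sampling_rate, sampled_bytes * sampling_rate

def expected_error_percent(samples_in_class):
    """sFlow accuracy bound quoted above: error% <= 196 * sqrt(1/s)."""
    return 196 * math.sqrt(1.0 / samples_in_class)

# Flow A: 500 of 1000 samples; assume an average sampled packet size of 1000 bytes.
pkts, byts = scale_up(500, 500 * 1000, 50)
print(pkts, byts)                    # 25000 packets, 25,000,000 bytes estimated
print(expected_error_percent(500))   # ~8.8% expected error with 500 samples
```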
2.1.2 OpenSample
OpenSample [4] shows that sFlow has a limit: the maximum rate is between 300 and 350 samples per second. This limit is a consequence of the switch's control CPU being overwhelmed by stripping off packet payloads and adding metadata. Even if all 350 samples belong to the same flow, the expected error is about 10%. The result shows that the only way for sFlow to improve accuracy further is to lengthen the sampling period, which means a longer processing time to collect state.
For low-latency measurement, OpenSample proposes extracting flow statistics from protocol-specific information, especially for TCP flows. Every TCP packet carries a sequence number indicating the byte range the packet covers, and sFlow sends the header of the sampled packet in its message, so the collector can read the TCP sequence number. If the collector receives one sFlow message with sequence number s1 at time t1 and another message from the same flow with sequence number s2 at time t2, where t1 < t2, it can infer the flow's throughput as (s2 - s1) / (t2 - t1). Therefore, if the collector gets two distinct messages of the same TCP flow, it can estimate the flow's rate accurately and reduce the processing time to about 100 ms.
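The estimate (s2 - s1) / (t2 - t1) can be expressed as a small helper. This is a minimal sketch of the idea, not OpenSample's implementation; it keeps the last (time, sequence number) pair per flow and reports a rate whenever a newer sample of the same flow arrives, ignoring reordering and sequence wrap-around for simplicity.

```python
last_sample = {}  # flow_id -> (timestamp in seconds, TCP sequence number)

def update_rate(flow_id, timestamp, seq):
    """Return the estimated throughput in bytes/s, or None on the first sample."""
    rate = None
    if flow_id in last_sample:
        t1, s1 = last_sample[flow_id]
        if timestamp > t1 and seq > s1:  # ignore reordering and wrap-around
            rate = (seq - s1) / (timestamp - t1)
    last_sample[flow_id] = (timestamp, seq)
    return rate

# Two samples of the same flow 0.1 s apart, 1,250,000 bytes advanced -> ~100 Mbps.
update_rate("flowA", 10.0, 1_000)
print(update_rate("flowA", 10.1, 1_251_000))  # 12,500,000 bytes/s, i.e. about 100 Mbps
```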
2.1.3 Planck
Planck [20] runs sFlow on an IBM RackSwitch G8264 [22] and finds that the maximum rate of sFlow is about 300 samples per second, similar to the result in OpenSample. Planck therefore uses another switch feature to break the limits of sFlow. It leverages the port mirroring feature found in most commodity switches, which sends a copy of the packets on one switch port to another switch port. Because port mirroring copies the whole packet, Planck can also use TCP sequence numbers: by collecting two distinct mirrored packets of the same flow, it can estimate the flow's throughput. Since port mirroring does not have to strip the payload or encapsulate metadata, it decreases the latency to roughly 7 ms timescales on a 1 Gbps commodity switch and 4 ms timescales on a 10 Gbps switch. However, port mirroring copies all traffic on a link, so the overhead is the same size as the original traffic.
2.2 OpenFlow-based methods
The controller uses the communication interface provided by OpenFlow to get the state of the network.
2.2.1 Hedera
OpenFlow supports StatsReq (request) and StatsRes (response) messages. The controller sends a request message to a switch to query the flow status, and the switch sends a reply including each flow's byte count and duration (the flow's lifetime). After receiving the reply from the switch, the controller can calculate the throughput of the flows.
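As a concrete illustration of that calculation, the sketch below (ours, not Hedera's or the thesis's code) turns flow-stats readings into rates. byte_count and duration are the reply fields mentioned above; using two successive replies gives the recent rate rather than the lifetime average.

```python
def lifetime_rate_bps(byte_count, duration_sec):
    """Average rate over the flow's whole lifetime, in bits per second."""
    return byte_count * 8 / duration_sec if duration_sec > 0 else 0.0

def recent_rate_bps(prev, curr):
    """Rate between two (byte_count, duration_sec) readings of the same flow."""
    d_bytes = curr[0] - prev[0]
    d_time = curr[1] - prev[1]
    return d_bytes * 8 / d_time if d_time > 0 else 0.0

print(lifetime_rate_bps(125_000_000, 10.0))                        # 100 Mbps lifetime average
print(recent_rate_bps((125_000_000, 10.0), (150_000_000, 11.0)))   # 200 Mbps over the last second
```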
Hedera [17] uses the properties of the fat-tree topology [12] and proposes a dynamic flow scheduling system. The controller sends StatsReq (request) messages to all edge switches, because every flow crosses at least one edge switch. After receiving all the replies, the controller uses the sources and destinations of all flows to estimate the flows' demands. If a flow's throughput is beyond its estimated demand, the controller tries to find a new path and reroute the flow. However, polling-based collection is much slower than push-based collection; Hedera reports that one control loop takes about 5 seconds.
2.2.2 FlowSense
FlowSense [23] calculates flow rates from the PacketIn and FlowRemoved messages that every flow causes the switch to send to the controller. On the arrival of the first packet of a new flow, the switch sends a PacketIn message to ask the controller for the flow rule. When the flow entry expires, the switch sends a FlowRemoved message to the controller, which contains the duration and the number of packets and bytes that matched the rule.
The controller adds an entry to an active-flow table when it receives a PacketIn message and sets a checkpoint when it receives a FlowRemoved message. Once all flows have expired, the controller can calculate the utilization at each checkpoint. Although FlowSense collects statistics with no overhead, the global view it creates is really a historical view. Moreover, if there are background flows that remain active all the time, FlowSense never gets their information, so the estimate can be quite far from the actual value.
Chapter 3. Design
Our two-tier measurement architecture classifies the network state into two situations. First, collecting link state: each switch reports its link utilizations, calculated from its port counters. Second, handling congestion: when the controller finds that some links are congested, it digs into those links to get the flow-level information needed for scheduling. The remainder of this chapter discusses what our components do in each situation in turn.
3.1 Collecting link state
The architecture for collecting link state is illustrated in Fig. 1. Each switch monitors itself through its port counters and sends link statistics messages to the controller. A single, centralized controller maintains a link utilization table and uses these messages to construct a global view of the network. We describe our design strategies as follows.
Figure 1: Switches send link utilization messages to controller
3.1.1 Port counter
Although flow-based programmable networks need flow-level information to make decisions about each flow, using flow-level state to construct link utilization is hard to deploy, as discussed in the introduction. We therefore see an opportunity in the port counter, which tracks the number of bytes and packets sent and received on each port. Consider one link's port counter. If we read counter value C1 at time T1 and counter value C2 at time T2, with T1 < T2, then upon reading C2 we can infer the link's throughput as (C2 - C1) / (T2 - T1), and the link utilization is this throughput divided by the link's maximum speed.
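The following sketch shows this computation; it is a minimal illustration with names of our own choosing, assuming byte counters and a known link speed, not the switch-side code used in the thesis.

```python
def link_utilization(c1_bytes, t1, c2_bytes, t2, link_speed_bps):
    """Return (throughput in bps, utilization in [0, 1]) from two byte-counter readings."""
    if t2 <= t1:
        raise ValueError("second reading must be later than the first")
    throughput_bps = (c2_bytes - c1_bytes) * 8 / (t2 - t1)  # counters are in bytes
    return throughput_bps, throughput_bps / link_speed_bps

# Example: 62,500,000 bytes in 0.5 s on a 1 Gbps link -> 1 Gbps throughput, utilization 1.0.
print(link_utilization(0, 0.0, 62_500_000, 0.5, 1_000_000_000))
```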
This rate estimation of a link is similar to how OpenSample and Planck use TCP sequence numbers to make flow estimation more accurate. Protocol-specific information does not always work for other types of flows, whereas port counters always work regardless of the kind of traffic, because they simply record the number of bytes and packets sent and received on each port. Furthermore, each link's state is independent of the other links, so we do not have to wait for other information to combine, and an estimation error on one link does not affect the state of other links. Therefore, port counters not only make link utilization easy to calculate but also give more flexibility in constructing the global view of the network.
Figure 2: An example message created by switch
3.1.2 Push-based
As mentioned above, two counter readings of a link are enough to calculate its utilization. One option is to use OpenFlow StatsReq (request) messages to poll the port status continuously and estimate the utilization of each link. However, polling-based methods are much slower than push-based ones. Previous work has demonstrated that routes can be installed in switches by an SDN controller in tens of milliseconds [10, 11], so spending more time collecting statistics would limit the minimum processing time of the control loop. Therefore, we let each switch use its port counters to compute the status of all of its links by itself and report the results to the controller.
After calculating the utilization of all of its links, the switch encapsulates the link information in a message composed of the switch ID and the transmit and receive utilization of each link, and sends the message directly to the controller. Fig. 2 shows an example message. When the controller receives the message, it checks the switch ID and uses the per-link information in the message to update the global link utilization table.
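A minimal sketch of this push-based report and the controller-side table update is given below. The field names (switch_id, per-port tx/rx utilization) are assumptions based on the description of Fig. 2, not the actual wire format used in the thesis.

```python
link_utilization_table = {}  # (switch_id, port_no) -> {"tx": float, "rx": float}

def handle_link_state_message(msg):
    """Update the controller's global link utilization table from one switch report."""
    switch_id = msg["switch_id"]
    for port_no, util in msg["ports"].items():
        link_utilization_table[(switch_id, port_no)] = {
            "tx": util["tx"],  # fraction of link capacity used in the transmit direction
            "rx": util["rx"],  # fraction of link capacity used in the receive direction
        }

# Example report from switch 3 with two links.
handle_link_state_message({
    "switch_id": 3,
    "ports": {1: {"tx": 0.42, "rx": 0.10}, 2: {"tx": 0.81, "rx": 0.77}},
})
print(link_utilization_table[(3, 2)])  # {'tx': 0.81, 'rx': 0.77}
```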
3.2 Handling congestion
When congestion happens, we need flow information to make a decision. Fig. 3 illustrates what the controller does to handle this situation. The controller uses OpenFlow StatsReq (request) messages to request the flow details on the link and calculates the rate of every flow from the StatsRes (response) messages. With the flow throughputs and the global view of link status, it decides whether to change the forwarding behavior.
Figure 3: Controller queries the flow information on the link
Table 1: Query delay

                          Average   Standard deviation   99% conf. level
Query all edge switches   44 ms     80 ms                249 ms
Query one link            24 ms     15 ms                62 ms
3.2.1 Flow-level information
The controller has a global view, created from the link state messages, and knows which links are congested, so the statistics it needs for traffic engineering are the flow details on those links. Unlike choosing port counters to get link state, obtaining the flow information requires more consideration. If we gathered all flow information from the beginning, we would face the problems of constructing the global view described earlier. If we let the switch push both link state and all flow details to the controller, some of the information would be useless, yet both the switch and the controller would have to spend effort on it. We therefore leverage the OpenFlow StatsReq (request) message to query the flow state from the switch that holds the information the controller needs. However, polling-based methods are much slower, so we have to find a way to query with a shorter response time.
We define the query delay as the time from when the controller sends all query messages until it receives all replies. We build a k=4 three-tier fat-tree topology [12] with 20 switches and let the controller either query the flow state of all edge switches or query only one link on one edge switch; each case is repeated 1000 times, so by the central limit theorem the query delay is approximately normally distributed. Table 1 shows the average, standard deviation, and 99% confidence level of the query delay. According to Table 1, polling all edge switches for all flow information to construct the link utilization table takes hundreds of milliseconds. But if we query only one link, the waiting time to obtain the flow information on that link can be less than one hundred milliseconds. Therefore, we let the controller query at the granularity of a link when it finds that some links are congested.
Figure 4: Pseudocode of control loop
3.2.2 Control loop
When collecting link state, the controller simply receives messages from the switches and updates the entries of the link utilization table. The control loop discussed here is how the controller finds which links are congested, queries the flows on those links, and makes decisions. The pseudocode of the control loop is presented in Figure 4, and we briefly describe the method as follows.
The controller periodically checks the global view table to find congested links, i.e., links whose utilization is above a threshold. Previous work [8] shows that links with average utilization above 70% can be regarded as congested, so we choose 70% as the threshold. The controller then sends flow StatsReq messages to query the flows on the congested link. When the controller receives the reply for a link, we generate a temporary global view table with the throughputs of the congested link's flows subtracted, and use this temporary table together with the shortest-path set of these flows to check whether a better path exists. If the controller finds a new best path, it changes the route of the flow.
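The sketch below mirrors the structure of the pseudocode in Figure 4, reusing the link table format sketched earlier. The helpers query_flows_on_link, subtract_flows, alternate_paths, path_utilization, and reroute are hypothetical stand-ins for the controller's StatsReq/StatsRes handling and routing logic; only the loop structure follows the thesis.

```python
CONGESTION_THRESHOLD = 0.7  # links above 70% average utilization are treated as congested, per [8]

def control_loop(link_table, query_flows_on_link, subtract_flows,
                 alternate_paths, path_utilization, reroute):
    for link, util in link_table.items():
        if max(util["tx"], util["rx"]) <= CONGESTION_THRESHOLD:
            continue                                   # link is not congested
        flows = query_flows_on_link(link)              # StatsReq -> StatsRes round trip
        temp_view = subtract_flows(link_table, flows)  # view without these flows' load
        for flow in flows:
            candidates = alternate_paths(flow) + [flow["path"]]
            best = min(candidates, key=lambda p: path_utilization(p, temp_view))
            if best != flow["path"]:
                reroute(flow, best)                    # install rules along the new path
```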
Chapter 4. Evaluations
In this chapter, we present the results of our experimental evaluations. We use Mininet [18] version 2.2, which uses Linux containers to emulate hosts and Open vSwitch (OVS) [14] to emulate switches. We upgrade the Open vSwitch in Mininet to version 2.3.1, which supports sFlow and port mirroring.
Figure 5: A small testbed created with Open vSwitch and Mininet
4.1 Small testbed
We use Mininet to create a small test environment, shown in Fig. 5, with one sender and one receiver; because port mirroring needs another port to duplicate traffic onto, we add another link to the Open vSwitch as a collector. We use iPerf [15] to generate traffic. We compare three different measurement approaches: (i) OpenSample, based on sFlow (N=50) with TCP sequence numbers to improve accuracy; (ii) Planck, which uses port mirroring with TCP sequence numbers; and (iii) port counter, our port-counter message architecture implemented with Open vSwitch.
Figure 6: Throughput estimation by Planck (port mirroring) and port counter
Figure 7: Port mirroring measures bandwidth of each flow
Figure 8: Link bandwidth compared to the bandwidth sum of two flows
4.1.1 Accuracy
We let host A use iPerf to generate 300 Mbps and 900 Mbps TCP traffic to host B. We compare only against Planck (port mirroring), because OpenSample also uses TCP sequence numbers; both estimate the rate by subtracting two sequence numbers and dividing by the time between the two packets.
To avoid jittery rate estimates due to TCP's bursty behavior, for port mirroring we compute the TCP sequence number difference only when the time interval is larger than 0.5 second, and we set the switch's timer to generate a link state message every 0.5 second. Because iPerf generates nearly the same average rate at the granularity of a second, we get a smooth result with a 0.5-second interval, shown in Fig. 6. We can see that when there is only one flow on the link, the port counter can estimate the flow's throughput as well as a flow-level measurement does.
Next, we let host A use iPerf to create two flows to host B. The two flows share the link bandwidth, and we measure the rate of each flow with port mirroring, shown in Fig. 7. We sum the two flows' rates to calculate the throughput of the link and compare it with the link state calculated by the port counter, shown in Fig. 8. The result shows that the link information created by the port counter is very close to combining all flow information traversing the link.
Table 2: Processing time between two messages

                             Average   Standard deviation   99% conf. level
OpenSample (sFlow, N=50)     5 ms      34.86 ms             94.2 ms
Planck (port mirroring)      0.16 ms   2.94 ms              7.6 ms
Port counter                 0.13 ms   2.48 ms              6.5 ms
4.1.2 Processing time (Delay)
The time used to process data to estimate the status can be seen as the delay of the view. We let host A use iPerf to generate 1 Gbps TCP traffic to host B. We take the time interval between two messages as the processing time for each of the three methods and run each method 2000 times to calculate the average processing time and standard deviation. By the central limit theorem the delay is approximately normally distributed, and we use the 99% confidence level as the worst-case processing time.
For Planck (port mirroring), the two messages are two mirrored packets observed at the collector. For OpenSample (sFlow), they are two sFlow packets of the same flow. For the port counter, they are two link state messages sent to the controller. We show the results in Table 2. As discussed in Chapter 2, sFlow generates only 300 to 350 samples per second, so its delay is larger than that of Planck and the port counter. We can also see that the port counter's processing time is close to that of port mirroring with 1 Gbps traffic.
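The thesis does not spell out how the 99% column is computed, but the tabulated values are close to the normal bound mean + z·σ with z around the two-sided 99% quantile; the sketch below makes that assumption explicit and approximately reproduces Table 2.

```python
Z_99 = 2.576  # assumed quantile; the table's values suggest z in the 2.5-2.6 range

def worst_case_delay(mean_ms, std_ms, z=Z_99):
    """Upper bound on the delay under the normal approximation (assumed formula)."""
    return mean_ms + z * std_ms

for name, mean, std in [("OpenSample", 5, 34.86),
                        ("Planck", 0.16, 2.94),
                        ("Port counter", 0.13, 2.48)]:
    print(name, round(worst_case_delay(mean, std), 1), "ms")  # ~94.8, ~7.7, ~6.5 ms, close to Table 2
```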
Figure 9: Traffic created by the measurement for 200Mbps flow
Figure 10: Traffic created by the measurement for 1Gbps flow
4.1.3 Measurement Overhead
We take the traffic created by the measurement itself as the measurement overhead. We let host A use iPerf to generate 200 Mbps and 1 Gbps TCP traffic to host B. Fig. 9 shows the result for the 200 Mbps flow and Fig. 10 shows the result for the 1 Gbps flow. Port mirroring creates much more traffic than the others: because it duplicates all traffic of a port to a monitoring port, the overhead is the same as the size of the original flow. sFlow forwards only the sampled packet's header encapsulated with metadata, so the overhead drops dramatically, but it still grows as the flow rate increases. The port counter generates link state messages only on the timer, so its overhead stays nearly the same no matter what the flow rate is.
Figure 11: Three independent flow environment
Figure 12: Port mirroring calculates the rate of flow 1 (100 Mbps UDP)
Figure 13: Port mirroring calculates the rate of the link flow 1 traverses (100 Mbps UDP)
Figure 14: Port mirroring calculates the rate of flow 1 (400-500 Mbps UDP, oversubscribed)
Figure 15: Port mirroring calculates the rate of the link flow 1 traverses (400-500 Mbps UDP, oversubscribed)
4.1.4 Oversubscribed port mirroring
In Planck, multiple ports are mirrored to a single monitoring port. To test oversubscribed port mirroring, we build an environment with three sources, three destinations, and one mirroring collector, shown in Fig. 11, and generate traffic with iPerf. At the beginning, source A sends flow A to destination A; 7 seconds later, source B sends flow B to destination B; and 14 seconds after flow A started, source C sends flow C to destination C.
First, each flow carries 100 Mbps of UDP traffic; the results are shown in Fig. 12 and Fig. 13. UDP has no sequence numbers from which to calculate throughput, so we use the mirrored packet lengths to estimate the bandwidth and use iPerf's local statistics as the ground truth. The sum of the three flows is less than 1 Gbps, so the port mirroring estimate is close to the ground truth. Each link carries only one flow, so the port counter can use the link state as the flow estimate.
Second, each flow carries 400 to 500 Mbps of UDP traffic; the results are shown in Fig. 14 and Fig. 15. The sum of the three flows exceeds 1 Gbps, so the packet-length estimate becomes inaccurate. Mirroring all ports to a single port introduces the problem that the combined traffic exceeds the capacity of the monitoring port, and the protocol is UDP, which has no sequence numbers with which to improve the accuracy. The port counter only tracks the number of bytes and packets sent and received on each port, so it can estimate precisely even when the flow is UDP.
4.2 Traffic engineering
In this experiment, we want to see whether our measurement and control loop are feasible for traffic engineering. We implement the port counter message system with Open vSwitch and the control loop as modules for the POX [16] OpenFlow controller. We use Mininet to emulate the environment and iPerf to generate the workload.
Figure 16: Three-tier k=4 fat-tree topology
4.2.1 Topology
We use Mininet to create a three-tier k=4 fat-tree topology with 16 hosts and 20 switches, and we set the bandwidth of each link to 1 Gbps, since a link's bandwidth limit in Mininet version 2.2 only supports 0 to 1 Gbps. Fig. 16 shows the topology in detail.
4.2.2 Workload
We use workloads similar to previous related work [4, 13, 17, 19]. A brief description of the workloads follows.
Staggered Prob (EdgeP, PodP): A host sends to another host on the same edge switch with probability EdgeP, to a host in the same pod with probability PodP, and to the rest of the network with probability 1 - EdgeP - PodP.
Stride(4): The node with index x sends a flow to the node with index (x + 8) mod (num_hosts).
Random bijective: A host sends to another host in the network chosen with uniform probability, and every node is the source of exactly one flow.
Random X3: A host sends to three other hosts in the network chosen with uniform probability.
We run iPerf on each host to generate the workload. A flow's speed is bounded by the minimum link bandwidth along its path. iPerf tries to saturate the link, so if no other flow shares a link on the path, the speed will be 1 Gbps.
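To make the destination choices concrete, the sketch below generates sender-to-receiver maps for Stride, Random bijective, and Random X3 (Staggered Prob additionally needs the pod/edge layout and is omitted). The helper names are ours, not the thesis's scripts; NUM_HOSTS matches the 16-host fat-tree, and the stride offset is a parameter because the text above defines Stride(4) with an offset of 8.

```python
import random

NUM_HOSTS = 16

def stride(offset):
    """Host x sends to host (x + offset) mod NUM_HOSTS."""
    return {x: (x + offset) % NUM_HOSTS for x in range(NUM_HOSTS)}

def random_bijective():
    """Every host is the source of exactly one flow; destinations form a permutation with no self-loops."""
    dsts = list(range(NUM_HOSTS))
    while True:
        random.shuffle(dsts)
        if all(x != d for x, d in enumerate(dsts)):  # no host sends to itself
            return dict(enumerate(dsts))

def random_x3():
    """Each host sends to three other hosts chosen uniformly at random."""
    return {x: random.sample([h for h in range(NUM_HOSTS) if h != x], 3)
            for x in range(NUM_HOSTS)}

print(stride(8)[0], random_bijective()[0], random_x3()[0])
```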
4.2.3 Comparison
To compare with our method, we run four other routing schemes for each of the workloads: Non-blocking, Planck, Hedera, and ECMP.
Non-blocking: all 16 hosts connect to one 16-port Open vSwitch; this topology represents an optimal non-blocking network and serves as the upper bound. Planck: a traffic engineering scheme based on port mirroring. Hedera: a traffic engineering scheme that polls information with OpenFlow messages. ECMP: Equal-Cost MultiPath load balancing, which decides a flow's path with a hash function over the IP 5-tuple (source and destination IP, protocol, and source and destination ports). Because Hedera, Planck, and our method all use ECMP to choose initial flow paths, we take ECMP as the baseline.
Figure 17: Results for traffic engineering
4.2.4 Results
We run each workload for 50 seconds and calculate the normalized average bisection bandwidth over all links in the topology. Fig. 17 shows the results of the experiment.
We can see that our measurement and control loop perform similarly to the port mirroring case. In the Staggered Prob workload, 50% of the traffic stays within the same pod and edge, and in the Stride(4) workload there are many initially disjoint paths where ECMP already achieves 66% bisection bandwidth, so a fast control loop does not bring much benefit. When the paths become more complex and the flows more numerous, as in Random bijective and Random X3, the faster control loop achieves about 10% more bandwidth than the polling method.
Chapter 5. Conclusion
We present a two-tier measurement architecture. First, each switch uses its port counters to monitor its link status and sends the results to the controller. Port counters calculate link utilization regardless of the type of flow, and the result for each link is independent of the others, so the controller can obtain a global view of the network quickly and with nearly fixed, low overhead. Second, when the controller finds that some links are congested, it sends flow stats requests to query the flows on those links, compensating for the fact that link state alone gives little direct insight into the flows. We show that combining link state and flow state in a proper way yields a measurement system with good performance and low overhead.
References
[1] Al-Fares, Mohammad, Alexander Loukissas, and Amin Vahdat. "A scalable,
commodity data center network architecture." ACM SIGCOMM Computer
Communication Review 38.4 (2008): 63-74.
[2] Greenberg, Albert, et al. "VL2: a scalable and flexible data center network." ACM
SIGCOMM computer communication review. Vol. 39. No. 4. ACM, 2009.
[3] Alizadeh, Mohammad, et al. "CONGA: Distributed congestion-aware load
balancing for datacenters." Proceedings of the 2014 ACM conference on SIGCOMM.
ACM, 2014.
[4] Suh, Junho, et al. "OpenSample: A low-latency, sampling-based measurement platform for SDN." ICDCS, 2014.
[5] “OpenFlow” https://www.opennetworking.org/sdn-resources/openflow
[6] Kim, Hyojoon, and Nick Feamster. "Improving network management with
software defined networking." Communications Magazine, IEEE 51.2 (2013): 114-119.
[7] Nunes, Bruno, et al. "A survey of software-defined networking: Past, present, and
future of programmable networks." Communications Surveys & Tutorials, IEEE 16.3
(2014): 1617-1634.
[8] Kandula, Srikanth, et al. "The nature of data center traffic: measurements &
analysis." Proceedings of the 9th ACM SIGCOMM conference on Internet measurement
conference. ACM, 2009.
[9] Benson, Theophilus, Aditya Akella, and David A. Maltz. "Network traffic
characteristics of data centers in the wild." Proceedings of the 10th ACM SIGCOMM
conference on Internet measurement. ACM, 2010.
[10] Ferguson, Andrew D., et al. "Participatory networking: An API for application
control of SDNs." ACM SIGCOMM Computer Communication Review. Vol. 43. No. 4.
ACM, 2013.
[11] Stephens, Brent, et al. "PAST: Scalable Ethernet for data centers." Proceedings of
the 8th international conference on Emerging networking experiments and technologies.
ACM, 2012.
[12] M. Al-Fares, A. Loukissas, and A. Vahdat. A Scalable, Commodity Data Center
Network Architecture. In SIGCOMM, 2008.
[13] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S.
Banerjee. DevoFlow: Scaling Flow Management for High-Performance Networks. In
SIGCOMM, 2011.
[14] “Open vSwitch” http://openvswitch.org/
[15] “iPerf” https://iperf.fr/
[16] “Pox” http://www.noxrepo.org/pox/about-pox/
[17] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: Dynamic Flow Scheduling for Data Center Networks. In NSDI, 2010.
[18] “Mininet” http://mininet.org/
[19] N. Handigol, B. Heller, V. Jeyakumar, B. Lantz, and N. McKeown, "Reproducible network experiments using container-based emulation," in CoNEXT, 2012.
[20] Rasley, Jeff, et al. "Planck: millisecond-scale monitoring and control for
commodity networks." Proceedings of the 2014 ACM conference on SIGCOMM. ACM,
2014.
[21] "sFlow" http://sflow.org/
[22] "IBM BNT RackSwitch G8264" http://www.redbooks.ibm.com/abstracts/tips0815.html
[23] Yu, Curtis, et al. "Flowsense: Monitoring network utilization with zero
measurement cost." Passive and Active Measurement. Springer Berlin Heidelberg,
2013.