
Graduate Institute of Networking and Multimedia

College of Electrical Engineering and Computer Science

National Taiwan University

Master Thesis

Self-Monitoring SDN Switch Network Measurement

Yen-Chen Tien

Advisor: Cheng-Fu Chou, Ph.D.

July 2015

Acknowledgements

Chinese Abstract

Software-defined networking makes the network programmable. A centralized controller with a global view can dynamically adjust the forwarding rules in any switch according to the state of the network. Existing monitoring approaches use flow state together with the network topology and flow paths to compute the utilization of each link. Network traffic changes drastically; to react more quickly, some studies use the sequence numbers in TCP to compute flow rates and use port mirroring to reduce processing time. However, not every protocol has sequence numbers that are equivalent to the number of bytes a flow transmits, and port mirroring generates excessive extra traffic.

In this work, we propose a two-tier network measurement scheme that uses port counters together with the OpenFlow statistics request and reply messages. We let each switch use its port counters to compute the utilization of the links attached to it and report the results to the centralized controller. Based on the utilization of each link, the controller identifies congested links and sends statistics requests to obtain flow-level data and adjust the forwarding rules.

Keywords: Network Measurement, Software-Defined Networking, Traffic Engineering


Abstract

The software-defined networking (SDN) architecture with OpenFlow makes the network programmable: a centralized controller can dynamically decide the forwarding rules in every switch according to the network status. Existing measurement approaches gather flow-level information and use the topology and flow paths to construct a link utilization table. To react rapidly to network changes, some works use TCP sequence numbers to quickly calculate flow rates and leverage port mirroring to decrease latency. However, not every protocol carries sequence numbers that can serve as a byte counter for the flow, and port mirroring causes large overhead.

In this paper, we present a two-tier measurement architecture that uses both port counters and the OpenFlow StatsReq (request) and StatsRes (response) messages to extract a global view of the network. We let each switch monitor itself with its port counters and push link state messages to the controller. Based on the link utilization table updated by these messages, the controller finds congested links and queries flow information there in order to change forwarding behavior.


Keywords

Networking Measurement; Software-Defined Networking; Traffic Engineering

Contents

Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related Work
2.1 Sampling-based methods
2.1.1 sFlow
2.1.2 OpenSample
2.1.3 Planck
2.2 OpenFlow-based methods
2.2.1 Hedera
2.2.2 FlowSense
Chapter 3. Design
3.1 Collecting link state
3.1.1 Port counter
3.1.2 Push-based
3.2 Handling congestion
3.2.1 Flow-level information
3.2.2 Control loop
Chapter 4. Evaluations
4.1 Small testbed
4.1.1 Accuracy
4.1.2 Processing time (Delay)
4.1.3 Measurement Overhead
4.1.4 Oversubscribed port mirroring
4.2 Traffic engineering
4.2.1 Topology
4.2.2 Workload
4.2.3 Comparison
4.2.4 Result
Chapter 5. Conclusion
References


List of Figures

Figure 1: Switches send link utilization messages to the controller
Figure 2: An example message created by a switch
Figure 3: Controller queries the flow information on the link
Figure 4: Pseudocode of the control loop
Figure 5: A small testbed created with Open vSwitch and Mininet
Figure 6: Throughput estimation by Planck (port mirroring) and port counter
Figure 7: Port mirroring measures the bandwidth of each flow
Figure 8: Link bandwidth compared to the bandwidth sum of two flows
Figure 9: Traffic created by the measurement for a 200 Mbps flow
Figure 10: Traffic created by the measurement for a 1 Gbps flow
Figure 11: Environment with three independent flows
Figure 12: Port mirroring calculates the rate of flow 1
Figure 13: Port mirroring calculates the rate of the link flow 1 traverses
Figure 14: Port mirroring calculates the rate of flow 1
Figure 15: Port mirroring calculates the rate of the link flow 1 traverses
Figure 16: Three-tier k=4 fat-tree topology
Figure 17: Result for traffic engineering


List of Tables

Table 1: Query delay
Table 2: Processing time between two messages


Chapter 1. Introduction

Traditional datacenter networks support an ever-increasing array of applications, ranging from scientific computing to financial services and big-data analytics. To support these services, data centers are built with large bisection bandwidth and use Equal-Cost MultiPath (ECMP) load balancing to avoid congestion [1, 2]. However, ECMP balances load poorly: it decides flow paths by random hashing, so hash collisions can cause significant imbalance, and it splits traffic purely on local information without any knowledge of the utilization of each path [3]. This design adds substantial cost and still results in poorly utilized networks, so a new network architecture is needed.

Software-defined networking (SDN) replaces the distributed, per-switch control planes of traditional networks with a (logically) centralized control plane (the controller) that programs the forwarding behavior of the data plane (the switches) in a given network [4]. OpenFlow [5], the de facto standard used to implement the SDN architecture, provides common APIs that abstract the underlying infrastructure details and a communication interface between the control layer and the data plane. Network operators and administrators can programmatically configure this simplified abstraction, which opens the opportunity to reduce the complexity of distributed configuration and to ease network management tasks [6]. The idea is currently attracting significant attention from both academia and industry [7].

The SDN architecture makes it possible to change the forwarding behavior of each switch dynamically, for example by installing or modifying a rule in the switch's flow table, so requirements such as fast and reliable big-data delivery and deadline guarantees become more feasible. To satisfy these goals, the centralized controller continuously gathers the network state and computes which rules to install or change in the switches. All network control is based on measurement information, so the monitoring technique must meet some requirements. First, the collected statistics need to be accurate enough to provide global visibility into the network. Second, data center traffic is volatile and bursty [8, 9], and a rule can be installed by an SDN controller in tens of milliseconds [11], so the data must be near real time, i.e., have latencies on the order of milliseconds.

Existing monitoring approaches measure the network at the flow level, since forwarding behavior in SDN is at the granularity of a flow, such as a rule in a switch's flow table. The centralized controller combines all flow information with its knowledge of the topology and flow paths to determine which link is used by which flows, and link utilization can then be calculated by summing the rates of all flows traversing a given link. Unfortunately, using flow-level information to construct the link utilization table faces some problems. First, an estimation error for one flow affects the accuracy of the state of every link on that flow's path. Second, the link utilization is the sum of all flow throughputs on the link, so waiting for the information of every flow traversing the link is inevitable. Thus we ask some questions: Can we obtain the utilization of a link more easily? Do we really need all flow information when the link is stable? Can we still have enough information to do traffic engineering?

In this paper, we present a two-tier measurement architecture that answers these questions. First, we use the port counter, which tracks the number of bytes and packets sent and received on each port, to estimate the utilization of a link, rather than combining the flow-level information of every flow on that link. Each switch monitors the utilization of all of its links with its port counters every time interval t and pushes the link state to the controller, which uses these data to update the link utilization table. Second, if the controller finds that some links are congested, it needs flow-level information to make a decision. The controller uses the OpenFlow StatsReq (request) message to query the flow details on the congested link. With the utilization of all links and the throughput of the flows on the congested link, the controller examines the alternate paths of these flows to find whether there is another path with lower utilization and, if so, reroutes the flow.


Chapter 2. Related Work

Network measurement is too broad to fully discuss, so we focus on measurement methods that are useful in discovering link utilization. These methods can be classified as sampling-based methods and OpenFlow-based methods. All of these methods collect flow-level information.

2.1 Sampling-based methods

A collector receives sampled packets from the switches and uses them to estimate flow status.

2.1.1 sFlow

sFlow [21] is an industry standard for packet sampling: the switch captures one out of every N packets on each input port and sends an sFlow message to the collector. The sFlow message contains the sampled packet's header together with metadata including the switch ID, a timestamp of the capture time, and forwarding information such as the input and output ports.

After gathering samples, the collector estimates the number of packets and bytes in each flow by simply multiplying the sampled packet and byte counts by the sampling rate N. For example, if the collector received 1000 samples at a sampling rate of 50 and found that 500 of them belong to flow A, it can assume that about 50000 packets crossed the switch, 25000 of which belong to flow A, and use the average size of flow A's samples to estimate how many bytes flow A transmitted.
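A minimal sketch of this scale-up, assuming a hypothetical samples list of (flow_id, sampled_bytes) pairs taken from received sFlow messages, might look as follows; the function and variable names are ours, not part of sFlow.

from collections import defaultdict

def estimate_flow_totals(samples, n):
    """Scale sampled packet/byte counts up by the sampling rate n (1-in-n).

    samples: iterable of (flow_id, sampled_packet_size_in_bytes).
    Returns {flow_id: (estimated_packets, estimated_bytes)}.
    """
    packets = defaultdict(int)
    bytes_ = defaultdict(int)
    for flow_id, size in samples:
        packets[flow_id] += 1
        bytes_[flow_id] += size
    # Each sample stands in for n packets of the same flow.
    return {f: (packets[f] * n, bytes_[f] * n) for f in packets}

With 1000 samples at n = 50, of which 500 belong to flow A, this returns an estimate of 25000 packets for flow A, as in the example above.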

However, using a binomial model and the central limit theorem, the expected error of sFlow (in percent) can be estimated as error% ≤ 196 · √(1/s), where s is the number of samples obtained for the flow class of interest. To improve accuracy, one must either increase the sampling rate or lengthen the sampling period.

2.1.2 OpenSample

OpenSample [4] shows that sFlow has a limit: the maximum rate is between 300 and 350 samples per second. This limit is a consequence of the switch's control CPU being overwhelmed by stripping off packet payloads and adding metadata. Even if all 350 samples belong to the same flow, the expected error is 10%. Hence the only way for sFlow to improve accuracy is to lengthen the sampling period, which means a longer processing time to collect state.

For low-latency measurement, OpenSample proposes extracting flow statistics from protocol-specific information, especially for TCP flows. Every TCP packet carries a sequence number indicating the byte range the packet covers, and each sFlow message carries the header of the sampled packet, so the collector can read the TCP sequence number. If the collector receives one sFlow message with sequence number s1 at time t1, and another message from the same flow with sequence number s2 at time t2, such that t1 < t2, it can infer the flow's throughput as (s2 - s1) / (t2 - t1). Therefore, whenever the collector gets two distinct messages of the same TCP flow, it can estimate the flow rate accurately and reduce the processing time to about 100 ms.
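The sequence-number arithmetic itself is straightforward; a sketch of the estimate (ignoring 32-bit sequence wrap-around and retransmissions, which the real systems have to handle) is:

def tcp_rate_bps(seq1, t1, seq2, t2):
    """Estimate a TCP flow's rate from two sampled packets of that flow.

    seq1, seq2: TCP sequence numbers observed at times t1 < t2 (in seconds).
    Returns the estimated rate in bits per second.
    """
    if t2 <= t1:
        raise ValueError("second observation must be later than the first")
    return (seq2 - seq1) * 8.0 / (t2 - t1)

# Example: sequence numbers 1,000,000 bytes apart observed 0.1 s apart -> 80 Mbps.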

2.1.3 Planck

Planck [20] runs sFlow on an IBM RackSwitch G8264 [22] and finds that the maximum sampling rate is about 300 samples per second, similar to the result in OpenSample. Planck therefore uses another switch feature to break the limits of sFlow: it leverages the port mirroring feature found in most commodity switches, which sends a copy of the packets on one switch port to another switch port. Because port mirroring copies whole packets, Planck also exploits TCP sequence numbers, estimating a flow's throughput from two distinct copied packets of the same flow. Since port mirroring does not have to strip the payload or encapsulate metadata, it decreases the latency to 7 ms timescales on a 1 Gbps commodity switch and 4 ms timescales on a 10 Gbps switch. However, port mirroring copies all traffic on a link, so the overhead is the same size as the original traffic.

2.2 OpenFlow-based methods

The controller uses the communication interface provided by OpenFlow to obtain the state of the network.

2.2.1 Hedera

OpenFlow supports the StatsReq (request) and StatsRes (response) messages. The controller sends a request message to a switch to query flow status, and the switch replies with each flow's byte count and duration (flow lifetime). After receiving the reply, the controller can calculate the throughput of the flows.

Hedera [17] exploits the properties of the fat-tree topology [12] and proposes a dynamic flow scheduling system. The controller sends StatsReq messages to all edge switches, because every flow crosses at least one edge switch. After receiving all the replies, the controller uses the source and destination of every flow to estimate flow demands. If a flow's throughput is beyond its estimated demand, the controller tries to find a new path and reroutes the flow. However, polling is much slower than pushing; Hedera reports that one of its control loops takes 5 seconds.

2.2.2 FlowSense

FlowSense [23] calculates flow rates from the PacketIn and FlowRemoved messages that every flow causes to be sent to the controller. On the arrival of the first packet of a new flow, the switch sends a PacketIn message to ask the controller for the flow rule. When the flow entry expires, the switch sends a FlowRemoved message to the controller containing the duration and the number of packets and bytes that matched the rule.

The controller adds an entry to an active flow table when it receives a PacketIn message and sets a checkpoint when it receives a FlowRemoved message. Once all flows have expired, the controller can calculate the utilization at each checkpoint. Although FlowSense can collect statistics with no overhead, the global view it creates is actually a historical view. Moreover, if some background flows are active all the time, FlowSense never gets their information. Thus the estimation can be quite far from the actual value.

Chapter 3. Design

Our two-tier measurement architecture divides operation into two situations. First, collecting link state: each switch reports the utilization of its links by computing it from port counters. Second, handling congestion: when the controller finds that some links are congested, it goes deeper into those links to get flow-level information for scheduling. The remainder of this chapter discusses what our components do in each situation.

3.1 Collecting link state

The architecture for collecting link state is illustrated in Fig. 1. We let each switch monitor itself with its port counters and send link statistics messages to the controller. A single, centralized controller maintains a link utilization table and uses these messages to construct a global view of the network. We describe our design strategies as follows.

Figure 1: Switches send link utilization messages to the controller

3.1.1 Port counter

Although flow-based programmable networks need flow-level information to make decisions about each flow, using flow-level state to construct link utilization is hard to deploy, as discussed in the introduction. Thus we see an opportunity in the port counter, which tracks the number of bytes and packets sent and received on each port. Consider the port counter associated with one link. If we read counter value C1 at time T1 and counter value C2 at time T2, with T1 < T2, then upon getting C2 we can infer the link's throughput as (C2 - C1) / (T2 - T1), and the link utilization as the throughput divided by the link's maximum speed.
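A minimal sketch of this computation (in our system it runs inside Open vSwitch; the standalone function below is only illustrative):

def link_utilization(c1, t1, c2, t2, link_speed_bps):
    """Estimate one direction of a link from two port-counter readings.

    c1, c2: cumulative byte counters read at times t1 < t2 (in seconds).
    Returns (throughput_bps, utilization in [0, 1]).
    """
    if t2 <= t1:
        raise ValueError("T2 must be later than T1")
    throughput_bps = (c2 - c1) * 8.0 / (t2 - t1)   # byte counter -> bits per second
    return throughput_bps, throughput_bps / link_speed_bps

# Example: 62,500,000 bytes in 0.5 s on a 1 Gbps link -> (1e9 bps, utilization 1.0).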

This rate estimation is similar to how OpenSample and Planck use TCP sequence numbers to make flow estimation more accurate. Protocol-specific information does not work for other types of flows, whereas the port counter always works regardless of the kind of traffic, because it only records the number of bytes and packets sent and received on each port. Furthermore, each link's state is independent of the others, so we do not have to wait for other information to combine, and an estimation error on one link does not affect the state of any other link. Therefore, the port counter not only makes link utilization easy to calculate but also gives more flexibility in constructing the global view of the network.


Figure 2: An example message created by a switch

3.1.2 Push-based

As mentioned above, two counter readings for one link are enough to calculate its utilization. The controller could use the OpenFlow StatsReq (request) message to poll port status continuously and estimate the utilization of each link. However, polling is much slower than pushing. Previous work has demonstrated that routes can be installed in switches by an SDN controller in tens of milliseconds [10, 11], so spending more time collecting statistics would limit the minimum processing time of the control loop. Therefore, we let each switch compute the status of all of its links by itself from its port counters and report the results to the controller.

After calculating the utilization of all of its links, the switch encapsulates the information in a message composed of the switch ID and the transmit and receive utilization of each link, and sends the message directly to the controller. Fig. 2 shows an example message. When the controller receives the message, it checks the switch ID and uses the per-link information in the message to update the global link utilization table.
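As a hedged illustration only, a JSON encoding of such a message and the controller-side table update could look like the sketch below; the actual wire format is the one shown in Fig. 2, and the field names here are our own assumptions.

import json

def build_link_state_message(switch_id, port_utils):
    """port_utils: {port_no: (tx_utilization, rx_utilization)}, values in [0, 1]."""
    return json.dumps({
        "switch_id": switch_id,
        "links": [{"port": p, "tx_util": tx, "rx_util": rx}
                  for p, (tx, rx) in sorted(port_utils.items())],
    })

# Controller side: one entry per (switch, port), overwritten on every push so
# the table always holds the most recent report from that switch.
link_table = {}

def handle_link_state_message(raw):
    msg = json.loads(raw)
    for link in msg["links"]:
        link_table[(msg["switch_id"], link["port"])] = (link["tx_util"],
                                                        link["rx_util"])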

3.2 Handling congestion

When congestion happens, we need flow information to make a decision. Fig. 3 illustrates what the controller does to handle the situation: it uses the OpenFlow StatsReq (request) message to request the details of the flows on the link and calculates each flow's rate from the StatsRes (response) message. With the flow throughputs and the global view of link status, it decides whether to change the forwarding behavior.


Figure 3: Controller queries the flow information on the link

                          Average   Standard deviation   99% Conf. level
Query all edge switches    44 ms         80 ms               249 ms
Query one link             24 ms         15 ms                62 ms

Table 1: Query delay

3.2.1 Flow-level information

The controller has a global view, created from the link state messages, and it knows which links are congested, so the statistics it needs for traffic engineering are the flow details on those links. Unlike choosing the port counter to get link state, how to get the flow information needs more consideration. If we gathered all flow information from the beginning, we would face the problems of constructing the global view described earlier. If we let the switch push both link state and all flow details to the controller, much of the information would be useless, yet both switch and controller would have to spend effort on it. We therefore leverage the OpenFlow StatsReq (request) message to query flow state only from the switch that holds the information the controller needs. However, polling is much slower, so we have to find a way to query with a shorter response time.

We define the query delay as the time between the controller sending all query messages and receiving all replies. We build a three-tier k=4 fat-tree topology [12] with 20 switches and let the controller either query the flow state of all edge switches or query only one link on one edge switch; we repeat each experiment 1000 times, so the query delay is approximately normally distributed by the central limit theorem. Table 1 shows the average, standard deviation, and 99% confidence level of the query delay. According to Table 1, polling all edge switches to construct the link utilization table from all flows takes hundreds of milliseconds, whereas querying only one link takes less than one hundred milliseconds. Therefore, we let the controller query at the granularity of a link when it finds congested links.


Figure 4: Pseudocode of the control loop

3.2.2 Control loop

When collecting link state, the controller just receives messages from the switches and updates the entries of the link utilization table. The control loop discussed here is the one in which the controller finds which links are congested, queries the flows on those links, and makes a decision. The pseudocode of the control loop is presented in Figure 4, and we briefly describe the method as follows.

The controller periodically scans the global view table for congested links, i.e., links whose utilization exceeds a threshold. Previous work [8] shows that links with average utilization above 70% can be regarded as congested, so we choose 70% as the threshold. The controller then sends a flow StatsReq message to query the flows on the congested link. When it receives the reply for that link, it generates a temporary global view table with the throughput of those flows removed from the congested link. Using the temporary table and the shortest path set of these flows, it checks whether there is a new best path; if so, the controller changes the route of the flow.
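The sketch below restates this loop in Python against plain dictionaries rather than the real POX/OpenFlow API; the data-structure layout and helper names are our own assumptions, and only the decision logic follows the description above and Fig. 4.

CONGESTION_THRESHOLD = 0.70   # links above 70% utilization are treated as congested


def bottleneck(path, view):
    """Utilization of the most loaded link on a candidate path."""
    return max(view.get(link, 0.0) for link in path)


def control_loop(link_table, flow_rates, paths, reroute):
    """One pass of the control loop.

    link_table: {link: utilization}, flow_rates: {link: {flow_id: rate}} where a
    rate is expressed as a fraction of link capacity, paths: {flow_id: [path, ...]}
    with each path a list of links, reroute: callback that installs a new route.
    """
    for link, util in link_table.items():
        if util < CONGESTION_THRESHOLD:
            continue
        # Here the real controller sends a FlowStatsRequest for this link and
        # derives per-flow rates from the byte counts in the StatsRes reply.
        flows = flow_rates.get(link, {})
        # Temporary view: the congested link as if these flows had left it.
        view = dict(link_table)
        view[link] = max(0.0, view[link] - sum(flows.values()))
        for flow_id, rate in sorted(flows.items(), key=lambda kv: -kv[1]):
            candidates = paths.get(flow_id, [])
            if not candidates:
                continue
            best = min(candidates, key=lambda p: bottleneck(p, view))
            if bottleneck(best, view) + rate < CONGESTION_THRESHOLD:
                reroute(flow_id, best)            # move the flow to the new path
                for l in best:                     # account for it in the view
                    view[l] = view.get(l, 0.0) + rate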

Chapter 4. Evaluations

In this chapter, we present the results of our experimental evaluations. We use Mininet [18] version 2.2, which uses Linux containers to emulate hosts and Open vSwitch (OVS) [14] to emulate switches. We upgrade Open vSwitch to version 2.3.1 in Mininet, which supports sFlow and port mirroring.

Figure 5: A small testbed created with Open vSwitch and Mininet

4.1 Small testbed

We use Mininet to create a small test environment, shown in Fig. 5, with one sender and one receiver; because port mirroring needs another port to duplicate traffic onto, we add an extra link to the Open vSwitch as a collector. We use iPerf [15] to generate traffic. We compare three different measurement approaches: (i) OpenSample, which is based on sFlow (N=50) with TCP sequence numbers to improve accuracy; (ii) Planck, which uses port mirroring with TCP sequence numbers; and (iii) port counter, our port counter message architecture implemented with Open vSwitch.

Figure 6: Throughput estimation by Planck (port mirroring) and port counter

Figure 7: Port mirroring measures the bandwidth of each flow

Figure 8: Link bandwidth compared to the bandwidth sum of two flows

4.1.1 Accuracy

We let host A use iPerf to generate 300 Mbps and 900 Mbps TCP traffic to host B. We compare only against Planck (port mirroring), because OpenSample also uses TCP sequence numbers; both estimate the rate by subtracting two sequence numbers and dividing by the time between the two packets.

To avoid jittery rate estimates due to TCP's bursty behavior, for port mirroring we calculate the TCP sequence number difference only when the time interval is larger than 0.5 second, and we set the switch timer to generate a link state message every 0.5 second. Because iPerf generates traffic with nearly the same average rate at the granularity of a second, we obtain a smooth result with the 0.5 second interval, shown in Fig. 6. We can see that when there is only one flow on the link, the port counter estimates the flow throughput as well as a flow-level measurement does.

Next, we let host A use iPerf to create two flows to host B. The two flows share the link bandwidth, and we measure the rate of each flow by port mirroring, shown in Fig. 7. We sum the two flows to calculate the throughput of the link and compare it with the link state calculated by the port counter, shown in Fig. 8. The result shows that the link information produced by the port counter is nearly the same as that obtained by combining all flow information traversing the link.

                            Average   Standard deviation   99% Conf. level
OpenSample (sFlow, N=50)     5 ms          34.86 ms            94.2 ms
Planck (port mirroring)      0.16 ms        2.94 ms             7.6 ms
Port counter                 0.13 ms        2.48 ms             6.5 ms

Table 2: Processing time between two messages

4.1.2 Processing time (Delay)

The time used to process data into an estimate of the status can be seen as the delay of the view. We let host A use iPerf to generate 1 Gbps TCP traffic to host B. We use the time interval between two consecutive messages as the processing time of each of the three methods, and we run each method 2000 times to calculate the average processing time and standard deviation. The delay is then approximately normally distributed by the central limit theorem, and we use the upper end of the 99% confidence interval as the worst-case processing time.
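The 99% figures in Tables 1 and 2 can be reproduced approximately under this normality assumption as the upper endpoint mean + z·σ with z ≈ 2.576; the helper below is only a sketch of that calculation, not the exact procedure used to produce the tables.

import math

Z_99 = 2.576   # standard-normal quantile used for the 99% confidence level

def worst_case_delay(samples):
    """Mean + Z_99 * sample standard deviation of the measured delays."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return mean + Z_99 * std

# e.g. a mean of 0.13 ms and a standard deviation of 2.48 ms give about 6.5 ms,
# matching the port counter row of Table 2.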

For Planck (port mirroring), the two messages are two mirrored packets found at the collector. For OpenSample (sFlow), they are two sFlow packets of the same flow. For the port counter, they are two link state messages sent to the controller. We show the results in Table 2. As discussed in Chapter 2, sFlow only generates 300 to 350 samples per second, so its delay is larger than Planck's and the port counter's. We can also see that the port counter's processing time is close to port mirroring's with 1 Gbps traffic.

Figure 9: Traffic created by the measurement for a 200 Mbps flow

Figure 10: Traffic created by the measurement for a 1 Gbps flow

4.1.3 Measurement Overhead

We use the traffic created by the measurement itself as the measurement overhead. We let host A use iPerf to generate 200 Mbps and 1 Gbps TCP traffic to host B. Fig. 9 shows the result for the 200 Mbps flow and Fig. 10 for the 1 Gbps flow. Port mirroring creates much more traffic than the others: because it duplicates all traffic on a port to a monitoring port, the overhead is the same size as the original flow. sFlow only forwards the sampled packet's header encapsulated with metadata, so the overhead decreases dramatically, but it still grows as the flow rate increases. The port counter only generates link state messages on a timer, so it produces nearly the same overhead no matter what the flow rate is.

Figure 11: Environment with three independent flows

Figure 12: Port mirroring calculates the rate of flow 1

Figure 13: Port mirroring calculates the rate of the link flow 1 traverses

Figure 14: Port mirroring calculates the rate of flow 1

Figure 15: Port mirroring calculates the rate of the link flow 1 traverses

4.1.4 Oversubscribed port mirroring

In Planck, multiple ports are mirrored to a single monitoring port. To test oversubscribed port mirroring, we build an environment with three sources, three destinations, and one mirroring collector, shown in Fig. 11, and generate traffic with iPerf. At the beginning, source A sends flow A to destination A; 7 seconds later, source B sends flow B to destination B; and 14 seconds after flow A started, source C sends flow C to destination C.

First, for each flow we generate 100 Mbps UDP traffic; the results are shown in Fig. 12 and Fig. 13. UDP has no sequence numbers from which to calculate throughput, so we use the length of the mirrored packets to estimate bandwidth, and we use iPerf's local statistics as the ground truth. The sum of the three flows is less than 1 Gbps, so the port mirroring estimate is close to the ground truth. Each link carries only one flow, so the port counter can use the link state as the flow estimate.

Second, for each flow we generate 400 to 500 Mbps UDP traffic; the results are shown in Fig. 14 and Fig. 15. The sum of the three flows exceeds 1 Gbps, so the packet-length estimate becomes inaccurate: mirroring all ports to a single port means the combined traffic exceeds the capacity of the monitoring port, and the protocol is UDP, which has no sequence numbers to fall back on. The port counter only tracks the number of bytes and packets sent and received on each port, so it estimates precisely even when the flows are UDP.

4.2 Traffic engineering

In this experiment, we examine whether our measurement and control loop are feasible for traffic engineering. We implement the port counter message system with Open vSwitch and the control loop as modules for the POX [16] OpenFlow controller. We use Mininet to emulate the environment and iPerf to generate the workload.


Figure 16: Three-tier k=4 fat-tree topology

4.2.1 Topology

We use Mininet to create a three-tier k=4 fat-tree topology with 16 hosts and 20 switches, and we set the bandwidth of each link to 1 Gbps, the maximum a link can be limited to in Mininet version 2.2. Fig. 16 shows the topology.

4.2.2 Workload

We use workloads similar to those in previous related work [4, 13, 17, 19]. A brief description of the workloads follows; a sketch of how such patterns can be generated appears after the list.

Staggered Prob (EdgeP, PodP): A host sends to another host on the same edge switch with probability EdgeP, to a host in its own pod with probability PodP, and to the rest of the network with probability 1 - EdgeP - PodP.

Stride(4): The host with index x sends a flow to the host with index (x + 8) mod (num_hosts).

Random bijective: A host sends to any other host in the network with uniform probability, and every host is the source of exactly one flow.

Random X3: A host sends to three other hosts in the network chosen with uniform probability.
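The sketch below shows how two of these patterns can be generated. The host numbering (hosts laid out edge switch by edge switch, 2 per edge switch and 4 per pod for k=4) and the helper names are our own assumptions, not part of the cited works.

import random

def stride(x, stride_len, num_hosts):
    """Stride workload: host x sends to host (x + stride_len) mod num_hosts."""
    return (x + stride_len) % num_hosts

def staggered_prob(x, edge_p, pod_p, num_hosts, hosts_per_edge=2, hosts_per_pod=4):
    """Staggered Prob(EdgeP, PodP): choose a destination for host x."""
    r = random.random()
    if r < edge_p:                      # same edge switch
        base, size = (x // hosts_per_edge) * hosts_per_edge, hosts_per_edge
    elif r < edge_p + pod_p:            # same pod
        base, size = (x // hosts_per_pod) * hosts_per_pod, hosts_per_pod
    else:                               # anywhere in the network
        base, size = 0, num_hosts
    dst = x
    while dst == x:                     # a host never sends to itself
        dst = base + random.randrange(size)
    return dst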

We run iPerf on each host to generate the workload. A flow's speed is bounded by the minimum link bandwidth along its path; iPerf tries to saturate the link, so if no other flow shares a link on the path, the speed will be 1 Gbps.

4.2.3 Comparison

For comparison with our method, we run four other routing schemes for each of the workloads: Non-blocking, Planck, Hedera, and ECMP.

Non-blocking: all 16 hosts connect to a single 16-port Open vSwitch; this topology represents an optimal non-blocking network and serves as the upper bound.

Planck: traffic engineering based on port mirroring.

Hedera: traffic engineering based on polling information with OpenFlow messages.

ECMP: Equal-Cost MultiPath load balancing, which decides a flow's path by a hash function over the IP 5-tuple (source IP, destination IP, protocol, and source and destination ports).

Because Hedera, Planck, and our method all use ECMP to choose initial flow paths, we choose ECMP as the baseline.
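For illustration only, a simplified version of the 5-tuple hashing that ECMP uses to pick among equal-cost next hops is sketched below; real switches use vendor-specific hash functions, and hashlib is used here merely to get a deterministic toy example.

import hashlib

def ecmp_pick(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Pick one of the equal-cost next hops by hashing the IP 5-tuple."""
    key = "{}-{}-{}-{}-{}".format(src_ip, dst_ip, proto, src_port, dst_port)
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return next_hops[digest % len(next_hops)]

# All packets of a flow hash to the same value, so a flow stays on one path;
# two large flows can still collide on the same path, which is the imbalance
# described in the introduction.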

Figure 17: Result for traffic engineering


4.2.4 Result

We run each workload for 50 seconds and calculate the normalized average bisection bandwidth over all links in the topology. Fig. 17 shows the results.

Our measurement and control loop perform similarly to the port mirroring case. In the Staggered Prob workload, 50% of the traffic stays within the same pod and the same edge switch, and in the Stride workload many paths are disjoint from the start, so ECMP already achieves 66% of the bisection bandwidth and a fast control loop does not bring much benefit. When the paths become more complex and the flows more numerous, as in Random bijective and Random X3, the faster control loop achieves about 10% more bandwidth than the polling method.

Chapter 5. Conclusion

We present a two-tier measurement architecture. First, each switch uses its port counters to monitor the status of its links and sends the results to the controller. Port counters calculate link utilization regardless of the type of flow, and the result for each link is independent of the others, so the controller obtains the global view of the network quickly and with nearly fixed, low overhead. Second, when the controller finds congested links, it sends flow stats requests to query the flows on those links, making up for the fact that link state alone gives little direct insight into the flows. We show that combining link state and flow state in a proper way yields an accurate, low-overhead measurement with good performance.

References

[1] Al-Fares, Mohammad, Alexander Loukissas, and Amin Vahdat. "A scalable, commodity data center network architecture." ACM SIGCOMM Computer Communication Review 38.4 (2008): 63-74.

[2] Greenberg, Albert, et al. "VL2: a scalable and flexible data center network." ACM SIGCOMM Computer Communication Review. Vol. 39. No. 4. ACM, 2009.

[3] Alizadeh, Mohammad, et al. "CONGA: Distributed congestion-aware load balancing for datacenters." Proceedings of the 2014 ACM conference on SIGCOMM. ACM, 2014.

[4] Suh, Junho, et al. "OpenSample: A low-latency, sampling-based measurement platform for SDN." ICDCS, 2014.

[5] "OpenFlow" https://www.opennetworking.org/sdn-resources/openflow

[6] Kim, Hyojoon, and Nick Feamster. "Improving network management with software defined networking." Communications Magazine, IEEE 51.2 (2013): 114-119.

[7] Nunes, Bruno, et al. "A survey of software-defined networking: Past, present, and future of programmable networks." Communications Surveys & Tutorials, IEEE 16.3 (2014): 1617-1634.

[8] Kandula, Srikanth, et al. "The nature of data center traffic: measurements & analysis." Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference. ACM, 2009.

[9] Benson, Theophilus, Aditya Akella, and David A. Maltz. "Network traffic characteristics of data centers in the wild." Proceedings of the 10th ACM SIGCOMM conference on Internet measurement. ACM, 2010.

[10] Ferguson, Andrew D., et al. "Participatory networking: An API for application control of SDNs." ACM SIGCOMM Computer Communication Review. Vol. 43. No. 4. ACM, 2013.

[11] Stephens, Brent, et al. "PAST: Scalable Ethernet for data centers." Proceedings of the 8th international conference on Emerging networking experiments and technologies. ACM, 2012.

[12] Al-Fares, Mohammad, Alexander Loukissas, and Amin Vahdat. "A scalable, commodity data center network architecture." In SIGCOMM, 2008.

[13] Curtis, Andrew R., et al. "DevoFlow: Scaling flow management for high-performance networks." In SIGCOMM, 2011.

[14] "Open vSwitch" http://openvswitch.org/

[15] "iPerf" https://iperf.fr/

[16] "POX" http://www.noxrepo.org/pox/about-pox/

[17] Al-Fares, Mohammad, et al. "Hedera: Dynamic flow scheduling for data center networks." In NSDI, 2010.

[18] "Mininet" http://mininet.org/

[19] Handigol, Nikhil, et al. "Reproducible network experiments using container-based emulation." In CoNEXT, 2012.

[20] Rasley, Jeff, et al. "Planck: Millisecond-scale monitoring and control for commodity networks." Proceedings of the 2014 ACM conference on SIGCOMM. ACM, 2014.

[21] "sFlow" http://sflow.org/

[22] "IBM BNT RackSwitch G8264" http://www.redbooks.ibm.com/abstracts/tips0815.html

[23] Yu, Curtis, et al. "FlowSense: Monitoring network utilization with zero measurement cost." Passive and Active Measurement. Springer Berlin Heidelberg, 2013.