
QoS-Aware VM Placement in Multi-Domain Service Level Agreements Scenarios

Kuan Lu∗, Ramin Yahyapour∗, Philipp Wieder∗, Constantinos Kotsokalis†, Edwin Yaqub∗, Ali Imran Jehangiri∗
∗Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Germany

Email: {kuan.lu, ramin.yahyapour, philipp.wieder, edwin.yaqub, ali.jehangiri}@gwdg.de
†IT & Media Center of Dortmund University of Technology, Germany

Email: [email protected]

Abstract—Virtualization technologies of Infrastructure-as-a-Service enable the live migration of running Virtual Machines (VMs) to achieve load balancing, fault tolerance and hardware consolidation in data centers. However, the downtime / service unavailability due to live migration may be substantial with respect to the customers' expectations on responsiveness, as the latter are declared in established Service Level Agreements (SLAs). Moreover, it may cause significant (potentially exponential) SLA violation penalties in its associated higher-level domains (Platform-as-a-Service and Software-as-a-Service). Therefore, VM live migration should be managed carefully. In this paper, we present the OpenStack version of the Generic SLA Manager, alongside its strategies for VM selection and allocation during live migration of VMs. We simulate a use case where IaaS (OpenStack-SLAM) and PaaS (OpenShift) are combined, and assess performance and efficiency of the aforementioned VM placement strategies when a multi-domain SLA pricing & penalty model is involved. We find that our proposal is efficient in managing trade-offs between the operational objectives of service providers (including financial considerations) and the customers' expected QoS requirements.

Index Terms—Live Migration, Virtual Machines, IaaS, PaaS, Availability, SLA pricing, SLA penalties, Resource Allocation.

I. INTRODUCTION

In Infrastructure-as-a-Service (IaaS), through virtualization technologies (e.g., VMware [1], Xen [2]), physical resources of data centers can be partitioned into flexible and scalable virtual computing units, namely Virtual Machines (VMs). However, large-scale data centers introduce large power-consumption costs. Thus, an efficient technique that dynamically reconfigures the IT infrastructure to reduce the total power consumption becomes necessary. As such, VM consolidation emerges to execute the VMs on as few servers as possible, to concentrate the workloads and to limit efficiently the number of physical servers powered on [22].

VM consolidation is usually treated as an objective of the service provider (SP). From the customer's perspective, an automated negotiation may be used to accommodate heterogeneous requirements against an SP's capabilities and acceptable usage terms. The result of such a negotiation is a Service Level Agreement (SLA), an electronic contract that establishes all relevant aspects of the service. During the SLA negotiation, all terms must be evaluated before a final agreement is reached. In order to commit to the requested Quality-of-Service (QoS) terms (e.g., service performance and availability), SPs have to assess their own resource management strategies so as to trade off profit making with a guaranteed delivery of service(s) and avoid penalties in case the agreement is violated at runtime. An aggressive consolidation of VMs, however, may lead to performance degradation when the service faces increasing demand, resulting in an unexpected rise of resource utilization.

Through VM live migration, both VM consolidation and service performance can be coordinated and balanced. However, short downtimes of services are unavoidable during VM live migration due to the overheads of moving the running VMs. Hence, the respective service interruptions in IaaS reduce the overall service availability and it is possible that the customer's expectations on responsiveness are not met [27]. Moreover, it might bring exponential service violation penalties to its associated domains (e.g., Software-as-a-Service (SaaS) and Platform-as-a-Service (PaaS)). By way of an example, the solutions in [22] provide availability from 99.7% to 99.93%. For e-commerce and other industrial use cases, a service availability value below 99.9% is usually considered unacceptable [26]. In order to provide high availability, and to avoid service violations and the subsequent penalties, the number of live migrations should be monitored and controlled.

In this paper we present the OpenStack version of the Generic SLA Manager (GSLAM), which could potentially be used in the PaaSage project [16]. We apply this software system to combine IaaS (OpenStack-SLAM) and PaaS (OpenShift [17]) in a use case that features multi-domain SLA management. Via the introduction of a pricing and penalty model that considers such multi-domain scenarios, we apply our resource allocation strategies for VM selection and allocation during live migration. We simulate the full scenario (using the CloudSim [9] platform) and illustrate the suitability of our proposal for the efficient management of VM live migration. Thereby, the agreed service availability is not violated, extra penalties are avoided, and a trade-off between the SP's objectives and the customers' expected QoS requirements can also be achieved successfully.

The remainder of this paper is structured as follows. In Section II, we discuss related work. Section III presents the OpenStack version of the GSLAM. In Section IV, a formal model of SLA pricing and penalty is provided. Section V gives a description of our approach to resource allocation and how it achieves important resource management objectives. Then, in Section VI, we validate our mechanisms by performing discrete event simulations. Finally, we conclude the paper in Section VII.

II. RELATED WORK

Many publications discuss the topic of SLA management for IT clouds, but most of them are looking at it from a conceptual and architectural point of view, e.g., [3] [4] [5]. To the best of our knowledge, there is no prior work for SLA management in OpenStack that considers multi-domain pricing and penalties, and that can be further applied to develop QoS-aware resource allocation strategies.

Many works also discuss SLA violations; however, few of them model the consequences of the violations, namely SLA penalties. Currently, although some approaches describe penalties [6] [7] [8], they do not satisfy all of the following requirements for formulating complex penalty expressions in a single unambiguous model:

• Ability to express the chain effect on penalties across multiple domains due to a violation in the IaaS layer.

• Full flexibility regarding QoS levels agreed and/or achieved, without being constrained (e.g., by pre-specified classes of service).

• Openness and applicability to different domains, without dependence on specific languages or taxonomies.

As regards VM live migration, there are certain approaches already widely utilized. Through shared storage, the process of live migration is reduced to only copying memory state and CPU registers from the source host to the destination [24], without transferring the whole VM. In contrast to the pure stop-and-copy strategy of offline VM migration, live migration splits the migration into several rounds of pre-copying and one last round of stop-and-copy with much less downtime. Nevertheless, based on the impact of VM migration in [25], there are still issues to address. Firstly, the more iterations in the pre-copying phase, the less data need to be transferred in the final stop-and-copy phase, and the shorter the downtime becomes. However, the short downtime comes at the cost of a longer total migration time, which significantly influences service performance, system load and the network [26]. Secondly, the downtime can never be eliminated, because many workloads include some memory pages that are updated frequently, named Writable Working Sets (WWS) [24]; clearly, it is wise to defer copying such pages to the final stop-and-copy phase instead of re-transferring them in every pre-copy round. Moreover, in stop-and-copy, the performance of VM live migration is affected by many factors, such as workload type, hypervisor type and so on [27].
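To make this trade-off concrete, the following simplified sketch (in Java, with hypothetical workload parameters; it only mirrors the intuition of [24]-[27], not their exact models) iterates the pre-copy rounds: each round retransmits the pages dirtied during the previous round, so the would-be stop-and-copy downtime shrinks geometrically while the total migration time keeps growing.

// Simplified, illustrative pre-copy model; all parameters are hypothetical.
public class PreCopyModel {
    public static void main(String[] args) {
        double memoryMb = 1024.0;                 // VM memory size (MB)
        double bandwidthMbPerSec = 1000.0 / 8.0;  // 1 Gbps migration link, in MB/s
        double dirtyMbPerSec = 200.0 / 8.0;       // average page dirty rate, in MB/s

        double toSend = memoryMb;                 // data still to copy at the start of a round
        double totalTime = 0.0;
        for (int round = 0; round <= 5; round++) {
            double roundTime = toSend / bandwidthMbPerSec;     // duration of this pre-copy round
            double downtimeIfStopNow = toSend / bandwidthMbPerSec; // stop-and-copy downtime if we stopped here
            totalTime += roundTime;
            System.out.printf("round %d: downtime if stopped now = %.3f s, total time so far = %.3f s%n",
                    round, downtimeIfStopNow, totalTime);
            toSend = dirtyMbPerSec * roundTime;                // pages dirtied while this round was copying
        }
    }
}

With a dirty rate one fifth of the link speed, the data remaining per round shrinks by a factor of five, illustrating why a handful of pre-copy rounds already reaches the WWS-dominated downtime floor while total migration time keeps accumulating.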

Thus, no visible downtime is only an ideal goal. The number of migrations should be controlled, as migration has an impact on the performance of the other running VMs on the source and destination hosts, which is mainly proportional to the amount of memory allocated to the migrated VM. In this paper, a VM performance violation happens when the VM experiences a CPU utilization of 100%, and performance degradation happens during VM migrations.

Service availability is one of the most important QoS metrics in IT clouds. In [20], the authors outline that the service availability guaranteed by three large cloud providers (Amazon, Google and Rackspace Cloud) is more than 99.9%, in order to obtain a good reputation in today's competitive market. Therefore, in the upcoming sections we propose to provide a basic service availability of 99.9% and an advanced availability of 99.99% in the scope of VM live migration. The latter implies that the SP has to pay special attention (e.g., extra resources) to the service in order to avoid SLA violations.

For VM consolidation, in [21] [22] [23], the authors mainly discussed:

• Resource management, either in simulation or in some prototype implementations within common cloud middleware; no SLA management is introduced.

• Using either artificial workloads or partial historical information from the workloads in various projects.

• Using VM live migration to leverage the consolidation and service performance; however, the migration was misused without carefully taking service availability into consideration.

Therefore, in this paper, our goals are the following:
• A proof-of-concept prototype implementation of OpenStack SLA management (IaaS), which aims to be connected with the PaaS and SaaS layers by providing SLA lifecycle management, service customization and automatic scalability.
• By using the workload information at GWDG, our strategies are compared with others in several aspects.
• The influence of SLA chain penalties in multiple domains is introduced into VM consolidation, along with availability-oriented VM allocation strategies that control VM live migration and balance the objectives of the SP and the customer.

III. OPENSTACK SLA MANAGEMENT FRAMEWORK

Based on the Generic SLA Manager (GSLAM) of our previous project SLA@SOI [11], an OpenStack-SLAM is presented. The GSLAM (Figure 1) provides a generic architecture that can be used across different domains and use cases to manage the entire SLA life cycle, including activities such as SLA modeling [13], negotiation, provisioning, resource outsourcing, monitoring and adjustment [14]. Through the OpenStack Nova API Woorea [19], the GSLAM is able to implement its corresponding Infrastructure Service Manager (ISM) [15] by:

• Querying the status of the infrastructure during SLA negotiation, based on which the SP is able to generate the corresponding offer / counter-offer for the customer;

• Providing an SLA terms (i.e., pricing, penalty, service availability and performance) monitoring mechanism together with OpenStack;

• Deploying the VM allocation strategies within OpenStack Nova so as to maximize the profit and minimize SLA violations as well as energy consumption;

• Creating, customizing and deploying the agreed services;
• Reconfiguring and removing the services on demand.
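A minimal interface sketch of these ISM capabilities is given below; the type and method names are hypothetical and do not correspond to the actual SLA@SOI or OpenStack-SLAM API.

// Hypothetical sketch of the ISM capabilities listed above (illustrative names only).
interface InfrastructureStatus { }
interface VmAllocationStrategy { }
interface ServiceDescriptor { }

public interface InfrastructureServiceManager {
    // Query infrastructure status during negotiation to build an offer / counter-offer.
    InfrastructureStatus queryStatus();

    // Monitor agreed SLA terms (pricing, penalty, availability, performance) together with OpenStack.
    void monitorSlaTerms(String slaId);

    // Deploy a VM allocation strategy into OpenStack Nova.
    void deployAllocationStrategy(VmAllocationStrategy strategy);

    // Create, customize and deploy an agreed service (e.g., an "OpenShift-ready" VM).
    String provisionService(ServiceDescriptor descriptor);

    // Reconfigure or remove a provisioned service on demand.
    void reconfigureService(String serviceId, ServiceDescriptor descriptor);
    void removeService(String serviceId);
}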


[Figure 1 shows the integration architecture: the Generic SLA Manager (business evaluation, business & penalty modeling, SLA (re-)negotiation, SLA modeling & service manager, planning and optimization with availability, failure rate, price fixing and downtime estimation, service (re-)provisioning, outsourcing / service selection via a cloud broker, and VM allocation algorithms) mapped onto OpenStack (Nova VM allocation through the Woorea API, Keystone, Glance, Swift, KeyPairs) and onto a CloudSim-simulated data center, on top of hypervisors, physical machines and VMs, serving PaaS / SaaS service demands.]

Fig. 1. Integration of the Generic SLA Manager and OpenStack

PaaSage aims to deliver an open and integrated PaaS for different (e.g., industrial, e-science and public sector) use cases to support model-based lifecycle management of cloud applications. Using OpenShift, one of the most popular PaaS implementations, it can auto-scale its cloud PaaS framework for Java, Perl, PHP, Python and Ruby applications delivered in a shared-hosting model [17]. PaaS permits many applications offered by multiple development teams to co-exist on the same set of hosts in a safe and reliable fashion. In addition to that, the platform offers a variety of opportunities for multi-tenant deployments. Thus, an application that is intended to work for a single organizational unit can also be deployed in such a manner that many organizations or end-users are able to use it [18]. Therefore, end-users benefit from application management (instead of VM-level management), while application providers can bring their applications into the PaaS cloud with minimal effort.

As Figure 2 illustrates, OpenShift can be treated as a customer of the OpenStack-SLAM asking for infrastructure support. Our target is to automatically scale the virtual resources (VMs) up and down for the PaaS domain as needed. The SLAM not only provides the VMs, it is also able to customize them using pre-defined scripts so as to provide "OpenShift-ready" VMs in one click. Specifically, let us suppose a PaaS SP starts SLA negotiation with an IaaS SP. When the IaaS SP has sufficient resources, it will send a counter-offer back to the PaaS SP with a timeout. Once the offer is accepted within the timeout, the VM is created, and the SLAM automatically logs into the VM by matching the public key pair with its private key. Finally, the pre-installed scripts are executed on the target VM. The execution includes three steps in general, namely:
• Installing the PaaS broker-specific packages.
• Assigning a public IP for the VM.
• Configuration of the MongoDB / ActiveMQ / other components associated with the PaaS broker [17].

[Figure 2 shows the negotiation sequence between the PaaS broker, the SLA Manager and OpenStack: the broker's request triggers a resource query; if there are not enough resources the request is rejected, otherwise the resources are booked, an offer with a timeout is returned, and upon acceptance within the timeout a VM is created with a key pair and set up as an OpenShift-ready VM using pre-defined scripts; on rejection or timeout the booking is rolled back and cancelled.]

Fig. 2. Sequence diagram for the negotiation between PaaS and the OpenStack SLA Manager

Thus, the VM can be recognized and controlled by the PaaS broker. Similarly, in case a host is detected as under-loaded, the infrastructure can easily be scaled down by removing VMs. Here, the PaaS and IaaS layers are technically interconnected. In Section IV, we will see how they mutually influence each other in economic terms when an SLA is violated.
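The control flow of this negotiation can be sketched as follows; the types and method names are hypothetical, and the calls to OpenStack that the real SLAM issues through the Woorea Java SDK [19] are not reproduced here.

// Illustrative sketch of the negotiation flow in Fig. 2 (hypothetical API).
import java.time.Duration;
import java.time.Instant;

public class NegotiationSketch {

    enum Outcome { AGREED, REJECTED, ROLLED_BACK }

    Outcome negotiate(Request request, Iaas iaas, PaasBroker broker) {
        if (!iaas.hasSufficientResources(request)) {
            return Outcome.REJECTED;                             // reply: rejection
        }
        Booking booking = iaas.book(request);                    // reserve resources
        Offer offer = iaas.makeCounterOffer(request);
        Instant deadline = Instant.now().plus(Duration.ofMinutes(10)); // timeout set by the IaaS SP

        Reply reply = broker.consider(offer);                    // PaaS broker decides
        if (reply.accepted() && Instant.now().isBefore(deadline)) {
            String vmId = iaas.createVmWithKeyPair(request);     // VM created, public key injected
            iaas.runProvisioningScripts(vmId);                   // broker packages, public IP, MongoDB/ActiveMQ
            return Outcome.AGREED;                               // agreement established
        }
        iaas.cancel(booking);                                    // rollback, cancel booking
        return Outcome.ROLLED_BACK;
    }

    // Minimal stubs so the sketch is self-contained.
    interface Request { }
    interface Offer { }
    interface Booking { }
    interface Reply { boolean accepted(); }
    interface PaasBroker { Reply consider(Offer offer); }
    interface Iaas {
        boolean hasSufficientResources(Request r);
        Booking book(Request r);
        Offer makeCounterOffer(Request r);
        String createVmWithKeyPair(Request r);
        void runProvisioningScripts(String vmId);
        void cancel(Booking b);
    }
}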

IV. MODELING OF SLA PRICING AND PENALTY

According to our work in [10], IaaS SPs are able to compute the minimum implementation costs as part of price quotations towards customers, in order to remain competitive. At the same time, profit and SLA violation probability constraints are used to decide whether the problem can be satisfied at all, and what is the decision space based on which implementation costs can be calculated. Furthermore, outsourcing via subcontracts was included as part of the decision process, to achieve additional profit but also to sustain customers when local resources are not sufficient. Here, we assume that the corresponding planning and optimization strategies in [10] are fully applied, but we do not explain them in detail in order to keep the paper reasonably concise. Therefore, let us assume an IaaS service i, and an SLA that governs consumption of this service by a certain customer. We have:

$C_i = C_i^{Imple} + Pr_i$ (1)

$C_i^{Imple} = C_i^{I} + C_i^{E}$ (2)

$C_i^{I} = C_i^{Energy} + C_i^{Utility}$ (3)

where the cost $C_i$ of service i is the sum of the internal cost $C_i^{I}$ (i.e., internally utilized resources, energy cost) and the external cost $C_i^{E}$ (i.e., sub-contracted resources), as well as the profit $Pr_i$.

In PaaS, a container includes a set of resources that allows users to run their applications. By delivering such a computing platform (e.g., operating system, program execution environment), many containers can be run simultaneously on one VM (see Figure 3). We assume that on each VM there are n containers.

[Figure 3 depicts the hierarchy: users consume SaaS applications, several applications run in each PaaS container, and several containers run on each IaaS VM.]

Fig. 3. Hierarchical structure of the services in different domains

Therefore, the cost of each PaaS service p is:

$C_p = \frac{C_i}{n} + Pr_p$ (4)

SaaS developers can implement and deploy their applications on a cloud platform (e.g., a container) without the cost and complexity of buying and managing the underlying hardware and software layers. Similarly, we assume that on each container m applications are allocated. Then, the cost of a SaaS service s is defined as follows:

$C_s = \frac{C_p}{m} + Pr_s$ (5)

$C_p$ and $C_s$ hold only under the assumption that there are no implementation costs other than the infrastructure and platform environment for service execution.
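As a worked illustration of Equations 1-5 (all numeric values are hypothetical), the per-layer costs compose as follows:

// Worked sketch of the layered cost model; values are hypothetical.
public class LayeredCostModel {
    public static void main(String[] args) {
        double energyCost   = 0.50;   // C_i^Energy
        double utilityCost  = 0.30;   // C_i^Utility
        double externalCost = 0.20;   // C_i^E (sub-contracted resources)
        double profitIaas   = 1.00;   // Pr_i

        double internalCost       = energyCost + utilityCost;          // Eq. (3)
        double implementationCost = internalCost + externalCost;       // Eq. (2)
        double costIaas           = implementationCost + profitIaas;   // Eq. (1): C_i

        int n = 3;                    // containers per VM
        int m = 3;                    // applications per container
        double profitPaas = 0.40;     // Pr_p
        double profitSaas = 0.25;     // Pr_s

        double costPaas = costIaas / n + profitPaas;                   // Eq. (4): C_p
        double costSaas = costPaas / m + profitSaas;                   // Eq. (5): C_s

        System.out.printf("C_i = %.2f, C_p = %.2f, C_s = %.2f%n", costIaas, costPaas, costSaas);
    }
}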

Meanwhile, an SLA should also contain a set of penalty clauses specifying the responsibilities in case the SPs fail to deliver the pre-agreed QoS terms. Thus, we will use a variation of our previously described penalty model [12], as outlined in Equation 6:

$R_x = \sum_k GW_k \cdot VR_k$ (6)

$R_x$ is the penalty ratio associated with the cost of service x, where $GW_k$ is the weight of one specific guarantee being violated, for this specific combination of guarantees. This value may be arbitrarily high. It allows the negotiating customer to express the importance of honoring certain guarantees in this penalty function. $VR_k$ is the violation ratio: the relationship between achieved quality and planned quality. It indicates how far the offered quality has drifted from the agreed quality of a specific service parameter. Therefore, the penalty of IaaS service i is:

$P_i^s(QoS_1^s, \ldots, QoS_t^s) = C_i \cdot R_i$ (7)

The penalty of all PaaS services p is:

$n \cdot P_p^s(QoS_1^s, \ldots, QoS_t^s) = n \cdot C_p \cdot R_p$ (8)

By applying Equation 4, we have:

$n \cdot C_p \cdot R_p = (C_i + n \cdot Pr_p) \cdot R_p$ (9)

The penalty of all SaaS services s is:

$m \cdot n \cdot P_s^s(QoS_1^s, \ldots, QoS_t^s) = m \cdot n \cdot C_s \cdot R_s$ (10)

By applying Equation 5, we have:

$m \cdot n \cdot C_s \cdot R_s = (C_i + n \cdot Pr_p + n \cdot m \cdot Pr_s) \cdot R_s$ (11)

The violation of some QoS terms on the IaaS layer will automatically affect the other domains. For example, unavailability of a VM will unquestionably force its inner PaaS and SaaS services to be unavailable. For all these QoS terms in the three layers, we have $R_i = R_p = R_s = R$. Thus, the extra penalty of the PaaS layer compared with the IaaS layer is:

$(9) - (7) = n \cdot Pr_p \cdot R$ (12)

Similarly, the extra penalty of the SaaS layer compared with the IaaS layer is:

$(11) - (7) = (n \cdot Pr_p + m \cdot n \cdot Pr_s) \cdot R$ (13)

Hence, a slight availability violation in IaaS will lead to exponential influences on its associated domains (PaaS and SaaS). An IaaS SP, in order to comply with the SLAs, has to make optimal reactions and adjustments while the service is running.
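The following sketch illustrates Equations 6-13 with hypothetical weights, violation ratios and profits; it shows how a single availability violation with ratio R at the IaaS layer produces the extra penalties $n \cdot Pr_p \cdot R$ and $(n \cdot Pr_p + m \cdot n \cdot Pr_s) \cdot R$ at the PaaS and SaaS layers.

// Sketch of the penalty chain of Eqs. (6)-(13); all numeric inputs are hypothetical.
public class PenaltyChain {
    // Eq. (6): penalty ratio as a weighted sum of violation ratios.
    static double penaltyRatio(double[] guaranteeWeights, double[] violationRatios) {
        double r = 0.0;
        for (int k = 0; k < guaranteeWeights.length; k++) {
            r += guaranteeWeights[k] * violationRatios[k];
        }
        return r;
    }

    public static void main(String[] args) {
        double r = penaltyRatio(new double[] {2.0}, new double[] {0.05}); // single availability guarantee
        double costIaas = 2.0, profitPaas = 0.4, profitSaas = 0.25;
        int n = 3, m = 3;

        double penaltyIaas = costIaas * r;                                         // Eq. (7)
        double penaltyPaas = (costIaas + n * profitPaas) * r;                      // Eq. (9)
        double penaltySaas = (costIaas + n * profitPaas + n * m * profitSaas) * r; // Eq. (11)

        System.out.printf("IaaS: %.3f, all PaaS: %.3f, all SaaS: %.3f%n",
                penaltyIaas, penaltyPaas, penaltySaas);
        System.out.printf("extra PaaS: %.3f, extra SaaS: %.3f%n",
                penaltyPaas - penaltyIaas, penaltySaas - penaltyIaas);             // Eqs. (12), (13)
    }
}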


V. SLA-BASED RESOURCE ALLOCATION AND PROVISIONING

Through VM live migration, both VM consolidation and service performance can be coordinated. Nevertheless, short downtimes of the migrated services are unavoidable due to the overheads of moving the VMs. Hence, the respective service interruptions in IaaS reduce the overall service availability, and this could also be the main cause of the chain effect on penalties between domains. Here, the term downtime is used to refer to periods when a service is unavailable and fails to provide or perform its primary function to customers. Downtime can be further classified as planned and unplanned. Since unplanned downtime, e.g., failure of the system, is complicated and uncertain in a simulation environment, in our work we only consider the planned downtime for evaluating the service availability. The downtime that is introduced by VM live migration is a kind of planned downtime. Thus, the availability is formulated as below:

$\text{Service Availability} = \frac{T_a}{T_a + T_b} \times 100$ (14)

where $T_a$ is the service uptime and $T_b$ is the service downtime. We are focusing on how to manage the number of live migrations so as to control availability according to the established SLA during resource allocation. The resource allocation optimization problem in a data center can be solved in two steps: initial selection of the VMs that need to be migrated, after which the chosen VMs are placed on hosts using a VM allocation algorithm.
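To make the availability budget concrete: over a 24-hour window ($T_a + T_b = 86400$ s), a 99.9% target tolerates at most $86400 \times 0.001 = 86.4$ s of accumulated downtime, while 99.99% tolerates only 8.64 s; with per-migration downtimes in the order of 0.3 to 9.5 s (cf. Section V-C), only a small number of live migrations per VM is admissible, which is exactly what the selection step has to enforce.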

A. VM Selection

Algorithm 1 VM Selection-AV
Input: req_availability
Output: selectedVM

  migratableVms ← getMigratableVms()
  totalTime ← 86400                      // 24 hours in seconds
  vmSize ← migratableVms.getSize()
  vm, downtime, totalDowntime, availability ← NULL
  sortByAvailability(migratableVms)      // descending order
  for i ← 1 to vmSize do
      vm ← migratableVms[i]
      downtime ← vm.downTimeEstimator()
      totalDowntime ← vm.totalDowntime
      availability ← 1 - (totalDowntime + downtime) / totalTime
      if availability > req_availability then
          selectedVM ← vm
          break
      else
          continue
      end if
  end for
  selectedVM.updateDownTime()
  return selectedVM

In Algorithm AV, the input is the requested availability of the customer and the output is the finally selected VM that will be migrated. Firstly, all the migratable VMs are selected by removing the VMs that are already in migration. These VMs are then sorted in descending order of their current service availability. Next, the availability of each VM in the sorted list is re-calculated to check whether it would still be greater than the requested availability if the VM were migrated. Finally, if such a VM can be found, we migrate it and update its downtime record. Otherwise, no VM is migrated. The complexity of the selection algorithm is O(n), n being the number of migratable VMs.

B. VM Allocation

Algorithm 2 VM Allocation-AVL
Input: optimalHostUtility, selectedVM
Output: allocatedHost

  minimalDiff ← Double.MAX_VALUE
  hosts ← getHostList()
  host, hostUtility, diff ← NULL
  for i ← 1 to hosts.size do
      host ← hosts[i]
      if excludedHosts.contains(host) then
          continue
      end if
      hostUtility ← host.getUtilizationOfCpu()
      if host.isSuitableForVm(selectedVM) then
          if hostUtility != 0 and overUtilizedAfterAllocated(host, selectedVM) then
              continue
          end if
          diff ← Math.abs(hostUtility - optimalHostUtility)
          if diff < minimalDiff then
              minimalDiff ← diff
              allocatedHost ← host
          end if
      end if
  end for
  return allocatedHost

In Algorithm AVL, the inputs are the optimal host utility and the selected VM, and the output is the host to which the selected VM will be migrated. First of all, over-loaded hosts, and hosts which would become over-loaded after allocating the migrated VM, are not considered. Then, the host whose utility is the closest to optimalHostUtility is selected. Here, optimalHostUtility is not a fixed value and will be discussed in Section VI. We want to find the relationship between the utilization of the target allocation host and the value of the QoS terms. Specifically, by considering the service availability constraint, our goal is to allocate the VM to a host that provides the least increase of power consumption and service performance violation due to this allocation. The complexity of the allocation algorithm is O(n), n being the number of hosts.


[Figure 4 plots, for availability targets from 99.90% to 99.99%: (a) energy consumption (kWh), (b) performance degradation due to migration (%), (c) SLA violation time per active host (%), and (d) overall SLA violation (%); and, for target host utilizations from 0.5% to 100%: (e) overall SLA performance violation and (f) energy consumption (kWh).]

Fig. 4. (a-d) Relationship between SLA availability, energy and SLA performance. (e-f) Optimal target host utilization for allocating the migrated VM(s)

C. VM Live Migration Downtime Estimator

As discussed in Section II, the overall duration and the short downtime that are introduced by VM live migration are essential properties when implementing service availability in an SLA. In this section, we introduce a VM live migration downtime estimator into CloudSim. As modeled in [26] and [27], using migration bounds, the downtime of VM live migration is bounded from below and above as follows:

$mig\_overhead \leq t_d \leq mig\_overhead + \frac{VMSize}{LinkSpeed}$ (15)

In order to better estimate the downtime value, the authors summarized four main factors that affect the downtime, namely: available link speed, average page dirty rate, VM memory size and migration overheads. The link speed and page dirty rate are proportional to the access traffic of the server applications over a day. The access traffic reaches its highest value at noon and its lowest value in the early morning and late at night. Therefore, we assume that the probability distribution determining the live migration downtime follows a normal distribution:

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2 / 2\sigma^2}$ (16)

where $0 \leq x \leq 24$ (hour), the expected value is $\mu = 12$ and the variance is $\sigma^2 = 1.9$, which means that at 12 o'clock the server application usually reaches its highest access traffic. For instance, a VM running server application workloads with 1024 MB of memory and a 1 Gbps migration link has lower and upper migration downtime bounds of around 314 ms and 9497.8 ms, respectively [26].
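A sketch of such an estimator is given below; how the time-of-day density of Equation 16 is combined with the bounds of Equation 15 (here, by scaling linearly between the lower and upper bound, normalized so that the noon peak maps to the upper bound) is our own assumption for illustration, not a statement of [26] or [27].

// Illustrative downtime estimator: bounds from Eq. (15), time-of-day weighting from Eq. (16).
public class DowntimeEstimator {
    static final double MU = 12.0, SIGMA2 = 1.9;

    // Eq. (16): normal density over the hour of day.
    static double trafficDensity(double hour) {
        return Math.exp(-Math.pow(hour - MU, 2) / (2 * SIGMA2)) / Math.sqrt(2 * Math.PI * SIGMA2);
    }

    // Estimated downtime (ms) for a given hour of day, kept within the bounds of Eq. (15).
    static double estimateDowntimeMs(double hourOfDay, double lowerMs, double upperMs) {
        double weight = trafficDensity(hourOfDay) / trafficDensity(MU); // 1.0 at noon, ~0 at night
        return lowerMs + weight * (upperMs - lowerMs);
    }

    public static void main(String[] args) {
        // Example bounds from [26]: 1024 MB VM over a 1 Gbps link, ~314 ms and ~9497.8 ms.
        System.out.printf("03:00 -> %.0f ms, 12:00 -> %.0f ms%n",
                estimateDowntimeMs(3.0, 314.0, 9497.8),
                estimateDowntimeMs(12.0, 314.0, 9497.8));
    }
}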

VI. EXPERIMENTAL RESULTS

We choose CloudSim as our simulation platform in order to validate the approaches in Sections V-A and V-B. According to [21], CloudSim is able to model and trace the energy consumption and SLA performance with automatic host over/under-loading detection, which reduces the preparation of our simulation work mainly to focusing on the SLA availability-based VM selection and allocation strategies.

Based on the Cloud infrastructure and workloads at GWDG in Germany, a virtual data center is simulated, including 120 virtual hosts. 81 VMs (2 Euro each) are created, in which 244 containers (3 containers per VM) are generated with the corresponding 732 application workloads (3 workloads per container). 1 kWh of electricity costs 0.2 Euro. Each container and each application workload makes a 2-Euro profit. The whole simulation time is 24 hours. Once the cloud environment in CloudSim is set up, it automatically allocates the workloads to the VMs. The interval of utilization measurements is 5 minutes.


As already defined in CloudSim, SLA violation Time per Active Host (SLATAH) is the percentage of time during which active hosts have experienced a CPU utilization of 100%, and Performance Degradation due to Migrations (PDM) is the overall performance degradation of VMs due to migrations. As such, the SLA performance violation is:

$SLAV = SLATAH \cdot PDM$ [22] (17)
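For example, with hypothetical values SLATAH = 10% and PDM = 0.1%, Equation 17 yields $SLAV = 0.1 \times 0.001 = 10^{-4}$, i.e., an overall SLA performance violation of 0.01%.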

By applying Algorithm V-A in CloudSim, we want to test how the SLA availability constraint affects energy consumption and SLA performance. Therefore, the difference between 99.9% and 99.99% is divided equally into 40 intervals, and we set each interval as the req_availability of Algorithm V-A. The results are illustrated in Figure 4 (a-d). In Figure 4 (a), when we set the SLA availability constraint to 100%, meaning VM live migration is not applied, 88 of 120 hosts are always active, thus all VMs have sufficient resources for their applications without SLA performance violation. However, this leads to huge energy consumption (around 85 kWh, see the upper-right corner of the figure). Once VM live migration is applied, VM consolidation is always considered in order to save energy. Thus, for example, at the 99.99% constraint, 50 hosts are turned into energy-saving mode and the energy consumption decreases dramatically (to around 52 kWh). On the contrary, when the availability constraint is not strict (e.g., 99.9%), the SLATAH value (Figure 4 (c)) is relatively low, because as soon as an over-load situation is detected, it is resolved by migrating the VM(s) to other host(s). Nevertheless, the corresponding PDM value (Figure 4 (b)) is very high, because:

• In under-loaded situations, if all the VMs on a host can be migrated to other host(s), the number of VM migrations increases. However, this leads to a "circular flow" for some VMs, meaning they are migrated back and forth between hosts. Hence, although the count of servers shut down increases, energy is actually not saved.

• Extra migrations will definitely lead to performance degradation, therefore the PDM value is also increased.

By applying Equation 17, the final SLA performance violation is as illustrated in Figure 4 (d). Therefore, from the first experiment, we find that VM live migration can efficiently reduce the energy cost of the data center. However, the number of migrations should be balanced in order to achieve the service availability and performance requested by the customers.

By applying Algorithm V-B in CloudSim, we strive to find which host is the optimal destination for allocating the migrated VM(s). The host utility between 0% and 100% is divided equally into 200 intervals, and we set each interval as the optimalHostUtility in Algorithm V-B. By default, 80% is the threshold utilization. As illustrated in Figure 4 (e-f), migrating a VM to a host whose utility is the closest to the threshold utility leads to the least energy consumption and SLA performance violation. Otherwise, some VMs from the source host will be migrated back and forth until one of them cannot be moved anymore. This not only increases the number of VM migrations unnecessarily and blocks reasonable future migrations due to the availability constraint, but also introduces further energy consumption. As such, in Algorithm V-B we can replace the optimalHostUtility with the thresholdHostUtility.

In our final experiment, taking the results of the experiments above, we set 99.99% as the target service availability and 80% as the target host CPU threshold utilization during VM live migration. Using our workloads, we compare the VM allocation algorithms (THR, IQR, MAD, LR and LRR) and the VM selection algorithms (MMT, RS and MC) of CloudSim with our approach, namely AVL/AV. Consequently, the results (Figure 5 (a)) show that on average the current resource allocation algorithms in CloudSim are able to provide a service availability for each VM from around 99.7% to 99.93%. In our approach, the service availability is always kept at 99.99%, the SLA violation is 0.00155% and the energy consumption is 54.22 kWh, whereas the other approaches in CloudSim introduce around 37 to 51 kWh of energy consumption and a higher SLA performance violation (Figure 5 (b)). Although AVL/AV introduces slightly more energy consumption than the other approaches, using VM consolidation in general still saves much more energy (compared with 85 kWh in the non-consolidation case).

The penalty model in Section IV applies when SLA violations happen. The approaches using THR, IQR or MAD as the VM allocation strategy lead to an availability lower than 99.9%; in this case, the penalty of all three approaches is increased up to the full cost. Thereby, we selected representative approaches to compare with ours in order to show the penalty chain influences in the three cloud domains. As shown in Figure 5 (c), our approach introduces the fewest penalties, whereas with the other approaches, penalties are increased exponentially in the PaaS and SaaS layers. In particular, the THR approach returns the full cost of the service due to the availability violation.

VII. CONCLUSIONS AND FUTURE WORK

In this paper we presented an OpenStack version of the Generic SLA Manager (GSLAM), which could be further applied in the PaaSage project. By combining IaaS (OpenStack-SLAM) and PaaS (OpenShift) in a use case, applying a multi-domain SLA pricing & penalty model and introducing a resource allocation strategy, we show experimentally that we can manage VM live migration more efficiently than the current state of the art.

In the future, based on the above simulation results, we would like to take this work one step further by using the cloud emulation tool "Emusim" to automatically extract information from various types of workloads (e.g., CPU-intensive, memory-intensive, network-intensive, etc.) via emulation and then use this information to generate the corresponding simulation model. Thus, the results of emulation can be used to validate the correctness and accuracy of the simulation.

[Figure 5 compares AVL/AV with the THR, IQR, MAD, LRR and LR allocation algorithms combined with the RS, MC and MMT selection algorithms: (a) availability (%) per strategy, (b) energy consumption (kWh) and SLA violation (x 0.00001) per strategy, and (c) penalties (Euro) per cloud domain (IaaS, PaaS, SaaS) for AVL/AV, THR/MC, LRR/RS and LR/MMT.]

Fig. 5. Comparison with other strategies in availability, energy consumption, SLA violation and penalty

In order to set performance monitoring policies in an SLA that alert the SLAM when a server nears the threshold of satisfactory performance as agreed upon with the customer, a light-weight hierarchical system will be chosen to represent and monitor the SLAs in a fault-tolerant fashion. As such, SLA violations can be avoided to the largest extent possible by resolving the problem in the right component.

ACKNOWLEDGMENT

The research leading to these results is supported by the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG) in Germany.

REFERENCES

[1] (2013) VMware Virtualization Technology. [Online]. Available: www.VMware.com
[2] (2013) Xen Virtualization Technology. [Online]. Available: http://xen.org
[3] A. Kertesz, G. Kecskemeti, I. Brandic, "An SLA-based Resource Virtualization Approach for On-demand Service Provision". In: VTDC '09: Proceedings of the 3rd International Workshop on Virtualization Technologies in Distributed Computing, pp. 27-34, 2009.
[4] V. Stantchev, C. Schropfer, "Negotiating and Enforcing QoS and SLAs in Grid and Cloud Computing". In: Advances in Grid and Pervasive Computing, Springer Berlin, Heidelberg, pp. 25-35, 2009.
[5] I. Brandic, D. Music, S. Dustdar, "Service Mediation and Negotiation Bootstrapping as First Achievements Towards Self-adaptable Grid and Cloud Services". In: Proceedings of the 6th International Conference on Autonomic Computing, pp. 18, 2009.
[6] J. Kosinski, D. Radziszowski, K. Zielinski, S. Zielinski, G. Przybylski, P. Niedziela, "Definition and Evaluation of Penalty Functions in SLA Management Framework". In: Fourth International Conference on Networking and Services, ICNS 2008, pp. 176-181, 2008.
[7] M. Becker, N. Borrisov, V. Deora, O. Rana, D. Neumann, "Using k-Pricing for Penalty Calculation in Grid Market". In: Proceedings of the 41st International Conference on System Sciences, pp. 97-97, 2008.
[8] O. F. Rana, M. Warnier, T. B. Quillinan, F. Brazier, D. Cojocarasu, "Managing Violations in Service Level Agreements". In: Grid Middleware and Services, Springer-Verlag US, pp. 349-358, 2008.
[9] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, R. Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms". Software: Practice and Experience (SPE), Volume 41, Number 1, pp. 23-50, ISSN: 0038-0644, Wiley Press, New York, USA, 2011.
[10] K. Lu, T. Roblitz, P. Chronz, C. Kotsokalis, "SLA-Based Planning for Multi-Domain Infrastructure as a Service". Cloud Computing and Services Science, pp. 243-257, ISSN: 1865-4924, Springer, 2011.
[11] (2011) SLA@SOI. [Online]. Available: http://sla-at-soi.eu/
[12] C. Kotsokalis, J. L. Rueda, S. G. Gomez, A. E. Chimeno, "Penalty Management in the SLA@SOI Project". In: Wieder, P., Butler, J., Yahyapour, R. (Eds.) Service Level Agreements for Cloud Computing, Springer-Verlag, pp. 116-119, 2011.
[13] K. T. Kearney, F. Torelli, C. Kotsokalis, "SLA: An Abstract Syntax for Service Level Agreements". 11th IEEE/ACM International Conference on Grid Computing (GRID), pp. 217-224, 2010.
[14] M. A. R. Gonzalez, P. Chronz, K. Lu, E. Yaqub, B. Fuentes, A. Castro, H. Foster, J. L. Rueda, A. E. Chimeno, "G-SLAM: The Anatomy of the Generic SLA Manager". In: Wieder, P., Butler, J., Yahyapour, R. (Eds.) Service Level Agreements for Cloud Computing, Springer-Verlag, pp. 167-186, 2011.
[15] J. Kennedy, A. Edmonds, V. Bayon, P. Cheevers, K. Lu, M. Stopar, D. Murn, "SLA-Enabled Infrastructure Management". In: Wieder, P., Butler, J., Yahyapour, R. (Eds.) Service Level Agreements for Cloud Computing, Springer-Verlag, pp. 271-287, 2011.
[16] (2012) PaaSage. [Online]. Available: http://www.paasage.eu/
[17] (2013) OpenShift. [Online]. Available: https://openshift.redhat.com/app/
[18] (2013) Apprenda. [Online]. Available: http://docs.apprenda.com/
[19] (2012) Woorea, OpenStack Java SDK. [Online]. Available: https://github.com/woorea/openstack-java-sdk
[20] G. S. Machado and B. Stiller, "Investigations of an SLA Support System for Cloud Computing (SLACC)". In: Proc. Praxis der Informationsverarbeitung und Kommunikation (PIK), pp. 80-86, 2011.
[21] A. Beloglazov, J. Abawajy, R. Buyya, "Energy-aware Resource Allocation Heuristics for Efficient Management of Data Centers for Cloud Computing". Future Generation Computer Systems, Volume 28, Issue 5, pp. 755-768, 2012.
[22] A. Beloglazov, R. Buyya, "Optimal Online Deterministic Algorithms and Adaptive Heuristics for Energy and Performance Efficient Dynamic Consolidation of Virtual Machines in Cloud Data Centers". Concurrency and Computation: Practice and Experience, ISSN: 1532-0626, Wiley Press, New York, USA, DOI: 10.1002/cpe.1867, 2011.
[23] A. Corradi, M. Fanelli, L. Foschini, "VM Consolidation: A Real Case Based on OpenStack Cloud". Future Generation Computer Systems, 2012.
[24] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, A. Warfield, "Live Migration of Virtual Machines". In: NSDI '05: Proceedings of the 2nd Conference on Symposium on Networked Systems Design and Implementation, pp. 273-286, 2005.
[25] F. Hermenier, "Entropy: a Consolidation Manager for Clusters". In: Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 41-50, 2009.
[26] S. Akoush, R. Sohan, A. Rice, A. W. Moore, A. Hopper, "Predicting the Performance of Virtual Machine Migration". 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 37-46, 2010.
[27] F. Salfner, P. Troeger, M. Richly, "Dependable Estimation of Downtime for Virtual Machine Live Migration". International Journal On Advances in Systems and Measurements, Volume 5, Numbers 1 and 2, 2012.