Using transient thermal models to predict cyberphysical phenomena in data centers

Sustainable Computing: Informatics and Systems 3 (2013) 132–147
Journal homepage: www.elsevier.com/locate/suscom

Georgios Varsamopoulos, Michael Jonas, Joshua Ferguson, Joydeep Banerjee, Sandeep K.S. Gupta∗, Impact Lab¹
School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, Arizona, United States

Article history: Received 11 September 2012; Accepted 29 January 2013
Keywords: Data center modeling; Cyber-physical systems; Lightweight transient models

Abstract

Designing and configuring the layout of data centers, as well as testing thermal-aware decision (e.g., scheduling) algorithms, has been hindered by the reliance on CFD simulations, which are considerably slow and not flexible in integrating cyber behavior (e.g., workload scheduling). Fast thermal mapping techniques, on the other hand, may rely on wrong assumptions (e.g., steady state) and thus produce incorrect conclusions, or they can be unusable because of cyber-physical interactions that rely on transient phenomena (e.g., transient hot spots that cause throttling). In this paper, we propose to speed up the evaluation of designs and algorithms with the use of a heat transfer model that captures transient behavior. We demonstrate its physical relevance, provide a methodology for deriving its parameters from experiments, and show how it can be combined with heat generation and cooling models to create a complete minimal data center system model, which can be simulated in only a small fraction of the time a CFD simulation takes.

Developed as a part of the NSF CRI project #0855277 "BlueTool" (http://impact.asu.edu/BlueTool/). Work in this paper was also funded in part by NSF grants #0834797 and #1218505.
∗ Corresponding author at: School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-8809, United States. Tel.: +1 480 965 3806.
¹ http://impact.asu.edu/.
2210-5379/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.suscom.2013.01.008

1. Introduction

Energy-aware computing has grown dramatically in recent years. From longer cell phone battery lifetimes to low-power screens and power-scaling CPUs, the net effect has been an increase in the operating efficiency of computing devices that has led to greener devices. Due to their $7.4 billion in annual electricity use [1], data centers are an important topic in green computing research. In many data centers, the majority of the energy is consumed by support infrastructure rather than by the computing equipment itself. Researchers have developed numerous energy-aware approaches to reduce these costs [2–12]. Many of these improvements rely on a thermal model to predict the temperature at places of interest throughout data center facilities. These temperature predictions can be used in making proper management decisions (e.g., to which server to assign an incoming workload).

However, the thermal models used in these approaches only predict steady-state temperatures, ignoring critical temporal aspects of thermal behavior. As data center technology shifts to higher-density deployments, e.g., containerized and contained-aisle data centers, which typically exhibit lower air-to-equipment mass ratios than traditional big-room data centers, transient conditions tend to rise faster and be more extreme. The most prominent conditions that warrant the use of transient models are:

• The cooling delay problem [13,14]: High-density data centers can increase localized temperatures much faster than conventional data centers. If the cooling equipment (chiller) is slow at detecting and responding to the temperature rise, the servers may quickly reach high temperatures before the chiller catches up.

• Oscillating behavior of cooling: Most conventional vapor-compression chillers have few discrete compression states (i.e., modes), e.g., off (no compression), low (one activated compressor), and high (two activated compressors or one stronger activated compressor). With such a configuration the chiller can remove heat only at almost fixed rates; thus, if the data center produces heat at some rate between two cooling rates, the chiller will have to oscillate between those two rates. Since oscillations cannot be captured by steady-state models, transient models are needed for such cases.

• Cool-off and heat-up periods due to thermal capacitance of equipment: Most CFD simulations ignore the thermal capacitance of solid materials because it would slow down the simulation and



because it has little effect on the converged steady state. A consequence is that such simulations suggest that the data center reaches steady state in only a small fraction of the time it would take in reality. This can be a problem when considering dynamic workload, as conventional CFD simulations will predict that, in times of high workload, the data center will cool down rather quickly, and consequently that the equipment will spend less time heated (or overheated) than it does in reality.
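The oscillation argument in the second bullet follows from a simple energy balance. The Python sketch below assumes, hypothetically, that each compression mode removes heat at a fixed rate, and it ignores switching delays and thermal capacitance; `mode_cycling` is an illustrative name, not part of any tool described in this paper. For a constant heat load it returns the two modes the chiller must alternate between and the fraction of time it must spend in the upper one.

```python
def mode_cycling(heat_w, mode_rates_w):
    """Bracket a constant heat load between two fixed chiller cooling rates.

    Returns (lower_rate, upper_rate, fraction_of_time_in_upper). Over a long
    horizon the time-averaged heat removed must equal the heat produced, or the
    temperature drifts without bound:
        x * upper + (1 - x) * lower = heat  =>  x = (heat - lower) / (upper - lower)
    A fraction strictly between 0 and 1 means no single mode matches the load,
    so the chiller has to keep switching between the two modes.
    """
    rates = sorted(set(mode_rates_w))
    if len(rates) < 2:
        raise ValueError("need at least two distinct cooling rates")
    if not rates[0] <= heat_w <= rates[-1]:
        raise ValueError("load is outside the range of available cooling rates")
    for lower, upper in zip(rates, rates[1:]):
        if heat_w <= upper:
            return lower, upper, (heat_w - lower) / (upper - lower)


# Hypothetical off / low / high cooling rates in watts.
modes = [0.0, 1000.0, 5000.0]
print(mode_cycling(3000.0, modes))  # alternates low/high, half of the time in high
print(mode_cycling(500.0, modes))   # alternates off/low, half of the time in low
```

A fraction strictly between 0 and 1 is exactly the case a steady-state model cannot represent: the long-run average is right, but the instantaneous temperature keeps swinging around it.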

In the aforementioned examples, data center equipment may spend considerably more time overheated than predicted, which may affect not only the long-term prediction of lifetime, but also short-term performance through temperature-triggered down-throttling of CPUs. The latter condition constitutes a cyber-physical implication loop: an increase in workload (i.e., computing behavior) causes an increase in heat (i.e., physical behavior), which in turn causes performance degradation (i.e., computing behavior), something that conventional steady-state simulators cannot predict.

On the other hand, transient CFD simulations take a considerable amount of time, typically more than steady-state CFD simulations. Such long simulation runs, which can take several hours or days, are not suitable for online model-based decision making (e.g., how can we place incoming workload so as to avoid down-throttling?). The research question that this paper tackles is:

Is there a lightweight model that can describe the transient thermal behavior of data centers, and how can it be combined with the cooling of CRACs, the power consumption of servers, and further with the performance of servers, to make energy and performance predictions?

1.1. Overview of the paper and of contributions

This paper proposes an analytical transient heat transfer model to be used instead of CFD simulations to speed up evaluation and decision making when initially designing or modifying the configuration of a data center, and also to speed up the testing of thermal-aware algorithms. The main idea is presented in Fig. 1. Designing and configuring a data center for operation requires the examination of several alternative scenarios. As CFD simulations take a considerable amount of time, we propose instead to use a model (Section 3) that predicts three aspects of thermal behavior: division (spatial distribution), i.e., how the heat produced by each computing server is split and circulated to each other server or to each chiller; temporal distribution, i.e., how the portion of heat sent from one server to another is distributed over time; and hysteresis, i.e., how long the heat takes to travel from one unit to another, which the temporal distribution also captures. Then, we apply the methodology in Section 4 to derive the contribution curves and weights, analytically fit those curves to a convex weighted sum of gamma distributions (Appendix B), and use the analytical formulation to feed an ODE solver.

Fig. 1. (a) Instead of spending considerable time testing alternative scenarios for design or configuration decision making, (b) we propose to derive analytical approximations and use them instead to speed up the process (the width of a rectangle is indicative of the time its procedure takes). This simulation approach can be incorporated into DC simulators such as GDCSim.

The benefits of using the proposed approach instead of CFD are twofold: run times (Section 5.3) that are a small fraction of CFD run times, and the ability to introduce logic and triggers that dynamically change the behavior of the data center system, something that is hard to implement in CFD. This methodology can be incorporated into data center simulators such as GDCSim [15] so that it can be used to test the effectiveness of several thermal-aware algorithms. The rest of the paper is organized as follows:

• Section 2 covers the preliminary material on the physical design of data centers (Section 2.1), reviews the state of the art of thermal modeling (Section 2.2), establishes a critical drawback of assuming steady state when using discontinuous cooling models (Section 2.3), demonstrates the heat capacitance problem using an experiment on BlueCenter, the data center of the BlueTool project (http://impact.asu.edu/BlueTool/) (Section 2.4), and demonstrates the benefits of using our proposed approach (Section 2.5).

• Section 3 formally defines our transient heat circulation model (Section 3.1), which states that the heat observed at a moment

at the air inlet of some equipment is the cumulative integral of past produced heat as it was spread over time (Eq. (5)). A symmetry property is then presented, which establishes that the heat contribution curve c(t), describing how heat accumulates to a moment from all the past, is a reflection of the heat distribution curve c(t), describing how heat produced at a moment spreads over time toward a destination air inlet (Eq. (6)) (Section 3.2).

• Section 4 provides a methodology for calculating the parameters of our model for a given physical or modeled data center. The methodology involves inducing heat spikes at the air outlets and then observing the temperature effect at the air inlets of the servers. The methodology is applied to a CFD model of a hypothetical containerized data center (Section 4.2). The results show that the heat distribution c(t) curves resemble exponentially fading functions. This property is leveraged to analytically fit these curves to gamma functions (as later demonstrated in Appendix B).

• Section 5 presents a validation of the model through: (i) a comparison of a CFD simulation on OpenFOAM [16] with a MATLAB implementation of the analytical model (Section 5.1), (ii) testing whether the linearity of the model manifests in physical models (Section 5.2), including a study of the accuracy of the model when fan speeds change (Section 5.2.1), and (iii) a small scalability test on the run times of the transient model implementation in MATLAB (Section 5.3).

• Section 6 discusses how to analytically combine the transient model with heat generation and cooling models. This combination, along with the analytical fitting of the curves, is used in Appendix C to create a linear system model of the data center that can be fed to an ODE solver.

• Section 7 provides some conclusions on the models and methods provided in this paper, with a discussion of future work. The Appendix contains more detailed information on the experiments (Appendix A), curve fitting (Appendix B) and the MATLAB ODE formulation (Appendix C).
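To make the outline concrete, here is a minimal Python sketch of two ingredients named above: a contribution curve built as a convex weighted sum of gamma densities, and an inlet temperature computed as a weighted average over sources of time-weighted past outlet temperatures. The discrete time step, the renormalization of the truncated kernel, the function names and all numeric parameters are assumptions made for illustration; the paper's own formulation is continuous and is solved with an ODE solver.

```python
import math


def gamma_pdf(t, shape, scale):
    """Gamma probability density; 0 for t <= 0 (we only sample at t > 0)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def discrete_kernel(weights, shapes, scales, dt, horizon):
    """Sample a convex weighted sum of gamma densities at bin midpoints.

    kernel[j] weights the outlet sample taken j steps ago. The kernel is
    renormalized to sum to 1, so the tail cut off at `horizon` is absorbed.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = int(round(horizon / dt))
    raw = [sum(w * gamma_pdf((j + 0.5) * dt, k, s)
               for w, k, s in zip(weights, shapes, scales))
           for j in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def inlet_temperature(outlet_histories, spatial_weights, kernels):
    """Weighted average over sources (spatial) of weighted averages over time.

    outlet_histories[i][-1] is the latest outlet temperature of source i.
    Histories shorter than a kernel reuse their oldest sample.
    """
    if abs(sum(spatial_weights) - 1.0) > 1e-9:
        raise ValueError("spatial weights must sum to 1")
    temp = 0.0
    for hist, a, kern in zip(outlet_histories, spatial_weights, kernels):
        past = sum(c * hist[max(len(hist) - 1 - j, 0)] for j, c in enumerate(kern))
        temp += a * past
    return temp


# Hypothetical example: a CRAC supplying 18 C and one server whose outlet
# steps from 25 C to 35 C; 75% of the inlet air comes from the CRAC.
dt = 0.5
k_crac = discrete_kernel([1.0], [3.0], [1.0], dt, 30.0)
k_srv = discrete_kernel([0.6, 0.4], [2.0, 4.0], [1.0, 3.0], dt, 60.0)
crac = [18.0] * 200
for steps_since_change in (0, 10, 40, 120):
    server = [25.0] * 200 + [35.0] * steps_since_change
    print(steps_since_change * dt,
          round(inlet_temperature([crac, server], [0.75, 0.25], [k_crac, k_srv]), 2))
```

The printed inlet temperature climbs gradually from 19.75 °C to 22.25 °C over the 60 s kernel horizon instead of jumping, which is the kind of transient response a steady-state recirculation matrix cannot express.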

2. Preliminaries

2.1. Data center operation

A typical data center is organized as follows: all servers are situated on a raised-floor plenum. These servers are organized into rows that separate aisles. These aisles alternate between cold air intake aisles and hot air exhaust aisles. Cold air from the computer room air conditioner (CRAC) is supplied to the servers through perforated tiles in the raised floor. This cold air then passes through the servers, where it absorbs heat, and is ejected into the hot aisle. Finally, the hot air returns to the CRAC through a ceiling intake, completing the cycle.

Cooling inefficiency in this system has a number of sources, but in this paper we focus on heat recirculation (i.e., the return of hot air to the air inlets of computing equipment) and on hot spots, i.e., localized increased temperatures due to heat recirculation. CRACs are incapable of provisioning varying amounts of cool air to specific, dynamic locations in the data center. As a result, much of the equipment is cooled far below red-line temperatures in order to ensure that all equipment stays below this threshold. This is due to hot spots, i.e., locations where inlet temperatures are dramatically higher than elsewhere. Hot spots force the CRAC to over-cool the rest of the data center while trying to bring the hot spots to normal temperature. This negatively impacts CRAC efficiency not only through higher utilization, but also by reducing the CRAC's performance due to a lower Coefficient of Performance (CoP), i.e., the ratio of heat removed over the work (energy) expended to remove that heat.
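The CoP definition above translates directly into cooling cost. A minimal Python sketch, assuming (as stated for heat generation in Section 2.2) that all computing power ends up as heat the CRAC must remove; the function names and the two CoP values are hypothetical and only illustrate why a lower CoP caused by over-cooling raises total power.

```python
def crac_work(heat_removed_w, cop):
    """Work (W) a CRAC spends to remove heat_removed_w, using CoP = heat removed / work."""
    if cop <= 0:
        raise ValueError("CoP must be positive")
    return heat_removed_w / cop


def total_power(computing_w, cop):
    """Computing power plus cooling work, assuming all computing power becomes
    heat that the CRAC must remove (thermal isolation, Q approximately P)."""
    return computing_w + crac_work(computing_w, cop)


# Hypothetical numbers: the same 30 kW computing load at two CoP values.
for cop in (2.5, 5.0):
    print(cop, crac_work(30000.0, cop), total_power(30000.0, cop))
```

With these inputs the CRAC work drops from 12000.0 W at a CoP of 2.5 to 6000.0 W at a CoP of 5.0, for the same computing load.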


Knowledge of the heat recirculation, CRAC efficiency, and the magnitude of over-cooling can be used to save energy in a data center by predicting how much energy would be spent in upcoming scenarios. By placing load intelligently with respect to its thermal impact, thermal-aware scheduling can allow the CRAC to operate with higher efficiency by raising the required supply temperature. To the degree that the temperature of a data center can be accurately predicted for any load, the amount of over-cooling shrinks and efficiency is gained. This highlights the potential of having a better thermal model, i.e., a mapping of how load affects the temperature throughout the data center. As long as the inlet temperature of all equipment is kept below its manufacturer-specified red-line temperature and system performance is not overly affected, this efficiency gain causes no adverse effects on performance, e.g., no SLA (Service-Level Agreement) violations in Internet applications [17]. Most models and simulators of data centers operate under the thermal isolation assumption, i.e., that the heat passing through the data center walls is negligible.

2.2. Existing thermal modeling techniques (steady-state)

While the previous section motivates the need for an accurate thermal model, this section briefly reviews existing methods used for thermal modeling, compares them against empirical behavior, and motivates the need for a transient model.

Thermal modeling of data centers requires three components (see Fig. 2):

1. Heat generation: Heat generation describes how power consumed by a server is emitted as heat into the data center. A common and practical assumption is that the mechanical (sonic) and electromagnetic energy produced is negligible compared to the heat produced, thus:

$$Q_{\text{computing}} \approx P_{\text{computing}} = P.$$

In steady state, the heat rate at a server's air outlet equals the heat rate at the server's air inlet plus the heat produced:

$$Q_{\text{outlet}} = Q_{\text{inlet}} + P,$$

which can be translated into the temperature domain as:

$$T^{-} = T^{+} + \frac{P}{c_p f} \qquad \text{(outlet temp = inlet temp + temp rise)},$$

where $c_p$ is the specific heat of air and $f$ is the air flow rate through the server. In CFD simulation software such as the commercially available FloVENT or the open-source OpenFOAM [16], heat generation is modeled as (forced) convection through the servers.

Fig. 2. Thermal modeling of a data center requires three components: a heat generation model, which describes how power consumed is introduced as heat into the data center; a heat circulation (heat flow and mixing) model, which describes how heat is distributed among the data center equipment; and a cooling model, which describes how cooling is introduced into the data center.

2. Heat circulation (heat flow and mixing): Heat circulation describes how heat is transferred around the room by the air. In CFD software, this is done by modeling the room as a 3D grid of Navier–Stokes equations that express physics laws such as conservation of energy, momentum, mass and pressure in a defined geometry. Although CFD simulations can predict the temperature at all points of the grid, it is of practical interest to predict the temperature at a few points only, specifically at the equipment's air inlets and outlets. Fast approximate solvers such as those used in [18,19,10,20] all rely on the following steady-state assumption: for any constant input of heat, the data center will reach a steady thermal state such that the temperature anywhere in the data center becomes invariant to time. These approaches frequently used FloVENT for model building, but once a model is constructed they approximate temperatures heuristically. For example, the work in [10,20] constructs a heat recirculation matrix $D$ that is used to predict the steady-state temperature at the air inlets of servers given the power consumption $p_i$ of each server, plus the supplied cool air temperature $T_{\text{out}}$ from the CRAC:

$$\begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{bmatrix} = T_{\text{out}} + D \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix}. \qquad (1)$$

3. Cooling: Cooling models describe how heat is removed from the air; they are analogous to heat generation models. In the constant cooling model (Fig. 3(a)), the outlet of the CRAC supplies cool air at a constant temperature, regardless of the input air temperature. The power removed under this model scales linearly with the inlet temperature. Previous research routinely assumes a constant cooling model because, under this model, the steady-state assumption is guaranteed to hold [18,20].
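The three components can be chained into a minimal steady-state calculator, which is the kind of model the rest of this section argues is insufficient on its own. The Python sketch below, under the constant cooling model, evaluates Eq. (1) for the inlet temperatures, applies the per-server energy balance (outlet temperature = inlet temperature + P/(c_p f)) for the outlets, and computes the highest supply temperature that keeps every inlet at or below a red line, which is how thermal-aware scheduling lets the CRAC supply warmer air. The matrix D, the powers, the flows, the red line and the specific heat value are hypothetical, and the function names are ours.

```python
CP_AIR = 1005.0  # J/(kg K), approximate specific heat of air (assumed value)


def inlet_temperatures(t_supply, D, p):
    """Eq. (1): T_i = T_out + sum_j D[i][j] * p_j, with t_supply playing the role of T_out."""
    return [t_supply + sum(d * pj for d, pj in zip(row, p)) for row in D]


def outlet_temperatures(t_in, p, flows_kg_s, cp=CP_AIR):
    """Per-server energy balance: outlet = inlet + P / (c_p f)."""
    return [ti + pi / (cp * f) for ti, pi, f in zip(t_in, p, flows_kg_s)]


def max_supply_temperature(t_redline, D, p):
    """Highest supply temperature that keeps every inlet at or below t_redline."""
    return t_redline - max(sum(d * pj for d, pj in zip(row, p)) for row in D)


# Hypothetical two-server example (D in K/W, p in W, f in kg/s).
D = [[0.0010, 0.0005],
     [0.0008, 0.0020]]
p = [1000.0, 1500.0]
t_in = inlet_temperatures(18.0, D, p)              # about [19.75, 21.8]
t_out = outlet_temperatures(t_in, p, [0.1, 0.12])  # inlets plus roughly 9.95 K and 12.44 K
print(t_in, t_out, max_supply_temperature(25.0, D, p))  # last value about 21.2
```

Every quantity here is time-invariant by construction, so the sketch inherits the steady-state assumption discussed next.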

2.3. Shortcomings of steady-state cooling assumptions

The constant cooling model behavior holds only for certain types of CRAC units. The most crucial difference from the models assumed in prior research is that the step-linear behavior does not allow steady-state solutions to occur for a majority of scheduling patterns. To understand this, first consider that data centers are designed to be closed systems, such that the only points of energy exchange in the system are known and considered. This is known as the thermal isolation assumption [21]. Fig. 3(b) shows the linear cooling model, which is simply a function of the form $y = mx + b$. With $m = 1$, the power removed by the CRAC is constant. As $m$ approaches 0, this model converges to the constant model. Consider how the total energy of a system evolves when the heat input to the system from the chassis is near-constant and the heat removed from the system follows the empirical step-linear cooling model in Fig. 4.

Fig. 3. Various cooling models assumed in the literature. (a) The constant model assumes a constant temperature of supplied cool air, regardless of the input temperature; (b) the linear model assumes a linear reduction of temperature with a constant shift; (c) the step-linear model is a set of discontinuous linear functions that best reflects empirically observed behavior [3].

[Fig. 4 plot: output temperature (Floor West 1) versus input temperature (Ceiling West 2); series: recorded pair, average, linear fit.]

Fig. 4. Temperature sensor measurements at the CRAC inlet and outlet of a real data center over 2 weeks' time [3].

From this perspective, in order for a data center to reach a steady thermal state, the incoming energy must equal the outgoing energy. Here we show that this is not commonly the case. The incoming energy can be derived using the sum of the power of all heat sources:

$$P_{\text{in}} = \sum_{i=1}^{n} p_i.$$

$P_{\text{in}}$ is assumed to be constant for a particular schedule. The outgoing energy can be inferred using the temperature drop across the CRAC, as long as the airflow mass is known. Thus, an equilibrium temperature exists:

$$P_{\text{in}} = P_{\text{AC}} = (T_{\text{AC}} - T_{\text{AC,out}})\,c_p f_{\text{AC}} \;\Rightarrow\; T_{\text{AC}} - T_{\text{AC,out}} = \frac{P_{\text{in}}}{c_p f_{\text{AC}}} \;\Rightarrow\; T_{\text{AC}} - (m T_{\text{AC}} + b) = \frac{P_{\text{in}}}{c_p f_{\text{AC}}} \;\Rightarrow\; T_{\text{AC}} = \frac{P_{\text{in}}/(c_p f_{\text{AC}}) + b}{1 - m}, \qquad (2)$$

where $c_p$ is the specific heat of air and $f_{\text{AC}}$ is the airflow rate of the AC.

In the case $m = 1$, the data center can reach equilibrium only at exactly one value, $P_{\text{in}} = -b\,c_p f_{\text{AC}}$ (from Eq. (2)). Since the slope of the step-linear cooling model is typically near 1, we conclude that in the majority of cases no steady state is possible. Hence, predictions made by steady-state thermal models may be false and, generally speaking, not conservative. This can lead to critical situations in which red-line temperatures are exceeded as the temperature fluctuates.

The step-linear model reflects the empirical behavior of many types of CRACs: within each inlet temperature range, the CRAC extracts a near-constant amount of heat. This is a problem for fast approximate solvers that do not support step-linear cooling models.
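The argument above can be checked numerically. This Python sketch evaluates Eq. (2) for a single linear segment and then, for a step-linear CRAC, keeps only the equilibria that fall inside the temperature range of the segment that produced them. The segment boundaries, slopes, offsets and the product c_p f_AC are hypothetical values chosen to mimic a slope near 1; they are not fitted to Fig. 4, and the function names are ours.

```python
def equilibrium_temperature(p_in, m, b, cp_f):
    """Eq. (2): CRAC inlet temperature T_AC at which the heat removed equals P_in.

    The CRAC follows T_AC,out = m * T_AC + b and cp_f = c_p * f_AC (W/K).
    Returns None for m == 1: then equilibrium requires P_in == -b * cp_f exactly.
    """
    if m == 1.0:
        return None
    return (p_in / cp_f + b) / (1.0 - m)


def step_linear_steady_states(p_in, segments, cp_f):
    """Equilibria of a step-linear CRAC (Fig. 3(c)).

    segments: (lower, upper, m, b) tuples, meaning T_out = m * T_in + b while
    lower <= T_in < upper. An equilibrium only counts if it lies inside the
    segment that produced it; an empty list means no steady state exists.
    """
    states = []
    for lower, upper, m, b in segments:
        t = equilibrium_temperature(p_in, m, b, cp_f)
        if t is not None and lower <= t < upper:
            states.append(t)
    return states


cp_f = 2010.0  # hypothetical: c_p of about 1005 J/(kg K) times f_AC = 2 kg/s
print(equilibrium_temperature(10000.0, 0.0, 18.0, cp_f))  # constant model (m = 0), about 22.98
print(equilibrium_temperature(10000.0, 1.0, -5.0, cp_f))  # None

# Hypothetical two-segment CRAC with slope near 1 (low mode below 26 C, high mode above).
segments = [(float("-inf"), 26.0, 0.9, 0.5), (26.0, float("inf"), 0.9, -5.0)]
print(step_linear_steady_states(3000.0, segments, cp_f))   # one equilibrium, about 19.93 C
print(step_linear_steady_states(10000.0, segments, cp_f))  # []: no steady state
```

For the 10 kW load, the low mode removes less than 10 kW everywhere below 26 °C and the high mode removes more than 10 kW everywhere above it, so the temperature is pushed back and forth across 26 °C and no steady state exists.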

In summary, steady-state solvers cannot provide accurate temperature estimates in situations where thermal steady states do not exist [22]. Where inaccuracies occur, steady-state solvers are not conservative in their estimates [23]. When heat spikes occur on a regular basis, they can cause heat-induced performance throttling, equipment shutdown, and equipment damage over time. These are all costly consequences that a transient thermal model can help prevent.

2.4. Heat capacitance in real systems

In typical CFD simulations of data centers, only the air (i.e., the fluid) is simulated. Solid equipment acts only as thickness-free walls, some of which may have some porosity (e.g., ventilation tiles), and they have no thermal exchange with the fluid. This saves a lot of simulation time, and it can find good steady-state solutions. Most CFD simulation models of data centers model the air cooling portion, i.e., they focus on the temperature of the air.

Fig. 5. A rendering of the CFD model of BlueCenter used for validations in this paper. The BlueCenter features a cold-hot-cold aisle layout, with isolated aisles communicating through doors. In the experiments, the doors were kept open to allow heat recirculation.

An experiment was conducted in BlueCenter, a small data center at ASU that is part of the BlueTool project (a CFD model of it is shown in Fig. 5). Initially, 144 servers were turned on and the temperature at the inlet of the CRAC was brought to 40 °C. The CRAC was then turned on. The inlet and outlet temperatures of the CRAC were recorded for two hours using sensors attached at the air ducts of the CRAC (section (a) in Fig. 6). At the end of two hours, 48 servers were turned off. The remaining 96 servers were allowed to continue running for two more hours, after which 48 servers were turned off (section (b) in Fig. 6). The remaining 48 servers were again allowed to run for two more hours (section (c) in Fig. 6). There was no other variation in power output.

Fig. 6. CRAC inlet temperature from experimental measurements at BlueCenter vs. CFD simulation on OpenFOAM. The CFD simulations suggest that the room quickly converges to the equilibrium state, which is not the case in reality.

Fig. 7. Emulation of CFD output using a preliminary version of our transient model. CFD simulations need to be done separately for each power consumption level, and predict the converging state.

The geometry of BlueCenter was generated using the BlueSim CFD simulator; the experimental setup was simulated and the predicted CRAC inlet temperature was recorded. These predicted values were then compared with the experimental values, and the results are shown in Fig. 6. Although the CFD simulation suggests that the data center quickly converges to the steady-state temperature (within a couple of minutes), in reality the steady state is reached only after a couple of hours. This phenomenon is due to the thermal capacitance of the materials, mainly solid objects. For a transient model to be accurate and usable, the thermal capacitance must be taken into account; otherwise the model may exhibit large prediction errors.
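A lumped first-order model is enough to see why ignoring solid thermal mass makes a room appear to settle far too quickly. The Python sketch below is not the paper's model: it treats the room as a single well-mixed zone, C dT/dt = P − c_p f (T − T_supply), with a constant supply temperature, and compares an air-only capacitance with one that also includes solids. All masses, specific heats, the airflow and the 20 kW load are hypothetical, and the function names are ours.

```python
import math


def time_constant(capacitance_j_per_k, cp_f_w_per_k):
    """First-order time constant tau = C / (c_p f) of a well-mixed zone."""
    return capacitance_j_per_k / cp_f_w_per_k


def settling_time(tau, fraction=0.95):
    """Time for a first-order response to complete `fraction` of a step change."""
    return -tau * math.log(1.0 - fraction)


def simulate_zone(t0, t_supply, power_w, capacitance, cp_f, dt, steps):
    """Forward-Euler integration of C dT/dt = P - c_p f (T - T_supply)."""
    temps = [t0]
    for _ in range(steps):
        t = temps[-1]
        temps.append(t + dt * (power_w - cp_f * (t - t_supply)) / capacitance)
    return temps


# Hypothetical values: 120 kg of room air vs. the same air plus 10 t of solids.
cp_f = 4020.0                    # W/K, c_p of about 1005 J/(kg K) times 4 kg/s
c_air = 120.0 * 1005.0           # J/K, air only (what a fluid-only CFD run sees)
c_all = c_air + 10000.0 * 500.0  # J/K, air plus solids at about 500 J/(kg K)
for c in (c_air, c_all):
    tau = time_constant(c, cp_f)
    final = simulate_zone(40.0, 18.0, 20000.0, c, cp_f, dt=1.0, steps=600)[-1]
    # tau in s, 95% settling time in minutes, temperature after 10 minutes
    print(round(tau, 1), round(settling_time(tau) / 60.0, 1), round(final, 2))
```

With these made-up numbers the air-only zone settles (95%) in about a minute and a half, while adding the solids stretches that to roughly an hour, the same kind of gap as between the CFD prediction and the measurements in Fig. 6.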

Page 6: Using transient thermal models to predict cyberphysical phenomena in data centers

G. Varsamopoulos et al. / Sustainable Computing: Informatics and Systems 3 (2013) 132– 147 137

F low

r

2c

mmAfidmetaasa

Fta

Fig. 8. Simulation of the same three-server data center with a 2-mode CRAC (1 kW low / 5 kW high). It is clear that transient simulation predicts that the data center can reach a far wider range of temperatures than with a converging simulation.

2.5. Benefits of a transient cyberphysical simulation of data centers

To demonstrate the benefits of having a lightweight transient model, we combined our proposed model with a server thermal model (i.e., heat generation) and a CRAC model (i.e., cooling), as in Appendix C, and generated three figures (Figs. 7–9). The first figure (Fig. 7) depicts a 2500-second simulation of a hypothetical data center with three servers and one CRAC, using our transient model in MATLAB with ode45. The servers work at 66% (1000 W each) for 500 s, then at 33% (500 W each) for another 500 s, then at 100% (1500 W each) for 500 s, and finally run for another 1000 s at 1000 W each. There is strong recirculation (only about 25% of the heat produced reaches the CRAC before any other server). Air travel delays are in the range of 1–4 s from a server to other equipment (server or CRAC) and 5–8 s from the CRAC to other equipment. This simulation scenario tries to emulate what a CFD simulator would produce: a converging steady state at each stage.

The next figure (Fig. 8) shows that a transient model can produce results for non-converging (oscillating) behavior, in this example caused by a two-mode CRAC, for the same workload schedule. The figure shows how the inlet temperatures vary when the CRAC thermostat is set to 28 °C. The proposed model can produce these results in under one minute.

The benefits of an analytical and computationally lightweight transient model are shown in the last figure (Fig. 9). In this scenario, and for the same workload schedule as above, we enhanced the model with a throttling function that reduces the server power when its inlet temperature exceeds 30 °C. The results show that there can be several time windows of throttling, which will cause performance degradation.

Fig. 9. Simulation of the same three-server data center with a 2-mode CRAC (1 kW low / 5 kW high) and with throttling emulation. The simulation predicts windows of throttling, which occur in periods of high workload and can have a significant effect on performance. A performance model integrated with such a transient simulator can also provide several insights into performance events that are unpredictable with conventional technologies.
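The workload schedule and the throttling rule described above can be sketched as plain functions. This is a minimal illustration, not the simulator behind Figs. 7–9: the throttling factor (halving the power) is a hypothetical choice, since the text only states that power is reduced above 30 °C, and `throttling_windows` is a helper we introduce to extract throttling intervals from a simulated inlet-temperature trace.

```python
def scheduled_power(t):
    """Per-server power in W for the workload schedule of Figs. 7-9 (t in s)."""
    if t < 500:
        return 1000.0  # 66% utilization
    if t < 1000:
        return 500.0   # 33%
    if t < 1500:
        return 1500.0  # 100%
    return 1000.0      # final 1000 s back at 66%


def throttled_power(t, inlet_temp, threshold=30.0, factor=0.5):
    """Hypothetical throttling rule: scale power by `factor` above `threshold` degC."""
    power = scheduled_power(t)
    return power * factor if inlet_temp > threshold else power


def throttling_windows(times, inlet_temps, threshold=30.0):
    """(start, end) intervals during which the inlet temperature exceeds threshold."""
    windows, start = [], None
    for t, temp in zip(times, inlet_temps):
        if temp > threshold and start is None:
            start = t
        elif temp <= threshold and start is not None:
            windows.append((start, t))
            start = None
    if start is not None:
        windows.append((start, times[-1]))
    return windows
```

For example, `throttling_windows([0, 1, 2, 3], [29, 31, 32, 29])` returns `[(1, 3)]`: a single throttling window from t = 1 to t = 3.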

3. A new transient model for heat circulation

This section provides both an intuitive and a formal definition of the transient thermal model motivated above. Intuitively, we propose that a useful estimate of the temperature at a point of interest can be obtained as a weighted average of the contributing temperatures of each source, where each contributing temperature is itself a weighted average of that source's outlet temperatures over time. Essentially, heat added to a room differs with respect to time and location in the room, and, as the air mixes, the resulting temperature averages according to the relative air masses. If we monitor the temperature at every point where it is modified, the temperature everywhere else in the room can only be an average of measured temperatures at various past times. Therefore this model requires the thermal isolation assumption. This is primarily to avoid the need for a bias factor, so that we can constrain the model such that the sum of all weights is 1.

The other key assumption of this model is that the airflows in the data center are controlled and dominated by the fans, such that the portion of air arriving everywhere over time is invariant to temperature. This ignores changes in airflow due to pressure changes caused by temperature, such as hotter air rising faster (a likely cause of slight inaccuracy in our model).

3.1. Model definition

Our model, formally defined, uses the symbols of Table 1, where source temperatures are initialized as steady-state temperatures. The heart of the model is a collection of n × n temporal contribution curves, $c_{ij}(t)$, each one denoting how heat arrives at the air inlet of a sink (receiving) server j from the air outlet of a source server i. (For notational convenience, the CRAC is also included in the ij enumeration.) A $c_{ij}$ curve conforms to the following:

$$c_{ij}(t) : (-\infty, 0] \to [0, 1], \quad \text{with} \quad \int_{-\infty}^{0} c_{ij}(t)\,dt = 1. \qquad (3)$$

A key concept of the model is the contributing temperature $T_{ij}(t)$ of a source server i to a sink server j, which is the temperature rise as affected by the history of the source server i's outlet temperature $T_i^-(t)$ through the spreading of $c_{ij}$:

$$T_{ij}(t) = \int_{-\infty}^{0} c_{ij}(\tau)\,T_i^-(t+\tau)\,d\tau. \qquad (4)$$

In addition to the $c_{ij}$ curves, the model assumes n × n weighting factors $w_{ij}$, which define how each contributing temperature $T_{ij}(t)$ factors into the actual temperature $T_j^+(t)$ at the air inlet of server j. The $w_{ij}$ can be organized into a matrix W:

$$W = \begin{bmatrix} w_{11} & w_{21} & \cdots & w_{n1} \\ w_{12} & w_{22} & \cdots & w_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ w_{1n} & w_{2n} & \cdots & w_{nn} \end{bmatrix}.$$

Table 1
Nomenclature.
Symbol          Definition
$n$             Number of points of interest: chassis, CRAC inlets, etc.
$t$             Time
$T_i^+(t)$      Entering (sink) temperature of point i at time t
$T_i^-(t)$      Exiting (source) temperature of point i at time t
$T_i^\infty$    Starting steady-state temperature of point i
$c_{ij}(t)$     Temporal contribution of i's temperature upon j
$c'_{ij}(t)$    Temporal distribution of i's temperature upon j
$w_{ij}$        Heat weighting factor (at sink)
$u_{ij}$        Heat division factor (at source)
$Q_{ij}$        Cumulative contributed heat from point i to j at time t
$T_{ij}$        Cumulative contributed temperature from i to j at time t
$f_i$           Air flow rate through a point (server or CRAC) i
$p_i(t)$        Power at server i
$c_p$           Specific heat of air

The core hypothesis of the model is that the temperature at the air inlet of a server j is the convex weighted sum of the temperature contributions of all servers (and of the CRAC) as they accumulate over time from the past:

$$T_j^+(t) = \sum_{i=1}^{n} w_{ij}\,T_{ij}(t) = \sum_{i=1}^{n} w_{ij}\int_{-\infty}^{0} c_{ij}(\tau)\,T_i^-(t+\tau)\,d\tau, \qquad (5)$$

where $\sum_{i=1}^{n} w_{ij} = 1,\ \forall j$, because of the thermal isolation assumption. Also, since a contributing temperature intuitively represents the temperature of air from a source weighted over some past time interval, it should never be hotter or colder than the source has ever been:

$$\min_t T_i^-(t) \le T_{ij} \le \max_t T_i^-(t).$$

3.2. The symmetry property

The proposed circulation model, as defined, implies that between a contributing server and a receiving server, at any point in time t, there is heat from the contributing server that arrives from any point in the past $(-\infty, t-\delta)$ (see Fig. 10). The hysteresis parameter $\delta$ expresses the delay it takes for heat to start arriving.

Since the above is true for any point in time, it is also true, conversely, that heat $Q_i^-(t)$ produced at any moment t will be divided according to some division factors $u_{ij}$ ($\sum_{j=1}^{n} u_{ij} = 1,\ \forall i$), will start arriving after time $\delta$, and will continue contributing some portion of itself until $+\infty$, i.e., it will contribute in the future interval $(t+\delta, +\infty)$. The insight of the model is that the air and heat that leave a server outlet spread out on their way to a new location. Assume that the heat distribution function is $c'_{ij}(\tau)$ (this function incorporates the hysteresis $\delta$ as well).

Fig. 10. Explanation of the proposed transient model assertions: heat leaving a server i splits according to weights $u_{ij}$ and reaches each server j with a temporal distribution (spreading) $c'_{ij}(t)$, which includes some hysteresis ($\delta_{ij}$). For a receiving server j, incoming heat at time t is a weighted sum of integrals of heat accumulated from the past, as dictated by the heat contribution curves $c_{ij}(t)$. $c'_{ij}(t)$ mirrors $c_{ij}(t)$ (Eq. (6)).

Assuming that heat is distributed according to $c'_{ij}(\tau)$, irrespective of when it is generated, the contributed heat at time t at the receiving server is:

$$Q_{ij}(t) = \int_{-\infty}^{0} c'_{ij}(-\tau)\,u_{ij}\,Q_i^-(t+\tau)\,d\tau.$$

Comparing this with Eq. (4), the heat distribution function is the mirror image of the contribution curve:

$$c'_{ij}(t) = c_{ij}(-t). \qquad (6)$$

Also, we can establish a relationship between the heat division factors $u_{ij}$ and the weighting factors $w_{ij}$, using the fact that a heat flow Q carried by an air flow f raises its temperature by $Q/(c_p f)$:

$$\begin{aligned}
T_j^+(t) &= \frac{Q_j^+(t)}{c_p f_j} = \frac{1}{c_p f_j}\sum_{i=1}^{n} Q_{ij}(t) = \frac{1}{c_p f_j}\sum_{i=1}^{n} u_{ij}\int_{-\infty}^{0} Q_i^-(t+\tau)\,c_{ij}(\tau)\,d\tau \\
&= \sum_{i=1}^{n} u_{ij}\,\frac{c_p f_i}{c_p f_j}\int_{-\infty}^{0} \frac{Q_i^-(t+\tau)}{c_p f_i}\,c_{ij}(\tau)\,d\tau = \sum_{i=1}^{n} u_{ij}\,\frac{f_i}{f_j}\int_{-\infty}^{0} T_i^-(t+\tau)\,c_{ij}(\tau)\,d\tau,
\end{aligned}$$

and comparing with Eq. (5):

$$w_{ij} = u_{ij}\,\frac{f_i}{f_j}. \qquad (7)$$

If the division factors split the air flow in the same proportions as the heat, the air reaching inlet j is $\sum_i u_{ij} f_i = f_j$, so this form also satisfies $\sum_i w_{ij} = 1$, as Eq. (5) requires.

Eqs. (6) and (7) define a symmetry relationship between heat distribution as it is generated at the sources and heat contribution as it arrives at the sinks.

It should be noted that the matrix $U = \{u_{ij}\}$ of the division factors is closely related to the heat recirculation matrix D (Eq. (1)). Matrix D describes, in a steady-state notion, how output heat from a server is split among all servers (and the CRAC as well), and it also performs a conversion from power to temperature [20].

The next sections answer two fundamental questions: (i) how to derive the $c_{ij}$ curves and the weighting factors $w_{ij}$ (Section 4), and (ii) how accurate and useful the model is (Section 5).

4. Model learning

In this section, we present a methodology for deriving the weights $w_{ij}$ and the $c_{ij}$ curves for a given data center. The methodology leverages the symmetry relationship to first derive the $c'_{ij}$ curves and $u_{ij}$ factors from CFD simulations, and then compute the $c_{ij}(t)$ and $w_{ij}$.

4.1. Methodology for yielding $w_{ij}$ and $c_{ij}$

A series of CFD simulations with high temporal resolution is required (in this case, we used the OpenFOAM CFD solver). Once the model is learned, the CFD solver is no longer required. Furthermore, the CFD solver can be done away with entirely if the data can be obtained from sensors in the data center.

To obtain the $c'(t)$ curves, we first generate n CFD simulations, each of which starts from a steady state at temperature $T_{const}$ and then has a single location at a time produce a considerable temperature spike. The simulation should include a parameter specifying that the outlets of all servers (other than the spiking server during its short period of extreme heat) act as "heat sinks", that is, they emit a constant temperature $T_i^-(t) = T_{const}$ regardless of their inlet temperatures $T_i^+(t)$. Heat sinking ensures that the heat introduced by the temperature spike eventually dissipates and the system reverts to the pre-spike state. This procedure is performed in n separate experiments, one for each server and the CRAC, to obtain their respective curves (Fig. 11).

Fig. 11. The methodology of yielding the $c'_{ij}$ curves involves inducing a Dirac-like heat spike at the source server i, one source i at a time, and then observing the temperatures over time at the sink servers. The $u_{ij}$ factors are computed by dividing the area of each curve by the sum of the areas of all the curves observed in that experiment.

The temperature curves $T_j^+(t)$ observed at the air inlets are used to derive the $c_{ij}(t)$ and $w_{ij}$ as follows:

1. Remove the bias temperature of the pre-spike steady state from the curves.
2. (Calculate $u_{ij}$) For each experiment i, divide the area of each temperature curve by the sum of the areas of the curves:
$$u_{ij} = \frac{\int_0^\infty T_j^+(\tau)\,d\tau}{\sum_{k=1}^{n}\int_0^\infty T_k^+(\tau)\,d\tau}, \quad \text{for each experiment } i.$$
3. Calculate $w_{ij}$ using Eq. (7).
4. (Calculate $c'_{ij}(t)$) For each experiment i, normalize each curve to a unit area:
$$c'_{ij}(t) = \frac{T_j^+(t)}{\int_0^\infty T_j^+(\tau)\,d\tau}.$$
5. Calculate $c_{ij}(t)$ using Eq. (6).
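Steps 1 to 5 can be sketched for a single spike experiment as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: traces are assumed to be uniformly sampled, areas are taken with the trapezoidal rule, `learn_source` and `trapezoid` are hypothetical helper names, and Eq. (7) is taken here as w_ij = u_ij f_i / f_j, which follows from a heat flow Q raising the temperature of an air flow f by Q/(c_p f).

```python
def trapezoid(values, dt):
    """Trapezoidal-rule area under samples spaced dt apart."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def learn_source(traces, dt, bias, flows, source):
    """Apply steps 1-5 to the inlet traces of one spike experiment.

    traces -- dict: sink j -> inlet temperatures T_j^+ sampled every dt
    bias   -- pre-spike steady-state temperature (T_const)
    flows  -- dict: point -> air flow rate f (same units for all points)
    source -- index i of the point that produced the spike
    Returns dicts keyed by sink j: u_ij, w_ij and the sampled c'_ij.
    """
    # Step 1: subtract the pre-spike bias.
    rises = {j: [x - bias for x in tr] for j, tr in traces.items()}
    areas = {j: trapezoid(r, dt) for j, r in rises.items()}
    total = sum(areas.values())
    # Step 2: division factors are area fractions (they sum to 1 over sinks j).
    u = {j: area / total for j, area in areas.items()}
    # Step 3: assumed form of Eq. (7), w_ij = u_ij * f_i / f_j.
    w = {j: u[j] * flows[source] / flows[j] for j in u}
    # Step 4: unit-area curves c'_ij sampled at lags 0, dt, 2*dt, ...
    c_prime = {
        j: ([r / areas[j] for r in rises[j]] if areas[j] > 0 else [0.0] * len(rises[j]))
        for j in rises
    }
    # Step 5 (Eq. (6)): c_ij(-k*dt) equals c_prime[j][k], so reading the sampled
    # curve in reverse time is all that is needed.
    return u, w, c_prime
```

Running it once per source i (every server and the CRAC) fills the n × n tables of u_ij, w_ij and c'_ij.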

4.2. Example CFD experiment on small containerized data center

We provide another example application of the methodology on the CFD model of a smaller, containerized data center consisting of a single row with 4 racks. Each rack contains 6 chassis. A mesh model of this data center's layout can be found in Appendix A. Fig. 12 shows a sample set of $c_{ij}$ curves. Note that the arrival of heat is much more compressed in these curves than in (some of) those from the medium-sized data center. This is likely due to the smaller size of the containerized data center compared to the medium-sized data center.

5. Validation

Validation contains the following aspects:

• Cornerstone validity (Section 5.1). Can the model predict transient temperatures? A comparison with an equivalent CFD model is conducted to demonstrate the validity of the model.
• Linearity (Section 5.2). The transient heat model, as defined, exhibits linearity with respect to the sources and their contributions, i.e., if a source increases its output temperature, the sinks will see a corresponding linear increase (according to the weights $w_{ij}$), and this will not affect the heat from other sources (there is no multiplication, cancellation or other type of non-linear effect between two heat sources). Does this linearity actually manifest in real data center rooms? Also, how accurate is the model when the fan speeds change? We conduct CFD simulations and see that linearity does occur.
• Scalability (Section 5.3). How do the runtimes increase as the data center population (i.e., the number of servers) increases? The test shows a polynomial-like increase in runtime.

Fig. 12. Two examples of server inlet ($T^+(t)$) traces, t-axis inverted to show as $c_{i,j}$, showing the shape of the heat contribution curves.

5.1. Comparison against CFD simulation

To validate the physical relevance of the proposed model, we prepared a CFD setup on OpenFOAM as in Fig. 13. The physical setup is a 10 m long rectangular vent with a 0.1 m by 0.1 m square cross-section, with 3 chassis and 1 CRAC inserted at distances from each other proportional to the initial delays indicated in Fig. 13. The results of the CFD simulator strongly agree with our model (see Fig. 14).

The primary advantage of our simulator over the CFD simulator is that our results can be obtained within 1 s, as opposed to the approximately 1 h solution time of the CFD simulation. This enables our model to be used by data center administrators, provided that the parameters of our model are learned through the procedure described in Section 4.

Fig. 13. Basic simulation setup for the discrete-time simulator in Section 5.1. Every airflow arrives at exactly one other place. Air passing through each node changes by the amount listed above it. Air leaving a node takes $\delta$ time to arrive at the next node.

5.2. Linearity of heat contributions

The simulations used to derive the $c_{ij}$ curves are performed one for each server (i.e., each simulation is that of a single server emitting heat, while the inlet temperatures of all servers are measured to create the temperature traces $c_{ij}$). It is not necessary to perform simulations for every combination of servers, owing to the linearity of heat arrival as a function of the number of servers contributing. Mathematically, if $c_{i\&j,k}(t)$ represents the heat arrival at server k's inlet from servers i and j emitting simultaneously, then $c_{i\&j,k}(t) \approx c_{i,k}(t) + c_{j,k}(t)$. Fig. 15 shows two examples of how the linear sum of two individually simulated $c_{i,j}$ curves very closely approximates the values generated by both servers emitting simultaneously.
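A check of this superposition property on sampled traces could look as follows. The error measure (the largest pointwise deviation, relative to the peak of the joint-emission trace) is our own illustrative choice; the text does not specify how closeness was quantified, and `superposition_error` is a hypothetical name.

```python
def superposition_error(combined, first, second):
    """Largest |c_{i&j,k} - (c_{i,k} + c_{j,k})|, relative to the joint trace's peak.

    All three traces must be sampled on the same time grid.
    """
    if not (len(combined) == len(first) == len(second)):
        raise ValueError("traces must have the same length")
    peak = max(abs(v) for v in combined)
    worst = max(abs(c - (a + b)) for c, a, b in zip(combined, first, second))
    return worst / peak if peak > 0 else worst
```

A value near zero indicates that the two individually simulated curves can stand in for the joint simulation.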

5.2.1. Effect of fan speed change on linearity

The transient model's depiction of the heat arrival rate in a physical layout necessarily depends on the speed of the air flowing through the servers. It is well known that modern servers modulate fan speed according to utilization, and it is prohibitively expensive to derive $c_{i,j}$ curves for every possible fan speed combination for each server. To overcome this challenge, we instead propose performing modifications to $c_{i,j}$ curves derived at a median fan speed. We hypothesize that $c_{i,j}$ curves can be approximated as the sum of a finite number of gamma distributions (for a more detailed analysis of this topic, see Appendix B), where each individual distribution represents a different path of airflow delivering heat from server i to server j. If fan speed is increased, these distributions will shift to an earlier point, thus delivering heat earlier. Fig. 16, in comparison to Fig. 15's depiction of $c_{1\&2,3}$, exemplifies the shifting of peaks towards an earlier arrival time. Fig. 17 gives a cumulative look at this change in arrival time.
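The text does not specify how the median-speed curves are modified. One hypothetical way to realize the "shift to an earlier point" for a single gamma component (the delayed, shape-8 form fitted in Appendix B) is to assume that every air travel time scales inversely with fan speed, i.e., c(t) becomes r·c(r·t) for a speed ratio r. Under that assumption the delay becomes δ/r and the rate becomes a·r, the unit area is preserved, and the peak time δ + 7/a is divided by r. All names below are illustrative.

```python
import math


def gamma8_pdf(t, delta, a):
    """Shape-8 gamma density with rate a, delayed by delta (zero for t <= delta)."""
    s = t - delta
    if s <= 0:
        return 0.0
    return a ** 8 * s ** 7 * math.exp(-a * s) / math.factorial(7)


def rescale_component(delta, a, speed_ratio):
    """Hypothetical fan-speed adjustment: travel times scale as 1 / speed_ratio.

    Equivalent to c(t) -> r * c(r * t): the delay shrinks and the rate grows
    by the same factor r, so the component keeps unit area.
    """
    return delta / speed_ratio, a * speed_ratio


def peak_time(delta, a):
    """Mode of the delayed shape-8 gamma density: delta + 7 / a."""
    return delta + 7.0 / a
```

With doubled fan speed (r = 2), a component with δ = 2 s and a = 3.5 1/s moves from a peak at 4 s to a peak at 2 s.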

5.3. Scalability testing

The proposed transient model consists of n equations on the inlet temperatures, each of them being a sum of weighted integrals over n heat contribution curves. With some manipulation (see Appendix C), it is possible to bring the equations into a linear system $\dot{x} = Ax + B$. The dimension of the A matrix (i.e., the number of state variables) is $O(n^2)$, so we speculate that the runtimes will follow a quadratic growth over the population of the data center n. Fig. 18 shows how the runtimes scale for a simulated time of 50 s on a contemporary commodity computer
consisting of a 2.3 GHz Intel Core i5-2410M system with 3 MB L3 cache and 4 GB RAM at 667 MHz.

[Figure: four panels titled "CFD simulation vs transient model" for Server1, Server2, Server3 and CRAC, each plotting temperature (°C) against time (s) for the transient model and the CFD simulation.]
Fig. 14. Temperature over time results compared for each node between transient model and CFD calculations. CFD computation on OpenFOAM took about 1 h. Transient model computation on MATLAB ode45 took about 1 s.

[Figure: two panels, "c' Examples from Server 1, 2, and 1&2 Simulations" and "c' Examples from Server 17, 20, and 17&20 Simulations", each plotting c'(t) against time (seconds) for the individual curves, their sum, and the joint-emission curve.]
Fig. 15. Two examples of server inlet ($T^+(t)$) traces, plotted as $c'_{i,j}$, showing the linear combination of individual curves.

6. Combining the thermal model with heat generation and cooling

As discussed earlier in Section 2.2, the minimum requirements for a thermal simulation of a data center's operation are a heat generation model (for the computing servers), a heat circulation and mixing model (for heat transfer among the equipment), and a cooling model that removes the heat (for the CRAC). This section discusses possible combinations of the proposed transient model with heat generation and cooling models.

6.1. Heat generation models

For heat generation, it is easy to use any model that conforms to the generic form $T_j^- = f(T_j^+)$, which defines a relationship between the air inlet and the air outlet of a server. For example, assuming that the air takes $\Delta t$ time to pass through the server, and assuming
that it picks up all of the heat produced in that interval, the outlet temperature can be defined as:

$$T_j^-(t) = T_j^+(t-\Delta t) + K\int_{-\Delta t}^{0} P(t+\tau)\,d\tau = T_j^+(t-\Delta t) + K P\,\Delta t \quad (\text{if } P(t) = P\ \forall t). \qquad (8)$$

The above relationship can be translated into a discrete-time system with time step $\Delta t$:

$$T_j^-(n) = T_j^+(n-1) + K\,P(n)\,\Delta t.$$

The above equations describe a system that passes all generated heat instantaneously into the air. Motivated by Section 2.4, we propose a system that exhibits lumped heat capacitance, based on Newton's law of cooling:

$$\dot{T}_{server,j}(t) = -\alpha_j\,\big(T_{server,j}(t) - T_j^+(t)\big) + \frac{P_j(t)}{C_{server,j}}, \qquad (9)$$

$$\dot{T}_j^-(t) = \beta_j\,\big(T_{server,j}(t) - T_j^+(t)\big), \qquad (10)$$

that is, the rate of temperature change $\dot{T}_{server,j}(t)$ of server j depends on the temperature difference (delta-T) between the server and the air and on the heat produced by computation, and the outlet temperature depends on the heat exchanged from the server into the air. The parameters $\alpha_j$ and $\beta_j$ are the thermodynamic constants that include the convection coefficient, the area of convection, and the heat capacitance of the server and of the air, for $\alpha_j$ and $\beta_j$ respectively.

[Figure: "c' Examples from Server 5, 21, and 5&21 Simulations, Fanspeed Doubled"; c'(t) against time (seconds).]
Fig. 16. An example depicting how $c_{1\&2,3}$, when the fan speeds of the heat-emitting servers are increased (in this case doubled), shows a shifting of peaks towards an earlier time. Compare this specifically with Fig. 15's depiction of $c_{1\&2,3}$ at a standard fan speed.

[Figure: "Cumulative Heat Arrival Comparison between Fanspeeds"; cumulative c'_{1&2,3}(t) against time (seconds) for double and standard fan speed.]
Fig. 17. The cumulative sum of $c_{1\&2,3}$ at both standard and double fan speed. As expected, the double fan speed curve shows overall earlier heat arrival.

[Figure: "Transient model simulation runtimes over data center population"; runtime (seconds) against population (number of servers and CRACs).]
Fig. 18. Runtimes of the transient model on MATLAB ode45 for a range of data center populations for a simulated time of 50 s. Runs performed on a 2.3 GHz Intel Core i5-2410M system with 667 MHz RAM and 3 MB L3 cache.
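A forward-Euler sketch of the lumped-capacitance server model follows, with constant inlet temperature and power for simplicity. It writes the exchange term with an explicit minus sign so that a positive α_j means the server relaxes toward its inlet air temperature; the step size and function names are illustrative assumptions. At equilibrium the derivative vanishes, which gives T_server = T^+ + P/(α C).

```python
def server_temperature(t_inlet, power, alpha, capacitance, t0, dt, steps):
    """Forward-Euler integration of dT/dt = -alpha*(T - T_inlet) + P/C.

    Inlet temperature and power are held constant; returns the temperature trace.
    """
    temp = t0
    trace = [temp]
    for _ in range(steps):
        temp += dt * (-alpha * (temp - t_inlet) + power / capacitance)
        trace.append(temp)
    return trace


def steady_state(t_inlet, power, alpha, capacitance):
    """Equilibrium of the lumped model: T = T_inlet + P / (alpha * C)."""
    return t_inlet + power / (alpha * capacitance)
```

For instance, a 500 W server with α = 0.1 1/s and C = 1000 J/K behind 25 °C inlet air settles at 30 °C; the Euler step is stable here because α·dt < 1.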

For cooling, it is possible to use any model of cooling that defines a transfer function between $T_{in}$ and $T_{out}$, including the step-linear model $T_{out} = f(T_{in})$.

6.2. Cooling models

A challenging point for cooling models is the incorporation of a two-mode model. An example would be

$$T_n^-(t) = \begin{cases} T_n^+(t) - b_1, & \text{if } T_n^+(t) < T_{thres}, \\ T_n^+(t) - b_2, & \text{if } T_n^+(t) \ge T_{thres} \end{cases} \quad \text{(step-linear)}. \qquad (11)$$

Since we want to insert a transient model, we propose the use of the step-linear model, but in a form that captures the dynamics of the temperature change:

$$\dot{T}_n^-(t) = \beta_n\,\big(T_n^+(t) - T_n^-(t)\big) + \frac{P_n(t)}{C_{air,AC}}, \qquad (12)$$

where $C_{air,AC}$ is the heat capacitance of the air in the AC.

6.3. Putting them all together

In the first example below, we combine the transient heat model with Eqs. (8) and (11):

Example. As an example combination of the transient model with a constant heat generation model and a step-linear cooling model, the equations are:

$$T_i^-(t) = T_i^+(t) + a, \quad i = 1, \dots, n-1, \quad \text{(constant rate)}$$

$$T_n^-(t) = \begin{cases} T_n^+(t) - b_1, & \text{if } T_n^+(t) < T_{thres} \\ T_n^+(t) - b_2, & \text{if } T_n^+(t) \ge T_{thres} \end{cases}, \quad \text{(step-linear)}$$

$$T_j^+(t) = \sum_{i=1}^{n} w_{ij}\,T_{ij}(t), \quad \forall j. \quad \text{(from Eq. (5))}$$

Our convention is to use the index i = n to denote the CRAC, i.e., $T_{in}(t) \equiv T_n^+(t)$ and $T_{out}(t) \equiv T_n^-(t)$.
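As a rough illustration, the first example can be stepped in discrete time by replacing the integral of Eq. (4) with a finite weighted sum over past outlet temperatures. This is a minimal sketch, not the MATLAB/ode45 implementation used in the paper: the time step, the sampled kernels (given as weights for lags of 1, 2, ... steps, so that no air arrives within the same step), and the function and parameter names are our assumptions.

```python
from collections import deque


def simulate_example(weights, kernels, rise, b1, b2, t_thres, init_outlets, steps):
    """Discrete-time sketch of the first example (constant rise, step-linear CRAC).

    weights[i][j]   -- w_ij; every column j must sum to 1, as Eq. (5) requires.
    kernels[i][j]   -- sampled c_ij as weights for lags of 1, 2, ... steps (sum 1).
    rise            -- constant server temperature rise a.
    b1, b2, t_thres -- step-linear CRAC parameters of Eq. (11).
    init_outlets    -- steady-state outlet temperature of every point.
    The last point (0-based index n - 1) is the CRAC, mirroring the i = n convention.
    Returns one list of inlet temperatures per time step.
    """
    n = len(weights)
    for j in range(n):
        if abs(sum(weights[i][j] for i in range(n)) - 1.0) > 1e-9:
            raise ValueError(f"column {j} of W does not sum to 1")
    max_lag = max(len(kernels[i][j]) for i in range(n) for j in range(n))
    # Past outlet temperatures, newest last, pre-filled with the steady state.
    history = deque((list(init_outlets) for _ in range(max_lag)), maxlen=max_lag)
    crac = n - 1
    inlet_log = []
    for _ in range(steps):
        inlets = []
        for j in range(n):
            total = 0.0
            for i in range(n):
                # Discretized Eq. (4): kernel-weighted average of i's past outlets.
                contribution = sum(
                    c * history[-lag][i]
                    for lag, c in enumerate(kernels[i][j], start=1)
                )
                total += weights[i][j] * contribution  # Eq. (5)
            inlets.append(total)
        outlets = [t_in + rise for t_in in inlets]  # servers: constant rise
        drop = b1 if inlets[crac] < t_thres else b2  # CRAC: step-linear, Eq. (11)
        outlets[crac] = inlets[crac] - drop
        history.append(outlets)
        inlet_log.append(inlets)
    return inlet_log
```

With one server feeding only the CRAC and vice versa, equal b1 and b2 keep a balanced initial state steady, while unequal b1 and b2 around the threshold make the inlet temperatures cycle instead of converging, a much simplified analogue of the non-converging behavior discussed for Fig. 8.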

The above example assumes instantaneous heat generation and cooling models, i.e., hot air that enters the CRAC instantly exits cooled at the outlet. To account for heat capacitance at the servers, we combine the transient heat model with Eqs. (9) and (10):

Example. As an example combination of the transient model with the lumped-capacitance heat generation model of Eqs. (9) and (10) and a first-order cooling model, the equations are:

$$\begin{aligned}
T_j^+(t) &= \sum_{i=1}^{n} w_{ij}\,T_{ij}(t), && \forall j, \quad \text{(from Eq. (5))}\\
\dot{T}_{s,j}(t) &= -\alpha_j\,\big(T_{s,j}(t) - T_j^+(t)\big) + \frac{P_j(t)}{C_j}, && j = 1, \dots, n-1,\\
\dot{T}_j^-(t) &= \beta_j\,\big(T_{s,j}(t) - T_j^+(t)\big), && j = 1, \dots, n-1,\\
\dot{T}_n^-(t) &= K\,\big(T_n^+(t) - T_n^-(t)\big) + \frac{P_n(t)}{C_{air,n}}. &&
\end{aligned} \qquad (13)$$

We use the same convention of numbering the CRAC as point n.

The above example assumes instantaneous cooling models, i.e., hot air that enters the CRAC instantly exits cooled at the outlet. Integration of our transient heat circulation model with non-instantaneous CRAC cooling models is left as future work.

7. Conclusions

In this paper we motivated the need for a lightweight transient heat circulation model to be used for quick evaluation of workload management decisions. The proposed model (Section 3.1) accurately characterizes thermal behavior as it varies over time. The addition of a time dimension allows data center operators to model the periodic temperature fluctuations implied by discontinuous, step-linear cooling models. While it maintains a good level of accuracy, its speed is rather attractive: coarse-level ODE computations take only a few seconds to complete, as opposed to several hours of CFD computations.

The core hypothesis of the model is that heat, as it is produced at the computing equipment, is spread out as it follows the air flow in the data center, and it arrives with a certain distribution at the air inlets of either other computing equipment or the CRAC. Preliminary simulations in Section 4 show that the shape of the distribution curves resembles a combination of inverse-Gaussian functions. The splitting of the produced heat among the equipment is captured by a matrix of division factors $u_{ij}$, which is closely related to the heat recirculation matrix [20].

Research on this model is congruent with the objectives of the BlueTool project (http://impact.asu.edu/BlueTool), whose purpose is to develop and provide evaluation and validation tools for green data center design. Operating fast yet accurate models to produce results is a cornerstone factor for the success of the project.

Acknowledgements

We acknowledge Rose Robin Gilbert, Georgios E. Fainekos and Sunit Verma for their valuable assistance with the material in this paper.

Appendix A. Yielding the transient model parameters for a containerized data center

As mentioned in Section 4, our CFD experiments are performed using OpenFOAM on a model as shown in Fig. A.19. In our experiments we modify two key files in the OpenFOAM framework: SetFieldsDict (located with the simulation results) and TransientSimpleFoam.C (our custom solver for tracking heat flow throughout the room). To derive $c_{i,j}$ curves for an entire data center, we perform one experiment for each server i, during which we measure the inlet temperatures of each server j. Each experiment has three main phases:

1. Steady flow state: Begin the simulation with initial values (temperature, flow directions, flow speeds). We begin the simulation with equal temperatures throughout the room (291 K) and with the servers and CRAC neither emitting nor removing any heat. This phase is set to run for 20 s so that the flow within the room reaches a roughly steady state. For this phase, and all successive phases (with one exception), the exiting temperature of each server is overridden within the solver file to emit air at 291 K.
2. Heat emission: For a single second (simulation time) we emit heat from the server i specified at start time. To do this, we must remove the override term within the solver for server i to allow higher heat emission. This necessitates a recompilation of the solver before this phase begins. While the compilation is happening, SetFieldsDict is modified to include the heat addition to server i.
3. Continued measurement: To begin this phase, we must return to masking all outlet temperatures to 291 K. Once again, this requires modification of the solver and recompilation. SetFieldsDict is returned to its original state of no heat addition. Once the compilation is complete, the simulation is run until the room temperature at all locations is a steady 291 K, denoting that the remnants of heat from the heat emission phase have finished traversing the room and have been removed by the outlet heat mask. The exact time the simulation takes to reach this state depends on the layout of the data center, so a single experiment can be used to find a rough estimate of it. The small size of the containerized data center means that a runtime of 55 s ensures that all heat has been removed from the room.

Fig. A.19. Mesh model of a hypothetical containerized data center.

After each simulation, the temperature trace files for each server are traversed and collected into a single matrix whose columns are time steps and whose rows are servers. Section 4.1 explains how the $c_{i,j}$ curves are then created from this temperature trace matrix.

We counted the number of peaks in the 625 curves produced and found the distribution of peaks shown in Table A.2. We use this result to justify our decision to use three gamma distributions to approximate those curves.

Table A.2
Number of peaks in a curve.
Peaks in the curve:             1      2      3      ≥4
Curves with that many peaks:    518    73     8      28
Cumulative percentile:          83%    95%    96%    100%

Appendix B. Fitting of the $c_{ij}(t)$ temporal distribution curves

We use an analytical representation of the temporal distribution curves to formulate a linear system $\dot{x} = Ax + B$ and have ODE solvers compute a solution. We chose to use a convex weighted sum of three component gamma distributions (with a high shape parameter) for each $c_{ij}(t)$ curve, for the following reasons:

• The curves seem to have an exponential tail (i.e., exponential fade-out). The candidate distributions considered were the inverse Gaussian IG(μ, λ) and the high-shape Gamma(κ, a), with shape κ and rate a. The Gamma distribution was chosen over IG for ease of analytical integration and differentiation.
• Many curves exhibit multiple peaks; therefore we chose to represent them as weighted sums of Gamma distribution components.
• We chose the shape parameter κ of the gamma to be 8. Many of the curves have sharp peaks, and κ = 8 was the lowest shape value that kept the fitting error low.
• The number of peaks varies from curve to curve. Using a variable number of component gamma distributions makes it hard to analytically derive and then program the ODE formulation of the linear system; fixing that number makes the formulation easier. In the small containerized simulation, most curves have up to three peaks, so for this example we chose to fix the number to three.

The chosen analytical fit is given by the following formula of three weighted gamma distributions:

$$\begin{aligned}
c(t) &= v_1\Gamma_1(t) + v_2\Gamma_2(t) + v_3\Gamma_3(t) \\
&= v_1\,\Gamma(t-\delta_1, 8, a_1) + v_2\,\Gamma(t-\delta_2, 8, a_2) + v_3\,\Gamma(t-\delta_3, 8, a_3) \\
&= v_1\frac{a_1^8 (t-\delta_1)^7 e^{-a_1(t-\delta_1)}}{7!} + v_2\frac{a_2^8 (t-\delta_2)^7 e^{-a_2(t-\delta_2)}}{7!} + v_3\frac{a_3^8 (t-\delta_3)^7 e^{-a_3(t-\delta_3)}}{7!}, \quad (v_1 + v_2 + v_3 = 1).
\end{aligned} \qquad (B.1)$$

In this formulation, there are two errors to be accounted for:

• Approximation error to the input curve, i.e., $\epsilon_{approx} = c(t) - \big(v_1\Gamma_1(t) + v_2\Gamma_2(t) + v_3\Gamma_3(t)\big)$.
• Convexity error of the weights, i.e., $\epsilon_{convex} = 1 - (v_1 + v_2 + v_3)$.

Fitting was done using MATLAB's Curve Fitting Toolbox. We let the built-in solvers take care of the approximation error, and use an iterative approach to minimize the convexity error (Fig. B.20).

Fig. B.20. Example Gamma fittings of the temporal distribution curves with respect to Eq. (B.1).
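Eq. (B.1) and its two error terms can be evaluated directly. This sketch only evaluates a given parameter set; it does not reproduce the MATLAB Curve Fitting Toolbox fit, and the helper names are hypothetical.

```python
import math


def gamma8(t, delta, a):
    """Shape-8 gamma density with rate a, delayed by delta (zero for t <= delta)."""
    s = t - delta
    if s <= 0:
        return 0.0
    return a ** 8 * s ** 7 * math.exp(-a * s) / math.factorial(7)


def mixture(t, components):
    """Eq. (B.1): components is a list of (v_k, delta_k, a_k) triples."""
    return sum(v * gamma8(t, delta, a) for v, delta, a in components)


def convexity_error(components):
    """epsilon_convex = 1 - (v_1 + v_2 + v_3)."""
    return 1.0 - sum(v for v, _, _ in components)


def approximation_error(samples, times, components):
    """Pointwise epsilon_approx = c(t) - fitted mixture, for a sampled curve c."""
    return [c - mixture(t, components) for c, t in zip(samples, times)]
```

Each component peaks at δ_k + 7/a_k, so a curve with three well-separated peaks maps naturally onto three components with distinct delays.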

Appendix C. Formulation of a data center model as a linear system

The heat flow in the data center is modeled by the equations below, which cover heat circulation (C.1), heat generation (C.2, C.3) and cooling (C.4):

$$T_j^+(t) = \sum_{i=1}^{n} w_{ij}\int_{-\infty}^{0} T_i^-(t+\tau)\,c_{ij}(\tau)\,d\tau, \qquad (C.1)$$

$$\dot{T}_{s,j}(t) = -\alpha_j\,\big(T_{s,j}(t) - T_j^+(t)\big) + \frac{P_j(t)}{C_j}, \quad (j = 1, \dots, n-1), \qquad (C.2)$$

$$\dot{T}_j^-(t) = \beta_j\,\big(T_{s,j}(t) - T_j^+(t)\big), \quad (j = 1, \dots, n-1), \qquad (C.3)$$

$$\dot{T}_{CRAC}^-(t) = K\,\big(T_{CRAC}^+(t) - T_{CRAC}^-(t)\big) + \frac{P_{CRAC}(t)}{C_{air}} \quad \text{(from Eq. (13))}. \qquad (C.4)$$

The cumulative contributed temperature from server i to server j at time t can be expressed in the form:

$$T_{ij}(t) = \int_{-\infty}^{0} c_{ij}(\tau)\,T_i^-(t+\tau)\,d\tau, \qquad (C.5)$$

where $T_i^-(t+\tau)$ is the exiting source temperature at time $t+\tau$. The fitted distribution $c'_{ij}$ is a sum of three gamma distributions with a variable rate (a) and a fixed shape parameter (8), which can be expressed in the form of Eq. (B.1). Using $b^8_{kij} = v_{kij}\,a^8_{kij}/7!$ for k = 1, 2, 3, we have:

$$c'_{ij}(\tau) = b^8_{1ij}(\tau-\delta_{1ij})^7 e^{-(\tau-\delta_{1ij})a_{1ij}} + b^8_{2ij}(\tau-\delta_{2ij})^7 e^{-(\tau-\delta_{2ij})a_{2ij}} + b^8_{3ij}(\tau-\delta_{3ij})^7 e^{-(\tau-\delta_{3ij})a_{3ij}}. \qquad (C.6)$$

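Substituting Eq. (C.6) into Eq. (C.5), with the lag reversed for $\tau \le 0$ as in Eq. (6), i.e., $c_{ij}(\tau) = c'_{ij}(-\tau)$, we can express Eq. (C.5) in the form:

$$T_{ij}(t) = -\int_{-\infty}^{0}\Big(b^8_{1ij}(\tau+\delta_{1ij})^7 e^{(\tau+\delta_{1ij})a_{1ij}} + b^8_{2ij}(\tau+\delta_{2ij})^7 e^{(\tau+\delta_{2ij})a_{2ij}} + b^8_{3ij}(\tau+\delta_{3ij})^7 e^{(\tau+\delta_{3ij})a_{3ij}}\Big)\,T_i^-(t+\tau)\,d\tau. \qquad (C.7)$$

We rewrite Eq. (C.7) by splitting it into three terms and assigning a separate functional variable to each term:

$$T_{ij}(t) = x_{1ij}(t) + x_{2ij}(t) + x_{3ij}(t), \qquad (C.8)$$

where

$$x_{kij}(t) = -\int_{-\infty}^{0} b^8_{kij}(\tau+\delta_{kij})^7 e^{(\tau+\delta_{kij})a_{kij}}\,T_i^-(t+\tau)\,d\tau. \qquad (C.9)$$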

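Reducing Eq. (C.9) by repeated integration by parts introduces seven dependent functional variables $z_{klij}(t)$ (l = 1, 2, ..., 7), defined as

$$z_{klij}(t) = \int_{-\infty}^{0} b^{8-l}_{kij}(\tau+\delta_{kij})^{7-l} e^{(\tau+\delta_{kij})a_{kij}}\,T_i^-(t+\tau)\,d\tau.$$

The resulting set of ordinary differential equations can be represented by three equations for the different values of k and l:

$$\dot{x}_{kij}(t) = -b^8_{kij}\,\delta^7_{kij}\,e^{\delta_{kij}a_{kij}}\,T_i^-(t) - a_{kij}\,x_{kij}(t) + 7\,b_{kij}\,z_{k1ij}(t), \qquad (C.10)$$

$$\dot{z}_{klij}(t) = b^{8-l}_{kij}\,\delta^{7-l}_{kij}\,e^{\delta_{kij}a_{kij}}\,T_i^-(t) - a_{kij}\,z_{klij}(t) - (7-l)\,b_{kij}\,z_{k(l+1)ij}(t), \qquad (C.11)$$

$$\dot{z}_{k7ij}(t) = b_{kij}\,e^{\delta_{kij}a_{kij}}\,T_i^-(t) - a_{kij}\,z_{k7ij}(t). \qquad (C.12)$$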

Eq. (C.10) represents each term for the Eq. (C.8), Eq. (C.11) repre-ents zklij(t)’s differential equations for l = 1, 3, . . ., 6 and Eq. (C.12)

Informatics and Systems 3 (2013) 132– 147 145

represents the same as Eq. (C.11) but for l = 7. So time differenti-ated form of Eqs. (C.8), (C.11) and (C.12) from 22 sets of dependentordinary differential equations.

In a nutshell, the set of ordinary differential equations to be solved in order to find the output temperature of the server is:

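$$\begin{aligned}
\dot{T}_{ij}(t) = {} & -b^{8}_{1ij}\,\delta^{7}_{1ij}\, e^{\delta_{1ij}a_{1ij}}\, T^{-}_{i}(t) - a_{1ij}\,x_{1ij}(t) + 7\,b_{1ij}\,z_{11ij}(t) \\
& -b^{8}_{2ij}\,\delta^{7}_{2ij}\, e^{\delta_{2ij}a_{2ij}}\, T^{-}_{i}(t) - a_{2ij}\,x_{2ij}(t) + 7\,b_{2ij}\,z_{21ij}(t) \\
& -b^{8}_{3ij}\,\delta^{7}_{3ij}\, e^{\delta_{3ij}a_{3ij}}\, T^{-}_{i}(t) - a_{3ij}\,x_{3ij}(t) + 7\,b_{3ij}\,z_{31ij}(t),
\end{aligned} \tag{C.13}$$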

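$$\begin{aligned}
\dot{z}_{k1ij}(t) &= b^{7}_{kij}\,\delta^{6}_{kij}\, e^{\delta_{kij}a_{kij}}\, T^{-}_{i}(t) - a_{kij}\,z_{k1ij}(t) - 6\,b_{kij}\,z_{k2ij}(t),\\
\dot{z}_{k2ij}(t) &= b^{6}_{kij}\,\delta^{5}_{kij}\, e^{\delta_{kij}a_{kij}}\, T^{-}_{i}(t) - a_{kij}\,z_{k2ij}(t) - 5\,b_{kij}\,z_{k3ij}(t),\\
\dot{z}_{k3ij}(t) &= b^{5}_{kij}\,\delta^{4}_{kij}\, e^{\delta_{kij}a_{kij}}\, T^{-}_{i}(t) - a_{kij}\,z_{k3ij}(t) - 4\,b_{kij}\,z_{k4ij}(t),\\
\dot{z}_{k4ij}(t) &= b^{4}_{kij}\,\delta^{3}_{kij}\, e^{\delta_{kij}a_{kij}}\, T^{-}_{i}(t) - a_{kij}\,z_{k4ij}(t) - 3\,b_{kij}\,z_{k5ij}(t),\\
\dot{z}_{k5ij}(t) &= b^{3}_{kij}\,\delta^{2}_{kij}\, e^{\delta_{kij}a_{kij}}\, T^{-}_{i}(t) - a_{kij}\,z_{k5ij}(t) - 2\,b_{kij}\,z_{k6ij}(t),\\
\dot{z}_{k6ij}(t) &= b^{2}_{kij}\,\delta_{kij}\, e^{\delta_{kij}a_{kij}}\, T^{-}_{i}(t) - a_{kij}\,z_{k6ij}(t) - b_{kij}\,z_{k7ij}(t),
\end{aligned} \tag{C.14}$$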

$$\dot{T}_{S,j}(t) = \alpha_j\big(T_{S,j}(t) - T^{-}_{j}(t)\big) + \frac{P_j(t)}{C_j}, \quad (j = 1, \ldots, n-1) \ \text{(Eq. (C.2))} \tag{C.15}$$

$$T^{-}_{j}(t) = \beta_j\big(T^{+}_{j}(t) - T_{S,j}(t)\big), \quad (j = 1, \ldots, n-1) \ \text{(from Eq. (C.1))}, \tag{C.16}$$

$$T^{-}_{n}(t) = K\big(T^{+}_{\mathrm{CRAC}} - T^{-}_{\mathrm{CRAC}}(t)\big) + \frac{P_{\mathrm{CRAC}}(t)}{C_{\mathrm{air}}}, \quad (j = n). \tag{C.17}$$

In Eq. (C.14), $k$ takes the values 1, 2 and 3, and the above equations are solved using the variable-step Runge–Kutta method implemented in the MATLAB function ode45.
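MATLAB's ode45 implements an adaptive explicit Runge–Kutta (4,5) method based on the Dormand–Prince pair, and SciPy's `solve_ivp` with `method="RK45"` uses the same pair. The sketch below is a Python analogue for a single pair $(i, j)$, not the authors' MATLAB code: it assembles the three 8-state chains of Eqs. (C.10)–(C.12) into a block-diagonal linear system $\dot{X} = AX + g\,T^{-}_{i}$, integrates the response to a hypothetical step in $T^{-}_{i}$ starting from a zero state, and sums the $x$ states as in Eq. (C.8). The helper names, the state order inside each block, and the parameter values are assumptions.

```python
import math
import numpy as np
from scipy.integrate import solve_ivp

def term_block(v, a, delta):
    """8x8 block of A and input gains for one kernel term, from Eqs. (C.10)-(C.12).
    State order inside the block: [x_k, z_k1, ..., z_k7]."""
    b = (v * a**8 / math.factorial(7)) ** 0.125
    e = math.exp(delta * a)
    A = np.zeros((8, 8))
    g = np.zeros(8)
    A[0, 0], A[0, 1] = -a, 7.0 * b
    g[0] = -(b**8) * delta**7 * e
    for l in range(1, 8):
        A[l, l] = -a
        if l < 7:
            A[l, l + 1] = -(7 - l) * b      # coupling to z_{k,l+1}; absent for l = 7
        g[l] = b ** (8 - l) * delta ** (7 - l) * e
    return A, g

def pair_system(terms):
    """Block-diagonal A and gain vector g for one (i, j) pair with three terms."""
    size = 8 * len(terms)
    A, g = np.zeros((size, size)), np.zeros(size)
    for k, params in enumerate(terms):
        Ak, gk = term_block(*params)
        A[8 * k:8 * k + 8, 8 * k:8 * k + 8] = Ak
        g[8 * k:8 * k + 8] = gk
    return A, g

terms = [(0.20, 0.5, 0.0), (0.10, 0.2, 0.0), (0.05, 0.1, 0.0)]  # hypothetical (v, a, delta)
A, g = pair_system(terms)
T0 = 10.0   # hypothetical step in T_i^-(t) at t = 0, zero history before
sol = solve_ivp(lambda t, X: A @ X + g * T0, (0.0, 60.0), np.zeros(g.size),
                method="RK45", rtol=1e-9, atol=1e-12, t_eval=[60.0])
T_ij = sol.y[0::8, -1].sum()   # x_1 + x_2 + x_3 at t = 60, Eq. (C.8)
print(sol.success, round(float(T_ij), 4))
```

With zero delays the exact answer is $\sum_k v_{kij} T_0 P(8, a_{kij} t)$, which makes this setup convenient for checking solver tolerances before moving to fitted, nonzero delays.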

To frame the differential equations in MATLAB, a matrix A is defined whose coefficients correspond to the parameters of the differential equations from (C.13) through (6.3), along with a vector B consisting of the power values:

$$\dot{X} = A X + B \tag{C.18}$$

The matrix A is square, of dimension $(2n - 1 + 3n^2 + 21n^2) \times (2n - 1 + 3n^2 + 21n^2)$. Its first $n - 1$ rows correspond to the derivatives of the server temperatures ($T_{S,1}, \ldots, T_{S,n-1}$), the next $n - 1$ rows to the outlet temperatures of the servers ($T^{-}_{1}, \ldots, T^{-}_{n-1}$), the next row to the CRAC outlet temperature, the next $3n^2$ rows to the derivatives of the $x$ variables ($x_{k,1,1}$ to $x_{k,n,n}$ for $k = 1, 2, 3$), and the last $21n^2$ rows to the derivatives of the $z$ variables ($z_{k,l,1,1}$ to $z_{k,l,n,n}$ for $k = 1, 2, 3$ and $l = 1, 2, \ldots, 7$). The columns correspond to the same variables as the rows, undifferentiated.

The X and B vectors can be represented as:

$$X = \begin{bmatrix} T_{s1} \\ \vdots \\ T_{s(n-1)} \\ T^{-}_{1} \\ \vdots \\ T^{-}_{n-1} \\ T^{-}_{\mathrm{CRAC}} \\ x_{1,1,1} \\ x_{2,1,1} \\ x_{3,1,1} \\ x_{1,1,2} \\ \vdots \\ x_{3,n,n} \\ z_{1,1,1,1} \\ \vdots \\ z_{3,7,n,n} \end{bmatrix}, \tag{C.19}$$

and

$$B = \begin{bmatrix} P_1(t)/C_1 \\ P_2(t)/C_2 \\ \vdots \\ P_{n-1}(t)/C_{n-1} \\ 0 \\ \vdots \\ 0 \\ P_{\mathrm{CRAC}}/C_{\mathrm{air}} \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{C.20}$$

References

[1] U.S. Environmental Protection Agency, Report to Congress on server and data center energy efficiency (2007). http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf

[2] Z. Abbasi, G. Varsamopoulos, S.K.S. Gupta, Thermal aware server provisioning and workload distribution for internet data centers, in: ACM International Symposium on High Performance Distributed Computing (HPDC10), 2010, pp. 130–141, http://dx.doi.org/10.1145/1851476.1851493.

[3] A. Banerjee, T. Mukherjee, G. Varsamopoulos, S.K.S. Gupta, Cooling-aware and thermal-aware workload placement for green HPC data centers, in: International Conference on Green Computing Conference (IGCC2010), 2010, pp. 245–256, http://dx.doi.org/10.1109/GREENCOMP.2010.5598306.

[4] T.D. Boucher, D.M. Auslander, C.E. Bash, C.C. Federspiel, C.D. Patel, Viability of dynamic cooling control in a data center environment, in: (IEEE) Inter Society Conference on Thermal Phenomena, 2004, pp. 593–600.

[5] J. Donald, M. Martonosi, Techniques for multicore thermal management: Classification and new exploration, SIGARCH Computer Architecture News 34 (2) (2006) 78–88, http://dx.doi.org/10.1145/1150019.1136493.

[6] M. Fontecchio, Companies reuse data center waste heat to improve energy efficiency (2008, May). http://searchdatacenter.techtarget.com/news/article/0,289142,sid80_gci1314324,00.html

[7] T. Heath, B. Diniz, E.V. Carrera, W.M. Jr., R. Bianchini, in: Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP), Chicago, IL, USA, 2005. http://doi.acm.org/10.1145/1065944.1065969


[8] T.D. Boucher, D.M. Auslander, C.E. Bash, C.C. Federspiel, C.D. Patel, Viability of dynamic cooling control in a data center environment, Journal of Electronic Packaging 128 (2) (2006) 137–144, http://dx.doi.org/10.1115/1.2165214, http://link.aip.org/link/?JEP/128/137/1

[9] Q. Tang, S.K.S. Gupta, D. Stanzione, P. Cayton, Thermal-aware task scheduling to minimize energy usage for blade servers, in: 2nd IEEE Int'l Dependable, Autonomic, and Secure Computing (DASC'06), 2006, pp. 195–202.

[10] Q. Tang, S.K.S. Gupta, G. Varsamopoulos, Thermal-aware task scheduling for data centers through minimizing heat recirculation, in: IEEE Cluster, 2007, pp. 129–138.

[11] T. Mukherjee, A. Banerjee, G. Varsamopoulos, S.K.S. Gupta, Model-driven coordinated management of data centers, (Elsevier) Computer Networks 54 (16), in: Special Issue on Managing Emerging Computing Environments, 2010, pp. 2869–2886, http://dx.doi.org/10.1016/j.comnet.2010.08.011.

[12] G. Varsamopoulos, A. Banerjee, S.K.S. Gupta, Energy efficiency of thermal-aware job scheduling algorithms under various cooling models, in: International Conference on Contemporary Computing IC3, Noida, India, 2009, pp. 568–580, http://dx.doi.org/10.1007/978-3-642-03547-0_54, http://impact.asu.edu/thesis/Varsamopoulos2009-cooling-models.pdf

[13] S. Gupta, G. Varsamopoulos, A. Haywood, P. Phelan, T. Mukherjee, BlueTool: using a computing systems research infrastructure tool to design and test green and sustainable data centers, in: I. Ahmad, S. Ranka (Eds.), Handbook of Energy-Aware and Green Computing, 1st ed., Chapman & Hall/CRC, 2012.

[14] E.K. Lee, I. Kulkarni, D. Pompili, M. Parashar, Proactive thermal management in green datacenters, The Journal of Supercomputing 60 (2) (2012) 165–195, http://dx.doi.org/10.1007/s11227-010-0453-8.

[15] S.K.S. Gupta, R.R. Gilbert, A. Banerjee, Z. Abbasi, T. Mukherjee, G. Varsamopoulos, GDCSim: an integrated tool chain for analyzing green data center physical design and resource management techniques, in: Proceedings of International Green Computing Conference (IGCC11), IEEE, Orlando, FL, 2011.

[16] H. Jasak, A. Jemcov, Z. Tuković, OpenFOAM: A C++ library for complex physics simulations, in: International Workshop on Coupled Methods in Numerical Dynamics IUC, 2007, pp. 47–66. http://powerlab.fsb.hr/ped/kturbo/openfoam/papers/CMND2007.pdf

[17] T. Mukherjee, A. Banerjee, G. Varsamopoulos, S.K.S. Gupta, S. Rungta, Spatio-temporal thermal-aware job scheduling to minimize energy consumption in virtualized heterogeneous data centers, Computer Networks 53 (17) (2009) 2888–2904.

[18] J. Moore, J. Chase, P. Ranganathan, Weatherman: automated, online and predictive thermal mapping and management for data centers, in: Proceedings of the 2006 IEEE International Conference on Autonomic Computing, ICAC'06, IEEE Computer Society, Washington, DC, USA, 2006, pp. 155–164, http://dx.doi.org/10.1109/ICAC.2006.1662394.

[19] J. Moore, J. Chase, P. Ranganathan, Making scheduling "cool": temperature-aware workload placement in data centers, in: Proceedings of the annual conference on USENIX Annual Technical Conference, ATEC'05, USENIX Association, Berkeley, CA, USA, 2005, pp. 5–5. http://dl.acm.org/citation.cfm?id=1247360.1247365

[20] Q. Tang, S.K.S. Gupta, G. Varsamopoulos, Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: a cyber-physical approach, IEEE Transactions on Parallel and Distributed Systems, Special Issue on Power-Aware Parallel and Distributed Systems 19 (11) (2008) 1458–1472, http://dx.doi.org/10.1109/TPDS.2008.111.

[21] J. Moore, R. Sharma, R. Shih, J. Chase, C. Patel, P. Ranganathan, Going beyond CPUs: the potential of temperature-aware data center architectures, in: First Workshop on Temperature-Aware Computer Systems, 2004. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.87.6147

[22] A. Kubilay, S. Zimmermann, I. Zinovik, B. Michel, D. Poulikakos, Compact thermal model for the transient temperature prediction of a water-cooled microchip module in low carbon emission computing, Numerical Heat Transfer A (59) (2011) 815–835, http://dx.doi.org/10.1080/10407782.2011.578014.

[23] L. Marshall, http://www.coolsimsoftware.com/wwwroot/LinkClick.aspx?fileticket=ap5wEZJbecQ%3D&tabid=189, 2010.

Georgios Varsamopoulos is Research Assistant Professor in the School of Computing, Informatics and Decision Systems Engineering at Arizona State University, Tempe, AZ, USA. He received his B.E. in Computer Science and Engineering from University of Patras, Greece, his M.S. in Computer Science from Colorado State University, Ft Collins, Colorado, USA, and his Ph.D. in Computer Science from Arizona State University. His research interests include modeling of cyber-physical systems, performance and operation optimization, energy-aware and context-aware computing, wireless and mobile communications and security. He is on the editorial board of Elsevier's Simulation and Modeling Practice (SIMPAT) and has served as a co-chair and reviewer for numerous conferences and as a reviewer for several journals, including IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Mobile Computing and Elsevier Computer Networks. His research work has been funded by the National Science Foundation, the U.S. Department of Transportation, Science Foundation Arizona (SFAz), Intel Corporation and Raytheon Corporation. He is a co-recipient of a best poster award. He is a faculty member of the Impact Lab (http://impact.asu.edu/).

Michael Jonas is a Ph.D. student in the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University. He works at Microsoft Corporation doing data analytics at Windows Telemetry. His research interests include artificial intelligence, cognitive architectures, knowledge representation and modeling of complex systems, and planning.

Joshua Ferguson is currently an M.S. student in the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University. His research interests include data center management, high-performance computing, and the network management of each.

Joydeep Banerjee is currently a Ph.D. student in the School of Computing, Informatics and Decision Systems Engineering at Arizona State University. His research interests include thermal management of data centers, green computing, sustainable computing and modeling of energy storage devices in data centers. Joydeep Banerjee received a B.E. degree in Electronics and Telecommunication Engineering from Jadavpur University, India.

Sandeep K.S. Gupta is the Chair of the Computer Engineering Graduate Program and a Professor in the School of Computing, Informatics, and Decision Systems Engineering (SCIDSE), Arizona State University, Tempe, USA. He received the B.Tech degree in Computer Science and Engineering (CSE) from the Institute of Technology, Banaras Hindu University, Varanasi, the M.Tech. degree in CSE from the Indian Institute of Technology, Kanpur, and the M.S. and Ph.D. degrees in Computer and Information Science from Ohio State University, Columbus, OH. His current research is focused on cyber-physical systems with emphasis on green computing, pervasive healthcare, and criticality-aware systems. Gupta's research awards include a best 2009 SCIDSE senior researcher award and a best paper award. His research has been supported by Science Foundation of Arizona, National Science Foundation, National Institutes of Health, Intel Corp., Raytheon Missile Systems, and Northrop Grumman Corp. He has served or is currently serving on several editorial boards, including IEEE Transactions on Parallel and Distributed Systems, Springer Wireless Networks, Elsevier Sustainable Computing, and IEEE Communication Letters. Gupta has served on several program committees, including Percom, Wireless Health, BSN, and ICDCS, has chaired or co-chaired several workshops and conferences, including Greencom and BodyNets, and has co-edited several special issues for various journals and magazines, including IEEE Transactions on Computers (SI on Data Management and Mobile Computing), IEEE Pervasive Computing (SI on Pervasive Computing), and IEEE Proceedings (SI on cyber-physical systems). Gupta is a senior member of IEEE and heads the Impact Lab (http://impact.asu.edu) at ASU.