sebasen goasguen –sebgoa@clemson - fermilabcd-docdb.fnal.gov/0040/004050/001/cloud-fermi.pdf ·...


August 20th, Fermi Lab 1

Sebastien Goasguen – sebgoa@clemson.edu

School of Computing

Clemson University, Clemson, SC

Scientific Associate at CERN

Summer 2009 and Summer 2010

Outline

• Cloud Basics
• Building a Cloud Provider
  – Lxcloud @ CERN
• VOCs and Clouds
  – Research done at Clemson

What is Cloud Computing?

A few references

"Above the clouds: A Berkeley view of cloud computing" http://berkeleyclouds.blogspot.com/

"A break in the clouds: towards a cloud definition", L. M. Vaquero et al., SIGCOMM Computer Communication Review, 2008. http://portal.acm.org/citation.cfm?id=1496100

"An EGEE Comparative Study - Grid cloud comparative study", M-Elian Begin, 2009

On the Hype curve

• Now probably at the top of the Hype – Oct 09

Trendy…

• Source: http://www.google.com/trends

Cloud formation

• Slide adapted from Rich Wolski, UCSB

White House is going to the Cloud

• Reduce costs… See Apps.gov

DOE and NASA too (Check novacc.org)

Everything is moving to the Cloud… Stay on Earth though!

• http://contactdubai.com/tag/saas-software-or-storage-as-a-service

An “Old” idea: OSI / Anatomy of the Grid / Windows architectures…

What is the Cloud? The *aaS

• SaaS – Software as a Service
• PaaS – Platform as a Service
• IaaS – Infrastructure as a Service
• Service composition at all layers of a distributed system; builds a system of systems
• Software and hardware reuse
• Tendency toward *aaS-itis, but these three are the main ones

Software as a Service

Sky is the limit…

• Phone apps… FermiVoice?

Platform as a Service

Infrastructure as a Service / Coming of age of virtualization

What is the Cloud? The *aaS

• SaaS – Software as a Service
  – Easy access to hosted applications over the network, most likely using your browser
  – API to these applications
• PaaS – Platform as a Service
  – Environment to deploy new applications
  – Restricted capabilities offered
  – API to this platform and access to SaaS API
• IaaS – Infrastructure as a Service
  – Access to hardware resources
  – API to make resource allocation requests

Key Features

• You don’t know what’s behind it, but it works
  – Transparency
• You pay for what you use
  – Utility pricing
• You get what you ask for (on-demand)
  – Read the fine print of the SLAs…
• It scales if you need more
  – How far does it scale?
  – Doesn’t this mean the underlying resources are underutilized?

Why now? Evolution of the Mashup Revolution thanks to an API “explosion”

Why now?

• Big Internet companies faced a lot of data to analyze: web logs…
• Developed in house: new file system (Hadoop), new analysis framework (Map-Reduce)
• Massive amount of resources all across the planet: >500,000 cores for Google?
• Higher need to consolidate: virtualization, energy costs.
• New devices: iPhone/G1
• A truly inter-connected planet

A few interesting things… to stir the pot

• Industry is leading. Is academia behind?
• Who cares about standards? (>20 bodies working on cloud standards…)
• We should switch paradigm and rewrite applications once they are 6 months old.

Outline

• Cloud Basics
• Building a Cloud Provider
  – Lxcloud @ CERN (in collaboration with Ulrich Schwickerath, Ewan Roche, Belmiro Moreira and Romain Wartel)
• VOCs and Clouds
  – Research done at Clemson

IaaS level

• For consolidating services
  – Used in IT for a while now
  – FermiGrid services running in Xen VMs
• For offering on-demand services
  – E.g. VO Boxes, replace hardware request
• For virtualizing large scale services
  – Clusters on-demand
• Virtualization is a key enabler for IaaS

Batch Virtualization

Why virtualize “Batch”?

• Run batch jobs within Virtual Machines
• Better application environment
  – Custom made by user
• Increased security
• Better control on resource sharing
  – Multi-core apps
• Increased flexibility on the admin side
  – Can run a preferred OS on the metal

Batch Virtualization

How to virtualize “Batch”… smoothly?

Type 1: Run my jobs (in your VM)

Type 2: Run my jobs in my VM

Moving to the cloud:

Type 3: Give me my infrastructure, i.e. a VM or a batch of VMs

Deployment Models Innovation in Cloud Computing Architectures

Main components / characteristics

Set of Hypervisors
• Physical machines with a virtual machine monitor
• Xen or KVM... or Hyper-V... or VMware ESX...

VM provisioning system
• OpenNebula
• Nimbus
• Eucalyptus
• Platform ISF
• or even traditional schedulers like PBS/Maui.

Image distribution mechanism
• Shared file system (e.g. NFS, AFS, PVFS, Lustre...)
• Copy images (e.g. scp, wget, BitTorrent)

Networking
• Private/Public bridged
• NAT

Thoughts for OSG… to stir the pot again

• Sites need to have hypervisors, that’s a starting point. Without it/them there won’t be OSG clouds.
• What VMM/Hypervisor they use does not matter… but my guess is that 80% will use KVM
• What provisioning system they use is a matter of local technical setup, taste and relationships
• Sites can do this now
• The hard problem is in the image transfer and trust… See HEPiX virtualization working group

CERN's LXCLOUD architecture

• Image repository with Golden nodes.
• VM instances (not Quattor managed) have a finite lifetime
• Specific IP/MACs are pinned to hypervisors
• Currently testing two provisioning systems: OpenNebula and Platform ISF.

Provisioning system

OpenNebula and Platform ISF are currently being evaluated. Results shown in this talk were obtained with OpenNebula.

OpenNebula, out of the Complutense University of Madrid
• C/C++ core with Ruby drivers and command line interface
• MySQL and SQLite backends
• Uses ssh as communication between frontend and hosts
• XML-RPC API
• Support for LVM contributed by CERN
• Enables hybrid clouds (i.e. instantiation on remote cloud providers)
• Implements a subset of the EC2 interface as well as the upcoming OCCI interface for the public cloud interface.

Comparison with Similar Technologies

OpenNebula - Architecture, Current Status & Roadmap

Feature                        Platform ISF   VMware vSphere   Eucalyptus   Nimbus      OpenNebula
Virtualization Management      VMware, Xen    VMware           Xen, KVM     Xen         Xen, KVM, VMware
Virtual Network Management     Yes            Yes              No           Yes         Yes
Image Management               Yes            Yes              Yes          Yes         Yes
Service Contextualization      No             No               No           Yes         Yes
Scheduling                     Yes            Yes              No           No          Yes
Administration Interface       Yes            Yes              No           No          Yes
Hybrid Cloud Computing         No             No               No           No          Yes
Cloud Interfaces               No             vCloud           EC2          WSRF, EC2   EC2 Query, OGF OCCI, vCloud
Flexibility and Extensibility  Yes            No               Yes          Yes         Yes
Open Source                    No             No               GPL          Apache      Apache

CERN's LXCLOUD details

• Logarithmic scp and BitTorrent image distribution have been implemented
• Hypervisors run utilities to detect which VMs they are allowed to run and which images they need to download
• OpenNebula triggers instantiation via ssh
• Instances based on LVM snapshots

Image Distribution

Push:
• Sequential SCP
• Logarithmic SCP (scp-wave)
• http://code.google.com/p/scp-wave/

Pull:
• wget via an http based repository (locally)
• BitTorrent (Romain Wartel, Belmiro Moreira @ CERN)

Shared FS
• NFS
• PVFS, Lustre...
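The gap between sequential and logarithmic push can be sketched as a transfer schedule: in each round, every host that already holds the image copies it to one host that does not, so the number of holders roughly doubles per round. The following is a minimal Python sketch of that idea, not scp-wave's actual code; the function names and host names are hypothetical, and it only plans transfers rather than running scp.

```python
def wave_schedule(source, targets):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Every host that already holds the image sends it to one new host per
    round, so the number of holders roughly doubles each round.
    """
    holders = [source]
    pending = list(targets)
    rounds = []
    while pending:
        this_round = []
        # Iterate over a snapshot so hosts that receive the image in this
        # round only start sending in the next one.
        for sender in list(holders):
            if not pending:
                break
            receiver = pending.pop(0)
            this_round.append((sender, receiver))
            holders.append(receiver)
        rounds.append(this_round)
    return rounds


def sequential_rounds(targets):
    """Sequential push: the source copies to one target at a time."""
    return len(targets)


if __name__ == "__main__":
    hosts = [f"node{i:02d}" for i in range(1, 16)]  # hypothetical host names
    plan = wave_schedule("repo", hosts)
    print(len(plan), "rounds instead of", sequential_rounds(hosts))
```

For 15 target hosts the plan needs 4 rounds (1 + 2 + 4 + 8 transfers) where a sequential push needs 15, assuming every transfer takes about the same time and senders do not compete for bandwidth.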

Image distribution results (thx to Belmiro)

Guiding the provisioning

• Define policies to compose the batch farm
• Automate the provisioning of the virtual machines such that the policies are enforced.
• E.g. inspect the job queue and deduce the best composition of the batch farm, in terms of SMP VMs, OS...
• A sizer is used to monitor the pool of VM instances and evaluate the policies.
• Currently only one policy: "Keep the pool full with the proper shares of VM types"
• See ICAC 2010 and CCGRID 2009 papers
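One way to read the single policy quoted above is as a refill computation: compare the running instances of each VM type against its target share of the pool and start enough VMs to close the gap. The sketch below illustrates that reading only; the function name, the share values, and the VM type labels are assumptions, and the actual sizer is the one described in the cited papers.

```python
def vms_to_start(pool_size, shares, running):
    """Return how many VMs of each type to start to refill the pool.

    pool_size: total number of VM slots in the pool
    shares:    mapping of VM type -> desired fraction of the pool (sums to 1)
    running:   mapping of VM type -> number of instances currently running
    """
    free = pool_size - sum(running.values())
    if free <= 0:
        return {}
    # Deficit of each type relative to its target share of the full pool.
    deficits = {
        vm_type: max(0, round(pool_size * share) - running.get(vm_type, 0))
        for vm_type, share in shares.items()
    }
    plan = {}
    # Fill the largest deficits first, never exceeding the free slots.
    for vm_type, deficit in sorted(deficits.items(), key=lambda kv: -kv[1]):
        count = min(deficit, free)
        if count > 0:
            plan[vm_type] = count
            free -= count
    return plan


if __name__ == "__main__":
    shares = {"1-core": 0.6, "4-core": 0.4}   # hypothetical VM types and shares
    running = {"1-core": 4, "4-core": 1}
    print(vms_to_start(10, shares, running))
```

A sizer loop would call such a function periodically and hand the result to the provisioning system; since VM instances have a finite lifetime, expired instances free slots that the next pass refills.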

Autonomic Provisioning Results

Early Results of sizer

Joining the Batch system... A contextualization problem...

CONTEXT = [
  vmid   = "$VMID",
  TTL    = "3",
  AFS    = "off",
  files  = "/opt/vmimage/init.sh /opt/vmimage/etchosts /opt/vmimage/etcsysconfigifcfg /opt/vmimage/id_rsa.pub /opt/vmimage/lsfcontext.conf /opt/vmimage/etcsysconfignetwork",
  target = "xvdb" ]

• Files and variables are stored in an ISO created on the fly.
• Startup script mounts this ISO and runs the contextualization script.
• VMs are set up as dynamic hosts in the LSF pool.
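On the guest side, the contextualization script has to read the variables back from the mounted ISO. The sketch below assumes they arrive as shell-style KEY="value" lines in a text file on the ISO; that file layout and the function name are assumptions made for illustration, not something stated in the talk.

```python
import shlex


def parse_context(text):
    """Parse shell-style KEY="value" lines into a dict, skipping comments."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the shell quoting around the value.
        parts = shlex.split(raw)
        variables[key.strip()] = parts[0] if parts else ""
    return variables


if __name__ == "__main__":
    # Hypothetical file content mirroring the CONTEXT variables above.
    sample = '# generated at instantiation\nVMID="42"\nTTL="3"\nAFS="off"\n'
    context = parse_context(sample)
    print(context["TTL"], context["AFS"])
```

A startup script could use such values to decide, for example, whether to start AFS before registering the VM as a dynamic host in LSF.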

Scalability Tests... 7,500 slots in LSF via OpenNebula

Testing LSF scalability

IaaS at Clemson

• The really easy way:
  – KVM on a regular HPC cluster
  – NAT networking (every VM gets its own NAT)
  – Base image on NFS server
  – KVM snapshot mode creates temporary disk in scratch, disk discarded once instance is shut down
  – Submit VMs as PBS jobs

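# Base image (kept on the NFS server)
IMAGE=/home/sebgoa/kvm/star5.img
# Snapshot mode writes its temporary disk under TMPDIR (node-local scratch)
export TMPDIR=/local_scratch
# User-mode (NAT) networking; disk changes are discarded at shutdown
kvm -hda "$IMAGE" -net nic,model=e1000 -net user -m 1280 -snapshot -nographic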

IaaS at Clemson

• But…
  – No shared FS between VMs
  – Looks like each VM has the same IP
  – Can’t use regular job management systems to run jobs in those VMs (need a glidein/proxy-like solution)
• This setup has been one of the key drivers for our development of Kestrel: an XMPP-based job management system

Kestrel

• A job management framework using the XMPP protocol
• Started as a student project
• Uses Instant Messaging concepts of notifications
• Practical in adverse network conditions

http://wiki.github.com/legastero/Kestrel/
https://twiki.grid.iu.edu/bin/view/CampusGrids/InstallingKestrel
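The appeal of instant-messaging-style notifications is that workers behind NAT only need an outbound connection: they announce that they are available and the manager reacts. The toy dispatcher below illustrates that pattern in plain Python with no XMPP library; the class and method names are hypothetical and do not reflect Kestrel's API.

```python
from collections import deque


class PresenceDispatcher:
    """Toy manager that hands queued jobs to workers announcing availability."""

    def __init__(self):
        self.idle_workers = deque()
        self.pending_jobs = deque()
        self.assignments = []  # (job, worker) pairs, in dispatch order

    def on_presence(self, worker, available):
        """Handle a presence-style notification from a worker."""
        if available and worker not in self.idle_workers:
            self.idle_workers.append(worker)
        elif not available and worker in self.idle_workers:
            self.idle_workers.remove(worker)
        self._dispatch()

    def submit(self, job):
        """Queue a job and try to dispatch it immediately."""
        self.pending_jobs.append(job)
        self._dispatch()

    def _dispatch(self):
        # Pair jobs with idle workers until one of the two queues is empty.
        while self.pending_jobs and self.idle_workers:
            job = self.pending_jobs.popleft()
            worker = self.idle_workers.popleft()
            self.assignments.append((job, worker))


if __name__ == "__main__":
    manager = PresenceDispatcher()
    manager.submit("job-1")
    manager.on_presence("vm-a@example.org", True)  # hypothetical worker address
    print(manager.assignments)
```

Because the manager never opens a connection to a worker, the fact that every VM appears to have the same NAT address does not get in the way.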

Booting VMs is extremely fast (20 VMs/sec)

STAR Success with Clemson IaaS and Kestrel

• “But to simulate the equivalent sample of 12.2 Billion Monte-Carlo events with ~10 Million accepted by event triggering after full event reconstruction, we would have taken 3 years at BNL on 50 machines. This Monte-Carlo event generation would essentially not have been done. With the resources from cloud, we took 3-4 weeks.” – Jerome Lauret, BNL.

Conclusions

• The Cloud is here, let’s hope it gets sunny
• API explosion opens up possibilities
• Focusing on IaaS layers, LXCLOUD and Clemson’s clusters have been developed/enhanced to provision VMs.
• Great scalability with OpenNebula
• KVM shows great promise, especially with the snapshot mode
• Performance will get even better
• May need specialized job management systems to make use of Clouds across multiple sites

Thanks to NSF, DOE and OSG

Thanks to Lance Stout, Mike Murphy, Michael Fenn, Linton Abraham and all the other students…

Thanks to CERN and the IT/PES-PS group

Thanks to Jerome Lauret, Matthew Walker

Questions?: sebgoa@clemson.edu
http://cirg.cs.clemson.edu

Outline

• Cloud Basics
• Building a Cloud Provider
  – Lxcloud @ CERN (in collaboration with Ulrich Schwickerath, Ewan Roche, Belmiro Moreira and Romain Wartel)
• VOCs and Clouds
  – Research done at Clemson

VOC: Virtual Organization Cluster (JGC + FGCS papers)

Why VOCs aka Clouds?

• Observation that what people want is resources with their own OS/Apps and central scheduling: Pilot job frameworks.
• A cloud is a cluster over WAN
• Therefore there is a need for
  – A way to request/start the nodes
  – A way to create a virtual network
  – A way to run jobs in them
• Very similar to glideinWMS, but the pilots ask to start VMs

Multi-Site Overlay (ICAC 2010)

VOC Implementation

• Multiple configurations:
  – Type 1: Shared head node on physical cluster, VO is unaware of VOC (e.g. LXCLOUD)
  – Type 2: VO provides virtual head nodes on multiple grid sites.
  – Type 3: VO uses an overlay network with a single head node (e.g. STAR).

Type 1: Implementation

• KVM vs. Xen for ease of use
• Normal cluster utilities/techniques
• NFS share
• And PVFS setup
• KVM offers a snapshot mode that gives us the ability to use a single image file. Writes are temporary

Load-Driven Provisioning (CCGRID09)

• Dynamic provisioning is done via the use of a Watchdog on the VOC head node
• Watchdog monitors incoming jobs on the OSG gatekeeper (Condor job manager is used)
• When jobs are in the local scheduler queue, the watchdog starts a VM on a physical host (static mapping between host and guest currently). XML-RPC system
• When a VM starts, Condor inside the VM starts and advertises its presence to the central manager -> Jobs run.
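The watchdog's decision step can be sketched as a pure function: given the number of queued jobs, the set of VMs already running, and the static host-to-guest mapping, pick the hosts on which to boot a VM. This is a minimal sketch under the assumption of one VM per queued job; the function and host names are hypothetical, and the real watchdog would obtain the queue length from Condor and issue the start requests over XML-RPC.

```python
def watchdog_step(queued_jobs, running_vms, host_to_vm):
    """Return the hosts on which a VM should be started in this pass.

    queued_jobs: number of jobs waiting in the local scheduler queue
    running_vms: set of VM names that are already running
    host_to_vm:  static mapping of physical host -> the VM it may run
    """
    to_start = []
    needed = queued_jobs
    for host, vm in host_to_vm.items():
        if needed <= 0:
            break
        # Skip hosts whose pinned guest is already up.
        if vm not in running_vms:
            to_start.append(host)
            needed -= 1
    return to_start


if __name__ == "__main__":
    mapping = {"host1": "vm1", "host2": "vm2", "host3": "vm3"}  # hypothetical
    print(watchdog_step(queued_jobs=5, running_vms={"vm1"}, host_to_vm=mapping))
```

With an empty queue the function returns no hosts, so the cluster only grows under load; shrinking it again would be a separate policy that this sketch leaves out.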

Experimental Results

• Engage VO on OSG
• Site Clemson-Birdnest on OSG Production
• Cluster size responds to load; simulation results confirm (pending IPDPS paper; simulator simVOC available at http://cirg.cs.clemson.edu/software/simvoc)

From: ACAT 2010, February 22-27th Jaipur/India

Engage VO on OSG
