san disk axel rosenberg
Post on 19-Jan-2017
Exascale Architectures: Disaggregated Storage & Compute
The Consequences of Infinite Storage Bandwidth
Axel-C. Rosenberg
Director, ISV & Strategic Partners
Creation of a Global Leader in Storage Technology
Enhanced scale and diversity strengthens ability to capture opportunities in an evolving landscape
1 LTM revenues based on most recent public filings and Wall Street research; Western Digital and SanDisk LTM as of 7/1/2016; Toshiba represents March 2016 LTM revenue.
[Chart: LTM revenue (1) by company. Bar values: $17.8, $15.9, $11.2, $11.2, $10.3, $5.5, $4.8, $3.4, $2.5. Segment labels: Information Storage, Storage & Memory, NAND.]
© 2016 Western Digital Corporation or affiliates. All rights reserved.
Leader in Low-Latency Fabrics & Driver Technologies
• Demonstrated industry-leading next-generation low-latency networking technologies suitable for emerging NVMs
• Low-latency RDMA Ethernet fabrics and protocols: NVMe over Fabrics, RDMA fabrics for emerging NVMs, and NVMe fabrics storage/memory appliances
• Enabling use of emerging NVM in networked (data center) environments
• Linux open-source driver implementation ensures flexible support for future products
NVMe over Fabrics Linux implementation: 7 µs latency
RDMA to PCIe memory-mapped ReRAM as fast as DRAM!
Driver: Bifurcation of Data
[Diagram: Old World vs. New World. Labels shown: Fast Data, Big Data, Data, Transactions, Deep Archive, Transactions/Sensors/Logs, Streaming Analytics, Active Archive, Batch Analytics, Insights.]
Principles for choosing media by workload
§ Capacity HDDs increasingly being directed at workloads that are byte-rich and access-poor (cold tier)
§ Expanding role of flash beyond caching to primary storage
§ Data temperature rising every year
Sources: workload characteristics and 2008, 2013 lines: Aref M. (Google); 2018, 2020 break-even line: calculation by Pankaj M. (CTO) and Fadi A. (ES); access rate by age of data: Kestutis P. (Facebook)
Driver: Software Defined Storage
Fabrics (aka Networks)
Network, Storage, & DRAM trends (log scale)
• Use DRAM bandwidth as a proxy for CPU throughput
• Reasonable approximation for DMA and poor-cache-performance workloads (e.g. storage)
Big difference in slope!
Network, Storage, & DRAM trends (linear scale)
Infinite Storage Bandwidth
• Same data as last slide, but for the log-impaired
• Storage bandwidth is not literally infinite
• But the ratio of network and storage to CPU throughput is widening very quickly
SSD BW ∝ Network BW (~10 SSDs per port)
BW/TB ∝ Constant (0.25 GB/s per TB)
[Chart: log scale from 1 to 1,000,000 against year, 2004 to 2022. Series: SSD speed (MB/s), network speed (MB/s), SSD density (100s of MB), GB/s/TB, network speed / SSD speed.]
Bits over Fabrics?
Concept: Disaggregation of Storage
INFINIFLASH
Winning platinum in Storage Insider IT Awards 2015
InfiniFlash™
§ All flash
§ Only 3 RU
§ 64 TB to 512 TB raw
§ Disruptive cost
Software Defined Storage (SDS): What's new?
§ Storage software unbundling from hardware
§ Explosion of SDS offerings in recent years
§ Explosion of SDS deployments in recent years
§ SDS changes the responsibilities, not the technology
Software Defined Storage: what's NOT new
§ Storage performance is hugely affected by seemingly small details
§ All HW is not equal: switches, NICs, HBAs, SSDs all matter
  • Driver abstraction doesn't hide dynamic behavior
§ All SW is not equal: distro, patches, drivers, configuration matter
§ Typically a large delta between "default" and "tuned" system performance
§ What's a user to do?
Controlling Storage System Costs
§ CPU: core counts, clock speeds, cache sizes
§ DRAM: cache sizes
§ Network: wire speed, RDMA capability
§ Storage: redundancy for durability and availability
  • Storage redundancy for availability is primarily a network problem
  • How much redundancy for durability is required?
What's different about Flash?
§ Annualized Failure Rate (AFR) of flash is superior to HDD
§ Measured HDD AFR ≈ 1.7% (1st year), ≈ 8% (3rd year) [1]
§ Measured flash AFR ≈ 0.61% [2], or even less (0.1% to 0.5%) [3]
§ For 100 PB of raw storage in 8 TB drives (HDD & flash)
§ Weekly failure rates of HDD → up to 19 drives/week
§ Weekly failure rates for flash → 1.4 cards/week
[1] "Failure Trends in a Large Disk Drive Population", Feb 2007, Google
[2] http://www.intel.com/content/dam/doc/technology-brief/intel-it-validating-reliability-of-intel-solid-state-drives-brief.pdf
[3] http://techreport.com/review/26269/behind-the-scenes-with-intel-ssd-division
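The weekly failure counts follow directly from the AFR figures. A minimal sketch of that arithmetic, assuming decimal units (100 PB = 100,000 TB) and failures spread evenly over 52 weeks; the function name is illustrative and not from the slides:

```python
def weekly_failures(raw_capacity_tb: float, drive_tb: float, afr: float) -> float:
    """Expected drive failures per week for a fleet with the given annualized failure rate."""
    drives = raw_capacity_tb / drive_tb   # 100,000 TB / 8 TB = 12,500 drives
    return drives * afr / 52.0            # failures per year spread over 52 weeks

RAW_TB = 100_000  # 100 PB of raw storage, decimal units assumed

hdd_year3 = weekly_failures(RAW_TB, 8, 0.08)    # 3rd-year HDD AFR from the slide
flash = weekly_failures(RAW_TB, 8, 0.0061)      # flash AFR from the slide

print(f"HDD (3rd year): {hdd_year3:.1f} drives/week")
print(f"Flash:          {flash:.2f} cards/week")
```

With the slide's inputs this gives roughly 19 HDD failures and about 1.5 flash failures per week, in line with the quoted figures.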
Failure Rate doesn't tell the whole story
§ Mean Time to Data Loss (MTTDL) is what you actually care about
§ To convert AFR into MTTDL you must include repair time (MTTR)
§ Repair operations degrade normal operations
  • HDD rebuilds are highly detrimental to operations (I/O blender)
  • Flash rebuilds only modestly affect operations
§ HDD rebuild times usually limited by operational degradation
§ Flash rebuild times usually limited by network and CPU
What is Erasure Coding?
§ New deployment architecture for an old idea: RAID
§ Tradeoff of multiple parameters
  • Storage efficiency
  • Parity computation cost
  • Rebuild cost
  • Performance when degraded
§ Typically the goal is a specific MTTDL for the best cost
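The RAID lineage is easiest to see in the single-parity case. The sketch below is an illustration only, assuming K = 4 equally sized chunks and one XOR parity chunk (M = 1); codes with M ≥ 2 need Galois-field arithmetic that is not shown here. The chunk contents and names are made up for the example.

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# K = 4 data chunks (hypothetical contents), M = 1 parity chunk
data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]
parity = xor_chunks(data)

# Simulate losing one data chunk, then rebuild it from the survivors plus parity
lost = 2
survivors = [chunk for i, chunk in enumerate(data) if i != lost] + [parity]
rebuilt = xor_chunks(survivors)

print(rebuilt == data[lost])
```

Rebuilding reads every surviving chunk in the stripe, which is why rebuild cost and degraded performance appear among the tradeoffs.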
[Diagram: Erasure Coding with Parity Computation. Data 0-7 is encoded into chunks Data0, Data1, Data2, Data3, Data4, Data5, Data6, Data7, Parity0, Parity1 (K=8, M=2).]
K = number of data chunks
M = number of parity (syndrome) chunks
Here we have 8+2 = 10 chunks to be placed in different fault domains
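The storage-efficiency side of the K/M tradeoff is simple arithmetic: raw capacity consumed per byte of user data is (K+M)/K, and the layout tolerates M lost chunks. A small sketch, with function names chosen for illustration:

```python
def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a K+M erasure code."""
    return (k + m) / k

def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data for N-way replication."""
    return float(copies)

layouts = {"8+1": (8, 1), "8+2": (8, 2), "16+2": (16, 2), "16+4": (16, 4)}
for name, (k, m) in layouts.items():
    print(f"EC {name}: overhead {ec_overhead(k, m):.3f}x, tolerates {m} lost chunks")
print(f"3x replication: overhead {replication_overhead(3):.1f}x, tolerates 2 lost copies")
```

The 8+2 layout in the diagram stores 1.25 bytes per user byte, compared with 3 bytes for triple replication at the same fault tolerance.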
Erasure Coding Performance
§ Traditional HDD RAID-5/6 performs poorly
  • Computations for parity generation
  • Increased seeks due to additional parity writes (2-3x for random write)
§ Modern flash EC performance sufficient
  • CPUs now optimized for storage (multi-core, better instruction sets)
  • Flash eliminates seek penalty
§ Single server can easily support >1.5 GB/sec of write encoding BW*
* RGW 4 MB object writes (YCSB), K=4, M=2, Cauchy-good, using 2x E5-2680 2.8 GHz, 8x 16 GB RDIMM
Performance with Erasure Coding while degraded
§ With 1 node down, ~3 GB/sec of read BW available per server*
§ With 2 nodes down, ~2 GB/sec of read BW available per server*
§ Rebuild can easily attain full device speed
  • Rebuilds are read intensive, which is flash friendly
§ Ceph only rebuilds in-use data to further reduce rebuild times
§ Degraded operations apply during availability outages too!
* RGW 4 MB object reads (YCSB), K=4, M=2, Cauchy-good, using 2x E5-2680 2.8 GHz, 8x 16 GB RDIMM
Mean Time To Repair (HDD)
§ Repair rate + degraded operational demands ≤ total BW & IOPS
§ Total BW & IOPS dependent on operational access patterns
  • BW & IOPS during degraded operations much less than normal operations
  • Repair operations typically degrade operations significantly
§ Must choose between:
  • Prioritizing operations over rebuild → ↓ MTTDL
  • Prioritizing rebuild over operations → ↓ app performance
  • Provisioning enough BW & IOPS to cover both → ↑ cost
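The budget constraint can be made concrete by asking how much bandwidth a target repair time takes away from operations. This is a rough sketch under stated assumptions: the 150 MB/s total is a hypothetical per-drive figure, not a number from the slides, and the function name is illustrative.

```python
def repair_budget(capacity_tb: float, target_mttr_hours: float, total_bw_mb_s: float):
    """Split total bandwidth into the repair rate needed to hit a target MTTR and what is left for operations."""
    repair_bw = capacity_tb * 1e6 / (target_mttr_hours * 3600)  # MB/s needed to rewrite the drive in time
    ops_bw = total_bw_mb_s - repair_bw                          # bandwidth left for applications
    if ops_bw < 0:
        raise ValueError("target MTTR is not reachable with the available bandwidth")
    return repair_bw, ops_bw

# Hypothetical inputs: an 8 TB drive, a 72-hour repair target, 150 MB/s of usable bandwidth
repair_bw, ops_bw = repair_budget(8, 72, 150)
print(f"repair: {repair_bw:.1f} MB/s, left for operations: {ops_bw:.1f} MB/s")
```

Shortening the target MTTR raises the repair share and shrinks what is left for applications, which is the three-way choice listed above.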
Mean Time To Data Loss (HDD 3x Replication)
§ 8 TB HDD
§ Rebuild limited to 3 days (MTTR = 72 hours)*
§ With AFR (8 TB HDD) = 1.7%
§ 3x replication → MTTDL of 4 × 10^12 hours
§ With AFR (HDD) = 8%
§ 3x replication → MTTDL of 3.5 × 10^10 hours
§ 8 TB SSD
§ Rebuild limited by write BW (8.9 hours)
§ With AFR (8 TB SSD) = 0.61%
§ 2x replication → MTTDL of 1.1 × 10^11 hours
* Assumes 100% (8 TB) rebuild required
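The slides do not state the model used to turn AFR and MTTR into MTTDL. One common textbook approximation for N-way replication, assuming independent exponential failures and MTTF = 8760 / AFR hours, is MTTDL ≈ MTTF^N / (N! × MTTR^(N-1)). The sketch below applies it as an assumption, not as the author's method:

```python
from math import factorial

HOURS_PER_YEAR = 8760

def mttf_hours(afr: float) -> float:
    """Mean time to failure implied by an annualized failure rate (simple 1/AFR conversion)."""
    return HOURS_PER_YEAR / afr

def mttdl_replication(afr: float, mttr_hours: float, replicas: int) -> float:
    """Textbook Markov-chain approximation for N-way replication with independent failures."""
    mttf = mttf_hours(afr)
    return mttf ** replicas / (factorial(replicas) * mttr_hours ** (replicas - 1))

hdd_y1 = mttdl_replication(0.017, 72, 3)    # HDD, 1st-year AFR, 3x replication
hdd_y3 = mttdl_replication(0.08, 72, 3)     # HDD, 3rd-year AFR, 3x replication
ssd_2x = mttdl_replication(0.0061, 8.9, 2)  # SSD, 2x replication

for label, value in [("HDD 1.7% 3x", hdd_y1), ("HDD 8% 3x", hdd_y3), ("SSD 0.61% 2x", ssd_2x)]:
    print(f"{label}: {value:.2e} hours")
```

The results land in the same order of magnitude as the figures above without matching them digit for digit, so the original calculation presumably used a somewhat different model or rounding.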
Mean Time To Data Loss (Flash + EC)
§ 8 TB SSD
§ Rebuild limited by write BW (250 MB/sec)
§ With AFR (8 TB SSD) = 0.61%
§ 8+1 erasure coding → MTTDL of 7.8 × 10^9 hours (MTTR 8.9 hours)*
§ 8+2 erasure coding → MTTDL of 1.2 × 10^13 hours (MTTR 17.8 hours)*
§ 16+2 erasure coding → MTTDL of 1.9 × 10^12 hours (MTTR 17.8 hours)*
§ 16+4 erasure coding → MTTDL of 2.0 × 10^18 hours (MTTR 35.6 hours)*
* 4 drive failures for 16+4 EC, 2 drive failures for x+2 EC, 1 drive failure for 8+1 EC. Assumes 100% rebuild required
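The MTTR values follow from the write-bandwidth limit. A minimal sketch, assuming decimal units (8 TB = 8,000,000 MB) and that rebuilding several failed drives takes proportionally longer at the same 250 MB/sec; the linear scaling is inferred from the quoted numbers rather than stated on the slide:

```python
def rebuild_hours(drive_tb: float, write_mb_s: float, failed_drives: int = 1) -> float:
    """Time to rewrite the full contents of the failed drives at a fixed write bandwidth."""
    seconds = failed_drives * drive_tb * 1e6 / write_mb_s
    return seconds / 3600

one = rebuild_hours(8, 250)        # 8+1 EC: one drive to rebuild
two = rebuild_hours(8, 250, 2)     # x+2 EC: two drives to rebuild
four = rebuild_hours(8, 250, 4)    # 16+4 EC: four drives to rebuild

print(round(one, 1), round(two, 1), round(four, 1))
```

Rounded to one decimal place these come out to 8.9, 17.8 and 35.6 hours, matching the MTTR values in the list.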
Erasure Coding Summary
§ Flash with EC has superior durability & availability vs. HDD replication
§ Flash with EC reduces storage overhead from 3x to 1.1x
§ For large-scale deployment, the increased cost of CPU is more than balanced by reduced network and storage costs
What happens as we get closer to the limit?
Let's Get Small!
§ New denser server form factors
  • Blades
  • Sleds
§ Good short-term solutions
Effects Of The CPU/DRAM Bottleneck
§ Storage cost = Media + Access + Management
§ Shared-nothing architecture conflates access and management
§ Storage costs will become dominated by management cost
§ Storage costs become CPU/DRAM costs
Embracing The CPU/DRAM Bottleneck
§ Move management to upper layers where CPU can be right-sized by client
§ What kind of media access do I want?
  • Simple enough functionality to be done directly in drive hardware: NO CPU
  • Allow direct access throughout the compute cluster over a network
  • Just enough machinery to enable coarse-grained sharing
§ In short, you really want a SAN!
  • Or more technically, Fabric Connected Storage
Not Your Father's SAN
§ Three problems with current SAN
  • Fibre Channel transport
  • SCSI access protocol
  • Drive-oriented storage allocation
§ All of these want to be updated
  • Fibre Channel is brittle and costly
  • SCSI initiators have long code paths catering to seldom-used configurations
  • Robust sub-drive storage allocation
SAN 2.0
§ NVMe over Fabrics
§ 1.0 spec is out
§ Simple enough for direct hardware execution of data path ops
§ Minimal initiator code path lengths improve performance
§ Namespaces allow sub-drive allocations
§ Not mature enough for enterprise deployment, yet
Second Generation SAN 2.0
§ Soon, NICs will forward NVMe operations to local PCIe devices
§ CPU removed from the software part of the data path
§ CPU is still needed for the hardware part of the data path
§ IOPS improve, BW is unchanged
§ Significant CPU freed for application processing
§ Getting closer to the wall!
Third Generation SAN 2.0, Imagined
§ New generation of combined SSD controller and NIC
  • Rethink of interfaces eliminates DRAM buffering
§ Network goes right into the drive
§ No CPU to be found
§ Works well with rack-scale architecture
Let's Get Really Small
§ Disaggregated / Rack Scale Architecture
  • Fabric connected
  • Independently scale compute, networking and storage
What's It All Mean?
§ New form factors are in everybody's future
§ The coming avalanche of storage bandwidth wants to be free
  • Not imprisoned by a CPU
§ Rack Scale Architecture allows new storage/compute configs
§ Storage will be increasingly "Software Defined" as the HW evolves
Data Center Solutions
InfiniFlash IF150
8TB Flash-Card Innovation
• Enterprise-grade power-fail safe
• Latching integrated & monitored
• Directly samples air temp
• Form factor enables lowest-cost SSD
Non-disruptive Scale-Up & Scale-Out
• Capacity on demand
  • Serve high-growth Big Data
  • 3U chassis starting at 64 TB, up to 512 TB
  • 8 to 64 8TB flash cards (SAS)
• Compute on demand
  • Serve dynamic apps without IOPS/TB bottlenecks
  • Add up to 8 servers
Disaggregation is the Key to Breakthrough Economics
Old Model
§ Monolithic
§ Proprietary storage OS
§ Costly: $$$$$
New Model
§ Disaggregated
§ Software Defined Stack
§ Green!
§ High performance
§ Cost effective
§ Flexible
[Diagram: Software Defined Storage on standard x86 servers with InfiniFlash HW]
Software Defined All-Flash Storage: the Disaggregated Model for Scale
SanDisk FlashStart™ and FlashAssure™
• Installation and Training Services
• 24/7, global onsite TSANET Collaborative Solutions Support
• 2hr parts delivery, 750+ global locations
SanDisk Flash
• Shared Flash Storage → INFINIFLASH
• Flash in Server
SW Choice: FlashSoft, ION Accelerator
Compute Choice
IF100 + SuperMicro + Ceph: Scale-Out Solution
InfiniFlash IF500 All-Flash Storage System: Block and Object Storage Powered by Ceph
§ Ultra-dense high-capacity flash storage
  - 512 TB in 3U, scale-out software for PB-scale capacity
§ Highly scalable performance
  - Industry-leading IOPS/TB
§ Cinder, Glance and Swift storage
  - Add/remove server & capacity on demand
§ Enterprise-class storage features
  - Automatic rebalancing
  - Hot software upgrade
  - Snapshots, replication, thin provisioning
  - Fully hot swappable, redundant
§ Ceph optimized for SanDisk flash
  - Tuned & hardened for InfiniFlash
2016 InfiniFlash Customer Momentum (partial listing)
Customer                     Vertical                   Application/Platform     Solution Domain
US University                Education                  Spectrum Scale (GPFS)    CLOUD
Intl. Credit Card            Financial Services         Oracle                   DATABASES, ANALYTICS
US Bank                      Financial Services         VMware, OpenStack        VIRTUALIZATION
Global Leading ISV           Tech                       Enterprise Cloud         CLOUD
CompuGroup (Germany/US)      Healthcare/Life Sciences   DataCore                 CLOUD
Global Online Shop           Online Commerce            NoSQL                    BIG DATA ANALYTICS
Major League Baseball        Media & Entertainment      Tegile                   BIG DATA MEDIA
Japanese Telco               Telco                      MapR                     BIG DATA ANALYTICS
US Telco                     Media & Entertainment      OpenStack                BIG DATA MEDIA
GleSys (CSP Sweden)          Cloud Service Provider     Nexenta                  CLOUD
Intl. Analyst                Financial                  OpenStack                BIG DATA & CLOUD
Canadian Broadcasting Corp   Media & Entertainment      Nexenta                  BIG DATA MEDIA
© 2015 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries. InfiniFlash and SanDisk ION Accelerator are trademarks of SanDisk Corporation. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).