brados: declarative,programmable object storagemsevilla/papers/watkins-socc17.pdf · brados:...

1
Brados: Declarative,Programmable Object Storage Noah Watkins, Michael Sevilla, Ivo Jimenez, Neha Ohja, Peter Alvaro, Carlos Maltzahn Tradi&onal Applica&on Storage Interfaces Distributed Storage Emerging Applica&ons Emerging applica&ons are integra&ng into the en&re storage stack, construc&ng domain- specific interfaces, and reusing services. Clear, direct applica&on seman&cs Control over low-level data layouts Open-source storage systems are exposing internal services to applica&ons Ceph and RADOS provide numerous domain-specific interfaces In-produc&on interfaces support high-profile applica&ons (e.g. OpenStack) Beginning to see third-party interface contribu&ons Category Specializa0on Methods Locking Shared Exclusive 6 Logging Replica State Timestamped 3 4 4 Garbage Collec&on Reference Coun&ng 4 Metadata Management RBD RGW User Version 37 27 5 5 1 2 3 4 5 6 7 8 9 10 11 12 ... Ceph / RADOS LevelDB RocksDB WiredTiger BlueStore Log Striping [1] Balakrishnan, et. al, “CORFU: A Shared Log Design for Flash Clusters”, NSDI 2012 Driving example is ZLog, an implementa&on of the CORFU [1] high-performance shared-log protocol on top of soaware-defined storage. Service reuse: replica&on and erasure coding Transparent upgrades and &ering Explore new interface implementa&ons Large Design State Space Exis&ng approaches to extensibility rely on hard-coded interfaces and data layouts. A large design space complicates development and upgrade decisions. Rela%ve performance difference between two versions of Ceph using different storage strategies. Developer may have selected non-op%mal solu%on in older version. Two implementa%ons of the same interface may have up to an order of magnitude difference in append performance across log entry sizes. When the size of the design space is large automated techniques to generate physical designs are needed. Storage Abstrac0ons Are Changing Storage System Programmability in the Wild bloom do # epoch guard invalid_op <= (op * epoch).pairs{|o,e| o.epoch <= e.epoch} valid_op <= op.notin(invalid_op) ret <= invalid_op{|o| [o.type, o.pos, o.epoch, 'stale']} # op's position found in log found_op <= (valid_op * log).lefts(pos => pos) notfound_op <= valid_op.notin(found_op) # demux on operation type write_op <= valid_op {|o| o if o.type == 'write'} seal_op <= valid_op {|o| o if o.type == 'seal'} end bloom :write do temp :valid_write <= write_op.notin(found_op) log <+ valid_write{ |o| [o.pos, 'valid', o.data]} ret <= valid_write{ |o| [o.type, o.pos, o.epoch, 'ok'] } ret <= write_op.notin(valid_write) {|o| [o.type, o.pos, o.epoch, 'read-only'] } end bloom :seal do epoch <- (seal_op * epoch).rights epoch <+ seal_op { |o| [o.epoch] } temp :maxpos <= log.group([], max(pos)) ret <= (seal_op * maxpos).pairs do |o, m| [o.type, nil, o.epoch, m.content] end end Brados is a declara%ve language based on Bloom (Alvaro, CIDR ’11) that is used to express storage interfaces. Shown above is a snippet of the specifica%on of the CORFU protocol. Op%miza%on techniques are applied to generate an implementa%on. Dataflow analysis Performance sta%s%cs from storage system Op%miza%on Plan genera%on Declara0ve Language Example Service : Distributed Shared-Log This work is par&ally supported by a CROSS research appointment. For more informa&on about CROSS please visit hbp://cross.ucsc.edu . The Zlog project an an open-source project published at hbps://github.com/noahdesu/zlog . You can contact the author Noah Watkins at [email protected] . Large configura&on space of hardware and soaware components. Custom interface and physical design space

Upload: hoangdien

Post on 19-Jul-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Brados: Declarative,Programmable Object Storagemsevilla/papers/watkins-socc17.pdf · Brados: Declarative,Programmable Object Storage Noah Watkins, ... Ceph / RADOS LevelDB RocksDB

Brados: Declarative,Programmable Object Storage Noah Watkins, Michael Sevilla, Ivo Jimenez, Neha Ohja, Peter Alvaro, Carlos Maltzahn

Tradi&onalApplica&on

StorageInterfaces

DistributedStorage

EmergingApplica&ons

Emerging applica&ons are integra&ng into theen&re storage stack, construc&ng domain-specificinterfaces,andreusingservices.

•  Clear,directapplica&onseman&cs•  Controloverlow-leveldatalayouts

•  Open-source storage systems areexposing internal services toapplica&ons

•  CephandRADOSprovidenumerousdomain-specificinterfaces

•  In-produc&on interfaces supporthigh-profile applica&ons (e.g.OpenStack)

•  Beginning to see third-partyinterfacecontribu&ons

Category Specializa0on Methods

Locking SharedExclusive 6

LoggingReplicaStateTimestamped

344

GarbageCollec&on ReferenceCoun&ng 4

MetadataManagement

RBDRGWUserVersion

372755

1 2 3 4 5 6 7 8 9 10 11 12 ...

Ceph/RADOSLevelDBRocksDBWiredTigerBlueStore

LogStriping

[1]Balakrishnan,et.al,“CORFU:ASharedLogDesignforFlashClusters”,NSDI2012

Driving example is ZLog, an implementa&on of theCORFU[1]high-performanceshared-logprotocol ontopofsoaware-definedstorage.

•  Servicereuse:replica&onanderasurecoding•  Transparentupgradesand&ering•  Explorenewinterfaceimplementa&ons

LargeDesignStateSpaceExis&ngapproachestoextensibilityrelyonhard-codedinterfacesanddatalayouts.Alargedesignspacecomplicatesdevelopmentandupgradedecisions.

Rela%ve performance difference between two versions of Ceph usingdifferent storage strategies. Developer may have selected non-op%malsolu%oninolderversion.

Two implementa%onsof thesame interfacemayhaveuptoanorderofmagnitudedifferenceinappendperformanceacrosslogentrysizes.Whenthe size of the design space is large automated techniques to generatephysicaldesignsareneeded.

StorageAbstrac0onsAreChanging StorageSystemProgrammabilityintheWild

bloom do # epoch guard invalid_op <= (op * epoch).pairs{|o,e| o.epoch <= e.epoch} valid_op <= op.notin(invalid_op) ret <= invalid_op{|o| [o.type, o.pos, o.epoch, 'stale']} # op's position found in log found_op <= (valid_op * log).lefts(pos => pos) notfound_op <= valid_op.notin(found_op) # demux on operation type write_op <= valid_op {|o| o if o.type == 'write'} seal_op <= valid_op {|o| o if o.type == 'seal'} end

bloom :write do temp :valid_write <= write_op.notin(found_op) log <+ valid_write{ |o| [o.pos, 'valid', o.data]} ret <= valid_write{ |o| [o.type, o.pos, o.epoch, 'ok'] } ret <= write_op.notin(valid_write) {|o| [o.type, o.pos, o.epoch, 'read-only'] } end bloom :seal do epoch <- (seal_op * epoch).rights epoch <+ seal_op { |o| [o.epoch] } temp :maxpos <= log.group([], max(pos)) ret <= (seal_op * maxpos).pairs do |o, m| [o.type, nil, o.epoch, m.content] end end

Bradosisadeclara%velanguagebasedonBloom(Alvaro,CIDR’11)that isusedtoexpressstorageinterfaces. Shown above is a snippet of the specifica%on of the CORFU protocol. Op%miza%ontechniquesareappliedtogenerateanimplementa%on.

•  Dataflowanalysis•  Performancesta%s%cs

fromstoragesystem•  Op%miza%on•  Plangenera%on

Declara0veLanguage

ExampleService:DistributedShared-Log

Thisworkispar&allysupportedbyaCROSSresearchappointment.Formoreinforma&onaboutCROSSpleasevisithbp://cross.ucsc.edu.TheZlogprojectananopen-sourceprojectpublishedathbps://github.com/noahdesu/[email protected].

Largeconfigura&onspaceofhardwareandsoawarecomponents.

Custominterfaceandphysicaldesignspace