

CYRUS: Towards Client-Defined Cloud Storage

Jae Yoon Chung, Dept. of CSE, POSTECH, [email protected]
Carlee Joe-Wong, PACM, Princeton University, [email protected]
Sangtae Ha, Dept. of CS, University of Colorado, [email protected]
James Won-Ki Hong, Dept. of CSE, POSTECH, [email protected]
Mung Chiang, Dept. of EE, Princeton University, [email protected]

Abstract

Public cloud storage has recently surged in popularity. However, cloud storage providers (CSPs) today offer fairly rigid services, which cannot be customized to meet individual users' needs. We propose a distributed, client-defined architecture that integrates multiple autonomous CSPs into one unified cloud and allows individual clients to specify their desired performance levels and share files. We design, implement, and deploy CYRUS (Client-defined privacY-protected Reliable cloUd Service), a practical system that realizes this architecture. CYRUS ensures user privacy and reliability by scattering files into smaller pieces across multiple CSPs, so that no one CSP can read users' data. We develop an algorithm that sets reliability and privacy parameters according to user needs and selects CSPs from which to download user data so as to minimize latency. To accommodate multiple autonomous clients, we allow clients to upload simultaneous file updates and detect conflicts after the fact at the client. We finally evaluate the performance of a CYRUS prototype that connects to four popular commercial CSPs in both lab testbeds and user trials, and discuss CYRUS's implications for the cloud storage market.

1. Introduction

Cloud computing, driven in large part by business storage, is forecast to be a $240 billion industry in 2020 [15], with 3.8 billion personal storage accounts in 2018 [31]. Yet even as storing files on the cloud becomes ever more popular, privacy and reliability concerns remain. Cloud storage providers (CSPs) have experienced high-profile data leaks, e.g., credit card information from Home Depot [32] and celebrity photos from iCloud [39], as well as outages: Amazon's popular S3 service went down for almost an hour in 2013, affecting websites like Instagram and Flipboard [37]; and Dropbox went down for three hours in 2014 [18].

(Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. EuroSys'15, April 21–24, 2015, Bordeaux, France. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-3238-5/15/04. $15.00. http://dx.doi.org/10.1145/2741948.2741951)

These performance deficiencies are especially serious for enterprise file sharing, a fast growing market [21] that requires high reliability and rapid file synchronization. File sharing presents unique challenges to cloud storage, as different clients (e.g., different users or multiple devices belonging to one user) must be able to securely retrieve the latest versions of the file from the cloud. Further complicating this need is the fact that some clients may not always be active, preventing reliable client-to-client communication. We can identify several requirements for effective file sharing:

• Privacy: Users wish to prevent other parties, including cloud providers, from accessing their data.

• Reliability: Users must be able to access data stored in the cloud at all times.

• Latency: To effectively share files, users must be able to quickly retrieve the latest file versions from the cloud and upload edited versions in real time.

The performance criteria above may sometimes conflict, e.g., greater privacy may require more client-cloud communication and worsen the latency of file updates. We therefore seek a system design that allows clients to freely navigate these tradeoffs according to their own needs and priorities. To this end, we propose a cloud storage system called CYRUS (Client-defined privacY-protected Reliable cloUd Storage) that satisfies the above requirements by leveraging user accounts at multiple clouds (e.g., private storage servers or user accounts at CSPs like Dropbox and Box).


Figure 1: CYRUS allows multiple clients to control files stored across multiple CSPs.

CYRUS controls the distribution of data across different CSPs from user devices, allowing users to customize their achieved performance requirements and trade off between them as necessary. Figure 1 illustrates CYRUS's basic architecture, with multiple users sharing files stored at multiple CSPs.

    1.1 Distributed Cloud Storage Architectures

Leveraging multiple CSPs represents an architectural change from the usual Infrastructure-as-a-Service (IaaS) model, which rents customized virtual machines (VMs) to users. While some CSPs offer software-defined service level agreements with customized performance guarantees [22], these approaches fundamentally leave control at the CSP. Clients control their VMs or specify performance through software on the cloud and are still limited by the CSP's functionality. Yet CYRUS does not consist solely of autonomous clients either, unlike peer-to-peer systems that allow each client node to leverage resources at other connected nodes for storage or computing. CYRUS does not treat CSPs as peers and does not assume that clients can directly communicate with each other; instead, the CSP "nodes" become autonomous storage resources separately controlled by the clients.

CYRUS's distributed, client-controlled architecture enforces privacy and reliability by scattering user data to multiple CSPs, so that attackers must obtain data from multiple CSPs to read users' files; by building redundancy into the file pieces, the file remains recoverable if some CSPs fail. File pieces can be uploaded and downloaded in parallel, reducing latency. Since the client controls the file distribution, clients can choose where to distribute the file pieces so as to define customized privacy, reliability, and latency.

CYRUS is not the first work to recognize the benefits of a client-defined approach to cloud storage. DepSky, a "cloud-of-clouds" system, also controls multiple CSPs from the client [7]. Yet while DepSky demonstrates a proof-of-concept system with Byzantine quorum system protocols, it does not fully address the practical challenges of realizing such client-controlled storage for large-scale deployment. In particular, users wishing to share files can have stringent latency requirements, which they may wish to trade off with privacy and reliability. For instance, higher privacy requirements would require downloading file pieces from slower clouds. CYRUS allows full privacy and reliability customization and optimizes latency in realizing a deployable and customizable storage system.

CYRUS's design addresses four challenges not fully covered in previous work:

• Heterogeneous CSP file handling: CSPs have different file management policies and APIs, e.g., tracking files with different types of IDs and varied support for file locking. To accommodate these differences, CYRUS uses only the most basic cloud APIs. This limitation makes it particularly difficult to support multiple clients trying to simultaneously update the same file; unlike conflict resolution protocols such as Git [2], CYRUS must be able to resolve conflicts without server-side support. Our approach is inspired by large-scale, performance-focused database systems [5, 11, 13]. We allow clients to upload conflicting files and then resolve conflicts as necessary. Other approaches require more client-CSP communication, increasing latency [7].

• Sharing files between clients: CYRUS's client-based architecture and the lack of reliable client-to-client communication make sharing files between clients difficult: clients need to share information on each file's storage customization. We avoid storing this data at a centralized location, which creates a single point of failure, by scattering file metadata among CSPs.

• CSP infrastructure sharing: Some CSPs may share infrastructure, e.g., Dropbox running its software on Amazon's servers [16]. Storing files at CSPs with the same physical infrastructure makes simultaneous CSP failures more likely. To maximize reliability, CYRUS infers these sharing relationships and uses them in selecting the CSPs where files are stored.

• Optimizing file transfer delays: Client connections to different CSPs have different speeds. Thus, CYRUS significantly reduces latency by optimally choosing the CSPs from which to download file pieces, subject to privacy and reliability constraints on the number of file pieces needed to reconstruct the file.

    We discuss other works related to distributed, client-controlled cloud storage in Section 2.

1.2 Research Contributions

In designing, implementing, and deploying CYRUS, we make the following research contributions:

Client-based architecture integrating multiple CSPs (Section 3): CYRUS's software, located at client devices, scatters files across different clouds so that no one CSP can access them, but the file can be recovered if some CSPs fail. CYRUS is CSP-agnostic, requiring only basic APIs at each CSP, and supports multiple clients (i.e., devices) concurrently accessing the same files with no increase in latency.


These devices may be owned by different users, allowing them to share data.1

Customizable reliability, privacy, and performance (Section 4): Users can configure reliability and privacy levels by specifying the number of CSPs to which different files should be scattered and the number of pieces required to reconstruct the file. We infer CSPs sharing cloud platforms and select those that do not share infrastructure to maximize reliability. Users select the CSPs from which to download the file pieces in parallel so as to minimize latency.

CYRUS operations on multiple clouds and clients (Section 5): Users build a CYRUS cloud by calling a standard set of APIs. CYRUS transparently aggregates file operations on multiple CSPs, internally storing and tracking file metadata and CSP storage locations for efficient operations.

Prototype implementation and deployment (Sections 6 and 7): We implemented a prototype of CYRUS on Mac OS X and Windows.2 We evaluate its performance on four commercial CSPs and show that CYRUS outperforms other CSP integration services in real-world scenarios. We also present results from a small pilot trial in the U.S. and Korea.

Discussion of market implications (Section 8): While many users today only use storage from one CSP to avoid managing multiple CSP accounts, CYRUS encourages users to use more CSPs for greater privacy and reliability. CYRUS thus evens out demand across CSPs and makes it easier for market entrants to gain users.

2. Related Work

Byzantine fault-tolerance systems have been studied for reliable distributed storage [10, 17, 25–27]. These works proposed protocols that can read and write with unreliable CSPs. Realizing these protocols, however, requires running servers at CSPs, which is not generally applicable to cloud services based on HTTP(S) REST APIs. Similarly, object-based distributed storage systems [9, 40] also require running code at the servers to handle multiple transactions and synchronization. In these systems, user data are split into smaller objects and distributed over multiple object stores. To reconstruct files, the systems maintain metadata, including the required objects and their storage locations.

Another approach to integrating multiple CSPs is to use proxy servers [3, 19, 30, 35, 38]. The proxy can scatter and gather user data to and from multiple CSPs, providing transparent access for users. Yet while the proxy server is a data sharing point among multiple clients, allowing greater deduplication efficiency, it is also a single point of failure.

Cloud integration from the client has been proposed in [6, 7, 23, 24, 29]. CYRUS, however, provides a customizable framework that sets reliability and privacy levels while optimizing user delays in accessing and storing files.

1 In the rest of the paper, "users" and "clients" are used synonymously and can refer to multiple devices owned by one person.
2 Demo video available at http://youtu.be/DPK3NbEvdM8.

Moreover, CYRUS allows multiple clients to upload conflicting files and resolve conflicts later, which improves the transaction (e.g., file upload) speed compared to alternative approaches. Table 1 compares CYRUS with similar storage systems. We design CYRUS to achieve all of Table 1's functions without a central server for coordinating clients' requests.

3. CYRUS Design

We first highlight some important challenges in building CYRUS in Section 3.1 before presenting CYRUS's system architecture and API for user interaction in Section 3.2.

    3.1 Design Considerations

CYRUS's operations and functionalities must reside either on the clients, the CSPs, or a combination of both. Yet in realizing such functionalities, we face two main challenges: heterogeneity in CSP file handling and the lack of reliable direct client-to-client communication.

To illustrate CSP heterogeneity, Table 2 shows the APIs and achieved performance for a range of CSPs. Though most are similar, the API implementations differ in the functions they provide and their file object handling. For example, Dropbox uses files' names as their identifiers, while Google Drive uses a separate file ID. Thus, when a client uploads a file with an existing filename, Dropbox overwrites the previous file, but Google Drive does not. CYRUS accommodates such differences by only using basic cloud API calls: authenticate, list, upload, download, and delete, which are available even on FTP servers. We therefore shift much of CYRUS's functionality to the clients, influencing our design choices:

Ensuring privacy and reliability: Standard replication methods for ensuring reliability do not require significant client or CSP resources but are not secure. Many encryption methods, on the other hand, are vulnerable to loss of the encryption key. CYRUS overcomes these limitations by splitting and encoding files at the client and uploading the file pieces to different CSPs. For further security, the pieces' filenames are hashes of information known only to the clients. Reconstructing the file requires access to pieces on multiple, but not all, CSPs, ensuring both privacy and reliability.

Concurrent file access: Since most CSPs do not support file locking, CYRUS cannot easily prevent simultaneous file uploads from different clients: the second client will not know that a file is being updated until the first client finishes uploading. Theoretically, since POST in HTTP is not idempotent, the server status should change when the first update finishes, and we could handle the conflict by overwriting the first with the second file update. However, CSPs do not always enforce this standard. Thus, a locking or overwriting approach requires creating lock files and checking them after a random backoff time, leading to long delays [7]. CYRUS instead creates new files for each update and then detects and resolves any conflicts from the client.



System | Erasure coding | Data deduplication | Concurrency | Versioning | Optimal CSP selection | Customizable reliability | Client-based architecture
Attasena [6] | Yes | No | Yes | No | No | No | No
DepSky [7] | Yes | No | Yes | Yes | No | No | Yes
InterCloud RAIDer [23] | Yes | Yes | No | Yes | No | No | Yes
PiCsMu [24] | No | No | No | No | No | No | No
CYRUS | Yes | Yes | Yes | Yes | Yes | Yes | Yes

Table 1: Comparison of CYRUS's features with similar cloud integration systems.

CSP | Format | Protocol | Authentication | RTT (ms) | Throughput (Mbps)
Amazon S3* | XML | SOAP/REST | AWS Signature | 235 | 1.349
Box | JSON | REST | OAuth 2.0 | 149 | 2.128
Dropbox | JSON | REST | OAuth 2.0 | 137 | 2.314
OneDrive | JSON | REST | OAuth 2.0 | 142 | 2.233
Google Drive | JSON | REST | OAuth 2.0 | 71 | 4.465
SugarSync | XML | REST | OAuth-like | 146 | 2.171
CloudMine | JSON | REST | ID/Password | 215 | 1.474
Rackspace | XML/JSON | REST | API Key | 186 | 1.704
Copy | JSON | REST | OAuth | 192 | 1.651
ShareFile | JSON | REST | OAuth 2.0 | 215 | 1.474
4Shared | XML | SOAP | OAuth 1.0 | 186 | 1.704
DigitalBucket* | XML | REST | ID/Password | 217 | 1.461
Bitcasa* | JSON | REST | OAuth 2.0 | 139 | 2.281
Egnyte | JSON | REST | OAuth 2.0 | 153 | 2.072
MediaFire | XML/JSON | REST | OAuth-like | 192 | 1.651
HP Cloud | XML/JSON | REST | OpenStack Keystone V3 | 210 | 1.509
CloudApp* | JSON | REST | HTTP Digest | 205 | 1.546
Safe Creative* | XML/JSON | REST | Two-step authentication | 295 | 1.075
FilesAnywhere | XML | SOAP | Custom | 202 | 1.569
CenturyLink | XML/JSON | SOAP/REST | SAML 2.0 | 293 | 1.082

Table 2: APIs and measured performance of commercial cloud storage providers. Throughput is calculated from the measured RTT assuming a 0.1% packet loss rate and a 65,535-byte TCP window size. All measurements were taken in Korea.

Client-based architecture: To reconstruct a file, a client needs access to its metadata, i.e., information about where and how different pieces of the file are stored. The easiest way to share this metadata is to maintain a central metadata server [8], but this solution makes CYRUS dependent on a single server, introducing a single point of failure and making user data vulnerable to attacks at a single server. At the other extreme, a peer-to-peer based solution does not guarantee data accessibility since clients are not always available. Our solution is to scatter the metadata across all of the CSPs, as we do with the files. Clients access the metadata by downloading its pieces from the CSPs; without retrieving information from multiple CSPs, attackers cannot access user data.3 This approach ensures that CYRUS is as consistent as the CSPs where it stores files.

Selecting CSPs: CYRUS chooses the number of file pieces to upload or download so as to satisfy privacy and reliability requirements. When choosing which CSPs to use, CYRUS chooses CSPs on independent cloud platforms so as to maximize reliability.

3 Since we store metadata pieces at all CSPs, clients can always find and download metadata pieces.

Functionality | CYRUS API call
create a CYRUS cloud s | s = create()
add a cloud storage c | add(s, c)
remove a cloud storage c | remove(s, c)
get a file f of version v | f' = get(s, f, v)
put a file f | put(s, f)
delete a file f | delete(s, f)
list files under a directory d | [(f, r), ...] = list(s, d)
reconstruct s | s' = recover(s)

Table 3: CYRUS's Application Programming Interface.

For instance, CSPs marked with an asterisk in Table 2 have Amazon destination IPs, so a single failure at Amazon's datacenters could simultaneously impact these CSPs, compromising reliability. The table also shows that different CSPs have very different client connection speeds; thus, carefully selecting the CSPs from which CYRUS downloads files can significantly reduce the latency of file transfers.

    3.2 CYRUS Architecture

Users interact with CYRUS through Table 3's set of API calls. These include standard file operations, such as uploading, downloading, and deleting a file, as well as file recovery and CSP addition and removal.
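To make Table 3's interface concrete, the following Python sketch walks through a hypothetical session; the cyrus module name, the connector() helper, and the argument types are illustrative assumptions, not part of the paper, and only the call names mirror Table 3.

# Hypothetical walk-through of Table 3's API; the module and connector names are invented.
import cyrus

s = cyrus.create()                            # create an empty CYRUS cloud
cyrus.add(s, cyrus.connector("dropbox"))      # register CSP accounts
cyrus.add(s, cyrus.connector("gdrive"))

cyrus.put(s, "report.pdf")                    # scatter shares across the added CSPs
files = cyrus.list(s, "/")                    # [(file, revision), ...]
latest = cyrus.get(s, "report.pdf", None)     # fetch the newest version
cyrus.delete(s, "report.pdf")                 # mark deleted; metadata is kept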


To realize this API while meeting the design challenges in Section 3.1, CYRUS must perform three main functions: integrate multiple clouds, scale to multiple clients, and optimize performance.

Integrating multiple clouds: As explained above, CYRUS scatters file pieces to multiple CSPs so that no single CSP can reconstruct a user's data. We increase reliability by storing more file pieces than are necessary for recovering the file. This idea is encapsulated in a (t, n) secret sharing scheme, which divides user data into n shares, each stored on a different CSP [36]. Secret sharing divides and encodes the data in such a way that reconstructing any part of the original data requires at least t of the file shares. Taking t < n thus ensures reliability, and taking t > 1 ensures that multiple CSPs are required to recover user data. Users can reconstruct the file using the shares from any t CSPs.

Scaling to multiple clients: To download files, clients must know the locations of the files' shares. Thus, we maintain a separate metadata file for each file stored on the cloud; as a client uploads a file, it records the share locations in this metadata. We store the metadata in a logical tree at CSPs, with clients maintaining local copies of the metadata tree for efficiency. All clients can sync their local copies of the metadata tree to track updated share and file locations. The tree structure also allows CYRUS to handle conflicting file updates. Clients do not lock files while modifying them but can upload conflicting file versions as different nodes on the metadata tree. We traverse the tree to find and resolve file conflicts.

Optimizing cost and performance: CYRUS reduces users' cost by limiting the amount of data that must be stored on CSPs. Before scattering files, we divide each file into smaller discrete chunks. Unique chunks are then divided into shares using secret sharing, which are scattered to the CSPs. Figure 2 illustrates this division of files into chunks and chunks into shares. Since different files can use the same chunks, deduplication reduces the total amount of data stored at CSPs, conserving storage capacity. We also limit the amount of data that can be stored at different CSPs, e.g., to the maximum amount of free storage capacity.

We choose the number of shares to upload in order to satisfy reliability and privacy constraints, i.e., n and t for secret sharing. Adjusting these parameters allows CYRUS to adapt to changes in cloud conditions and user preferences. After choosing n, we select the CSPs so that they do not share a cloud platform. When reconstructing files, we minimize the latency of downloading shares.4

4. Optimized Cloud Selection

We first consider inferring which CSPs run on the same cloud platforms in Section 4.1, which helps to ensure reliability by avoiding correlated CSP failures.

4 This downlink CSP selection is client-specific and does not affect other clients' performance: regardless of the CSPs selected for downloading the shares, any modified shares are uploaded to the same set of CSPs for other clients to retrieve (uplink CSP selection is discussed in Section 5.3). Thus, when other clients retrieve the modified shares, they have the same set of choices for CSPs from which to download the modified shares.

Figure 2: Mapping files to shares.

Figure 3: Clustering of Table 2's CSPs. The root and leaf nodes represent the client and CSPs, respectively.

Given this lack of correlation, we then consider how many shares to upload to CSPs (Section 4.2) and present an algorithm for choosing the CSPs from which to download shares in Section 4.3.

    4.1 CSP Platform Independence

Storing shares of the same chunk at CSPs on a common cloud platform increases the chance that client data will not be recoverable, which is particularly serious during long-term CSP outages. To prevent this scenario, clients that highly value reliability can cluster their CSP accounts by cloud platform and store a chunk's shares on at most one CSP from each cluster. Since infrastructure locations rarely change, the clustering need only be done once and updated when the client adds a new CSP account.

Since some users will have many CSP accounts, manually clustering CSPs by cloud platform is often infeasible. We can therefore infer the CSP clusters by either network probing to detect the client-CSP routing paths or by geolocation based on destination IPs. CYRUS takes the first approach, as geolocation can only detect approximate geographical proximity, while routing traces can more precisely identify shared infrastructure. We use traceroute to find the path between a given user and each CSP and construct the minimal spanning tree of the resulting graph.5

5 With some CSPs, clients connect to endpoint APIs that are separate from the file storage locations. We can read the subsequent internal CSP connection to the file storage location to infer its true IP address.


Figure 3 shows an example of the tree from a single client to the CSPs from Table 2. We hierarchically cluster the CSPs by horizontally cutting the tree at a given level. For example, we find five CSPs deployed on Amazon, which are marked with asterisks in Table 2.
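A minimal sketch of this clustering step, assuming the per-CSP traceroute hop lists have already been collected (the hop names and the cut depth below are invented): CSPs whose routes agree on a shared prefix fall into one cluster, approximating a horizontal cut of the spanning tree.

# Sketch: cluster CSPs by shared route prefixes from traceroute output.
# The hop lists are hypothetical; CYRUS builds them with traceroute.
from itertools import groupby

routes = {
    "amazon_s3": ["gw", "isp1", "core3", "aws_edge", "aws_dc1"],
    "bitcasa":   ["gw", "isp1", "core3", "aws_edge", "aws_dc2"],
    "gdrive":    ["gw", "isp1", "core7", "goog_edge"],
}

def cluster_by_prefix(routes, depth):
    # Group CSPs whose routes agree on the first `depth` hops
    # (a horizontal cut of the client-to-CSP tree at that level).
    keyed = sorted(routes.items(), key=lambda kv: kv[1][:depth])
    return [[name for name, _ in group]
            for _, group in groupby(keyed, key=lambda kv: kv[1][:depth])]

print(cluster_by_prefix(routes, depth=4))
# -> [['amazon_s3', 'bitcasa'], ['gdrive']]; shares go to at most one CSP per cluster.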

    4.2 Balancing Privacy and Reliability

Users specify CYRUS's privacy and reliability levels by setting the secret sharing parameters n and t for each chunk. The user first specifies the privacy level by choosing t, or the number of shares required to reconstruct a chunk: since we upload at most one share to each CSP, t specifies the number of CSPs needed for reconstruction. Taking t = 2 is sufficient to ensure that no one CSP can access any part of users' data, but more privacy-sensitive users may specify a larger t.

The user then specifies reliability in terms of an upper bound ε on the overall failure probability (i.e., the probability that we cannot download t shares of a chunk due to CSP failure). The failure probability of any given CSP, which we denote by p, is estimated using the number of consistent failed attempts to contact CSPs.6 Users specify a threshold length of time, e.g., one day; if a CSP cannot be contacted for that length of time, then we count a CSP failure. The probability that we cannot download t shares equals the probability that more than n − t CSPs fail, which we calculate to be \sum_{s=0}^{t-1} C(n, s) (1 - p)^s p^{n-s}. We then bound this probability by ε, searching for the minimum n such that

\sum_{s=0}^{t-1} C(n, s) (1 - p)^s p^{n-s} \le \epsilon.   (1)

We find n by increasing its value from t to its maximum value (the total number of CSPs or clusters); taking the minimum such n limits the data stored on the cloud. The chunk share size is independent of n, so uploading n shares requires an amount of data proportional to n.
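The search for the smallest n satisfying (1) can be written directly; the per-CSP failure probability and the bound used below are illustrative values, not measurements from the paper.

# Sketch: smallest n (>= t) whose chunk-loss probability meets the bound in (1).
from math import comb

def failure_prob(n, t, p):
    # P(fewer than t of the n CSPs are reachable) = sum_{s=0}^{t-1} C(n,s)(1-p)^s p^(n-s)
    return sum(comb(n, s) * (1 - p) ** s * p ** (n - s) for s in range(t))

def min_shares(t, p, eps, n_max):
    for n in range(t, n_max + 1):
        if failure_prob(n, t, p) <= eps:
            return n
    return None  # no feasible n with the available CSPs/clusters

# Illustrative numbers: t = 2, per-CSP failure probability 1%, bound 1e-6.
print(min_shares(t=2, p=0.01, eps=1e-6, n_max=10))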

    4.3 Downlink Cloud Selection

CYRUS chooses the n CSPs to which shares should be uploaded using consistent hashing on their SHA-1 hash values. To download a file, a client must choose t of the n CSPs from which to download shares. Since we choose t < n for reliability, the number of possible selections can be very large: suppose that R chunks need to be downloaded at a given time, e.g., if a user downloads a file with R unique chunks. There are then C(n, t)^R possible sets of CSPs, which grows rapidly with R. We thus choose the CSPs so as to minimize download completion times.

We suppose that file shares are stored at C CSPs. We index the chunks by r = 1, 2, ..., R and the CSPs by c = 1, 2, ..., C, defining d_{r,c} as an indicator variable for the share download locations: d_{r,c} = 1 if a share of chunk r is downloaded from CSP c and 0 otherwise. Denoting chunk r's share size as b_r, the total data downloaded from CSP c is \sum_r b_r d_{r,c}. We use \beta_c to denote the download bandwidth allocated to CSP c and find the total download time

max_c \sum_r b_r d_{r,c} / \beta_c.   (2)

6 We do not consider link failures, as CYRUS is designed to reduce the impact of failures at CSPs. Thus, CSP failure probabilities are taken as uniform, which is observed in practice within an order of magnitude [12, 34], and independent, since we choose CSPs with distinct physical infrastructure. If CSPs have different failure rates, we can simply set p to be the largest, so that we conservatively overestimate the probability of joint CSP failures.

While minimizing (2), the selected CSPs must satisfy constraints on available bandwidth and feasibility (e.g., selecting exactly t CSPs).

Bandwidth: The bandwidth allocated to the CSP connectors is restricted in two ways. First, each CSP has a maximum achievable download bandwidth, which may vary over time, e.g., as the CSP demand varies. We express these maxima as upper bounds: \beta_c \le \bar{\beta}_c for all CSPs c. Second, the client itself has a maximum download bandwidth, which must be shared by all parallel connections. We thus constrain

\sum_c \beta_c \le \beta,   (3)

where \beta denotes the client's total downstream bandwidth.7

Feasibility: CYRUS must download t shares of each chunk, and can only download shares from CSPs where they are stored. We thus introduce the indicator variables u_{r,c}, which take the value 1 if a share of chunk r is stored at CSP c and 0 otherwise. Our feasibility constraints are then

\sum_c d_{r,c} = t,   d_{r,c} \le u_{r,c}.   (4)

CYRUS's download optimization problem is thus

min_{y, d, \beta}   y   (5)

s.t.   \sum_r b_r d_{r,c} / \beta_c \le y,   c = 1, 2, ..., C   (6)

\sum_c \beta_c \le \beta,   \beta_c \le \bar{\beta}_c,   \sum_c d_{r,c} = t,   d_{r,c} \le u_{r,c}   (7)

Exactly solving (5–7) is difficult: first, the constraints on y are non-convex, and second, there are integrality constraints on d_{r,c}. We thus propose a heuristic algorithm that yields a near-optimal solution with low running time. Moreover, our algorithm is solved online: we iteratively select the CSPs from which each chunk's shares should be downloaded, allowing us to begin downloading chunk shares before finding the full solution. This allows us to outperform a heuristic that downloads shares from the CSPs with the highest available bandwidth. In that case, all chunk shares would be downloaded sequentially from the same t CSPs, so some shares would wait a long time before being downloaded. Our algorithm downloads shares in parallel from slower CSPs.

7 Each client maintains local bandwidth statistics to all CSPs for different network interfaces.


1  for η = 1 to R do
2      Solve the convexified, relaxed (5–7) with fixed CSP selections d_{r,c} for r < η;
3      Fix the bandwidths \beta_c;
4      Constrain d_{η,c} ∈ {0, 1} for all c;
5      Solve for the d_{r,c} with fixed bandwidths;
6      Fix the CSP selections d_{η,c};
7  end
Algorithm 1: CSP and bandwidth selection.
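As a simplified stand-in for Algorithm 1 (it skips the convexification and branch-and-bound steps), the greedy sketch below captures the online flavor: each chunk is assigned, in turn, to the t eligible CSPs that currently add the least to the largest per-CSP download time in (2). All names and numbers are illustrative.

# Greedy sketch (a stand-in for Algorithm 1, not the paper's exact procedure):
# assign each chunk's t downloads to the CSPs whose queues finish earliest.
def select_downloads(chunks, stored_at, bandwidth, t):
    # chunks: {chunk_id: share_size_bytes}
    # stored_at: {chunk_id: set of CSPs holding a share}
    # bandwidth: {csp: bytes/sec available to this client}
    load = {c: 0.0 for c in bandwidth}           # seconds of queued downloads per CSP
    plan = {}
    for cid, size in chunks.items():
        eligible = sorted(stored_at[cid],
                          key=lambda c: load[c] + size / bandwidth[c])
        chosen = eligible[:t]                    # t cheapest CSPs for this chunk
        for c in chosen:
            load[c] += size / bandwidth[c]
        plan[cid] = chosen
    return plan, max(load.values())              # selection and estimated completion time

plan, est = select_downloads(
    chunks={"c1": 4e6, "c2": 4e6},
    stored_at={"c1": {"dropbox", "box", "gdrive"}, "c2": {"box", "gdrive", "onedrive"}},
    bandwidth={"dropbox": 2e6, "box": 1.5e6, "gdrive": 4e6, "onedrive": 2e6},
    t=2)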

Figure 4: Decoupling metadata and file control.

Our solution proceeds in two stages. First, we compute a convex approximation to (5–7) that does not consider the integer constraints on d_{r,c}. To do so, we first convexify (6) by defining D_{r,c} \equiv d_{r,c}^{1/2}. The resulting constraints \sum_r b_r D_{r,c}^2 / \beta_c \le y are then convex, with the non-convexity moved to the new constraints D_{r,c} = d_{r,c}^{1/2}. These are approximated with linear over-estimators \hat{D}_{r,c} \ge d_{r,c}^{1/2} that minimize the discrepancy between \hat{D}_{r,c}^2 and d_{r,c}.8 We find that the closest linear estimator is \hat{D}_{r,c} = 3^{1/4} d_{r,c}/2 + 3^{-1/4}/2. On adding these constraints to (5–7) and replacing (6) with \sum_r b_r \hat{D}_{r,c}^2 / \beta_c \le y, we can solve the convexified version of (5–7) for the optimal y, \beta_c, d_{r,c}, and \hat{D}_{r,c}.

We then fix the optimal bandwidths \beta_c, turning (5–7) into a linear integer optimization problem that can be solved with the standard branch-and-bound algorithm. Branch-and-bound scales exponentially with the number of integer variables, so we impose integer constraints on one chunk's variables at a time; thus, only C variables are integral (d_{r,c} for one chunk r). We constrain chunk 1's d_{1,c} variables to be integral and re-solve (5–7), then re-solve the convex approximation, fix the resulting bandwidths, constrain d_{2,c} to be integral, etc. Algorithm 1 formalizes this procedure.

8 By requiring that these approximations be over-estimators, we ensure that if the constraints max_c \sum_r b_r \hat{D}_{r,c}^2 / \beta_c \le y hold, then (6) holds as well.

5. CYRUS Operations

We now present CYRUS's operations realizing Section 3's architecture. To do so, we first note that CYRUS's performance depends on not only its CSP selection algorithm (Section 4) but also storage of file metadata (i.e., records of a file's component chunks and share locations).


    ! "#" ! "#% ! "#& '

    ! %#" ! %#% ! %#& '

    ! " ! % ! & '

    ( "

    ( %

    (

    '

    ! #

    ) "

    ) %

    ) &

    * $%&'()*

    + + % , *

    * -

    & - ( - ) . *

    + -

    & - ( - ) . *

    ) " ,- )./0( "

    ) %,- )./0( 1

    ) &,- )./0( 2

    '

    3 $

    & % '

    / * . % + 0 1 - *

    23*4-+*0& 50.+36 20.0 7%/-/890+-*

    7&%'/ 0&&%$0:%)

    Figure 5: A non-systematic Reed-Solomon erasure code.

Since the metadata is both much smaller than the actual shares and accessed more often, we separate file and metadata control (Figure 4). We describe our file and metadata storage in Sections 5.1 and 5.2 respectively and then show how these are used to upload, download, and sync files at multiple clients.

    5.1 Storing Files as Shares

CYRUS divides files into chunks and then divides the chunks into shares, which are stored at CSPs.

Constructing file chunks: CYRUS divides a user's file into chunks based on content-dependent chunking, and stores only unique chunks to reduce the amount of data stored at CSPs. When a file is modified, content-dependent chunking only requires chunks to be modified if their contents are changed, unlike fixed-size chunking, which changes all chunks. We determine chunk boundaries using Rabin's fingerprinting [33], which computes the hash value of the content in a sliding window w_i, where i is the offset in the file ranging over 0 ≤ i < file size − window size. When the hash value modulo a pre-defined integer M equals a pre-defined value K with 0 ≤ K < M, we set the chunk boundary and move on to find the next chunk boundary.
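A minimal sketch of content-dependent chunking: a boundary is set whenever the hash of a sliding window satisfies hash mod M = K. An Adler-32 checksum recomputed at every offset stands in for Rabin's fingerprints (a real implementation updates a rolling hash incrementally), and the window size, M, and K below are illustrative.

# Sketch of content-dependent chunking; Adler-32 stands in for Rabin fingerprints.
import zlib

def chunk_boundaries(data: bytes, window: int = 48, M: int = 1 << 13, K: int = 0):
    # Returns (offset, end) pairs for each chunk; average chunk size is roughly M bytes.
    boundaries, start = [], 0
    for i in range(window, len(data)):
        h = zlib.adler32(data[i - window:i])   # hash of the current sliding window
        if h % M == K:
            boundaries.append((start, i))      # boundary found: close the current chunk
            start = i
    boundaries.append((start, len(data)))      # trailing chunk
    return boundaries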

Dividing chunks into shares: After CYRUS constructs chunks from a file, it uses (t, n) secret sharing [36] to divide each chunk into n shares. We implement this secret sharing as a non-systematic Reed-Solomon (R-S) erasure code [28]. As shown in Figure 5, with R-S coding the coded shares (c0, c1, ...) do not contain the original elements (d0, d1, ...), so no data can be recovered from the file chunk with an insufficient number of shares. Indeed, R-S coding goes further than secret sharing: it can recover a chunk's data even if there are errors in the t shares used to reconstruct the chunk, with n/t times storage overhead.9 To improve security, we take the R-S code's dispersal matrix as the Vandermonde matrix generated by a t-dimensional vector computed from a consistent hash of the user's key string. Since R-S decoding requires the dispersal matrix (Figure 5), reconstructing a chunk from a given set of shares (c0, c1, ...) requires the key string.

Naming: We name each share as H(index, H'(chunk.content)), where H is SHA-1 and H' may be any hash function.

    9 R-S codes need n times the storage space of the original data.


Figure 6: Metadata data structures.

This naming scheme ensures that no CSP can discover the index of a given share, but that it can be recovered by the client. Moreover, each share is guaranteed to have a unique file name, since a share's content is uniquely determined by the chunk contents, the share creation index, and t. Thus, we only overwrite an existing file share if its content is the same, reducing the risk of data corruption.
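A sketch of this naming scheme, using SHA-1 for both hash functions for concreteness; the separator between the index and the inner digest is an assumption.

# Sketch of share naming: H(index, H'(chunk.content)), here with SHA-1 for both hashes.
import hashlib

def share_name(index: int, chunk_content: bytes) -> str:
    inner = hashlib.sha1(chunk_content).hexdigest()                   # H'(chunk.content)
    return hashlib.sha1(f"{index}:{inner}".encode()).hexdigest()      # H(index, ...)

# Identical content and index always map to the same name, so re-uploading an
# unchanged share simply overwrites an identical object.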

    5.2 Metadata Data Structures

CYRUS maintains metadata for each file, which stores its modification history and share composition. The metadata are stored as a logical tree with a dummy root node (Figure 6); subsequent levels denote sequential file versions. New files are represented by new nodes at the first level. Each node consists of three tables:

FileMap: This table stores the file Id, or SHA-1 hash of its content, and prevId, its parent node's ID (prevId = 0 for new files). We also store the clientId, indicating the client creating this version of the file, as well as the file name, whether it has been deleted, last modified time, and size.

ChunkMap: This table provides the information to reconstruct the file from its chunks. It includes the Id, or SHA-1 hash, of each chunk in the file; offset, or positions of the chunks in the file; the chunk sizes; and the t and n values used to divide the chunks into shares.

ShareMap: This table stores the shares' CSP locations. The chunkId is the chunk content's SHA-1 hash value, idx is the share index, and cId gives the CSP where it is stored.

In addition to file-specific metadata, CYRUS maintains a global chunk table listing the chunks whose shares are stored at each CSP. We store the metadata files using (t, m) secret sharing at a fixed set of m CSPs. Clients maintain local copies of the metadata tree for efficiency and periodically sync with the metadata stored at the CSPs.
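The per-file metadata node can be pictured as the following Python sketch; the field names follow the tables above, while the types and defaults are illustrative.

# Sketch of one metadata node (FileMap, ChunkMap, ShareMap); types are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ShareMapEntry:
    chunk_id: str   # SHA-1 of the chunk content
    cid: int        # CSP where this share is stored
    idx: int        # share index

@dataclass
class ChunkMapEntry:
    chunk_id: str
    offset: int     # position of the chunk in the file
    length: int
    t: int          # shares needed to reconstruct
    n: int          # shares stored

@dataclass
class FileMetadata:
    id: str                 # SHA-1 of the file content
    prev_id: str            # parent node's ID; "0" for new files
    name: str
    client_id: int
    is_deleted: bool
    mtime: int
    length: int
    chunk_map: List[ChunkMapEntry] = field(default_factory=list)
    share_map: List[ShareMapEntry] = field(default_factory=list)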

    5.3 Uploading and Downloading Files

Uploading files: Algorithm 2, illustrated in Figure 7, shows CYRUS's procedure for uploading files. The client first updates its metadata tree to ensure that it uses the correct parent node when constructing the file's metadata (steps 1 and 2 in Figure 7).

 1  Function Upload(file)
 2      head = getHead(file)
 3      newHead = SHA1(file)
 4      UpdateHead(file, head, newHead)
 5      chunks = Chunking(file)
 6      for chunk in chunks do
 7          UpdateChunkMap(newHead, chunk)
 8          Scatter(chunk)
 9      end
        // Wait until uploading all chunks
10      UploadMeta(getMeta(file))
11  Function Scatter(chunk)
12      clouds = ConsistentHash(chunk.id)
13      if chunk is not stored then
14          shares = RSEncode(chunk, t, n)
15          for share in shares do
16              conn = clouds.next()
17              conn.Requester(PUT, share)   // Async event requester
18          end
19      end
Algorithm 2: Uploading files.

Figure 7: Uploading a file (lines refer to Algorithm 2). Steps: 1) find the current file version (line 1); 2) check if modified (lines 2–4); 3) chunk the file (line 5); 4) find new chunks (line 12); 5) create and upload shares of new chunks (lines 6–9, 13–19).

We then divide the file into chunks (step 3) and avoid uploading redundant chunks by checking whether shares of each chunk are already stored in the cloud (step 4). New chunks are divided into shares, as in Section 5.1, and the shares scattered to CSPs (step 5). CYRUS uses consistent hashing [20] to select the n CSPs at which to store shares of each chunk, allowing us to balance the amount of data stored at different CSPs and minimize the necessary share reallocation when CSPs are added or deleted (Section 5.5). The CSPs are selected by mapping the hash value of the chunk content to a position on the consistent hash ring, which is partitioned among CSPs. We then select the first n CSPs encountered while traversing the ring and send upload requests to the corresponding cloud connectors. The returns of the requests are handled asynchronously, as explained below. We upload the file metadata to CSPs (line 10 in Algorithm 2) only after receiving returns from all upload requests, so that no other client will attempt to download the file before all shares have been uploaded.
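A minimal sketch of the uplink selection by consistent hashing: the chunk id and the CSPs are mapped onto one hash ring, and the first n CSPs encountered clockwise from the chunk's position are chosen. The CSP names are illustrative, and real deployments typically add virtual nodes per CSP to balance the ring.

# Sketch of uplink CSP selection via consistent hashing on a SHA-1 ring.
import hashlib
from bisect import bisect_right

def ring_position(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

def select_upload_csps(chunk_id: str, csps, n: int):
    ring = sorted((ring_position(c), c) for c in csps)         # CSPs placed on the ring
    positions = [pos for pos, _ in ring]
    start = bisect_right(positions, ring_position(chunk_id))   # chunk's position on the ring
    return [ring[(start + k) % len(ring)][1] for k in range(n)]

print(select_upload_csps("7e3a", ["dropbox", "gdrive", "box", "onedrive"], n=3))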

Downloading files: CYRUS follows Algorithm 3 to download a file. First, the client checks for the latest file version by sync-ing its metadata tree with the cloud (line 2). CYRUS then identifies the file's chunks and gathers their shares, selecting the download CSPs using Algorithm 1 (lines 3–5) and handling returns from the download requests as asynchronous events (lines 12–15).


 1  Function Download(file)
 2      meta = downloadMeta(file)
 3      for chunk in meta.chunkMap do
 4          Gather(chunk, meta.shareMap[chunkId])
 5      end
        // Wait until downloading all chunks
 6      if checkConflict(meta) then
 7          meta.conflict = True
 8      end
 9      updateSnapshot(file, snapshot)
10  Function Gather(chunk, shareMap)
11      clouds = OptSelect(chunk, shareMap)
12      for share in chunk.shares do
13          conn = clouds.next()
14          conn.Requester(GET, share)   // Async event requester
15      end
Algorithm 3: Downloading files.

Finally, CYRUS checks for file conflicts and prompts users to resolve them (lines 6–8); we elaborate on this step in Section 5.4.

Asynchronous event handling: Due to CSPs' different link speeds and shares' different sizes, CYRUS requires asynchronous event handling for uploads and downloads. Upon receiving response messages from CSPs (in most cases an HTTP response message with a status code), CYRUS's cloud connectors send the events to a registered event receiver at the CYRUS core. The receiver can receive four types of share transmission events: GET META, PUT META, GET, and PUT; and maintains three boolean variables: ShareComplete, ChunkComplete, and FileComplete. ShareComplete is set to 1 if the share is successfully uploaded or downloaded, and ChunkComplete is set to 1 if n shares are successfully uploaded or t shares successfully downloaded. FileComplete is set to 1 if all chunks of a file are successfully uploaded or downloaded.
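The completion bookkeeping described above can be sketched as a small tracker; the class and method names are illustrative, not CYRUS's actual internals.

# Sketch of the ShareComplete / ChunkComplete / FileComplete bookkeeping.
class TransferTracker:
    def __init__(self, chunks, threshold):
        # threshold = n for uploads, t for downloads, per chunk
        self.remaining = {cid: threshold for cid in chunks}

    def on_share_event(self, chunk_id, ok: bool) -> None:
        if ok and self.remaining[chunk_id] > 0:    # a ShareComplete event
            self.remaining[chunk_id] -= 1

    def chunk_complete(self, chunk_id) -> bool:    # ChunkComplete
        return self.remaining[chunk_id] == 0

    def file_complete(self) -> bool:               # FileComplete
        return all(v == 0 for v in self.remaining.values())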

    5.4 Synchronizing Files

Synchronization service: Clients sync files by detecting changes at their local storage and CSPs. Changes at the local storage can be detected by regularly checking last-modified times and file hash values. Changes at CSPs can be seen by looking up the list of metadata files stored in the cloud, since a new metadata file is created with each file upload (Algorithm 2). As explained in Section 5.2, the metadata are stored at a fixed subset of CSPs, making this lookup efficient.

Despite regular metadata sync-ing, nonzero network delays make it possible for two CYRUS clients to attempt to modify the same file at the same time. Clients cannot prevent such conflicts by locking files while modifying them, since some CSPs do not support locking.10 We instead assign unique names to uploaded shares as described in Section 5.1 and let clients upload conflicting file updates.

10 For instance, if two clients try to modify the same file at the same time, Dropbox allows both to create a locking file but changes one locking file's name. Thus, only one client can delete its locking file, allowing us to create a lock. On Google Drive, however, we cannot create a lock: both clients' locking files retain their original names and can be created and deleted.

Figure 8: Two types of file conflicts.

Clients identify and resolve the resulting conflicts when downloading files (line 7 in Algorithm 3).

Distributed conflict detection: We identify two types of file conflicts, as shown in Figure 8. First, two clients may create files with the same filename but different contents, resulting in different metadata file names, e.g., metadata 7dba2abd and 8f456eda. Second, one client can modify a previous version of a file due to delays in sync-ing metadata. We then find two child nodes from one parent, e.g., ab3a363c and 3d84621d in the figure. Each node represents an independent modification of the parent node's file.

When new metadata is downloaded from the cloud, we check for conflicts by first checking if it has a parent node. If not, we check for the first type of conflict by searching for other nodes with the same filename. The second type of conflict arises if the new node has a parent. We traverse the tree upwards from this node, and detect a conflict if we find a node with multiple children.
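Both checks can be sketched as one pass over the locally synced metadata nodes; the field names follow the metadata sketch in Section 5.2, and the node representation is an assumption.

# Sketch of the two conflict checks on the metadata tree.
def detect_conflicts(nodes):
    # nodes: {node_id: metadata} with fields id, prev_id ("0" for new files), and name.
    conflicts = []

    by_name = {}
    for m in nodes.values():
        if m.prev_id == "0":                       # type 1: new files sharing a filename
            by_name.setdefault(m.name, []).append(m.id)
    conflicts += [ids for ids in by_name.values() if len(ids) > 1]

    children = {}
    for m in nodes.values():
        if m.prev_id != "0":                       # type 2: one parent with several children
            children.setdefault(m.prev_id, []).append(m.id)
    conflicts += [ids for ids in children.values() if len(ids) > 1]
    return conflicts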

File deletion and versioning: Clients can recover previous versions of files by traversing the metadata tree up from the current file version to the desired previous version. CYRUS also allows clients to recover deleted files by locating their metadata. When clients delete a file, CYRUS marks its metadata as "deleted," but does not actually delete the metadata file.11 Shares of the file's component chunks are left alone, since other files may contain these chunks.

    5.5 Adapting to CSP Changes

Over time, a user's set of viable CSPs may change: users may add or remove CSPs, and CSPs can occasionally fail. Thus, CYRUS must be able to adapt to CSP addition, removal, and failure. We consider these three cases individually.

Adding CSPs: A user may add a CSP to CYRUS by updating the list of available CSPs at the cloud. Once this list is updated, subsequently uploaded chunks can be stored at the new CSP. Moreover, the new CSP does not affect the reliability or privacy experienced by previously uploaded chunks. Since uploading shares to the new cloud can use a significant amount of data, we do not change the locations of already-stored shares. Shares of the file metadata can be stored at the new CSP, with appropriate indication in the list of available CSPs, if the user wishes to increase the reliability of the metadata storage.

11 Since file metadata is very small, metadata from deleted files does not take up much CSP capacity.


Figure 9: Share migration when removing a CSP.

Figure 10: Client-based CYRUS implementation.

Removing CSPs and failure recovery: CYRUS detects a CSP failure if it fails to upload shares to that CSP; once this occurs, CYRUS periodically checks if the failed CSP is back up. Until that time, no shares are uploaded to that CSP. CSP removal can be detected by repeated upload failures or a manual signal from users. A failed or removed CSP is marked as such in the list of available CSPs.

Unlike adding a CSP, removing a CSP reduces the reliability of previously uploaded chunks and shares. Thus, to maintain reliability we must reconstruct the removed shares and upload them to other CSPs. We similarly migrate the file metadata that was stored on the deleted CSP. However, while the metadata is relatively small and can be migrated without excessive overhead, uploading all of the file chunk shares requires uploading a large amount of data. Indeed, from a user's perspective, a cloud may fail temporarily (e.g., for a few hours) but then come back up; it is therefore impractical to move all shares at once. Instead, we note that reliability matters most for frequently accessed shares and use a "lazy addition" scheme. Whenever a client downloads a file, we check the locations of its chunks' shares. Should one of these locations have been removed or deleted, we create a new share and upload it to a new CSP. Figure 9 shows the procedure of adding these shares: after a CSP is removed, a client downloads a file. If a share of one of the file's chunks was stored at the removed CSP, we reconstruct the share from the chunk and upload it to a new CSP.
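The "lazy addition" step can be sketched as a repair hook run after a successful download; encode_share and upload stand in for CYRUS's R-S encoder and uploader, and all names here are illustrative.

# Sketch of lazy share migration after a file download.
def repair_on_download(chunk_bytes, locations, removed, candidates, encode_share, upload):
    # locations: CSPs recorded for this chunk's shares; removed: failed/removed CSPs;
    # candidates: CSPs with free capacity; encode_share/upload: stand-ins for CYRUS internals.
    for csp in [c for c in locations if c in removed]:
        target = next((c for c in candidates if c not in locations), None)
        if target is None:
            break                                    # no spare CSP to migrate to
        upload(target, encode_share(chunk_bytes))    # regenerate a share and push it
        locations[locations.index(csp)] = target     # update the share map entry
    return locations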

6. CYRUS Implementation

We have prototyped CYRUS on three representative platforms (Mac OS X, Windows, and Linux) using Python and C, with the software architecture shown in Figure 10. The prototype has seven main components: 1) a graphical user interface, 2) C modules implementing content-based chunking and (t, n) secret sharing, 3) threads uploading and downloading contents to and from CSPs, 4) a metadata manager that constructs and maintains metadata trees, 5) an event handler for upload and download requests, 6) a cloud selector selecting CSPs for uploading and downloading shares, and 7) cloud connectors for popular commercial CSPs. Our CYRUS prototype consists of 3500 lines of Python code.

Our implementation integrates the CYRUS APIs (Table 3) and provides a graphical interface for people to use CYRUS. To increase efficiency, we use C modules to implement file chunking (Rabin's fingerprinting) and dividing chunks into shares (Reed-Solomon coding). We implement the cloud selection algorithm in Python.

To ensure transparency to different CSPs, we create a standard interface to map CYRUS's file operations to vendor-specific cloud APIs. This task involves creating a specific REST URL with proper parameters and content. We utilize existing CSP authentication mechanisms for access to each cloud, though such procedures are not mandatory. We have implemented connectors for Google Drive, Dropbox, SkyDrive (now called OneDrive), and Box, with connectors to more CSPs planned in the future. These providers, as shown in Table 2, use standard web authentication mechanisms. Open source cloud connectors like Jclouds [4] can be used in place of CYRUS's own connectors.

Figure 11 shows screenshots of our user interface. Figure 11a shows a list of CSP accounts connected to CYRUS, while Figure 11b lists the files and folders uploaded to CYRUS's directory. Figure 11c shows the history of a file within CYRUS; users can view and restore previous versions of each file. A demo video of the prototype is available [1].

7. Performance Evaluation

CYRUS's architecture and system design ensure that it satisfies client requirements for privacy, reliability, and latency. This section considers CYRUS's performance in a testbed environment before presenting real-world results from a comparison with similar systems and a trial deployment.

    7.1 Erasure Coding

R-S decoding requires an attacker to possess at least t shares of a given file chunk as well as the dispersal matrix used to encode the chunk. Since we generate the dispersal matrix from hash values of the user's key string, recovering the dispersal matrix requires the user's key string, providing an initial layer of privacy. Even if the dispersal matrix is known, the attacker must still gain access to t of n shares of each file chunk. Since we store the shares on physically separate CSPs (Section 4.1), such a coordinated breach of users' CSP accounts is unlikely.


Figure 11: Screenshots of the CYRUS user interface. (a) Connected CSP accounts. (b) Files stored on CYRUS. (c) File history in CYRUS.

Figure 12: Empirical overhead of 100 MB chunk encoding and decoding while changing t and n. (a) Changing t with n = 11. (b) Changing n with t = 2.

If the attacker knows parts of the dispersal matrix or recovers t' < t shares, it would be computationally expensive to guess the remaining information. The dispersal matrix is generated by a vector of length t, each of whose elements belongs to a finite field of size 2^m, where m is an integer encoding parameter chosen by the user. Similarly, the non-systematic R-S code can use t' shares to recover the original file up to t − t' linear equations, or equivalently t − t' unknowns. Since each unknown has 2^m possible values, an exhaustive search would be computationally expensive.

We implement the share encoding and decoding with the well-known zfec library [41]. Figure 12 shows the empirical overhead of the share encoding and decoding. As we would expect, a larger t leads to longer decoding times (i.e., lower throughput), with a minimum throughput of around 100 MB/s for t = 10. The encoding throughput depends more on n than on t, reaching a minimum of around 100 MB/s when n = 11. In our experiments, we take (t, n) between (2, 3) and (3, 5), so the encoding and decoding throughputs are at least 200 and 300 MB/s respectively. The completion time bottleneck in our experiments below is therefore data transfer rather than encoding and decoding overhead.

    7.2 Privacy and Reliability

We first show that CYRUS improves CSP reliability in Figure 13, which shows the simulated number of cloud failures with single CSPs and CYRUS with different configurations. We base the simulations on real-world monitoring data from four commercial CSPs whose downtime varies from 1.37 to 18.53 hours per year [12]. At the end of a simulation with 10^7 trials, even the most reliable CSP returned approximately 1,500 failed requests, while CYRUS showed only 44 failures with (t, n) = (3, 4) and no failures with (t, n) = (2, 4).

Extension | # of files | Total bytes | Avg. size (bytes)
pdf | 70 | 60,575,608 | 865,366
pptx | 11 | 12,263,894 | 1,114,899
docx | 15 | 9,844,628 | 656,309
jpg | 55 | 151,918,946 | 2,762,163
mov | 7 | 351,603,110 | 50,229,016
apk | 10 | 4,872,703 | 487,270
ipa | 4 | 47,354,590 | 11,838,648
Total | 172 | 638,433,479 | 3,711,823

Table 4: Testbed evaluation dataset.

Figure 13: Simulated number of cumulative CSP failures.

However, as we show in testbed experiments below, taking (t, n) = (3, 4) improves upload and download completion times compared to other configurations.

Testbed setup: We construct a testbed with a MacBook Pro client and seven private cloud servers as our CSPs, connected with 1 Gbps Ethernet links. We emulate CSP cloud performance by using tc and netem to change network conditions for the different servers. We set maximum throughputs of 15 MB/s for four cloud servers (the "fast" clouds) and 2 MB/s for the remaining three clouds (the "slow" clouds).

We evaluate CYRUS's performance on a dataset of several different file types. Table 4 summarizes the number and size of the dataset files by their extensions. The total dataset size is 638.43 MB, with an average file size of 3.71 MB. We use content-based chunking to divide the files into chunks with an average chunk size of 4 MB, following Dropbox [14].

Performance results: We first consider the effect of changing reliability and privacy on file download completion times. We test three configurations: (t, n) = (2, 3), (2, 4), and (3, 4). Compared to the (2, 3) configuration, (t, n) = (2, 4) is more reliable, while (t, n) = (3, 4) gives more privacy.


Figure 14: Testbed download performance of random, heuristic, and CYRUS cloud selection. (a) Mean completion time. (b) Throughput CDF, (t, n) = (2, 3).

We compare the performance of three download CSP selection algorithms: CYRUS's algorithm from Section 4, random, and heuristic.12 The random algorithm chooses CSPs randomly with uniform probability, and the heuristic algorithm is a round-robin scheme.

The download completion times are shown in Figure 14a. For all configurations, CYRUS's algorithm has the shortest download times. The random algorithm has the longest, likely due to the high probability of downloading from a slow cloud. Figure 14b shows the distribution of throughputs achieved by all files; we see that the throughput with CYRUS's algorithm is decidedly to the right of (i.e., larger than) the random and heuristic algorithms' throughputs.

The completion time of CYRUS's algorithm when (t, n) = (3, 4) is especially short compared to the other (t, n) values (Figure 14a). Since the share size of a given chunk equals the chunk size divided by t, higher values of t yield smaller share sizes, lowering completion times for (parallel) share downloads. However, the random and heuristic algorithms' completion times do not vary much with the configurations. With (t, n) = (3, 4), CYRUS must download shares from three clouds instead of two with the other configurations, increasing the probability that these algorithms will use slow clouds and canceling the effect of smaller shares.

Figure 15 shows the cumulative upload and download times for all files with CYRUS's algorithm. As expected, the more private (3, 4) configuration has consistently shorter completion times, especially for uploads. The more reliable (2, 4) and (2, 3) configurations yield more similar completion times; their shares are the same size. The (2, 4) configuration has slightly longer upload times since the shares must be uploaded to all 4 clouds, including the slowest ones.
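To make the share-size effect concrete, the per-share sizes and per-chunk upload volumes implied by this relation for a 4 MB chunk are:

    \text{share size} = \frac{\text{chunk size}}{t}, \qquad
    \text{uploaded per chunk} = n \times \frac{\text{chunk size}}{t};

    (2,3)\colon 2\,\text{MB shares},\ 6\,\text{MB uploaded}; \quad
    (2,4)\colon 2\,\text{MB},\ 8\,\text{MB}; \quad
    (3,4)\colon \approx 1.33\,\text{MB},\ \approx 5.33\,\text{MB}.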

7.3 Real-World Benchmarking

We benchmark CYRUS's upload and download performance on Dropbox, Google Drive, SkyDrive, and Box. We compare CYRUS to DepSky [7], a "cloud-of-clouds" system that also uses R-S coding to store files at multiple CSPs. DepSky's upload and download protocols, however, differ from CYRUS's.

^12 We did not compare the true-optimal performance, since finding the true optimum would require searching over ∼C(t, n)^300 possible download configurations, which is prohibitively expensive.

Figure 15: Testbed completion times of different privacy and reliability configurations: cumulative completion time [sec] vs. file id for (2, 3), (2, 4), and (3, 4). (a) Upload. (b) Download.

Figure 16: Upload and download completion times [sec] of different storage schemes (CYRUS, DepSky, Full Replication, Full Striping).

They require two round-trip communications with CSPs to set lock files, preventing simultaneous updates, and a random backoff time after setting the lock. Moreover, DepSky uses a greedy algorithm that always downloads shares from the fastest CSPs, in contrast to CYRUS's optimized CSP selection. To compare the two systems, we implement DepSky within CYRUS.

Figure 16 shows the upload and download completion times for a 40 MB file with CYRUS, DepSky, and two other baseline storage schemes. Full Replication stores a 40 MB replica and Full Striping a 10 MB fragment at each of the four CSPs. Both CYRUS and DepSky use (t, n) = (2, 3), so each share is 20 MB.^13 Since full striping uploads the least amount of data to each CSP, it has the shortest upload completion times. However, full striping is not reliable, as any CSP failure would prevent the user from retrieving the file. CYRUS has the second-best upload performance. DepSky's upload time is more than twice as long as CYRUS's and longer than Full Replication's, though Full Replication uploads twice as much data as DepSky to each CSP.
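For reference, the data volumes behind these upload times, given the 40 MB test file and four CSPs, are:

    \text{Full Replication: } 4 \times 40\,\text{MB} = 160\,\text{MB}; \quad
    \text{Full Striping: } 4 \times 10\,\text{MB} = 40\,\text{MB}; \quad
    (2,3)\text{ sharing: } 3 \times \tfrac{40}{2}\,\text{MB} = 60\,\text{MB}.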

CYRUS's optimized downlink CSP selection algorithm allows it to achieve the shortest download completion time. DepSky shows longer completion times than either CYRUS or Full Striping, which must download from all four CSPs, including the slowest. Full Replication has the longest download time, since we averaged its performance over all four CSPs. Its download time would have been shorter (24.118 seconds) with the optimal CSP, but longer (519.012 seconds) with the slowest.

^13 For ease of comparison to full striping and full replication, we do not chunk the file with CYRUS or DepSky. We can thus verify that CYRUS's download CSP selection is perfectly optimal, as only C(t, n) shares need to be downloaded from the CSPs.


Figure 17: Completion times [sec] with CYRUS and DepSky at each of CSP1–CSP4. (a) Upload. (b) Download.

Figure 18: Share distribution (number of shares per CSP) with CYRUS and DepSky. (a) Upload. (b) Download.

We next compare CYRUS's and DepSky's achieved upload and download completion times for a 1 MB file, with upload and download measurements taken every hour for two days. Figure 17 shows box plots of the resulting completion times. CYRUS achieves significantly shorter completion times to all CSPs; DepSky's upload times are particularly large, at nearly twice those of CYRUS.

Figure 18 shows the number of shares uploaded to each CSP. DepSky stores more shares at consistently faster CSPs: it starts uploads to all CSPs and cancels pending requests when n uploads complete, while CYRUS distributes shares evenly. CYRUS thus ensures that no one CSP will run out of capacity before the others, while DepSky's approach can cause one CSP to quickly use all of its capacity, hindering the distribution of shares to multiple clouds. Similarly, CYRUS spreads share downloads more evenly across CSPs.
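One simple way to obtain the even spread observed for CYRUS is to rotate which CSPs are skipped from one chunk to the next; the sketch below illustrates this idea with placeholder CSP names and is not necessarily CYRUS's actual placement policy:

    import itertools

    CSPS = ["csp1", "csp2", "csp3", "csp4"]    # placeholder identifiers

    _skip = itertools.cycle(range(len(CSPS)))

    def even_placement(n):
        """Choose n of the CSPs for a chunk's shares, rotating which ones are skipped."""
        skipped = {next(_skip) for _ in range(len(CSPS) - n)}
        return [c for i, c in enumerate(CSPS) if i not in skipped]

    # Example: with n = 3 shares per chunk, successive chunks skip csp1, csp2, csp3, ...,
    # so each CSP ends up storing roughly the same number of shares.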

    7.4 Deployment Trial Results

We recruited 20 academic (faculty, research staff, and student) users from the United States and Korea to participate in a small-scale trial. We deployed trial versions of CYRUS for OS X and Windows that send log data to our development server. Over the course of the trial in summer 2014, we collected approximately 35k lines of logs. Since clients installed CYRUS on their laptops and desktops, we observed little client mobility within the U.S. or Korea; we thus report average results for each country and do not separate the trial results for different client locations within the countries.

We compare upload and download times for trial participants in the U.S. and Korea in Figure 19, which shows CYRUS's completion times to upload a 20 MB test file when connected to Dropbox, Google Drive, SkyDrive, and Box. We use (t, n) = (2, 3) and (2, 4), so that uploading a file to individual CSPs is neither as reliable nor as private as CYRUS.

Figure 19: Upload and download completion times [sec] during the trial for CSP1–CSP4, CYRUS (2,3), and CYRUS (2,4). (a) U.S. (b) Korea.


In the U.S. (Figure 19a), CYRUS encounters a bottleneck of limited total uplink throughput from the client, slowing its connections to each individual CSP and lengthening upload completion time. When (t, n) = (2, 3), CYRUS is still faster than all but one CSP. However, when (t, n) = (2, 4), CYRUS uploads twice as much data in total as just uploading the file to one CSP; (2, 4)'s upload time is therefore longer than that of all single CSPs. In Korea (Figure 19b), connections to individual CSPs are much slower than in the U.S., so CYRUS does not encounter this client throughput bottleneck with either configuration. Since we upload less data to each CSP, both CYRUS configurations give shorter upload times than all individual CSPs.
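The bottleneck follows directly from the total volume uploaded for the 20 MB test file:

    \text{total upload} = \frac{n}{t}\times \text{file size}:\quad
    (2,3)\colon \tfrac{3}{2}\times 20\,\text{MB} = 30\,\text{MB}; \quad
    (2,4)\colon \tfrac{4}{2}\times 20\,\text{MB} = 40\,\text{MB},

    \text{versus } 20\,\text{MB for a single-CSP upload.}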

CYRUS's download times for both configurations are shorter than those from most individual CSPs. In both the U.S. and Korea, CYRUS has slightly longer download times than the fastest CSP, as it downloads shares from two CSPs; however, its download times are shorter than those from all other CSPs. Comparing the (t, n) = (2, 3) and (2, 4) configurations, we see that the (t, n) values affect download times more in Korea, due to the slower CSP connections there. Using (t, n) = (2, 4) instead of (2, 3) requires uploading an additional share, increasing the upload time in the U.S. by 7.78 seconds with little decrease in the download time. However, in Korea, using (t, n) = (2, 4) does not appreciably increase the upload time and saves 33.805 seconds of download time. For any (t, n) values, CYRUS shortens the average upload time in Korea, where client bandwidth is not a bottleneck, and the average download time in both countries.

    7.5 User Experience

After our trial, we informally surveyed the participants to understand their experience in using CYRUS and their thoughts on whether such a storage service would be useful in practice. While the survey represents a limited subset of academic users who are familiar with computer usage, their responses demonstrate CYRUS's potential to be practical and useful for the general population.

Most of our users had accounts at multiple CSPs but only used one or two regularly, likely due to the overhead of tracking which files were stored at which CSP. CYRUS thus allowed them to better utilize space on all of their cloud accounts without additional overhead.


In fact, two users in Korea thought CYRUS was faster than uploading files to individual CSPs, which is consistent with the upload results in Figure 19b. The remaining users in the U.S. and Korea did not notice any difference in upload or download speed with CYRUS. Users mainly used CYRUS to store documentation and image files, with almost all files < 10 MB in size.

All but one user found CYRUS's interface easy to use. Users liked having a separate folder for CYRUS's files (Figure 11b), with one U.S. user saying that "CYRUS is [as] good as any other application when we consider file dialogs." They also appreciated the ability to view files' history without accessing a web app (Figure 11c), with one user reporting that "the fact that it[']s visible on the App [made] it easier to use and to access." Users reported that registering their CSP accounts with CYRUS was their main source of inconvenience; our prototype seeks to minimize this overhead by locally caching authentication keys so that users need only log in to their CSPs once.

All users expressed interest in using CYRUS in the future, but made some suggestions for improvement. One user, for instance, suggested adding a feature to import files already stored at CSPs. The most common suggestion, made by nearly half of the users, was to implement CYRUS on mobile devices so that files could be shared between their laptops and smartphones. We plan to add mobile support in our future work.

8. Promoting Market Competition

While commoditization often decreases market competition by favoring large economies of scale, CYRUS may instead promote competition, as its privacy and reliability guarantees depend on users having accounts at multiple CSPs.

Without CYRUS, users experience a phenomenon known as "vendor lock-in": after buying storage from one CSP, a user might not use storage from other CSPs due to the overhead of storing files at and retrieving them from multiple places. Since different CSPs enter the market at different times, vendor lock-in can thus lead to uneven adoption of CSPs and correspondingly uneven revenues. To combat vendor lock-in, many CSPs offer some free storage for individual (i.e., non-business) users. However, persuading users to later pay for more storage is still difficult, unless the new CSP offers much better service than the previous one. Business users, who do not receive free storage, have no incentive to join more than one CSP, exacerbating vendor lock-in.

CYRUS eliminates vendor lock-in for its users by removing the overhead of storing files at multiple CSPs. In fact, CYRUS encourages users to purchase storage at multiple CSPs, which increases its achievable reliability and privacy: users can store more chunk shares at different CSPs. Assuming comparable CSP prices, a given user might then purchase storage at all available CSPs, evening out CSP market shares. CSPs entering the market would also be able to gain users. Some CSPs could gain a competitive advantage by providing faster connections or better coordination with CYRUS, but user demand would still exist at other CSPs.

Since CYRUS's secret sharing scheme increases the amount of data stored by a factor of n/t, users would need to purchase more total cloud storage with CYRUS. Total CSP revenue from business users might then increase, though it would likely be more evenly distributed among CSPs. CSP revenue from individual users, however, might decrease: some users could collect free storage from different CSPs without needing to purchase any additional storage. Thus, CYRUS may discourage CSPs from offering free introductory storage to individual users, in order to boost revenue.

9. Conclusion

CYRUS is a client-defined cloud storage system that integrates multiple CSPs to meet individual users' privacy, reliability, and performance requirements. Our system design addresses several practical challenges, including optimally selecting CSPs, sharing file metadata, and concurrency, as well as challenges in improving reliability and privacy. CYRUS's client-based architecture allows multiple, autonomous clients to scatter shared files to multiple CSPs, ensuring privacy by dividing the files so that no one CSP can reconstruct any of the file data. Moreover, we upload more shares than are necessary to reconstruct the file to ensure reliability. CYRUS optimizes the share downloads so as to minimize the delay in transferring data and simultaneously allows file updates from multiple clients, detecting any conflicts from the client. We evaluate CYRUS's performance in both a testbed environment and a real-world deployment.

By realizing a client-defined cloud architecture, CYRUS flips the traditional cloud-servicing-clients model to one of clients controlling multiple clouds. CYRUS thus opens up new possibilities for cloud services, as similar systems may disrupt the current cloud ecosystem by commoditizing CSPs; these services therefore represent not just a technological, but also an economic change. These wide-ranging implications make client-defined cloud architectures an exciting area for future work.

Acknowledgments

This research was supported by the Princeton University IP Acceleration Fund and the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ICT/SW Creative Research program (NIPA-2014-H0510-14-1009) supervised by the NIPA (National IT Industry Promotion Agency).


References

[1] CYRUS demo video, 2014. http://youtu.be/DPK3NbEvdM8.

[2] Git merge, 2015. http://git-scm.com/docs/git-merge.

[3] Abu-Libdeh, H., Princehouse, L., and Weatherspoon, H. RACS: A case for cloud storage diversity. In Proc. of ACM SoCC (New York, NY, USA, 2010), ACM, pp. 229–240.

[4] Apache. What is Apache jclouds?, 2013. http://jclouds.apache.org/.

[5] Apache Software Foundation. HBase, 2014. http://hbase.apache.org/.

[6] Attasena, V., Harbi, N., and Darmont, J. Sharing-based privacy and availability of cloud data warehouses. In 9èmes Journées francophones sur les Entrepôts de Données et l'Analyse en ligne (EDA 13), Blois (Paris, Juin 2013), vol. B-9 of Revues des Nouvelles Technologies de l'Information, Hermann, pp. 17–32.

[7] Bessani, A., Correia, M., Quaresma, B., André, F., and Sousa, P. DepSky: Dependable and secure storage in a cloud-of-clouds. In Proc. of ACM EuroSys (New York, NY, USA, 2011), ACM, pp. 31–46.

[8] Bessani, A., Mendes, R., Oliveira, T., Neves, N., Correia, M., Pasin, M., and Verissimo, P. SCFS: A shared cloud-backed file system. In Proceedings of the 2014 USENIX Annual Technical Conference (Berkeley, CA, USA, 2014), USENIX ATC '14, USENIX Association, pp. 169–180.

[9] Bowers, K. D., Juels, A., and Oprea, A. HAIL: A high-availability and integrity layer for cloud storage. In Proc. of ACM CCS (New York, NY, USA, 2009), ACM, pp. 187–198.

[10] Cachin, C., and Tessaro, S. Optimal resilience for erasure-coded Byzantine distributed storage. In Proc. of IEEE DSN (Washington, DC, USA, 2006), IEEE Computer Society, pp. 115–124.

[11] Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2 (June 2008), 4:1–4:26.

[12] CloudSquare. Research and compare cloud providers and services, Mar. 2015. https://cloudharmony.com/status-1year-of-storage-group-by-regions.

[13] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's highly available key-value store. In Proc. of ACM SOSP (2007), ACM, pp. 205–220.

[14] Drago, I., Mellia, M., Munafò, M. M., Sperotto, A., Sadre, R., and Pras, A. Inside Dropbox: Understanding personal cloud storage services. In Proc. of ACM IMC (New York, NY, USA, 2012), ACM, pp. 481–494.

[15] Dumon, M. Cloud storage industry continues rapid growth. Examiner.com, 2013. http://www.examiner.com/article/cloud-storage-industry-continues-rapid-growth.

[16] Finley, K. Three reasons why Amazon's new storage service won't kill Dropbox. Wired, 2014. http://www.wired.com/2014/07/amazon-zocalo/.

[17] Goodson, G. R., Wylie, J. J., Ganger, G. R., and Reiter, M. K. Efficient Byzantine-tolerant erasure-coded storage. In Proc. of IEEE DSN (Washington, DC, USA, 2004), IEEE Computer Society, pp. 135–.

[18] Gupta, A. Outage post-mortem. Dropbox Tech Blog, 2014. https://blogs.dropbox.com/tech/2014/01/outage-post-mortem/.

[19] Hu, Y., Chen, H. C. H., Lee, P. P. C., and Tang, Y. NCCloud: Applying network coding for the storage repair in a cloud-of-clouds. In Proc. of USENIX FAST (Berkeley, CA, USA, 2012), USENIX Association, p. 21.

[20] Karger, D., Sherman, A., Berkheimer, A., Bogstad, B., Dhanidina, R., Iwamoto, K., Kim, B., Matkins, L., and Yerushalmi, Y. Web caching with consistent hashing. Comput. Netw. 31, 11–16 (May 1999), 1203–1213.

[21] Kepes, B. At last some clarity about who is winning the cloud file sharing war... Forbes, 2014. http://www.forbes.com/sites/benkepes/2014/03/03/user-numbers-or-revenue-for-cloud-file-sharing-vendors-which-is-most-important/.

[22] Lango, J. Toward software-defined SLAs. Commun. ACM 57, 1 (Jan. 2014), 54–60.

[23] Ling, C. W., and Datta, A. InterCloud RAIDer: A do-it-yourself multi-cloud private data backup system. In Distributed Computing and Networking. Springer, 2014, pp. 453–468.

[24] Machado, G., Bocek, T., Ammann, M., and Stiller, B. A cloud storage overlay to aggregate heterogeneous cloud services. In Local Computer Networks (LCN), 2013 IEEE 38th Conference on (Oct. 2013), pp. 597–605.

[25] Malkhi, D., and Reiter, M. Byzantine quorum systems. In Proc. of ACM STOC (New York, NY, USA, 1997), ACM, pp. 569–578.

[26] Malkhi, D., and Reiter, M. K. Secure and scalable replication in Phalanx. In Proc. of IEEE SRDS (Washington, DC, USA, 1998), IEEE Computer Society, pp. 51–.

[27] Martin, J.-P., Alvisi, L., and Dahlin, M. Minimal Byzantine storage. In Proc. of DISC (London, UK, 2002), Springer-Verlag, pp. 311–325.

[28] McEliece, R. J., and Sarwate, D. V. On sharing secrets and Reed-Solomon codes. Communications of the ACM 24, 9 (1981), 583–584.

[29] Mu, S., Chen, K., Gao, P., Ye, F., Wu, Y., and Zheng, W. µ-libcloud: Providing high available and uniform accessing to multiple cloud storages. In Proc. of ACM/IEEE GRID (Washington, DC, USA, 2012), IEEE Computer Society, pp. 201–208.

[30] Papaioannou, T. G., Bonvin, N., and Aberer, K. Scalia: An adaptive scheme for efficient multi-cloud storage. In Proc. of IEEE SC (Los Alamitos, CA, USA, 2012), IEEE Computer Society Press, pp. 20:1–20:10.


[31] Patterson, S. Personal cloud storage hit 685 petabytes this year. WebProNews, 2013. http://www.webpronews.com/personal-cloud-storage-hit-685-petabytes-this-year-2013-12.

[32] Perlroth, N. Home Depot data breach could be the largest yet. The New York Times, 2014. http://bits.blogs.nytimes.com/2014/09/08/home-depot-confirms-that-it-was-hacked/.

[33] Rabin, M. Fingerprinting by Random Polynomials. Center for Research in Computing Technology, Aiken Computation Laboratory, Univ., 1981.

[34] Rawat, V. Reducing failure probability of cloud storage services using multi-cloud. Master's thesis, Rajasthan Technical University, 2013.

[35] Resch, J. K., and Plank, J. S. AONT-RS: Blending security and performance in dispersed storage systems. In Proc. of the 9th USENIX FAST (Berkeley, CA, USA, 2011), USENIX Association, pp. 14–14.

[36] Shamir, A. How to share a secret. Commun. ACM 22, 11 (Nov. 1979), 612–613.

[37] Stone, B. Another Amazon outage exposes the cloud's dark lining. Bloomberg Businessweek, 2013. http://www.businessweek.com/articles/2013-08-26/another-amazon-outage-exposes-the-clouds-dark-lining.

[38] Strunk, A., Mosch, M., Groß, S., Thoß, Y., and Schill, A. Building a flexible service architecture for user controlled hybrid clouds. In Proc. of ARES (Washington, DC, USA, 2012), IEEE Computer Society, pp. 149–154.

[39] Wakabayashi, D. Tim Cook says Apple to add security alerts for iCloud users. The Wall Street Journal, 2014. http://online.wsj.com/articles/tim-cook-says-apple-to-add-security-alerts-for-icloud-users-1409880977.

[40] Weil, S. A., Brandt, S. A., Miller, E. L., Long, D. D. E., and Maltzahn, C. Ceph: A scalable, high-performance distributed file system. In Proc. of OSDI (Berkeley, CA, USA, 2006), USENIX Association, pp. 307–320.

[41] Wilcox-O'Hearn, Z. zfec package 1.4.24. https://pypi.python.org/pypi/zfec.
