
    pStore: A Secure Peer-to-Peer Backup System

    Christopher Batten, Kenneth Barr, Arvind Saraf, Stanley Trepetin

    {cbatten|kbarr|arvind s|stanleyt}@mit.edu

    Abstract

In an effort to combine research in peer-to-peer systems with techniques for incremental backup systems, we propose pStore: a secure distributed backup system based on an adaptive peer-to-peer network. pStore exploits unused personal hard drive space attached to the Internet to provide the distributed redundancy needed for reliable and effective data backup. Experiments on a 30 node network show that 95% of the files in a 13 MB dataset can be retrieved even when 7 of the nodes have failed. On top of this reliability, pStore includes support for file encryption, versioning, and secure sharing. Its custom versioning system permits arbitrary version retrieval similar to CVS. pStore provides this functionality at less than 10% of the network bandwidth and requires 85% less storage capacity than simpler local tape backup schemes for a representative workload.

    1 Introduction

Current backup systems for personal and small-office computer users usually rely on secondary on-site storage of their data. Although these on-site backups provide data redundancy, they are vulnerable to localized catastrophe. More sophisticated off-site backups are possible but are usually expensive, difficult to manage, and are still a centralized form of redundancy. Independent from backup systems, current peer-to-peer systems focus on file sharing, distributed archiving, distributed file systems, and anonymous publishing. Motivated by the strengths and weaknesses of current peer-to-peer systems, as well as the specific desires of users needing to back up personal data, we propose pStore: a secure peer-to-peer backup system.

pStore provides a user with the ability to securely back up files in, and restore files from, a distributed network of untrusted peers. Insert, update, retrieve, and delete commands may be invoked by various user interfaces (e.g., a command line, file system, or GUI) according to a user's needs. pStore maintains snapshots for each file, allowing a user to restore any snapshot at a later date. This low-level versioning primitive permits several usage models. For example, works in progress may be backed up hourly so that a user can revert to a last-known-good copy, or an entire directory tree can be stored to recover from a disk crash.

(pStore was developed October-December 2001 as a project for MIT 6.824: Distributed Computer Systems.)

pStore has three primary design goals: reliability, security, and resource efficiency. pStore provides reliability through replication; copies are available on several servers in case some of these servers are malicious or unavailable. Since a client's data is replicated on nodes beyond his control, pStore strives to provide reasonable security: private data is readable only by its owner; data can be remotely deleted only by its owner; and any unwanted changes to data can be easily detected. Finally, since backups can be frequent and large, pStore aims to reduce resource usage by sharing stored data and exchanging data only when necessary.

Section 2 discusses related systems. pStore draws from their strengths while discarding functionality which adds overhead or complexity in the application-specific domain of data backup. Section 3 outlines the pStore architecture, and Section 4 presents our implementation. Section 5 evaluates the design in terms of the goals stated above, and Section 6 concludes.

    2 Related Work

A peer-to-peer backup system has two major components: the underlying peer-to-peer network and the backup/versioning framework. While much work has been done in the two fields individually, there is little literature integrating the two.


    2.1 Distributed Storage Systems

There has been a wealth of recent work on distributed storage systems. Peer-to-peer file sharing systems, such as Napster [15] and Gnutella [12], are in wide use and provide a mechanism for file search and retrieval among a large group of users. Napster handles searches through a centralized index server, while Gnutella uses broadcast queries. Both systems focus more on information retrieval than on publishing.

Freenet provides anonymous publication and retrieval of data in an adaptive peer-to-peer network [5]. Anonymity is provided through several means including: encrypted search keys, data caching along lookup paths, source-node spoofing, and probabilistic time-to-live values. Freenet deletes data which is infrequently accessed to make room for more recent insertions.

Eternity proposes redundancy and information dispersal (secret sharing) to replicate data, and adds anonymity mechanisms to prevent selective denial of service attacks [1]. Document queries are broadcast, and delivery is achieved through anonymous remailers. Free Haven, Publius, and Mojo Nation also use secret sharing to achieve reliability and author anonymity [9, 22, 13].

SFSRO is a content distribution system providing secure and authenticated access to read-only data via a replicated database [11]. Like SFSRO, CFS aims to achieve high performance and redundancy, without compromising on integrity, in a read-only file system [8]. Unlike complete database replication in SFSRO, CFS inserts file system blocks into a distributed storage system and uses Chord as a distributed lookup mechanism [7]. The PAST system takes a similar layered approach, but uses Pastry as its distributed lookup mechanism [10]. Lookups using Chord and Pastry scale as O(log(n)) with the number of nodes in the system. Farsite is similar to CFS in that it provides a distributed file system among cooperative peers [2], but uses digital signatures to allow delete operations on the file data.

Several systems have proposed schemes to enforce storage quotas over a distributed storage system. Mojo Nation relies on a trusted third party to increase a user's quota when he contributes storage, network, and/or CPU resources to the system. The PAST system suggests that the same smart cards used for authentication could be used to maintain storage quotas [10]. The Tangler system proposes an interesting quota scheme based on peer monitoring: nodes monitor their peers and report badly behaving nodes to others [21].

    2.2 Versioning and Backup

The existing distributed storage systems discussed above are intended for sharing, archiving, or providing a distributed file system. As a result, the systems do not provide specific support for incremental updates and/or versioning. Since many file changes are incremental (e.g., evolution of source code, documents, and even some aspects of binary files [14]), there has been a significant amount of work on exploiting these similarities to save bandwidth and storage space.

The Concurrent Versioning System, popular among software development teams, combines the current state of a text file and a set of commands necessary to incrementally revert that file to its original state [6]. Network Appliance incorporates the WAFL file system in its network-attached-storage devices [3]. WAFL provides transparent snapshots of a file system at selected instances, allowing the file system data to be viewed either in its current state, or as it was at some time in the past.

Overlap between file versions can enable a reduction in the network traffic required to update older versions of files. Rsync is an algorithm for updating files on a client so that they are identical to those on a server [20]. The client breaks a file into fixed size blocks and sends a hash of each block to the server. The server checks if its version of the file contains any blocks which hash to the same value as the client hashes. The server then sends the client any blocks for which no matching hash was found and instructs the client how to reconstruct the file. Note that the server hashes fixed size blocks at every byte offset, not just multiples of the block size. To reduce the time required when hashing at each byte offset, the rsync algorithm uses two types of hash functions. Rsync's slower cryptographic hash function is used only when its fast rolling hash establishes a probable match.



Figure 1: File Block List and File Blocks: (a) shows a file with three equal sized blocks; (b) shows how a new version can be added by updating the file block list and adding a single new file block.

LBFS also uses file block hashes to help reduce the amount of data that needs to be transmitted when updating a file [14]. Unlike rsync's fixed block sizes, LBFS uses content-dependent fingerprints to determine file block boundaries.
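As an illustration of the two-level hashing described above (not taken from the rsync or pStore sources), the following Python sketch shows the flavor of rsync-style block matching. The block size is arbitrary, Adler-32 stands in for the rolling checksum (a real implementation updates it incrementally rather than recomputing it per window), and SHA-1 plays the role of the strong hash.

import hashlib
import zlib

BLOCK_SIZE = 4096  # arbitrary block size for this illustration

def block_signatures(data: bytes) -> list[tuple[int, bytes]]:
    """Client side: a weak checksum and a strong hash for each fixed-size block."""
    sigs = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        sigs.append((zlib.adler32(block), hashlib.sha1(block).digest()))
    return sigs

def matching_offsets(data: bytes, sigs: list[tuple[int, bytes]]) -> list[int]:
    """Server side: slide a window over every byte offset; compute the expensive
    strong hash only when the cheap weak checksum suggests a probable match."""
    weak = {w for w, _ in sigs}
    strong = {s for _, s in sigs}
    matches, off = [], 0
    while off + BLOCK_SIZE <= len(data):
        window = data[off:off + BLOCK_SIZE]
        if zlib.adler32(window) in weak and hashlib.sha1(window).digest() in strong:
            matches.append(off)
            off += BLOCK_SIZE  # jump past a confirmed block
        else:
            off += 1           # otherwise advance a single byte
    return matches

The point of the weak/strong split is that the cheap checksum filters out almost all offsets, so the expensive hash runs only on probable matches.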

    3 System Architecture

Before discussing the details of the pStore architecture, we present an overview of how the system works for one possible implementation. A pStore user first invokes a pStore client which helps him generate keys and mark files for backup. The user notes which files are most important to him, and the client uses his choices to decide how often to back up the file and how many replicas to make.

To insert a file, pStore computes an identifier specific to the user's file. The identifier is chosen in such a way that it will not conflict with identically named files owned by other users. The file is encrypted using symmetric encryption to prevent members of the pStore network from seeing its potentially private contents. The file is broken into digitally signed blocks, and signed metadata is assembled which indicates how the blocks can be reassembled. The metadata and blocks are inserted into a peer-to-peer network. If the file changes and is backed up again, only the changes to the file are stored.

To retrieve the file, the user specifies its name and version, or browses the pStore directory hierarchy. The metadata is retrieved, and it indicates where to look for blocks belonging to the desired version. When the file blocks are retrieved, their signatures are examined to ensure file integrity.

    3.1 Data Structures

This section describes the data structures used to manage files, directories, and versions. These data structures were designed for reliability, security, and resource efficiency.

3.1.1 File Block Lists and File Blocks

A pStore file is represented by a file block list (FBL) and several file blocks (FB). Each FB contains a portion of the file data, while the FBL contains an ordered list of all the FBs in the pStore file. The FBL has four pieces of information for each FB: a file block identifier used to uniquely identify each FB, a content hash of the unencrypted FB, the length of the FB in bytes, and the FB data's offset in the original file. Figure 1(a) illustrates the relationship between a FBL and its FBs. The figure uses the notation H(E(X)) to indicate the hash of the encrypted contents of file block X. A traditional file can be converted into a pStore file by simply breaking the file into several fixed size FBs and creating an appropriate FBL. File attributes such as permissions and date of creation can also be stored in the FBL.

The FBL contains version information in the form of an additional ordered list of FBs for each file version. An FB list for a new version is created with an adaptation of the rsync algorithm [20]. The revised algorithm uses the FB hash, length, and offset information to make an efficient comparison without the need to retrieve the actual FBs. Our adaptation extends rsync to support varying FB lengths that will arise from the addition of new FBs as the pStore file evolves; care is taken to match the largest possible portion of the previous version.



If duplicate FBs are detected, then duplicates are not created. Instead, they are simply referenced in the FBL. This saves both storage space and network traffic (which can be particularly advantageous when using metered or low bandwidth network connections). New or changed portions of the file are broken into new FBs of appropriate size, referenced in the FBL, and inserted into the network. The final result, depicted in Figure 1(b), is a pStore file which can be used to reconstruct either version with no duplicate FBs. Another advantage of the pStore versioning scheme is that corruption of an FB does not necessarily preclude the retrieval of all versions as it would if version information were unified with the data (as in CVS). Versions that do not include the corrupt FB will remain intact.

pStore also allows files to be grouped into directories. A pStore directory is simply a text file listing the name of each file and subdirectory contained in the directory. Since a directory is represented as a pStore file, the same mechanism as above can be used for directory versioning.

File block lists and file blocks are encrypted with symmetric keys to preserve privacy. This is in contrast to peer-to-peer publishing networks where encryption is used to aid anonymity and deniability of content for node owners [5, 22]. File blocks are encrypted using convergent encryption [2]. Convergent encryption uses a hash of the unencrypted FB contents as the symmetric key when encrypting the FB. This makes block sharing between different users feasible since all users with the same FB will use the same key for encryption. Convergent encryption makes less sense for FBLs, since FBLs are not shared and the unencrypted content hash would have to be stored external to the FBL for decryption. FBLs are instead encrypted with a symmetric key derived from the user's private key. This means that all FBLs for a given user are encrypted with the same key, simplifying key management.

    3.1.2 Data Chunks

For the purposes of pStore file insertion and retrieval in a distributed peer-to-peer network, FBLs and FBs are treated the same. Both can be viewed as a data chunk, denoted C(i,p,s,d), where i is an identifier, p is public metadata, s is a digital signature, and d is the actual data. Any data chunk in the network can be retrieved by specifying its identifier. The public metadata is signed with the owner's private key, and the owner's public key is included with each data chunk. This allows anyone to verify and view public metadata, but only the owner can change the public metadata.

As mentioned above, pStore uses a hash of the encrypted FB contents for the file block identifier. Thus, when the FB is retrieved, one can compare a rehash of the data chunk to the identifier to verify the data has not been tampered with. The public metadata for a FB chunk contains this identifier and is used for authentication when deleting FB chunks. A FB chunk can be specified more formally as follows, using this notation for cryptographic primitives: H(M) is a one-way hash of M; E^t_K(M) is M encrypted with key K and D^t_K(M) is M decrypted with key K, where the superscript indicates whether a symmetric scheme (t = s) or public key scheme (t = p) is used; K_A is the public key belonging to A and K_A^-1 is the private key corresponding to K_A; and || denotes concatenation.

i     = H( H(d) || salt )
p     = type || i
s     = E^p_{K_A^-1}( H(p) ) || K_A
d     = E^s_{H(FB)}( FB )
C_FB  = C(i, p, s, d)

The type is an indicator that this is a FB chunk as opposed to an FBL chunk. Type is included to ensure correct sharing and deletion of chunks. The identifier salt is used for replication and is discussed in Section 3.2.1.
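The following sketch assembles such a chunk with the cryptography package's RSA support. The dictionary layout, PKCS#1 v1.5 padding, SHA-256, and DER encoding of the public key are our illustrative assumptions; the paper specifies only RSA signatures and the fields i, p, s, and d.

import hashlib
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

def make_fb_chunk(encrypted_fb: bytes, salt: bytes,
                  private_key: rsa.RSAPrivateKey) -> dict:
    """Assemble C(i, p, s, d) for a file block chunk, mirroring the equations above."""
    d = encrypted_fb                                                # d = E^s_H(FB)(FB)
    i = hashlib.sha256(hashlib.sha256(d).digest() + salt).digest()  # i = H(H(d) || salt)
    p = b"FB" + i                                                   # p = type || i
    signature = private_key.sign(p, padding.PKCS1v15(), hashes.SHA256())  # signs H(p)
    public_key = private_key.public_key().public_bytes(
        serialization.Encoding.DER, serialization.PublicFormat.SubjectPublicKeyInfo)
    s = signature + public_key                                      # s = sig || K_A
    return {"i": i, "p": p, "s": s, "d": d}

# A throwaway key pair for testing:
# private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)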

A hash of the filename makes a poor FBL chunk identifier since it is likely multiple users will have similarly named files, creating unwanted key collisions. Although two users may have a file with the same name, the files might have very different contents and should be kept separate to maintain consistency and privacy. A hash of the actual FBL also makes a poor chunk identifier since it will change after every version and cannot be easily recovered from the current local copy of the file. pStore instead uses a namespace-filename identifier which is similar to Freenet's signed-subspace keys [5].



Every pStore user has a private namespace into which all of that user's files are inserted and retrieved, eliminating unwanted FBL chunk identifier collisions between different users. A pStore user creates a private namespace by first creating a private/public key pair. pStore provides the flexibility for a user to have multiple namespaces or for several users to share the same namespace. A namespace-filename identifier is formed by first concatenating the private key, pStore pathname, filename, and salt. The result is then hashed to form the actual identifier. In addition to providing a unique namespace per user, this allows immediate location of known files without the need to traverse a directory structure. A FBL chunk can be specified more formally as follows:

i     = H( K_A^-1 || path || filename || salt )
p     = type || timestamp || i || H(d)
s     = E^p_{K_A^-1}( H(p) ) || K_A
d     = E^s_{f(K_A^-1)}( FBL )
C_FBL = C(i, p, s, d)

where f is a deterministic function used to derive the common symmetric key from the user's private key. A timestamp is included to prevent replay attacks. The type and salt are used in a similar manner as for FBs.
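A small sketch of the two derivations above; the serialization of the private key to bytes, the choice of SHA-256, and the label used in f are illustrative assumptions rather than pStore's actual choices.

import hashlib

def fbl_identifier(private_key_bytes: bytes, path: str,
                   filename: str, salt: bytes) -> bytes:
    """i = H(K_A^-1 || path || filename || salt): the namespace-filename identifier."""
    return hashlib.sha256(private_key_bytes + path.encode() +
                          filename.encode() + salt).digest()

def derive_fbl_key(private_key_bytes: bytes) -> bytes:
    """One possible f: deterministically derive the user's common FBL symmetric
    key by hashing the private key with a fixed label."""
    return hashlib.sha256(b"pstore-fbl-key" + private_key_bytes).digest()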

The public key contained within the public metadata provides ownership information useful when implementing secure chunk deletion (as described in Section 3.2.3). This ownership information may also be useful when verifying that a user is not exceeding a given quota. Unfortunately, attaching ownership information makes anonymity difficult in pStore. Since anonymity is not a primary goal of pStore, we feel this is an acceptable compromise.

The content hash in the public metadata provides a mechanism for verifying the integrity of any chunk. Since the hash is publicly visible (but immutable), anyone can hash the data contents of the chunk and match the result against the content hash in the public metadata. Unlike FBs, there is no direct relationship between the FBL identifier and the FBL contents. Thus an attacker might switch identifiers between two FBLs. Storing the identifier in the FBL public metadata and then comparing it to the requested search key prevents such substitution attacks.

    3.2 Using the Data Structures

pStore relies on Chord to facilitate peer-to-peer storage due to its attractive O(log N) guarantees concerning search times [7]. Unlike Freenet, Chord does not assume any specific data access pattern to achieve these search times, and this integrates nicely with pStore's primarily data-insert usage model. Since it is a low-level primitive, using Chord does not burden us with extraneous functionality ill-suited for a distributed backup system, such as absolute anonymity and file caching at intermediate nodes.

    3.2.1 Replication

pStore supports exact-copy chunk replication to increase the reliability of a backup. Chunks are stored on several different peers, and if one peer fails then the chunk can be retrieved from any of the remaining peers. More sophisticated information dispersal techniques exist which decrease the total storage required while maintaining reliability [18]. Although pStore uses exact-copy chunk replication to simplify the implementation, these techniques are certainly applicable and may be included in pStore in the future.

To distribute chunk replicas randomly through the identifier space, salt is added when creating the identifier. The salt is a predetermined sequence of numbers to simplify retrieval of replicas. This replication technique differs from chain replication, in which the user sends data to one node, requesting that it store a copy of the data and pass it along to another node until a counter has expired [8, 5]. Malicious or broken nodes in the chain can reduce the effectiveness of chain replication by refusing to pass the data along, potentially preventing replicas from reaching benign nodes. pStore's replication technique avoids this problem by sending replicas directly to the target nodes.

Many systems rely on caching frequently accessed files along the retrieval path to further increase data replication [5, 8, 10], but a data backup application is poorly suited to this form of caching. File insertion is much more common than file retrieval for pStore and other data backup applications. Although FBLs are accessed on every backup, they are never shared.



Figure 2: Sharing File Blocks: blocks are shared between user A and user B, and are also shared between two different versions of the file for user B.

Instead of caching along the lookup path, FBLs can be cached on the owner's local machine. FBs are shared but are rarely accessed.

    3.2.2 Sharing

Unlike file sharing peer-to-peer networks, a pStore user cannot directly make use of the local storage space he contributes to the system, since it contains encrypted data from other users. To encourage fair and adequate donation of storage space, pStore would require a quota policy where the amount of space available to a user for pStore backups is proportional to the amount of personal storage space contributed to the system. Users then have a vested interest in making sure their data is stored efficiently. Techniques which decrease the space required to store a given file in pStore increase the effective pStore capacity for the owner of that file. To address this issue pStore permits sharing at the file block level to capture redundancy both within a single file and between different versions, files, and users. FB sharing occurs explicitly during versioning as described in Section 3.1.1. By exploiting the similarity between most versions, this technique saves storage space and reduces network traffic, which could be significant when performing frequent backups. FB sharing also occurs implicitly between duplicate files owned by the same or different users.

(Other techniques which would further increase the effective capacity for a user include data compression and information dispersal algorithms, as in [4, 18].)

(We adopt the terminology presented in [2]: replicas refer to copies generated by pStore to enhance reliability, while duplicates refer to two logically distinct files with identical content.)

Implicit FB sharing is a result of using a hash of the encrypted contents as the file block identifier. Convergent encryption ensures that identical file blocks are encrypted with the same key, allowing multiple users to securely share the same FBs. File sharing between users can be quite common when users back up an entire disk image since much of the disk image will contain common operating system and application files. A recent study showed that almost 50% of the used storage space on desktop computers in Microsoft's headquarters could be reclaimed if duplicate content was removed [2]. Figure 2 illustrates an example where two users have inserted an identical file. Notice how although user B has modified the file and inserted a new version, most of the FBs can still be shared.

FBLs are not shared since they are inserted into the network under a private namespace. Even if two users have identical FBLs, they will have different identifiers and thus be stored separately in the network. Using a content hash of the FBL as the FBL identifier would permit FBL sharing when the version histories, file attribute information, and file content are identical. This would be similar to the technique used to store inodes in CFS [8]. Unfortunately, this drastically increases the complexity of updating a file. The entire virtual path must be traversed to find the FBL, and then after the FBL is modified, the virtual path must be traversed again to update the appropriate hash values. Even if pStore allowed FBL sharing, the number of shared FBLs would be small since both the version history and the file attributes of shared FBLs must be identical.



Figure 3: pStore Implementation: pStore clients link against the pStore library, which talks to either a simulator backend or a network backend; each pStore node runs pStored and Chord, and the components communicate via asynchronous function calls, asynchronous RPC over Unix sockets, and asynchronous RPC over UDP sockets.

pStore still allows sharing of directory information and actual file data through FB sharing but keeps version information and file attributes distinct for each user.

    3.2.3 Deletion

If we assume a policy where users can only insert an amount of data proportional to the amount of storage contributed, then pStore users may reasonably demand an explicit delete operation. A user may want to limit the number of versions per FBL or remove files to free space for newer and more important files.

Explicit delete operations in peer-to-peer systems are rare, since there is the potential for misuse by malicious users. Systems such as Freenet, Chord, and Free Haven rely instead on removing infrequently accessed files or using file expiration dates [5, 7, 9]. Removing infrequently accessed files is an unacceptable solution for pStore, since by definition backups are rarely accessed until needed. Expiration dates are ill-suited to a backup system for two reasons. First, it is impossible for a user to renew chunks which, through modification, no longer exist on the user's machine. Second, a user may be unable to refresh his data due to a hardware crash, and this is exactly the case when a backup is needed.

Exceptions include Publius, which attaches an indicator to each file that only acceptable users can duplicate to indicate file deletion, and Farsite, which uses digital signatures to authorize deletions. pStore also uses digital signatures in the form of the public metadata mentioned in Section 3.1.2. This public metadata can be thought of as an ownership tag which authorizes an owner to delete a chunk.

Each storage node keeps an ownership tag list (OTL) for each chunk. While FB chunks may have many ownership tags, each FBL chunk should only have one ownership tag. A user can make a request to delete a chunk by inserting a delete chunk into the network. The delete chunk has the same identifier as the chunk to delete, but has no data.

When a storage node receives a delete chunk, it examines each ownership tag in the OTL associated with the chunk. If the public keys match between one of the ownership tags and the delete chunk, then the delete is allowed to proceed. The appropriate ownership tag is removed from the OTL, and if there are no more ownership tags in the OTL then the chunk is removed. The OTL provides a very general form of reference counting to prevent deleting a chunk when other users (or other files owned by the same user) still reference the chunk.

Notice that a malicious user can only delete a chunk if there is an ownership tag associated with the malicious user on the OTL. Even if a malicious user could somehow append his ownership tag to the OTL, the most that user could do is remove his own ownership tag with a delete command chunk. The chunk will remain in the network as long as the OTL is not empty.

    4 Implementation

We implemented pStore using both original C++ code and third-party libraries for encryption algorithms and asynchronous programming. pStore uses RSA for public key encryption [19], Rijndael (AES) for symmetric key encryption [17], and SHA-1 for cryptographic hashing [16]. Figure 3 shows the components involved in the pStore implementation. pStore clients are linked with the pStore library to provide various user interfaces.



The current implementation provides a command line interface for inserting, updating, retrieving, and deleting single files, as well as public and private key generation. In addition to the Chord backend for peer-to-peer storage, a simulator backend was developed which stores chunks to the local disk for debugging and evaluation purposes. pStored runs on each node in the network and maintains a database of chunks which have been inserted at that node.

The current system acts as a proof-of-concept implementation. As such, certain features in the pStore architecture have been omitted. pStore directories are not implemented and no effort is made to handle very large files efficiently. Although quotas would be important in a production pStore system, this implementation does not provide a mechanism for quota enforcement.

    5 Evaluation

Three realistic workloads were assembled to test various aspects of pStore and evaluate how well it meets its goals of reliability and minimal resource usage. The first workload, softdev, models a software development team that backs up its project nightly. At any time during the development process the team must be able to revert to an older version of a file. It consists of ten nightly snapshots of the development of pStore and includes a total of 1,457 files (13 MB). The fulldisk workload models individual users who occasionally back up their entire hard disk in case of emergency. We chose four Windows 2000 machines and five Linux-based machines (four RedHat 7.1 and one Mandrake 8.1). Due to resource constraints we modeled each disk by collecting 5% of its files. Files were chosen with a pseudo-random number generator, and the resulting workload includes a total of 23,959 files (696 MB). The final workload, homedir, contains 102 files (13 MB) taken from a user home directory with several hourly and daily snapshots. These workloads provide a diverse test set with different file types, file sizes, numbers of files, and amounts of versioning.

To test reliability, we brought up a network of 30 nodes across five hosts and backed up both softdev and homedir. Ordinarily, each pStore node would be on a separate host, but five hosts allowed for a manageable test environment, and the logical separation between nodes allowed for a fair reliability study.

Figure 4: File Availability vs. Node Failures: percentage of files successfully retrieved as a function of the number of failed nodes in the 30 node network, for the softdev and homedir workloads with r = 1, 2, and 4.

Nodes were killed one by one, and we measured how many files could be retrieved from the remaining nodes as a percentage of the entire dataset.

The results in Figure 4 show backup availability as a function of failed nodes; r denotes the number of instances of a file inserted into the network. As expected, increasing r increases the availability of the backup set. For homedir, which consists of fewer, larger files than softdev, the reliability is usually lower for each number of failed nodes.

Even when seven nodes (23% of 30) have left the network, four replicas are sufficient to bring back 95% of one's files for both workloads. The observed reliability of the pStore system can be generalized as follows. Given a fraction f of failed nodes and a uniform assignment of chunks to nodes due to SHA-1 hashing, the probability that a block is not available is f^r. A file with k blocks is not retrievable if any of the k blocks is not available, which occurs with probability 1 - (1 - f^r)^k. Thus, for a given r (which can be small) and k, similar availability may be maintained as the network scales as long as f remains unchanged.

The remaining experiments, which do not rely on the properties of pStore's underlying network, were performed using the pStore simulator. The simulator allowed analysis to be performed prior to integration with Chord. Figures 5 and 6 show the efficiency of pStore when working with softdev. The figures compare four techniques for maintaining versioned backups:



local tape backups (tape), pStore without versioning (nover), pStore with versioning (withver), and pStore using CVS for versioning (cvs). The tape test was simulated by using the cp command to copy files to a new directory each night. The nover test does not use the versioning algorithm described in Section 3.1.1. Instead it simply inserts each additional version under a different pStore filename (creating a separate FBL for each version). The withver test uses the pStore versioning algorithm. To evaluate pStore's versioning method against an established versioning scheme, we used pStore to back up CVS repository files. These files can be retrieved and manipulated with the CVS program rather than relying on pStore versioning. In our tests, the entire CVS repository file is retrieved and then deleted from the network. Next, CVS is used to produce a new version, and the new repository file is inserted into the network.

Figure 5 shows the network bandwidth usage for each versioning technique when inserting each nightly snapshot. Notice that for the first snapshot, pStore FBLs and chunk overhead (public metadata) cause the pStore tests to use more bandwidth than the tape test. For later versions, the withver test requires less than 10% of the bandwidth of the tape and nover tests since pStore versioning avoids inserting identical inter-version data into the network. The cvs test uses drastically more bandwidth since each new version requires retrieval of the entire CVS repository and consequently all of the data associated with each previous version.

The amount of bandwidth for restoring versions was also analyzed. The tape, nover, and withver tests were similar since each test must retrieve the entire version. The cvs test again used drastically more bandwidth since all versions must be retrieved at once and not just the desired version.

Figure 6 shows how much storage is required in the network using each of the four methods. The stacked white bar indicates the additional amount of storage required if implicit block sharing is turned off. Without versioning or block sharing, pStore is nearly equivalent to the local tape backup with the exception of the storage required for FBLs and chunk overhead.

Figure 5: Bandwidth Usage - softdev Backup: bandwidth usage in MB for each nightly backup from 11/21 to 11/30, for local tape backup, pStore without versioning, pStore with versioning, and pStore with CVS.

Figure 6: Required Storage for softdev: storage required in MB for each nightly backup from 11/21 to 11/30, for local tape backup, pStore without versioning, pStore with versioning, and pStore with CVS, with additional bars showing the effect of disabling block sharing.

Since subsequent versions in softdev are similar or identical, implicit block sharing can reduce the required storage by 3-60% even without any sophisticated versioning algorithms. The withver and cvs tests reap the benefit of block sharing as well. Since this is due to explicit block sharing during versioning, there is no increased storage requirement when implicit block sharing is turned off (as illustrated by the lack of visible white bars for these tests). The withver and cvs tests are able to reduce the storage required by another 1-10% over nover (with implicit block sharing enabled). This is because the nover scheme has separate FBLs for each version, resulting in additional FBL overhead, and the scheme uses fixed block sizes regardless of similarities with previous versions, resulting in less efficient inter-version sharing.




Figure 7: Storage vs. Block Size: storage required in MB for block sizes of 512, 1024, 2048, 4096, and 8192 bytes, broken down into actual file data, FBL data, and chunk overhead (11/30 softdev snapshot).

Clearly, the rsync algorithm on which pStore is based will find more overlap between file versions when it is comparing smaller blocks. Thus, decreasing block size appears to be a simple way to decrease the required storage of pStore versioning. In practice, while the storage required for the actual data decreases due to more effective versioning, the size of each FBL and the amount of public metadata increase due to the greater number of blocks. This increase in overhead outweighs the modest decrease in the actual stored data (see Figure 7). The 11/30 softdev snapshot is shown; other days exhibit similar trends.

It is reasonable to expect common blocks between users running the same operating system. To test this, the amount of block sharing was measured while inserting the fulldisk workload into pStore. Unique public/private key pairs were used for each user to test convergent encryption. The Linux datasets saved 3.6% of the total inserted bytes through block sharing, while the Windows datasets saved 6.9%. When both sets were inserted simultaneously, 5.3% was saved. This number is much lower than the 50% found in the Farsite study [2], probably owing to the homogeneous nature of the Farsite workload compared to the heterogeneous and sparse nature of the fulldisk workload.

    6 Future Work and Conclusion

Our research suggests some interesting directions for future work on secure peer-to-peer backup systems. Recall that although the nover scheme does not use the pStore rsync style of versioning, the nover scheme can provide similar functionality through its use of logical directories. If a user wants to view files from a certain date, the user simply retrieves files from the logical directory corresponding to that date. Figures 5 and 6 show that the nover scheme has reasonable storage and bandwidth demands, but as mentioned in Section 5, the nover scheme fails to efficiently exploit inter-version sharing. This is because each version is broken into fixed size blocks irrespective of any similarities to previous versions. Blocks will only be shared across versions if the same block occurs in each version at the same offset. The FBLs are specifically designed to help divide a new version into file blocks more efficiently, especially when there are similar blocks between versions but at different offsets (a very common scenario when a user is inserting data into the middle of a new version). The nover scheme, however, cannot keep previous version information in the FBLs since each version essentially creates its own separate FBL.

The above observations suggest that a modified nover scheme could make an interesting alternative to the pStore system. The nover scheme would be modified to use LBFS-style versioning to efficiently exploit inter-version similarities. LBFS would be used to divide a new version of a file into FBs, and since LBFS works on fingerprints within the file data, portions of the file which match an older file will be implicitly shared (even if the similar data is at a different offset in the new version). The FBLs in this new scheme simply contain a list of the hashes of the FBs which make up that file. FBLs could use convergent encryption, enabling secure sharing of FBLs between users. To conserve bandwidth, the new scheme could make use of a new peek primitive which checks whether a chunk with a given content hash exists in the peer-to-peer network. If a duplicate content hash exists, then there is no need to insert a new chunk with the same hash into the network (although sending a signed command may be necessary to correctly update the ownership tag list for that chunk).

    10

  • 7/28/2019 10.1.1.12.3444

    11/12

The FBLs in this new scheme resemble SFSRO inodes, and similar bandwidth gains based on the resulting hash tree are possible (the hash for a directory FBL represents not only the children of that directory but the entire subtree beneath that directory). This new scheme would be simpler to implement than pStore, and therefore it would be interesting to compare it to pStore in terms of bandwidth and storage requirements. (These ideas matured through various discussions with Professor Robert Morris at the MIT Laboratory for Computer Science.)
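For reference, content-defined chunking of the kind LBFS uses can be sketched as follows. This is an illustration only: a simple additive fingerprint over a sliding window stands in for LBFS's Rabin fingerprints, and the window size, mask, magic value, and minimum chunk size are arbitrary choices.

WINDOW = 48            # bytes in the sliding window
MASK = (1 << 13) - 1   # low bits of the fingerprint compared against MAGIC
MAGIC = 0x78           # arbitrary target value

def content_defined_chunks(data: bytes, min_size: int = 256) -> list[bytes]:
    """Split data at content-dependent boundaries: declare a boundary whenever
    the masked fingerprint of the last WINDOW bytes equals MAGIC. Because the
    boundaries depend only on local content, inserting bytes shifts boundaries
    only locally and unchanged regions keep their chunks (and hence their hashes)."""
    chunks, start, fingerprint = [], 0, 0
    for pos, byte in enumerate(data):
        fingerprint += byte
        if pos >= WINDOW:
            fingerprint -= data[pos - WINDOW]   # keep the window at WINDOW bytes
        if pos + 1 - start >= min_size and (fingerprint & MASK) == MAGIC:
            chunks.append(data[start:pos + 1])
            start = pos + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks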

While experimenting with pStore, we discovered that digitally signing each chunk adds significant performance, bandwidth, and storage overheads. We observed several FBs for which the FB metadata was larger than the actual FB data. This is largely a result of including the user's public key with each chunk. Even so, we feel that the cost of digital signatures is outweighed by two significant benefits. First, digital signatures are the key to secure deletion, and as mentioned in Section 3.2.3, traditional deletion techniques are ill-suited to backup usage models. Second, digital signatures could provide a useful tool in implementing an effective quota management infrastructure, since each signature effectively tags data with ownership. In cooperative trusted peer-to-peer networks, it may be suitable to assume that users will not exceed suggested quotas. In such an environment, secure deletion might be replaced by very long expiration dates. This would eliminate the need to include a public key with every chunk and lead to better performance due to less processing, reduced bandwidth, and smaller disk space overhead.

pStore uses chunk replication, various cryptographic techniques, and a revised versioning algorithm to achieve its three primary design goals of reliability, security, and resource efficiency. We have described a proof-of-concept implementation of the pStore architecture which provides a command line interface for file insertion, update, retrieval, and deletion in a network of untrusted peers. We show that a small number of replicas is sufficient to provide adequate reliability for a modest network size and that pStore can provide significant bandwidth and storage savings. These savings are useful in the context of a quota system where users are concerned about their effective capacity in the network, and also in situations where users have disk or network bandwidth constraints. pStore is a novel system which brings together developments in peer-to-peer storage with those in the domain of data backup and versioning.

References

[1] R. Anderson. The eternity service. In Proceedings of the 1st International Conference on the Theory and Applications of Cryptology, 1996.

[2] W. J. Bolosky, J. R. Douceur, D. Ely, and M. Theimer. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. In Measurement and Modeling of Computer Systems, pages 34-43, 2000.

[3] K. Brown, J. Katcher, R. Walters, and A. Watson. SnapMirror and SnapRestore: Advances in snapshot technology. Technical Report TR3043, Network Appliance, 2001.

[4] Y. Chen, J. Edler, A. Goldberg, A. Gottlieb, S. Sobti, and P. Yianilos. A prototype implementation of archival intermemory. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999.

[5] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Proceedings of the Workshop on Design Issues in Anonymity and Unobservability, pages 46-66, Berkeley, CA, Jul 2000. International Computer Science Institute.

[6] Concurrent Versions System. http://www.cvshome.org.

[7] F. Dabek, E. Brunskill, M. F. Kaashoek, D. Karger, R. Morris, I. Stoica, and H. Balakrishnan. Building peer-to-peer systems with Chord, a distributed location service. In Proceedings of the 8th IEEE Workshop on Hot Topics in Operating Systems, pages 71-76, 2001.

[8] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-area cooperative storage with CFS. In Proceedings of the 18th ACM Symposium on Operating Systems Principles, Oct 2001.

[9] R. Dingledine, M. Freedman, and D. Molnar. The Free Haven project: Distributed anonymous storage service. In Proceedings of the Workshop on Design Issues in Anonymity and Unobservability, pages 67-95, Berkeley, CA, Jul 2000. International Computer Science Institute.



[10] P. Druschel and A. Rowstron. PAST: A large-scale, persistent peer-to-peer storage utility. In Proceedings of the 8th IEEE Workshop on Hot Topics in Operating Systems, May 2001.

[11] K. Fu, M. F. Kaashoek, and D. Mazieres. Fast and secure distributed read-only file system. In Proceedings of the 4th USENIX Symposium on Operating Systems Design and Implementation, pages 181-196, Oct 2000.

[12] The Gnutella protocol specification v0.4. Distributed Search Services, Sep 2000.

[13] Mojo Nation. http://www.mojonation.net.

[14] A. Muthitacharoen, B. Chen, and D. Mazieres. A low-bandwidth network file system. In Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 174-187, Oct 2001.

[15] Napster. http://www.napster.com.

[16] National Institute of Standards and Technology. Secure hash standard. Technical Report NIST FIPS PUB 180, U.S. Department of Commerce, May 1993.

[17] National Institute of Standards and Technology. Advanced encryption standard (AES). Technical Report NIST FIPS PUB 197, U.S. Department of Commerce, Nov 2001.

[18] M. O. Rabin. Efficient dispersal of information for security, load balancing, and fault tolerance. Journal of the ACM, 36(2):335-348, Apr 1989.

[19] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public key cryptosystems. Communications of the ACM, 21(2):120-126, Feb 1978.

[20] A. Tridgell and P. Mackerras. The rsync algorithm. Technical Report TR-CS-96-05, Australian National University, Jun 1996.

[21] M. Waldman and D. Mazieres. Tangler: A censorship-resistant publishing system based on document entanglements. In 8th ACM Conference on Computer and Communication Security, Nov 2001.

[22] M. Waldman, A. Rubin, and L. Cranor. Publius: A robust, tamper-evident, censorship-resistant web publishing system. In Proceedings of the 9th USENIX Security Symposium, Aug 2000.
