© Copyright 2009-2013, Cambridge Computer Services, Inc. – All Rights Reserved – www.CambridgeComputer.com – 781-250-3000
End to End Life Cycle Management for Research Data
Capturing Metadata Throughout the Research Pipeline and Facilitating the Handoff to Formal Curation
Jacob Farmer, CTO, Cambridge Computer
A Little Background On Cambridge Computer
Founded in 1991 as a boutique integrator for backup and archive solutions
Approximately 75 employees nationwide
Clients of all shapes and sizes across all industries
• Particularly strong in research and higher ed
Industry-wide reputation for:
• defining best practices for enterprise-class data protection, and
• the early adoption of next-generation storage solutions
A unique business model that allows us to straddle the fence between academia and industry
End to End Life Cycle Management for Research PASIG - 2013
Seminars and Workshops Through the USENIX Association
Tiered Storage and Archiving: Best Practices for Data Life Cycle Management and Digital Preservation
Cornell, Dartmouth, Duke, Harvard, Penn
LISA Data Storage Day
• Storage Virtualization
• Application Acceleration with Solid State
• A Crash Course in Object Storage
LISA Conference, Broad Institute, Georgia State, University of Maryland, Davenport, Princeton
Our Product: Starfish
Our Project – Defining Best Practices for File Management
Inspiration for our project comes from SRB/iRODS
• Bring parts of the SRB/iRODS vision to reality
– Define a general purpose feature set
– Intuitive user interface
– Simplified API
Inspiration also comes from numerous home-grown solutions in our client base. The paradigm:
• stat() your file systems
• Make database records for each file and/or directory
• Relate metadata to the file and directory records
• Report and/or take action
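The four-step paradigm above (stat, record, relate, report) fits in a few lines of Python. This is a minimal sketch assuming SQLite as the database and a hypothetical two-table schema (`files`, `metadata`). The function names `build_catalog`, `tag` and `bytes_by_tag` are invented for illustration and are not part of Starfish or any client's tool.

```python
import os
import sqlite3


def build_catalog(root, db_path=":memory:"):
    """Steps 1-2: walk the tree, stat() every file, and make one record per file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime REAL, uid INTEGER)"
    )
    # Hypothetical key/value table relating descriptive metadata to file records.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        " path TEXT REFERENCES files(path), key TEXT, value TEXT)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full, follow_symlinks=False)  # lstat semantics
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (full, st.st_size, st.st_mtime, st.st_uid),
            )
    conn.commit()
    return conn


def tag(conn, path, key, value):
    """Step 3: relate a piece of metadata to an existing file record."""
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?)", (path, key, value))
    conn.commit()


def bytes_by_tag(conn, key):
    """Step 4: report total bytes per value of a metadata key (e.g. per project)."""
    rows = conn.execute(
        "SELECT m.value, SUM(f.size) FROM files f"
        " JOIN metadata m ON m.path = f.path"
        " WHERE m.key = ? GROUP BY m.value ORDER BY m.value",
        (key,),
    )
    return dict(rows.fetchall())
```

Rescanning with `INSERT OR REPLACE` keeps the file records current without touching the metadata rows, which is one simple way to keep the catalog out of the I/O path: it learns about files after the fact rather than intercepting writes.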
Starfish - *FS
Virtual Global File System
• It’s not really a file system, but it looks like one and serves as a hierarchical catalog of files
Like a file system
• CIFS and POSIX permissions
• File system attributes and extended attributes
But more
• User-specified metadata
• Persistent addresses
• Versioning
• Point-in-time collections
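To make the "but more" features concrete, here is a hypothetical catalog record in Python. It is not Starfish's actual data model; the class names, fields and the use of a UUID as the persistent address are all assumptions. It shows a persistent ID that survives a change of path, versions that accumulate over time, and a point-in-time collection built by taking, for each entry, the newest version at or before a given timestamp.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Version:
    size: int
    mtime: float
    checksum: str


@dataclass
class CatalogEntry:
    path: str  # current location in the namespace; may change over time
    # Hypothetical persistent address: stays fixed even if the file moves.
    persistent_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    user_metadata: dict = field(default_factory=dict)
    versions: list = field(default_factory=list)

    def add_version(self, size, mtime, checksum):
        self.versions.append(Version(size, mtime, checksum))

    def version_at(self, when):
        """Return the newest version whose mtime is <= when, or None."""
        candidates = [v for v in self.versions if v.mtime <= when]
        return max(candidates, key=lambda v: v.mtime) if candidates else None


def collection_at(entries, when):
    """A point-in-time collection: persistent_id -> version as of `when`."""
    snapshot = {}
    for entry in entries:
        version = entry.version_at(when)
        if version is not None:
            snapshot[entry.persistent_id] = version
    return snapshot
```

Keying the collection by `persistent_id` rather than by path means a snapshot stays valid after files are reorganized or migrated between tiers.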
Basic Starfish Topology
Targeted Use Cases
1) Data life cycle management for unstructured data at very large scale
• Scientific research data
• Media / entertainment workflows
• Engineering data
2) Storage middleware for digital asset management systems at very large scale
• Fixity automation
• Backup / restore
• Tiered storage
• Persistent file addresses / links
• Cloud interface
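Fixity automation generally means recording a checksum when a file is ingested and recomputing it later to detect silent corruption or loss. A minimal sketch using Python's standard `hashlib`, assuming SHA-256 and a simple path-to-digest manifest; the manifest format and the function names are illustrative assumptions, not a Starfish interface.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are never read fully into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_fixity(manifest):
    """Compare recorded digests against freshly computed ones.

    `manifest` maps path -> expected hex digest. Returns the paths that fail,
    including files that have gone missing.
    """
    failures = []
    for path, expected in manifest.items():
        try:
            actual = file_digest(path)
        except FileNotFoundError:
            failures.append(path)
            continue
        if actual != expected:
            failures.append(path)
    return failures
```

In a catalog-driven system the manifest would come from the file records themselves, so the check can be scheduled per collection or per tier.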
Typical Content Management “Stack”
Inserting File System Middleware
Simple Storage Workflow While Mirroring File Systems to Object Store
Metadata is the Great Enabler
Collaboration
• How else would researchers know what to do with one another’s data?
• How can data be organized to meet different groups’ needs?
Storage management policies
• How does a storage management system know what to do with your files? File system attributes are not descriptive enough.
Preservation / retrieval / provenance
• How do you know what to keep?
• How do you find it again?
• How do you know what it was used for and when?
Reporting / chargeback
• File system permissions are not descriptive enough.
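The claim that file system attributes are not descriptive enough for policies or chargeback can be illustrated with a small sketch. The metadata keys (`project`, `retention`, `stage`), the tier names and the per-GB rate are hypothetical. The point is only that two files owned by the same user can go to different tiers and be billed to different projects once descriptive tags exist, which owner, size or mtime alone cannot express.

```python
from collections import defaultdict


def choose_tier(record):
    """Pick a storage action from descriptive metadata.

    Tag names and tier names are hypothetical; plain attributes such as the
    owner cannot tell a raw instrument capture from a scratch intermediate.
    """
    meta = record.get("metadata", {})
    if meta.get("retention") == "permanent":
        return "archive"  # e.g. replicate to object store or tape
    if meta.get("stage") == "scratch":
        return "purge-candidate"
    return "primary"


def chargeback(records, rate_per_gb):
    """Total cost per project tag; untagged data is reported separately."""
    totals = defaultdict(float)
    for r in records:
        project = r.get("metadata", {}).get("project", "UNTAGGED")
        totals[project] += r["size"] / 1e9 * rate_per_gb
    return dict(totals)
```

Reporting untagged data as its own line item doubles as an incentive: it shows exactly which storage cannot yet be attributed to anyone.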
What Would a Metadata System for Research Data Look Like?
Very flexible
• Allows scientists to work the way they want to work
Out of the data path
• The system cannot introduce latency to file I/O
Enormous scale
• Billions of files, petabytes of capacity, thousands of file systems
Device / vendor independence
• Must work with all storage devices, object stores, clouds, etc.
API-driven
The Real Trick – Getting the Metadata
The Golden Rule of Data Preservation – “Preserve at the time of creation”
• Translation: Capture metadata throughout the research pipeline
Perhaps capture metadata when storage is provisioned
• This presumes that there is a structured process for provisioning storage
Capture metadata through an API
• This requires a simple API that anyone can use
Programmatically extract metadata from file headers, tags, and content
Capture metadata through a GUI
• Try to create incentives for users to key in metadata
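As an illustration of programmatic extraction, the PNG format stores image width, height, bit depth and color type at fixed offsets in its first (IHDR) chunk, so they can be read without decoding the image. The sketch below pairs that with a hypothetical `capture()` function standing in for "a simple API that anyone can use". Recording where each value came from (provisioning, API, header, GUI) is a design assumption of this sketch, not something the slides specify.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_metadata(path):
    """Read width, height, bit depth and color type from a PNG's IHDR chunk."""
    with open(path, "rb") as f:
        head = f.read(26)  # 8-byte signature + 4-byte length + "IHDR" + 10 bytes
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", head[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}


def capture(catalog, path, source, **fields):
    """Hypothetical capture API: merge key/value metadata for `path` into a
    dict-based catalog, remembering the source of each value."""
    entry = catalog.setdefault(path, {})
    for key, value in fields.items():
        entry[key] = {"value": value, "source": source}
    return entry
```

For example, a provisioning step might call `capture(catalog, path, "provisioning", project="p1")`, and a later header scan `capture(catalog, path, "header", **png_header_metadata(path))`, so both routes feed the same record.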
Getting from Here to There
Problem Statements for Research Data Management
Scientists don’t want to enter metadata
No one wants to pay for long-term storage
Data management planning: a disconnect between grant applicants and their institutions
There are more pressing problems related to storing data
• Collaboration
• Cost control: Chargeback, Showback, Tiering
• Backup
Organizational gridlock
• Conflicting priorities
• Unspecific mandates
Yes, We Too Have a Triangle!
Where it Starts: Scalable and Flexible Backup/Archive
[Diagram: backup clients, NAS, NAS or file server, disk-based object storage, tape archive, cloud service]
How To Play
Looking for Collaborators
The ideal collaborator:
• Has an immediate need that is within our current feature set and scale
– This tells us that you can/will invest time with us
• Has additional needs that will put us to the test
• Is an existing client of Cambridge Computer, or
– Is willing to become one, or
– Is able to contribute some funds, or
– Is able to make a meaningful investment in time
If not now, maybe next year!
• Email me: [email protected]