Post on 22-Dec-2015
ADMIRAL progress summary
Graham Klyne
Image Bioinformatics Research Group
Zoology Department, Oxford University
ADMIRAL Project Meeting, 20 May 2010
Institution and subject context
ADMIRAL targets small life science research groups (3-6 people) in a prestigious research department, each with world-class leadership
Research topics are very diverse:
  Silk: properties and genetic factors
  Animal behaviour: learning and decision making
  Evolutionary development: evolution of genes
  Elephant conservation in Africa
Seasonal field and laboratory data collection, interspersed with analysis and interpretation
Diverse data, including spreadsheets, images, videos and genetic sequences
Our approach to research users
Our researchers tend to be very busy, under pressure to publish high-impact papers e.g. in Nature
They are often away conducting field studies, using external facilities or at conferences
To gain their attention, we must offer something easily perceived to directly support their aims
We've tried to focus on “pain points”, providing solutions where they can already recognize problems or foresee needs
Sheer curation: “curation by addition” *
We'll take what they've got, then improve it incrementally through various tools and techniques
Start with raw data from a shared file system, with automatic backups
Add tools to support annotation, packaging and data-repository submission
Where possible, new tools should add immediate value
* “curation by addition” due to Ben O'Steen: http://oxfordrepo.blogspot.com/2008/10/modelling-and-storing-phonetics.html
Project structure recap
Data usage surveys to test requirements and assess improvements in data management
Phase 1: create a minimal front-to-back framework for dataset and metadata acquisition and repository submission
  Actually used by researchers
  Acquisition via file sharing system with parallel web access
  Annotation using Shuffl, creating RDF in the file system
  Repository submission by file transfer
Phase 2: selected incremental improvements, guided by feedback from researchers
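The Phase 1 annotation step stores RDF next to the data in the shared file system. The following is a minimal sketch of that idea only, not Shuffl's actual output format: it assumes a hypothetical helper `write_annotation` that writes Dublin Core terms as an N-Triples sidecar file beside a dataset file. The `.nt` sidecar naming and the choice of Dublin Core are assumptions made for illustration.

```python
from pathlib import Path

# Dublin Core terms namespace (a real vocabulary; its use here is an assumption).
DCTERMS = "http://purl.org/dc/terms/"


def _literal(text):
    """Escape a string as an N-Triples plain literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def write_annotation(data_file, metadata):
    """Write metadata about data_file as N-Triples in a sidecar file.

    metadata maps Dublin Core term names (e.g. "title") to string values.
    Returns the path of the sidecar file (hypothetical "<name>.nt" convention).
    """
    data_file = Path(data_file)
    subject = f"<{data_file.resolve().as_uri()}>"
    lines = [
        f"{subject} <{DCTERMS}{term}> {_literal(value)} ."
        for term, value in sorted(metadata.items())
    ]
    sidecar = data_file.with_name(data_file.name + ".nt")
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar
```

Because the annotation is an ordinary file in the shared area, it is covered by the same backups and access controls as the data it describes, and it can travel with the data when a dataset is submitted by file transfer.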
Progress to date
Surveys from 3 of 4 research groups, initial analyses
  Elephant conservation group field station lost in flash floods
Access-controlled shared file area with automatic backup, accessible locally and via the web, in use by the Silk Group
  Focusing on most engaged group, others to follow (soon!)
  1-2 months slower progress than anticipated – more later
Started adapting Shuffl to create annotations for repository submission
Discussing submission details with OULS
Critical path: Test repository environment
Test repository submission mechanisms
Elicit researcher feedback on repository submission process
Demonstrate and gather feedback on repository access
Elicit metadata requirements in “front-to-back” context
Leading to deployment of live repository environment to complete project phase 1
Data use surveys: data management concerns
Data loss
  Automatic backups
Controlled data sharing
  Most want easy sharing within their group
  Recognizing the value of data re-use, but having many valid reasons for resisting openness
Accessing and interpreting historical data
  Capturing sufficient metadata to allow colleagues and collaborators to find and understand data sets
  Locating and retrieving data
Some interest in funder mandates, versioning, visualization, annotation, long-term preservation
Balancing user engagement with usable outputs
In the style of agile development, we are aiming to engage users through working software, rather than just surveys and recommendations
Start simple, and be led by researchers' needs
A tension here between allocating effort to user engagement vs technical development – more later
Survey effort has been led by David Shotton, himself a life-science researcher, who also serves on occasion as a proxy user
Technical approach: local files and web access
The foundation of our technical approach has been to create an access-controlled file system accessible using common file sharing mechanisms and also using Web protocols
  Linux, Samba, Apache, HTTP/WebDAV, LDAP
  All off-the-shelf open source software
We have strived to make the access controls work uniformly for local and web access
Early attempts to connect with University SSO have been postponed
Further tooling builds on web access
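Since the shared file area is exposed over HTTP/WebDAV, further tooling can list and inspect files with standard WebDAV requests. The sketch below uses only the Python standard library to build a `PROPFIND` request with HTTP Basic authentication and to extract the `href` entries from a multistatus response. The host name, path, and credentials are placeholders, and the use of Basic authentication is an assumption; the slides do not describe how the ADMIRAL server is configured.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Minimal PROPFIND body asking only for the last-modified property.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:propfind xmlns:D="DAV:"><D:prop><D:getlastmodified/></D:prop></D:propfind>'
)


def build_propfind(url, user, password):
    """Build (but do not send) a WebDAV PROPFIND request for one directory level."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={
            "Depth": "1",  # the collection itself plus its immediate members
            "Content-Type": "application/xml",
            "Authorization": f"Basic {token}",  # assumed auth scheme
        },
    )


def list_hrefs(multistatus_xml):
    """Return the href of every resource in a WebDAV multistatus response body."""
    root = ET.fromstring(multistatus_xml)
    return [element.text for element in root.iter("{DAV:}href")]
```

A caller would send the request with `urllib.request.urlopen(build_propfind(...))` and pass the response body to `list_hrefs`. Running the same listing against the Samba share with the same user is one way to check that access controls behave uniformly for local and web access.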
Technical approach: production-quality outputs
This is not a toy system: we are accepting custody of real users' valuable research data
Automated, repeatable system configuration, automated testing and live system monitoring are all part of the development effort
Expandable virtualization platform specified in consultation with departmental IT support
Automated daily backup to University-run hierarchical storage manager
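Live monitoring of a daily backup can be as simple as checking that the newest backed-up file is recent enough. The sketch below is an illustration under stated assumptions, not the project's monitoring code: it assumes the backup is visible as a directory tree, and the function names and the 26-hour threshold (one day plus some slack) are hypothetical.

```python
import time
from pathlib import Path


def newest_mtime(directory):
    """Return the most recent modification time of any file under directory, or None."""
    mtimes = [p.stat().st_mtime for p in Path(directory).rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def backup_is_fresh(directory, max_age_hours=26.0, now=None):
    """Report whether the newest file under directory is within max_age_hours.

    An empty or missing backup tree counts as stale. The now parameter lets
    tests supply a fixed clock instead of the current time.
    """
    newest = newest_mtime(directory)
    if newest is None:
        return False
    current = time.time() if now is None else now
    return (current - newest) <= max_age_hours * 3600
```

A check like this can run from a scheduled job and raise an alert when it returns `False`, so that a silently failing backup is noticed before data is lost.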
Reflection: creating a user-led data management environment
Less is more (work!)
  Creating an initial system whose visible function is as close as possible to users' current practices is arguably harder than creating new functions
Uniform controlled access via file sharing and the web has been particularly challenging
Despite modest goals, the first project phase has presented awkward technical challenges
  Progress review at http://admiral-announce.blogspot.com/2010/04/sprint-6-and-progress-to-date-review.html
We have made a platform for web-based features to support evolving requirements