met soc15 roccaserra-biocrates-datasharing
Streamlining Datasets Deposition to Public Repository:
Biocrates/EMBL Metabolights case study
Philippe Rocca-Serra, Ph.D., University of Oxford e-Research Centre
Background
• Need to disclose the data supporting published work
• MSI CIMR annotation guidelines
• Data deposition mandated by funders
• Creating a fast, efficient deposition pipeline
– need to engage with various stakeholders
Targeted Metabolomics
• Monitoring known sets of molecules (= targets)
– either belonging to the same network, or representative of sets of pathways and used as beacons for biological processes
• Application to high-throughput phenotyping
– plasma
– urine
• Large cohorts eligible for the approach
– n > 10000
The players (acknowledgement)
• Contact with ISA team in 2014, via Dr Marta Cascante and Dr Silvia Marin.
• Biocrates: Introduction to Dr Bernd Haas and Martin Buratti
• EMBL-EBI Metabolights: Reza Salek, Ken Haug
• ISA Team at University of Oxford: Alfie Abdul-Rahman
The components
• Biocrates MetIDQ software, Boron release
– XML schema (biocrates xsd)
– declaration and description of key entities
  • metabolite
  • samples
  • plate/well/injection runs
– quantitative measurements of metabolites
– NVT based description of sample attributes
• XSL Transformation / Java processing
– conversion of study metadata to ISA-Tab format
– conversion of metabolite quantitation to MAF format
– ontology and CV mark-up
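As an illustration of the conversion step, here is a minimal Python sketch that turns an XML export into a tab-delimited metabolite table. It is not the XSL/Java code used in the pipeline. The XML element and attribute names are invented for this example and do not follow the real Biocrates XSD, and the output is a simplified stand-in for a MAF file, with only an identification column and one column per sample.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export: element and attribute names are invented for
# illustration and do not follow the real Biocrates XSD.
SAMPLE_XML = """\
<export>
  <metabolite id="M1" name="Alanine"/>
  <metabolite id="M2" name="Glycine"/>
  <sample id="S1" material="plasma">
    <measurement metabolite="M1" concentration="312.5"/>
    <measurement metabolite="M2" concentration="241.0"/>
  </sample>
  <sample id="S2" material="plasma">
    <measurement metabolite="M1" concentration="298.1"/>
  </sample>
</export>
"""


def xml_to_metabolite_table(xml_text):
    """Return a tab-delimited table: one row per metabolite, one column per sample."""
    root = ET.fromstring(xml_text)

    # Map metabolite ids to display names, in document order.
    names = {m.get("id"): m.get("name") for m in root.findall("metabolite")}
    sample_ids = [s.get("id") for s in root.findall("sample")]

    # Index concentrations by (metabolite id, sample id).
    values = {}
    for sample in root.findall("sample"):
        for meas in sample.findall("measurement"):
            key = (meas.get("metabolite"), sample.get("id"))
            values[key] = meas.get("concentration")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Simplified header: a real MAF file carries more descriptive columns.
    writer.writerow(["metabolite_identification"] + sample_ids)
    for met_id, name in names.items():
        # Missing measurements are left as empty cells.
        writer.writerow([name] + [values.get((met_id, sid), "") for sid in sample_ids])
    return out.getvalue()


table = xml_to_metabolite_table(SAMPLE_XML)
print(table)
```

The study metadata (samples, plates, runs) would be mapped to ISA-Tab files in a similar way, with ontology and CV terms attached during curation.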
A streamlined deposition process
1. Export XML document from MetIDQ
2. Aspera protocol upload to EMBL-EBI (XML document & raw MS data)
3. Conversion and curation at EMBL-EBI
4. Minting of an EMBL Metabolights Accession Number
Pipeline Validation
• 3 datasets currently being handled at EBI
– 147 samples (mouse, plasma)
– 7000 samples (human, plasma)
– 20000 samples (human, plasma & urine)
• Huge potential for meta-analysis and dataset integration
• Lost opportunities of ‘invisible’ datasets
Recovering/Rescuing Data supporting PLOS One publications
• http://journals.plos.org/plosone/s/data-availability
Take Home Message
• Data Custodians and Suppliers can work together efficiently to avoid data loss.
• Depositing data sets (large and small) does not have to be taxing.
• Engaging with platform vendors is highly beneficial to the community.
– We need more of these interactions
– Vendors, help your customers publish and share!
• ISA-Tab ensures visibility of your datasets
– get an EMBL Metabolights Accession Number
– get a DOI by submitting your data article to NPG Scientific Data