Streamlining Datasets Deposition to Public Repository: Biocrates/EMBL Metabolights case study

Philippe Rocca-Serra, Ph.D
University of Oxford e-Research Centre
[email protected]


Post on 06-Aug-2015

Background

• Need for disclosing data supporting publication works

• MSI CIMR annotation guidelines

• Data deposition mandated by Funders

• Creating fast, efficient deposition pipeline
  – need to engage with various stakeholders

Targeted Metabolomics

• Monitoring known sets of molecules (= targets)
  – either belonging to the same network, or representative of sets of pathways and used as beacons for biological processes

• Application to high-throughput phenotyping
  – plasma
  – urine

• Large cohorts eligible for the approach
  – n > 10000

The players (acknowledgement)

• Contact with ISA team in 2014, via Dr Marta Cascante and Dr Silvia Marin.

• Biocrates: Introduction to Dr Bernd Haas and Martin Buratti

• EMBL-EBI Metabolights: Reza Salek, Ken Haug

• ISA Team at University of Oxford: Alfie Abdul-Rahman

The components

• Biocrates MetIDQ software, Boron release
  – XML schema (biocrates xsd)
  – declaration and description of key entities
    • metabolite
    • samples
    • plate/well/injection runs
  – quantitative measurements of metabolites
  – NVT based description of Samples attributes

• XSL Transformation / Java processing
  – conversion of study metadata to ISA-Tab format
  – conversion of metabolite quantitation to MAF format
  – ontology and CV mark-up
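The conversion step can be illustrated with a minimal sketch. It is written in Python rather than the XSLT/Java used in the actual pipeline, and every element and attribute name in the sample XML is invented for illustration; none of them is taken from the real biocrates xsd. The output tables only imitate the layout of an ISA-Tab sample table and of a MAF (one row per metabolite, one column per sample), without the full set of columns a real submission would need.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export: element and attribute names below are invented for
# illustration and do NOT follow the real biocrates xsd.
EXPORT_XML = """\
<export>
  <metabolite id="M1" name="Alanine"/>
  <metabolite id="M2" name="Glycine"/>
  <sample id="S1" organism="Mus musculus" material="plasma">
    <measurement metabolite="M1" concentration="312.5"/>
    <measurement metabolite="M2" concentration="198.0"/>
  </sample>
  <sample id="S2" organism="Mus musculus" material="plasma">
    <measurement metabolite="M1" concentration="287.1"/>
  </sample>
</export>
"""


def parse_export(xml_text):
    """Return (metabolites, samples) read from the hypothetical export."""
    root = ET.fromstring(xml_text)
    metabolites = {m.get("id"): m.get("name") for m in root.iter("metabolite")}
    samples = []
    for s in root.iter("sample"):
        values = {m.get("metabolite"): m.get("concentration")
                  for m in s.iter("measurement")}
        samples.append({"id": s.get("id"), "organism": s.get("organism"),
                        "material": s.get("material"), "values": values})
    return metabolites, samples


def to_sample_table(samples):
    """Tab-delimited sample metadata with ISA-Tab style column headers."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample Name", "Characteristics[organism]",
                     "Characteristics[organism part]"])
    for s in samples:
        writer.writerow([s["id"], s["organism"], s["material"]])
    return out.getvalue()


def to_quant_table(metabolites, samples):
    """One row per metabolite, one column per sample (MAF-like layout)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["metabolite_identification"] + [s["id"] for s in samples])
    for met_id, name in metabolites.items():
        # Missing measurements are left as empty cells.
        writer.writerow([name] + [s["values"].get(met_id, "") for s in samples])
    return out.getvalue()


metabolites, samples = parse_export(EXPORT_XML)
sample_table = to_sample_table(samples)
quant_table = to_quant_table(metabolites, samples)
print(sample_table)
print(quant_table)
```

Keeping the parsing separate from the two writers mirrors the split on the slide: study metadata goes to one table, metabolite quantitation to another.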

A streamlined deposition process

1. Export XML document from MetIDQ

2. Aspera Protocol upload to EMBL-EBI (XML document & raw MS data)

3. Conversion and curation at EMBL-EBI

4. Minting of an EMBL Metabolights Accession Number
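Before step 2, the submitter has to gather the payload: the XML export plus the raw MS data. A minimal sketch of such a check follows. The function name `collect_payload`, the file names, and the set of raw-file extensions are all assumptions made for illustration; the slides do not describe how the payload is assembled, and the upload itself is not shown.

```python
from pathlib import Path
import tempfile

# Assumption: raw MS files are recognised by extension; adjust this set to
# the instrument vendor formats actually in use.
RAW_EXTENSIONS = {".wiff", ".raw", ".d"}


def collect_payload(export_dir):
    """Return (xml_export, raw_files) found directly in export_dir.

    Raises ValueError unless there is exactly one XML export and at
    least one raw MS data file.
    """
    entries = sorted(Path(export_dir).iterdir())
    xml_files = [p for p in entries if p.suffix.lower() == ".xml"]
    raw_files = [p for p in entries if p.suffix.lower() in RAW_EXTENSIONS]
    if len(xml_files) != 1:
        raise ValueError(f"expected exactly one XML export, found {len(xml_files)}")
    if not raw_files:
        raise ValueError("no raw MS data files found")
    return xml_files[0], raw_files


# Usage with a throwaway directory standing in for a real export folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("study_export.xml", "plate1.wiff", "plate2.wiff", "notes.txt"):
        (Path(tmp) / name).touch()
    xml_export, raw_files = collect_payload(tmp)
    payload_names = [xml_export.name] + [p.name for p in raw_files]

print(payload_names)
```

Failing early when the export is missing or ambiguous keeps incomplete submissions from reaching the conversion and curation step.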

Pipeline Validation

• 3 Datasets currently being handled at EBI
  – 147 samples (mouse, plasma)
  – 7000 samples (human, plasma)
  – 20000 samples (human, plasma & urine)

• Huge potential for meta-analysis and dataset integration

• Lost opportunities of ‘invisible’ datasets

Recovering/Rescuing Data supporting PLOS One publications

• http://journals.plos.org/plosone/s/data-availability

Time bracket: 2011-2015

Take Home Message

• Data Custodians and Suppliers can work together efficiently to avoid data loss.

• Deposition of data sets (large and small) does not have to be taxing.

• Engaging with platform vendors is highly beneficial to the community.
  – We need more of these interactions
  – Vendors, help your customers publish and share!

• ISA-Tab ensures visibility of your datasets
  – get an EMBL Metabolights Accession Number
  – get a DOI by submitting your data article to NPG Scientific Data: a new category of publication that provides detailed descriptors of scientifically valuable datasets, whether or not they are associated with traditional article(s)