Beyond Dead Trees: data & workflow publishing with GigaScience, by Scott Edmunds & Rob Davidson

Upload: gigascience-bgi-hong-kong

Post on 28-Jan-2015

DESCRIPTION

Scott Edmunds & Rob Davidson's talk at the Metabolomics Society 2014 Meeting on Beyond Dead Trees: data & workflow publishing with GigaScience, Tsuruoka 23rd June 2014

TRANSCRIPT

Page 1: Scott Edmunds & Rob Davidson's talk at the Metabolomics Society 2014 Meeting on Beyond Dead Trees: data & workflow publishing with GigaScience

Beyond Dead Trees: data & workflow publishing with GigaScience

Scott Edmunds & Rob Davidson

Page 2:

The problems with publishing

• “Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.” (Jon B. Buckheit and David L. Donoho, WaveLab and Reproducible Research, 1995)

• Lack of transparency, lack of credit for anything other than “regular” dead tree publication.

• Traditional publishing models, policies and practices holding things back

Page 3:

Why is this important?

Need:

• …to publish protocols BEFORE analysis
• …better access to supporting data/code
• …more transparent & accountable review
• …to publish replication studies

Page 4:

Consequences: increasing number of retractions

>15X increase in last decade

At the current rate of increase, by 2045 as many papers would be retracted as published

1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950

Page 5:

New incentives/credit

• Data
• Software
• Review
• Re-use…

= Credit

Credit where credit is overdue:
“One option would be to provide researchers who release data to public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”
Nature Biotechnology 27, 579 (2009)

Page 6:

GigaSolution: deconstructing the paper

www.gigadb.org
www.gigasciencejournal.com

Utilizes big-data infrastructure and expertise from:

Combines and integrates:

• Open-access journal
• Data Publishing Platform
• Data Analysis Platform

Page 7:

On top of regular papers…

Page 8:

Rewarding open data

http://gigadb.org/

Page 9:

• Multi-omics focus (not just genomics)
• 10-100x faster download than FTP
• Provide (ISA) curation & integration with other DBs (e.g. MetaboLights, SRA, etc.)

For more see: http://database.oxfordjournals.org/content/2014/bau018.abstract

Page 10:

IRRI GALAXY

Democratization through data publishing

Page 11:

IRRI GALAXY

Rice 3K project: 3,000 rice genomes, 13.4TB public data

Democratization through data publishing

Page 12:

Two tools for reproducible research

Rob Davidson

RO:and

Page 13:

GigaSolution: deconstructing the paper

www.gigadb.org
www.gigasciencejournal.com

Utilizes big-data infrastructure and expertise from:

Combines and integrates:

• Open-access journal
• Data Publishing Platform
• Data Analysis Platform

Page 14:

Visualizations & DOIs for workflows

galaxy.cbiit.cuhk.edu.hk

Page 15:

Rewarding and aiding reproducibility

Implement workflows in a community-accepted format

http://galaxyproject.org

• Over 36,000 main Galaxy server users
• Over 1,000 papers citing Galaxy use
• Over 55 Galaxy servers deployed
• Open source

Page 16:

Rewarding and aiding reproducibility

Implement workflows in a community-accepted format

[Screenshot with labelled panels: Tool list, Tool parameterisation, Results panel. Copyright NBAF-B 2013]

Page 17:

Birmingham Metabo-Galaxy Workflow

Page 18:

Birmingham Metabo-Galaxy

• Tools wrapped in Python and XML
• User sees web form (easy!)
• Data stored centrally (secure!)
• Work done centrally (easy update)
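
As a rough illustration of the "tools wrapped in Python and XML" idea above, the sketch below shows what the Python half of such a wrapper might look like. It is not the Birmingham Metabo-Galaxy code: the script, its `--input` and `--output` options, and the total-signal normalisation step are all hypothetical, chosen only to show how a small command-line script could sit behind a web form that passes in file paths.

```python
import argparse
import csv
import sys


def normalise_rows(rows):
    """Scale each sample's intensities so they sum to 1 (total-signal normalisation)."""
    out = []
    for name, values in rows:
        total = sum(values)
        # Guard against all-zero samples to avoid division by zero.
        scaled = [v / total if total else 0.0 for v in values]
        out.append((name, scaled))
    return out


def read_table(path):
    """Read a tab-separated table: first column is the sample name, the rest are intensities."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return [(r[0], [float(x) for x in r[1:]]) for r in reader if r]


def write_table(path, rows):
    """Write the normalised table back out as tab-separated text."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, values in rows:
            writer.writerow([name] + [f"{v:.6f}" for v in values])


def main(argv=None):
    # Hypothetical options: the XML tool definition would supply these paths.
    parser = argparse.ArgumentParser(description="Hypothetical normalisation wrapper")
    parser.add_argument("--input", required=True, help="tab-separated intensity table")
    parser.add_argument("--output", required=True, help="where to write the normalised table")
    args = parser.parse_args(argv)
    write_table(args.output, normalise_rows(read_table(args.input)))


# Only run the command line when arguments are supplied, so the module
# can also be imported or pasted into an interactive session.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

In a setup like this, the XML tool definition would declare the form fields and a command line that calls the script with the chosen dataset paths, so users only ever see the web form while the work runs centrally.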

Page 19:

First RAW -> stats Galaxy Pipe

Page 20:

SOAPdenovo2 S. aureus pipeline

Page 21:

Handling of imaging (phenotype) data

Cyber-centipedes & virtual worms

Page 22:

Aiding reproducibility

OMERO: providing access to imaging data

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry picked examples.

Download and reprocess.

Page 23:

JCB: Aiding reproducibility, adding value

Page 24:

The alternative...

...look but don't touch

Page 25:

In Summary

• Reproducibility is important!!
  – Currently not very common!
• Many tools appearing for data publishing and sharing (images, tools, workflows).
• Data publishing → more publications, more citations, more impact!
• Are you convinced?
• What barriers? Code standards? Data standards? Too much work?

Page 26:

Give us data, papers & pipelines*

Help us make it happen!

Contact us: www.gigasciencejournal.com

* APCs currently generously covered by BGI until 2015

Page 27:

Thanks to:

Our team: Peter Li, Huayan Gao, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC)

Our collaborators: Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST), Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford)

CBIIT

Funding from:

@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/

www.gigadb.org
galaxy.cbiit.cuhk.edu.hk
www.gigasciencejournal.com