Software Development by the Genomics Standards Consortium

Bringing Standards to Life: Software Development by the Genomics Standards Consortium. Renzo Kottmann, Microbial Genomics Group, Max Planck Institute for Marine Microbiology. M3 SIG, Stockholm, July 2009.

Upload: renzo-kottmann

Post on 28-Jan-2015

DESCRIPTION

Presentation held at the M3 SIG meeting at ISMB 2009 in Stockholm. Its purpose is to show the audience the software development activities of the Genomic Standards Consortium. See also http://gensc.org

TRANSCRIPT

Page 1: Software Development by the Genomics  Standards Consortium

Bringing Standards to Life:

Software Development by the Genomics Standards Consortium

Renzo Kottmann Microbial Genomics Group

Max Planck Institute for Marine Microbiology

M3 SIG Stockholm July 2009

Page 2: Software Development by the Genomics  Standards Consortium

Genomic Standards Consortium (GSC)

Goal

• Promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data

Open-membership, international working body

• Established in Sept 2005
• Participants include DDBJ, EMBL, GenBank, Sanger, JCVI, JGI, EBI and a range of US, UK and EU research institutions
• Organized a series of workshops

http://gensc.org and http://gensc.org/gc_wiki/index.php/GSC_Membership

Page 3: Software Development by the Genomics  Standards Consortium

Minimum Information about a Genome Sequence (MIGS) Specification

MIGS extends what DDBJ/EMBL/GenBank request upon submission of a genome sequence

• Examples:

Description of geographic location of a sample and habitat

“Minimum Information about a Metagenomic Sequence” (MIMS)

– Temperature
– pH

Description of sequence generation

– Sequencing method
– Assembly method

Field et al. Nat Biotechnol. 2008
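A checklist like this lends itself to an automated completeness check. The following is a minimal sketch only: the descriptor names are illustrative keys derived from the examples on this slide, not the official MIGS field identifiers, and treating all of them as required is an assumption made for the example.

```python
# Illustrative descriptor keys based on the slide's examples.
# They are NOT the official MIGS/MIMS field identifiers.
EXAMPLE_DESCRIPTORS = (
    "geographic_location",
    "habitat",
    "temperature",
    "ph",
    "sequencing_method",
    "assembly_method",
)


def missing_descriptors(report, required=EXAMPLE_DESCRIPTORS):
    """Return the required descriptors that are absent or empty in a report."""
    return [name for name in required if report.get(name) in (None, "")]


# A partially filled report; an empty string counts as "not provided".
report = {
    "geographic_location": "Kiel Fjord, Baltic Sea, Germany",
    "habitat": "Aquatic: Marine",
    "sequencing_method": "",
}
print(missing_descriptors(report))
```

A real validator would take the list of mandatory fields from the published checklist rather than from a hard-coded tuple.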

Page 4: Software Development by the Genomics  Standards Consortium

MIGS Checklist 2.0

Field et al. Nat Biotechnol. 2008

Page 5: Software Development by the Genomics  Standards Consortium

MIGS Checklist 2.0

M = mandatory

Field et al. Nat Biotechnol. 2008

Page 10: Software Development by the Genomics  Standards Consortium

Software Development for MIGS/MIMS

Mechanisms for achieving compliance are needed:

• Such mechanisms involve an appropriate reporting structure for capturing and exchanging data, software, databases, and controlled vocabularies and/or ontologies for defining the terms used in the annotations.

Supporting Projects:

• Habitat-Lite (Ontology specification)
• Genomic Rosetta Stone (Identifier Mapping)
• GCDML (MIGS/MIMS specification in XML)
• Genome Catalogue (Database and Web Server)

Field et al. Nat Biotechnol. 2008

Page 11: Software Development by the Genomics  Standards Consortium

Habitat-Lite (= EnvO-Lite)

Easy-to-use (small) set of terms

• Captures high-level information about habitat
• Derived from the Environment Ontology (EnvO)

Meets the needs of multiple users

• Annotators, database providers, biologists, and bioinformaticians alike who need to search and employ such data in comparative analyses

Aquatic, Aquatic: Freshwater, Aquatic: Marine, Terrestrial, Air, Fossil, Food, Organism-Associated, Extreme Habitat, Other

Hirschman et al. OMICS. 2008

Page 12: Software Development by the Genomics  Standards Consortium

Habitat-Lite

1. Level

• Aquatic
• Aquatic: Freshwater
• Aquatic: Marine
• Terrestrial
• Air
• Fossil
• Food
• Organism-Associated
• Extreme Habitat
• Other

2. Level

• soil
• sediment
• sludge
• waste water
• hot spring
• hydrothermal vent
• biofilm
• microbial mat

< 20 terms

Hirschman et al. OMICS. 2008

Page 13: Software Development by the Genomics  Standards Consortium

Habitat-Lite applied

http://www.megx.net/genomes

Page 14: Software Development by the Genomics  Standards Consortium

Genomic Rosetta Stone (GRS)

• Create a unified mapping between different genomic resources
• Improve navigation across these resources
• Enable the integration of this information in the near future

Van Brabant et al. OMICS. 2008
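The idea of a unified identifier mapping can be sketched as a small lookup structure. This is an illustration only, not the GRS implementation: the class `IdentifierMap`, the resource names and every identifier below are hypothetical placeholders.

```python
from collections import defaultdict


class IdentifierMap:
    """Maps identifiers of one genome across several resources (illustrative)."""

    def __init__(self):
        self._by_genome = defaultdict(dict)  # genome key -> {resource: identifier}
        self._reverse = {}                   # (resource, identifier) -> genome key

    def add(self, genome_key, resource, identifier):
        """Register the identifier a resource uses for a genome."""
        self._by_genome[genome_key][resource] = identifier
        self._reverse[(resource, identifier)] = genome_key

    def translate(self, resource, identifier, target_resource):
        """Return the identifier used by target_resource, or None if unknown."""
        genome_key = self._reverse.get((resource, identifier))
        if genome_key is None:
            return None
        return self._by_genome[genome_key].get(target_resource)


# Placeholder data: these are not real accession numbers.
mapping = IdentifierMap()
mapping.add("genome-1", "resource_a", "A-0001")
mapping.add("genome-1", "resource_b", "B-0042")
print(mapping.translate("resource_a", "A-0001", "resource_b"))
```

Keeping one internal key per genome means each new resource adds one entry per genome instead of one entry per pair of resources.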

Page 15: Software Development by the Genomics  Standards Consortium

Genomic Rosetta Stone (GRS)

Van Brabant et al. OMICS. 2008

Page 17: Software Development by the Genomics  Standards Consortium

Genomic Contextual Data Markup Language (GCDML)

An Extensible Markup Language (XML) based language

Aim

• Implement MIGS/MIMS
• Provide even more descriptors
• Facilitate exchange and integration of genomic data

Kottmann et al. OMICS. 2008

Page 18: Software Development by the Genomics  Standards Consortium

GCDML Example (excerpt)

<gcdml:originalSample>
  <gcdml:physicalMaterial>
    <gcdml:samplingTime><gcdml:notGiven>unknown</gcdml:notGiven></gcdml:samplingTime>
    <gcdml:samplePointLocation>
      <gml:LocationKeyWord>Baltic Sea</gml:LocationKeyWord>
      <gml:LocationString>Kiel Fjord, Baltic Sea, Germany</gml:LocationString>
      <gcdml:pos2D>54.329 10.149</gcdml:pos2D>
      <gcdml:determinationMethod>derived from literature</gcdml:determinationMethod>
    </gcdml:samplePointLocation>
    <gcdml:marineHabitat>
      <gcdml:waterBody>
        <gcdml:depth>
          <gcdml:measure min="0.00" max="0.05"><gcdml:values uom="m">0.00 0.05</gcdml:values></gcdml:measure>
        </gcdml:depth>
      </gcdml:waterBody>
    </gcdml:marineHabitat>
    <gcdml:materialType>seawater</gcdml:materialType>
    <gcdml:amount><gcdml:measure><gcdml:values uom="ml">100</gcdml:values></gcdml:measure></gcdml:amount>
  </gcdml:physicalMaterial>
</gcdml:originalSample>

Kottmann et al. OMICS. 2008
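To illustrate how such a report could be consumed by software, here is a minimal sketch that reads position, depth and material type from a shortened copy of the excerpt with Python's standard `xml.etree.ElementTree`. The excerpt on the slide carries no namespace declarations, so the ones added here are assumptions: `urn:example:gcdml` is a placeholder and not the real GCDML namespace URI.

```python
import xml.etree.ElementTree as ET

NS = {
    "gcdml": "urn:example:gcdml",         # placeholder, NOT the real GCDML namespace
    "gml": "http://www.opengis.net/gml",  # assumed GML namespace
}

# Shortened copy of the slide excerpt, with namespace declarations added.
EXCERPT = """
<gcdml:originalSample xmlns:gcdml="urn:example:gcdml"
                      xmlns:gml="http://www.opengis.net/gml">
  <gcdml:physicalMaterial>
    <gcdml:samplePointLocation>
      <gml:LocationString>Kiel Fjord, Baltic Sea, Germany</gml:LocationString>
      <gcdml:pos2D>54.329 10.149</gcdml:pos2D>
    </gcdml:samplePointLocation>
    <gcdml:marineHabitat>
      <gcdml:waterBody>
        <gcdml:depth>
          <gcdml:measure min="0.00" max="0.05"><gcdml:values uom="m">0.00 0.05</gcdml:values></gcdml:measure>
        </gcdml:depth>
      </gcdml:waterBody>
    </gcdml:marineHabitat>
    <gcdml:materialType>seawater</gcdml:materialType>
  </gcdml:physicalMaterial>
</gcdml:originalSample>
"""


def read_sample(xml_text):
    """Extract a few descriptors from a GCDML-like originalSample element."""
    root = ET.fromstring(xml_text.strip())
    material = root.find("gcdml:physicalMaterial", NS)
    pos = material.findtext("gcdml:samplePointLocation/gcdml:pos2D", namespaces=NS)
    first, second = (float(value) for value in pos.split())
    measure = material.find(
        "gcdml:marineHabitat/gcdml:waterBody/gcdml:depth/gcdml:measure", NS
    )
    return {
        "location": material.findtext(
            "gcdml:samplePointLocation/gml:LocationString", namespaces=NS
        ),
        "pos2d": (first, second),
        "depth_min": float(measure.get("min")),
        "depth_max": float(measure.get("max")),
        "depth_unit": measure.find("gcdml:values", NS).get("uom"),
        "material_type": material.findtext("gcdml:materialType", namespaces=NS),
    }


sample = read_sample(EXCERPT)
print(sample["location"], sample["pos2d"], sample["depth_max"], sample["depth_unit"])
```

Production code would validate the document against the GCDML schema first and handle missing elements instead of assuming they are present.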

Page 21: Software Development by the Genomics  Standards Consortium

Genome Catalogue

Online system for capturing MIGS/MIMS compliant reports

Field et al. Nature 2008

Page 22: Software Development by the Genomics  Standards Consortium

Genome Catalogue

Requirements

• Rich toolkit, user-friendly
• Designed to give credit to all contributors
• XML-based (GCDML): able to maintain all versions of GCDML schemas
• Web services-based: supporting the automated exchange of content
• Serve as the international GCAT identifier authority
• Comprehensive: containing reports for all taxa and metagenomes
• Ontology-supportive
• Shared by the GSC

Page 23: Software Development by the Genomics  Standards Consortium

Current Status

We have specifications:

• MIGS/MIMS
• Habitat-Lite
• Genomic Rosetta Stone

Work on supporting software is ongoing:

• Genome Catalogue is in prototype status
• Funding: this is a long-term endeavour that cannot be done on a voluntary basis

Page 24: Software Development by the Genomics  Standards Consortium

Discussion

Need for software for:

• Creation of MIGS/MIMS data
• Storage
• Analysis

Expand standardization efforts to:

• Software specification/development
• Work on a standardized genomic data management architecture / cyberinfrastructure

Data-intensive science is successful if it works towards one community with one vision

• World Wide Genomics project

Page 25: Software Development by the Genomics  Standards Consortium

Acknowledgements

All members of the GSC, incl.

• Dawn Field
• Peter Sterk
• Saul Kravitz
• Tanya Gray

Megx.net team

• Frank Oliver Glöckner
• Ivaylo Kostadinov
• Melissa Beth Duhaime
• Pier Luigi Buttigieg
• Wolfgang Hankeln
• Pelin Yilmaz

Page 26: Software Development by the Genomics  Standards Consortium

END

Looking forward to the discussion

Join the GSC

http://gensc.org