Genome Annotation and Report System: an integrated system for structural and functional genome annotation




Joëlle Amselem¹, Nicolas Lapalu¹, Baptiste Brault¹, Michaël Alaux¹, Fabrice Legeai², Françoise Alfama¹, Laetitia Brigitte¹, Nathalie Choisne¹, Aminah Keliet¹, Erik Kimmel¹, Jonathan Kreplak¹, Isabelle Luyten¹, Cyril Pommier¹, Sébastien Reboux¹, Stéphanie Sidibe-Bocs³, Delphine Steinbach¹, Marc-Henri Lebrun⁴ and Hadi Quesneville¹

¹ INRA-URGI, Versailles, France; ² UMR BIO-3P, Rennes, France; ³ CIRAD, Montpellier, France; ⁴ INRA-BIOGER, Thiverval-Grignon, France

Contact : [email protected]

Genome Annotation and Report System: an integrated system for structural and functional genome annotation

Abstract

With the development of next-generation sequencing (NGS), more and more genomes are being sequenced, producing very large amounts of data. However, annotation cannot keep pace, which creates a gap between genome data and annotation releases. To face this challenge, the URGI platform (http://urgi.versailles.inra.fr) aims to provide tools to annotate entirely sequenced genomes, comprising pipelines, databases and user-friendly interfaces to browse and query the data. We focus here on complementary systems developed in the frame of the GnpIS Information System QuickSearch (http://urgi.versailles.inra.fr/gnpis) and GnpAnnot (http://www.gnpannot.org/) projects.

Thanks to : Genome project consortia

The Genome Report System (GRS): an interface to visualize information related to a gene and its genome locus and to edit functional annotation

The GRS, written in Java, was developed to produce varied and user-friendly Web reports. GRS uses structural and functional genomic data stored in a Chado DB to provide users with a comprehensive set of report categories (functional domains, BLAST hits, orthologous genes, same genome locus). GRS also offers a Gene Ontology browser and an editing module (GRE) to allow manual functional annotation.

Conclusion and Perspectives

The integrated genome annotation system was successfully set up for the annotation of the fungal genomes of Botrytis cinerea T4 (grey mould disease) and Leptosphaeria maculans (stem canker) (urgi.versailles.inra.fr/index.php/Species). GRS was developed and set up in the frame of the aphid sequencing and annotation project (www.aphidbase.com/aphidbase). The GRS edition service is still under development and will soon be released for these genomes.

Quick and Advanced search

The GnpIS gateway is based on the Apache Lucene™ full-featured text search engine library. GnpIS DBs have been indexed, including two well-known database (DB) schemas: SeqFeature::Store and Chado (gmod.org). Indexes are generated in several ways so that structural or functional data stored in the same or in separate DBs can be queried. Results are ranked by their relevance to the search terms and provide links to GBrowse or to the Genome Report System (GRS). BioMart-based datamarts were set up to serve as an advanced search tool. Results of complex searches can be exported in different formats or sent directly to a Galaxy framework for further bioinformatic analysis.
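The quick search idea (index text from several DBs, rank hits by relevance, return one result list per DB with a link to the matching interface) can be illustrated with a toy inverted index. This is only a minimal sketch in plain Python and not the Lucene-based implementation: the records, the term-count scoring and the URL patterns in `link_for` are hypothetical and chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical records standing in for entries indexed from separate DBs.
RECORDS = [
    {"db": "chado", "id": "gene0001",
     "text": "polyketide synthase involved in toxin biosynthesis"},
    {"db": "chado", "id": "gene0002",
     "text": "putative sugar transporter"},
    {"db": "seqfeature_store", "id": "domain0001",
     "text": "polyketide synthase ketoacyl synthase domain"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each term to {record position: number of occurrences}."""
    index = defaultdict(dict)
    for pos, record in enumerate(records):
        for term, count in Counter(tokenize(record["text"])).items():
            index[term][pos] = count
    return index


def link_for(record):
    """Return a link to the matching interface (hypothetical URL patterns)."""
    if record["db"] == "chado":
        return f"/grs/report?gene={record['id']}"
    return f"/gbrowse/?name={record['id']}"


def search(index, records, query):
    """Score records by summed term counts and group the hits per DB."""
    scores = Counter()
    for term in tokenize(query):
        for pos, count in index.get(term, {}).items():
            scores[pos] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    grouped = defaultdict(list)
    for pos, score in ranked:
        record = records[pos]
        grouped[record["db"]].append((record["id"], score, link_for(record)))
    return dict(grouped)


INDEX = build_index(RECORDS)
```

Calling `search(INDEX, RECORDS, "polyketide synthase")` returns one ranked list per DB that contains at least one hit, which mirrors the per-DB result lists described for the quick search. A real Lucene index uses a more elaborate relevance model than raw term counts.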

Figure 1: Quick search request on the whole of GnpIS. Results are displayed as lists for each DB containing at least one occurrence of the search keyword. For genome DBs, links lead to a GBrowse region or to a gene report.

Figure 2: BioMart-based advanced search request using filters (2A) and results displayed in columns for the selected attributes (2B).
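The filter and attribute model of the advanced search (filters restrict the rows, attributes choose the columns, and the result can be exported) can be sketched as follows. This is an illustration under stated assumptions and does not use the BioMart API: the row contents, the column names and the helper functions `query_mart` and `to_tsv` are hypothetical.

```python
import csv
import io

# Hypothetical flattened datamart rows; column names are illustrative only.
ROWS = [
    {"gene_id": "gene0001", "organism": "B. cinerea T4", "contig": "ctg1", "start": 1200},
    {"gene_id": "gene0002", "organism": "L. maculans", "contig": "ctg7", "start": 300},
    {"gene_id": "gene0003", "organism": "B. cinerea T4", "contig": "ctg2", "start": 4500},
]


def query_mart(rows, filters, attributes):
    """Keep rows matching every filter and project them onto the attributes."""
    selected = []
    for row in rows:
        if all(row[name] == value for name, value in filters.items()):
            selected.append({name: row[name] for name in attributes})
    return selected


def to_tsv(results, attributes):
    """Export the result table as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=attributes, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()
```

For example, `query_mart(ROWS, {"organism": "B. cinerea T4"}, ["gene_id", "start"])` keeps the two B. cinerea T4 rows and only the two requested columns. A tab-separated export of this kind is a common input format for downstream analysis tools.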

A distributed annotation system allowing gene structure & function manual curation

This system relies on the well-known GMOD databases and interfaces (Chado / GBrowse / Apollo). Curated data are available to, and shared by, the consortium community as soon as they are committed to the database, using the “pure JDBC” direct communication protocol between Apollo and Chado. Annotations produced by the structural and functional annotation pipelines are inserted into the Chado and Bio::DB::SeqFeature::Store DB schemas from GMOD, respectively.
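To make the storage model more concrete, the sketch below stores features and their locations in two tables named after Chado's `feature` and `featureloc` tables and retrieves the genes overlapping a region. It is a heavily simplified assumption, not the real schema: it runs on an in-memory SQLite database, keeps only a few columns, and uses a plain `type` text column where Chado references a controlled vocabulary term through `type_id`. The helper functions and identifiers are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create a simplified subset of two Chado-like tables."""
    conn.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL,
            type TEXT NOT NULL  -- simplification: Chado uses type_id -> cvterm
        );
        CREATE TABLE featureloc (
            featureloc_id INTEGER PRIMARY KEY,
            feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
            srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
            fmin INTEGER NOT NULL,  -- start, interbase coordinates
            fmax INTEGER NOT NULL,  -- end, interbase coordinates
            strand INTEGER
        );
    """)


def add_feature(conn, uniquename, ftype, src=None, fmin=None, fmax=None, strand=None):
    """Insert a feature and, if a source feature is given, its location."""
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type) VALUES (?, ?)", (uniquename, ftype)
    )
    feature_id = cur.lastrowid
    if src is not None:
        conn.execute(
            "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand) "
            "VALUES (?, ?, ?, ?, ?)",
            (feature_id, src, fmin, fmax, strand),
        )
    return feature_id


def genes_in_region(conn, src, start, end):
    """Return genes on `src` that overlap the interval [start, end)."""
    rows = conn.execute(
        """
        SELECT f.uniquename, l.fmin, l.fmax, l.strand
        FROM feature f JOIN featureloc l ON l.feature_id = f.feature_id
        WHERE f.type = 'gene' AND l.srcfeature_id = ?
          AND l.fmin < ? AND l.fmax > ?
        ORDER BY l.fmin
        """,
        (src, end, start),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
contig = add_feature(conn, "contig_1", "contig")
add_feature(conn, "gene0001", "gene", contig, 100, 900, 1)
add_feature(conn, "gene0002", "gene", contig, 1500, 2400, -1)
# In a shared database, committed rows become visible to other connections.
conn.commit()
```

A region query such as `genes_in_region(conn, contig, 0, 1000)` is the kind of read access a genome browser or report page needs, while the inserts followed by `commit()` stand in for the write access of a curation tool.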

Figure 3: Dataflow and interoperability of the integrated genome annotation system. Cross-reference links on the different interfaces allow interoperability between them. Black arrows: DB access in read mode. Red arrows: DB access in write mode (GRS write access is restricted to the "Edition" report). Diagram labels: Annotation pipelines (gene prediction, refinement analysis, functional annotation); Chado (structural & functional annotation); SeqFeature::Store (structural & functional annotation, GBrowse fast display); Manual curation (Apollo); Edition; Genome Report System; Quick search (Lucene/Hibernate); Advanced search (BioMart).

Figure 4: Genome Report System screenshot for one B. cinerea gene. Each category can be opened; each table, link or other pane displayed corresponds to a specific service developed.
1. Domain/motif InterProScan result service
2. SignalP/TargetP result service
3. Link to the ontology tree and cross-reference to all genes sharing the same GO term
4. Link to the S. sclerotiorum (B. cinerea B0510) orthologous gene report
5. Link to the domain/motif graphical display (GBrowse/SeqFeature::Store)
Panel labels: GBrowse 1.7; GBrowse_syn; Ref=polypeptide; Ref=genome assembly; S. sclerotiorum; B. cinerea T4; B. cinerea B0510.
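Service 3 above (cross-reference to all genes sharing the same GO term) amounts to a lookup in both directions between genes and terms. The following minimal sketch shows that lookup with in-memory dictionaries. The gene identifiers, the gene-to-term assignments and the function names are hypothetical, and the GRS itself reads such data from Chado rather than from a Python list.

```python
from collections import defaultdict

# Hypothetical gene-to-GO annotations used only for illustration.
ANNOTATIONS = [
    ("gene0001", "GO:0016740"),
    ("gene0002", "GO:0005215"),
    ("gene0003", "GO:0016740"),
    ("gene0003", "GO:0005215"),
]


def build_go_index(annotations):
    """Build lookups in both directions: gene -> terms and term -> genes."""
    by_gene = defaultdict(set)
    by_term = defaultdict(set)
    for gene, term in annotations:
        by_gene[gene].add(term)
        by_term[term].add(gene)
    return by_gene, by_term


def genes_sharing_go(gene, by_gene, by_term):
    """For each GO term of `gene`, list the other genes annotated with it."""
    return {
        term: sorted(by_term[term] - {gene})
        for term in sorted(by_gene.get(gene, ()))
    }


BY_GENE, BY_TERM = build_go_index(ANNOTATIONS)
```

With these data, `genes_sharing_go("gene0003", BY_GENE, BY_TERM)` lists `gene0002` under `GO:0005215` and `gene0001` under `GO:0016740`, which is the information a report page needs to render the cross-reference links.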