Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Jesualdo Tomás Fernández-Breis, Luigi Iannone, Ignazio Palmisano, Alan L. Rector, and Robert Stevens. October 12th 2010, Lisbon, Portugal

Upload: jesualdofernandez

Post on 13-Jan-2015


DESCRIPTION

Authors: J.T. Fernandez-Breis, L. Iannone, I. Palmisano, A. Rector, R. Stevens. Presented at 17th International Conference on Knowledge Engineering and Knowledge Management, EKAW2010

TRANSCRIPT

Page 1: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Jesualdo Tomás Fernández-Breis, Luigi Iannone, Ignazio Palmisano, Alan L. Rector, and Robert Stevens

October 12th 2010, Lisbon, Portugal

Page 2: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Motivation

• Biomedical Ontologies
  – The OBO Foundry
    • More than 200 biomedical ontologies
    • Some properties
      – Delineated content
      – Reuse of existing ontologies
      – Textual definitions
      – Systematic naming convention
    • Limited explicit semantics

Page 3: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Gene Ontology Consortium

Page 4: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language
Page 5: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Enrichment of GO Molecular Function

[Workflow diagram] Original GO MF → Dissection of the Ontology → Analysis of Labels → Identification of Linguistic Patterns → Design of Knowledge Patterns → Execution of the Knowledge Patterns → Enriched GO MF

Page 6: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Dissection of the ontology into its semantic axes

• Normalization

• Analysis of the labels
  – Biochemical substances
  – Biological processes
  – Cellular component

• Reuse and combination of existing ontologies

Page 7: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

[Diagram: auxiliary ontologies] MyAuxiliarOntology, Biological Process, MySubstances, FMA, Relations Ontology, CHEBI, MyProtein, EC-Primitive, Aminoacid, Biochemical Complex, Cellular Component

Page 8: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Design of linguistic patterns from labels

• Manual analysis of the structure of the labels by taxonomies

• Some linguistic patterns
  – “X binding”
  – “X codon amino acid adaptor activity”
  – “base pairing with X”
  – “translation X factor activity”
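The four label patterns above can be read as regular expressions with a variable part X. The following is a minimal sketch of that reading, not the authors' tooling: the regex encodings, the `dissect` helper, and the sample labels are all assumptions made for illustration.

```python
import re

# Hypothetical regex encodings of the four label patterns listed above;
# the group named "x" captures the variable part X of each label.
LINGUISTIC_PATTERNS = {
    "X binding": re.compile(r"^(?P<x>.+) binding$"),
    "X codon amino acid adaptor activity": re.compile(
        r"^(?P<x>.+) codon amino acid adaptor activity$"
    ),
    "base pairing with X": re.compile(r"^base pairing with (?P<x>.+)$"),
    "translation X factor activity": re.compile(
        r"^translation (?P<x>.+) factor activity$"
    ),
}


def dissect(label):
    """Return (pattern name, X) for the first pattern matching the label, else None."""
    for name, regex in LINGUISTIC_PATTERNS.items():
        match = regex.match(label)
        if match:
            return name, match.group("x")
    return None


# Illustrative labels only; they are not taken from the slides.
for label in ("calcium ion binding", "base pairing with mRNA", "kinase activity"):
    print(label, "->", dissect(label))
```

Labels that match no pattern return `None`, which corresponds to the classes that a pattern-based approach leaves unenriched.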

Page 9: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Design of knowledge patterns

• Some knowledge patterns

binding = molecular_function and enables some (binds some chemical_substance or binds some cellular_component)

triplet_codon_amino_acid_adaptor_activity = molecular_function and enables some (adapts some (amino_acid and recognizes some triplet))

Page 10: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Execution of the knowledge patterns

• OPPL Version 2
  – http://oppl2.sourceforge.net/

• Bulk manipulation of OWL ontologies
  – Enrichment, Verification, Patterns
  – Manchester OWL Syntax

• Declarative
  – OWL Axioms, variables, regular expressions

Page 11: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

OPPL Use case

[Diagram labels: OWL axioms; Values; OPPL Script; Lean → Rich]

Egaña et al. OWLED 2008 & EKAW 2008, Iannone ESWC 2009

Page 12: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

A pattern as an OPPL script

?y:CLASS=Match("((\w+))_codon_amino_acid_adaptor_activity"),
?x:CLASS=create(?y.GROUPS(1))

SELECT ?y subClassOf Thing
WHERE ?y Match("((\w+))_codon_amino_acid_adaptor_activity")

BEGIN

ADD ?y subClassOf molecular_function,
ADD ?y subClassOf enables some (adapts some (amino_acid and recognizes some ?x))

END;
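To make the effect of the script concrete, here is a rough Python emulation of what it does to a single class name. This is only a sketch under stated assumptions: it treats the match as a full match on the class name, it emits the axioms as plain strings rather than OWL objects, and the `axioms_for` helper and the sample class name are hypothetical.

```python
import re

# Same regular expression as in the script above: group 1 captures the codon name.
NAME_PATTERN = re.compile(r"((\w+))_codon_amino_acid_adaptor_activity")


def axioms_for(class_name):
    """Return the two subclass axioms the script would add for a matching class name."""
    match = NAME_PATTERN.fullmatch(class_name)  # assumption: full-name match
    if match is None:
        return []  # non-matching classes are filtered out, as by the WHERE clause
    x = match.group(1)  # plays the role of ?y.GROUPS(1), i.e. the created class ?x
    return [
        f"{class_name} subClassOf molecular_function",
        f"{class_name} subClassOf enables some "
        f"(adapts some (amino_acid and recognizes some {x}))",
    ]


# Hypothetical class name, used only to show the shape of the output.
for axiom in axioms_for("AAA_codon_amino_acid_adaptor_activity"):
    print(axiom)
```

The point of the sketch is the division of labour in the script: the regular expression selects the classes and extracts the variable part, and the ADD clauses instantiate one fixed axiom template per selected class.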

Page 13: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Results - Scope

• The “source” Gene Ontology
  – Version 1550
  – 8548 classes, 5 object properties (OP), 5 data properties (DP) and 9954 subclass axioms
  – Classification time: < 1 sec (Fact++)

• Scope of this study (approx 18% GO MF)
  – binding
  – structural molecule activity
  – chaperone activity
  – proteasome regulator activity
  – electron carrier activity
  – enzyme regulator activity
  – translation regulator activity

• Complete results: http://miuras.inf.um.es/~mfoppl/

Page 14: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Results - Effectiveness

– 1567 descendant classes of binding

– Knowledge patterns:
  • Binding: 1228 / 1567 (78%)
  • Base pairing: 6 / 84

– Molecular adaptor activity (71/72)
  • Triplet codon amino acid activity (64/64)

• All the 7 binding patterns: 1336 / 1567 (85%)
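The percentages follow directly from the matched/total counts above. As a quick check, the sketch below recomputes them by rounding to the nearest whole percent; the rounding rule is an assumption, and the slide itself states only the 78% and 85% figures.

```python
# Matched / total counts exactly as reported on the slide.
COVERAGE = {
    "Binding": (1228, 1567),
    "Base pairing": (6, 84),
    "Molecular adaptor activity": (71, 72),
    "Triplet codon amino acid activity": (64, 64),
    "All the 7 binding patterns": (1336, 1567),
}


def percent(matched, total):
    """Coverage as a whole-number percentage (assumed rounding rule)."""
    return round(100 * matched / total)


for name, (matched, total) in COVERAGE.items():
    print(f"{name}: {matched}/{total} = {percent(matched, total)}%")
```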

Page 15: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Results - Enrichment (I)

[Figure panels: Before | After]

Page 16: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Results - Enrichment (II)

• The enriched GO MF
  – 58624 classes, 254 OP, 16 DP, 107631 subclass axioms, 264 equivalent class axioms and 488 disjoint class axioms
  – Classification time: approx 2 minutes (Fact++)
  – Due to the patterns
    • 584 new classes
      – Suboptimal auxiliary ontologies: D1 Dopamine
      – Use of abbreviated forms in GO MF: MAPK, IgX
    • 13 new OP
    • 3608 new subclass axioms

Page 17: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Results - Querying (III)

• We can make queries that were not possible with the original ontology:
  – Example: Molecular functions that bind substances that play a chemical role

Page 18: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Results - Findings (II)

• We can make queries that were not possible with the original ontology:
  – Example: Molecular functions that bind substances that play a chemical role

Page 19: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Results - Time (IV)

• Execution time of the binding patterns

Page 20: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Conclusions

• Patterns and OPPL are useful for supporting ontology enrichment processes

• The structure of the labels in biomedical ontologies embeds knowledge that can be extracted

• Benefits of encoding knowledge into patterns: modularity, maintenance and evolution

• Critical factor: the auxiliary ontologies

Page 21: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Further work

• Bio-evaluation of the patterns

• Identification of linguistic patterns using text mining techniques

• Application to the rest of GO MF and the other GO ontologies

• Alignment with efforts of the GO Consortium

Page 22: Enriching the Gene Ontology via the Dissection of Labels using the Ontology Pre-Processor Language

Jesualdo Tomás Fernández Breis [email protected]

http://webs.um.es/jfernand

Thanks for your attention!

Acknowledgements