Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
Presented at the Ontologies in Data and Life Sciences Workshop 2013: https://wiki.imise.uni-leipzig.de/Gruppen/OBML/Workshops/2013ODLSen
Towards Biomedical Data Integration for
Analyzing the Evolution of Cognition
Amrapali Zaveri, Jens Lehmann, Katja Nowick
Outline
● Why study Evolution of Cognition?
● Research Questions
● Our Approach
● Datasets
  ○ Conversion
  ○ Interlinking
  ○ Querying and Preliminary Results
● Conclusions & Future Work
Why study Evolution of Cognition?
● Cognition refers to a group of mental processes that includes memory, attention, language (production and understanding), reasoning, learning, problem solving as well as decision making.
● Some aspects of cognition are human-specific, and it has been argued that human-specific evolutionary innovations have made us smarter on the one hand but more vulnerable to cognitive disorders on the other, e.g. autism and Alzheimer's disease.
Why study Evolution of Cognition?
● Mental processes involved in cognition are not controlled by a few individual genes but rather by the function and interplay of several hundreds, if not even thousands, of genes.
● Information on these genes is scattered across disparate databases or separate tables in publications.
● Querying across these databases is
  ○ time consuming: the data are in different formats
  ○ highly inefficient: when any one of the datasets is updated or changed
Research Questions *
• Which genes have been found to be positively selected in humans and have also been implicated in cognitive diseases?
• Which genes have been associated with human cognitive processes and show evidence of evolutionary signatures (changes) within primates?
• Which genes have been associated with cognitive decline during ageing in humans? Do they show differential expression patterns during development (ageing) when compared with other primates?
• Do genes involved in cognition and behaviour show high diversity within humans and higher divergence between humans and chimpanzees?
* Image source: http://www.scientificamerican.com/article.cfm?id=what-makes-us-human
Our Approach
• Use the Linking Open Data (LOD) principles
• Identify and acquire data from relevant disparate datasets
• Convert the data to a single human- and machine-readable format: RDF (Resource Description Framework)
• Integrate and interlink the datasets
• Query the integrated datasets
Datasets
● 11 datasets
● Genes: symbol, name, alternative names
● Diseases
● Chromosome location
● Cross-species information
Datasets Conversion
Available formats:
● CSV, TSV
● TXT
● PDF
Transformed to RDF using:
● SPARQLIFY
● LODRefine
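The conversion step can be illustrated with a minimal Python sketch. It does not reproduce how SPARQLIFY or LODRefine work; it only shows the general idea of turning one tabular row into RDF triples in N-Triples syntax. The column names, the sample row and the `http://example.org/vocab/` predicate namespace are assumptions made for illustration; only the `http://aksw.cogevo.org/gene/` URI pattern appears in the slides.

```python
import csv
import io

GENE_BASE = "http://aksw.cogevo.org/gene/"  # URI pattern shown in the slides
VOCAB = "http://example.org/vocab/"         # hypothetical predicate namespace


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(tsv_text):
    """Turn each row of a TSV table into N-Triples lines, one per non-empty cell."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        subject = "<" + GENE_BASE + row["symbol"] + ">"
        for column, value in row.items():
            if value:  # skip empty cells
                lines.append(
                    '%s <%s%s> "%s" .'
                    % (subject, VOCAB, column, escape_literal(value))
                )
    return lines


# Made-up sample table; the column names are assumptions.
sample = "symbol\tname\nFMR2\texample gene name\n"
for line in rows_to_ntriples(sample):
    print(line)
```

Each non-empty cell becomes one triple whose subject is the gene URI, so tables from different sources end up describing the same resource.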
Datasets Interlinking
● Each gene was given a unique identifier based on its gene symbol to create a URI (Uniform Resource Identifier): a single, globally re-usable resource.
● Common element: Gene Symbol
● Example: http://aksw.cogevo.org/gene/FMR2
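A minimal sketch of symbol-based interlinking, assuming the URI is simply the base `http://aksw.cogevo.org/gene/` followed by the gene symbol, as in the example above. The trimming and upper-casing of symbols, the function names and the dataset names are assumptions for illustration and are not taken from the slides.

```python
from urllib.parse import quote

GENE_BASE = "http://aksw.cogevo.org/gene/"  # URI pattern from the example above


def gene_uri(symbol):
    """Mint a gene URI from a symbol (trimming and upper-casing are assumed)."""
    return GENE_BASE + quote(symbol.strip().upper(), safe="")


def interlink(datasets):
    """Map each gene URI to the sorted names of the datasets that mention it."""
    links = {}
    for dataset_name, symbols in datasets.items():
        for symbol in symbols:
            links.setdefault(gene_uri(symbol), set()).add(dataset_name)
    return {uri: sorted(names) for uri, names in links.items()}


# Hypothetical dataset contents; only the symbol FMR2 appears in the slides.
datasets = {
    "dataset_a": ["FMR2", "GENE_X"],
    "dataset_b": [" fmr2 "],
}
for uri, names in interlink(datasets).items():
    print(uri, names)
```

Because both datasets resolve to the same URI for FMR2, records about that gene can be merged without any dataset-specific matching logic.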
Datasets Querying and Initial Results
• Integrated datasets are available at http://k41.bioinf.uni-leipzig.de:8890/sparql under the graph name http://aksw.cogevo.org
• Research Question:
  ○ “Which genes are involved in determining cognition and have changed during primate evolution?”
• Datasets:
  ○ ID-TFs
    ▪ Transcription Factors associated with Intelligence Disorder
  ○ Human Positive Selection Candidates
    ▪ dN/dS ratio: the number of mutations that change the amino acid sequence relative to the number of mutations that do not
    ▪ The higher this ratio, the faster the protein is evolving.
    ▪ dN/dS ratio > 1: the gene is evolving under positive selection
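The dN/dS criterion can be written as a small worked example. This is a sketch of the threshold rule stated on the slide, not the procedure used to compute the published ratios; the function names and the input values are made up for illustration.

```python
def dn_ds_ratio(dn, ds):
    """Return dN/dS; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


def under_positive_selection(ratio, threshold=1.0):
    """Apply the rule from the slide: a ratio above 1 suggests positive selection."""
    return ratio > threshold


# Made-up values for illustration only.
ratio = dn_ds_ratio(dn=0.04, ds=0.03)
print(round(ratio, 2), under_positive_selection(ratio))
```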
SPARQL Query
# The slides do not give the namespace URIs for the cog: and go: prefixes; they must be declared before running.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?symbol1 ?dnbydns
FROM <http://aksw.cogevo.org>
WHERE {
  ?gene1 rdf:type cog:gene .
  ?gene1 go:symbol ?symbol1 .
  ?gene1 cog:dnDs ?dnbydns .
  ?gene2 rdf:type cog:gene .
  ?gene2 go:symbol ?symbol2 .
  ?gene2 cog:nsid ?ns .
  FILTER (?symbol1 = ?symbol2)
}
Initial Results
Result: FMR2, dN/dS = 1.33
• FMR2 has changed significantly more during primate evolution and might be under positive selection in humans
• Patients with mutations in FMR2 have been reported to be mentally retarded and to show autistic behavior*.
* M. Bensaid, M. Melko, E.G. Bechara, L. Davidovic, A. Berretta, M.V. Catania, J. Gecz, E. Lalli and B. Bardoni. FRAXE-associated mental retardation protein (FMR2) is an RNA-binding protein with high affinity for G-quartet RNA forming structure. Nucleic Acids Research, 2009.
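The join that the SPARQL query on this slide expresses (genes with a dN/dS value whose symbol also occurs in the second dataset) can be mimicked on in-memory data. This sketch does not contact the endpoint; the function name and the placeholder entries are assumptions, and only FMR2 with dN/dS = 1.33 comes from the slides.

```python
def genes_in_both(selection_candidates, id_tfs):
    """Return (symbol, dN/dS) pairs for symbols present in both datasets."""
    return sorted(
        (symbol, ratio)
        for symbol, ratio in selection_candidates.items()
        if symbol in id_tfs  # plays the role of FILTER (?symbol1 = ?symbol2)
    )


# FMR2 and its ratio are from the slides; GENE_A and GENE_B are placeholders.
selection_candidates = {"FMR2": 1.33, "GENE_A": 1.10}
id_tfs = {"FMR2", "GENE_B"}
print(genes_in_both(selection_candidates, id_tfs))  # [('FMR2', 1.33)]
```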
Conclusions & Future Work
● Presented preliminary work and ideas on publishing biomedical data as Linked Data and demonstrated its use in analyzing the evolution of cognition.
Future Work:
● Perform complex queries
● Answer more research questions
● Add more datasets
● Interlink with external datasets
● Create a user interface
Thank You
Questions?
http://aksw.org/AmrapaliZaveri