2015 11 agris-medes

AGRIS: From a bibliographical database to a linked open data application extending knowledge mining to the world wide web. Fabrizio Celli and Johannes Keizer – 04/11/2015

Upload: johannes-keizer

Post on 16-Apr-2017


TRANSCRIPT

Page 1: 2015 11 agris-medes

AGRIS: From a bibliographical database to a linked open data application extending knowledge mining to the world wide web

Fabrizio Celli and Johannes Keizer – 04/11/2015

Page 2: 2015 11 agris-medes

Fabrizio Celli, Johannes Keizer (http://aims.fao.org)

Outline

• What is AGRIS?
• (S)Mash-up!
• Mining and indexing the web

Page 3: 2015 11 agris-medes

WHAT IS AGRIS?

Page 4: 2015 11 agris-medes


AGRIS
• The International System for Agricultural Science and Technology
• A collection of more than 8 million multilingual bibliographic resources
• A network of more than 150 institutions from 65 countries
• A Web portal (http://agris.fao.org/)

Page 5: 2015 11 agris-medes


Page 6: 2015 11 agris-medes


AGRIS 2001

Page 7: 2015 11 agris-medes


AGRIS 2001


Page 8: 2015 11 agris-medes


AGRIS 2015

Page 9: 2015 11 agris-medes


AGRIS users

• Researchers, professors, graduate students looking for bibliographies
• Librarians, cataloguers
• Small journal publishers, professional associations, conference organizers
• Government officers asking for reports on a specific topic

Page 10: 2015 11 agris-medes


Impact


• It supports both developed and developing countries
• Accessed from more than 200 countries and territories

Google Analytics October 2015

Page 11: 2015 11 agris-medes


Statistics

8,142,755 multilingual bibliographic records
• ~ 400,000 from Latin America
• ~ 150,000 from Africa
• ~ 760,000 from Asia + 400,000 links to CASDD (China)

253,286,038 triples

Page 12: 2015 11 agris-medes

(S)Mash-up!


Page 13: 2015 11 agris-medes


LOD infrastructure

In December 2013, AGRIS moved to the RDF world

Generation of mashup pages
• users looking for specific topics can access a publication from the AGRIS database, combined with related resources extracted from other preselected datasets
• external resources are not only bibliographic metadata, but also distribution maps, statistics, germplasm accessions, and so on.

Page 14: 2015 11 agris-medes


The RDF-ization process

Translation of the AGRIS AP XML database to RDF
• Selection of existing vocabularies
• Data cleaning and normalization
• Index all records with the AGROVOC thesaurus
• Run the conversion and publish RDF data!

Selection of external datasets we want to interlink to AGRIS

Page 15: 2015 11 agris-medes


AGRIS RDF
bibo:Article
bibo:abstract
bibo:doi
bibo:isbn
bibo:presentedAt -> bibo:Conference -> dct:title
bibo:uri
dct:alternative
dct:creator -> foaf:Organization -> foaf:name
dct:creator -> foaf:Person -> foaf:name
dct:dateSubmitted
dct:description
dct:extent
dct:identifier
dct:language
dct:isPartOf
dct:issued
dct:publisher -> foaf:Organization -> foaf:name
dct:source
dct:subject
dct:title
dct:type
dct:rights
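As a rough illustration of what the conversion step produces, the sketch below serializes a flat record into N-Triples using a few of the properties listed above. It is not the AGRIS converter: the record URI pattern, the dictionary field names and the sample values are all hypothetical. Only the bibo and dct namespaces and the AGROVOC concept URI pattern are taken from public vocabularies and from the deck.

```python
# Namespace URIs for the prefixes used on the slide.
BIBO = "http://purl.org/ontology/bibo/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value, lang=None):
    """Serialize a Python string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def record_to_ntriples(record_uri, record):
    """Turn a flat record dict (hypothetical field names) into N-Triples lines."""
    subject = f"<{record_uri}>"
    lang = record.get("lang")
    lines = [f"{subject} <{RDF_TYPE}> <{BIBO}Article> ."]
    lines.append(f"{subject} <{DCT}title> {literal(record['title'], lang)} .")
    if "abstract" in record:
        lines.append(f"{subject} <{BIBO}abstract> {literal(record['abstract'], lang)} .")
    if "issued" in record:
        lines.append(f"{subject} <{DCT}issued> {literal(record['issued'])} .")
    # dct:subject points at AGROVOC concept URIs, not at plain strings.
    for concept_uri in record.get("subjects", []):
        lines.append(f"{subject} <{DCT}subject> <{concept_uri}> .")
    return lines


# Hypothetical record: the record URI and all field values are made up.
example = {
    "title": "An example title",
    "lang": "en",
    "issued": "2011",
    "subjects": ["http://aims.fao.org/aos/agrovoc/c_1474"],
}
triples = record_to_ntriples("http://example.org/records/XX2011000001", example)
print("\n".join(triples))
```

Writing the subject as a URI rather than a keyword string is what makes the later interlinking possible: every record that shares a concept can be reached through that concept.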

Page 16: 2015 11 agris-medes


AGROVOC
• The FAO multilingual vocabulary containing around 32 000 concepts in up to 21 languages
• Backbone: the magic that allows the interlinking to external datasets

Two ways to implement the interlinking:
• Using AGROVOC formal alignments to other thesauri
• Querying external web services with scientific names

Page 17: 2015 11 agris-medes


Relationships, Relationships
http://aims.fao.org/aos/agrovoc/c_1474.html

Page 18: 2015 11 agris-medes


Page 19: 2015 11 agris-medes


http://agris.fao.org

http://agris.fao.org/agris-search/search.do?recordID=PH2011000084

http://agris.fao.org/agris-search/search.do?recordID=PL2003002036

Page 20: 2015 11 agris-medes


Mashup

Page 21: 2015 11 agris-medes


From AGRIS to DBPedia

[Diagram]
AGRIS URI -> dcterms:subject -> AGROVOC URI
AGROVOC URI -> skos:closeMatch / skos:exactMatch -> DBPedia URI
DBPedia URI -> dbpedia-owl:abstract -> DBPedia Abstract
DBPedia URI -> foaf:isPrimaryTopicOf -> Wikipedia URL
DBPedia URI -> foaf:depiction -> DBPedia Picture

Entry point!

AGROVOC is the backbone

Page 22: 2015 11 agris-medes


SPARQL in action!

1. From an AGRIS URI, get the list of the AGROVOC URIs (dcterms:subject)

PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?agr
WHERE {
  <AGRIS_Uri> dct:subject ?agr .
}

2. For each AGROVOC URI

2.1. Get skos:closeMatch and skos:exactMatch (formal alignments to other thesauri)

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?em ?cm
WHERE {
  OPTIONAL { <AGROVOC_Uri> skos:exactMatch ?em } .
  OPTIONAL { <AGROVOC_Uri> skos:closeMatch ?cm } .
}
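Steps 1 and 2.1 can be sketched as two query templates that are filled in with a concrete URI and sent to a SPARQL endpoint. The deck says the real component is written in Java. The Python below is only an illustration. The helper names, the URI safety check and the endpoint address in the usage note are assumptions, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# The two queries from the slide, with a placeholder for the URI.
SUBJECTS_QUERY = """\
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?agr
WHERE {{
  <{agris_uri}> dct:subject ?agr .
}}"""

MATCHES_QUERY = """\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?em ?cm
WHERE {{
  OPTIONAL {{ <{agrovoc_uri}> skos:exactMatch ?em }}
  OPTIONAL {{ <{agrovoc_uri}> skos:closeMatch ?cm }}
}}"""


def check_uri(uri):
    """Reject characters that the SPARQL grammar does not allow inside <...>."""
    if any(ch in uri for ch in '<>"{}|^`\\') or any(ord(ch) <= 0x20 for ch in uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return uri


def subjects_query(agris_uri):
    """Step 1: AGROVOC URIs attached to one AGRIS record."""
    return SUBJECTS_QUERY.format(agris_uri=check_uri(agris_uri))


def matches_query(agrovoc_uri):
    """Step 2.1: exact and close matches of one AGROVOC concept."""
    return MATCHES_QUERY.format(agrovoc_uri=check_uri(agrovoc_uri))


def request_url(endpoint, query):
    """URL of a SPARQL Protocol GET request (the query travels in `query=`)."""
    return endpoint + "?" + urlencode({"query": query})
```

For example, `request_url("http://example.org/sparql", matches_query("http://aims.fao.org/aos/agrovoc/c_1474"))` yields a URL that could be fetched with any HTTP client. The endpoint address here is a placeholder, not the AGRIS one. Checking the URI before substitution keeps a malformed value from changing the shape of the query.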

Page 23: 2015 11 agris-medes


Get DBPedia

2.2. The Java code picks the DBPedia URIs out of the matches returned in 2.1; doing this in code avoids adding a FILTER to the SPARQL query (it’s heavy…)

2.3. For each DBPedia URI, query the DBPedia SPARQL endpoint to get information to display in an AGRIS widget

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?abs ?img ?wiki
WHERE {
  OPTIONAL { <DBP_Uri> dbpedia-owl:abstract ?abs } .
  OPTIONAL { <DBP_Uri> foaf:depiction ?img } .
  OPTIONAL { <DBP_Uri> foaf:isPrimaryTopicOf ?wiki } .
  FILTER ( (lang(?abs) = "en") || (!bound(?abs)) )
}
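Steps 2.2 and 2.3 might look roughly like the following on the client side, assuming the endpoint answers in the standard SPARQL JSON results layout. This is a Python sketch, not the Java code the deck refers to. The function names are hypothetical, the sample result is hand-written, and matching on the `http://dbpedia.org/resource/` prefix is an assumption about how the DBPedia URIs are recognised.

```python
import json

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_uris(bindings):
    """Step 2.2: keep only DBPedia URIs among the ?em / ?cm values."""
    found = []
    for row in bindings:
        for var in ("em", "cm"):
            cell = row.get(var)  # OPTIONAL variables may be unbound
            if cell and cell["type"] == "uri" and cell["value"].startswith(DBPEDIA_PREFIX):
                if cell["value"] not in found:  # drop duplicates, keep order
                    found.append(cell["value"])
    return found


def widget_data(bindings):
    """Step 2.3: first abstract, picture and Wikipedia URL found in the rows."""
    data = {"abs": None, "img": None, "wiki": None}
    for row in bindings:
        for var in data:
            if data[var] is None and var in row:
                data[var] = row[var]["value"]
    return data


# Hand-written sample in the SPARQL JSON results layout (not real data).
sample = json.loads("""
{"head": {"vars": ["em", "cm"]},
 "results": {"bindings": [
   {"em": {"type": "uri", "value": "http://dbpedia.org/resource/Example"}},
   {"cm": {"type": "uri", "value": "http://example.org/thesaurus/123"}},
   {"em": {"type": "uri", "value": "http://dbpedia.org/resource/Example"},
    "cm": {"type": "uri", "value": "http://example.org/thesaurus/123"}}
 ]}}
""")
print(dbpedia_uris(sample["results"]["bindings"]))
# prints ['http://dbpedia.org/resource/Example']
```

A plain prefix test on a handful of strings is cheap in application code, which is consistent with the deck's remark that an extra FILTER in the query is the heavier option.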

Page 24: 2015 11 agris-medes


Bibliography

«Migrating bibliographic datasets to the Semantic Web: The AGRIS case». Stefano Anibaldi, Yves Jaques, Fabrizio Celli, Armando Stellato, Johannes Keizer. Semantic Web journal.

«OpenAGRIS: using bibliographical data for linking into the agricultural knowledge web». Fabrizio Celli, Stefano Anibaldi, Maria Folch, Yves Jaques, Johannes Keizer. AOS 2011.

Page 25: 2015 11 agris-medes


Mining and indexing the web

Page 26: 2015 11 agris-medes


The context
• Scientists and researchers publish their results not only in journals or at conferences, but also via web 2.0 tools and other media
• Corpora of ongoing research activities, unpublished material, grey literature, quick discussions, and experiments with negative results and ideas
• This information is usually unstructured and not exposed using web services

Page 27: 2015 11 agris-medes


Goal
• Crawl the web (manually preselected websites)
• Use machine learning algorithms to index discovered web resources with AGROVOC
• Select relevant resources using a recommender system
• Interlink to AGRIS!
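To make the indexing goal concrete, here is a deliberately simple stand-in: it tags a crawled page by looking up vocabulary labels in the text. This is plain dictionary matching, not the machine learning approach the deck mentions, and the vocabulary, its URIs and the sample page are all made up for the illustration.

```python
import re

# Hypothetical mini-vocabulary: label -> concept URI. The URIs are
# placeholders, not real AGROVOC identifiers.
VOCAB = {
    "maize": "http://example.org/concept/maize",
    "soil fertility": "http://example.org/concept/soil-fertility",
    "irrigation": "http://example.org/concept/irrigation",
}


def tag(text, vocab=VOCAB):
    """Return {concept URI: number of label occurrences} for one page of text."""
    # Lowercase and collapse punctuation/whitespace so multi-word labels match.
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    counts = {}
    for label, uri in vocab.items():
        hits = len(re.findall(rf"\b{re.escape(label)}\b", normalized))
        if hits:
            counts[uri] = hits
    return counts


page = "Maize trials: irrigation improved maize yields, but soil  fertility mattered more."
print(tag(page))
```

Whatever method does the tagging, the useful property is the output shape: each web resource ends up described by the same concept URIs as the AGRIS records, so the two collections become comparable.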

Page 28: 2015 11 agris-medes


Crawling and indexing


https://github.com/fcproj/agrotagger

Page 29: 2015 11 agris-medes


Recommender system


• A Java component that computes meaningful intersections between the Crawler Database and the AGRIS database

• Offline process, recommendations are stored in a triplestore
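One way to read "meaningful intersections" is as overlap between the concept sets of a crawled resource and an AGRIS record. The sketch below scores that overlap with the Jaccard index and writes the surviving pairs as triples, as an offline batch job might before loading them into a triplestore. The scoring method, both thresholds, the predicate URI and the sample data are assumptions made for the example. The deck does not say how the actual recommender decides.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(web_resources, agris_records, min_shared=2, min_score=0.3):
    """Yield (record URI, web URI, score) for pairs sharing enough concepts."""
    for record_uri, record_concepts in sorted(agris_records.items()):
        for web_uri, web_concepts in sorted(web_resources.items()):
            shared = record_concepts & web_concepts
            score = jaccard(record_concepts, web_concepts)
            if len(shared) >= min_shared and score >= min_score:
                yield record_uri, web_uri, round(score, 3)


def to_ntriples(recommendations,
                predicate="http://example.org/vocab/relatedWebResource"):
    """Serialize recommendations as N-Triples lines (hypothetical predicate)."""
    return [f"<{rec}> <{predicate}> <{web}> ." for rec, web, _ in recommendations]


# Made-up inputs: URI -> set of concept identifiers.
agris = {"http://example.org/records/1": {"c1", "c2", "c3"}}
web = {
    "http://example.org/blog/a": {"c1", "c2", "c4"},  # shares c1 and c2
    "http://example.org/blog/b": {"c3", "c9"},        # shares only c3
}
recs = list(recommend(web, agris))
print(recs)
print(to_ntriples(recs))
```

Comparing every record with every web resource is quadratic, which is one reason a job like this fits an offline process whose results are stored rather than computed per page view.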

Page 30: 2015 11 agris-medes


Interlinking


https://github.com/fcproj/recommender

Page 31: 2015 11 agris-medes


Page 32: 2015 11 agris-medes


Bibliography


«Discovering, Indexing and Interlinking Information Resources». Fabrizio Celli, Johannes Keizer, Yves Jaques, Stasinos Konstantopoulos, Dušan Vudragović. F1000 Research. Version 2 under revision.