2015 11 agris-medes


AGRIS: From a bibliographical database to a linked open data application extending knowledge mining to the World Wide Web

Fabrizio Celli and Johannes Keizer – 04/11/2015

Fabrizio Celli, Johannes Keizer, http://aims.fao.org

Outline

• What is AGRIS?
• (S)Mash-up!
• Mining and indexing the web

WHAT IS AGRIS?


AGRIS
• The International System for Agricultural Science and Technology
• A collection of more than 8 million multilingual bibliographic resources
• A network of more than 150 institutions from 65 countries
• A Web portal (http://agris.fao.org/)


AGRIS 2001




AGRIS 2015


AGRIS users

• Researchers, professors, and graduate students looking for bibliographies
• Librarians and cataloguers
• Small journal publishers, professional associations, and conference organizers
• Government officers asking for reports on a specific topic


Impact


• It supports both developed and developing countries
• It is accessed from more than 200 countries and territories (Google Analytics, October 2015)


Statistics

• 8,142,755 multilingual bibliographic records:
  ~400,000 from Latin America
  ~150,000 from Africa
  ~760,000 from Asia, plus 400,000 links to CASDD (China)
• 253,286,038 triples

(S)Mash-up!


LOD infrastructure

Since December 2013, AGRIS has moved to the RDF world.

Generation of mashup pages:
• Users looking for specific topics can access a publication from the AGRIS database, combined with related resources extracted from other preselected datasets.
• External resources are not only bibliographic metadata, but also distribution maps, statistics, germplasm accessions, and so on.


The RDF-ization process

Translation of the AGRIS AP XML database to RDF:
• Selection of existing vocabularies
• Data cleaning and normalization
• Indexing of all records with the AGROVOC thesaurus
• Running the conversion and publishing the RDF data!

Selection of external datasets we want to interlink to AGRIS
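The parsing and cleaning steps can be pictured with a minimal Python sketch. It assumes a simplified, hypothetical XML layout (a namespace-free `record` element with `title`, `creator` and `subject` children, and a made-up record id); the real AGRIS AP schema uses its own namespaced elements, and this is not the conversion code AGRIS actually used.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for an AGRIS AP record.
SAMPLE = """
<record id="XX0000000001">
  <title>  Rice   yield under drought </title>
  <creator>Doe, J.</creator>
  <creator> Doe,  J. </creator>
  <subject>Rice</subject>
  <subject>rice</subject>
  <subject> drought </subject>
</record>
"""

def normalize(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())

def cleaned_values(root, tag, lower=False):
    """Normalized, non-empty, de-duplicated values of `tag`, in document order."""
    values = []
    for element in root.findall(tag):
        value = normalize(element.text or "")
        if lower:
            value = value.lower()
        if value and value not in values:
            values.append(value)
    return values

def parse_record(xml_text):
    """Parse one record into a cleaned dictionary ready for RDF conversion."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "title": normalize(root.findtext("title", default="")),
        "creators": cleaned_values(root, "creator"),
        "subjects": cleaned_values(root, "subject", lower=True),
    }

record = parse_record(SAMPLE)
print(record)
```

Cleaning before conversion means duplicates that differ only in spacing or case collapse to a single value, so they do not turn into duplicate triples later.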


AGRIS RDF
bibo:Article
  bibo:abstract
  bibo:doi
  bibo:isbn
  bibo:presentedAt -> bibo:Conference -> dct:title
  bibo:uri
  dct:alternative
  dct:creator -> foaf:Organization -> foaf:name
  dct:creator -> foaf:Person -> foaf:name
  dct:dateSubmitted
  dct:description
  dct:extent
  dct:identifier
  dct:language
  dct:isPartOf
  dct:issued
  dct:publisher -> foaf:Organization -> foaf:name
  dct:source
  dct:subject
  dct:title
  dct:type
  dct:rights
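To show how the nested paths of the model (for example dct:creator -> foaf:Person -> foaf:name) map to concrete RDF, here is a small Python sketch that writes one record as Turtle. The record URI and subject URI are hypothetical `example.org` placeholders, and only a few properties from the model are covered; it is an illustration, not the AGRIS serializer.

```python
# Namespaces of the vocabularies named in the AGRIS RDF model.
PREFIXES = {
    "bibo": "http://purl.org/ontology/bibo/",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def record_to_turtle(uri, title, person_names, subject_uris):
    """Serialize one record as a bibo:Article in Turtle."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{uri}> a bibo:Article ;")
    lines.append(f"    dct:title {literal(title)} ;")
    for name in person_names:
        # dct:creator -> foaf:Person -> foaf:name, written as a blank node
        lines.append(f"    dct:creator [ a foaf:Person ; foaf:name {literal(name)} ] ;")
    for subject in subject_uris:
        # dct:subject points to an AGROVOC concept URI
        lines.append(f"    dct:subject <{subject}> ;")
    # The last predicate-object pair ends the statement with " ." instead of " ;"
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)

ttl = record_to_turtle(
    "http://example.org/agris/XX0000000001",
    "Rice yield under drought",
    ["Doe, J."],
    ["http://example.org/agrovoc/rice"],
)
print(ttl)
```

Using blank nodes for people keeps the sketch short. A production converter might instead mint URIs for authors so that the same person can be linked across records.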


AGROVOC
• The FAO multilingual vocabulary, containing around 32,000 concepts in up to 21 languages
• Backbone: the magic that enables interlinking to external datasets
• Two ways to implement the interlinking:
  • Using AGROVOC formal alignments to other thesauri
  • Querying external web services with scientific names
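The two interlinking strategies can be combined roughly as in the Python sketch below. All URIs, the alignment table and the scientific-name table are hypothetical placeholders. `lookup_by_name` stands in for whatever external web service is queried, and the caller supplies it; no real service API is assumed.

```python
# Hypothetical data: AGROVOC-like concept URIs with SKOS alignments.
ALIGNMENTS = {
    "http://example.org/agrovoc/rice": {
        "exactMatch": ["http://example.org/thesaurus/rice"],
        "closeMatch": ["http://example.org/dbpedia/Rice"],
    },
}
# Hypothetical: scientific names attached to some concepts.
SCIENTIFIC_NAMES = {"http://example.org/agrovoc/rice": "Oryza sativa"}

def interlink(concept_uri, lookup_by_name):
    """Collect external URIs for one concept using both strategies.

    1. Formal alignments (skos:exactMatch, then skos:closeMatch).
    2. A lookup by scientific name; `lookup_by_name` stands in for an
       external web service and is supplied by the caller.
    """
    links = []
    alignment = ALIGNMENTS.get(concept_uri, {})
    links.extend(alignment.get("exactMatch", []))
    links.extend(alignment.get("closeMatch", []))
    name = SCIENTIFIC_NAMES.get(concept_uri)
    if name is not None:
        for uri in lookup_by_name(name):
            if uri not in links:
                links.append(uri)
    return links

def fake_species_service(name):
    """Offline stand-in for a species web service."""
    return ["http://example.org/species/" + name.replace(" ", "_")]

print(interlink("http://example.org/agrovoc/rice", fake_species_service))
```

Passing the lookup in as a function keeps the sketch testable offline. A real deployment would wrap an HTTP client around an actual service.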


Relationships, relationships
http://aims.fao.org/aos/agrovoc/c_1474.html


http://agris.fao.org

http://agris.fao.org/agris-search/search.do?recordID=PH2011000084

http://agris.fao.org/agris-search/search.do?recordID=PL2003002036


Mashup


From AGRIS to DBPedia

AGRIS URI -> dcterms:subject -> AGROVOC URI -> skos:closeMatch / skos:exactMatch -> DBPedia URI

From the DBPedia URI:
• dbpedia-owl:abstract -> DBPedia abstract
• foaf:depiction -> DBPedia picture
• foaf:isPrimaryTopicOf -> Wikipedia URL

Entry point: the AGRIS URI. AGROVOC is the backbone.
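The chain above can be walked over a tiny in-memory graph, as in this Python sketch. The triples, the shortened prefixed names and the "other:" target are hypothetical placeholders standing in for full URIs. A prefix check plays the role of keeping only DBpedia targets.

```python
# Hypothetical triples using shortened names; real data would use full URIs.
TRIPLES = [
    ("agris:rec1", "dcterms:subject", "agrovoc:rice"),
    ("agrovoc:rice", "skos:exactMatch", "dbpedia:Rice"),
    ("agrovoc:rice", "skos:closeMatch", "other:Rice"),
    ("dbpedia:Rice", "dbpedia-owl:abstract", "Placeholder abstract"),
    ("dbpedia:Rice", "foaf:depiction", "img:Rice.jpg"),
    ("dbpedia:Rice", "foaf:isPrimaryTopicOf", "wiki:Rice"),
]

def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def mashup(agris_uri):
    """Follow AGRIS -> AGROVOC -> DBpedia and gather the widget fields."""
    widgets = {}
    for concept in objects(agris_uri, "dcterms:subject"):
        aligned = objects(concept, "skos:exactMatch") + objects(concept, "skos:closeMatch")
        for target in aligned:
            if not target.startswith("dbpedia:"):
                continue  # only DBpedia resources feed this widget
            widgets[target] = {
                "abstract": objects(target, "dbpedia-owl:abstract"),
                "picture": objects(target, "foaf:depiction"),
                "wikipedia": objects(target, "foaf:isPrimaryTopicOf"),
            }
    return widgets

print(mashup("agris:rec1"))
```

In the real system each hop is a SPARQL query against a different endpoint rather than a list scan. The traversal order is the same.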


SPARQL in action!
1. From an AGRIS URI, get the list of AGROVOC URIs (dcterms:subject)

PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?agr WHERE {
  <AGRIS_Uri> dct:subject ?agr .
}

2. For each AGROVOC URI:
2.1. Get skos:closeMatch and skos:exactMatch (formal alignments to other thesauri)

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?em ?cm WHERE {
  # ?em and ?cm are matched independently: several values of each yield their cross product
  OPTIONAL { <AGROVOC_Uri> skos:exactMatch ?em } .
  OPTIONAL { <AGROVOC_Uri> skos:closeMatch ?cm } .
}
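One way to send the two queries above from code is sketched below in Python, using only the standard library. It follows the W3C SPARQL 1.1 Protocol (query passed as the `query` parameter of an HTTP GET) and reads the W3C SPARQL 1.1 Query Results JSON format. The endpoint URL is a hypothetical placeholder, and the deck's own implementation is in Java, not this code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Query templates; doubled braces are literal braces for str.format.
# URIs are inserted unescaped, so they must come from trusted data.
SUBJECTS_QUERY = """PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?agr WHERE {{ <{uri}> dct:subject ?agr . }}"""

ALIGNMENTS_QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?em ?cm WHERE {{
  OPTIONAL {{ <{uri}> skos:exactMatch ?em }}
  OPTIONAL {{ <{uri}> skos:closeMatch ?cm }}
}}"""

def build_request(endpoint, query):
    """Build an HTTP GET request as described by the SPARQL 1.1 Protocol."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def column(results_json, var):
    """Values bound to `var` in SPARQL JSON results (unbound rows are skipped)."""
    data = json.loads(results_json)
    return [b[var]["value"] for b in data["results"]["bindings"] if var in b]

def run(endpoint, query, var):
    """Execute a query against a live endpoint (requires network access)."""
    with urlopen(build_request(endpoint, query)) as response:
        return column(response.read().decode("utf-8"), var)
```

A call such as `run("http://example.org/sparql", SUBJECTS_QUERY.format(uri=agris_uri), "agr")` would return the AGROVOC URIs for step 1. Step 2.1 would then call `run` with `ALIGNMENTS_QUERY` once per URI and read the `em` and `cm` columns.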


Get DBPedia

2.2. The Java code filters DBPedia URIs itself instead of adding another FILTER to the SPARQL query, which would make the query heavier.

2.3. For each DBPedia URI, query the DBPedia SPARQL endpoint to get information to display in an AGRIS widget

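PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?abs ?img ?wiki WHERE {
  OPTIONAL { <DBP_Uri> dbpedia-owl:abstract ?abs } .
  OPTIONAL { <DBP_Uri> foaf:depiction ?img } .
  OPTIONAL { <DBP_Uri> foaf:isPrimaryTopicOf ?wiki } .
  # Keep English abstracts, or rows where no abstract was found
  FILTER ( (lang(?abs) = "en") || (!bound(?abs)) )
}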
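Steps 2.2 and 2.3 can be sketched in Python as follows. The filter assumes that a "DBPedia URI" means a URI under the English DBpedia namespace `http://dbpedia.org/resource/` (language editions such as `http://fr.dbpedia.org/resource/` would be skipped). The reducer assumes the widget query's results arrive as SPARQL JSON bindings and keeps the first value seen for each of `?abs`, `?img` and `?wiki`. Both are illustrative choices, not the AGRIS Java code.

```python
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"  # assumed: English DBpedia only

def dbpedia_only(uris):
    """Keep DBpedia resource URIs, preserving order and dropping duplicates."""
    kept = []
    for uri in uris:
        if uri.startswith(DBPEDIA_PREFIX) and uri not in kept:
            kept.append(uri)
    return kept

def widget_fields(bindings):
    """Reduce the widget query's bindings to one value per variable (or None)."""
    fields = {"abs": None, "img": None, "wiki": None}
    for row in bindings:
        for var in fields:
            if fields[var] is None and var in row:
                fields[var] = row[var]["value"]
    return fields

aligned = [
    "http://dbpedia.org/resource/Rice",
    "http://example.org/thesaurus/rice",
    "http://dbpedia.org/resource/Rice",
]
print(dbpedia_only(aligned))
```

Doing this filtering client-side keeps the alignment query in step 2.1 simple, as the slide notes. The trade-off is that some non-DBpedia rows are transferred only to be discarded.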


Bibliography

• «Migrating bibliographic datasets to the Semantic Web: The AGRIS case». Stefano Anibaldi, Yves Jaques, Fabrizio Celli, Armando Stellato, Johannes Keizer. Semantic Web journal.
• «OpenAGRIS: using bibliographical data for linking into the agricultural knowledge web». Fabrizio Celli, Stefano Anibaldi, Maria Folch, Yves Jaques, Johannes Keizer. AOS 2011.


Mining and indexing the web


The context
• Scientists and researchers publish their results not only in journals or at conferences, but also via Web 2.0 tools and other media
• These channels hold corpora of ongoing research activities, unpublished material, grey literature, quick discussions, ideas, and experiments with negative results
• This information is usually unstructured and not exposed through web services


Goal
• Crawl the web (manually preselected websites)
• Index the discovered web resources with AGROVOC using machine learning algorithms
• Select relevant resources using a recommender system
• Interlink them to AGRIS!


Crawling and indexing


https://github.com/fcproj/agrotagger
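The slides say indexing uses machine learning. The Python sketch below swaps in a much simpler technique, plain dictionary matching of labels, only to show what "indexing a web resource with AGROVOC" produces: a list of concept URIs per document. The label table and URIs are hypothetical, and this is not how the linked agrotagger project works.

```python
import re

# Hypothetical label table; a real indexer would load AGROVOC's multilingual labels.
LABELS = {
    "rice": "http://example.org/agrovoc/rice",
    "drought": "http://example.org/agrovoc/drought",
    "soil erosion": "http://example.org/agrovoc/soil_erosion",
}

def tag(text, labels=LABELS):
    """Return the concept URIs whose label appears in `text` as whole words.

    Results follow the order of the label table, not the order in the text.
    """
    lowered = text.lower()
    found = []
    for label, uri in labels.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.append(uri)
    return found

print(tag("Drought stress reduces rice yields"))
```

Exact matching misses synonyms and inflections, and it cannot rank concepts by relevance. A trained indexer is meant to handle those cases.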


Recommender system


• A Java component that computes meaningful intersections between the crawler database and the AGRIS database
• It runs as an offline process; recommendations are stored in a triplestore
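The slide does not define "meaningful intersections". The Python sketch below assumes one plausible reading: each crawled resource and each AGRIS record is represented by its set of AGROVOC concepts, and a pair is recommended when their Jaccard similarity reaches a threshold (0.5 here, chosen arbitrarily). The identifiers are hypothetical, and the linked recommender may use a different measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of AGROVOC concept URIs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend(web_docs, agris_docs, threshold=0.5):
    """Pair crawled resources with AGRIS records whose subjects overlap enough.

    Both arguments map a URI to a set of AGROVOC concept URIs.
    Returns (agris_uri, web_uri, score) tuples, highest score first.
    """
    pairs = []
    for web_uri, web_subjects in web_docs.items():
        for agris_uri, agris_subjects in agris_docs.items():
            score = jaccard(web_subjects, agris_subjects)
            if score >= threshold:
                pairs.append((agris_uri, web_uri, score))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs

web = {"w1": {"rice", "drought"}, "w2": {"maize"}}
agris = {"a1": {"rice", "drought", "yield"}, "a2": {"maize", "rice"}}
pairs = recommend(web, agris)
print(pairs)
```

The nested loop is quadratic. Because the process runs offline, that may be acceptable at small scale, but a large corpus would need an inverted index from concepts to records.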


Interlinking


https://github.com/fcproj/recommender
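Because recommendations are stored in a triplestore, the interlinking output can be pictured as RDF. This Python sketch serializes (AGRIS URI, web URI, score) pairs as N-Triples and wraps them in a SPARQL 1.1 Update `INSERT DATA` request for a named graph. The link predicate (`rdfs:seeAlso`), the graph URI and all resource URIs are assumptions for illustration. The scores are not stored.

```python
# Assumed link predicate; the real system may use a different property.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

def to_ntriples(pairs, predicate=RDFS_SEE_ALSO):
    """One N-Triples line per (agris_uri, web_uri, score); the score is dropped."""
    lines = [f"<{agris}> <{predicate}> <{web}> ." for agris, web, _score in pairs]
    return "\n".join(lines)

def insert_data(pairs, graph, predicate=RDFS_SEE_ALSO):
    """Wrap the triples in a SPARQL 1.1 Update INSERT DATA for a named graph."""
    body = to_ntriples(pairs, predicate)
    return f"INSERT DATA {{ GRAPH <{graph}> {{\n{body}\n}} }}"

pairs = [("http://example.org/agris/a1", "http://example.org/web/w1", 0.9)]
print(insert_data(pairs, "http://example.org/graph/recommendations"))
```

Writing recommendations to a separate named graph would let the offline job replace them wholesale on each run without touching the bibliographic data.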


Bibliography


• «Discovering, Indexing and Interlinking Information Resources». Fabrizio Celli, Johannes Keizer, Yves Jaques, Stasinos Konstantopoulos, Dušan Vudragović. F1000Research, version 2 under revision.
