introduction to linked data

101
Introduction to Linked Data Oscar Corcho, Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es) Universidad Politécnica de Madrid Universidad del Valle, Cali, Colombia September 10th 2010 Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Guillermo Alvaro, Juan Sequeda, Carlos Ruiz Moreno and many others Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

Upload: oscar-corcho

Post on 09-May-2015

3.518 views

Category:

Technology


5 download

DESCRIPTION

Basic introductory talk about the Web of Linked Data, given to undergraduate and posgraduate students of Universidad del Valle (Cali, Colombia) in September 2010. Knowledge about Semantic Web is required

TRANSCRIPT

Page 1: Introduction to Linked Data

Introduction to Linked Data

Oscar Corcho, Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es)

Universidad Politécnica de Madrid

Universidad del Valle, Cali, ColombiaSeptember 10th 2010

Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Guillermo Alvaro, Juan Sequeda, Carlos Ruiz Moreno and many others

Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

Page 2: Introduction to Linked Data

2

Contents

• Introduction to Linked Data• Linked Data publication

- Methodological guidelines for Linked Data publication- RDB2RDF tools- Technical aspects of Linked Data publication

• Linked Data consumption

Page 3: Introduction to Linked Data

What is the Web of Linked Data?

• An extension of the current Web…- … where information and services

are given well-defined and explicitly represented meaning, …

- … so that it can be shared and used by humans and machines, ...

- ... better enabling them to work in cooperation

• How? - Promoting information exchange by

tagging web content with machine processable descriptions of its meaning.

- And technologies and infrastructure to do this

- And clear principles on how to publish data

data

Page 4: Introduction to Linked Data

What is Linked Data?

• Linked Data is a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.

- Part of the Semantic Web- Exposing, sharing and connecting data- Technologies: URIs and RDF (although others are also

important)

Page 5: Introduction to Linked Data

5

The four principles (Tim Berners Lee, 2006)

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

4. Include links to other URIs, so that they can discover more things.

• http://www.w3.org/DesignIssues/LinkedData.html

http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

Page 6: Introduction to Linked Data

Linked Open Data evolution

2007

2008

2009

Page 7: Introduction to Linked Data

7

LOD Cloud May 2007

Figure from [4]

Facts:• Focal points:

• DBPedia: RDFized vesion of Wikipiedia; many ingoing and outgoing links

• Music-related datasets• Big datasets include FOAF, US Census data• Size approx. 1 billion triples, 250k links

Page 8: Introduction to Linked Data

8

LOD Cloud September 2008

Figure from [4]

Facts:• More than 35 datasets interlinked• Commercial players joined the cloud,

e.g., BBC• Companies began to publish and host

dataset, e.g. OpenLink, Talis, or Garlik.• Size approx. 2 billion triples, 3 million

links

Page 9: Introduction to Linked Data

9

LOD Cloud March 2009

Figure from [4]

Facts:• Big part from Linking Open Drug cloud and

the BIO2RDF project (bottom)• Notable new datasets: Freebase,

OpenCalais, ACM/IEEE• Size > 10 billion triples

Page 10: Introduction to Linked Data

LOD clouds

Page 11: Introduction to Linked Data

Why Linked Data?

• Basically, to move from a Web of documents to a Web of Data

• Let’s try an example:• Tell me which football players, born in the province of

Albacete, in Spain, have scored a goal in the World Cup final

• Disclaimer:- Sorry to use an example about football, but you have to

understand that for several years Spaniards will be talking about football a lot ;-)

Example courtesy of Guillermo Alvaro Rey

Page 12: Introduction to Linked Data

Information search in the Web of documents

¿?

Example courtesy of Guillermo Alvaro Rey

Page 13: Introduction to Linked Data

What we were actually looking for

Example courtesy of Guillermo Alvaro Rey

Page 14: Introduction to Linked Data

It would be better to make a data query…

(football players from Albacete who played Eurocup 2008)

Example courtesy of Guillermo Alvaro Rey

Page 15: Introduction to Linked Data

How should we publish data?

• Formats in which data is published nowadays…- XML- HTML- DBs- APIs- CSV- XLS- …

• However, main limitations from a Web of Data point of view- Difficult to integrate- Data is not linked to each other, as it happens with Web

documents.

Page 16: Introduction to Linked Data

Which format do we use then?

• RDF (Resource Description Framework)

- Data model- Based on triples: subject, predicate, object

• <Oscar> <vive en> <Madrid>• <Madrid> <es la capital de> <España>• <España> <es campeona de> <Mundial de Fútbol>• …

- Serialised in different formats• RDF/XML, RDFa, N3, Turtle, JSON…

Page 17: Introduction to Linked Data

URIs (Universal-Uniform Resource Identifer)

• Two types of identifiers can be used to identify Linked Data resources- URIRefs (Unique Resource Identifiers References)

• A URI and an optional Fragment Identifier separated from the URI by the hash symbol ‘#’

• http://www.ontology.org/people#Person• people:Person

- Plain URIs can also be used, as in FOAF:• http://xmlns.com/foaf/0.1/Person

17

Page 18: Introduction to Linked Data

18

How do we publish Linked Data?

1. Exposing Relational Databases or other similar formats into Linked Data- D2R- Triplify- R2O- NOR2O- Virtuoso- Ultrawrap- …

2. Using native RDF triplestores- Sesame- Jena- Owlim- Talis platform- …

3. Incorporating it in the form of RDFa in CMSs like Drupal

Page 19: Introduction to Linked Data

How do we consume Linked Data?

• Linked Data browsers- To explore things and datasets and to navigate between them.- Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE),

OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)

• Linked Data mashups- Sites that mash up (thus combine Linked data)- Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK),

DBPedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI, Ireland)

• Search engines- To search for Linked Data.- Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch

(Yahoo, Spain), Watson (Open University, UK), SWSE (DERI, Ireland), Swoogle (UMBC, USA)

19Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig

Page 20: Introduction to Linked Data

Linked Data browsers (Disco)

Page 21: Introduction to Linked Data

Linked Data Mashup (LinkedGeoData)

© Migración de datos a la Web de los Datos - Enfoques, técnicas y herramientas Luis Manuel Vilches Blázquez

Page 22: Introduction to Linked Data

Linked Data Mashup (DBpedia Mobile)

© Migración de

datos a la Web de los Datos - Enfoqu

es, técnica

s y herramientas

Luis Manuel Vilches Blázqu

ez

http://wiki.dbpedia.org/DBpediaMobile

Page 23: Introduction to Linked Data

Linked Data Search Engines (Sindice and SIG.MA)

• Entity lookup service. Find a document that mentions a URI or a keyword.

Page 24: Introduction to Linked Data

Linked Data Search Engines (NYT)

• The New York Times: Alumni In The News- http://data.nytimes.com/schools/schools.html

Page 25: Introduction to Linked Data

Linked Data Search Engines (NYT)

• The New York Times: Source code is available

• … and is based on SPARQL queries

Page 27: Introduction to Linked Data

27

Open Government. USA and UK

TOP-DOWN

BOTTOM-UP

Page 28: Introduction to Linked Data

Linked Data Mashup (data.gov)

• Clean Air Status and Trends (CASTNET)- http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php

Page 29: Introduction to Linked Data

29

Linked Data in the UK

• Education- http://education.data.gov.uk/id/school/106661

• Parliament- http://parliament.psi.enakting.org/id/member/1227

• Maps- E.g., London:

http://data.ordnancesurvey.co.uk/id/7000000000041428- http://map.psi.enakting.org

• Transport- http://www.dft.gov.uk/naptan/

• SameAs service- http://www.sameas.org

• Challenges- http://gov.tso.co.uk/openup/sparql/gov-transport

Page 30: Introduction to Linked Data

Linked Data Mashup (data.gov.uk)

• Research Funding Explorer- http://bis.clients.talis.com/

Page 31: Introduction to Linked Data

31

Open Government Spain. Euskadi

Page 32: Introduction to Linked Data

32

Open Government Spain. Abredatos

Page 33: Introduction to Linked Data

33

Open Government Spain. Zaragoza

Page 34: Introduction to Linked Data

34

Open Government Spain. Asturias

Page 35: Introduction to Linked Data

Linked Data Mashup (Water quality)

• Water quality in Asturias’ beaches- http://datos.fundacionctic.org/sandbox/asturias/playas/

Page 36: Introduction to Linked Data

36

Contents

• Introduction to Linked Data• Linked Data publication

- Methodological guidelines for Linked Data publication- RDB2RDF tools- Technical aspects of Linked Data publication

• Linked Data consumption

Page 37: Introduction to Linked Data

GeoLinkedData

• It is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data.

• This initiative has started off by publishing diverse information sources, such as National Geographic Institute of Spain (IGN-E) and National Statistics Institute (INE)

• http://geo.linkeddata.es

Page 38: Introduction to Linked Data

Motivation

» 99.171 % English» 0.019 % Spanish

Source:Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/Thanks to Aidan and Richard

The Web of Data is mainly for English speakers

Poor presence of Spanish

Page 39: Introduction to Linked Data

Related Work

Page 40: Introduction to Linked Data

40

Impact of Geo.linkeddata.es

• Number of triples in Spanish (July 2010): 1.412.248 • Number of triples in Spanish (End August 2010):

21.463.088

Asunción Gómez Pérez

Before geo.linkeddata.es

en 99,1712875

ja 0,463849377

fr 0,05447229

de 0,034225134

pl 0,02532934

it 0,021982542

es 0,019584648

After geo.linkeddata.es

en 94,18744941

es 5,044085342

ja 0,440538697

fr 0,051734793

de 0,032505155

pl 0,024056418

it 0,020877812

Page 41: Introduction to Linked Data

Process for Publishing Linked Data on the Web

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Page 42: Introduction to Linked Data

1. Identification and selection of the data sources

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Instituto GeográficoNacional

Instituto Nacionalde Estadística

Page 43: Introduction to Linked Data

43

1. Identification and selection of the data sources

• Instituto Geográfico Nacional (Geographic Spanish Institute)- Multilingual (Spanish,

Vasc, Gallician, Catalan)- Conceptualization

mistmatches- Granularity (scale

concept)- Textual information- Particularaties

• Longitude• latitude

• Instituto Nacional de Estadística (Statistic Spanish Institute)• Monolingual• Numerical information• Particularaties

• Geo (textual level)• Temporal

Asunción Gómez Pérez

Page 44: Introduction to Linked Data

1. Identification and selection of the data sources

• IGN-E

Page 45: Introduction to Linked Data

1. Identification and selection of the data sources

Industry Production Index

Province

Year

Page 46: Introduction to Linked Data

2. Vocabulary development

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#whichvocabs

This

is no

t eno

ugh

Page 47: Introduction to Linked Data

2. Vocabulary development

• Features- Lightweight :

• Taxonomies and a few properties- Consensuated vocabularies

• To avoid the mapping problems- Multilingual

• Linked data are multilingual

• The NeOn methodology can help to - Re-enginer Non ontological resources into ontologies

• Pros: use domain terminology already consensuated by domain experts

- Withdraw in heavyweight ontologies those features that you don’t need

- Reuse existing vocabularies

47Asunción Gómez Pérez

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Page 48: Introduction to Linked Data

48

Knowledge Resources

Non Ontological Resource

Reuse

Non Ontological Resource

Reengineering

2

2

2

Non Ontological Resources

Thesauri

DictionariesGlossaries Lexicons

TaxonomiesClassification

Schemas

O. Localization

9

Ontology Support Activities: Knowledge Acquisition (Elicitation); Documentation; Configuration Management; Evaluation (V&V); Assessment

1,2,3,4,5,6,7,8, 9

Ontological Resource

Reengineering

4

4

4

O. Aligning

O. Merging

Alignments5

5

5

6

6

6

6

3

Ontological Resource

Reuse

3Ontological Resources

O. Repositories and Registries

FlogicRDF(S)

OWL

Ontology Design

Pattern Reuse

7

O. Design Patterns

Ontology Restructuring(Pruning, Extension,

Specialization, Modularization)

8

O. Specification O. Conceptualization O. ImplementationO. Formalization

1

RDF(S)

OWL

FlogicScheduling

Page 49: Introduction to Linked Data

Vocabulary development: Specification

• Content requirements: Identify the set of questions that the ontology should answer- Which one are the provinces in Spain?- Where are the beaches?- Where are the reservoirs?- Identify the production index in Madrid- Which one is the city with higher production index?- Give me Madrid latitude and altitude- ….

• Non-content requirements- The ontology must be in the four official Spanish languages

49Asunción Gómez Pérez

Page 50: Introduction to Linked Data

2. Lightweight Ontology Development

hasStatisticalData

on

Ontology

Specification

Legend

hydrOntology

4

FAO

FAO Geopolitical ontology

WGS84

4W3C Vocabulary

GML

4GML Specification

O. Statistics

SCOVO

O. Time

W3C Time

hasLat/Long

hasGeometry

hasLat/Long

hasGeometry

hasLocation/isLocated

Thesaurus

UNESCO

4EGM / ERM

GeoNames

scv:Dimension

scv:Itemscv:Dataset

WGS84 Geo Positioning:

an RDF vocabulary

hydrographical

phenomena (rivers, lakes,

etc.)

Ontology for OGC

Geography Markup

Language

Vocabulary for instants,

intervals, durations,

etc.

Names and international

code systems for territories

and groups

Following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation.hydrOntology,SCOVO, FAO Geopolitcal, WGS84, GML, and Time

Classes 33 33

Object Properties 44 44

Data Properties 318 318

reused

Page 51: Introduction to Linked Data

• Objetivos:

- INSPIRE intenta conseguir fuentes armonizadas de Información Geográfica para dar soporte a la formulación, implementación y evaluación de políticas comunitarias (Medio Ambiente, etc).

- Fuentes de Información Geográfica: Bases de datos de los Estados Miembros (UE) a nivel local, regional, nacional e internacional.

Contexto – Directiva INSPIRE

Luis Manuel Vilches Blázquez

Page 52: Introduction to Linked Data

INSPIRE - Anexos

Luis Manuel Vilches Blázquez

Page 53: Introduction to Linked Data

hydrOntology

• Existencia de gran diversidad de problemas (múltiples fuentes, heterogeneidad de contenido y estructuración, ambigüedad del lenguaje natural, etc.) en la información geográfica.

• Necesidad de un modelo compartido para solventar los problemas de armonización y estructuración de la información hidrográfica.

• hydrOntology es una ontología global de dominio desarrollada conforme a un acercamiento top-down.

- Recubrir la mayoría de los fenómenos representables cartográficamente asociados al dominio hidrográfico.

- Servir como marco de armonización entre los diferentes productores de información geo-espacial en el entorno nacional e internacional.

- Comenzar con los pasos necesarios para obtener una mejor organización y gestión de la información geográfica (hidrográfica).

Luis Manuel Vilches Blázquez

Page 54: Introduction to Linked Data

Fuentes

GEMET

Catálogos de fenómenos

BCN25

BCN200

EGM & ERM CC.AA.

Nomenclátor Geográfico Nacional

Tesauros y Bibliografía

WFD

Nomenclátor Conciso

Diccionarios yMonografías

FTT ADL Getty

Luis Manuel Vilches Blázquez

Page 55: Introduction to Linked Data

Criterios de estructuración

• Directiva Marco del Agua- Propuesta por Parlamento y Consejo de la UE- Lista de definiciones de fenómenos hidrográficos

• Proyecto SDIGER- Proyecto piloto INSPIRE- Dos cuencas, países e idiomas

• Criterios semánticos- Diccionarios geográficos- Diccionario de la Real Academia de la Lengua- WordNet- Wikipedia- Bibliografía de varias áreas de conocimiento

• Herencia: Estructuración actual de catálogos• Asesoramiento expertos en toponimia del IGN

Luis Manuel Vilches Blázquez

Page 56: Introduction to Linked Data

Modelización del dominio hidrográfico Nivel superior

Nivel inferior

Luis Manuel Vilches Blázquez

Page 57: Introduction to Linked Data

Implementación & Formalizacón

12

3

4

5

+150 conceptos (classes) , 47 tipos de relaciones (properties)

y 64 tipos de atributos (attribute types)

+ Pellet

Luis Manuel Vilches Blázquez

Page 58: Introduction to Linked Data

2. Vocabulary development: HydrOntology

58Asunción Gómez Pérez

Page 59: Introduction to Linked Data

3. Generation of RDF

• From the Data sources- Geographic information

(Databases)- Statistic information

(.xsl)- Geospatial information

• Different technologies for RDF generation- Reengineering patterns- R20 and ODEMapster- Annotation tools- Geometry generation

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Page 60: Introduction to Linked Data

3. Generation of the RDF Data

INE

NOR2O

ODEMapster

IGN

IGN

Geospatial column

Geometry2RDF

Page 61: Introduction to Linked Data

3. Generation of the RDF Data / instances

• NOR2O is a software library that implements the transformations proposed by the Patterns for Re-engineering Non-Ontological Resources (PR-NOR). Currently we have 16 PR-NORs.

• PR-NORs define a procedure that transforms a Non-Ontological Resource (NOR) components into ontology elements. http://ontologydesignpatterns.org/

NOR2O

· Classificationschemes

· Thesauri

· Lexicons

NOR2O

FAO Water classification· Classification scheme

· Path enumeration data model· Implemented in a database

Page 62: Introduction to Linked Data

Re-engineering Model for NORs

Ontology Forward

Engineering

Implementation

Formalization

Conceptua-

lization

Speci-

fication

Implementation

Design

Con-

ceptual

Transformation

Patterns for Re-engineeringNon-Ontological Resources

(PR-NOR)

Non-Ontological Resource Ontology

Requirements

NOR Reverse

Engineering

RDF(S)

Page 63: Introduction to Linked Data

PR-NOR library at the ODP PortalTechnological support

http://ontologydesignpatterns.org/wiki/Submissions:ReengineeringODPs

Page 64: Introduction to Linked Data

3. Generation of the RDF Data – NOR2O

Industry Production Index

Province

Year

NOR2O

Page 65: Introduction to Linked Data

hydrOntology & databases

NGN1:25.000

multilingüe

Page 66: Introduction to Linked Data

3. Generation of the RDF Data – R2O & ODEMapster

• Creation of the R2O Mappings

Page 67: Introduction to Linked Data

3. Generation of the RDF Data – Geometry2RDF

Oracle STO UTIL package

SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry

FROM "BCN200"."BCN200_0301L_RIO" cWHERE c.Etiqueta='Arroyo'

Page 68: Introduction to Linked Data

3. Generation of the RDF Data – Geometry2RDF

Page 69: Introduction to Linked Data

3. Generation of the RDF Data – Geometry2RDF

Page 70: Introduction to Linked Data

3. Generation of the RDF data – RDF graphs

• IGN INE

• So far- 7 RDF Named Graphs- 1.412.248 triples

BTN25 BCN200 IPI….

http://geo.linkeddata.es/dataset/IGN/BTN25 http://geo.linkeddata.es/dataset/IGN/BCN200 http://geo.linkeddata.es/dataset/INE/IPI

Page 71: Introduction to Linked Data

4. Publication of the RDF Data

SPARQL

Pubby

Linked DataHTML

Virtuoso 6.1.0

Pubby 0.3

Including ProvenanceSupport

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Page 72: Introduction to Linked Data

4. Publication of the RDF Data

Page 73: Introduction to Linked Data

4. Publication of the RDF Data - License

• License for GeoLinkedData- Creative Commons Attribution-ShareAlike 3.0 - GNU Free Documentation License

• Each dataset will have its own specific license, IGN, INE, etc.

Page 74: Introduction to Linked Data

5. Data cleansing

• Lack of documentation of the IGN datasets• Broken links: Spain, IGN resources• Lack of documentation of the ontology• Missing english and spanish labels• Building a spanish ontology and importing

some concepts of other ontology (in English):- Importing the English ontology. Add annotations

like a Spanish label to them.- Importing the English ontology, creating new

concepts and properties with a Spanish name and map those to the English equivalents.

- Re-declaring the terms of the English ontology that we need (using the same URI as in the English ontology), and adding a Spanish label.

- Creating your own class and properties that model the same things as the English ontology.

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Page 75: Introduction to Linked Data

5. Data cleansing

• URIs in Spanish• http://geo.linkeddata.es/ontology/Río

- RDF allows UTF-8 characters for URIs- But, Linked Data URIs has to be URLs as well- So, non ASCII-US characters have to be %code

• http://geo.linkeddata.es/ontology/R%C3%ADo

Page 76: Introduction to Linked Data

6. Linking of the RDF Data

• Silk - A Link Discovery Framework for the Web of Data

• First set of links: Provinces of Spain- 86% accuracy

GeoLinkedDataDBPedia Geonames

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Page 77: Introduction to Linked Data

6. Linking of the RDF Data

• http://geo.linkeddata.es/page/Provincia/Granada

77Asunción Gómez Pérez

Page 78: Introduction to Linked Data

7. Enable effective discovery

Identificationof the data

sources

Vocabularydevelopment

Generationof the RDF Data

Publicationof the RDF data

Linking the RDF data

Data cleansing

Enable effective discovery

Page 79: Introduction to Linked Data

DEMO

http://geo.linkeddata.es/

Page 80: Introduction to Linked Data

Provinces

Page 81: Introduction to Linked Data

Industry Production Index – Capital of Province

Page 82: Introduction to Linked Data

Rivers

Page 83: Introduction to Linked Data

Beaches

Page 84: Introduction to Linked Data

84

Contents

• Introduction to Linked Data• Linked Data publication

- Methodological guidelines for Linked Data publication- RDB2RDF tools- Technical aspects of Linked Data publication

• Linked Data consumption

Page 85: Introduction to Linked Data

Ontology-based Access to DBs

1. Build a new ontology from 1 DB schema and 1 DB

2. Align the ontology built with approach 1 with a legacy ontology

3. Align an existing DB with a legacy ontology

a) Massive dump (semantic data warehouse)

b) Query-driven

4. Align an ontology network with n DB schemas and other data sources

a) Massive dump (semantic data warehouse)

b) Query-driven

new ontology

existing ontology

1 23

4

Page 86: Introduction to Linked Data

Ontology-based Access to Databases

BDR ModeloRelacional

Personal

Organización

Pregunta: Nombre de los profesores de la universidad

UPM

* Un profesor es una persona cuyo puesto es “docente”

* Una universidad es una organización de tipo “3”

Consulta: valores de la columna nombre de los registros de la tabla Personal para los que el valor de la columna puesto is

“docente” que estén relacionados con al menos un registro de la tabla Organización con el valor “3” en la columna tipo y “UPM” en

la columna nombre.

? Procesado de la consulta de acuerdo a la descripción formal

de correspondencia

Ontología

Profesor

Doctorando

Universidad

Procesador

Page 87: Introduction to Linked Data

Align data sources with legacy ontologies

Punto Europeo

PuntoGPS

PuntoAsiatico

=

Aeropuertos

f (Aeropuertos)

Ontología O2Ontología O2

Modelo Relacional M1Modelo Relacional M1

PuntoEuropeo

PuntoEspañol

Estación

CentroComunicaciones

Aeropuerto

=Aeropuerto

f (Aeropuertos)

Ontología O1Ontología O1

RC(O2,M1)RC(O1,M1)

Page 88: Introduction to Linked Data

R2O

• is a declarative language to specify mappings between relational data sources and ontologies.

RDB

Relational Model

Persons

Organization <xml>R2O Mapping

</xml>

Ontology

Professor

Student

University

Page 89: Introduction to Linked Data

Example: types of mappings needed

Attibute Direct Mapping

Attibute Mapping with transformation(Regular Expression)

Relation Mapping w. Transformation(Regular Expression)

Relation Mapping w. Transformation(Keyword search)

Page 90: Introduction to Linked Data

Population example (II)

The Operation element defines a transformation based on a regular

expression to be applied to the database column for extracting property values

Population example (II)

Page 91: Introduction to Linked Data

A view maps exactly one concept in the

ontology.

A subset of the columns in the view map a concept in

the ontology.

A subset (selection) of the records of a database view map a concept in the ontology.

A subset of the records of a database view map a concept in the onto. but the selection cannot be made using SQL.

A column in a database view maps directly an attribute or a relation.

A column in a database view maps an attribute or a relation after some transformation.

A set of columns in a database view map an attribute or a relation.

One or more concepts can be extracted from a single data field (not in 1NF).

For concepts...

For attributes...

R2O (Relational-to-Ontology) Language

Page 92: Introduction to Linked Data

R2O Basic Syntax

<conceptmap-def name="Customer"> <identified-by> Table key </identified-by>

<uri-as> operation </uri-as> <applies-if> condition </applies-if> <joins-via> expression </joins-via>

<documentation>description …</documentation> <described-by>attributes,relations</described-by>

</conceptmap-def>

<conceptmap-def name="Customer"> <identified-by> Table key </identified-by>

<uri-as> operation </uri-as> <applies-if> condition </applies-if> <joins-via> expression </joins-via>

<documentation>description …</documentation> <described-by>attributes,relations</described-by>

</conceptmap-def>

<attributemap-def name="http://esperonto/ff#Title"> <aftertransform>

<operation oper-id="constant"> <arg-restriction on-param="const-val">

<has-column>fsb_ajut.titol</has-column> </arg-restriction>

</operation> </aftertransform></attributemap-def>

<attributemap-def name="http://esperonto/ff#Title"> <aftertransform>

<operation oper-id="constant"> <arg-restriction on-param="const-val">

<has-column>fsb_ajut.titol</has-column> </arg-restriction>

</operation> </aftertransform></attributemap-def>

<relationmap-def name="http://esperonto/ff#isCandidateFor"> <to-concept name="http://esperonto/ff#FundOpp">

<joins-via> <operation oper-id=“equals">

<arg-restriction on-param="value1"> <has-column>fsb_ajut.id</has-column>

</arg-restriction> <arg-restriction on-param="value2">

<has-column>fsb_candidate.forFund</has-column> </arg-restriction>

</operation> </joins-via>

</relationmap-def>

<relationmap-def name="http://esperonto/ff#isCandidateFor"> <to-concept name="http://esperonto/ff#FundOpp">

<joins-via> <operation oper-id=“equals">

<arg-restriction on-param="value1"> <has-column>fsb_ajut.id</has-column>

</arg-restriction> <arg-restriction on-param="value2">

<has-column>fsb_candidate.forFund</has-column> </arg-restriction>

</operation> </joins-via>

</relationmap-def>

Page 93: Introduction to Linked Data

ODEMapster

generates RDF instances from relational instances based on the mapping description expressed in the R2O document

Page 94: Introduction to Linked Data

94

Mapping Design

• 3 Mapping Creation Steps

– Load Ontology– Load Database(s)– Create mapping

2 Usage Modes– Online mode (run

time query execution)

– Offline mode (materialized RDF

dump)

Page 95: Introduction to Linked Data

101

Contents

• Introduction to Linked Data• Linked Data publication

- Methodological guidelines for Linked Data publication- RDB2RDF tools- Technical aspects of Linked Data publication

• Linked Data consumption

Page 96: Introduction to Linked Data

RelFinder: finding relations in Linked Data

• E.g., relations between films- “Pulp Fiction”, “Kill Bill” y “Reservoir Dogs”

Page 97: Introduction to Linked Data

Exercise on data.gov.uk

- Public schools in London that contain the word “music”

Page 98: Introduction to Linked Data

Exercise: find information in DBPedia

http://dbpedia.org/resource/Darth_Vader)

(etc)

Find ficticious serial killers in DBPedia

Image by http://www.flickr.com/photos/bflv/

Page 99: Introduction to Linked Data

105

Designing URI sets for the Public Sector (UK)

• http://www.cabinetoffice.gov.uk/media/301253/puiblic_sector_uri.pdf

Page 100: Introduction to Linked Data

106

Asociación Española de Linked Data

Page 101: Introduction to Linked Data

Introduction to Linked Data

Oscar Corcho, Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es)

Universidad Politécnica de Madrid

Universidad del Valle, Cali, ColombiaSeptember 10th 2010

Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Guillermo Alvaro, Juan Sequeda, Carlos Ruiz Moreno and many others

Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0