pal gov.tutorial2.session15 1.linkeddata

22
1 PalGov © 2011 فلسطينيةلكترونية الديمية الحكومة ا أكاThe Palestinian eGovernment Academy www.egovacademy.ps Tutorial II: Data Integration and Open Information Systems Session 15.1 The Data Web and Linked Data Dr. Mustafa Jarrar University of Birzeit [email protected] www.jarrar.info

Upload: mustafa-jarrar

Post on 06-Jul-2015

576 views

Category:

Education


0 download

TRANSCRIPT

Page 1: Pal gov.tutorial2.session15 1.linkeddata

1PalGov © 2011

أكاديمية الحكومة اإللكترونية الفلسطينيةThe Palestinian eGovernment Academy

www.egovacademy.ps

Tutorial II: Data Integration and Open Information Systems

Session 15.1

The Data Web and Linked Data

Dr. Mustafa Jarrar

University of Birzeit

[email protected]

www.jarrar.info

Page 2: Pal gov.tutorial2.session15 1.linkeddata

2PalGov © 2011

About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the

Commission of the European Communities, grant agreement 511159-TEMPUS-1-

2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps

University of Trento, Italy

University of Namur, Belgium

Vrije Universiteit Brussel, Belgium

TrueTrust, UK

Birzeit University, Palestine

(Coordinator )

Palestine Polytechnic University, Palestine

Palestine Technical University, PalestineUniversité de Savoie, France

Ministry of Local Government, Palestine

Ministry of Telecom and IT, Palestine

Ministry of Interior, Palestine

Project Consortium:

Coordinator:

Dr. Mustafa Jarrar

Birzeit University, P.O.Box 14- Birzeit, Palestine

Telfax:+972 2 2982935 [email protected]

Page 3: Pal gov.tutorial2.session15 1.linkeddata

3PalGov © 2011

© Copyright Notes

Everyone is encouraged to use this material, or part of it, but should

properly cite the project (logo and website), and the author of that part.

No part of this tutorial may be reproduced or modified in any form or by

any means, without prior written permission from the project, who have

the full copyrights on the material.

Attribution-NonCommercial-ShareAlike

CC-BY-NC-SA

This license lets others remix, tweak, and build upon your work non-

commercially, as long as they credit you and license their new creations

under the identical terms.

Page 4: Pal gov.tutorial2.session15 1.linkeddata

PalGov © 2011 4

Tutorial Map

Topic h

Session 1: XML Basics and Namespaces 3

Session 2: XML DTD’s 3

Session 3: XML Schemas 3

Session 4: Lab-XML Schemas 3

Session 5: RDF and RDFs 3

Session 6: Lab-RDF and RDFs 3

Session 7: OWL (Ontology Web Language) 3

Session 8: Lab-OWL 3

Session 9: Lab-RDF Stores -Challenges and Solutions 3

Session 10: Lab-SPARQL 3

Session 11: Lab-Oracle Semantic Technology 3

Session 12_1: The problem of Data Integration 1.5

Session 12_2: Architectural Solutions for the Integration Issues 1.5

Session 13_1: Data Schema Integration 1

Session 13_2: GAV and LAV Integration 1

Session 13_3: Data Integration and Fusion using RDF 1

Session 14: Lab-Data Integration and Fusion using RDF 3

Session 15_1: Data Web and Linked Data 1.5

Session 15_2: RDFa 1.5

Session 16: Lab-RDFa 3

Intended Learning Objectives

A: Knowledge and Understanding

2a1: Describe tree and graph data models.

2a2: Understand the notation of XML, RDF, RDFS, and OWL.

2a3: Demonstrate knowledge about querying techniques for data

models as SPARQL and XPath.

2a4: Explain the concepts of identity management and Linked data.

2a5: Demonstrate knowledge about Integration &fusion of

heterogeneous data.

B: Intellectual Skills

2b1: Represent data using tree and graph data models (XML &

RDF).

2b2: Describe data semantics using RDFS and OWL.

2b3: Manage and query data represented in RDF, XML, OWL.

2b4: Integrate and fuse heterogeneous data.

C: Professional and Practical Skills

2c1: Using Oracle Semantic Technology and/or Virtuoso to store

and query RDF stores.

D: General and Transferable Skills2d1: Working with team.

2d2: Presenting and defending ideas.

2d3: Use of creativity and innovation in problem solving.

2d4: Develop communication skills and logical reasoning abilities.

Page 5: Pal gov.tutorial2.session15 1.linkeddata

5PalGov © 2011

Module ILOs

After completing this module students will be

able to:

-Explain the concepts of identity management and

linked data.

- Understand basic concepts of the Data Web.

-Integrate and fuse heterogeneous data.

Page 6: Pal gov.tutorial2.session15 1.linkeddata

6PalGov © 2011

Semantic/ Data Web/ Web 3.0?

“The Semantic Web is a web of data, in

some ways like a global database”,Tim Berners-Lee – Inventor of the WWW.

“The goal of the Semantic Web is

to create a universal medium for the

exchange of DATA”, W3C.

Page 7: Pal gov.tutorial2.session15 1.linkeddata

7PalGov © 2011

Web of Data

• The Data Web envisions the web as a world-wide

interlinked structured data.

• The Web as we know it today is a global information space

of linked documents.

• The same vision is applied to data: publishing and

connecting structured data on the web.

Page 8: Pal gov.tutorial2.session15 1.linkeddata

8PalGov © 2011

Classical Web

• The classical web a global

information space of linked

documents.

• Primary Units of the hypertext

Web are:

– HTML Documents,

– Connected by Hyperlinks

Diagram Source: Christian Bizer

Page 9: Pal gov.tutorial2.session15 1.linkeddata

9PalGov © 2011

The challenge

• The problem is that the information on the

classical web is not structured.

– Programs cannot use such information in a useful way.

• The Solution is to increase the structure of

published information.

Page 10: Pal gov.tutorial2.session15 1.linkeddata

10PalGov © 2011

Web APIs and Mashups

• Many major data sources such as Amazon, Yahoo!, eBay,

and Google provide access to their data through APIs.

• Currently, programmableweb.com lists 3891 APIs and

6101 mashups (up to 14. Sep 2011).

API

API

API

MashUp

Diagram Source: Christian Bizer

Page 11: Pal gov.tutorial2.session15 1.linkeddata

11PalGov © 2011

Web APIs and Mashups

• However,

– APIs provide proprietary interfaces,

– Data retrieved from these APIs is represented using different formats

(different data models).

– Mashups created using these APIs are based on a fixed set of data

sources. This is because entities in different APIs are not linked.

– You can not set hyperlinks between entities.

APIs separates

data

Picture Source: Christian Bizer

Page 12: Pal gov.tutorial2.session15 1.linkeddata

12PalGov © 2011

Beyond Web APIs and Mashups:

The Data Web and Linked Data

• The Data Web envisions the web as a world-wide

interlinked structured data.

• Linked data refers to the set of best practices for

publishing and connecting structured data on the web.

• Linked data best practices has lead to the extension of the

web connecting data from diverse domains such as:– People, companies, books, scientific publications, films, music, television

and radio programs, genes, proteins drugs, clinical trials, online

communities, statistical and scientific data, reviews, …

Page 13: Pal gov.tutorial2.session15 1.linkeddata

13PalGov © 2011

The Data Web and Linked Data

• While the primary units of the hypertext Web are HTML documents

connected by un-typed Hyperlinks, Linked Data relies on documents

containing data in RDF. However, rather than simply connecting these

documents, Linked Data uses RDF to make typed statements that link

arbitrary things in the world.

• The result is a web of things in the world, described by data on the Web

Diagram Source: Christian Bizer

Page 14: Pal gov.tutorial2.session15 1.linkeddata

14PalGov © 2011

The Data Web and Linked Data

Berners-Lee (2006) outlined a set of 'rules' for publishing

data on the Web in a way that all published data becomes

part of a single global data space:

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those

names

3. When someone looks up a URI, provide useful

information, using the standards (RDF, SPARQL)

4. Include links to other URIs, so that they can discover

more things

Page 15: Pal gov.tutorial2.session15 1.linkeddata

15PalGov © 2011

Properties of the Web of Linked Data

• Anyone can publish data to the Web of Linked Data

• Entities are connected by links

– creating a global data graph that spans data sources and enables

the discovery of new data sources.

• Data is self-describing

– If an application encounters data represented using an unfamiliar

vocabulary, the application can resolve the URIs that identify

vocabulary terms in order to find their RDFS or OWL definition.

• The Web of Data is open

– meaning that applications can discover new data sources at run-

time by following links.

Page 16: Pal gov.tutorial2.session15 1.linkeddata

16PalGov © 2011

Realization: Linking Open Data Project

Page 17: Pal gov.tutorial2.session15 1.linkeddata

17PalGov © 2011

Realization: Linking Open Data Project

• Grassroots community effort to:

– publish existing open license datasets as Linked Data

on the Web

– interlink things between different data sources

• By September 2010 the cloud had grown to 25

billion RDF triples, interlinked by around 395

million RDF links.

Page 18: Pal gov.tutorial2.session15 1.linkeddata

18PalGov © 2011

Linking Data

• How are same entities described in different datasets linked?

• AGAIN: By linking the Global Identifier, that is, the URI!

• Let’s have a look at real examples from real datasets:

<http://dbpedia.org/resource/Bethlehem> owl:sameAs

<http://sws.geonames.org/284315/>

<http://dbpedia.org/resource/Tim_Berners-Lee> owl:sameAs

<http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007>

• Linking the entity “Bethlehem” between the DBPedia dataset and the Geonames dataset

in the Linking Open Data cloud.

• This is done by linking the URIs of “Bethlehem” in both datasets using owl:sameAs.

• Linking the entity “Tim Berners-Lee” between the DBPedia dataset and the DBLP dataset .

• This is done by linking the URIs of “Tim Berners-Lee” in both datasets using owl:sameAs.

NOTE: The student is encouraged to visit the URIs specified above.

Page 19: Pal gov.tutorial2.session15 1.linkeddata

19PalGov © 2011

Resources

http://dbpedia.org/resource/Bethlehem

(Bethlehem URI in DBPedia)

http://sws.geonames.org/284315/

(Bethlehem URI in Geonames)

Page 20: Pal gov.tutorial2.session15 1.linkeddata

20PalGov © 2011

Applications

• What Can I do with this?

Diagram Source: Christian Bizer

Page 21: Pal gov.tutorial2.session15 1.linkeddata

21PalGov © 2011

Let’s draw a graph of our example!

v:Person

“George Mousa”

“Geno”

v:Address

“Nablus”

“Palestine”

“George Mousa”

v:nickname

v:city

Page 22: Pal gov.tutorial2.session15 1.linkeddata

22PalGov © 2011

References

• Christian Bizer: The Emerging Web of Linked Data. Presentation at

SRI International, Artificial Intelligence Center. Menlo Park, USA. 2009.

• W3C: www.w3c.org

• Linking Open Data: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData