Scripting User Contributed Interlinking

Institute of Information Systems & Information Management Scripting User Contributed Interlinking Michael Hausenblas, Wolfgang Halb, and Yves Raimond SFSW08, Tenerife, Spain 2008-06-02

DESCRIPTION

Presentation about User Contributed Interlinking at the Scripting for the Semantic Web (SFSW) 2008 workshop at the European Semantic Web Conference (ESWC) 2008.

TRANSCRIPT

Page 1: Scripting User Contributed Interlinking

Institute of Information Systems & Information Management

Scripting User Contributed Interlinking

Michael Hausenblas, Wolfgang Halb, and Yves Raimond

SFSW08, Tenerife, Spain

2008-06-02

Page 2: Scripting User Contributed Interlinking

Agenda
- Linked Data 101
- A first step in UCI: http://riese.joanneum.at
- Towards Generalising UCI
- Demo

Page 3: Scripting User Contributed Interlinking

Linked Data: Principles
- Items should be identified using URI references [URIrefs] (and: don’t use bNodes)
- URIrefs should be dereferenceable: using HTTP URIs allows looking up the items identified through URIrefs, cf. [http-range-14 TAG finding]
- Looking up a URIref leads to more data [follow-your-nose principle]
- Links to other URIrefs should be included in order to enable the discovery of more data [How to Publish Linked Data on the Web]
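
The lookup principles above can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk: the names `build_lookup` and `next_hops`, the `Accept` value, and the representation of triples as plain string tuples are all assumptions made for the example. No network call is made; the sketch only prepares the request and picks the URIrefs worth following next.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Assumed media types; a real client may prefer a different list.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def build_lookup(uriref: str) -> Request:
    """Prepare an HTTP GET that asks for RDF when dereferencing a URIref."""
    parts = urlparse(uriref)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URIref: {uriref}")
    # The fragment is never sent to the server, so look up the document URI.
    document_uri = uriref.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": RDF_ACCEPT})


def next_hops(triples, visited):
    """Follow-your-nose: URIrefs in object position not yet looked up."""
    hops = []
    for _subject, _predicate, obj in triples:
        is_uri = obj.startswith(("http://", "https://"))
        if is_uri and obj not in visited and obj not in hops:
            hops.append(obj)
    return hops
```

A crawler built on this would send each prepared request, parse the returned RDF into triples, and feed them back into `next_hops` until no unseen URIrefs remain.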

Page 4: Scripting User Contributed Interlinking

Linked Data: Datasets (2008)

By courtesy of Richard Cyganiak, http://richard.cyganiak.de/2007/10/lod/

Page 5: Scripting User Contributed Interlinking

Linked Data: Issues

Building
- RDFising process (schema, mapping)
- Interlinking (automagically, manual)
- Deployment (SPARQL end point, dump, RDFa, etc.)

Using
- Provenance, trust, rights, etc.
- Access (depending on deployment)
- Performance (deref chain, reliability)
- Discovery (which is the right LOD dataset for my task?)

Page 6: Scripting User Contributed Interlinking

A first step in UCI - riese

http://riese.joanneum.at

Page 7: Scripting User Contributed Interlinking

riese: A first step in UCI
- riese, the ‘RDFizing and Interlinking the EuroStat Dataset Effort’, aims to offer an RDFised and interlinked version of the Eurostat data (http://ec.europa.eu/eurostat)
- Eurostat data is high-volume data (5 GB data dump in approx. 4,000 TSV files; 350 million data values; 80,000 different data codes)
- Currently we serve 3.6 million triples, interlinking with Geonames (DBpedia and Wordnet upcoming)
- Data is exposed as XHTML+RDFa, via a SPARQL end-point, and as a dump (+ semantic sitemap description)

Page 8: Scripting User Contributed Interlinking

riese: architecture

Page 9: Scripting User Contributed Interlinking

riese: inside

Server
- Apache 2.2
- SWI-Prolog
- PHP 5
- p2r/Ceriese (see Yves’s blog post)
- (RDF/XML documents in the file system)

Client
- XHTML+RDFa
- Javascript/Yahoo! User Interface Library [YUI]

Vocabulary
- (triggered the development of scovo, the Statistical Core Vocabulary, together with Talis and Lee Feigenbaum, see http://purl.org/NET/scovo)
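
To give an idea of what a statistical value looks like once expressed with scovo, here is a hedged sketch that emits N-Triples for one data item. It is not the riese code (which, per the slide, runs on SWI-Prolog and PHP): the function `item_triples` and all `example.org` URIs are hypothetical, and the term names (`scovo:Item`, `scovo:dataset`, `scovo:dimension`, plus `rdf:value` for the number) are used as I understand them from the vocabulary and should be checked against http://purl.org/NET/scovo.

```python
SCOVO = "http://purl.org/NET/scovo#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def item_triples(item_uri, dataset_uri, dimension_uris, value):
    """Return N-Triples lines describing one statistical item."""
    lines = [
        f"<{item_uri}> <{RDF}type> <{SCOVO}Item> .",
        f"<{item_uri}> <{SCOVO}dataset> <{dataset_uri}> .",
    ]
    # One triple per dimension (e.g. a region and a time period).
    for dimension_uri in dimension_uris:
        lines.append(f"<{item_uri}> <{SCOVO}dimension> <{dimension_uri}> .")
    # The value is written as a plain literal to keep the sketch short.
    lines.append(f'<{item_uri}> <{RDF}value> "{value}" .')
    return lines


if __name__ == "__main__":
    for line in item_triples(
        "http://example.org/item/1",          # hypothetical item URI
        "http://example.org/dataset/pop",     # hypothetical dataset URI
        ["http://example.org/dim/AT", "http://example.org/dim/2007"],
        8.3,
    ):
        print(line)
```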

Page 10: Scripting User Contributed Interlinking

riese: User Contributed Interlinking

Page 11: Scripting User Contributed Interlinking

riese: User Contributed Interlinking

Page 12: Scripting User Contributed Interlinking

riese: issues
- Dynamic content (Ajax) vs. embedded metadata (RDFa). A local agent has the data in the DOM, but an external agent cannot access it. No real solution yet.
- Scalability & performance. When data is fine-grained and high-volume, how much should be embedded directly in a page?
- How to notify users about data updates? We are currently experimenting with AtomOwl deployed in RDFa (http://riese.joanneum.at/updates/)

Page 13: Scripting User Contributed Interlinking

Towards Generalising UCI
- Next step after riese was to decouple the UCI and generalise it. The result is: I R S (interlinking of resources with semantics, see also poster session)
- I R S features
  - query, add, remove semantic links (owl:sameAs, rdfs:seeAlso, foaf:topic, etc.)
  - subject and object can be set by user (restriction: URIs only)
  - resource preview (debug)
  - expose data in XHTML+RDFa + SPARQL end point
  - lookup in http://sindice.com for unknown resources
  - simple provenance tracking through named graphs
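
The feature list above (query, add, remove; URIs only; provenance through named graphs) can be illustrated with a small in-memory sketch. This is not the I R S implementation: `LinkStore`, `is_http_uri` and the per-contributor graph URIs are hypothetical, and restricting subjects and objects to HTTP(S) URIs is a stricter assumption than the slide's "URIs only".

```python
from urllib.parse import urlparse

# Link types named on the slide, mapped to their full property URIs.
ALLOWED = {
    "owl:sameAs": "http://www.w3.org/2002/07/owl#sameAs",
    "rdfs:seeAlso": "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    "foaf:topic": "http://xmlns.com/foaf/0.1/topic",
}


def is_http_uri(value: str) -> bool:
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class LinkStore:
    """Semantic links stored as quads: (graph, subject, predicate, object)."""

    def __init__(self):
        self.quads = set()

    def _quad(self, graph, subject, predicate, obj):
        if predicate not in ALLOWED:
            raise ValueError(f"unsupported link type: {predicate}")
        if not (is_http_uri(subject) and is_http_uri(obj)):
            raise ValueError("subject and object must be URIs")
        return (graph, subject, ALLOWED[predicate], obj)

    def add(self, graph, subject, predicate, obj):
        # The named graph records who contributed the link.
        self.quads.add(self._quad(graph, subject, predicate, obj))

    def remove(self, graph, subject, predicate, obj):
        self.quads.discard(self._quad(graph, subject, predicate, obj))

    def query(self, subject=None, graph=None):
        return sorted(
            q for q in self.quads
            if (subject is None or q[1] == subject)
            and (graph is None or q[0] == graph)
        )
```

Keeping one named graph per contributor means a link can later be traced back to, or withdrawn by, the user who added it without touching links from other users.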

Page 14: Scripting User Contributed Interlinking

Towards Generalising UCI: I R S

Page 15: Scripting User Contributed Interlinking

Towards Generalising UCI: I R S

Page 16: Scripting User Contributed Interlinking

I R S issues
- Motivation for end-user to contribute has yet to be researched
- Trust issues arise (experimenting with OpenID)
- Generic UCI requires high level of abstraction (maybe only for geeks and not suitable for an end-user)
- To get an overview of what is available some other mechanism should be offered (currently only SPARQL end point)
- Validation of resources is desirable (e.g. type of target, information vs. non-information resource, etc.)
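
One possible shape for the validation mentioned in the last point is sketched below. It follows the http-range-14 reading cited earlier in the deck (a 2xx answer means an information resource, a 303 redirect means the URI may identify any kind of resource) and treats hash URIs as a separate case. The function `classify` and its return strings are hypothetical; the HTTP status is passed in as an argument so the sketch does not depend on a live lookup.

```python
def classify(uri: str, status: int) -> str:
    """Rough classification of a link target from its URI and HTTP status."""
    if "#" in uri:
        # Hash URIs are commonly used for things described by a document.
        return "hash URI (may be a non-information resource)"
    if 200 <= status < 300:
        return "information resource"
    if status == 303:
        return "any resource (follow the redirect for a description)"
    # 4xx/5xx and everything else: nothing can be concluded.
    return "unknown"
```

A UCI tool could run such a check before accepting a link, for example to warn a user who is about to state owl:sameAs between a thing and a web page.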

Page 17: Scripting User Contributed Interlinking

Discussion
- UCI can help create high-quality semantic links
- The social process needs to be researched (it might turn out to be pretty similar to the Wiki ecosystem)
- Some types of content, such as multimedia content, might benefit more from UCI than others
- Is generic UCI only for geeks? To really be successful, UCI likely needs to be embedded into a domain-specific application
- BTW, I R S is also a nice LOD debugger ;)

Questions?