RDFa: introduction, comparison with microdata and microformats and how to use it

Navid Mahlouji, Jose Luis Lopez Pino, Edgar Isaac Hiroshi León Saiki. 15/03/13.

Report for the course 'XML and Web Technologies' of the IT4BI Erasmus Mundus Master's Programme. Introduction, motivation, target domain, schema, attributes, comparing RDFa with RDF, comparing RDFa with Microformats, comparing RDFa with Microdata, how to use RDFa to improve websites, how to extract metadata defined with RDFa, GRDDL and a simple exercise.

Table of Contents

Introduction
1 Target domain
2 Schema
3 Attributes
3.1 Property
3.2 Vocab
3.3 Resource
3.4 Typeof
4 Comparisons of RDFa
4.1 Comparing RDFa with RDF
4.2 Comparing RDFa with Microformats
4.3 Comparing RDFa with Microdata
4.4 Conclusions
5 Using RDFa
5.1 Using RDFa to improve websites
5.2 Extracting the data embedded in RDFa
5.3 Exercise

Introduction

In recent years, with the advancement of web technologies, humans are no longer the only consumers of the data available on the World Wide Web. More and more machines search the Internet for data and knowledge. It is no longer enough to present data on a website for people to visit. One reason is that the quantity of data on the Internet is ever growing. To make this data available to different consumers and for different purposes, we need a way to make it not only readable for users but also understandable for machines and software. Consider search engines: they use web crawlers to crawl the Internet, gather data and classify it for later retrieval. As the volume of data keeps growing, search engines are becoming more precise and efficient at finding information at a finer level of detail.

Although the feasibility of the Semantic Web has drawn much criticism, its aim is to give structure to the massive amount of data available on the Internet. Structured data is more readable for machines and therefore more useful for humans. RDFa is a tool with which we can give structure to the data on web pages.

1 Target domain

HTML is a very good and efficient way of presenting data to people, but it is not efficient at all when it comes to machines understanding that data. In a typical web page, an author can write HTML for, say, a headline, a sub-headline, a block of italicized text, another text block in a different size and several links. Web browsers render this HTML so that people can understand it, yet computers cannot grasp the structure of the data: for instance, that the headline is a blog post title, the italicized text is the publication date and the links are categories. Here is an example explaining what browsers and humans see [17]. On the left, we can see what browsers see, and on the right what humans observe [17].

To cover this need we can use XML technology, which is very close to HTML and can add structure and semantics to our data. RDFa provides meaningful data for machines, carried in the XHTML elements of the web page. For example, when someone announces a dinner meeting on a web page, an application can extract that information and copy it to the user's calendar. Likewise, the contact information on an author's blog can be added to the user's address book automatically. Once the structure of the data is provided, computer programs can understand its meaning and use it efficiently [17].

2 Schema

According to the W3C standards, an XHTML+RDFa document should begin with the following document type declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

To conform with XML syntax, this declaration should appear after the XML declaration:

<?xml version="1.0" encoding="utf-8"?>

With these declarations in place, the document can be validated against the XML and RDFa schemas, for example with the W3C Markup Validation Service [28].

The RDFa schema itself does not seem to impose any constraints. However, since RDFa lets us use other vocabularies through their full IRIs, those vocabularies may impose integrity constraints of their own, which our document then has to follow in order to validate.
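As a rough illustration of the first half of that check, the Python sketch below tests only whether a page is well-formed XML; it does not validate against the XHTML+RDFa DTD, which the W3C validator does. The page content, the dc prefix binding and the function name is_well_formed are assumptions made for this example; only the XML declaration and the DOCTYPE are taken from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal XHTML+RDFa 1.0 page. Only the XML declaration and the
# DOCTYPE come from the report; the rest is made up for illustration.
DOCUMENT = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/" version="XHTML+RDFa 1.0">
  <head><title>Example</title></head>
  <body><h2 property="dc:title">The Trouble with Bob</h2></body>
</html>
"""


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (no DTD validation is done)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


print(is_well_formed(DOCUMENT))                       # → True
print(is_well_formed(DOCUMENT.replace("</h2>", "")))  # → False (unclosed <h2>)
```

A page that fails this test cannot pass the stricter DTD validation either, so it is a cheap first check before submitting a document to the validation service.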

3 Attributes

In this section we introduce RDFa and show some of its features and uses. The essence of RDFa is a set of attributes that carry metadata in an XML language (hence the 'a' in RDFa). These attributes are [12]:

about – a URI or CURIE specifying the resource the metadata is about.

rel and rev – specifying a relationship and a reverse relationship with another resource, respectively.

src, href and resource – specifying the partner resource.

property – specifying a property for the content of an element or the partner resource.

content – optional attribute that overrides the content of the element when using the property attribute.

datatype – optional attribute that specifies the datatype of the text used with the property attribute.

typeof – optional attribute that specifies the RDF type(s) of the subject or the partner resource (the resource that the metadata is about).

vocab – optional attribute that declares the vocabulary used within a portion of a document [20].

These attributes extend HTML and XHTML so that rich metadata can be embedded within Web documents. Adding them has no effect on the presentation of the data, because browsers react only to the tags and attributes they know, and the RDFa attributes are not among them. We can therefore add metadata to existing web pages without affecting their structure or data [12].

3.1 Property

The following very simple RDFa example comes from the W3C website [17]:

<html>
  <head> ... </head>
  <body>
    ...
    <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
    <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
    ...
  </body>
</html>

This code is a very simple HTML page with a title and a date in its body. From the visual presentation, a human understands that "The Trouble with Bob" is the title of the document and that the date is the day on which the document was created. But can a machine such as a crawler understand these semantics, or does it just see a string and a date? The answer is clear: machines need a structure that lets them understand the meaning of the content, and the RDFa attributes provide that structure.

Almost everything in RDFa is identified by a URL, as in the example above. The reasons are data portability, information sharing and consistency: URLs keep terms from being ambiguous. Without them, the term "title" in our example could be read as "the title of a research paper" or as "a job title", which is not what is meant. Identifying every vocabulary term by URL gives detailed information to both machines and humans. To avoid typing errors when repeating URLs for every use, RDFa introduces the vocab attribute, which lets the author declare a URL once and use it multiple times.
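To make the crawler's point of view concrete, here is a minimal Python sketch that lists every property attribute of the page above together with the text it annotates. It uses only the standard html.parser module and is not a conformant RDFa processor; the class name PropertyLister and the shortened page are assumptions made for this illustration.

```python
from html.parser import HTMLParser


class PropertyLister(HTMLParser):
    """Collect (property URL, text) pairs from elements carrying `property`."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # finished (property, text) pairs
        self._stack = []   # open elements: {"tag", "prop", "text"}

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        self._stack.append({"tag": tag, "prop": prop, "text": []})

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for entry in self._stack:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag (tolerates unclosed elements).
        while self._stack:
            entry = self._stack.pop()
            if entry["prop"]:
                self.pairs.append((entry["prop"], "".join(entry["text"]).strip()))
            if entry["tag"] == tag:
                break


PAGE = """
<html><head><title>Blog</title></head>
<body>
  <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
  <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
</body></html>
"""

lister = PropertyLister()
lister.feed(PAGE)
for prop, text in lister.pairs:
    print(prop, "->", text)
```

Because each value is labelled with a full URL, the program can tell the title from the creation date without guessing from the layout.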

3.2 Vocab

The following example uses the vocab attribute to avoid repeating the URL [17]:

<html>
  <head> ... </head>
  <body vocab="http://purl.org/dc/terms/">
    ...
    <h2 property="title">The Trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    ...
    <p>All content on this site is licensed under
      <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
  </body>
</html>

This example also shows that using vocab does not restrict the document to a single URL: terms from another vocabulary can still be included by writing their full URL, as the license property does.

3.3 Resource

Sometimes a page has to describe several items of the same kind. In that case we use the resource attribute, which names the item that the enclosed properties describe. The following example presents two different blog posts [17]. It again uses the vocab attribute to avoid retyping the URL.

<body vocab="http://purl.org/dc/terms/">
  ...
  <div resource="/alice/posts/trouble_with_bob">
    <h2 property="title">The trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    <h3 property="creator">Alice</h3>
    ...
  </div>
  ...
  <div resource="/alice/posts/jos_barbecue">
    <h2 property="title">Jo's Barbecue</h2>
    <p>Date: <span property="created">2011-09-14</span></p>
    <h3 property="creator">Eve</h3>
    ...
  </div>
  ...
</body>

This page contains two blog entries, each with its own title and created properties. The resource attribute is what lets us tell the two entries apart.
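The effect of vocab and resource can be sketched in Python as follows. The sketch handles only these two attributes plus property, ignores every other RDFa rule, and is therefore not a conformant processor. The class name TripleLister and the base URL http://example.com/ used to resolve the relative resource paths are assumptions; the markup is a shortened copy of the two blog entries.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TripleLister(HTMLParser):
    """Tiny subset of RDFa: vocab, resource and property only."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.triples = []
        # One frame per open element; the first frame stands for the page.
        self._stack = [{"tag": None, "subject": base, "vocab": "",
                        "pred": None, "text": []}]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._stack[-1]
        frame = {"tag": tag, "subject": parent["subject"],
                 "vocab": attrs.get("vocab", parent["vocab"]),
                 "pred": None, "text": []}
        prop = attrs.get("property")
        if prop:
            # A bare term such as "title" is completed with the vocabulary in scope.
            frame["pred"] = prop if ":" in prop else frame["vocab"] + prop
        elif "resource" in attrs:
            # Without a property, resource names the subject of nested statements.
            frame["subject"] = urljoin(self.base, attrs["resource"])
        self._stack.append(frame)

    def handle_data(self, data):
        for frame in self._stack:
            if frame["pred"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        while len(self._stack) > 1:
            frame = self._stack.pop()
            if frame["pred"]:
                text = "".join(frame["text"]).strip()
                self.triples.append((frame["subject"], frame["pred"], text))
            if frame["tag"] == tag:
                break


PAGE = """
<body vocab="http://purl.org/dc/terms/">
  <div resource="/alice/posts/trouble_with_bob">
    <h2 property="title">The trouble with Bob</h2>
    <h3 property="creator">Alice</h3>
  </div>
  <div resource="/alice/posts/jos_barbecue">
    <h2 property="title">Jo's Barbecue</h2>
    <h3 property="creator">Eve</h3>
  </div>
</body>
"""

lister = TripleLister(base="http://example.com/")  # hypothetical page address
lister.feed(PAGE)
for triple in lister.triples:
    print(triple)
```

Each statement now carries its own subject, so the two titles are no longer confused with each other.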

3.4 Typeof

Alternatively, instead of the resource attribute we can use the typeof attribute, which declares a new data item of a given type. The following example represents a social network page in which the type Person is used for the owner of the page and for all her friends [17].

<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <p>
    <span property="name">Alice Birpemswick</span>,
    Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>,
    Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
  </p>
  <ul>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a>
    </li>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a>
    </li>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a>
    </li>
  </ul>
</div>

Here the page has an owner of type Person, who knows three other persons. The tree representation of this social network is shown in the following figure [17].
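A small Python sketch can show what a program sees in the typeof attributes of this page. It only resolves each typeof term against the vocabulary in scope and does not build the full graph; the class name TypeLister is an assumption, and the markup is a shortened copy of the example.

```python
from html.parser import HTMLParser


class TypeLister(HTMLParser):
    """List the type IRIs declared by each typeof attribute (simplified)."""

    def __init__(self):
        super().__init__()
        self.types = []              # one list of type IRIs per typed item
        self._stack = [(None, "")]   # (tag, vocabulary in scope)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        vocab = attrs.get("vocab") or self._stack[-1][1]
        self._stack.append((tag, vocab))
        if attrs.get("typeof"):
            # typeof may hold several space-separated terms.
            self.types.append([t if ":" in t else vocab + t
                               for t in attrs["typeof"].split()])

    def handle_endtag(self, tag):
        # Pop back to the matching start tag.
        while len(self._stack) > 1:
            if self._stack.pop()[0] == tag:
                break


PAGE = """
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <span property="name">Alice Birpemswick</span>
  <ul>
    <li property="knows" typeof="Person"><span property="name">Bob</span></li>
    <li property="knows" typeof="Person"><span property="name">Eve</span></li>
    <li property="knows" typeof="Person"><span property="name">Manu</span></li>
  </ul>
</div>
"""

lister = TypeLister()
lister.feed(PAGE)
print(len(lister.types), "typed items")
print(lister.types[0])
```

The four items found are the owner of the page and her three friends, all resolved to the same FOAF type.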

4 Comparisons of RDFa

4.1 Comparing RDFa with RDF

RDFa is related to RDF, a standard for sharing data in a form that machines can understand. The Resource Description Framework (RDF) is an abstract representation of data that can be drawn as a graph. Its idea is to describe, within a certain domain, the relationships between web resources using statements of the form subject-predicate-object; such a statement is called a triple.

In the graph, the subject sits at the start of an arrow, the property (predicate) is the arrow itself and the object sits at the end of the arrow [17].

Illustration of the triple [15]

Triples thus represent relationships between related nodes. The objective of RDF is to provide a language for expressing relationships and data [17].
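As a minimal sketch of the triple model, the social network from section 3.4 can be written in Python as a list of (subject, predicate, object) tuples. The labels starting with "_:" are illustrative names for unnamed nodes, and the helper function objects is an assumption made for this example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Each tuple is one arrow of the graph: (subject, predicate, object).
TRIPLES = [
    ("_:alice", FOAF + "name", "Alice Birpemswick"),
    ("_:alice", FOAF + "knows", "_:bob"),
    ("_:alice", FOAF + "knows", "_:eve"),
    ("_:alice", FOAF + "knows", "_:manu"),
    ("_:bob", FOAF + "name", "Bob"),
    ("_:eve", FOAF + "name", "Eve"),
    ("_:manu", FOAF + "name", "Manu"),
]


def objects(triples, subject, predicate):
    """Follow every arrow labelled `predicate` that starts at `subject`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


friends = objects(TRIPLES, "_:alice", FOAF + "knows")
names = [objects(TRIPLES, friend, FOAF + "name")[0] for friend in friends]
print(names)  # → ['Bob', 'Eve', 'Manu']
```

Answering "whom does Alice know?" takes two hops through the graph: first along the knows arrows, then along the name arrow of each friend.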

RDF is an abstract data model whose function is to reuse vocabularies in order to identify resources and the relations between them [16]. RDFa, on the other hand, expresses RDF data within XHTML, letting the computer understand its meaning while reusing the human-readable data already present in the document [17].

4.2 Comparing RDFa with Microformats

Microformats are extensions of HTML whose main objective is the same as that of RDFa: both encode information in XHTML documents so that the content displayed for humans is also readable for computers. Their approach is similar too: both add attributes to the XHTML code that are generally hidden from users and whose only purpose is to add meaning to the data [13]. Both technologies give structure to the data in web pages, contributing to the so-called Semantic Web [23].

To apply Microformats in XHTML code, we can use the following attributes [18]:

class – contains data that describes the properties and behaviour of an element.

rel – describes the relationship with the target of a link.

rev – describes the reverse relationship, that is, the referenced document.

The rel and rev attributes are also used in RDFa. The class attribute is not; RDFa uses the property attribute to describe the resource instead.

In the following example, the contact information of a person is described using the hCard Microformat [14]:

<div class="vcard">
  <div class="fn">Toby Segaran</div>
  <div class="org">The Semantic Programmers</div>
  <div class="tel">919-555-1234</div>
  <a class="url" href="http://kiwitobes.com/">http://kiwitobes.com/</a>
</div>

As the example shows, a set of classes is nested inside an element with the class vcard, which means that they all belong to the hCard Microformat. Software that knows the hCard structure can extract this information and add it to your address book. Web crawlers can use hCard to build a database of contacts with their names, telephone numbers, locations, etc., extracted from the web sites that use Microformats. Similarly, the hCalendar Microformat can be used to create a timeline from historical data about past events [14]. These applications can also be built with RDFa.
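The address-book scenario can be sketched in a few lines of Python. The reader below knows only the four hCard classes used in the example and is far from a complete hCard parser; the names HCARD_FIELDS and HCardReader are assumptions made for this illustration.

```python
from html.parser import HTMLParser

HCARD_FIELDS = ("fn", "org", "tel", "url")  # only the classes used in the example


class HCardReader(HTMLParser):
    """Read known hCard classes nested inside an element with class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []          # one dict per vcard found
        self._stack = []         # open elements: [tag, field or None, text parts]
        self._card_depth = None  # stack depth at which the current vcard opened

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        field = None
        if "vcard" in classes and self._card_depth is None:
            self.cards.append({})
            self._card_depth = len(self._stack)
        elif self._card_depth is not None:
            field = next((c for c in HCARD_FIELDS if c in classes), None)
            if field == "url" and attrs.get("href"):
                self.cards[-1]["url"] = attrs["href"]  # links carry the value in href
                field = None
        self._stack.append([tag, field, []])

    def handle_data(self, data):
        for _, field, parts in self._stack:
            if field:
                parts.append(data)

    def handle_endtag(self, tag):
        while self._stack:
            open_tag, field, parts = self._stack.pop()
            if field:
                self.cards[-1][field] = "".join(parts).strip()
            if self._card_depth is not None and len(self._stack) == self._card_depth:
                self._card_depth = None  # the vcard element itself was closed
            if open_tag == tag:
                break


PAGE = """
<div class="vcard">
  <div class="fn">Toby Segaran</div>
  <div class="org">The Semantic Programmers</div>
  <div class="tel">919-555-1234</div>
  <a class="url" href="http://kiwitobes.com/">http://kiwitobes.com/</a>
</div>
"""

reader = HCardReader()
reader.feed(PAGE)
print(reader.cards)
```

Note that the reader depends on a fixed list of class names agreed in advance, which is exactly what makes Microformats simple but closed.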

Microformats are predefined, and each of them serves a different purpose. Among them are hCard for contact information, hResume for CVs, hNews for news content, etc.

RDFa allows developers to define a namespace. Publishers are therefore not restricted to official vocabularies and can define their own [14]. For example, if a specific domain such as chemical data has no Microformat, RDFa has to be used instead [18].

Other differences between Microformats and RDFa are:

1- In RDFa a resource can be identified by an IRI (Internationalized Resource Identifier), which makes it easier to locate a specific resource. Microformats do not support this.

2- Microformats do not support typed literal properties, so it is not possible to specify things like units of measurement, such as kilogram or pound, or to state whether a value such as "+323243453" is an integer or a phone number. RDFa does support typed literals.

3- RDFa allows multiple IRI types per item, so web developers can state that a resource on a page has more than one type; for instance, a business can be both an "AutoPartsStore" and a "RepairShop". Microformats do not support this feature [19].
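The second difference can be illustrated with a short Python sketch that reads the datatype and content attributes. The vocabulary http://example.org/vocab#, the property names, the converter table and the class name TypedLiteralReader are all assumptions made for this example, and the reader handles only flat markup without nested elements.

```python
from html.parser import HTMLParser

# Hypothetical converters for two XML Schema datatypes.
CONVERTERS = {"xsd:integer": int, "xsd:decimal": float}


class TypedLiteralReader(HTMLParser):
    """Read property values, honouring content and datatype when present."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._current = None  # [property, datatype, text parts] while inside an element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" not in attrs:
            return
        if "content" in attrs:
            # content overrides the visible text of the element.
            self._store(attrs["property"], attrs.get("datatype"), attrs["content"])
        else:
            self._current = [attrs["property"], attrs.get("datatype"), []]

    def handle_data(self, data):
        if self._current:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current:
            prop, datatype, parts = self._current
            self._store(prop, datatype, "".join(parts).strip())
            self._current = None

    def _store(self, prop, datatype, text):
        convert = CONVERTERS.get(datatype, str)  # untyped values stay plain strings
        self.values[prop] = convert(text)


PAGE = """
<div vocab="http://example.org/vocab#">
  <span property="phone">+323243453</span>
  <span property="stock" datatype="xsd:integer">+323243453</span>
  <span property="weight" datatype="xsd:decimal" content="2.5">2.5 kg</span>
</div>
"""

reader = TypedLiteralReader()
reader.feed(PAGE)
print(reader.values)
```

The same characters "+323243453" end up as a string for the phone property and as a number for the stock property, because the markup states which one is meant.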

In conclusion, RDFa is more complex and more effective. At present, however, more services use Microformats than RDFa. This is because of their simplicity: Microformats do not require an XML Schema to be specified [14]. Since 2010, Microformats have been heavily used, especially to extract events, contact information, geographical coordinates and social relationships [18].

4.3 Comparing RDFa with Microdata

Microdata is another specification for XHTML whose main goal is to add semantics to web content. It follows an approach similar to RDFa, in which web developers can define custom vocabularies [21]. In Microdata, however, vocabularies have to be designed through a standards body, which leads to better designed vocabularies than in RDFa. RDFa, in turn, can be more complete, since it does not depend on a standards body [19].

Microdata was designed to be a subset of RDFa with the intention of making it simpler. For this reason most of its functions have an equivalent in RDFa; only the names of the attributes differ. Almost 99% of the code written with Microdata can be moved to RDFa simply by using the equivalent attributes.
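The attribute correspondence can be sketched as a small Python function that renames the Microdata attributes of one element to their RDFa counterparts. The mapping used here (itemprop to property, itemtype to vocab plus typeof, itemid to resource, itemscope dropped) is a simplified reading of that correspondence, and the function name microdata_to_rdfa is an assumption; real pages may need manual adjustments.

```python
def microdata_to_rdfa(attrs):
    """Translate one element's Microdata attributes to RDFa attributes."""
    out = {}
    for name, value in attrs.items():
        if name == "itemscope":
            continue  # implied by typeof in RDFa
        elif name == "itemtype":
            # Split e.g. "http://schema.org/Person" into vocabulary and term.
            cut = max(value.rfind("/"), value.rfind("#")) + 1
            out["vocab"], out["typeof"] = value[:cut], value[cut:]
        elif name == "itemprop":
            out["property"] = value
        elif name == "itemid":
            out["resource"] = value
        else:
            out[name] = value  # ordinary HTML attributes are kept as they are
    return out


print(microdata_to_rdfa({"itemscope": "", "itemtype": "http://schema.org/Person"}))
# → {'vocab': 'http://schema.org/', 'typeof': 'Person'}
print(microdata_to_rdfa({"itemprop": "name", "class": "author"}))
# → {'property': 'name', 'class': 'author'}
```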

There are some reasons why the RDFa approach is preferable to Microdata:

1- RDFa is supported by most of the common search engines, unlike Microdata.

2- RDFa supports some advanced features that are still not available in Microdata. For instance, Microdata does not support defining units of measurement.

3- RDFa features are improving constantly, unlike those of Microdata.

In summary, Microdata is an attempt to do the same as RDFa with less complexity. However, it is not yet a standard like RDFa, and for this reason it lacks compatibility with many applications [22].

4.4 Conclusions

In summary, all three approaches have their own advantages and disadvantages. While Microformats are the most widely used in the market, they lack some important features. For instance, it is not possible to create our own vocabularies; one has to use the vocabularies that were developed specifically for Microformats. Microdata, on the other hand, allows custom vocabularies, but it lacks some of the advanced features of RDFa, such as units of measurement. Even if RDFa is not the most used today, it has the most features and covers the most domains. In other words, RDFa is the most powerful of the three approaches.

5 Using RDFa

5.1 Using RDFa to improve websites

RDFa is not supported by schema.org, a shared markup vocabulary defined in collaboration by Google, Microsoft and Yahoo!. Google [2], however, has defined a specific vocabulary for reviews, people, products, businesses, organizations, recipes, events and videos. For instance, the following picture shows how Google uses the metadata stored in RDFa attributes to improve how website reviews appear in its results; it calls those results "rich snippets".

This is an example from the W3C blog [1] that uses RDFa 1.0 to add metadata to a review, helping Google to index it:

In 2009 the Central Office of Information faced a big problem: it had to organise job vacancies without changing the websites of the different public agencies, because these use diverse web technologies [3].

For this purpose they defined a vocabulary that could also be used by others. The vocabulary describes the details of a job vacancy: the title, the type, the description, the requirements, the language, etc. [4] After that, they started to use this vocabulary by implementing RDFa in different websites.

Another successful use of RDFa is GoodRelations [5], a vocabulary for e-commerce that helps to standardise the metadata of different vendors. It supports vertical searches, for instance by users who look for products across different websites or by companies that need different suppliers. Several shop applications such as Magento already include it in their software, and it can be expressed using RDFa. For example, BestBuy uses RDFa to publish information about its stores, such as the opening hours, the location and the telephone number [6].

5.2 Extracting the data embedded in RDFa

As we have already mentioned, Google defines a specific vocabulary for people. To add metadata in the RDFa attributes of an XHTML document we can use any text or source code editor. However, it is tedious to read through a whole document just to pick out its metadata. Several tools make this task easier; for instance, we can install a Chrome extension called RDFa Triples Lister, which extracts the metadata of the website we are visiting with this browser:

We can also use RDF parsing tools that extract the RDFa embedded in a web page. For example, with the rdfquery [7] tool we can read the RDFa information of BBC programmes and use it to create links to Spotify and stream the songs [8]. The following graph, created with RDFa Play [9], shows the RDF information extracted from a BBC programme [10]:

Finally, the W3C has defined GRDDL (Gleaning Resource Descriptions from Dialects of Languages), a mechanism for extracting data compatible with the Resource Description Framework from documents, including those that use RDFa. For this purpose we have to define transformations, which are instructions for extracting the embedded data properly [24].

For RDFa there is a style sheet that defines the transformations to be applied to an XHTML+RDFa document in order to extract the RDF data [26].

For instance, the following image illustrates an interesting example: the web contains different calendars whose metadata is probably defined with different techniques (microformats, RDFa, etc.). The GRDDL transformations specify how to extract the RDF data from each document. Once we have extracted the RDF triples we can process them using, for example, SPARQL (a query language for RDF) [25].
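To give a feeling for that last step, the Python sketch below queries a handful of triples with a simple pattern matcher. It is only a stand-in for SPARQL, not an implementation of it. The calendar vocabulary http://example.org/calendar#, the event data and the function name match are hypothetical and chosen for illustration.

```python
CAL = "http://example.org/calendar#"  # hypothetical vocabulary

# Triples as they might come out of two differently marked-up calendars.
TRIPLES = [
    ("http://example.org/a#dinner", CAL + "summary", "Dinner meeting"),
    ("http://example.org/a#dinner", CAL + "start", "2013-03-15"),
    ("http://example.org/b#talk", CAL + "summary", "RDFa talk"),
    ("http://example.org/b#talk", CAL + "start", "2013-03-20"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every given (non-None) position."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Roughly what a SPARQL query asking for every event and its start date does.
starts = [(s, o) for s, _, o in match(TRIPLES, p=CAL + "start")]
print(starts)
```

Once the data from both calendars is in triple form, the query no longer needs to know which markup technique each source page used.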

5.3 Exercise

As we have already mentioned, Google defines a specific vocabulary for people. This vocabulary is very useful for making our social networking information accessible. The properties defined in the vocabulary are:

Using the vocabulary presented above, we need to modify this webpage by adding metadata to it with RDFa:

A possible solution could be:

REFERENCES

[1] W3C. RDFa 1.1 with a rich snippet example. Retrieved from W3C: http://www.w3.org/QA/2011/05/rdfa_11_with_a_rich_snippet_ex.html

[2] Google. Rich snippets (microdata, microformats, RDFa, and Data Highlighter). Retrieved from Google support: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=99170

[3] Birbeck, M. More RDFa goodness from UK government web-sites. Retrieved from Internet-Apps: http://internet-apps.blogspot.fr/2009/04/more-rdfa-goodness-from-uk-government.html

[4] Birbeck, M. (n.d.). Argot Vacancy. Retrieved from Google Code: https://code.google.com/p/argot-hub/wiki/ArgotVacancy

[5] GoodRelations Wiki. (n.d.). The Web Vocabulary for E-Commerce. Retrieved from GoodRelations Vocabulary: http://wiki.goodrelations-vocabulary.org/Quickstart

[6] Myers, J. Creating local visibility to open box products with front-end semantic web. Retrieved from Beweep: http://jay.beweep.com/2010/03/30/creating-local-visibility-to-open-box-products-with-front-end-semantic-web/

[7] Google. rdfquery. Retrieved from Google Code: https://code.google.com/p/rdfquery/

[8] Adding Spotify links to BBC Radio playlists, via RDFa, using Greasemonkey and rdfQuery. Retrieved from http://hublog.hubmed.org/archives/001913.html

[9] RDFa Group. RDFa Info. Retrieved from http://rdfa.info/play/

[10] Use of Semantic Web technologies on the BBC Web Sites. Retrieved from http://www.cmswire.com/cms/information-management/bbcs-adoption-of-semantic-web-technologies-an-interview-017981.php

[11] Rich snippets – People. Retrieved from http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146646

[12] Wikipedia. RDFa. Retrieved from http://en.wikipedia.org/wiki/RDFa

[13] Prodromou, E. (2008, 10 12). RDFa vs microformats. Retrieved from evan.prodromou.name: http://evan.prodromou.name/RDFa_vs_microformats

[14] Toby Segaran, C. E. (2009). Programming the semantic web. O'Reilly.

[15] W3C. (2004, February 10). Resource Description Framework (RDF). Retrieved from W3.org: http://www.w3.org/TR/rdf-concepts/#section-triples

[16] W3C. (2004, February 10). RDF Vocabulary Description Language 1.0: RDF Schema. Retrieved from W3.org: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/

[17] W3C Working Group. (2012, June 07). RDFa 1.1 Primer. Retrieved from W3C: http://www.w3.org/TR/xhtml-rdfa-primer/

[18] Wikipedia. Microformat. Retrieved from http://en.wikipedia.org/wiki/Microformats#cite_note-Wharton000-2

[19] Sporny, M. An Uber-comparison of RDFa, Microdata and Microformats. Retrieved from Manu Sporny Organization: http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/

[20] W3C Group. RDFa Syntax. Retrieved from http://www.w3.org/TR/rdfa-syntax/

[21] Wikipedia. Microdata. Retrieved from http://en.wikipedia.org/wiki/Microdata_(HTML)#cite_note-DIVE-4

[22] Sporny, M. (n.d.). Mythical Differences: RDFa Lite vs. Microdata. Retrieved from Manu Sporny Organization: http://manu.sporny.org/2012/mythical-differences/

[23] Wikipedia. Semantic Web. Retrieved from http://en.wikipedia.org/wiki/Semantic_Web

[24] Gleaning Resource Descriptions from Dialects of Languages (GRDDL). Retrieved from http://www.w3.org/TR/grddl/

[25] GRDDL Use Cases: Scenarios of extracting RDF data from XML documents. Retrieved from http://www.w3.org/TR/2007/NOTE-grddl-scenarios-20070406/

[26] RDFa2RDFXML style sheet. Retrieved from http://www.w3.org/TR/grddl-primer/RDFa2RDFXML.xsl

[28] W3C Markup Validation Service. Retrieved from http://validator.w3.org/