Cornell CS 502
Metadata for the Web: From Discovery to Description
CS 502 – 20020226
Carl Lagoze – Cornell University
Co-existing Cost/Functionality Levels
[Figure: co-existing metadata levels arranged along an axis of "Greater Functionality & Cost"]
Dublin Core Qualifiers
• From fuzzy buckets to more specific description
• Model of “graceful degradation”
  – Support both simplicity and specificity
  – Intra-domain and inter-domain semantics
[Figure: the Dublin Core statement model]
Resource (implied subject) has (implied verb) property [optional qualifier] X [optional qualifier]
  – property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...)
  – X: the property value (an appropriate literal)
  – qualifiers act as adjectives
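The statement model above can be sketched as a small data structure: one of the 15 elements, a literal value, and two optional qualifier slots. This is a minimal illustration under our own assumptions; the class name `DCStatement`, the lowercase element names, and the bracketed rendering are hypothetical choices, not part of any DCMI specification.

```python
from dataclasses import dataclass
from typing import Optional

# The 15 unqualified Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

@dataclass(frozen=True)
class DCStatement:
    """One statement: the resource (implied subject) 'has' a property value."""
    element: str                      # one of the 15 properties (the noun)
    value: str                        # an appropriate literal
    refinement: Optional[str] = None  # optional qualifier narrowing the element
    scheme: Optional[str] = None      # optional qualifier naming the value's encoding

    def __post_init__(self) -> None:
        if self.element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {self.element!r}")

    def as_sentence(self) -> str:
        """Render as: Resource has <element> [refinement] "<value>" [scheme]."""
        noun = self.element if self.refinement is None else f"{self.element} [{self.refinement}]"
        tail = "" if self.scheme is None else f" [{self.scheme}]"
        return f'Resource has {noun} "{self.value}"{tail}'
```

For example, `DCStatement("date", "2000-06-13", refinement="revised", scheme="ISO8601").as_sentence()` yields `Resource has date [revised] "2000-06-13" [ISO8601]`. Keeping both qualifiers optional mirrors the point that a statement must still stand without them.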
Varieties of Qualifiers: Element Refinements
• Make the meaning of an element narrower or more specific.
• Narrowing implies an “is a” relationship
  – a “date created” is a “date”
  – an “is part of” relation is a “relation”
• If your software does not understand the qualifier, you can safely ignore it.
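The "is a" reading of refinements can be sketched as follows, assuming names in the `DC.Element.Refinement` form used by the HTML examples later in these notes. The table `KNOWN_REFINEMENTS` and the function names are hypothetical; the point is that a statement still matches its element even when the refinement is unknown and ignored.

```python
from typing import Optional, Tuple

# Hypothetical table of the refinements this particular software understands,
# keyed by element (names follow the DC.Element.Refinement pattern).
KNOWN_REFINEMENTS = {
    "Date": {"Created", "Modified"},
    "Relation": {"isPartOf"},
}

def split_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'DC.Date.Created' into ('Date', 'Created') and 'DC.Title' into ('Title', None)."""
    parts = name.split(".")
    if parts[0] != "DC" or len(parts) not in (2, 3):
        raise ValueError(f"not a DC element name: {name!r}")
    return parts[1], (parts[2] if len(parts) == 3 else None)

def is_a(name: str, element: str) -> bool:
    """Narrowing implies is-a: any DC.Date.* statement is also a Date statement,
    whether or not this software understands the refinement."""
    return split_name(name)[0] == element

def understands(name: str) -> bool:
    """True when the name has no refinement or a known one. An unknown refinement
    is not an error: the statement is simply read as its base element."""
    element, refinement = split_name(name)
    return refinement is None or refinement in KNOWN_REFINEMENTS.get(element, set())
```

A query for `Date` therefore still finds a `DC.Date.Digitized` statement, even though `understands("DC.Date.Digitized")` is false for this table.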
Varieties of Qualifiers: Value Encoding Schemes
• Says that the value is
  – a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
  – a string formatted in a standard way (e.g., “2001-05-02” means May 2, not February 5)
• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
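A consumer can use a value encoding scheme when it recognizes one and fall back to the raw literal otherwise. This minimal sketch assumes only two date scheme labels (`ISO8601`, `W3CDTF`) are known and uses Python's `date.fromisoformat`; the function name `interpret` is hypothetical.

```python
from datetime import date
from typing import Optional, Union

# Scheme labels this sketch knows how to parse (an assumption; extend as needed).
DATE_SCHEMES = {"ISO8601", "W3CDTF"}

def interpret(value: str, scheme: Optional[str]) -> Union[date, str]:
    """Parse a value according to its encoding scheme when the scheme is known.

    Dates are parsed with date.fromisoformat, which accepts at least the
    YYYY-MM-DD form. For an unknown scheme, or a value that does not parse,
    the literal string is returned unchanged: it should still be usable
    for resource discovery.
    """
    if scheme in DATE_SCHEMES:
        try:
            return date.fromisoformat(value)  # "2001-05-02" -> 2 May 2001
        except ValueError:
            pass
    return value
```

With this, `interpret("2001-05-02", "W3CDTF")` is a `date` whose month is 5, while `interpret("Languages -- Grammar", "LCSH")` simply returns the heading string.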
Resource has Date "2000-06-13"Revised
ISO8601
Resource has Subject "Languages -- Grammar"LCSH
Dumb-Down Principle for Qualifiers
• The fifteen elements should be usable and understandable with or without the qualifiers
• Qualifiers refine meaning (but may be harder to understand)
• Nouns can stand on their own without adjectives
• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
• "has a“ relations break the model– E.g., a creator has a hair color
Test for “good” qualifiers: cover the qualifier and ask:
  – Does the statement still make sense?
  – Is it still correct?
“Incorrect” Qualification
Resource has Subject [audience] “pre-schoolers”
Resource has Creator [affiliation] “Cornell University”
Open questions in this model
• Are uncontrolled and unconstrained values really useful for discovery?
• Is it possible for an organization (DCMI) to control the evolution of a language?
• How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation?
• Can DC serve as a lingua franca (mapping template) among more complex models?
Models for Deploying Metadata
• Embedded in the resource
  – Low deployment threshold
  – Limited flexibility, limited model
• Linked to from resource
  – Using XLink
  – Is there only one source of metadata?
• Independent resource referencing resource
  – Model of accessing the object through its surrogate
Syntax Alternatives: HTML
• Advantages:
  – Simple mechanism – META tags embedded in content
  – Widely deployed tools and knowledge
• Disadvantages:
  – Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions)
Dublin Core in HTML
• http://www.dublincore.org/documents/2000/08/15/dcq-html/
• HTML constructs
  – <link> to establish a pseudo-namespace
  – <meta> for metadata statements
• name attribute for the DC element (DC.element, or DC.element.ER with an element refinement, e.g., DC.Date.Created)
• content attribute for the element value
• scheme attribute for the encoding scheme or controlled vocabulary
• lang attribute for the language of the element value
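The attribute conventions just listed map directly onto a small generator. This is a sketch only: the function name `dc_meta` and its parameters are hypothetical, and it uses the standard-library `html.escape` so that quotes or ampersands inside a value cannot break the attribute.

```python
from html import escape
from typing import Optional

SCHEMA_LINK = '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">'

def dc_meta(element: str, value: str, refinement: Optional[str] = None,
            scheme: Optional[str] = None, lang: Optional[str] = None) -> str:
    """Build one <meta> tag: name=DC.element[.refinement], content=value,
    plus optional scheme and lang attributes."""
    name = f"DC.{element}" + (f".{refinement}" if refinement else "")
    attrs = [("name", name), ("content", value)]
    if scheme:
        attrs.append(("scheme", scheme))
    if lang:
        attrs.append(("lang", lang))
    # escape(..., quote=True) turns " into &quot; and & into &amp; inside values.
    return "<meta " + " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in attrs) + ">"
```

For instance, `dc_meta("Date", "2000-10-23", refinement="Created", scheme="W3CDTF")` produces `<meta name="DC.Date.Created" content="2000-10-23" scheme="W3CDTF">`; the page would also carry `SCHEMA_LINK` in its head.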
Dublin Core in HTML example
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1"> <meta name="DC.Title" content="Business Unusual”><meta name=“DC.Title” lang=“es” content=“negocio inusual”> <meta name="DC.Creator" content="Carl Lagoze"> <meta name="DC.Subject" content="bibliographic control web cataloging "> <meta name="DC.Date.Created" scheme="W3CDTF"
content="2000-10-23"> <meta name="DC.Format" content="text/html"> <meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
Unqualified Dublin Core in XML
http://www.dublincore.org/documents/2000/11/dcmes-xml/
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/2000/12/01-dcmes-xml-dtd.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
<dc:title>Dave Beckett's Home Page</dc:title>
<dc:creator>Dave Beckett</dc:creator>
<dc:publisher>ILRT, University of Bristol</dc:publisher>
<dc:date>2000-06-06</dc:date>
</rdf:Description>
</rdf:RDF>
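The RDF/XML form can be read with the standard-library `xml.etree.ElementTree`, which reports namespaced tags as `{namespace}local`. In this sketch, the function name `dc_from_rdf` and the returned dictionary shape (`rdf:about` URI mapped to element name mapped to values) are our own assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

def dc_from_rdf(xml_text: str) -> Dict[Optional[str], Dict[str, List[str]]]:
    """Map each rdf:Description's rdf:about URI to its dc:* element values."""
    root = ET.fromstring(xml_text)
    records: Dict[Optional[str], Dict[str, List[str]]] = {}
    dc_prefix = "{" + DC_NS + "}"          # ElementTree writes tags as {namespace}local
    for desc in root.iter("{" + RDF_NS + "}Description"):
        fields = records.setdefault(desc.get("{" + RDF_NS + "}about"), {})
        for child in desc:
            if child.tag.startswith(dc_prefix):
                local = child.tag[len(dc_prefix):]   # e.g. "creator"
                fields.setdefault(local, []).append((child.text or "").strip())
    return records
```

Applied to the record above, the entry for `http://www.ilrt.bristol.ac.uk/people/cmdjb/` would contain `creator: ["Dave Beckett"]`, `publisher: ["ILRT, University of Bristol"]`, and so on.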
Example of Dublin Core Use
A map in the United States Library of Congress on-line American Memory Collection
Title
The name given to the resource
<meta name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la">
Creator
An entity primarily responsible for making the content of the resource
<meta name="DC.Creator" content="Nicolaum Visscher">
Subject
The topic of the content of the resource
<meta name="DC.Subject" content="Middle Atlantic States" scheme="LCSH">
<meta name="DC.Subject" content="Maps" scheme="LCSH">
<meta name="DC.Subject" content="Early works to 1800" scheme="LCSH">
Description
An account of the content of the resource
<meta name="DC.Description.Abstract" content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">
Publisher
An entity responsible for making the resource available
<meta name="DC.Publisher" content="Library of Congress, United States">
Contributor
An entity responsible for making contributions to the content of the resource.
<meta name="DC.Contributor" content="Historic Urban Plans">
Date
A date associated with an event in the lifecycle of the resource
<meta name="DC.Date.Created" content="1996-04-17" scheme="W3CDTF">
Type
The nature or genre of the content of the resource
<meta name="DC.Type" content="image" scheme="DCMIType">
Format
The physical or digital manifestation of the resource
<meta name="DC.Format.Medium" content="image/gif" scheme="IMT">
<meta name="DC.Format.Extent" content="556K">
Identifier
An unambiguous reference to the resource in the current context
<meta name="DC.Identifier" content="http://loc.gov/coll1/img456.jpg" scheme="URI">
Source
A reference to a resource from which the present resource is derived.
<meta name="DC.Source" content="G3715 1685 .V5 1969 (LOC catalog #)">
Language
Language of the intellectual content of the object
<meta name="DC.Language" content="nl" scheme="ISO 639-2">
Relation
A reference to a related resource
<meta name="DC.Relation.isPartOf" content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html" scheme="URI">
Coverage
The extent or scope of the content of the resource
<meta name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">
<meta name="DC.Coverage.Temporal" content="1650" scheme="W3CDTF">
Rights
Information about rights in and over the resource
<meta name="DC.Rights" content="http://www.loc.gov/rights_statement.htm">
Distributed Content: The Metadata Challenge
• From fixed, contained physical artifacts to fluid, distributed digital objects
• Need for a basis of trust and authenticity in the network environment
• Decentralization and specialization of resource description, and the need for mapping formalisms
Multi-entity nature of object description
[Figure: entities contributing to the description of a single object: photographer, camera type, software, computer artist]
Understanding Metadata based on Query Capabilities
• Simple boolean tags?
  – Creator = “Tom Baker” and Title contains “Dublin Core”
• Agent, time, place questions?
  – Who was responsible for what, and when and where?
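The first, "simple boolean tags" level of querying needs nothing more than flat attribute/value records. In this sketch the record layout, the function name, and the sample records in the usage note are hypothetical.

```python
from typing import Dict, List

# A flat attribute/value record: element name -> list of values.
Record = Dict[str, List[str]]

def creator_and_title(records: List[Record], creator: str, phrase: str) -> List[Record]:
    """Boolean tag query: Creator equals `creator` AND some Title contains `phrase`
    (the containment test is case-insensitive)."""
    phrase = phrase.lower()
    return [
        r for r in records
        if creator in r.get("Creator", [])
        and any(phrase in t.lower() for t in r.get("Title", []))
    ]
```

Given hypothetical records such as `{"Creator": ["Tom Baker"], "Title": ["Notes on Dublin Core"]}`, the query `creator_and_title(records, "Tom Baker", "Dublin Core")` keeps only records matching both conditions. Questions about who did what, when and where, have no place to attach in this flat form.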
Attribute/Value approaches to metadata…
Hamlet (subject) has a (implied verb) creator (metadata noun) [playwright (metadata adjective)] Shakespeare (literal)
“The playwright of Hamlet was Shakespeare”
[Figure: node R1 linked to “Hamlet” by dc:title and to “Shakespeare” by dc:creator.playwright]
…run into problems for richer descriptions…
Hamlet has a creator [birthplace] Stratford
“The playwright of Hamlet was Shakespeare, who was born in Stratford”
[Figure: node R1 linked to “Shakespeare” by dc:creator.playwright and to “Stratford” by dc:creator.birthplace]
…because of their failure to model entity distinctions
[Figure: node R1 linked by title to “Hamlet” and by creator to a separate node R2; R2 linked by name to “Shakespeare” and by birthplace to “Stratford”]
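The entity-distinguishing graph can be sketched as plain triples, with R1 standing for the work and R2 for the person, so that "birthplace" attaches to Shakespeare rather than to Hamlet. The predicate names follow the figure; the helper functions are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# R1 is the work, R2 a separate node for its creator.
TRIPLES: List[Triple] = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]

def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def creator_birthplaces(triples: List[Triple], title: str) -> List[str]:
    """Follow title -> work node -> creator node -> birthplace."""
    works = [s for s, p, o in triples if p == "title" and o == title]
    return [place
            for w in works
            for c in objects(triples, w, "creator")
            for place in objects(triples, c, "birthplace")]
```

Because the creator is a node rather than a literal, the path from Hamlet to Stratford is unambiguous; in the flat `dc:creator.birthplace` form, nothing says which creator the birthplace belongs to.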
Applying a Model-Centric Approach
• Formally define common entities and relationships underlying multiple metadata vocabularies
• Describe them (and their inter-relationships) in a simple logical model
• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
Events are key to understanding metadata relationships?
• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.
• Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.
ABC/Harmony Event-aware Metadata Ontology
• Recognizing inherent lifecycle aspects of description (esp. of digital content)
• Modeling incorporates time (events and situations) as first-class objects
  – Supplies clear attachment points for agents, roles, existential properties
• Resource description as a “story-telling” activity
Resource-centric Metadata
Title: Anna Karenina
Author: Leo Tolstoy
Illustrator: Orest Vereisky
Translator: Margaret Wettlin
Date Created: 1877
Date Translated: 1978
Description: Adultery & Depression
Birthplace: Moscow
Birthdate: 1828
?
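[Figure: event-aware graph for “Anna Karenina”: a “creation” event (“1877”, “Russian”) with “Leo Tolstoy” as “author” (linked to “Moscow” and “1828”), and a “translation” event (“1978”, “English”) with “Margaret Wettlin” as “translator” and “Orest Vereisky” as “illustrator”; description: “Tragic adultery and the search for meaningful love”]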
Queries over complex descriptive graphs
• Ability to ask questions like “show me all the translations of War and Peace between 1980 and 1990”
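Once events are first-class objects, queries like this become simple filters. The sketch below encodes the Anna Karenina example as two events; the `Event` class, its fields, and the choice to attach the illustrator to the translation event are our own modeling assumptions, not the ABC/Harmony schema itself. The data contains no War and Peace translations, so that query returns nothing here.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Event:
    kind: str                       # e.g. "creation", "translation"
    year: int
    language: str
    agents: List[Tuple[str, str]]   # (role, agent name) pairs attached to the event
    work: str                       # title of the work involved

EVENTS = [
    Event("creation", 1877, "Russian", [("author", "Leo Tolstoy")], "Anna Karenina"),
    Event("translation", 1978, "English",
          [("translator", "Margaret Wettlin"), ("illustrator", "Orest Vereisky")],
          "Anna Karenina"),
]

def translations(events: List[Event], work: str, start: int, end: int) -> List[Event]:
    """Translation events of `work` with start <= year <= end."""
    return [e for e in events
            if e.kind == "translation" and e.work == work and start <= e.year <= end]

def who(events: List[Event], role: str) -> List[Tuple[str, str, int]]:
    """'Who was responsible for what, when': (agent, event kind, year) for a role."""
    return [(name, e.kind, e.year) for e in events for r, name in e.agents if r == role]
```

Here `translations(EVENTS, "Anna Karenina", 1970, 1980)` returns the 1978 English translation, and `who(EVENTS, "translator")` gives `[("Margaret Wettlin", "translation", 1978)]`: roles and dates hang off the event instead of competing for slots in a single flat record.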