Digital Libraries: Theory and Development of Knowledge Architecture

Jian-hua Yeh (葉建華)


Assistant Professor, Department of Information Science, Aletheia University

2

Outline

• Ontology

– The problem

– What is an ontology?

– Why develop an ontology?

– Usage of ontology

– Complexity and Processing of ontology

– OWL introduction

• Topic maps

– Concepts

3

The Problem

• With the increasing complexity of our systems and our IT needs, we need to move toward human-level interaction

• We need to maximize the amount of semantics we can utilize

• We need to move from interaction at the data and information levels to interaction at the human semantic level

[Figure: Data → Information → Knowledge. Data is noise (raw codes such as "Run84", "ID=08", "NULLPARRT", "ACC", "ID=34" and stray symbols); information is "Tank", "Vehicle located at semi-mountainous terrain, obscured"; knowledge is human meaning ("decide", "maneuver").]

• And representing semantics means dealing with multiple representations of semantics, which requires semantic integration

4

Interpretation Continuum

Advancing along the interpretation continuum, from human-interpreted DATA (relatively unstructured, random) to computer-interpreted KNOWLEDGE (very structured, logical). Moving to the right depends on increasing automated semantic interpretation.

• Metadata: simple (XML), richer (RDF/S), very rich (DAML+OIL)

• Techniques: info retrieval, web search, text summarization, content extraction, topic maps, reasoning services, ontology induction

• Stages, from left to right:

– Display raw documents; all interpretation is done by humans

– Find and correlate patterns in raw docs; display matches only

– Store and connect patterns via a conceptual model (i.e., an ontology); link to docs to aid retrieval

– Automatically acquire concepts; evolve ontologies into domain theories; link to institution repositories (e.g., MII)

– Automatically span domain theories and institution repositories; interoperate with a fully interpreting computer

5

Dimensions of Interoperability & Integration

[Figure: interoperability scale from 0% to 100%, showing 6 levels of interoperability (Enterprise, Object, Data, System, Application, Component), 3 kinds of integration, and Community; annotation: "Our interest lies here".]

6

Information Semantics

• Provide semantic representation (meaning) for our systems, our data, our documents, our agents

• Focus on machines interacting more closely at the human conceptual level

• Spans Ontologies, Knowledge Representation, Semantic Web, Semantics in NLP, Knowledge Management

• The linking notion is ontologies (rich formal models)

7

The Smart Data Enterprise

Data has progressed through four stages of increasing intelligence

8

Triangle of Signification

[Figure: triangle linking Terms (e.g., "Joe" + "Montana"), Concepts (e.g., <Joe_Montana>), and Real (& Possible) World Referents. Labels: Sense; Reference/Denotation; Intension; Extension; Syntax: Symbols; Semantics: Meaning; Pragmatics: Use.]

9

What is an Ontology?

• Many definitions of an ontology contradict one another.

• One formal definition

– A formal explicit description of concepts in a domain of discourse (classes), properties of each concept describing various features and attributes of the concept (slots), and restrictions on slots.

10

What is an Ontology? (2)

• Another definition

– The subject of ontology is the study of the categories of things that exist or may exist in some domain.

• A simple definition

– Ontology is about the exact description of things and their relationships.

• “An ontology is a specification of a conceptualization” [Gruber 95]

11

Ontology Background

Timeline (Smith 2002):

• 384-322 BC: Aristotle

• 1613: the term "ontology" is coined

• 1721: first occurrence in the OED

• 1967: first occurrence of ontology in information science

12

Aristotle's Categories

13

Genus and Differentia

14

Cyc Project: Large Ontology

• Cyc contains about 100,000 concept types

15

Why Develop an Ontology?

• Semantic Interoperability

– Generalized database integration

– Virtual Enterprises

– e-commerce

• Information Retrieval

– Decoupling user vocabulary from data vocabulary

– Query answering over document sets

– Natural Language Processing

16

Different Uses of Ontologies

• Application ontologies (run time)

– Offer terminological services, checking constraints between terms

– Limited expressivity (stringent computational requirements)

• Reference ontologies (development time)

– Establish consensus about the meaning of terms (in general)

– Higher expressivity (less stringent computational requirements)

• Mutual understanding is more important than mass interoperability

– Understanding disagreements

– Establishing trustable mappings among application ontologies

17

Ontology Structure Levels

• The term ontology has been used to describe models with different degrees of structure (Ontology Spectrum)

– Less structure: Taxonomies (Semio taxonomies, Yahoo hierarchy, biological taxonomy), Database Schemas (many) and metadata schemes (ICML, ebXML, WSDL)

– More Structure: Thesauri (WordNet, CALL, DTIC), Conceptual Models (OO models, UML)

– Most Structure: Logical Theories (Ontolingua, TOVE, CYC, Semantic Web)

• Ontologies are usually expressed in a logic-based language

– Enabling detailed, sound, meaningful distinctions to be made among the classes, properties, & relations

– More expressive meaning while maintaining "computability"

• Using ontologies, tomorrow's applications can be "intelligent"

– Work at the human conceptual level

18

[Figure: ontology layers, from top to bottom: Upper Ontology (generic common knowledge), Middle Ontology (domain-spanning knowledge), Lower Ontology (individual domains), Lowest Ontology (sub-domains). Example concepts, from most general to most specific: Most General Thing; Products/Services, Processes, Organizations, Locations; Metal Parts, Art Supplies; Washers. Annotations: "E-commerce area of interest: mostly this" and "But also this!"]

Ontology: General Picture at Object Level
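To make the layering concrete, here is a minimal Python sketch assuming one hypothetical reading of the figure, in which each concept has a single parent (Washers under Metal Parts, Metal Parts and Art Supplies under Products/Services, and the four middle-level concepts under Most General Thing). Because subclass-of is transitive, asking whether Washers is a kind of Products/Services means walking up the chain rather than looking for a direct link.

```python
# Hypothetical subclass-of edges (child -> parent) reading the figure top-down.
SUBCLASS_OF = {
    "ProductsServices": "MostGeneralThing",
    "Processes": "MostGeneralThing",
    "Organizations": "MostGeneralThing",
    "Locations": "MostGeneralThing",
    "MetalParts": "ProductsServices",
    "ArtSupplies": "ProductsServices",
    "Washers": "MetalParts",
}


def ancestors(concept):
    """Follow subclass-of links upward; subclass-of is transitive."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def is_subclass(sub, sup):
    return sub == sup or sup in ancestors(sub)


print(ancestors("Washers"))  # ['MetalParts', 'ProductsServices', 'MostGeneralThing']
print(is_subclass("Washers", "ProductsServices"))  # True
print(is_subclass("ArtSupplies", "MetalParts"))    # False
```

Real ontologies usually allow several parents per class, in which case the upward walk becomes a graph traversal.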

19

Complexity of Ontology

20

Ontology Processing

21

Steps:

• Determine the domain and scope of the ontology

• Consider reusing existing ontologies

• Enumerate important terms in the ontology

• Define classes and the class hierarchy

• Define the properties of the classes: slots

• Define the facets of the slots (cardinality, value-type)

• Create instances

How to Build an Ontology
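The steps above can be sketched in Python as a toy model, not an ontology editor. The class names follow the machine-tooling example used later in this talk; the slot names (material, length_in), the Slot and OntClass types, and check_instance are hypothetical. Facets are reduced to a value type plus minimum and maximum cardinality, and slots are inherited down the class hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """A property of a class plus its facets (value type, cardinality)."""
    name: str
    value_type: type
    min_card: int = 0
    max_card: Optional[int] = None  # None means no upper bound


@dataclass
class OntClass:
    name: str
    parent: Optional["OntClass"] = None
    slots: list = field(default_factory=list)

    def all_slots(self):
        # Inherited slots come first, then the class's own slots.
        inherited = self.parent.all_slots() if self.parent else []
        return inherited + self.slots


def check_instance(cls, values):
    """Return facet violations for an instance given as {slot name: [values]}."""
    errors = []
    for slot in cls.all_slots():
        vals = values.get(slot.name, [])
        if len(vals) < slot.min_card:
            errors.append(f"{slot.name}: needs at least {slot.min_card} value(s)")
        if slot.max_card is not None and len(vals) > slot.max_card:
            errors.append(f"{slot.name}: allows at most {slot.max_card} value(s)")
        for v in vals:
            if not isinstance(v, slot.value_type):
                errors.append(f"{slot.name}: {v!r} is not a {slot.value_type.__name__}")
    return errors


# Classes, hierarchy, slots, and facets (hypothetical domain fragment).
machinery = OntClass("MetalCuttingMachinery",
                     slots=[Slot("material", str, min_card=1)])
insert = OntClass("MillingInsert", parent=machinery,
                  slots=[Slot("length_in", float, max_card=1)])

# Create instances and check them against the facets.
print(check_instance(insert, {"material": ["carbide"], "length_in": [2.5]}))  # []
print(check_instance(insert, {"length_in": [2.5, 3.0]}))
# ['material: needs at least 1 value(s)', 'length_in: allows at most 1 value(s)']
```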

22

Knowledge sharing and reuse

Building an ontology is not a goal in itself.

Communication between people

Interoperability between software agents

Reuse of domain knowledge

Make domain knowledge explicit

Analyze domain knowledge

Benefits of Building Ontologies

23

The benefits: Modularisation

Bridging Scales and context with Ontologies

[Figure: Genes, Species, Protein, Function, and Disease, composed step by step:

• Gene in humans

• Protein coded by gene in humans

• Function of protein coded by gene in humans

• Disease caused by abnormality in function of protein coded by gene in humans]

24

Thesaurus vs. Ontology

Thesaurus (controlled vocabulary): term semantics (weak)

• Concepts: terms

• "Semantic" relations: Equivalent (=), Used For (synonym) (UF), Broader Term (BT), Narrower Term (NT), Related Term (RT)

• Terms: Metal working machinery, equipment and supplies, metal-cutting machinery, metal-turning equipment, metal-milling equipment, milling insert, turning insert, etc.

• Relations: use, used-for, broader-term, narrower-term, related-term

Ontology: logical-conceptual semantics (strong)

• Concepts: logical concepts, relating terms to real (& possible) world referents

• Semantic relations: Subclass Of, Part Of, arbitrary relations, meta-properties on relations

• Entities: Metal working machinery, equipment and supplies, metal-cutting machinery, metal-turning equipment, metal-milling equipment, milling insert, turning insert, etc.

• Relations: subclass-of; instance-of; part-of; has-geometry; performs; used-on; etc.

• Properties: geometry; material; length; operation; UN/SPSC-code; ISO-code; etc.

• Values: 1; 2; 3; "2.5 inches"; "85-degree-diamond"; "231716"; "boring"; "drilling"; etc.

• Axioms/Rules: If milling-insert(X) & operation(Y) & material(Z)=HG_Steel & performs(X, Y, Z), then has-geometry(X, 85-degree-diamond).
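To see why thesaurus semantics are called weak, the following Python sketch uses hypothetical BT and RT assertions built from the slide's terms. It derives NT as the inverse of BT and treats RT as symmetric, which is roughly all a program can infer from a thesaurus: BT does not say whether a link means subclass-of or part-of, and that distinction is what the ontology side adds.

```python
from collections import defaultdict

# Hypothetical thesaurus entries: each term mapped to its broader term (BT).
BT = {
    "metal-cutting machinery": "metal working machinery",
    "metal-turning equipment": "metal-cutting machinery",
    "metal-milling equipment": "metal-cutting machinery",
}
# Hypothetical related-term (RT) pairs.
RT = [("milling insert", "metal-milling equipment")]


def narrower_terms(bt):
    """NT is the inverse of BT: if A has broader term B, then B has narrower term A."""
    nt = defaultdict(list)
    for narrow, broad in bt.items():
        nt[broad].append(narrow)
    return dict(nt)


def related_terms(rt):
    """RT is symmetric, so each pair is recorded in both directions."""
    rel = defaultdict(set)
    for a, b in rt:
        rel[a].add(b)
        rel[b].add(a)
    return dict(rel)


print(narrower_terms(BT)["metal-cutting machinery"])
# ['metal-turning equipment', 'metal-milling equipment']
print(related_terms(RT)["milling insert"])  # {'metal-milling equipment'}
```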

25

Ontology Spectrum: One View

[Figure: spectrum from weak semantics to strong semantics:

• Taxonomy: is sub-classification of

• Thesaurus: has narrower meaning than

• Conceptual Model: is subclass of

• Logical Theory: is disjoint subclass of, with transitivity property

Representations shown: Relational Model, XML; ER; DB Schemas, XML Schema; RDF/S; XTM; Extended ER; UML; Description Logic (DAML+OIL, OWL); First Order Logic; Modal Logic. Interoperability: syntactic, structural, semantic.]

Source: Obrst, L. 2004

26

Ontology Spectrum: One View (cont.)

[Figure: the same spectrum (weak to strong semantics; syntactic, structural, and semantic interoperability), annotated with problem scope and semantic expressivity:

• Problem: Very General; Semantic Expressivity: Very High

• Problem: Local; Semantic Expressivity: Low

• Problem: General; Semantic Expressivity: Medium

• Problem: Local; Semantic Expressivity: High]

Source: Obrst, L. 2004

27

Semantic Web Wedding Cake

28

Emerging XML Stack Architecture for the Semantic Web + Grid + Agents

• Semantic Brokers

• Intelligent Agents

• Advanced Applications

• Use, Intent: Pragmatics

• Trust: Proof + Security + Identity

• Reasoning/Proof Methods

• OWL: Ontologies

• RDF Schema: Ontologies

• RDF: Instances (assertions)

• XML Schema: Encodings of Data Elements & Descriptions, Data Types, Local Models

• XML: Base Documents

• Grid & Semantic Grid: New System Services, Intelligent QoS

[Figure: layered stack, bottom to top: XML (syntax: data); XML Schema (structure); RDF/RDF Schema (semantics); OWL (higher semantics); inference engine (reasoning/proof); trust, security/identity; use, intent (Pragmatic Web); intelligent domain services, applications; agents, brokers, policies. Alongside: Sem-Grid services.]

29

Where We Are

[Figure annotation: "We Are Here"]

30

What Problems Do Ontologies Help Solve?

• Heterogeneous database problem

– Different organizational units, Service Needers/Providers have radically different databases

– Different syntactically: what’s the format?

– Different structurally: how are they structured?

– Different semantically: what do they mean?

– They all speak different languages (access, description, schemas, meaning)

– Integration: rather than an N² problem (a mapping between every pair of N databases), a single, adequate ontology reduces it to N (one mapping per database)

• Enterprise-wide system interoperability problem

– Currently: system-of-systems, vertical stovepipes

– Ontologies act as conceptual model representing enterprise consensus semantics

• Relevant document retrieval/question-answering problem

– What is the meaning of your query?

– What is the meaning of documents that would satisfy your query?

– Can you obtain only meaningful, relevant documents?
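The arithmetic behind the N² integration point above can be checked with a short Python sketch. It assumes one bidirectional mapping per pair of databases, N(N-1)/2 in total (counting each direction separately gives N(N-1)), versus one mapping from each database to a shared ontology.

```python
def pairwise_mappings(n):
    """Point-to-point integration: one mapping per pair of databases."""
    return n * (n - 1) // 2


def hub_mappings(n):
    """Shared ontology: one mapping from each database to the ontology."""
    return n


for n in (5, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
# 5 10 5
# 10 45 10
# 50 1225 50
```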

31

OWL: Web Ontology Language

• OWL is built on top of RDF

• OWL is for processing information on the web

• OWL was designed to be interpreted by computers

• OWL was not designed to be read by people

• OWL is written in XML (its standard exchange syntax is RDF/XML)

• OWL has three sublanguages

• OWL is a web standard

32

Why OWL?

• OWL is a part of the "Semantic Web Vision" - a future where:

– Web information has exact meaning

– Web information can be processed by computers

– Computers can integrate information from the web

33

Origins of OWL

DAML

DAML+OIL

DAML = DARPA Agent Markup Language

OIL = Ontology Inference Layer

OWL is now on track to become a W3C Recommendation!

OIL

OWL

RDF

All were influenced by RDF

34

OWL Sublanguages

• OWL has three sublanguages:

– OWL Lite

– OWL DL (includes OWL Lite)

– OWL Full (includes OWL DL)

35

OWL is Different from RDF

• OWL and RDF are much the same kind of thing, but OWL is a stronger language with greater machine interpretability than RDF.

• OWL comes with a larger vocabulary and stronger syntax than RDF.

36

An OWL Example
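The example on this slide is not in the transcript. As a hypothetical stand-in, this Python sketch embeds a tiny OWL fragment in RDF/XML syntax (class and property names borrowed from the machine-tooling example later in the talk) and reads its classes and subclass links with the standard library's xml.etree.ElementTree. It only walks the XML tree; a real application would load the file with an RDF or OWL toolkit, since the same statements can be serialized in other ways.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical OWL fragment in RDF/XML syntax.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="MetalCuttingMachinery"/>
  <owl:Class rdf:ID="MillingInsert">
    <rdfs:subClassOf rdf:resource="#MetalCuttingMachinery"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="performs">
    <rdfs:domain rdf:resource="#MillingInsert"/>
  </owl:ObjectProperty>
</rdf:RDF>"""


def q(ns, name):
    """Clark notation {namespace}name, as used by ElementTree."""
    return f"{{{ns}}}{name}"


root = ET.fromstring(DOC)
classes = [c.get(q(RDF, "ID")) for c in root.findall(q(OWL, "Class"))]
properties = [p.get(q(RDF, "ID")) for p in root.findall(q(OWL, "ObjectProperty"))]
subclass_of = {
    c.get(q(RDF, "ID")): s.get(q(RDF, "resource")).lstrip("#")
    for c in root.findall(q(OWL, "Class"))
    for s in c.findall(q(RDFS, "subClassOf"))
}

print(classes)      # ['MetalCuttingMachinery', 'MillingInsert']
print(properties)   # ['performs']
print(subclass_of)  # {'MillingInsert': 'MetalCuttingMachinery'}
```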

37

Where is the Technology Going

• “The Semantic Web is very exciting, and now just starting off in the same grassroots mode as the Web did 10 years ago ... In 10 years it will in turn have revolutionized the way we do business, collaborate and learn.”

– Tim Berners-Lee, CNET.com interview, 2001-12-12

• We can look forward to:

– Semantic Integration/Interoperability, not just data interoperability

– Applications with trans-community semantics

– Device interoperability in the ubiquitous computing future: achieved through semantics & contextual awareness

– True realization of intelligent agent interoperability

– Intelligent semantic information retrieval & search engines

– Next generation electronic commerce/business & web services

– Semantics beginning to be used once again in NLP: information extraction becomes knowledge extraction

Key to all of this is effective & efficient use of explicitly represented semantics (ontologies)!

38

What do we want the future to be?

• 2100 A.D.: models, models, models

• There are no human-programmed programming languages

• There are only Models

[Figure: Ontological Models, Knowledge Models, Belief Models, Application Models, Presentation Models, and Target Platform Models → Transformations, Compilations → Executable Code; Infrastructure.]

39

Ontology Example from Electronic Commerce: the general domain of machine tooling & manufacturing. Note that these are expressed in English but would usually be expressed in a logic-based language.

• Classes (general things): Metal working machinery, equipment and supplies, metal-cutting machinery, metal-turning equipment, metal-milling equipment, milling insert, turning insert, etc.

• Instances (particular things): An instance of metal-cutting machinery is the "OKK KCV 600 15L Vertical Spindle Direction, 1530x640x640mm 60.24"x25.20"x25.20 X-Y-Z Travels Coordinates, 30 Magazine Capacity, 50 Spindle Taper, 20kg 44 lbs Max Tool Weight, 1500 kg 3307 lbs Max Loadable Weight on Table, 27,600 lbs Machine Weight, CNC Vertical Machining Center"

• Relations: subclass-of (kind-of), instance-of, part-of, has-geometry, performs, used-on, etc. For example, a kind of metal working machinery is metal-cutting machinery, and a kind of metal-cutting machinery is milling insert.

• Properties: geometry, material, length, operation, ISO-code, etc.

• Values: 1; 2; 3; "2.5 inches"; "85-degree-diamond"; "231716"; "boring"; "drilling"; etc.

• Rules: If milling-insert(X) & operation(Y) & material(Z)=HG_Steel & performs(X, Y, Z), then has-geometry(X, 85-degree-diamond). [Meaning: if you need to do milling on high-grade steel, then you need to use a milling insert (blade) that has an 85-degree diamond shape.]
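The rule above can be run as a single forward-chaining step. In this Python sketch the facts (insert-A, insert-B, face-milling, steel-1, alu-1) are hypothetical; only the rule itself comes from the slide. Each performs(X, Y, Z) fact is checked against the rule's conditions, and every matching X receives the derived has-geometry fact.

```python
# Hypothetical facts; only the rule itself comes from the slide.
milling_inserts = {"insert-A", "insert-B"}                # milling-insert(X)
operations = {"face-milling"}                             # operation(Y)
material = {"steel-1": "HG_Steel", "alu-1": "Aluminum"}   # material(Z)
performs = {("insert-A", "face-milling", "steel-1"),      # performs(X, Y, Z)
            ("insert-B", "face-milling", "alu-1")}


def infer_geometry():
    """One forward-chaining pass of:
    milling-insert(X) & operation(Y) & material(Z)=HG_Steel & performs(X, Y, Z)
        => has-geometry(X, 85-degree-diamond)"""
    derived = set()
    for x, y, z in performs:
        if x in milling_inserts and y in operations and material.get(z) == "HG_Steel":
            derived.add((x, "85-degree-diamond"))
    return derived


print(infer_geometry())  # {('insert-A', '85-degree-diamond')}
```

A rule engine would repeat such passes until no new facts appear; one pass suffices here because the rule's conclusion never feeds its own conditions.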

40

Topic Map: Knowledge Management Concept in Digital Libraries

41

Topic Maps Introduction

• Goal: organize information for navigation

• Topic Maps are the online equivalent of printed indexes

• A powerful way to manage linking information, such as glossaries, cross-references, thesauri, and catalogs; they enable the merging of structured and unstructured information.

42

Different Levels of Information Organization

• Metadata

• Thesauri

• Taxonomies

• Topic Maps

43

Objects and Their Metadata

• What is metadata?

• Metadata as a finding aid

• Subjects and precision

44

Subject-based Classification

• Controlled vocabularies

• Taxonomies

• Thesauri

• Faceted classification

• Ontologies

• Other subject-based techniques

45

Topic Maps Concepts

• Topic

– A topic is a multi-headed link that points to all its occurrences

– Topic occurrence

– A topic type is a category to which a given topic instance belongs ("person", "city", "product", etc.)

– Topic name: base name, display name, sort name

46

Topic Maps Concepts (2)

• Types

– is-a relationships

• Occurrences

– Relate topics to the information they are relevant to

47

Topic Maps Concepts (3)

• Association

– Topics can be related to each other through an association expressing a given semantics

– Describes relationships

• Facet

– Multiple facets can be applied to view the topic in different ways
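These concepts can be sketched in Python under a simplified model: topics carry a base name, is-a types, and occurrences (links to relevant resources), while associations bind topics to named roles. The Shakespeare topics, the written-by association, the author/work roles, and the example.org URL are hypothetical, and scope and facets are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    id: str
    base_name: str
    types: list = field(default_factory=list)        # is-a: ids of type topics
    occurrences: list = field(default_factory=list)  # resources the topic is relevant to


@dataclass
class Association:
    type: str      # id of the topic naming the relationship
    members: dict  # role id -> topic id


class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def add(self, topic):
        self.topics[topic.id] = topic

    def related(self, topic_id):
        """(role, topic id) pairs for every other member of an association
        that topic_id takes part in."""
        out = []
        for assoc in self.associations:
            if topic_id in assoc.members.values():
                out += [(role, t) for role, t in assoc.members.items() if t != topic_id]
        return out


tm = TopicMap()
tm.add(Topic("person", "Person"))
tm.add(Topic("play", "Play"))
tm.add(Topic("shakespeare", "William Shakespeare", types=["person"]))
tm.add(Topic("hamlet", "Hamlet", types=["play"],
             occurrences=["https://example.org/hamlet.txt"]))
tm.associations.append(
    Association("written-by", {"author": "shakespeare", "work": "hamlet"}))

print(tm.related("shakespeare"))  # [('work', 'hamlet')]
print(tm.related("hamlet"))       # [('author', 'shakespeare')]
```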

48

Example

49

Example: Shakespeare’s Plays

50

XTM Element Types

• <topicRef>: Reference to a Topic element

• <subjectIndicatorRef>: Reference to a Subject Indicator

• <scope>: Reference to Topic(s) that comprise the Scope

• <instanceOf>: Points to a Topic representing a class

• <topicMap>: Topic Map document element

• <topic>: Topic element

• <subjectIdentity>: Subject reified by Topic

• <baseName>: Base Name of a Topic

• <baseNameString>: Base Name String container

• <variant>: Alternate forms of Base Name

• <variantName>: Container for Variant Name

• <parameters>: Processing context for Variant

• <association>: Topic Association

• <member>: Member in Topic Association

• <roleSpec>: Points to a Topic serving as an Association Role

• <occurrence>: Resources regarded as an Occurrence

• <resourceRef>: Reference to a Resource

• <resourceData>: Container for Resource data

• <mergeMap>: Merge with another Topic Map
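The element types above fit together roughly as in the following Python sketch, which builds a small XTM 1.0 document with xml.etree.ElementTree. It assumes the XTM 1.0 namespace (http://www.topicmaps.org/xtm/1.0/) and XLink hrefs for references; the topics, the written-by association, and the example.org resource are hypothetical.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)        # serialize XTM elements without a prefix
ET.register_namespace("xlink", XLINK)


def sub(parent, tag, href=None, **attrs):
    """Add an XTM child element, optionally with an xlink:href reference."""
    e = ET.SubElement(parent, f"{{{XTM}}}{tag}", attrs)
    if href is not None:
        e.set(f"{{{XLINK}}}href", href)
    return e


def topic_ref(parent, tag, target):
    """Add <tag><topicRef xlink:href="#target"/></tag>."""
    wrapper = sub(parent, tag)
    sub(wrapper, "topicRef", href="#" + target)
    return wrapper


root = ET.Element(f"{{{XTM}}}topicMap")


def add_topic(tid, name, type_id=None):
    t = sub(root, "topic", id=tid)
    if type_id:
        topic_ref(t, "instanceOf", type_id)  # <instanceOf> comes first in a topic
    sub(sub(t, "baseName"), "baseNameString").text = name
    return t


for tid, name in [("play", "Play"), ("written-by", "Written by"),
                  ("author", "Author"), ("work", "Work"),
                  ("shakespeare", "William Shakespeare")]:
    add_topic(tid, name)

hamlet = add_topic("hamlet", "Hamlet", type_id="play")
occ = sub(hamlet, "occurrence")
sub(occ, "resourceRef", href="https://example.org/hamlet.txt")

assoc = sub(root, "association")
topic_ref(assoc, "instanceOf", "written-by")
for role, player in [("author", "shakespeare"), ("work", "hamlet")]:
    member = sub(assoc, "member")
    topic_ref(member, "roleSpec", role)
    sub(member, "topicRef", href="#" + player)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Role and association types are themselves topics here, which is the usual topic map convention: everything that can be talked about gets a <topic> of its own.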

51

The Comparison

• Traditional classifications in topic maps

• Merging metadata and classification

• Benefits and costs

• Searching

• Schemas

• Identity and merging

52

Conclusion

53

Questions?