Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies Hideaki Takeda National Institute of Informatics [email protected] Glocal KO Workshop, Thursday August 13, 2015, Copenhagen


Page 1: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Some thoughts about the gaps across languages and domains

through the experience on building the core common vocabularies

Hideaki Takeda, National Institute of Informatics

[email protected]

Glocal KO Workshop, Thursday August 13, 2015, Copenhagen

Page 2: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Who am I?

Hideaki Takeda, Dr. Eng.

• Professor, National Institute of Informatics
  – Research institute mainly for computer science
• Background: Computer Science, in particular, Artificial Intelligence
• Current interest: Semantic Web, Ontology, Linked Open Data (LOD), Social Media Analysis
• Social activities
  – President, Linked Open Data Initiative (NPO)
  – Founder, DBpedia Japanese Chapter
  – Specialist, Information-technology Promotion Agency, Japan (IPA)
  – Chair, Japan Link Center (Registration Agency of the International DOI Foundation)
  – Board, ORCID

Page 3: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Core Vocabularies

• Background
  – Everything is on the infosphere, i.e., the web
  – Lots of information, lots of data, lots of systems
• Problems
  – Misunderstanding, mismatching, and "missing links" across different domains
  – Gap between humans and machines (computers)

Page 4: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Core Vocabularies

• Aim
  – Increase interoperability of information/data
  – Bridge human and machine understanding
• Target
  – Governmental documents/data
• Method
  – Define a set of concepts which bridge (human-readable) terms and (computer-processable) symbols (URIs)
  – Start from the most common concepts

Page 5: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Core Vocabularies

• Activities worldwide
  – USA: NIEM Core
    • NIEM (National Information Exchange Model)
  – Europe: ISA Core Vocabularies
  – UN: United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT)
    • Core Components Library (UN/CCL)
  – Japan: IMI Core Vocabulary

Page 6: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies
Page 7: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

ISA Core Vocabularies v 1.1

Page 8: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

NIEM Architecture

http://niem.github.io/technical/iepd-versions/

Page 9: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

NIEM

http://reference.niem.gov/niem/guidance/user-guide/vol1/user-guide-vol1.pdf

http://www.epa.gov/oei/symposium/2010/roy.pdf

Page 10: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

IMI Project
• Supported by
  – Ministry of Economy, Trade, and Industry, Japan
• Technical Framework
  – Data Model
  – Core Vocabulary
  – Design Rules
• Support Framework
  – Tools
    • for data developer
    • for schema developer
  – Database
    • schema / tools / templates / …

[Figure: IMI data model, with RDF and XML as output formats. Boxes shown: Person Type (Name, Gender, Gender Code, Birth Date, Address); Name Type (Type, Name, Family Name, Given Name); Address Type (Type, Notation, Zip Code, Prefecture, City, …); Code Type (Type, Value); Codelist Type; Thing Type. Leaf properties are typed as String; other properties refer to Name Type, Address Type, or Code Type.]

Page 11: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

IMI as a template for schema

[Figure: deriving a form schema from the IMI core vocabulary.
Input: a registration form for Conference X with the fields Name, Address, Gender (M / F), Affiliation, Affiliation Address, and Attending date.
Template: the IMI core types Person Type (Name, Gender, Gender Code, Birth Date, Address, …), Name Type (Type, Name, Family Name, Given Name, …), Address Type (Type, Notation, Zip Code, Prefecture, City, …), Code Type (Type, Value), Codelist Type, and Thing Type.
Result ("IMI Individual Form"): Person Type (Name, Gender, Address, Affiliation), Name Type (Name), Address Type (Notation, Zip-code), and Event Participation Type (Participant, Date).]

Design Schema
• Remove unnecessary items
• Add necessary items
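The two design steps on this slide (remove unnecessary items, add necessary items) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TypeTemplate` and `derive` are hypothetical names, not part of any IMI tool, and the data types attached to each property are illustrative rather than taken from the published vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class TypeTemplate:
    """A class term with its property terms (property name -> data type)."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)

    def derive(self, keep: Iterable[str],
               add: Optional[Dict[str, str]] = None) -> "TypeTemplate":
        """Return a new type that keeps only `keep` and then adds `add`."""
        keep = list(keep)
        missing = [p for p in keep if p not in self.properties]
        if missing:
            raise KeyError(f"not in template {self.name}: {missing}")
        props = {p: self.properties[p] for p in keep}   # remove unnecessary items
        props.update(add or {})                         # add necessary items
        return TypeTemplate(self.name, props)


# Core template following the Person Type box on the slide.
# The data types on the right-hand side are illustrative assumptions.
person = TypeTemplate("Person Type", {
    "Name": "Name Type",
    "Gender": "String",
    "Gender Code": "Code Type",
    "Birth Date": "Date Type",
    "Address": "Address Type",
})

# Schema for the conference registration form: drop Gender Code and
# Birth Date, add Affiliation (its data type is a hypothetical choice).
registration_person = person.derive(
    keep=["Name", "Gender", "Address"],
    add={"Affiliation": "String"},
)
print(list(registration_person.properties))
```

Because `derive` returns a new object, the core template stays unchanged and can be reused for other forms.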

Page 12: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Roles of IMI

• Structured concept dictionary
  – Concept dictionary
    • Terms as notation of concepts
      – The entry is a concept, not a term
    • Class concepts and relation concepts
    • General-specific relations
  – Structured dictionary
    • Concepts form a network of concepts, which in turn represents the meaning of individual concepts
    • A class concept consists of relation concepts representing attributes and general/specific relations
    • A relation concept consists of class concepts connected as domains and ranges and general/specific relations
• Template for schemata
  – Add or remove items for specific needs

Page 13: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Use of IMI
• Define the concept model
• "Serialize" it into specific "physical" forms
• Use a suitable physical form

The IMI Concept Model is serialized into three physical forms:
• RDF: for open data
  – Relaxed definition
  – Interoperability with other open data schemata
• XML: for data exchange
  – Strict definition
  – Interoperability with DB schemata
• Natural language form: for spreadsheets and documents
  – Relaxed definition with simple structure
  – Readability by humans
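As a rough illustration of "one concept model, several physical forms", the sketch below serializes the same small record as Turtle-style RDF, as XML, and as plain "label: value" text. It is not the actual IMI serialization: the `ex:` namespace, the function names, and the sample values are all made up for the example.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/imi-sketch#"  # hypothetical namespace, not an IMI URI

# One concept-level record; the values are made-up sample data.
concept = {"class": "Person",
           "properties": {"name": "Taro Yamada", "gender": "M"}}


def to_turtle(subject: str, concept: dict) -> str:
    """RDF form (Turtle syntax). Values are assumed to need no escaping."""
    lines = [f"@prefix ex: <{EX}> .", "",
             f"ex:{subject} a ex:{concept['class']} ;"]
    props = [f'    ex:{k} "{v}"' for k, v in concept["properties"].items()]
    lines.append(" ;\n".join(props) + " .")
    return "\n".join(lines)


def to_xml(concept: dict) -> str:
    """XML form: one element per property, nested under the class element."""
    root = ET.Element(concept["class"])
    for k, v in concept["properties"].items():
        ET.SubElement(root, k).text = v
    return ET.tostring(root, encoding="unicode")


def to_text(concept: dict) -> str:
    """Natural-language form: simple 'label: value' lines for documents."""
    return "\n".join(f"{k}: {v}" for k, v in concept["properties"].items())


print(to_turtle("person1", concept))
print(to_xml(concept))
print(to_text(concept))
```

The point of the design is that all three functions read the same `concept` structure, so the choice of physical form does not change what is being described.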

Page 14: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

IMI Core Vocabulary v2.2
• Published on Feb. 3, 2015
• 48 core class terms
  – person, address, facility, location, date, …
• 206 core property terms
  – name of person, birth date, birth country, …
• Multiple formats
  – RDF Schema, XML Schema, and documents for humans

http://imi.ipa.go.jp/ns/core/2/

Page 15: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies
Page 16: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Class definition (person class)

person 人
Description (説明): 人の情報を表現するためのデータ型 / Data type to describe a person
Inherits from (継承): ic:実体型

Property | Property (ja) | Data type | Cardinality | 説明 (ja) | Description (en)
ID | ID | ic:ID型 | 0..n | ID | Identification of a Person
Name of person | 氏名 | ic:氏名型 | 0..n | 氏名 | Name of a Person
Gender | 性別 | xsd:string | 0..1 | 性別の表記 | Gender of a Person
Gender code | 性別コード | ic:コード型 | 0..1 | 性別コード | Gender of a Person
Birth date | 生年月日 | ic:日付型 | 0..1 | 生年月日 | Date of Birth of a Person
Death date | 死亡年月日 | ic:日付型 | 0..1 | 死亡年月日 | Date of Death of a Person
Residence address | 住所 | ic:住所型 | 0..n | 現住所 | Present address of a Person
Domicile of origin | 本籍 | ic:住所型 | 0..1 | 本籍 | Legal residence address of a Person
Contact information | 連絡先 | ic:連絡先型 | 0..n | 連絡先 | Contact information of a Person
Nationality | 国籍 | xsd:string | 0..n | 国籍の表記 | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
Nationality code | 国籍コード | ic:コード型 | 0..n | 住民基本台帳で利用されている国籍コード | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
Birth country | 出生国 | xsd:string | 0..1 | 生まれた国名 | A location where a person was born.
Birth country code | 出生国コード | ic:コード型 | 0..1 | 生まれた国のコード | A location where a person was born.
Birth place | 出生地 | ic:住所型 | 0..1 | 生まれた場所 | A location where a person was born.

Page 17: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Class Structure

A class term has property terms as sub-elements, and a property term can refer to a class term. That class term in turn has its own list of property terms. This constructs a layered structure of terms, as in the following figure.

[Figure: layered class structure]
person 人
  name: ic:氏名型
  Contact: ic:連絡先型
  :
name 氏名
  Family name: xsd:string
  Romanized Family name: xsd:string
  :
contact 連絡先
  Phone number: ic:電話番号型
  Address: ic:住所型
  :
phone number 電話番号
  :
address 住所
  Country: xsd:string
  Prefecture: xsd:string
  :
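The layered structure described above can be walked recursively: follow each property to its range, and descend whenever the range is itself a class term. The following is a minimal sketch. The `VOCAB` dictionary uses English keys for readability and covers only the terms visible on the slide, and `leaf_paths` is a hypothetical helper that assumes the structure has no cycles.

```python
from typing import Dict, Iterator, Tuple

# class term -> {property term: range}; a range is either another class
# term in VOCAB or a primitive data type such as "xsd:string".
VOCAB: Dict[str, Dict[str, str]] = {
    "person": {"name": "name", "contact": "contact"},
    "name": {"family name": "xsd:string",
             "romanized family name": "xsd:string"},
    "contact": {"phone number": "phone number", "address": "address"},
    "phone number": {},  # its properties are elided on the slide
    "address": {"country": "xsd:string", "prefecture": "xsd:string"},
}


def leaf_paths(cls: str, vocab: Dict[str, Dict[str, str]],
               prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (property path, data type) for every path ending in a primitive type."""
    for prop, rng in vocab[cls].items():
        path = prefix + (prop,)
        if rng in vocab:                 # the range is a class term: go one layer down
            yield from leaf_paths(rng, vocab, path)
        else:                            # the range is a primitive data type
            yield path, rng


for path, datatype in leaf_paths("person", VOCAB):
    print(".".join(path), "->", datatype)
```

Paths such as `contact.address.country` correspond to the dotted identifiers used in the mapping results later in the deck.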

Page 18: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Concept of the IMI framework

International interoperability is carefully considered in preparing IMI.

[Figure: layered vocabulary framework. Labels shown: Core Vocabulary; Cross Domain Vocabulary; Domain-specific Vocabularies (Geographical Space/Facilities, Transportation, Disaster Prevention, Finance); example terms Location, Shelter, Hospital, Station, and Disaster Restoration Cost. IMI is shown alongside the Japanese local government standard (APPLIC), de facto standards (DC, FOAF, etc.), NIEM (US), ISA (EU), and Schema.org.]

Page 19: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Mapping between concepts in different core vocabularies

• Difficulty of concept-concept mapping
  – Matching of meaning tends to be a very abstract discussion

[Figure: two concepts in an ontology, each with a reference to the real world; the mapping between them is marked "?".]

Page 20: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Mapping between concepts in different core vocabularies

• Difficulty of concept-concept mapping
  – Matching of meaning tends to be a very abstract discussion
  – Matching of references is easier

[Figure: two concepts in an ontology, each with a reference to the real world; the mapping between them is marked "?".]

Page 21: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Mapping between concepts in different core vocabularies

• Difficulty of concept-concept mapping
  – Syntactical mapping vs. semantic mapping
    • Just consider what a concept refers to in the real world, not how it is represented in systems.

[Figure: two concepts in an ontology, each with a reference; the mapping between them is marked "?". The diagram is divided into a Systems World and a Cognitive World.]

Page 22: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Person

[Figure: the person class definition table from Page 16 (person 人, inheriting from ic:実体型), shown against the Systems World and Cognitive World layers; "?" marks the open question of how the two correspond.]

Page 23: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

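Postal Code

"101-8430"^^xsd:string (postal code in Japan)
"SW1A 0AA"@en (postal code in Europe)

[Figure: the two literals shown against the Systems World and Cognitive World layers; "?" marks the open question of how they correspond.]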

Page 24: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Semantic Mapping

• Semantic Mapping
  – Mapping on the cognitive layer
  – Two ways of judging mapping
    • Extensional Mapping
      – Check whether 'things' are shared
      – e.g., person
      – Mostly for Class Mapping
    • Intensional Mapping
      – Check whether 'values' are shared
      – e.g., postal-code
      – Mostly for Property Mapping
• Syntactical Mapping
  – Mapping on the systems layer

Page 25: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Types of matching: SKOS

• Exact Match
• Close Match
• Broad/Narrow Match
• Related Match

Page 26: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Close match

• Close match: nearly matched but not exactly matched
• Extensional mapping
  – The coverage of 'things' largely overlaps
    • The coverage of 'Country' is slightly different, but the 'things' are close
    • The reference of 'Person' is slightly different (person vs. legal person)
• Intensional mapping
  – The coverage of 'values' largely overlaps

Page 27: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Broad match/narrow match

• Broad/narrow match
  – One subsumes the other
• Extensional mapping
  – The coverage of 'things' of one is subsumed by the other, i.e., the subset is an exact match
• Intensional mapping
  – The coverage of 'values' of one is subsumed by the other, i.e., the subset is an exact match
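The coverage-based reading of the match types can be sketched as a comparison of two sets: sets of 'things' for extensional mapping, or sets of 'values' for intensional mapping. The function below is an illustration only. The threshold of 0.8 for "largely overlaps" is an arbitrary assumption, and treating any smaller overlap as a related match is a simplification, since SKOS defines `skos:relatedMatch` as an associative link rather than in terms of overlap.

```python
def classify_match(a: set, b: set, close_threshold: float = 0.8) -> str:
    """Classify how concept A (coverage `a`) maps to concept B (coverage `b`)."""
    if not a or not b:
        return "no match"              # no evidence to compare
    if a == b:
        return "skos:exactMatch"
    if a < b:
        return "skos:broadMatch"       # B subsumes A, so B is the broader concept
    if a > b:
        return "skos:narrowMatch"      # A subsumes B, so B is the narrower concept
    overlap = len(a & b) / len(a | b)  # Jaccard overlap of the two coverages
    if overlap >= close_threshold:
        return "skos:closeMatch"
    if overlap > 0:
        return "skos:relatedMatch"     # simplification, see the note above
    return "no match"


# Made-up coverages: 9 of 11 distinct members are shared.
print(classify_match(set(range(10)), set(range(1, 11))))
print(classify_match({"JP"}, {"JP", "DK"}))
```

In practice the coverages are rarely enumerable, so a function like this is better read as a statement of the criteria than as a tool to run over real data.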

Page 28: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Other types of matching

• Complicated match
  – An element of one system matches a combination of two or more elements of the other
  – "Pathway" match
    • A single property matches the combination of two or more properties
  – "Conditional" match
    • An element matches the other element if some condition holds

Examples:
Core Vocabulary Identifier | Mapping relation | Data model | Identifier | Note
IdentifierIssuingAuthority | Has related match | IMI | ic:ID型.ic:ID体系.ic:発行者 |
LegalEntityRegisteredAddress | Has broad match | IMI | ic:法人型.ic:住所 | It is an exact match if the value of ic:住所.種別 is "登記住所" (registered address).
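One way to record pathway and conditional matches is to let a mapping carry a property path as its target plus an optional condition that upgrades the relation. The sketch below uses the two examples from this slide. The `Mapping` class and its field names are hypothetical, and the record passed to the condition is a plain dictionary invented for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Mapping:
    source: str                          # identifier in the source core vocabulary
    relation: str                        # default mapping relation
    target_path: Tuple[str, ...]         # pathway: one or more chained IMI terms
    condition: Optional[Callable[[dict], bool]] = None
    relation_if_condition: Optional[str] = None

    def relation_for(self, record: dict) -> str:
        """Return the relation that holds for one concrete record."""
        if self.condition is not None and self.condition(record):
            return self.relation_if_condition or self.relation
        return self.relation


# Pathway match: one property corresponds to a chain of three IMI terms.
issuing_authority = Mapping(
    source="IdentifierIssuingAuthority",
    relation="related match",
    target_path=("ic:ID型", "ic:ID体系", "ic:発行者"),
)

# Conditional match: broad in general, exact when 種別 (kind) is 登記住所.
registered_address = Mapping(
    source="LegalEntityRegisteredAddress",
    relation="broad match",
    target_path=("ic:法人型", "ic:住所"),
    condition=lambda rec: rec.get("種別") == "登記住所",
    relation_if_condition="exact match",
)

print(registered_address.relation_for({"種別": "登記住所"}))
print(registered_address.relation_for({"種別": "other"}))
```

Neither case fits a single SKOS mapping property between two terms, which is the kind of limitation the slide points to.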

Page 29: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Results

Core Vocabulary Identifier | Mapping relation | Data model | Identifier
Address | Has exact match | IMI | ic:住所型
AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:町名
AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:丁目
AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:番地補足
AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:番地
AddressAddressArea | Has narrow match | IMI | ic:住所型.ic:号
AddressAddressID | Has exact match | IMI | ic:住所型.ic:ID
AddressAdminUnitL1 | Has exact match | IMI | ic:住所型.ic:国
AddressAdminUnitL2 | Has narrow match | IMI | ic:住所型.ic:都道府県
AddressFullAddress | Has exact match | IMI | ic:住所型.ic:表記
AddressLocatorDesignator | Has narrow match | IMI | ic:住所型.ic:ビル番号
AddressLocatorDesignator | Has narrow match | IMI | ic:住所型.ic:部屋番号
AddressLocatorName | Has narrow match | IMI | ic:住所型.ic:ビル名
AddressPOBox | Has related match | IMI | ic:住所型.ic:方書
AddressPostCode | Has exact match | IMI | ic:住所型.ic:郵便番号
AddressPostName | Has narrow match | IMI | ic:住所型.ic:市区町村
AddressPostName | Has narrow match | IMI | ic:住所型.ic:区
AddressThoroughfare | Has no match | IMI |
Agent | Has exact match | IMI | ic:実体型

Page 30: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Results

Core Vocabulary Identifier | Mapping relation | Data model | Identifier
Identifier | Has exact match | IMI | ic:ID型
IdentifierIdentifier | Has exact match | IMI | ic:ID型.ic:識別値
IdentifierIssueDate | Has no match | IMI |
IdentifierIssuingAuthority | Has related match | IMI | ic:ID型.ic:ID体系.ic:発行者
IdentifierIssuingAuthorityURI | Has exact match | IMI | ic:ID型.ic:ID体系.ic:URI
IdentifierType | Has no match | IMI |
JurisdictionIdentifier | Has related match | IMI | ic:国籍コード
JurisdictionName | Has related match | IMI | ic:国籍
LegalEntity | Has exact match | IMI | ic:法人型
LegalEntityAddress | Has broad match | IMI | ic:法人型.ic:住所
LegalEntityAlternativeName | Has no match | IMI |
LegalEntityCompanyActivity | Has close match | IMI | ic:法人型.ic:事業種目
LegalEntityCompanyStatus | Has related match | IMI | ic:法人型.ic:活動状況
LegalEntityCompanyType | Has exact match | IMI | ic:法人型.ic:組織種別
LegalEntityIdentifier | Has exact match | IMI | ic:法人型.ic:ID
LegalEntityLegalIdentifier | Has no match | IMI |
LegalEntityLegalName | Has broad match | IMI | ic:法人型.ic:名称.表記
LegalEntityLocation | Has related match | IMI | ic:法人型.ic:地物.説明
LegalEntityRegisteredAddress | Has broad match | IMI | ic:法人型.ic:住所
Location | Has exact match | IMI | ic:場所型
LocationAddress | Has exact match | IMI | ic:場所型.ic:住所
LocationGeographicIdentifier | Has broad match | IMI | ic:場所型.ic:地理識別子
LocationGeographicName | Has exact match | IMI | ic:場所型.ic:名称.ic:表記
LocationGeometry | Has exact match | IMI | ic:場所型.ic:地理座標

Page 31: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Results

Core Vocabulary Identifier | Mapping relation | Data model | Identifier
Person | Has exact match | IMI | ic:人型
PersonAddress | Has exact match | IMI | ic:人型.ic:住所
PersonAlternativeName | Has broad match | IMI | ic:人型.ic:氏名.ic:姓名
PersonBirthName | Has broad match | IMI | ic:人型.ic:氏名.ic:姓名
PersonCitizenship | Has no match | IMI |
PersonCountryOfBirth | Has exact match | IMI | ic:人型.ic:出生国
PersonCountryOfDeath | Has no match | IMI |
PersonDateOfBirth | Has exact match | IMI | ic:人型.ic:生年月日
PersonDateOfDeath | Has exact match | IMI | ic:人型.ic:死亡年月日
PersonFamilyName | Has exact match | IMI | ic:人型.ic:氏名.ic:姓
PersonFullName | Has exact match | IMI | ic:人型.ic:氏名.ic:姓名
PersonGender | Has exact match | IMI | ic:人型.ic:性別コード
PersonGivenName | Has exact match | IMI | ic:人型.ic:氏名.ic:名
PersonIdentifier | Has broad match | IMI | ic:人型.ic:ID
PersonPatronymicName | Has no match | IMI | ic:人型.ic:氏名.ic:姓名
PersonPlaceOfBirth | Has narrow match | IMI | ic:人型.ic:出生地

Page 32: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Bridging core and domain vocabularies (work in progress)

• Aim: extend the core vocabulary to domain vocabularies
  – Agriculture
  – Finance
  – Traffic
  – …
• Task:
  – Can concepts be shared between core and domains? Really?

Page 33: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Agricultural Activity Ontology (AAO)

Agricultural activity

• crop production activity
  – activity for propagation
  – activity in the vegetative growth stage
  – activity in the reproductive growth stage
• activity for environment control
  – activity for soil control
  – activity for climate control
  – activity for water control
  – activity for biotic control
  – activity for chemical control
• post production activity
  – activity for harvesting
  – activity for processing
  – activity for extending shelf-life
  – activity for wrapping
• indirect activity
  – activity for preparing materials
  – activity for cleaning
  – activity for transport
  – activity for monitoring
  – activity for maintaining farm equipment
• administrative activity
  – activity for business administration

http://cavoc.org/aao/

Page 34: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

An example: "activity" (and "event")

• S: (n) activity (any specific behavior) "they avoided all recreational activity"
  – direct hyponym / full hyponym
  – direct hypernym / inherited hypernym / sister term
    • S: (n) act, deed, human action, human activity (something that people do or cause to happen)
      – S: (n) event (something that happens at a given place and time)
  – [WordNet]
• Each activity is a Happening which involves volition and participants. It has temporal dimension. It is distinguished from Events by the fact that the activity does not trigger change of state and does not have a conceptual end point.
  – [PROTON Extent module (a lightweight upper-level ontology)]
• Activity: This class represents the abstract content of an event, which may be repeated many times, once or never. For example a training course, or a play.
  – [The Event Programme Vocabulary (prog)]
• E5 Event
  – Subclass of: E4 Period
  – Superclass of: E7 Activity, E63 Beginning of Existence, E64 End of Existence
• E7 Activity
  – Subclass of: E5 Event
  – Superclass of: E8 Acquisition, E9 Move, E10 Transfer of Custody, E11 Modification, E13 Attribute Assignment, E65 Creation …
  – [CIDOC Conceptual Reference Model]

Page 35: Some thoughts about the gaps across languages and domains through the experience on building the core common vocabularies

Summary

• Sharing concepts is still a long way off
• No ground truth
  – Step-by-step understanding of the world
  – Careful consensus making
• A more flexible framework is needed
  – Simple mapping is not satisfactory