Hong Kong Chinese Authority (Name) Project – the HKCAN XML version

Joanna Yi-hang PONG, City University of Hong Kong

XML and Authority Control, ALA Annual Conference, June 26, 2005, Chicago, IL



2

HKCAN Hong Kong Chinese Authority (Name)

A collaborative project since 1999

7 Hong Kong university libraries
– Chinese University of Hong Kong
– City University of Hong Kong
– Hong Kong Baptist University
– Hong Kong Institute of Education
– Hong Kong Polytechnic University
– Lingnan University
– University of Hong Kong

3

Aims

To build up a Chinese name authority file with CJK (Chinese, Japanese, Korean) scripts that meets the needs of the bilingual community

To improve and streamline authority-control operations by standardizing name headings and setting principles for authority record selection, to achieve “Better”, “Faster” and “Cheaper”

To participate in regional and global cooperative activities on authority work

4

HKCAN authority record model

7xx model

Based on MARC Discussion Paper 2001-DP05, Multilingual Authority Records in MARC21 Authority Format – http://www.loc.gov/marc/marbi/2001/2001-dp05.html

This paper explores how to handle multilingual records in the MARC21 authority format

HKCAN is based on the recommended Model C

5

Record model – Model C (cont’d)

Established headings (1xx) are defined by a catalog context (i.e. cataloging rules and language of catalog)

Context designations are shown in field 008/10-11 (rules) or field 040 $b (language of cataloging)

4xx and 5xx reference tracings are appropriate to the 1xx heading in that context

Alternative established headings are recorded in 7xx fields, along with the indication of the context in which they are appropriate

6

Record model – Model C (cont’d)

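[Diagram: an authority record linked to its context designation (cataloging rules, descriptive convention, language of cataloging), its established heading (1xx), its reference tracings (4xx, 5xx) and its alternative established headings (7xx), each alternative heading carrying a context indicator; the links are labelled with 1:1 and 1:m cardinalities.]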

7

Record model - Model C (cont’d)

Example:

Context designation
008/10 (Cataloging rules): c (AACR2)
008/11 (Subject system/thesaurus rules): a (LCSH)
040 $b (Language of cataloging): eng (English)

Established heading (1xx)
100 0# $a John Paul $b II, $c Pope, $d 1920-

Reference tracing (4/5xx)
400 0# $a Joannes Paulus $b II, $c Pope, $d 1920-
400 0# $a Juan Pablo $b II, $c Pope, $d 1920-
400 0# $a Jean Paul $b II, $c Pope, $d 1920-
400 0# $a Johannes Paul $b II, $c Pope, $d 1920-
400 0# $a Joann Pavel $b II, $c Pope, $d 1920-
<etc.>
510 2# $a Catholic Church. $b Pope (1978-: John Paul II)

Alternative established heading (7xx), with the context indicator in $7
700 04 $a Juan Pablo $b II, $c Papa, $d 1920- $7 aacr//spa
700 06 $a Jean Paul $b II $c pape, $d 1920- $7 aacr//fre
700 04 $a Jean Paul $b II $c (Pape), $d 1920- $7 ncafnor
700 04 $a Johannes Paul $c Papst, $b II $7 rak
700 04 $a <Chinese heading for John Paul> $b II $c <Chinese designation for Pope> $d 1920- $7 aacr//chi
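To make the role of the $7 context indicator concrete, the following minimal Python sketch splits a display-form field line into tag, indicators and subfields, then reads the context from $7. The function names `parse_field` and `context_of` are hypothetical, and the reading of `aacr//spa` as "rules, then language" is an assumption drawn only from the values shown in the example above.

```python
def parse_field(line):
    """Split a display-form MARC field into tag, indicators and subfields."""
    head, _, rest = line.partition("$")
    parts = head.split()
    tag = parts[0]
    indicators = parts[1] if len(parts) > 1 else ""
    subfields = []
    for chunk in rest.split("$"):
        if chunk:
            # First character is the subfield code, the rest is its value.
            subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


def context_of(field):
    """Read the $7 context indicator (assumed form: rules//language)."""
    code = dict(field["subfields"]).get("7")
    if code is None:
        return None
    rules, _, language = code.partition("//")
    return {"rules": rules, "language": language or None}


spanish = parse_field("700 04 $a Juan Pablo $b II, $c Papa, $d 1920- $7 aacr//spa")
german = parse_field("700 04 $a Johannes Paul $c Papst, $b II $7 rak")
```

Here `context_of(spanish)` gives the rules code `aacr` with language `spa`, while `context_of(german)` gives `rak` with no language part.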

8

Record model - HKCAN record

008 941020nc acannaabn |a aaa |||
010 $anr 94034993
035 $a(DLC#)nr 94034993a
040 $aDLC-R$beng$cDLC-R$dOCoLC$dHkCU$dHkCAN
066 $c$1
100 1 $aZhou, Ying,$d17th cent.
400 1 $wnne$aChou, Ying,$d17th cent.
400 1 $aZhou, Fangshu,$d17th cent.
400 1 $a周方叔,$d17th cent.
400 1 $aChou, Fang-shu,$d17th cent.
670 $aChih lin (卮林), 1992:$bt.p. (Chou Ying)
670 $aChung wen ta tz{176}u tien (中文大詞典):$bv. 6, p. 290 (Chou Ying; of Ming; native of P{176}u-t{176}ien; t. Fang-shu; author of Chih lin; lived around the mid of Emperor Ch{176}ung-chen reign)
670 $aHis 卮林: 10卷, 附補遺1卷, [1963]:$bt.p. (周嬰)
670 $a中國人名大辭典, 1934:$bp. 545 (周嬰, 明莆田人, 字方叔, 崇禎中以貢生知上猶縣, 所著卮林, 体近類書)
700 1 $a周嬰,$d17th cent

9

Record model – HKCAN record (cont’d)

008   9802097n| acannaab| |a ana
040   $aHkCU $cHkCU
066   $c$1
110 1 $aHong Kong (China). $bCensus and Statistics Dept.
410 2 $aCensus and Statistics Dept. (Hong Kong, China)
410 2 $aZheng fu tong ji chu (Hong Kong, China)
410 2 $a政府統計處 (香港, 中國)
510 1 $aHong Kong. $bCensus and Statistics Dept. $wa
510 1 $a香港. $b統計處 $wa
510 1 $a香港. $b政府統計處 $wa
667   $aNon-conventional pinyin pairing of 1xx/7xx|5HkCAN
670   $aIts Hong Kong social & economic trends = 香港社會及經濟趨勢, 1995- : $b1997 ed., cover (Census and Statistics Department Hong Kong Special Administrative Region People's Republic of China 中華人民共和國 香港特別行政區 政府統計處)
710 1 $a香港 (中國). $b政府統計處

10

Record model – in library system

INNOPAC library system

7xx enhanced redirection function

Re-indexed as “Equivalent heading”

Displayed in WebPAC as “Equivalent heading”

11

Record model – in library system (cont’d)

12

Record model – in library system (cont’d)

13

TTS (Big5) version vs. XML version

In 1999, developed by a Taiwan software vendor

Further developed and maintained by TTS, another Taiwan software company

Based on Big5 – restriction in character encoding

Lack of Z39.50 design

October 2003: new HKCAN software developed by the Chinese University of Hong Kong – the XML version

14

Document Type Definition

HKCAN DTD (Document Type Definition) – to specify the structure of each XML authority record

With this DTD, records can be output to the XML schema or other related schemas if needed

The DTD has served all the necessary functionality of the present XML platform well

15

DTD (cont’d)

<?xml version="1.0" encoding="UTF-8"?>

<!--DTD generated by XMLSPY v2004 rel. 3 U (http://www.xmlspy.com)-->

<!ELEMENT Leader (#PCDATA)>

<!ELEMENT Name (Leader, (Tag* | tag_type00* | tag_type10* | tag_type11* | tag_type30* | tag_1xx* | tag_4xx* | tag_5xx* | tag_7xx* | tag_670*)*)>

<!ATTLIST Name

tag001 CDATA #IMPLIED

record_type CDATA #IMPLIED

>

<!ELEMENT Subfield (#PCDATA)>

<!ATTLIST Subfield

subfield_code CDATA #IMPLIED

>

<!ELEMENT Tag (#PCDATA | Subfield)*>

<!ATTLIST Tag

tagcode CDATA #IMPLIED

record_type CDATA #IMPLIED

ind1 CDATA #IMPLIED

ind2 CDATA #IMPLIED

>

<!ELEMENT tag_1xx (#PCDATA)>

<!ELEMENT tag_4xx (#PCDATA)>

<!ELEMENT tag_5xx (#PCDATA)>

<!ELEMENT tag_670 (#PCDATA)>

<!ELEMENT tag_7xx (#PCDATA)>

<!ELEMENT tag_type00 (#PCDATA)>

<!ELEMENT tag_type10 (#PCDATA)>

<!ELEMENT tag_type11 (#PCDATA)>

<!ELEMENT tag_type30 (#PCDATA)>

16

HKCAN XML platform

[Architecture diagram]

Web interface: full text search and index search

Record conversion flow
– Records in Communication MARC format with EACC encoding
– Program to convert records from Communication MARC format to XML format
– Records in XML format with EACC encoding
– Program to convert the records from CCCII encoding to UTF-8 encoding
– Records in XML format with UTF-8 encoding

Servers
– HKCAN XML full text search server (Tamino): for full text search, record display & download
– HKCAN index search server (SQL Anywhere 8.0): records are imported into a relational database for index search; the full record is retrieved from the HKCAN XML server

17

Record conversion

00681cz 2200193n 4504001001000000003000600010005001700016008004100033010001600074035002300090040003000113066000700143100003000150670002600180670018700206670005700393678000900450700002800459000000001HkCAN19960504052613.5800523n| acannabb| |n aaa an 50026575 a(DLC#)n 50026575a aDLCcDLCdCUdDLC-RdHkCU c$11 aNakayama, Shigeru,d1928- aHis Senseijutsu, 1963 aKagaku gijutsu to ekoroj{229}i, 1995:bt.p. (Nakayama Shigeru) colophon (r; b. 1928; Ph.D. (from Harvard Univ.); prof., Kanagawa Daigaku; former asst. prof., T{229}oky{229}o Daigaku) a$1!Bs!Ci!O(':`!5=(B, 1999:bp. 2 ($1!04!;e!Th(B) aSc.D1 a$1!04!;e!Th(B,d1928-

From MARC record with EACC encoding
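The digits that follow the leader in the record above form its directory. As an illustrative sketch (not the HKCAN conversion program itself), the Python below splits that directory into the 12-character entries defined by the MARC 21 record structure: a 3-digit tag, a 4-digit field length and a 5-digit starting offset. The names `parse_directory`, `base_address` and `record_length` are hypothetical; the directory digits are copied from the slide.

```python
# Directory of the sample record, copied from the slide as 12-character entries.
DIRECTORY = (
    "001001000000" "003000600010" "005001700016" "008004100033"
    "010001600074" "035002300090" "040003000113" "066000700143"
    "100003000150" "670002600180" "670018700206" "670005700393"
    "678000900450" "700002800459"
)

LEADER_LENGTH = 24  # fixed by the MARC 21 record structure
ENTRY_LENGTH = 12   # 3-digit tag + 4-digit field length + 5-digit start offset


def parse_directory(directory):
    """Return (tag, field_length, start_offset) for each directory entry."""
    entries = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        entries.append((entry[:3], int(entry[3:7]), int(entry[7:12])))
    return entries


entries = parse_directory(DIRECTORY)
# Base address of data = leader + directory + one field terminator.
base_address = LEADER_LENGTH + len(DIRECTORY) + 1
# Record length = base address + all field data + one record terminator.
record_length = base_address + sum(length for _, length, _ in entries) + 1
```

The computed values agree with the leader of the sample record, which gives a record length of 00681 and a base address of 00193.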

18

Record conversion (cont’d)

To XML record with EACC encoding

<Name tag001 = "000000001" record_type = "00">

<Leader>00681cz 2200193n 4504</Leader>

<Tag tagcode = "003" record_type = "" ind1 = "" ind2 = "">HkCAN</Tag>

<Tag tagcode = "005" record_type = "" ind1 = "" ind2 = "">19960504052613.5</Tag>

<Tag tagcode = "008" record_type = "" ind1="" ind2="">800523n| acannabb| |n aaa </Tag>

<Tag tagcode = "010" record_type = "" ind1=" " ind2="">

<Subfield subfield_code = "a">n 50026575 </Subfield>

</Tag>

…..

<Tag tagcode = "670" record_type = "" ind1=" " ind2="">

<Subfield subfield_code = "a">$1!Bs!Ci!O(':`!5=(B, 1999: </Subfield>

<Subfield subfield_code = "b">p. 2 ($1!04!;e!Th(B) </Subfield>

</Tag>

…….

<tag_type00>$1!04!;e!Th(B, | 1928- | </tag_type00>

<tag_7xx>$1!04!;e!Th(B, | 1928- | </tag_7xx>

</Name>

19

Record conversion (cont’d)

To XML record with UTF-8 encoding

<Name ino:docname="Name:000000001" ino:id="1" record_type="00" tag001="000000001">

<Leader>00681cz 2200193n 4504</Leader>

<Tag ind1="" ind2="" record_type="" tagcode="003">HkCAN</Tag>

<Tag ind1="" ind2="" record_type="" tagcode="005">19960504052613.5</Tag>

<Tag ind1="" ind2="" record_type="" tagcode="008">800523n| acannabb| |n aaa</Tag>

<Tag ind1="" ind2="" record_type="" tagcode="010">

<Subfield subfield_code="a">n 50026575</Subfield>

</Tag>

. ….

<Tag ind1="" ind2="" record_type="" tagcode="670">

<Subfield subfield_code="a"> 日本科学史 , 1999:</Subfield>

<Subfield subfield_code="b">p. 2 ( 中山茂 )</Subfield>

</Tag>

……..

<tag_type00> 中山茂 , | 1928- |</tag_type00>

<tag_7xx> 中山茂 , | 1928- |</tag_7xx>

</Name>
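As a rough illustration of how a record in this XML layout can be read back, the sketch below uses Python's standard `xml.etree.ElementTree` to turn the `Tag` and `Subfield` elements into MARC-style display lines. The sample is a shortened version of the record above, without the Tamino `ino:` attributes (an assumption made so that the snippet parses without a namespace declaration), and `record_lines` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the HKCAN XML layout (element and attribute names from the DTD).
SAMPLE = """<Name tag001="000000001" record_type="00">
  <Leader>00681cz 2200193n 4504</Leader>
  <Tag tagcode="003" record_type="" ind1="" ind2="">HkCAN</Tag>
  <Tag tagcode="670" record_type="" ind1=" " ind2="">
    <Subfield subfield_code="a">日本科学史, 1999:</Subfield>
    <Subfield subfield_code="b">p. 2 (中山茂)</Subfield>
  </Tag>
  <tag_7xx>中山茂, | 1928- |</tag_7xx>
</Name>"""


def record_lines(xml_text):
    """Render each Tag element as 'tagcode $a...$b...' (or plain text)."""
    root = ET.fromstring(xml_text)
    lines = []
    for tag in root.findall("Tag"):
        subfields = tag.findall("Subfield")
        if subfields:
            body = "".join(
                f"${sf.get('subfield_code')}{(sf.text or '').strip()}"
                for sf in subfields
            )
        else:
            # Control fields carry their data directly as element text.
            body = (tag.text or "").strip()
        lines.append(f"{tag.get('tagcode')} {body}")
    return lines
```

Calling `record_lines(SAMPLE)` yields one line for the 003 control field and one for the 670 field with its two subfields.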

20

Record search

Index search (browse search)

Full text search (phrase/keyword search)

Tamino server: full text search, record display, record download

SQL Anywhere server: index search

21

Support of Unicode encoding & display

100 1 $aLu, Dunji
400 1 $aLu, Tun-chi $wnne
670   $aHuang, Z.X. Huang Zongxi shi wen xuan yi (黃宗羲詩文選譯), 1991
670   $aHis 自由报人: 曹聚仁传, 2003: $bt.p. (卢敦基) biog. (男, 1962年生于浙江永康, 1978年3月入杭州大学中文系, 1987年获文学硕士学位, 现为浙江省社会科学院越文化研究所所长, 研究员, 浙江大学在读论文博士, 中囯作家协会会员, 著有《风起云扬 -- 汉书随笔》,《金庸小说论》等)
700 1 $a盧敦基

Traditional Chinese characters

Simplified Chinese characters

Unicode character set – able to accommodate both simplified & traditional Chinese characters

Enables faithful transcription of Chinese names

22

Support of traditional vs. simplified Chinese character mapping

Traditional/simplified mapping table

Tamino server: built-in function to provide linked searching

Chinese index and phrase searching are supported irrespective of whether the input characters are simplified or traditional
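The idea of a traditional/simplified mapping table can be sketched in a few lines of Python. This is only an illustration of the principle, not the Tamino built-in function: the three-character table and the name `search_key` are hypothetical, and a real table would cover thousands of characters.

```python
# Tiny illustrative mapping table: traditional character -> simplified character.
TRAD_TO_SIMP = str.maketrans({"盧": "卢", "傳": "传", "報": "报"})


def search_key(text):
    """Fold traditional characters to simplified so both forms index alike."""
    return text.translate(TRAD_TO_SIMP)


# The traditional 7xx heading and the simplified form found in the source
# now produce the same search key.
same_key = search_key("盧敦基") == search_key("卢敦基")
```

Folding both the index and the query through the same table is one simple way to make a search succeed whichever form the user types.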

23

Support of traditional vs. simplified Chinese character mapping (cont’d)

24

Export options

Options to export records in:
– text
– MARC21 or
– MARC21 XML

Swapping of 1xx and 7xx fields

From:
100 1 $aChen, Peigui, $dju ren 1846
700 1 $a陳培桂, $d舉人 1846

To:
100 1 $a陳培桂, $d舉人 1846
700 1 $aChen, Peigui, $dju ren 1846
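The 1xx/7xx swap on export amounts to relabelling the two headings. A minimal Python sketch follows, assuming a record is held as a list of (tag, indicators, data) tuples; that representation and the name `swap_1xx_7xx` are hypothetical and say nothing about how the HKCAN software stores records.

```python
def swap_1xx_7xx(fields):
    """Relabel 1xx fields as 7xx and 7xx fields as 1xx, then re-sort by tag."""
    relabel = {"1": "7", "7": "1"}
    swapped = [
        (relabel.get(tag[0], tag[0]) + tag[1:], indicators, data)
        for tag, indicators, data in fields
    ]
    # sorted() is stable, so fields sharing a tag keep their relative order.
    return sorted(swapped, key=lambda field: field[0])


record = [
    ("100", "1 ", "$aChen, Peigui, $dju ren 1846"),
    ("700", "1 ", "$a陳培桂, $d舉人 1846"),
]
exported = swap_1xx_7xx(record)
```

After the call, the Chinese-script heading sits in field 100 and the romanized heading in field 700, matching the "To" column above.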

25

Support of Z39.50 protocol & full integration with INNOPAC library system

26

One stop search

Inspired by VIAF & the LEAF Project

Search across multiple authority files concurrently

HKCAN, Chinese Authority Name Database (Taiwan), LC Authority File, National Library of China
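Searching several authority files concurrently can be sketched with Python's standard thread pool. Everything here is hypothetical: the real one stop search would query remote services, whereas this sketch stands in two in-memory lists for the sources, and `one_stop_search` is an invented name.

```python
from concurrent.futures import ThreadPoolExecutor


def one_stop_search(term, sources):
    """Query every source concurrently; return {source name: list of hits}."""
    with ThreadPoolExecutor(max_workers=len(sources) or 1) as pool:
        futures = {name: pool.submit(search, term) for name, search in sources.items()}
        return {name: future.result() for name, future in futures.items()}


def make_source(headings):
    """Build a stand-in search function over an in-memory list of headings."""
    return lambda term: [h for h in headings if term.lower() in h.lower()]


# Stand-in data only; real sources would be remote authority files.
sources = {
    "HKCAN": make_source(["Zhou, Ying, 17th cent.", "Lu, Dunji"]),
    "LC Authority File": make_source(["Chou, Ying, 17th cent."]),
}
hits = one_stop_search("ying", sources)
```

Each source is queried in its own thread, so a slow authority file does not hold up the requests sent to the others, although the combined result is returned only once every source has answered.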

27

One stop search (cont’d)

28

One stop search (cont’d)

Coordinating Committee on Chinese Name Authority was set up in 2003

Members
– JULAC-HKCAN (Hong Kong)
– CALIS (China Academic Library & Information System)
– NLC (National Library of China)

Agreements
– Chinese name entries should follow the internationally recommended format
– Share each other’s authority databases
– Develop authority databases with Unicode encoding

29

Latest statistics

Over 123,000 records in HKCAN database

Original records created by HKCAN Workgroup
– Over 73,800 records (60%)

Records from Library of Congress, enhanced by HKCAN Workgroup
– Over 49,200 records (40%)

[Pie chart: LC 40%; records created by members 60%]

30

Latest statistics (cont’d)

Statistical ratio between different types of authority records
– Personal Names: 85,000 records
– Corporate Names: 15,000 records
– Conference Names: 1,000 records
– Uniform Titles: 22,000 records

Total: over 123,000 records

31

Difficulties encountered

Character-mapping
– from EACC to Unicode
– Library of Congress EACC to UTF-8 mapping tables http://www.loc.gov/marc/specifications/specchareacc.html ; Unihan database http://www.unicode.org/charts/unihan.html ; or INNOPAC UTF-8 mapping table?
– Task Force of UTF-8 mapping in Hong Kong

Diacritics
– Searching and display
– Diacritics mapping table in INNOPAC library system
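The EACC-to-Unicode step can be illustrated with the sample record from the record conversion slides, where `$1!04!;e!Th(B` in the EACC version corresponds to 中山茂 in the UTF-8 version. The Python sketch below is an assumption-laden illustration, not the project's conversion program: it treats `$1` and `(B` as the displayed remains of the escape sequences that switch into EACC and back to ASCII, reads the text between them as 3-character codes, and uses a three-entry table inferred from that one record. The name `decode_eacc` is hypothetical, and a real converter would load one of the full mapping tables listed above.

```python
import re

# Tiny illustrative table inferred from the sample record: EACC code -> Unicode.
EACC_TO_UNICODE = {"!04": "中", "!;e": "山", "!Th": "茂"}

# "$1" opens an EACC run, "(B" closes it; codes inside are 3 characters each.
EACC_RUN = re.compile(r"\$1((?:.{3})*?)\(B")


def decode_eacc(text, table=EACC_TO_UNICODE):
    """Replace each EACC run with Unicode; unmapped codes become U+FFFD."""
    def convert(match):
        run = match.group(1)
        codes = [run[i:i + 3] for i in range(0, len(run), 3)]
        return "".join(table.get(code, "\ufffd") for code in codes)
    return EACC_RUN.sub(convert, text)


heading = decode_eacc("$1!04!;e!Th(B, 1928-")
```

Codes missing from the table are replaced with U+FFFD rather than dropped, so gaps in a mapping table remain visible in the output.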

32

Future development & further cooperation

Database in practice - develop editing module, duplication checking function

Promote sharing of existing resources among libraries in Hong Kong, Taiwan & Mainland China

Enhance the effectiveness of Chinese authority works among libraries on an international scale

33

Thank you