Hong Kong Chinese Authority (Name) Project
– the HKCAN XML version
Joanna Yi-hang PONG
City University of Hong Kong
XML and Authority Control
ALA Annual Conference
June 26, 2005
Chicago, IL
2
HKCAN
Hong Kong Chinese Authority (Name)
A collaborative project since 1999
7 Hong Kong university libraries
– Chinese University of Hong Kong
– City University of Hong Kong
– Hong Kong Baptist University
– Hong Kong Institute of Education
– Hong Kong Polytechnic University
– Lingnan University
– University of Hong Kong
3
Aims
To build up a Chinese name authority file with CJK (Chinese, Japanese, Korean) scripts that meets the needs of the bilingual community
To improve and streamline authority-control operations by standardizing name headings and setting principles for authority record selection, to achieve “Better”, “Faster” and “Cheaper”
To participate in regional and global cooperative activities on authority work
4
HKCAN authority record model
7xx model
Based on MARC discussion paper 2001-DP05, Multilingual Authority Records in MARC21 Authority Format – http://www.loc.gov/marc/marbi/2001/2001-dp05.html
This paper explores how to handle multilingual records in the MARC21 authority format
HKCAN is based on the recommended Model C
5
Record model – Model C (cont’d)
Established headings (1xx) are defined by a catalog context (i.e. cataloging rules and language of catalog)
Context designations are shown in field 008/10-11 (rules) or field 040 $b (lang. of cataloging)
4xx and 5xx reference tracings are appropriate to the 1xx heading in that context
Alternative established headings are recorded in 7xx fields, along with the indication of the context in which they are appropriate
6
Record model – Model C (cont’d)
[Diagram: an Authority record linked to its Context designation (cat. rules, descriptive convention, lang. of cat.), its Established heading (1xx), its Reference tracings (4xx, 5xx), and its Alternative established headings (7xx), each of which carries a Context indicator; the links are labeled with 1 and m cardinalities]
7
Record model - Model C (cont’d)
Example:
Context designation
008/10 (Cataloging rules): c (AACR2)
008/11 (Subject system/thesaurus rules): a (LCSH)
040 $b (Language of cataloging): eng (English)
Established heading (1xx)
100 0# $a John Paul $b II, $c Pope, $d 1920-
Reference tracing (4/5xx)
400 0# $a Joannes Paulus $b II, $c Pope, $d 1920-
400 0# $a Juan Pablo $b II, $c Pope, $d 1920-
400 0# $a Jean Paul $b II, $c Pope, $d 1920-
400 0# $a Johannes Paul $b II, $c Pope, $d 1920-
400 0# $a Joann Pavel $b II, $c Pope, $d 1920-
<etc.>
510 2# $a Catholic Church. $b Pope (1978-: John Paul II)
Alternative established heading (7xx), with the context indicator in $7
700 04 $a Juan Pablo $b II, $c Papa, $d 1920- $7 aacr//spa
700 06 $a Jean Paul $b II $c pape, $d 1920- $7 aacr//fre
700 04 $a Jean Paul $b II $c (Pape), $d 1920- $7 ncafnor
700 04 $a Johannes Paul $c Papst, $b II $7 rak
700 04 $a <Chinese heading for John Paul> $b II $c <Chinese designation for Pope> $d 1920- $7 aacr//chi
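The role of the $7 context indicator can be illustrated with a minimal Python sketch. It assumes the fields are available as display lines in the layout used on this slide; the function names `parse_field`, `heading_text` and `select_heading` are hypothetical helpers for illustration and are not part of MARC 21 or of the HKCAN software. The idea is to return the 7xx heading whose $7 matches the requested catalog context, and to fall back to the 1xx heading otherwise.

```python
import re


def parse_field(line):
    """Split a display line such as '700 04 $a ... $7 aacr//spa' into parts."""
    tag, indicators = line[:3], line[4:6]
    subfields = [(code, value.strip())
                 for code, value in re.findall(r"\$(\w)([^$]*)", line[6:])]
    return tag, indicators, subfields


def heading_text(subfields):
    """Join the heading subfields, leaving out the $7 context indicator."""
    return " ".join(value for code, value in subfields if code != "7")


def select_heading(lines, context):
    """Pick the 7xx heading for `context`; fall back to the 1xx heading."""
    fields = [parse_field(line) for line in lines]
    for tag, _, subfields in fields:
        if tag.startswith("7") and dict(subfields).get("7") == context:
            return heading_text(subfields)
    for tag, _, subfields in fields:
        if tag.startswith("1"):
            return heading_text(subfields)
    return None


# A subset of the fields shown on the slide.
RECORD = [
    "100 0# $a John Paul $b II, $c Pope, $d 1920-",
    "400 0# $a Juan Pablo $b II, $c Pope, $d 1920-",
    "700 04 $a Juan Pablo $b II, $c Papa, $d 1920- $7 aacr//spa",
    "700 06 $a Jean Paul $b II $c pape, $d 1920- $7 aacr//fre",
]

print(select_heading(RECORD, "aacr//spa"))
print(select_heading(RECORD, "aacr//ita"))  # no such 7xx, so the 1xx is used
```

The fallback to the 1xx heading is a design choice of this sketch; a real system might instead report that no heading exists for the requested context.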
8
Record model - HKCAN record
008 941020nc acannaabn |a aaa |||
010 $anr 94034993
035 $a(DLC#)nr 94034993a
040 $aDLC-R$beng$cDLC-R$dOCoLC$dHkCU$dHkCAN
066 $c$1
100 1 $aZhou, Ying,$d17th cent.
400 1 $wnne$aChou, Ying,$d17th cent.
400 1 $aZhou, Fangshu,$d17th cent.
400 1 $a周方叔,$d17th cent.
400 1 $aChou, Fang-shu,$d17th cent.
670 $aChih lin (卮林), 1992:$bt.p. (Chou Ying)
670 $aChung wen ta tz{176}u tien (中文大詞典):$bv. 6, p. 290 (Chou Ying; of Ming; native of P{176}u-t{176}ien; t. Fang-shu; author of Chih lin; lived around the mid of Emperor Ch{176}ung-chen reign)
670 $aHis 卮林 : 10卷, 附補遺1卷, [1963]:$bt.p. (周嬰)
670 $a中國人名大辭典, 1934:$bp. 545 (周嬰, 明莆田人, 字方叔, 崇禎中以貢生知上猶縣, 所著卮林, 体近類書)
700 1 $a周嬰,$d17th cent.
9
Record model – HKCAN record (cont’d)
008 9802097n| acannaab| |a ana
040 $aHkCU $cHkCU
066 $c$1
110 1 $aHong Kong (China). $bCensus and Statistics Dept.
410 2 $aCensus and Statistics Dept. (Hong Kong, China)
410 2 $aZheng fu tong ji chu (Hong Kong, China)
410 2 $a政府統計處 (香港, 中國)
510 1 $aHong Kong. $bCensus and Statistics Dept. $wa
510 1 $a香港. $b統計處 $wa
510 1 $a香港. $b政府統計處 $wa
667 $aNon-conventional pinyin pairing of 1xx/7xx|5HkCAN
670 $aIts Hong Kong social & economic trends = 香港社會及經濟趨勢, 1995- : $b1997 ed., cover (Census and Statistics Department Hong Kong Special Administrative Region People's Republic of China 中華人民共和國 香港特別行政區 政府統計處)
710 1 $a香港 (中國). $b政府統計處
10
Record model – in library system
INNOPAC library system
7xx enhanced redirection function
Re-indexed as “Equivalent heading”
Displayed in WebPAC as “Equivalent heading”
13
TTS (Big5) version vs. XML version
TTS (Big5) version
– Developed in 1999 by a Taiwan software vendor
– Further developed and maintained by TTS, another Taiwan software company
– Based on Big5, which restricts character encoding
– Lacks Z39.50 design
XML version
– October 2003: a new HKCAN software developed by Chinese University of Hong Kong
14
Document Type Definition
HKCAN DTD (Document Type Definition) – to specify the structure of each XML authority record
With this DTD, records can be output to XML Schema or other related schemas if needed
The DTD has served all the necessary functionality well in the present XML platform
15
DTD (cont’d)
<?xml version="1.0" encoding="UTF-8"?>
<!--DTD generated by XMLSPY v2004 rel. 3 U (http://www.xmlspy.com)-->
<!ELEMENT Leader (#PCDATA)>
<!ELEMENT Name (Leader, (Tag | tag_type00 | tag_type10 | tag_type11 | tag_type30 | tag_1xx | tag_4xx | tag_5xx | tag_7xx | tag_670)*)>
<!ATTLIST Name
tag001 CDATA #IMPLIED
record_type CDATA #IMPLIED
>
<!ELEMENT Subfield (#PCDATA)>
<!ATTLIST Subfield
subfield_code CDATA #IMPLIED
>
<!ELEMENT Tag (#PCDATA | Subfield)*>
<!ATTLIST Tag
tagcode CDATA #IMPLIED
record_type CDATA #IMPLIED
ind1 CDATA #IMPLIED
ind2 CDATA #IMPLIED
>
<!ELEMENT tag_1xx (#PCDATA)>
<!ELEMENT tag_4xx (#PCDATA)>
<!ELEMENT tag_5xx (#PCDATA)>
<!ELEMENT tag_670 (#PCDATA)>
<!ELEMENT tag_7xx (#PCDATA)>
<!ELEMENT tag_type00 (#PCDATA)>
<!ELEMENT tag_type10 (#PCDATA)>
<!ELEMENT tag_type11 (#PCDATA)>
<!ELEMENT tag_type30 (#PCDATA)>
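The structure this DTD describes can be illustrated with a small hand-written check in Python. This is only a sketch under stated assumptions: it uses the standard-library `xml.etree.ElementTree`, which does not validate against DTDs, so the rules below are a manual mirror of the declarations above rather than real DTD validation. The function name `check_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Element names allowed after <Leader> inside <Name>, copied from the DTD.
NAME_CHILDREN = {
    "Tag", "tag_type00", "tag_type10", "tag_type11", "tag_type30",
    "tag_1xx", "tag_4xx", "tag_5xx", "tag_7xx", "tag_670",
}


def check_record(xml_text):
    """Return a list of structural problems (empty if none were found)."""
    root = ET.fromstring(xml_text)
    if root.tag != "Name":
        return [f"root element is <{root.tag}>, expected <Name>"]
    problems = []
    children = list(root)
    if not children or children[0].tag != "Leader":
        problems.append("first child of <Name> must be <Leader>")
    for child in children[1:]:
        if child.tag not in NAME_CHILDREN:
            problems.append(f"unexpected <{child.tag}> inside <Name>")
        elif child.tag == "Tag":
            # <Tag> is mixed content: text and <Subfield> elements only.
            problems += [f"unexpected <{g.tag}> inside <Tag>"
                         for g in child if g.tag != "Subfield"]
        elif len(child):
            problems.append(f"<{child.tag}> must contain text only")
    return problems


GOOD = """<Name tag001="000000001" record_type="00">
  <Leader>00681cz 2200193n 4504</Leader>
  <Tag tagcode="003" record_type="" ind1="" ind2="">HkCAN</Tag>
  <Tag tagcode="010" record_type="" ind1=" " ind2="">
    <Subfield subfield_code="a">n 50026575</Subfield>
  </Tag>
</Name>"""

print(check_record(GOOD))
```

For strict validation, a validating parser would be needed in place of this manual check.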
16
HKCAN XML platform
[Diagram: data flow of the HKCAN XML platform]
Records in Communication MARC format with EACC encoding
– Program to convert records from Communication MARC format to XML format
Records in XML format with EACC encoding
– Program to convert the records from CCCII encoding to UTF-8 encoding
Records in XML format with UTF-8 encoding
– HKCAN XML full text search server (Tamino), for full text search, records display & download
– Import the records to a relational database for index search: HKCAN index search server (SQL Anywhere 8.0)
Web interface
– Full text search
– Index search: retrieve the full record from HKCAN XML server
[The diagram also carries a box labeled MARC]
17
Record conversion
00681cz 2200193n 4504001001000000003000600010005001700016008004100033010001600074035002300090040003000113066000700143100003000150670002600180670018700206670005700393678000900450700002800459000000001HkCAN19960504052613.5800523n| acannabb| |n aaa an 50026575 a(DLC#)n 50026575a aDLCcDLCdCUdDLC-RdHkCU c$11 aNakayama, Shigeru,d1928- aHis Senseijutsu, 1963 aKagaku gijutsu to ekoroj{229}i, 1995:bt.p. (Nakayama Shigeru) colophon (r; b. 1928; Ph.D. (from Harvard Univ.); prof., Kanagawa Daigaku; former asst. prof., T{229}oky{229}o Daigaku) a$1!Bs!Ci!O(':`!5=(B, 1999:bp. 2 ($1!04!;e!Th(B) aSc.D1 a$1!04!;e!Th(B,d1928-
From MARC record with EACC encoding
18
Record conversion (cont’d)
To XML record with EACC encoding
<Name tag001 = "000000001" record_type = "00">
<Leader>00681cz 2200193n 4504</Leader>
<Tag tagcode = "003" record_type = "" ind1 = "" ind2 = "">HkCAN</Tag>
<Tag tagcode = "005" record_type = "" ind1 = "" ind2 = "">19960504052613.5</Tag>
<Tag tagcode = "008" record_type = "" ind1="" ind2="">800523n| acannabb| |n aaa </Tag>
<Tag tagcode = "010" record_type = "" ind1=" " ind2="">
<Subfield subfield_code = "a">n 50026575 </Subfield>
</Tag>
…..
<Tag tagcode = "670" record_type = "" ind1=" " ind2="">
<Subfield subfield_code = "a">$1!Bs!Ci!O(':`!5=(B, 1999: </Subfield>
<Subfield subfield_code = "b">p. 2 ($1!04!;e!Th(B) </Subfield>
</Tag>
…….
<tag_type00>$1!04!;e!Th(B, | 1928- | </tag_type00>
<tag_7xx>$1!04!;e!Th(B, | 1928- | </tag_7xx>
</Name>
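The conversion step from a MARC communication-format record to XML of this shape can be sketched in Python. This is not the HKCAN conversion program, only an illustration under stated assumptions: the input follows the ISO 2709 layout (24-character leader, base address of data in leader positions 12-16, 12-character directory entries), the data is read byte-for-byte as Latin-1, and character-set escape sequences are not handled. The functions `marc_to_name_element` and `build_record` are hypothetical names, and `record_type="00"` is simply copied from the slide.

```python
import xml.etree.ElementTree as ET

FIELD_END = "\x1e"   # ISO 2709 field terminator
SUBFIELD = "\x1f"    # ISO 2709 subfield delimiter
RECORD_END = "\x1d"  # ISO 2709 record terminator


def marc_to_name_element(raw):
    """Convert one ISO 2709 record (bytes) into a <Name> element."""
    text = raw.decode("latin-1")      # one byte per character keeps offsets valid
    leader = text[:24]
    base = int(leader[12:17])         # base address of data
    directory = text[24:base - 1]     # 12-character entries, minus terminator
    name = ET.Element("Name", record_type="00")
    ET.SubElement(name, "Leader").text = leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = text[base + start:base + start + length].rstrip(FIELD_END)
        if tag == "001":              # control number becomes an attribute
            name.set("tag001", data)
        elif tag < "010":             # other control fields hold plain text
            el = ET.SubElement(name, "Tag", tagcode=tag, record_type="",
                               ind1="", ind2="")
            el.text = data
        else:                         # data fields: indicators, then subfields
            head, *subfields = data.split(SUBFIELD)
            el = ET.SubElement(name, "Tag", tagcode=tag, record_type="",
                               ind1=head[0:1], ind2=head[1:2])
            for sf in subfields:
                child = ET.SubElement(el, "Subfield", subfield_code=sf[:1])
                child.text = sf[1:]
    return name


def build_record(fields):
    """Assemble a tiny ISO 2709 record from (tag, data) pairs (demo only)."""
    directory, body = "", ""
    for tag, data in fields:
        data += FIELD_END
        directory += f"{tag}{len(data):04d}{len(body):05d}"
        body += data
    base = 24 + len(directory) + 1
    total = base + len(body) + 1
    leader = f"{total:05d}cz   22{base:05d}n  4500"
    return (leader + directory + FIELD_END + body + RECORD_END).encode("latin-1")


raw = build_record([
    ("001", "000000001"),
    ("003", "HkCAN"),
    ("100", "1 " + SUBFIELD + "aNakayama, Shigeru," + SUBFIELD + "d1928-"),
])
record = marc_to_name_element(raw)
print(ET.tostring(record, encoding="unicode"))
```

The sketch leaves out the summary elements (`tag_type00`, `tag_7xx` and so on) because the slides do not state how they are derived.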
19
Record conversion (cont’d)
To XML record with UTF-8 encoding
<Name ino:docname="Name:000000001" ino:id="1" record_type="00" tag001="000000001">
<Leader>00681cz 2200193n 4504</Leader>
<Tag ind1="" ind2="" record_type="" tagcode="003">HkCAN</Tag>
<Tag ind1="" ind2="" record_type="" tagcode="005">19960504052613.5</Tag>
<Tag ind1="" ind2="" record_type="" tagcode="008">800523n| acannabb| |n aaa</Tag>
<Tag ind1="" ind2="" record_type="" tagcode="010">
<Subfield subfield_code="a">n 50026575</Subfield>
</Tag>
. ….
<Tag ind1="" ind2="" record_type="" tagcode="670">
<Subfield subfield_code="a"> 日本科学史 , 1999:</Subfield>
<Subfield subfield_code="b">p. 2 ( 中山茂 )</Subfield>
</Tag>
……..
<tag_type00> 中山茂 , | 1928- |</tag_type00>
<tag_7xx> 中山茂 , | 1928- |</tag_7xx>
</Name>
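The EACC to UTF-8 step shown in the two XML versions above amounts to a table lookup over three-character EACC codes. The Python sketch below is an illustration only, not the HKCAN converter. Its mapping table holds just the eight characters that can be paired off from the examples in these slides; a real conversion would load the full Library of Congress mapping tables. It also assumes that the `$1` and `(B` markers printed on the slides are the escape sequences ESC `$1` and ESC `(B` with the ESC byte not shown.

```python
ESC = "\x1b"
TO_CJK = ESC + "$1"    # switch to the three-byte EACC set
TO_ASCII = ESC + "(B"  # switch back to ASCII

# Illustrative table built only from the paired examples in the slides.
EACC_TO_UNICODE = {
    "!04": "中", "!;e": "山", "!Th": "茂",
    "!Bs": "日", "!Ci": "本", "!O(": "科", "':`": "学", "!5=": "史",
}


def eacc_to_unicode(text, table=EACC_TO_UNICODE, unknown="\ufffd"):
    """Replace EACC three-character codes with Unicode characters."""
    out = []
    i = 0
    in_cjk = False
    while i < len(text):
        if text.startswith(TO_CJK, i):
            in_cjk, i = True, i + len(TO_CJK)
        elif text.startswith(TO_ASCII, i):
            in_cjk, i = False, i + len(TO_ASCII)
        elif in_cjk:
            # Each CJK character occupies exactly three code characters.
            out.append(table.get(text[i:i + 3], unknown))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


heading = TO_CJK + "!04!;e!Th" + TO_ASCII + ", 1928-"
title = TO_CJK + "!Bs!Ci!O(':`!5=" + TO_ASCII + ", 1999:"
print(eacc_to_unicode(heading))
print(eacc_to_unicode(title))
```

Codes missing from the table come out as U+FFFD so that unmapped characters stay visible instead of being dropped silently.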
20
Record search
Index search (browse search)
Full text search (phrase/keyword search)
Tamino server: Full text search, record display, record download
SQL Anywhere server: Index search
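The difference between the two search modes can be shown with a small in-memory Python sketch. It does not reflect how Tamino or SQL Anywhere implement them; the functions `browse` and `keyword_search` are hypothetical, and the headings are taken from records shown elsewhere in these slides. A browse search jumps to a position in a sorted heading list, while a keyword search matches text anywhere in the record.

```python
import bisect

HEADINGS = sorted([
    "Zhou, Ying",
    "Lu, Dunji",
    "Nakayama, Shigeru",
    "Chen, Peigui",
])


def browse(headings, start, count=2):
    """Index (browse) search: list headings from the first one >= `start`."""
    i = bisect.bisect_left(headings, start)
    return headings[i:i + count]


def keyword_search(headings, word):
    """Keyword search: match `word` anywhere, ignoring case."""
    word = word.lower()
    return [h for h in headings if word in h.lower()]


print(browse(HEADINGS, "Lu"))
print(keyword_search(HEADINGS, "shigeru"))
```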
21
Support of Unicode encoding & display
100 1 $aLu, Dunji
400 1 $aLu, Tun-chi $wnne
670 $aHuang, Z.X. Huang Zongxi shi wen xuan yi (黃宗羲詩文選譯), 1991
670 $aHis 自由报人 : 曹聚仁传, 2003: $bt.p. (卢敦基) biog. (男, 1962年生于浙江永康, 1978年3月入杭州大学中文系, 1987年获文学硕士学位, 现为浙江省社会科学院越文化研究所所长, 研究员, 浙江大学在读论文博士, 中囯作家协会会员, 著有《风起云扬 -- 汉书随笔》,《金庸小说论》等)
700 1 $a盧敦基
Callouts in the record:
– Traditional Chinese characters (e.g. 黃宗羲詩文選譯, 盧敦基)
– Simplified Chinese characters (e.g. 自由报人 : 曹聚仁传, 卢敦基)
Unicode character set – able to accommodate both simplified & traditional Chinese characters
Enables faithful transcription of Chinese names
22
Support of traditional vs. simplified Chinese character mapping
Traditional/simplified mapping table
Tamino server: built-in function to provide linked searching
Chinese index and phrase searching are supported irrespective of whether the input characters are in simplified or traditional form
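The effect of a traditional/simplified mapping table on searching can be sketched in Python. This is not Tamino's built-in function, only an illustration of the idea: both the query and the indexed heading are folded to one form before comparison. The two-entry table and the names `search_key` and `phrase_search` are assumptions made for the example; a real table is far larger and has to deal with one-to-many mappings.

```python
# Tiny illustrative table: traditional character -> simplified character.
TRAD_TO_SIMP = str.maketrans({"盧": "卢", "陳": "陈"})


def search_key(text):
    """Fold traditional characters to simplified ones for comparison."""
    return text.translate(TRAD_TO_SIMP)


def phrase_search(headings, query):
    """Return headings containing `query`, whichever script form was typed."""
    key = search_key(query)
    return [h for h in headings if key in search_key(h)]


HEADINGS = ["盧敦基", "陳培桂", "周嬰"]

print(phrase_search(HEADINGS, "卢敦基"))  # simplified input
print(phrase_search(HEADINGS, "盧敦基"))  # traditional input
```

The stored headings are left untouched; only the comparison keys are folded, so the original script is still what gets displayed.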
24
Export options
Options to export records in:
– text
– MARC21 or
– MARC21 XML
Swapping of 1xx and 7xx fields
From:
100 1 $aChen, Peigui, $dju ren 1846
700 1 $a陳培桂, $d舉人 1846
To:
100 1 $a陳培桂, $d舉人 1846
700 1 $aChen, Peigui, $dju ren 1846
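The 1xx/7xx swap on export can be sketched in Python as an exchange of field contents while the tags stay in place. This is a minimal sketch, assuming a record is held as a list of (tag, indicators, data) tuples and that the 7xx field to swap is the first one whose last two tag digits match the 1xx field; the function name `swap_1xx_7xx` is hypothetical and not taken from the HKCAN software.

```python
def swap_1xx_7xx(fields):
    """Return a copy of `fields` with 1xx and paired 7xx contents exchanged."""
    fields = list(fields)
    i1 = next((i for i, f in enumerate(fields) if f[0].startswith("1")), None)
    if i1 is None:
        return fields                     # no established heading to swap
    paired_tag = "7" + fields[i1][0][1:]  # 100 pairs with 700, 110 with 710
    i7 = next((i for i, f in enumerate(fields) if f[0] == paired_tag), None)
    if i7 is None:
        return fields                     # no alternative heading to swap
    f1, f7 = fields[i1], fields[i7]
    fields[i1] = (f1[0],) + f7[1:]        # keep the tag, take the 7xx content
    fields[i7] = (f7[0],) + f1[1:]        # keep the tag, take the 1xx content
    return fields


record = [
    ("100", "1 ", "$aChen, Peigui, $dju ren 1846"),
    ("700", "1 ", "$a陳培桂, $d舉人 1846"),
]
swapped = swap_1xx_7xx(record)
for tag, indicators, data in swapped:
    print(tag, indicators, data)
```

Applying the function twice returns the original field order, which makes the swap easy to reverse.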
26
One stop search
Inspired by VIAF & the LEAF Project
Search across multiple authority files concurrently
HKCAN, Chinese Authority Name Database (Taiwan), LC Authority File, National Library of China
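Searching several authority files at the same time can be sketched in Python with a thread pool. This is a hypothetical illustration, not the HKCAN one stop search implementation: each source is an in-memory list standing in for a remote authority file, and `make_source` and `one_stop_search` are invented names. The lists are stand-ins reusing headings from these slides, not real query results.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(headings):
    """Build a search function over an in-memory stand-in for one file."""
    def search(query):
        return [h for h in headings if query.lower() in h.lower()]
    return search


# Stand-in data only; a real source would be queried over the network.
SOURCES = {
    "HKCAN": make_source(["Lu, Dunji", "Zhou, Ying, 17th cent."]),
    "LC Authority File": make_source(["Zhou, Ying, 17th cent.",
                                      "Nakayama, Shigeru, 1928-"]),
}


def one_stop_search(query, sources):
    """Send `query` to every source concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query)
                   for name, fn in sources.items()}
        return {name: future.result() for name, future in futures.items()}


results = one_stop_search("zhou", SOURCES)
for name, hits in results.items():
    print(name, hits)
```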
28
One stop search (cont’d)
Coordinating Committee on Chinese Name Authority was set up in 2003
Members
– JULAC-HKCAN (Hong Kong)
– CALIS (China Academic Library & Information System)
– NLC (National Library of China)
Agreements
– Chinese name entries should follow the internationally recommended format
– Share each other’s authority databases
– Develop authority databases with Unicode encoding
29
Latest statistics
Over 123,000 records in HKCAN database
– Original records created by HKCAN Workgroup: over 73,800 records (60%)
– Records from Library of Congress, enhanced by HKCAN Workgroup: over 49,200 records (40%)
[Pie chart: LC 40%; records created by members 60%]
30
Latest statistics (cont’d)
Statistical ratio between different types of authority records
– Personal Names: 85,000 records
– Corporate Names: 15,000 records
– Conference Names: 1,000 records
– Uniform Titles: 22,000 records
Total: over 123,000 records
31
Difficulties encountered
Character-mapping
– from EACC to Unicode
– Library of Congress EACC to UTF-8 mapping tables http://www.loc.gov/marc/specifications/specchareacc.html ; Unihan database http://www.unicode.org/charts/unihan.html ; or INNOPAC UTF-8 mapping table?
– Task Force of UTF-8 mapping in Hong Kong
Diacritics
– Searching and display
– Diacritics mapping table in INNOPAC library system
32
Future development & further cooperation
Database in practice - develop editing module, duplication checking function
Promote sharing of existing resources among libraries in Hong Kong, Taiwan & Mainland China
Enhance the effectiveness of Chinese authority work among libraries on an international scale