
Let's Explore Chinese i18n/L10n on GNU/Linux! Anthony Fok, ThizLinux Laboratory Ltd. HKLUG Linux Talk, 13 April 2002

Let's Explore Chinese Internationalization and Localization on GNU/Linux! (齊來探討 GNU/Linux 中文化)

Anthony Fok (霍東靈), ThizLinux Laboratory Ltd. (即時系統科研有限公司)


Overview

● Introduction to Chinese charsets and encodings
  – GB 18030-2000 and HKSCS-2001
● Chinese i18n/L10n infrastructure on GNU/Linux
● Participating in Chinese i18n/L10n
● Todo list and future developments


Chinese character sets and encodings


In the beginning, there was only 0 and 1

● The computer sees all data as 0s and 1s
● Each “on-off switch” unit is a “bit” (位元、比特)
● 8 bits make up 1 “byte” or “octet” (位元組、字節)
● 0000 0000 to 1111 1111 (0x00 to 0xFF) make up 256 code points
● Initially, each character was stored in 1 byte
  – ASCII (ISO 646 IRV)
  – ISO 8859-1 to ISO 8859-16 (Latin1, Latin2, Greek, Hebrew, Thai, Cyrillic, etc.)
  – 256 code points is NOT enough for Chinese!


So many charsets and encodings! (萬「碼」奔騰)

● The number of Chinese (Han) characters that have ever existed exceeds 100,000
  – Unicode 3.2 / ISO 10646 includes over 70,000
  – CCCII includes over 75,000
  – Invented in China; adopted by Japan, Korea, and Vietnam: “CJKV”
  – Sources include:
    ● 漢語大字典 (Hanyu Da Zidian)
    ● 康熙字典 (Kangxi Zidian)
    ● Regional standards (GB, CNS, HKSCS, JIS, KSC)


1 byte not enough? Let's use more!

● If all bits are available:
  – 1 byte, 8 bits, 2^8 = 256 (0x00..0xFF)
  – 2 bytes, 16 bits, 2^16 = 65,536 (0x0000..0xFFFF)
  – 3 bytes, 24 bits, 2^24 = 16,777,216 (0x000000..0xFFFFFF)
  – 4 bytes, 32 bits, 2^32 = 4,294,967,296 (0x00000000..0xFFFFFFFF)
● Most legacy encodings must remain ASCII-compatible, so they cannot use all of this space
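The arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `two_byte_space` is made up for this example, and the 0xA1..0xFE range is the EUC-style layout used by GB 2312.

```python
# Code-space sizes when every bit of every byte is usable.
for n_bytes in (1, 2, 3, 4):
    print(n_bytes, "byte(s):", 2 ** (8 * n_bytes), "code points")


def two_byte_space(low: int, high: int) -> int:
    """Number of two-byte positions when both bytes must lie in low..high."""
    per_byte = high - low + 1
    return per_byte * per_byte


# An ASCII-compatible two-byte scheme that keeps both bytes in 0xA1..0xFE
# gets only a small fraction of the full 16-bit space.
print(two_byte_space(0xA1, 0xFE), "of", 2 ** 16)
```

Keeping both bytes above 0x7F is what lets plain ASCII text pass through untouched, at the price of most of the 65,536 positions.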


GB 2312-80

● GB2312 is a national standard (國標, Guobiao) of mainland China
  – Full title: 《信息技術 信息交換用漢字編碼字符集 基本集》 (Information technology: Chinese ideogram coded character set for information interchange, basic set), published in 1980
  – 2-byte, {0xA1-0xFE}{0xA1-0xFE}, or 94x94, for a total of 8,836 possible 2-byte code points
  – 6,763 Han characters, 7,445 characters in total
    ● Sidenote: GB/T 12345 provides a Traditional Chinese charset encoded in the same space as GB 2312-80
● Called zh_CN.GB2312 or zh_CN.EUC-CN on GNU/Linux
  – Too few characters! (e.g. 朱鎔基 has to be written as 朱容基)
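The "too few characters" problem is easy to reproduce with Python's built-in codecs. A minimal sketch, using the Simplified form 镕 of the character from the example name; `can_encode` is a helper invented for this illustration.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


name = "朱镕基"
print("gb2312:", can_encode(name, "gb2312"))  # 镕 is missing from GB 2312-80
print("gbk:   ", can_encode(name, "gbk"))     # GBK adds it
print("gb2312 fallback spelling:", can_encode("朱容基", "gb2312"))
```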


GBK Specification (GBK 規範)

● China actively participates in ISO 10646
● GB13000.1 = Unicode 2.1 (ISO 10646-1:1993)
● Too many legacy GB2312 applications
● Need a migration plan, an intermediate solution
● GBK is the first step in that direction (1995)
● Includes the repertoire of the CJK Unified Ideographs in GB13000.1 / Unicode 2.1
  – U+4E00 to U+9FA5, over 20,000 Han ideographs
● Backward compatible with GB2312
● Implemented in Windows 95 (Simplified Chinese) as CP936
● {0x81-0xFE}{0x40-0x7E, 0x80-0xFE}
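The GBK byte layout quoted above can be written down directly. The sketch below only checks byte ranges (it does not know which positions are actually assigned); the names `LEAD`, `TRAIL` and `is_gbk_pair` are chosen for this example.

```python
# Lead byte 0x81..0xFE; trail byte 0x40..0x7E or 0x80..0xFE (0x7F is skipped).
LEAD = range(0x81, 0xFF)
TRAIL = [b for b in range(0x40, 0xFF) if b != 0x7F]


def is_gbk_pair(lead: int, trail: int) -> bool:
    """True if (lead, trail) falls inside the GBK two-byte code space."""
    return lead in LEAD and trail in TRAIL


print(len(LEAD) * len(TRAIL), "two-byte positions in GBK")

# Every GB2312 position {0xA1-0xFE}{0xA1-0xFE} is also a valid GBK position,
# which is what makes GBK backward compatible at the byte level.
print(all(is_gbk_pair(a, b)
          for a in range(0xA1, 0xFF)
          for b in range(0xA1, 0xFF)))
```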


Big-5 (「五大碼」)

● A “round-table” standard made up by the “Big-5” companies in Taiwan
● Implemented by all major Chinese OS's
  – 倚天 (ETen), 零一, 國喬, Traditional Chinese Windows, etc.
● Not very well designed; the character selection is not well standardized
  – Two characters are duplicated
  – Missing 「 」 and other characters used in Hong Kong
  – In Taiwan, attempts to fix/extend Big5 basically failed (CMEX's Big-5+, Big-5E...)


First steps beyond Big-5

● 倚天 (ETen) added some characters (Hiragana, Katakana, 「裏、銹」, etc.); some call it Big5-ETen. De facto Big5 standard on GNU/Linux
● Microsoft Code Page 950 includes 「裏、銹」 etc., but not all of ETen's extensions
● User-Defined Areas (UDA), Vendor-Defined Areas (VDA), EUDC (End-User Defined Characters), Private Use Areas (PUA)
  – Different people use EUDC differently... a messy situation
  – The demise of CMEX's Big-5+ standard


Unicode / ISO 10646

● Unicode Consortium (industry)
● ISO/IEC 10646 (academic / international standard)
● The two joined efforts to produce Unicode / UCS
  – Universal Multiple-Octet Coded Character Set
  – ISO: design, adding characters to the repertoire
  – Unicode Consortium: technical implementation
● Code range: U+0000 to U+10FFFF
  – 1,114,112 possible code points


Unicode / ISO 10646

● Think “integers”: UCS-2, UCS-4
● Think “strings”:
  – UTF-7
  – UTF-8
    ● Variable width, 1 to 4 bytes (up to
  – UTF-16
    ● 16-bit code units; surrogate pairs (U+D800-U+DFFF, one high plus one low unit) reach up to U+10FFFF
  – UTF-32
    ● Fixed width 32-bit; covers U+0000 to U+10FFFF (UCS-4 was originally defined up to U+7FFFFFFF)
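To make the "integers versus strings" distinction concrete, the sketch below takes one code point beyond U+FFFF (U+20000, the first CJK Extension B ideograph), computes its UTF-16 surrogate pair by hand, and prints the bytes Python's codecs produce for it. The function name `to_surrogates` is made up for this example.

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000          # 20 significant bits
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


ch = "\U00020000"
high, low = to_surrogates(ord(ch))
print(f"U+{ord(ch):X} -> {high:04X} {low:04X}")
print("UTF-8    :", ch.encode("utf-8").hex())
print("UTF-16BE :", ch.encode("utf-16-be").hex())
print("UTF-32BE :", ch.encode("utf-32-be").hex())
```

The same integer ends up as four bytes in each of the three encodings here, but with three different byte patterns.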


Unicode / ISO 10646

● ISO 10646-1:1993
● ISO 10646-1:2000
● ISO 10646-2:2001
● Unicode 3.2 just came out
● More world languages are being researched and added, a truly worldwide effort.


HKSCS-2001 (香港增補字符集-2001, Hong Kong Supplementary Character Set)

● A brief history
  – GCCS (政府通用字庫, Government Common Character Set), 1995
  – HKSCS-1999
    ● Official encoding name: BIG5-HKSCS (IANA Registry)
  – HKSCS-2001
● Actively promoted by ITSD
● ITSD (HKSARG) wishes HKSCS-2001 to be implemented on GNU/Linux too, and actively assists the community by providing guidance and advice
● Excellent official website, open standard (starts from http://www.digital21.gov.hk/eng/hkscs/)


Sample HKSCS Chinese Text (香港中文字範例)

● 大家好!你同我一齊玩! (Hello everyone! Come and play with me!)
● 李、仔、魚涌、深水
● 大廈 /有啊!
● ( ……仲好似有五個粗口字 ) (...and apparently five swear words as well) Hehe...
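One quick way to see which charsets can carry a piece of Hong Kong Chinese text is to try each codec in turn. The sketch below uses Python's built-in codec names; the helper `encodable_in` and the codec list are choices made for this illustration, and the phrase is taken from the sample text on this slide.

```python
CODECS = ("gb2312", "gbk", "gb18030", "big5", "big5hkscs", "utf-8")


def encodable_in(text: str) -> list:
    """Return the codecs from CODECS that can represent every character of text."""
    usable = []
    for codec in CODECS:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue
        usable.append(codec)
    return usable


# Traditional Chinese text: fine in Big5 and its supersets, not in GB 2312.
print(encodable_in("你同我一齊玩"))
```

Characters that exist only in HKSCS would drop `big5` from the result while keeping `big5hkscs`; trying your own text is the easiest way to find them.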


GB 18030-2000

● GB 18030-2000 Standard
  – Full title: 《信息技術 信息交換用漢字編碼字符集 基本集的擴充》 (Information technology: Chinese ideogram coded character set for information interchange, extension for the basic set) (2000-03-17, 2000-11-30)
● Rationale for a new standard: the 70,207+ unified Han ideographs in Unicode 3.1 won't all fit in the 2-byte codespace of the GBK specification
● Further extends GBK to add a 4-byte codespace
  – More than enough to cover U+0000 to U+10FFFF
  – Compatible with all future versions of ISO 10646
  – Backward compatible with GB2312 and GBK


GB 18030-2000

● Why is GB18030 significant?
  – It solves a pressing issue in China. Finally, all people's names, geographic names, and ancient text can be properly processed
  – It is mandatory: all operating systems sold after 2001-08-31 must support GB18030
  – Products must pass GB18030 certification to ensure proper input, editing, screen display, and printing of GB18030 text
  – Thiz Linux Desktop was awarded A+ Grade in GB18030 Certification Test!


GB 18030-2000

● 1-byte = ISO 646-IRV (US-ASCII)
  – {0x00-0x7F}
● 2-byte =~ GBK
  – {0x81-0xFE}{0x40-0x7E, 0x80-0xFE}
● 4-byte
  – Mapped linearly with Unicode while skipping all existing mappings
  – Can be calculated algorithmically
  – {0x81-0xFE}{0x30-0x39}{0x81-0xFE}{0x30-0x39}
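"Can be calculated algorithmically" is easiest to see for the supplementary planes, where U+10000 maps to the four bytes 90 30 81 30 and every following code point takes the next four-byte position. The sketch below covers only that range (the four-byte codes for BMP characters also depend on which characters already have one- or two-byte codes, so they need a range table); the function name is made up for this example, and the result is compared against Python's built-in `gb18030` codec.

```python
def gb18030_supplementary(code_point: int) -> bytes:
    """Four-byte GB18030 code for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("this sketch only handles U+10000..U+10FFFF")
    index = code_point - 0x10000
    # The four bytes form a mixed-radix number: 10 values (0x30-0x39) for
    # bytes 2 and 4, 126 values (0x81-0xFE) for byte 3.
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


for cp in (0x10000, 0x20000, 0x2A6D6):
    ours = gb18030_supplementary(cp)
    print(f"U+{cp:X} -> {ours.hex()}  matches codec: {ours == chr(cp).encode('gb18030')}")
```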


GB 18030-2000

● Official information is hard to find
  – Hard to obtain the printed version of the GB18030 standard outside China
● Fortunately, many early implementers and charset experts have provided info:
  – Dirk Meyer (Adobe) translated the summary
  – Markus Scherer (IBM, Unicode Consortium) provides the gb-18030-2000.xml conversion table
  – Much effort and interest from others, including ThizLinux Laboratory


UnicodeData.txt, Unihan.txt

● UnicodeData.txt
  – Important information on the character repertoires and control codes in Unicode
● Unihan.txt
  – Valuable information (attributes) on over 70,000 CJK Unified Ideographs
    ● Source
    ● Pronunciations in CJKV (+ Cantonese and Mandarin)
    ● Meaning
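For a feel of what these data files contain, Python's standard `unicodedata` module exposes character properties from the Unicode Character Database (names, general categories, and so on). It does not include the Unihan readings or meanings, which would have to be read from Unihan.txt separately. The helper `describe` is a name chosen for this sketch.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point label, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in "A中齊":
    print(describe(ch))
```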


Difficulties in implementing HKSCS and GB18030

● HKSCS-2001
  – CJK Extension B etc. (U+20000 – U+2FFFF), but not all programs support code points beyond U+FFFF yet
  – Lack of fonts
● GB18030
  – Huge! 4-byte
  – Certification
  – Fonts available, but expensive (TrueType or bitmap)
● Both are Unicode solutions, so as Unicode support improves, so will HKSCS and GB18030 support


Other Chinese encoding standards

● CCCII (Chinese Character Code for Information Interchange)
  – http://public.ptl.edu.tw/publish/suyan/42/text_07.htm
● CNS 11643
● Big-5+, Big-5E
● Encodings based on Cangjie (倉頡)
● And many more


GNU/Linux and *BSD Chinese localization teams

● CLE (Chinese GNU/Linux Extension)
  – A group of pioneering volunteers originally led by Platin (小虫)
● Debian Chinese Project (Debian 中文計劃)
● FreeBSD Chinese localization team (FreeBSD 中文化小組)
● Translation teams in mainland China, Hong Kong, and Taiwan
● Many more CJKV and i18n/L10n teams worldwide, with both Chinese and non-Chinese members!


Major Chinese GNU/Linux Distributions

● Major Chinese GNU/Linux distributions
  – Thiz Linux Desktop 6.0 (即時 Linux 桌面環境 6.0)
  – Turbolinux 7.0 Chinese edition
  – Chinese 2000 (中文 2000)
  – Xteam (沖浪), Red Flag (紅旗), COSIX (中軟), Happy (幸福), Linpus (百資), XLinux (網虎)
● Well-known international GNU/Linux distributions with Chinese support
  – Debian GNU/Linux, Red Hat Linux, Linux Mandrake, (SuSE, Slackware), FreeBSD


GNU C Library (GLIBC)

● Libc5
● Glibc 2.1
● Glibc 2.2
● Conversion tables
  – Big5 (CLE), GBK (Justin Yu, Sean Chen)
  – big5hkscs.c (Roger So, Ulrich Drepper, ThizLinux, James Su)
  – GB18030 (Wu Jian, Ulrich, ThizLinux, James Su, another version by Yu Shao)
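What a conversion table does in practice is map bytes in one charset to Unicode and back out to another charset. Glibc offers this through its iconv facility; the sketch below does not call glibc at all and uses Python's own built-in codecs instead, purely to illustrate the round trip. The helper name `transcode` is invented for this example.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert encoded bytes from one charset to another, pivoting through Unicode."""
    return data.decode(source).encode(target)


big5_bytes = "中文".encode("big5")
utf8_bytes = transcode(big5_bytes, "big5", "utf-8")
print(big5_bytes.hex(), "->", utf8_bytes.hex())

# Going back through the table recovers the original bytes.
print(transcode(utf8_bytes, "utf-8", "big5") == big5_bytes)
```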


XFree86 / X Window System (X 視窗系統)

● XLFD, fontset
● Xrender / Xft (Keith Packard)
● X-TT, “freetype” module
● Addition of Big5-HKSCS encodings (Roger So)
● Addition of GB18030 encoding (James Su et al.)


GTK+ and GNOME

● GNOME 1.x
  – Charset handling based on Glibc and XFree86
  – Good, but not perfect
● GNOME 2.0 (in development)
  – Pango
  – Xft


Qt 3.0.4 and KDE 3.0.1

● Qt comes with its own “codecs” in order to be a multiplatform toolkit.
  – Somewhat tedious... the tables already created for Glibc must be re-created for Qt
    ● We cannot directly reuse Glibc's code because of licensing issues... No big deal, just extra effort.
  – Good Unicode support; handles everything with Unicode internally.
  – Currently only supports UCS-2, which poses challenges for HKSCS-2001


Chinese Input Method Servers (中文輸入平台)

● XCIN
● Chinput
  – miniChinput
  – magicChinput
● 楊春白雪
● MyIM


Chinese Input Methods (中文輸入法)

● Cangjie (倉頡)
● Array 30 (行列 30)
● Dayi (大易)
● Wubi (五筆字型)
● Smart ABC (智能 ABC), Smart Pinyin (智能拼音)
● Mixed (混合)
● Many others


Chinese fonts (中文字型)

● Arphic (文鼎)
  – AR PL Mingti2L Big5
  – AR PL SungtiL GB
  – AR PL KaitiM Big5
  – AR PL KaitiM GB
● DynaLab (華康)
● Founder (方正)
● Ten GNU GPL Chinese fonts by 王漢忠
  – ...unfortunately the format is not very suitable


Web Browsers

● Netscape 4.79
● Mozilla 0.9.9
  – Dillo, Galeon, etc.
● Konqueror


CJK LaTeX and FreeType

● CJK LaTeX, written by Werner Lemberg from Germany
  – Yes, Werner can speak Chinese too! Amazing!
● FreeType 1.3.1 and FreeType 2.0.9:
  – TrueType (and Type1, BDF etc.) font library
  – Main authors: David Turner, Robert Wilhelm, Werner Lemberg


PostScript and PDF

● Ghostscript + CJK (GS-CJK)
● Adobe's CMaps (HKscs, GBK2K, etc.)
● Acrobat Reader 4.05 for Linux does not come with the CMaps (HKscs and GBK2K) that are already in Acrobat Reader 5.0
● Ghostscript and XPDF are constantly improving


Office Suites

● OpenOffice.org family (Thiz Office, Kai Office, Red Office)
  – Chinese support improving, a joint effort
  – Excellent i18n/L10n support for all languages
● HancomOffice
  – Will be based on Qt 3
  – qbig5hkscscodec.cpp for Qt2 provided by ThizLinux Laboratory; Hancom ported the code to Qt3
● Lightweight: AbiWord and Gnumeric
  – Quite good too!


How to participate in i18n efforts (如何參與 GNU/Linux 中文化)

● Improve existing infrastructure
● Work on new areas
● Help with localization and translation efforts
● Join a project that you like, whether it is Chinese i18n/L10n related or not
● Help spread the word! :-)


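PO Translation (PO 翻譯)

● GNOME 2.0
● KDE 3.0
● GNU Utilities
● Gettext tools
● PO / MO formats
● Usage, encoding issues
● Better to leave a string untranslated than to mistranslate it (寧可不譯,不可誤譯)
● 「非化名的字型」 (“non-alias font”) (平滑字型、反鋸齒字型: “smooth font”, “anti-aliased font”)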


Reference websites (參考網站)

– http://cle.linux.org.tw/
– http://xcin.linux.org.tw/
– http://www.debian.org.hk/intl/zh/
– http://linuxfab.cx/
– http://www.linuxforum.net/
– http://www.unicode.org/
– 朱邦復先生工作室 (Mr. Chu Bong-Foo's studio) http://www.cflabs.com/
– http://www.google.com/


TODO (待辦工作)

● Some programs still need to be revised in order to conform to the i18n/L10n infrastructure
● Always room for improvement in terms of ease of use, completeness, and stability
● More participants are welcome


Future Developments and Opportunities (未來發展)

● Handwriting pad (手寫板)
● Voice recognition (語音識別)
● Smarter Cantonese input methods?
● IIIMF to replace XIM?
● OpenType to replace TrueType?
● More interesting Chinese-language research based on GNU/Linux systems?


Comments and Suggestions

● All skills are useful, even if you are not in CS, CE or EE!
  – Mathematics, Physics theory
    ● e.g. the author of XCIN studied Physics...
  – C, C++, Perl, Python, GTK, Qt
  – Linguistics (語言學), Phonetics (語音學)
    ● IPA, Jyutping, Japanese, Korean...
● What we can learn during the process
  – Skills development, learning English, learning other new languages, meeting friends, and many more!


Any questions are welcome! Questions? :-)