
Upload: christopher-allen

Post on 24-Dec-2015


Page 1: Strategies for Developing Non-English Websites Elizabeth J. Pyatt Instructional Designer ejp10@psu.edu Education Technology Services


Page 2: Supporting Multiple Languages

• Unpopular Language Support (Easy):
    • All English alphabet, all the time.
    • “Escribes vous Russki (Russian)? No”
• Preferred Language Support (Harder):
    • Display native scripts and punctuation
    • Display appropriate punctuation/symbols
    • «¿Escribes vous Русский? ¡Sí!»

Page 3: Script versus Language

• Arabic script is used for Arabic, Ottoman Turkish, Persian (Farsi), etc.
• Cyrillic script is used for Russian, Ukrainian, Uzbek, Bulgarian, etc.
• Serbo-Croatian (1 language)
    • Cyrillic text = “Serbian”
    • Roman (English alphabet) text = “Croatian”
• Hindi-Urdu (also 1 language)
    • Hindi = Devanagari script / Urdu = Arabic script

Page 4: Language of Scripts

• i18n = internationalization
• Roman/Latin alphabet = English alphabet
• Cyrillic = Russian
• RTL = Right to Left (e.g. Arabic/Hebrew)
• CJK = Chinese-Japanese-Korean
    • Chinese has the largest character count
• South Asian = Scripts of India (many)

Page 5: Taxonomy of scripts

C = Consonant; V = Vowel

• Alphabet - 1 letter = 1 vowel or consonant
    • Roman, Cyrillic, Greek, Runes, Georgian, Armenian, etc.
    • Typing - map single letters to character
• Syllabary - 1 character = 1 CV syllable
    • Japanese, Cherokee, Ethiopic, Sumerian
    • Typing - map CV sequence into character (e.g. Japanese Katakana na-wa = ナワ)
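The syllabary typing idea (one CV sequence maps to one character) can be sketched in Python. The small `KATAKANA` table and the `to_katakana` helper are illustrative assumptions, not part of the slides; a real input method covers the whole syllabary and does its own syllable splitting.

```python
# Hypothetical lookup table: romanized CV syllable -> Katakana character.
KATAKANA = {"na": "ナ", "wa": "ワ", "ka": "カ", "ta": "タ"}

def to_katakana(romaji):
    """Convert hyphen-separated CV syllables (e.g. 'na-wa') to Katakana."""
    syllables = romaji.split("-")
    return "".join(KATAKANA[s] for s in syllables)

print(to_katakana("na-wa"))  # prints ナワ
```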

Page 6: Taxonomy of scripts (continued)

C = Consonant; V = Vowel

• Ideographic (Chinese) - 1 character / 1 meaning
    • Symbols combined to make compounds
    • Typing - map CV sequence to list of possible characters
    • Ideographic scripts can have syllabary component
• Consonantal Syllabary - letters are consonants; vowels are diacritics on C’s
    • Korean, Thai, languages of India, Cree, etc.
    • Typing uses CV sequences. Fonts must alter characters depending on surrounding sounds
    • E.g. Susi = suis

Page 7: Scripts & Encoding

• ASCII - assign a number to a character
    • Excel formula =CHAR(65) results in “A”
• Modern encoding expands the repertoire beyond ASCII, but with inconsistent implementations for different platforms/scripts
• Know the encoding for your script/language. Needed for debugging.
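The same number-to-character idea can be checked outside Excel. The following Python sketch uses the built-in `chr`, `ord`, and `str.encode`; the `encoded_bytes` helper is a hypothetical name introduced only for this illustration.

```python
# ASCII assigns a number to each character; chr() and ord() expose that mapping.
assert chr(65) == "A"
assert ord("A") == 65

def encoded_bytes(char, encodings=("ascii", "iso-8859-1", "utf-8")):
    """Return {encoding: hex string, or None if the character is unsupported}."""
    result = {}
    for name in encodings:
        try:
            result[name] = char.encode(name).hex()
        except UnicodeEncodeError:
            result[name] = None  # character is outside this encoding's repertoire
    return result

print(encoded_bytes("A"))  # same single byte (41) in all three encodings
print(encoded_bytes("ñ"))  # not in ASCII; f1 in Latin 1; c3b1 in UTF-8
```

One character can therefore have different byte values (or none) depending on the encoding, which is why knowing the encoding matters when debugging.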

Page 8: Some Notable Encodings

• Latin 1 (ISO-8859-1)
    • English, most W. Europe, Africa, Pacific Is., Nat. American
• Latin 2 (ISO-8859-2) (Latin 3/Latin 4…)
    • Central Europe (Hungarian, Polish, Czech)
• Big5 (Chinese only), Shift-JIS (Japanese only), etc.
• “ISO” vs. “Windows” Parallel Encodings (e.g. Hebrew)
    • ISO-8859-8 (Visual Hebrew)
    • Windows-1255 (Windows Hebrew) (also MacHebrew)
    • Parallel ISO/Windows for many scripts (Arabic, Cyrillic, etc.)
• Unicode (Super Encoding, all scripts)
    • “Exotic Latin Alphabet” - Welsh, Hawaiian, Old Irish etc.
    • Also Chinese, Japanese, Cyrillic, Arabic, Hebrew, Greek…
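To see what "parallel" ISO and Windows encodings mean in practice, this Python sketch encodes one Cyrillic letter three ways. It picks Cyrillic rather than the Hebrew example on the slide, and the `byte_values` helper is a hypothetical name; the codec names are the ones Python's standard library accepts.

```python
def byte_values(char, encodings):
    """Return {encoding: hex bytes} for a single character."""
    return {name: char.encode(name).hex() for name in encodings}

cyrillic_d = "\u0414"  # Д, Cyrillic capital "D"
table = byte_values(cyrillic_d, ["iso-8859-5", "windows-1251", "utf-8"])
for name, value in table.items():
    print(f"{name:14} {value}")
```

The ISO and Windows encodings both store Д in one byte, but at different positions, so text saved in one and read as the other turns into the wrong letters.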

Page 9: Now What do I do?

• Step 1 - Select target languages (don’t forget English)
• Step 2 - Determine which encoding supports language.
• Step 3 - Develop properly encoded page. Aim for Unicode (even English).
• Step 4 - Declare encoding & language in HTML Meta tags
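Step 2 can be approximated by trial: attempt to encode a sample of the target text in each candidate encoding and keep the ones that succeed. This is a minimal sketch; the `CANDIDATES` list and the `supporting_encodings` helper are assumptions made for the example, and a successful encode only shows that the characters exist in the encoding, not that browsers or fonts handle it well.

```python
# Hypothetical shortlist of encodings to try, in order of preference.
CANDIDATES = ["ascii", "iso-8859-1", "iso-8859-2",
              "iso-8859-5", "windows-1251", "utf-8"]

def supporting_encodings(sample, candidates=CANDIDATES):
    """Return the candidate encodings that can represent every character in sample."""
    supported = []
    for name in candidates:
        try:
            sample.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this encoding
        supported.append(name)
    return supported

print(supporting_encodings("José"))
print(supporting_encodings("Русский"))
```

UTF-8 appears in every result, which is the practical reason to aim for Unicode even on an English page.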

Page 10: How do I get properly encoded text?

• Latin 1 (English, Spanish, French, German)
    • Use entity codes (e.g. &ntilde; for ñ)
    • Declare encoding
• Major World Language
    • Set up keyboards
    • Type in text editor/HTML editor
    • Declare encoding & language
• Undersupported Language
    • Get correct fonts/keyboards or “PDF it”.

Page 11: Character Codes (Latin 1 Langs)

• Applies to “Western European” languages only
• Always use for backwards compatibility
• Some examples:
    • Accent codes - e.g. &ntilde; = ñ
    • Punctuation - e.g. &copy; = ©
    • Old Math - e.g. &deg; = °
    • New Math (recent browsers only)
      &Sigma; = Σ   &sigma; = σ   &int; = ∫   &ne; = ≠
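Entity codes can be generated rather than typed by hand. This Python sketch relies on the standard library table `html.entities.codepoint2name`; the `to_entities` helper is a hypothetical name, and it deliberately leaves ASCII untouched (so `&`, `<` and `>` are not escaped here).

```python
from html.entities import codepoint2name

def to_entities(text):
    """Replace non-ASCII characters with named entities, or numeric codes if unnamed."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. ñ -> &ntilde;
        else:
            out.append(f"&#{cp};")                 # no named entity: numeric code
    return "".join(out)

print(to_entities("José Espiño"))  # prints Jos&eacute; Espi&ntilde;o
```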

Page 12: Encoding & Language Tags

• Set encoding in header
    • Latin 1
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    • Unicode
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    • Shift_JIS (Japanese)
      <meta http-equiv="Content-Type" content="text/html; charset=shift_jis">
• Declare Page Language (ISO-639 code)
    • English-U.S.
      <html lang="en-us">
    • Spanish/French/German/Japanese Document
      <html lang="es">
      fr = French, de = German, zh = Chinese, ja = Japanese, etc.
    • Spanish P (or any HTML text tag)
      <p lang="es">
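A declared charset only helps if the file's bytes really are in that encoding. The Python sketch below builds a minimal page from the tags on this slide and encodes it with the same charset it declares; `build_page` and its parameters are hypothetical names for illustration, not a complete or validated HTML template.

```python
def build_page(body_html, charset="utf-8", lang="en-us"):
    """Return page bytes whose actual encoding matches the declared charset."""
    page = (
        f'<html lang="{lang}">\n'
        "<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "</head>\n"
        f"<body>{body_html}</body>\n"
        "</html>\n"
    )
    # Raises UnicodeEncodeError if the text does not fit the declared charset.
    return page.encode(charset)

latin1_page = build_page('<p lang="es">José Espiño</p>', charset="iso-8859-1")
unicode_page = build_page('<p lang="ru">Русский</p>', charset="utf-8")
```

Asking for `charset="iso-8859-1"` with the Russian paragraph would raise an error instead of silently producing a mislabeled page.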

Page 13: Challenge Set 1

• How do you insert the name José Espiño into HTML?
• How do you declare the language Spanish? (multiple options)
• What encoding is needed? (assume English page with Spanish word)

Page 14: Stray Unicode Characters

• You can hard-code a four-digit Unicode numeric code to force a character to appear. E.g. Cyrillic “D” Д = &#1044; or &#x0414; (hex)
• Best used for small spans of text or “exotic” Latin characters
• If you use the hex version, add the “x” prefix and add a leading zero (to make 4 digits total)
• Set encoding to “utf-8” with meta-tag
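Both forms of the numeric code can be derived from the character itself. In this Python sketch, `numeric_refs` is a hypothetical helper; `html.unescape` from the standard library is used only to confirm that the two codes decode back to the same character.

```python
import html

def numeric_refs(char):
    """Return the (decimal, hex) numeric codes for one character."""
    cp = ord(char)
    # :04X pads the hex form with leading zeros to at least four digits.
    return f"&#{cp};", f"&#x{cp:04X};"

decimal_ref, hex_ref = numeric_refs("Д")
print(decimal_ref, hex_ref)  # prints &#1044; &#x0414;
assert html.unescape(decimal_ref) == html.unescape(hex_ref) == "Д"
```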

Page 15: Challenge 2

• How do you insert the phrase ¿Escribes vous Русский? ¡Sí! into HTML? (Note: 1st letter capital in Cyrillic)
• How do you declare the page to be Unicode?

Page 16: Setting Up Keyboards for Other Scripts

• Activate required keyboards from Control Panel or System Preferences (OS X)
• You may need to install language utilities for East Asian and other unusual scripts from the System Disk
• Quick Demo

Page 17: Typing with Encoded Fonts

• Keyboarding utilities which match the “keys” to the right encoded number must be installed.
• Keyboards can arrange one encoding in several layouts
    • QWERTY (AKA “transliterated/phonetic”)
        • Preferred by U.S. students
    • Native layout (native script typewriters)
        • Preferred by native speakers (e.g. instructors)
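A keyboard layout is essentially a table from physical keys to encoded characters, so two layouts can produce identical text from different keystrokes. The Python sketch below shows this for two Russian letters; the `PHONETIC` and `NATIVE` tables are tiny hypothetical excerpts (phonetic layouts in particular vary by vendor), and `type_keys` is an invented helper.

```python
# Hypothetical two-key excerpts of a phonetic and a native Russian layout.
PHONETIC = {"d": "д", "a": "а"}   # key chosen by sound
NATIVE = {"l": "д", "f": "а"}     # key position on a native typewriter layout

def type_keys(keys, layout):
    """Map a sequence of QWERTY key presses to characters using a layout table."""
    return "".join(layout.get(key, key) for key in keys)

phonetic_text = type_keys("da", PHONETIC)
native_text = type_keys("lf", NATIVE)

# Different keystrokes, same encoded characters.
assert phonetic_text == native_text == "да"
print([hex(ord(ch)) for ch in phonetic_text])  # prints ['0x434', '0x430']
```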

Page 18: Dreamweaver/Front Page: Options for Inputting Text

A. Switch keyboard (editor may add meta tag)
B. Type
C. Or cut and paste encoded text
D. Or import from international text editors via Save As HTML
    • Global Writer (Windows)
    • Simple Text (free from Apple)
    • Others for specific scripts
    • Avoid import from Word

Mini Demo 2

Page 19: Challenge 3 (Research)

• What encodings can I use for Russian?
    http://ourworld.compuserve.com/homepages/PaulGor/
    http://www.brama.com/compute/encode.html
• How about Modern Greek vs. Ancient Greek?
    http://www.hri.org/fonts/
    http://www.stoa.org/unicode/quickstart.html

Page 20: Undersupported Scripts: Ultimate Challenge

• “Undersupported” = minority languages, ancient/medieval, small populations
• Third-party utilities may be needed
    • Unicode font (TrueType .ttf format)
    • Keyboard utility (if you can get it)
    • Print font for PDFs (the last resort)
• Test, Test, Test (esp. Mac vs. Win)

Page 21: Print Font vs. Web Font

Print Font
1. Replaces ASCII characters with random characters
2. Both parties must have same font to read document correctly
3. Ideal for print/PDF documents when no data transmission occurs
4. E.g. Symbol, Webdings

Web Font
1. Complies with some encoding (e.g. ASCII)
2. Alternative fonts with same encoding can be used (e.g. Times or Arial)
3. Ideal for Web transmission, still difficult for typing purposes
4. E.g. Arial Unicode, Lucida Sans Unicode, Lucida Grande, TITUS Cyberbit (free), etc.

Page 22: When Websites Show Gibberish

• Problem: No encoding specified (see gibberish)
    • Go to View menu and manually switch encoding
• Problem: No HTML entity codes for accents (see gibberish for accented letters)
    • Try switching View to Latin 1, Windows-1252, MacRoman, UTF-8 (Unicode)
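The gibberish comes from reading bytes with the wrong encoding, and switching the View encoding amounts to trying another decoder. This Python sketch reproduces the effect; `try_encodings` is a hypothetical helper, and the candidate list mirrors the encodings named on this slide using Python's codec names.

```python
original = "José Espiño"
raw = original.encode("utf-8")      # bytes as stored on the server

# Reading UTF-8 bytes as Latin 1 turns each accented letter into two characters.
wrong = raw.decode("iso-8859-1")
print(wrong)  # prints JosÃ© EspiÃ±o

def try_encodings(data, candidates=("iso-8859-1", "windows-1252", "mac_roman", "utf-8")):
    """Decode the same bytes under each candidate, like switching the View menu."""
    return {name: data.decode(name, errors="replace") for name in candidates}

for name, text in try_encodings(raw).items():
    print(f"{name:14} {text}")
```

Only the decoder that matches how the page was saved shows the accents correctly, which is why declaring the encoding in the page avoids the guesswork.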

Page 23: ANGEL & Other Web Tools

1. Activate keyboards for needed scripts
   See http://tlt.psu.edu/suggestions/international/keyboards
2. Open Netscape 7/Mozilla
3. Go to ANGEL or other Web tool
4. Switch keyboards
5. Type!
6. Users can view in Netscape 7/Mozilla, IE5+ (Win) or Safari (OS X)

Page 24: Where to Find Out More

¡Escribez Русский!

• Penn State Computing with Accents
    http://tlt.psu.edu/suggestions/international
• Titus Cyberbit Unicode Font (free)
    http://titus.uni-frankfurt.de/indexe.htm
    Look under “Instrumentalia”