constructing linguistic phylogenetic tree 语言谱系树的构建 chenxi shao zihe li 邵晨曦...


Post on 12-Jan-2016

TRANSCRIPT

Page 1:

Constructing Linguistic Phylogenetic Tree
语言谱系树的构建

Chenxi Shao, Zihe Li
邵晨曦 李子鹤

Page 2:

Tree representation of evolution

Darwin 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life.

Tree diagram: shared, derived characteristics

Page 3:

Phylogenetic tree

Zuckerkandl, E. and L. Pauling 1965. Molecules as documents of evolutionary history. J. Theor. Biol. 8: 357-366.

Comparison of DNA sequences

Page 4:

R. Lewin 1996

Page 5:

Linguistic family tree

Page 6:

From Science, Feb. 27, 2004

Page 7:

Quantitative representation

Meyers, L. F. and William S-Y. Wang 1963. Tree representations in Linguistics. Project on Linguistics Analysis Report 3, Ohio State University.

Page 8:

How to construct a linguistic phylogenetic tree?

Wang, William S.-Y. and Zhongwei Shen 1992: four steps

1. Selection of characters
2. Quantization of characters
3. Calculation of correlation coefficients
4. Clustering analysis

Steps 1 and 2: selection and encoding of linguistic information

Steps 3 and 4: mathematical algorithm

Page 9:

Mathematical algorithm

1. Maximum parsimony (最大俭省算法)
Example: subgrouping of Bai dialects (Wang 2006)

2. Neighbour-joining (邻接法)
Example: subgrouping of Yi dialects (Wang 汪锋 2010)

Page 10:

3. Average linkage (平均联结法, the UPGMA method)
Example: affinity among Chinese dialects (Cheng, C.C. 郑锦全 1988)

4. Minimum spanning (最短系连法, the Wrocław taxonomy method)
Example: affinity among Chinese dialects (Ma 马希文 1989)
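As a rough illustration of average linkage (item 3), the sketch below runs UPGMA on a three-variety distance table. The distances are taken as 100 minus the retention percentages given for Wuming, Longzhou and Buyi in the Chen and He (2002) example on Page 16; that conversion, the function name `upgma`, and the nested-tuple output are assumptions made for illustration, not the procedure of Cheng (1988).

```python
from itertools import combinations

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns the root cluster as a nested tuple (left, right, height).
    """
    # Each active cluster maps to the number of leaves it contains.
    size = {name: 1 for name in names}
    d = dict(dist)
    while len(size) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b, d[frozenset((a, b))] / 2)
        # Distance from the merged cluster to any other cluster is the
        # size-weighted mean of the two old distances.
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

# Assumed distances: 100 minus the retention percentage from Page 16.
names = ["Wuming", "Longzhou", "Buyi"]
dist = {
    frozenset(("Wuming", "Longzhou")): 14,
    frozenset(("Wuming", "Buyi")): 10,
    frozenset(("Longzhou", "Buyi")): 22,
}
tree = upgma(names, dist)
print(tree)  # ('Longzhou', ('Wuming', 'Buyi', 5.0), 9.0)
```

UPGMA always returns a rooted tree with equal root-to-leaf heights, which amounts to assuming a uniform rate of change across varieties.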

Page 11:

Selection and encoding of linguistic information

1. Unique, shared innovation characters

Classical version: the position of Armenian (Hübschmann 1875)

Skt. gh  =  Av. g, gh,  Arm. g,  Balto-Slavic g
   |           |           |
   h         j, zh       g, ž

Page 12:

Modern version: subgrouping Bai dialects (Wang 2006)

19 innovation characters generalized from reconstruction.

Convert innovations to 1s and preservations to 0s.

Character            TL  GX  EQ  EG  JM  JX  DS  ZC  MZL
Split of tone *1a    0   0   0   0   0   0   1   1   0
*Pr –Tʂu             1   1   1   1   1   0   0   0   0
-i as plural vowel   1   0   1   1   1   0   0   0   0
green1 [qing1 青]    1   0   0   1   0   0   0   0   0
…                    …   …   …   …   …   …   …   …   …
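To make the 0/1 encoding concrete, here is a minimal sketch that scores a candidate tree by Fitch parsimony, using only the four characters visible in the table. The two tree topologies are hypothetical and exist only to exercise the function; they are not results from Wang (2006). The sketch scores a given topology and does not search for the best one, which is the job of a program such as Penny.

```python
# The four characters shown in the table (1 = innovation, 0 = preservation).
DIALECTS = ["TL", "GX", "EQ", "EG", "JM", "JX", "DS", "ZC", "MZL"]
CHARACTERS = {
    "split of tone *1a":  [0, 0, 0, 0, 0, 0, 1, 1, 0],
    "*Pr –Tʂu":           [1, 1, 1, 1, 1, 0, 0, 0, 0],
    "-i as plural vowel": [1, 0, 1, 1, 1, 0, 0, 0, 0],
    "green1 [qing1]":     [1, 0, 0, 1, 0, 0, 0, 0, 0],
}

def fitch(tree, states):
    """Return (possible_states, changes) for one character.

    tree:   a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping leaf name -> 0 or 1.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change.
        return common, left_cost + right_cost
    # The children disagree: one change is needed on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree):
    """Total number of changes over all characters (lower is better)."""
    total = 0
    for values in CHARACTERS.values():
        states = dict(zip(DIALECTS, values))
        total += fitch(tree, states)[1]
    return total

# HYPOTHETICAL topologies, invented for illustration only.
tree_a = (((("TL", "EG"), ("EQ", "JM")), "GX"), (("DS", "ZC"), ("JX", "MZL")))
tree_b = (((("TL", "GX"), ("EQ", "JM")), "EG"), (("DS", "ZC"), ("JX", "MZL")))

print(parsimony_score(tree_a))  # 4: each character changes exactly once
print(parsimony_score(tree_b))  # 6
```

Fitch counting treats 0 to 1 and 1 to 0 as equally costly; a method that forbids reversals of innovations would need a different cost rule.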

Page 13:

Penny in PHYLIP

Rooted tree

Page 14:

Problems:

In many cases, deciding what counts as an "innovation" depends on first assigning phonetic values to the historical phonemes.

For many sound changes, there is no consensus on their universality.

Page 15:

2. Retention rate of Swadesh-100 words (Wang, William S-Y. 1993)

Strict correspondence among languages

Calculate the proportion of Swadesh-100 words that satisfy strict correspondence rules between each pair of languages.

Construct a matrix of affinity/distance between each pair of languages.

Page 16:

Example: subgrouping of Austro-Yue languages (Chen and He 2002)

            Wuming  Longzhou  Buyi  …
Wuming      100     86        90    …
Longzhou    86      100       78    …
Buyi        90      78        100   …
…           …       …         …     100
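The step from a retention table to a distance file can be sketched as follows. The transform d = 1 - s/100 is an assumption (the slides do not state which conversion was used), and the function names are illustrative. The output follows the square distance-matrix layout that PHYLIP's distance programs read: a first line with the number of taxa, then one line per taxon with the name padded to 10 columns.

```python
names = ["Wuming", "Longzhou", "Buyi"]
retention = [  # shared Swadesh-100 percentages from the table above
    [100, 86, 90],
    [86, 100, 78],
    [90, 78, 100],
]

def to_distance(percent_matrix):
    """Convert retention percentages to distances (assumed: d = 1 - s/100)."""
    return [[1 - s / 100 for s in row] for row in percent_matrix]

def phylip_distance_text(names, dist):
    """Format a square distance matrix in PHYLIP's distance-file layout."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, dist):
        # Classic PHYLIP reads the taxon name from the first 10 columns.
        label = name[:10].ljust(10)
        lines.append(label + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines)

distance = to_distance(retention)
text = phylip_distance_text(names, distance)
print(text)
```

Saving `text` to a file gives input in the form that a distance-based program such as Neighbor expects; only the three varieties shown on the slide are included here.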

Page 17:

Neighbor in PHYLIP

Unrooted tree

Page 18:

Problems:

Much debate on the Swadesh-100 words:
   universality
   choice of words
   uniformity in the rate of change

Page 19:

Inspiration from biological study

Phylogenetic classifications of different organisms employ different gene segments:

Prokaryotes: 16S rDNA (王洪媛, 江晓路 et al. 2004)

Some mammals: 18S rDNA (刘诚刚, 杜志恒 et al. 2012)

Some fishes: CRY61 (孙婷, 刘伟 et al. 2012)

Page 20:

Difference in homonymic relationship

Scope of examination: morphemes that can be reconstructed to the proto-language without doubt.

Compare the differences in homonymic relationships among those morphemes between each pair of languages.

Page 21:

Reflexes of each morpheme in each language:

Morpheme \ Language   A   B   C   Proto-L
X                     u   o   a   *a
Y                     u   o   o   *o
Z                     u   u   u   *u

Difference in homonymic relationship for each pair of morphemes (rows) and each pair of languages (columns):

Rel. between M's \ Rel. between L's   A-B                B-C             A-C
X-Y                                   0 (shared innov.)  1 (innov. B)    1 (innov. A)
Y-Z                                   1 (innov. A)       0 (no innov.)   1 (innov. A)
X-Z                                   1 (innov. A)       0 (no innov.)   1 (innov. A)
D value                               2/3                1/3             3/3

L = language, M = morpheme, innov. = innovation, Rel. = relationship, D = difference in homonymic relationships among morphemes between each pair of languages.
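The D values can be reproduced with a short sketch. It assumes, as the tables suggest, that two morphemes stand in a homonymic relationship in a language when their reflexes are identical, and that D is the share of morpheme pairs whose homonymy status differs between the two languages. The names `homonymy_pattern` and `d_value` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

MORPHEMES = ["X", "Y", "Z"]
REFLEXES = {  # reflex of each morpheme in each language, from the slide
    "A": {"X": "u", "Y": "u", "Z": "u"},
    "B": {"X": "o", "Y": "o", "Z": "u"},
    "C": {"X": "a", "Y": "o", "Z": "u"},
}

def homonymy_pattern(language):
    """For every morpheme pair, True if the two reflexes are homophonous."""
    forms = REFLEXES[language]
    return [forms[m1] == forms[m2] for m1, m2 in combinations(MORPHEMES, 2)]

def d_value(lang1, lang2):
    """Share of morpheme pairs whose homonymy status differs."""
    p1, p2 = homonymy_pattern(lang1), homonymy_pattern(lang2)
    differing = sum(a != b for a, b in zip(p1, p2))
    return Fraction(differing, len(p1))

for pair in combinations(REFLEXES, 2):
    print(pair, d_value(*pair))
# ('A', 'B') 2/3
# ('A', 'C') 1
# ('B', 'C') 1/3
```

Only identity of reflexes within a language matters here, so the specific sound values never enter the calculation.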

Page 22:

Distance matrix

Principle: difference in homonymic relationships measures unshared innovation.

The difference in homonymic relationships among morphemes between each pair of languages gives the distance between the two languages.

     A      B      C
A    0      0.667  1
B    0.667  0      0.333
C    1      0.333  0
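Since the D values are later fed to neighbor-joining, a compact sketch of that algorithm (Saitou and Nei 1987) applied to this matrix may help. It is a plain reimplementation under stated assumptions, not the PHYLIP Neighbor program: the function name, the `nodeN` labels and the edge-list output are illustrative, and the exact fractions 2/3 and 1/3 are used in place of the rounded 0.667 and 0.333.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance table.

    names: list of taxon labels (at least three).
    dist:  dict of dicts, dist[a][b] = distance between a and b.
    Returns (internal_node, neighbor, branch_length) edges of an unrooted tree.
    """
    d = {a: {b: dist[a][b] for b in names if b != a} for a in names}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        total = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        new = f"node{counter}"
        counter += 1
        len_i = d[i][j] / 2 + (total[i] - total[j]) / (2 * (n - 2))
        edges += [(new, i, len_i), (new, j, d[i][j] - len_i)]
        # Distances from the new internal node to the remaining nodes.
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Three nodes remain: attach them to one central node.
    a, b, c = nodes
    centre = f"node{counter}"
    edges.append((centre, a, (d[a][b] + d[a][c] - d[b][c]) / 2))
    edges.append((centre, b, (d[a][b] + d[b][c] - d[a][c]) / 2))
    edges.append((centre, c, (d[a][c] + d[b][c] - d[a][b]) / 2))
    return edges

# D values from the slide, as exact fractions rather than rounded decimals.
D = {
    "A": {"B": 2 / 3, "C": 1.0},
    "B": {"A": 2 / 3, "C": 1 / 3},
    "C": {"A": 1.0, "B": 1 / 3},
}
for node, neighbor, length in neighbor_joining(["A", "B", "C"], D):
    print(node, neighbor, round(length, 3))
```

With three languages the joining loop never runs and only the final three-point step applies, giving branches of about 0.667 to A, 0 to B and 0.333 to C. B lies at the internal node because the three distances are additive along the line A, B, C.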

Page 23:

Merits:

1. No need for a universal list of core words. Morphemes that can be reconstructed are basic morphemes (words) in the languages in question.

2. Structural changes are generalized into differences in homonymic relationships among basic morphemes. Specific sound values are left out of consideration.

3. The weights of different sound changes are taken into consideration. Changes involving more morphemes are more important.

Page 24:

Demerit

Borrowings that pass through the correspondences among the languages in question cannot be eliminated.

Page 25:

Phylogenetic trees of Naxi

Characters / Parsimony

Page 26:

Swadesh-100 words / NJ (Proto-Naxi as the outgroup)

Page 27:

D-value / NJ (Proto-Naxi as the outgroup)

Page 28:

How to make the choice?

The character approach is the optimal choice when you have confidence in the specific value of each historical phoneme (Indo-European languages).

The Swadesh-100 words approach is the optimal choice when internal contact can be identified in the group of languages in question (Chinese dialects).

The D-value approach is the optimal choice when there is no evidence of internal contact (Naxi and many other minority languages).

Page 29:

Main references (主要参考文献)

Hsieh, H-I. 1973. A new method of dialectal subgrouping. Journal of Chinese Linguistics 1: 64-92.

Hübschmann 1875. "On the position of Armenian in the sphere of the Indo-European languages". In Lehmann (ed.), A Reader in Nineteenth Century Historical Indo-European Linguistics. Bloomington: Indiana University Press, 1976.

Krishnamurti et al. 1983. Unchanged cognates as a criterion in linguistic subgrouping. Language 59: 544-688.

Meyers, L. F. and William S-Y. Wang 1963. Tree representations in Linguistics. Project on Linguistics Analysis Report 3, Ohio State University.

Saitou, N. and M. Nei 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.

Page 30:

Wang, Feng. 2006. Comparison of Languages in Contact: the Distillation Method and the Case of Bai. Taipei: Academia Sinica.

Wang, William S-Y. 1993. Glottochronology, lexicostatistics, and other numerical methods. Collected in 《王士元语言学论文集》 [Collected Linguistics Papers of William S-Y. Wang].

陆致极 1986.《闽方言内部差异程度及分区的计算机聚类分析》[Computer cluster analysis of the degree of internal variation and the subgrouping of Min dialects].《语言研究》No. 2.

马希文 1989.《比较方言学中的计量方法》[Quantitative methods in comparative dialectology].《中国语文》No. 5.

汪锋 2010.《白彝关系语素研究》[A study of Bai-Yi related morphemes]. Final report, National Social Science Fund project.

王士元, 沈钟伟 1992.《方言关系的计量表述》[A quantitative description of dialect relationships].《中国语文》No. 2.

郑锦全 1988.《汉语方言亲疏关系的计量研究》[A quantitative study of affinity among Chinese dialects].《中国语文》No. 2.