Constructing Linguistic Phylogenetic Trees
Chenxi Shao (邵晨曦) and Zihe Li (李子鹤)
Tree representation of evolution
Darwin 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favored Races in the Struggle for Life.
Tree diagram: shared, derived characteristics
Phylogenetic tree
Zuckerkandl, E. and L. Pauling 1965. Molecules as documents of evolutionary history. J. Theor. Biol. 8: 357-366.
Comparison of DNA sequences
R. Lewin 1996
Linguistic family tree
From Science, Feb. 27, 2004
Quantitative representation
Meyers, L. F. and William S-Y. Wang 1963. Tree representations in Linguistics. Project on Linguistics Analysis Report 3, Ohio State University.
How to construct a linguistic phylogenetic tree?
Wang, William S.-Y. and Zhongwei Shen 1992: four steps:
  1. Selection of characters
  2. Quantization of characters
  3. Calculation of correlation coefficients
  4. Clustering analysis
These steps fall into two parts:
  Selection and encoding of linguistic information
  Mathematical algorithm
Mathematical algorithm
1. Maximum parsimony
   Example: subgrouping of Bai dialects (Wang 2006)
2. Neighbour-joining
   Example: subgrouping of Yi dialects (Wang (汪锋) 2010)
3. Average linkage (UPGMA)
   Example: affinity among Chinese dialects (Cheng, C.C. (郑锦全) 1988)
4. Minimum spanning (Wrocław taxonomy)
   Example: affinity among Chinese dialects (Ma (马希文) 1989)
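Of the four algorithms listed, neighbour-joining is the one paired with distance matrices on the later slides. The following is a minimal sketch of the Saitou and Nei procedure, not the PHYLIP implementation. The function name `neighbor_joining`, the internal node labels `node1`, `node2`, and the four-taxon distances are all hypothetical and chosen only so that the result can be checked by hand.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbour-joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({x, y}) to the distance between x and y.
    Returns a list of (node, node, branch_length) edges.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        counter += 1
        u = f"node{counter}"  # hypothetical label for the new internal node
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hypothetical additive distances for four taxa (not data from the slides).
TAXA = ["a", "b", "c", "d"]
PAIRS = {("a", "b"): 3, ("a", "c"): 6, ("a", "d"): 8,
         ("b", "c"): 7, ("b", "d"): 9, ("c", "d"): 6}
EDGES = neighbor_joining(TAXA, {frozenset(p): v for p, v in PAIRS.items()})
for edge in EDGES:
    print(edge)
```

Because the toy distances are exactly tree-like, the recovered branch lengths reproduce them; real linguistic distances are rarely additive, so the output there is only an estimate.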
Selection and encoding of linguistic information
1. Unique, shared innovation characters
Classical version: the position of Armenian (Hübschmann 1875)
Skt. gh = Av. g, gh, Arm. g, Balto-Slavic g
|   |   |
h   j, zh   g, ž
Modern version: subgrouping Bai dialects (Wang 2006)
19 innovation characters generalized from reconstruction.
Convert innovations to 1's and preservations to 0's.
Character             TL  GX  EQ  EG  JM  JX  DS  ZC  MZL
Split of tone *1a     0   0   0   0   0   0   1   1   0
*Pr –Tʂu              1   1   1   1   1   0   0   0   0
-i as plural vowel    1   0   1   1   1   0   0   0   0
green1 [qing1 青]     1   0   0   1   0   0   0   0   0
…                     …   …   …   …   …   …   …   …   …
Penny in PHYLIP: rooted tree
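A parsimony search prefers the tree that needs the fewest state changes to explain the 0/1 matrix. As a minimal sketch of that criterion, the code below counts changes on one candidate tree with Fitch's algorithm. It does not reproduce Penny's tree search; it uses only the four characters shown in the table, and `CANDIDATE_TREE` is a hypothetical topology, not the tree from Wang (2006).

```python
# The four characters listed in the table (the remaining rows are elided there).
DIALECTS = ["TL", "GX", "EQ", "EG", "JM", "JX", "DS", "ZC", "MZL"]
CHARACTERS = {
    "split of tone *1a":  [0, 0, 0, 0, 0, 0, 1, 1, 0],
    "*Pr -Tʂu":           [1, 1, 1, 1, 1, 0, 0, 0, 0],
    "-i as plural vowel": [1, 0, 1, 1, 1, 0, 0, 0, 0],
    "green1 [qing1]":     [1, 0, 0, 1, 0, 0, 0, 0, 0],
}

# Hypothetical candidate tree written as nested pairs (an assumption for illustration).
CANDIDATE_TREE = (
    ((("TL", "EG"), ("EQ", "JM")), "GX"),
    (("DS", "ZC"), ("JX", "MZL")),
)


def fitch(tree, states):
    """Return (set of possible states at this node, changes needed below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint state sets mean one extra change on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, characters, dialects):
    """Sum the Fitch change counts over all characters."""
    total = 0
    for values in characters.values():
        states = dict(zip(dialects, values))
        total += fitch(tree, states)[1]
    return total


score = parsimony_score(CANDIDATE_TREE, CHARACTERS, DIALECTS)
print(score)
```

On this candidate tree each of the four characters changes exactly once, which is the minimum possible for a character that shows both states.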
Problems:
In many cases, defining an "innovation" amounts to working backwards from the values assigned to historical phonemes.
For many sound changes, there is no consensus on their universality.
2. Retention rate of Swadesh-100 words (Wang, William S-Y. 1993)
Strict correspondence among languages
Calculate the proportion of Swadesh-100 words that satisfy strict correspondence rules between each pair of languages.
Construct a matrix of affinity/distance between each pair of languages.
Example: subgrouping of Austro-Yue languages (Chen and He 2002)
            Wuming  Longzhou  Buyi  …
Wuming      100     86        90    …
Longzhou    86      100       78    …
Buyi        90      78        100   …
…           …       …         …     100
Neighbor in PHYLIP: unrooted tree
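A distance-based program needs distances rather than retention percentages. The slides do not say which conversion was used, so the sketch below assumes the simplest one, d = 1 - s/100, applied to the three languages shown. The helper names `to_distance` and `phylip_square` are hypothetical, and the output layout (taxon count, then names padded to 10 characters) follows the square distance-matrix format as commonly described for PHYLIP; it should be checked against the PHYLIP documentation before use.

```python
LANGUAGES = ["Wuming", "Longzhou", "Buyi"]

# Shared Swadesh-100 items (percent), copied from the table above.
RETENTION = [
    [100, 86, 90],
    [86, 100, 78],
    [90, 78, 100],
]


def to_distance(similarity_percent):
    """Assumed conversion: distance = 1 - similarity / 100."""
    return [[round(1 - s / 100, 4) for s in row] for row in similarity_percent]


def phylip_square(names, matrix):
    """Format a square distance matrix as text (layout is an assumption)."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name[:10]:<10}" + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines)


DISTANCE = to_distance(RETENTION)
PHYLIP_TEXT = phylip_square(LANGUAGES, DISTANCE)
print(PHYLIP_TEXT)
```

Other conversions, such as a logarithmic one, would stretch the larger distances; the choice affects branch lengths and can affect the topology.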
Problems:
Much debate on the Swadesh-100 words:
  universality
  choice of words
  uniformity in the rate of change
Inspiration from biological study
Phylogenetic classifications of different organisms employ different gene segments:
  The prokaryotes: 16S rDNA (王洪媛, 江晓路 et al. 2004)
  Some mammals: 18S rDNA (刘诚刚, 杜志恒 et al. 2012)
  Some fishes: CRY61 (孙婷, 刘伟 et al. 2012)
Difference in homonymic relationship
Scope of examination: morphemes that can be reconstructed to the proto-language without doubt.
Compare the differences in homonymic relationships among those morphemes between each pair of languages.
Rel. between M's \ Rel. between L's   A-B                 B-C             A-C
X-Y                                   0 (shared innov.)   1 (innov. B)    1 (innov. A)
Y-Z                                   1 (innov. A)        0 (no innov.)   1 (innov. A)
X-Z                                   1 (innov. A)        0 (no innov.)   1 (innov. A)
D value                               2/3                 1/3             3/3

Morpheme \ Language   A   B   C   Proto-L
X                     u   o   a   *a
Y                     u   o   o   *o
Z                     u   u   u   *u

L: language; M: morpheme; innov.: innovation; Rel.: relationship; D: difference in homonymic relationships among morphemes between each pair of languages.
The difference in homonymic relationships among morphemes between two languages shows the distance between them.
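The D values in the table can be recomputed from the morpheme forms alone. The sketch below is one reading of the procedure, under the assumption that a pair of morphemes counts as 1 whenever it is homophonous in one language but not in the other, and that D is the share of such pairs. The names `FORMS`, `homonymy` and `d_value` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Forms of morphemes X, Y, Z in languages A, B, C, taken from the table above.
FORMS = {
    "A": {"X": "u", "Y": "u", "Z": "u"},
    "B": {"X": "o", "Y": "o", "Z": "u"},
    "C": {"X": "a", "Y": "o", "Z": "u"},
}


def homonymy(forms):
    """Map each morpheme pair to True if the two morphemes sound alike."""
    return {
        pair: forms[pair[0]] == forms[pair[1]]
        for pair in combinations(sorted(forms), 2)
    }


def d_value(forms_1, forms_2):
    """Share of morpheme pairs whose homonymy status differs between two languages.

    Assumes both languages list the same morphemes.
    """
    rel_1, rel_2 = homonymy(forms_1), homonymy(forms_2)
    differing = sum(rel_1[pair] != rel_2[pair] for pair in rel_1)
    return Fraction(differing, len(rel_1))


D = {
    (p, q): d_value(FORMS[p], FORMS[q])
    for p, q in combinations(sorted(FORMS), 2)
}
for pair, value in D.items():
    print(pair, value)
```

Only equality of forms within a language is compared, so the actual sound values never enter the calculation.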
Distance matrix
Principle: difference in homonymic relationships measures unshared innovation.
     A       B       C
A    0       0.667   1
B    0.667   0       0.333
C    1       0.333   0
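Once a distance matrix exists, any of the clustering algorithms listed earlier can turn it into a tree. The sketch below applies average linkage (UPGMA) to the three-language matrix because it is the shortest to write out; the Naxi trees on the later slide were built with neighbour-joining instead. The function name `upgma` and the nested-tuple tree representation are hypothetical choices.

```python
from itertools import combinations

# Pairwise D values from the distance matrix above.
DIST = {
    ("A", "B"): 0.667,
    ("A", "C"): 1.0,
    ("B", "C"): 0.333,
}


def upgma(labels, dist):
    """Average-linkage clustering.

    Returns (tree, merges): the tree as nested tuples and a list of
    (merged cluster, node height) in the order the merges happened.
    """
    # Each cluster is keyed by its subtree and stores the leaves it contains.
    clusters = {label: [label] for label in labels}

    def leaf_dist(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    def cluster_dist(c1, c2):
        # Mean distance over all leaf pairs drawn from the two clusters.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(leaf_dist(x, y) for x, y in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        d = cluster_dist(c1, c2)
        merged = (c1, c2)
        clusters[merged] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((merged, d / 2))  # node height is half the distance
    return next(iter(clusters)), merges


tree, merges = upgma(["A", "B", "C"], DIST)
print(tree)
for cluster, height in merges:
    print(cluster, round(height, 4))
```

UPGMA assumes a uniform rate of change and returns a rooted tree, so it carries the same rate assumption that the slides list as a problem for the Swadesh-100 approach.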
Merits:
1. No need for a universal list of core words. Morphemes that can be reconstructed are basic morphemes (words) in the languages in question.
2. Structural changes are generalized into differences in homonymic relationships among basic morphemes. Specific sound values are left out of consideration.
3. The weights of different sound changes are taken into consideration. Changes involving more morphemes are more important.
Demerit:
Borrowings through correspondence among the languages in question cannot be eliminated.
Phylogenetic trees of Naxi
Characters / Parsimony
Swadesh-100 words / NJ (Proto-Naxi as the outgroup)
D-value / NJ (Proto-Naxi as the outgroup)
How to choose?
The character approach is the optimal choice when the specific value of each historical phoneme is known with confidence (Indo-European languages).
The Swadesh-100 words approach is the optimal choice when internal contact can be identified in the group of languages in question (Chinese dialects).
The D-value approach is the optimal choice when there is no evidence of internal contact (Naxi and many other minority languages).
Main references
Hsieh, H-I. 1973. A new method of dialectal subgrouping. Journal of Chinese Linguistics 1: 64-92.
Hübschmann 1875. "On the position of Armenian in the sphere of the Indo-European languages". In Lehmann (ed.), A Reader in Nineteenth Century Historical Indo-European Linguistics. Bloomington: Indiana University Press, 1976.
Krishnamurti et al. 1983. Unchanged cognates as a criterion in linguistic subgrouping. Language 59: 544-688.
Meyers, L. F. and William S-Y. Wang 1963. Tree representations in Linguistics. Project on Linguistics Analysis Report 3, Ohio State University.
Saitou, N. and M. Nei 1987. The neighbor-joining method: a new method of reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
Wang, Feng. 2006. Comparison of Languages in Contact: the Distillation Method and the Case of Bai. Taipei: Academia Sinica.
Wang, William S-Y. 1993. Glottochronology, lexicostatistics, and other numerical methods. In Collected Linguistics Papers of William S-Y. Wang (《王士元语言学论文集》).
Lu, Zhiji (陆致极) 1986. Computer cluster analysis of the degree of internal difference and the subgrouping of Min dialects. Yuyan Yanjiu (《语言研究》) No. 2.
Ma, Xiwen (马希文) 1989. Quantitative methods in comparative dialectology. Zhongguo Yuwen (《中国语文》) No. 5.
Wang, Feng (汪锋) 2010. A study of Bai-Yi related morphemes (《白彝关系语素研究》). Final project report, National Social Science Fund of China.
Wang, William S-Y. (王士元) and Zhongwei Shen (沈钟伟) 1992. A quantitative description of dialect relationships. Zhongguo Yuwen (《中国语文》) No. 2.
Cheng, C.C. (郑锦全) 1988. A quantitative study of affinity among Chinese dialects. Zhongguo Yuwen (《中国语文》) No. 2.