![Page 1: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/1.jpg)
NGSST 2006 Winter Workshop
Automatic Language Identification
Overview & Some Experiments on OGI-TS Corpus
National Tsing Hua University
Chi-Yueh Lin
2006/1/19
![Page 2: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/2.jpg)
Introduction to LID
- Language Identification (LID) applications
  - Pre-processing for machine systems
  - Pre-processing for human listeners
- Some authors prefer the abbreviation “ALI”, which stands for “Automatic Language Identification”.
![Page 3: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/3.jpg)
Introduction to LID: Pre-processing for machine systems
- Multilingual information retrieval system in a hotel lobby or an international airport.
- [Diagram: speech in an unknown language -> Language Identification System -> English ASR, French ASR, Spanish ASR, or Mandarin ASR -> information in English, French, Spanish, or Mandarin]
![Page 4: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/4.jpg)
Introduction to LID: Pre-processing for human listeners
- AT&T Language Line was designed for handling emergency calls.
- Delay on the order of minutes.
![Page 5: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/5.jpg)
Introduction to LID: AT&T Language Line
- http://www.languageline.com
- The service uses trained human interpreters to handle about 150 languages.
- Correctly identifying “Tamil” takes a delay of about 3 minutes.
![Page 6: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/6.jpg)
Introduction to LID: Human Perceptual Experiment
From “Reviewing Automatic Language Identification”, IEEE Signal Processing Magazine, Oct. 1994.
![Page 7: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/7.jpg)
Introduction to LID: Human Perceptual Experiment
- Comments from the post-experiment interview
  - Phoneme-spotting and word-spotting strategies
  - Prosodic cues
- Performance improved with increased exposure to each language.
![Page 8: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/8.jpg)
Introduction to LID: Papers found in IEEE Xplore
Keyword: “language identification”

| Years | Before 1980 | 1980~1989 | 1990~1999 | 2000~present |
| --- | --- | --- | --- | --- |
| # of papers | None | 5 | 50+ | 40+ |

Golden Age of LID
ICASSP 2006: 6 papers
![Page 9: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/9.jpg)
Introduction to LID
- Research on LID before 1980 was primarily done at Texas Instruments.
  - 1973~1980 (4 papers)
  - Reference template
- House and Neuberg (1977 JASA)
  - HMM trained on sequences of broad phonetic category labels
  - Near-perfect discrimination
  - No real speech data
![Page 10: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/10.jpg)
Language identification cues
- Phonology
  - Phone and phoneme sets differ from one language to another.
  - Phone and phoneme frequencies of occurrence may also differ.
  - Phonotactics.
- Prosody
  - Duration, pitch, and stress.
![Page 11: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/11.jpg)
Language identification cues
- Morphology
  - Word roots
  - Lexicon
- Syntax
  - Sentence patterns differ among languages.
![Page 12: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/12.jpg)
Language identification cues
- Phonology and prosody: most recent LID systems use these two kinds of cues.
- Morphology and syntax: these cues are seldom used.
![Page 13: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/13.jpg)
Language Identification System
![Page 14: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/14.jpg)
LID systems
![Page 15: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/15.jpg)
LID systems
- Systems vary primarily according to their method for modeling languages.
  - Spectral-similarity approaches
  - Prosody-based approaches
  - Phone-recognition approaches
  - Using multilingual speech units
  - Word-level approaches
  - Continuous speech recognition
![Page 16: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/16.jpg)
LID systems
- System conditions
  - Content-independent
  - Speaker-independent
![Page 17: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/17.jpg)
LID systems: Spectral-similarity approaches
- The earliest automatic LID systems.
- Use conventional spectral or cepstral feature vectors.
![Page 18: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/18.jpg)
LID systems: Spectral-similarity approaches
- Cimarusti and Ives (1982 ICASSP)
  - Read speech
  - 5 speakers, 8 languages
  - 100-dim feature vector: 15 area functions, 15 autocorrelation coefficients, 5 bandwidths, 15 cepstral coefficients, 15 filter coefficients, 5 formant frequencies, 15 log area ratios, and 15 reflection coefficients.
![Page 19: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/19.jpg)
LID systems: Spectral-similarity approaches
- Foil (1986 ICASSP)
  - Noisy radio signals (~5 dB)
  - 3 languages
  - Information from pitch, energy, and formants
  - 45-dim feature vector
    - 23-dim from energy
    - 22-dim from pitch
  - VQ codebook (10 clusters) for formants
![Page 20: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/20.jpg)
LID systems: Spectral-similarity approaches
- Goodman et al. (1989 ICASSP)
  - Improved version of Foil’s work (~9 dB)
  - 6 languages
  - The formant-cluster algorithm used an LPC-12 autocorrelation analysis.
  - The parameters used were log-amplitude values A1, A2, A3 and formant values F1, F2, F3.
  - The formant-based method is superior to the LPCC-based method.
![Page 21: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/21.jpg)
LID systems: Spectral-similarity approaches
- Sugiyama (1991 ICASSP)
  - 20 languages
  - VQ-based approach
    - Standard VQ algorithm
    - VQ histogram algorithm (common codebook)
  - Autocorrelation coefficients, LPC coefficients, delta-cepstrum coefficients.
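The general idea behind a per-language VQ classifier can be sketched as follows: one codebook per language, and the utterance is assigned to the language whose codebook quantizes it with the lowest average distortion. This is only a minimal illustration; the toy codebooks, the 2-dimensional feature vectors, and the function names are assumptions and are not taken from Sugiyama's paper.

```python
def distortion(frame, codebook):
    """Squared Euclidean distance from a frame to its nearest codeword."""
    return min(sum((x - c) ** 2 for x, c in zip(frame, code)) for code in codebook)

def average_distortion(frames, codebook):
    """Mean quantization distortion of an utterance against one codebook."""
    return sum(distortion(f, codebook) for f in frames) / len(frames)

def identify_language(frames, codebooks):
    """Return the language whose codebook quantizes the utterance best."""
    return min(codebooks, key=lambda lang: average_distortion(frames, codebooks[lang]))

# Toy 2-dimensional codebooks (hypothetical values, not real speech data).
codebooks = {
    "lang_A": [(0.0, 0.0), (1.0, 1.0)],
    "lang_B": [(5.0, 5.0), (6.0, 4.0)],
}
utterance = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
print(identify_language(utterance, codebooks))
```

In a real system the codebooks would be trained (for example with k-means) on spectral or cepstral features of each language.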
![Page 22: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/22.jpg)
LID systems: Spectral-similarity approaches
- Zissman (1993 ICASSP) applied GMM to the LID task.
- C: Cepstrum, D: Delta-cepstrum
![Page 23: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/23.jpg)
LID systems: Prosody-based approaches
- Savic (1991 ICASSP)
  - Pitch information is useful for discriminating Spanish from Mandarin.
- Humans can use prosodic features (Muthusamy, 1994 ICASSP)
  - Tonal languages (Mandarin, Vietnamese)
  - Speech rate (Spanish)
![Page 24: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/24.jpg)
LID systems: Prosody-based approaches
- Itahashi (1994 ICSLP) argues that pitch estimation is more robust in noisy environments.
  - Based on fundamental frequency, 21 features in total.
  - Polygonal line approximation of the F0 pattern.
  - Use PCA to perform discriminant analysis.
![Page 25: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/25.jpg)
LID systems: Prosody-based approaches
- Thyme-Gobbel (1996 ICSLP)
  - Syllable-based pitch contour
  - Syllable duration
  - Amplitude
  - Rhythm
  - Phrase location
- Pitch is the most discriminative feature.
![Page 26: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/26.jpg)
LID systems: Prosody-based approaches
- Ramus (1999 JASA)
  - A study based on speech resynthesis.
  - Global intonation (aaaa, sasasa)
  - Syllabic rhythm (sasasa, flat sasasa)
  - Broad phonotactics (saltanaj)
![Page 27: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/27.jpg)
LID systems: Prosody-based approaches
- Rouas (2003 ICASSP, 2005 Speech Comm.)
  - Rhythmic parameters
    - Duration of consonant and vowel
    - Complexity of CV segment
  - Fundamental frequency parameters
    - Skewness and kurtosis of F0
    - Accent location
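To make the F0 statistics above concrete, here is a small sketch that computes skewness and excess kurtosis from a list of F0 values using population moments. The F0 samples and the function name are hypothetical; how such statistics are windowed and normalized in the cited work is not specified on the slide.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0
    return skew, kurt

# Hypothetical F0 values (Hz) over the voiced frames of one segment.
f0_hz = [110.0, 115.0, 120.0, 118.0, 125.0, 160.0, 112.0, 117.0]
skew, kurt = skewness_and_kurtosis(f0_hz)
print(f"skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

A positive skewness here reflects the single high F0 excursion in the toy data.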
![Page 28: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/28.jpg)
LID systems: Prosody-based approaches
- Rouas (2005 Eurospeech)
  - Long-term and short-term prosody modeling.
  - N-gram model.
  - Long-term: prosodic movements over several pseudo-syllables.
  - Short-term: prosodic movements inside a pseudo-syllable.
![Page 29: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/29.jpg)
LID systems: Prosody-based approaches
![Page 30: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/30.jpg)
LID systems: Prosody-based approaches
- Lin (2005, 2006 ICASSP)
  - Pseudo-syllable segmentation
  - Pitch contours were represented by a set of Legendre polynomials
  - Dynamic model instead of static model
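One way to picture the Legendre representation is a least-squares fit of a pitch contour with the first few Legendre polynomials, so that the contour is summarized by a handful of coefficients (roughly: level, slope, curvature). The sketch below is an assumption about how such a fit could be done, not the procedure of the cited papers; the polynomial order, the F0 samples, and all function names are hypothetical.

```python
def legendre_basis(x, order):
    """Values P_0(x)..P_order(x) via the Bonnet recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs

def fit_legendre(contour, order=3):
    """Least-squares Legendre coefficients of an equally sampled pitch contour."""
    n = len(contour)
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]  # map time to [-1, 1]
    rows = [legendre_basis(x, order) for x in xs]
    size = order + 1
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    rhs = [sum(r[i] * y for r, y in zip(rows, contour)) for i in range(size)]
    return solve(gram, rhs)

# Hypothetical F0 samples (Hz) of one pseudo-syllable: a rising contour.
contour = [120.0, 124.0, 129.0, 135.0, 142.0, 150.0]
print([round(c, 2) for c in fit_legendre(contour)])
```

The resulting coefficient vector has a fixed length regardless of segment duration, which is what makes it convenient as a feature vector.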
![Page 31: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/31.jpg)
LID systems: Prosody-based approaches
- However, Hazen (1993) showed that features derived from prosodic information provided little language discriminability compared with a phonetic system.
- The performance of approaches based on prosodic information degrades in the N-way identification task as N becomes large.
![Page 32: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/32.jpg)
LID systems: Prosody-based approaches
- Advantages of prosody-based systems
  - Robust to channel effects and noise.
  - Require little transcription and training data.
![Page 33: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/33.jpg)
LID systems: Phone-recognition approaches
- Different languages have different phone inventories and different phonotactics.
- Zissman (1994 ICASSP)
  - PRLM
  - PPRLM
![Page 34: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/34.jpg)
LID systems: Phone-recognition approaches
- Phone recognition followed by language modeling (PRLM)
- N-gram probability distributions are trained from the output of the single-language phone recognizer, not from human-supplied labels.
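The language-modeling half of PRLM can be illustrated with a toy example: for each target language, a bigram model is estimated from phone strings produced by the recognizer, and a test string is assigned to the language whose model gives it the highest log-likelihood. The phone symbols, training strings, add-one smoothing, and function names below are all assumptions chosen for illustration; the phone recognizer itself is not modeled.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """Add-one smoothed bigram log-probabilities from phone-recognizer output."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    size = len(vocab)
    return {
        (p, c): math.log((pair_counts[(p, c)] + 1) / (context_counts[p] + size))
        for p in vocab for c in vocab
    }

def score(seq, model):
    """Log-likelihood of a phone sequence under one bigram model."""
    return sum(model[(p, c)] for p, c in zip(seq, seq[1:]))

def identify(seq, models):
    """Pick the language whose bigram model scores the sequence highest."""
    return max(models, key=lambda lang: score(seq, models[lang]))

# Hypothetical recognizer output for two languages (one symbol per phone).
vocab = ["a", "k", "s", "t"]
training = {
    "lang_A": [list("kakatakasa"), list("takakasata")],
    "lang_B": [list("stkstakstt"), list("tskkstsstk")],
}
models = {lang: train_bigram(seqs, vocab) for lang, seqs in training.items()}
print(identify(list("katasaka"), models))
```

Because the models are trained on recognizer output rather than human labels, only untranscribed speech with a known language tag is needed for each target language.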
![Page 35: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/35.jpg)
LID systems: Phone-recognition approaches
- Parallel PRLM (PPRLM, an extension of PRLM)
![Page 36: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/36.jpg)
LID systems: Phone-recognition approaches
- PPRLM tries to incorporate phones from more than one language into a PRLM-like system.
- The only limitation is the number of languages for which labeled training speech is available.
- Achieves the best performance among all methods in the LID task.
![Page 37: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/37.jpg)
LID systems: Phone-recognition approaches
- Yan (ICASSP 1995)
  - Forward bigram
  - Backward bigram
  - Combination
![Page 38: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/38.jpg)
LID systems: Phone-recognition approaches
- Torres-Carrasquillo (2002, ICASSP & ICSLP)
  - Variation of a PRLM-like system.
  - Uses a GMM tokenizer instead of a phone recognizer as front-end processing.
  - Language models are trained on the values of the “token index”.
  - Shifted delta cepstral features.
  - Does not need any transcription.
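Shifted delta cepstra are usually described by the parameters d, P, and k: at frame t, k delta blocks spaced P frames apart are stacked, where block i is c(t + iP + d) - c(t + iP - d). The sketch below follows that common definition; the parameter values, the 1-dimensional toy frames, and the choice to skip frames near the edges are assumptions made for illustration.

```python
def shifted_delta_cepstra(frames, d=1, P=3, k=2):
    """Stack k delta blocks, spaced P frames apart, for each usable frame t.

    Block i at frame t is c(t + i*P + d) - c(t + i*P - d).
    Frames whose blocks would fall outside the utterance are skipped.
    """
    last = len(frames) - 1 - (k - 1) * P - d
    output = []
    for t in range(d, last + 1):
        stacked = []
        for i in range(k):
            ahead = frames[t + i * P + d]
            behind = frames[t + i * P - d]
            stacked.extend(a - b for a, b in zip(ahead, behind))
        output.append(stacked)
    return output

# Toy 1-dimensional "cepstra": each frame value is the square of its index.
frames = [[float(n * n)] for n in range(8)]
print(shifted_delta_cepstra(frames, d=1, P=3, k=2))
```

Each output vector therefore covers a much longer time span than a single delta, which is the motivation for using it in LID.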
![Page 39: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/39.jpg)
LID systems: Phone-recognition approaches
- Feature vector Xn is represented by token index 2.
- Token sequence: 2221321113323111123213213…
- Apply language model
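The tokenization step can be sketched as follows: every feature vector is replaced by the index of the mixture component that scores it highest, which turns an utterance into a token string like the one shown on the slide. The 3-component GMM, its parameters, and the 2-dimensional frames are hypothetical; a real tokenizer would be trained on speech features.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def tokenize(frames, components):
    """Map each frame to the 1-based index of its best-scoring mixture component."""
    tokens = []
    for x in frames:
        scores = [math.log(w) + log_gaussian(x, mean, var) for w, mean, var in components]
        tokens.append(scores.index(max(scores)) + 1)
    return tokens

# Hypothetical 3-component GMM over 2-dimensional features: (weight, mean, variance).
components = [
    (0.3, (0.0, 0.0), (1.0, 1.0)),
    (0.4, (4.0, 4.0), (1.0, 1.0)),
    (0.3, (-4.0, 4.0), (1.0, 1.0)),
]
frames = [(3.8, 4.1), (4.2, 3.7), (0.1, -0.2), (-3.9, 4.3), (3.5, 4.4)]
print("".join(str(t) for t in tokenize(frames, components)))
```

The resulting token string can then be scored by per-language n-gram models in the same way as phone-recognizer output.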
![Page 40: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/40.jpg)
LID systems: Phone-recognition approaches
To make phone-recognition-based LID systems easier to train, one can use a single-language phone recognizer as a front end to a system that uses phonotactic scores to perform LID.
Language ID could be performed successfully even when the front end phone recognizer(s) was not trained on speech spoken in the languages to be recognized.
![Page 41: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/41.jpg)
LID systems: Using multilingual speech units
- Focus on the problem of identifying and processing only those phones that carry the most language-discriminating information.
- Mono-phonemes
  - Phonemes whose acoustic realizations in one language overlap little or not at all with those in another language.
- Poly-phonemes
  - Phonemes whose acoustic realizations are similar enough across many languages.
![Page 42: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/42.jpg)
LID systems: Using multilingual speech units
- Dalsgaard (ICSLP 1994)
  - Four European languages: Danish, English, German, Italian
  - 134 phoneme models
- [Figure: mono-phonemes and poly-phonemes; labels 0, K, mI, mG, mD, mUK, p]
![Page 43: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/43.jpg)
LID systems: Using multilingual speech units
- Berkling (1994 ICASSP)
  - 3 languages (English, German, Japanese)

| Label | Ratio | Language with larger frequency of occurrence |
| --- | --- | --- |
| f | (1.3) | GE |
![Page 44: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/44.jpg)
LID systems: Using multilingual speech units
- Köhler (1998)
  - Single multi-language (6 languages) front-end phone recognizer.
  - 24 mel-scaled cepstral, 12 delta cepstral, 12 delta-delta cepstral, energy, delta energy, delta-delta energy.
  - Feature vectors were transformed by LDA.
  - Monophones -> multilingual phones
![Page 45: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/45.jpg)
LID systems: Word-level approaches
These systems use more sophisticated sequence modeling than the phonotactic models of the phone-level systems, but do not employ full speech-to-text systems.
![Page 46: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/46.jpg)
LID systems: Word-level approaches
- Kadambe and Hieronymus (1995)
  - Trigram phonotactics & lexicon matching
  - 4 languages
![Page 47: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/47.jpg)
LID systems: Word-level approaches
- Ramesh and Roe (1994)
  - Use of embedded word models of frequently occurring words and phrases.
  - Multiple-mixture left-to-right CDHMM, LPC cepstrum-based features.
![Page 48: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/48.jpg)
LID systems: Word-level approaches
- Lund and Gish (1995 Eurospeech)
  - Pseudo-word Language Model (PWLM)
  - Pseudo-words are the frequently occurring sub-sequences within the phoneme recognition output.
  - Finding pseudo-word candidates is a time-consuming task.
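A brute-force way to look for pseudo-word candidates is to count every contiguous phone sub-sequence within a length range and keep those that occur often enough. The sketch below shows that idea only; the length limits, the count threshold, the phone string, and the function name are assumptions, and the selection criteria used by Lund and Gish are not given on the slide.

```python
from collections import Counter

def pseudo_word_candidates(phones, min_len=2, max_len=4, min_count=2):
    """Count every contiguous phone sub-sequence and keep the frequent ones."""
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for start in range(len(phones) - length + 1):
            counts[tuple(phones[start:start + length])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_count}

# Hypothetical phone-recognizer output.
phones = "d a t a b a d a t a".split()
for seq, count in sorted(pseudo_word_candidates(phones).items()):
    print(" ".join(seq), count)
```

The number of sub-sequences to count grows with both the corpus size and the maximum length, which gives a feel for why the search is time-consuming.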
![Page 49: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/49.jpg)
LID systems: Word-level approaches
- Gao (2005 Eurospeech)
  - Applied techniques from document retrieval.
  - Spoken document categorization
  - Latent semantic indexing
![Page 50: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/50.jpg)
LID systems: Continuous speech recognition
- Several large-vocabulary continuous-speech recognition systems were used in parallel for language ID.
- The architecture is similar to PRLM and PPRLM.
- During testing, the recognizers run in parallel, and the one yielding output with the highest likelihood is selected as the winning recognizer.
- Sometimes called parallel phone recognition (PPR).
![Page 51: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/51.jpg)
LID systems: Continuous speech recognition
Biased scores problem
Recognizer-dependent bias
![Page 52: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/52.jpg)
LID systems: Continuous speech recognition
- Lamel (1994 ICASSP)
  - English & French
  - 46 CI phone models for English
  - 35 CI phone models for French
  - 99% for laboratory read speech on 2 s utterances.
  - 76% for telephone spontaneous speech on 2 s utterances.
![Page 53: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/53.jpg)
LID systems: Continuous speech recognition
- Mendoza (1996 ICASSP)
  - English, Japanese, Spanish
  - Bias removal via a “Score – Best Score” strategy.
    - Score: score from the conventional recognizer
    - Best Score: score from the raw acoustic match
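The “Score – Best Score” idea can be illustrated with a few numbers: each recognizer's decoded score is offset by its own raw acoustic-match score before the recognizers are compared, so a recognizer that scores everything high no longer wins by default. The log-likelihood values and function names below are invented for illustration; the exact normalization used in the cited paper may differ.

```python
def normalized_scores(scores, best_scores):
    """Subtract each recognizer's raw acoustic-match score from its decoded score."""
    return {lang: scores[lang] - best_scores[lang] for lang in scores}

def pick_language(scores, best_scores):
    """Choose the language with the highest bias-corrected score."""
    normalized = normalized_scores(scores, best_scores)
    return max(normalized, key=normalized.get)

# Hypothetical log-likelihoods for one utterance.
scores = {"English": -1200.0, "Japanese": -1150.0, "Spanish": -1300.0}
best_scores = {"English": -1180.0, "Japanese": -1050.0, "Spanish": -1210.0}
print(max(scores, key=scores.get))         # decision from raw scores
print(pick_language(scores, best_scores))  # decision after bias removal
```

In this toy case the raw scores favor one recognizer while the corrected scores favor another, which is the situation the bias removal is meant to handle.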
![Page 54: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/54.jpg)
LID systems: Continuous speech recognition
- Schultz (1996 ICASSP)
  - 4 language-dependent LVCSR systems run in parallel.
  - German, Japanese, English, Spanish
  - Acoustic, phonotactic rule, lexicon, grammar
![Page 55: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/55.jpg)
LID systems: Continuous speech recognition
- Need language-dependent labels for each language.
- More difficult to implement than any of the other systems.
![Page 56: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/56.jpg)
LID systems: Multiple Systems Fusion
- Statistical fusion strategies (very common)
  - GMM
  - Neural network
- Parris (1995 ICASSP)
  - Logistic function
- Gutierrez (2003 ICASSP)
  - Performance Confidence Index
  - Dempster-Shafer Theory of Evidence
![Page 57: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/57.jpg)
Corpus for Language Identification Task
![Page 58: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/58.jpg)
Corpus for Language Identification
- In the early years, no corpus was collected for the language identification task.
- Experiments were conducted on small amounts of data.
- Things have changed since 1994…
![Page 59: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/59.jpg)
Corpus for Language Identification
- Corpora available from the Linguistic Data Consortium
  - OGI-TS – 10 languages (1994)
  - CallFriend – 12 languages (1996)
  - CallHome – 6 languages (1997)
  - CSLU – 22 languages (2005)
![Page 60: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/60.jpg)
OGI-TS Corpus
- Oregon Graduate Institute Multi-Language Telephone Speech Corpus
- Collected by Yeshwant Muthusamy.
- 10 languages: English, Farsi, French, German, Japanese, Korean, Mandarin, Spanish, Tamil, Vietnamese. (Hindi was added afterward.)
- 90 calls for each language.
  - 50 calls in the Training Set
  - 20 calls in the Development Set
  - 20 calls in the Evaluation Set
![Page 61: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/61.jpg)
OGI-TS Corpus
- Corpus used for the NIST LID evaluation in 1996.
- Initial label: 7 Broad Phonetic Categories
  - Vowel, Fricative, Silence or Closure, Stop, pre-vocalic sonorant, inter-vocalic sonorant, post-vocalic sonorant.
![Page 62: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/62.jpg)
OGI-TS Corpus
- Three types of utterances
  - fixed, useful vocabulary speech
  - domain-specific vocabulary speech
  - unrestricted vocabulary speech
- Three durations of utterances
  - 3 sec
  - 10 sec
  - 45 sec
![Page 63: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/63.jpg)
OGI-TS Corpus
- Contents of file
  - nlg - native language (3 sec)
  - clg - common language (3 sec)
  - dow - days of the week (10 sec)
  - num - numbers 0 through 10 (10 sec)
  - htl - hometown likes (10 sec)
  - htc - hometown climate (10 sec)
  - roo - room description (10 sec)
  - mea - description of most recent meal (10 sec)
  - stb - free speech before the tone (45 sec)
  - sta - free speech after the tone (10 sec)
![Page 64: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/64.jpg)
OGI-TS Corpus
- For more information about this corpus, refer to Muthusamy’s Ph.D. dissertation.
  - Y. K. Muthusamy, "A Segmental Approach to Automatic Language Identification," Ph.D. Thesis, OGI Technical Report No. CSLU 93-002, Nov. 24, 1993.
![Page 65: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/65.jpg)
Muthusamy’s work on OGI-TS
- Broad phonetic category
- PLP spectral feature
- Neural network-based broad phonetic segmentation algorithm
![Page 66: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/66.jpg)
Muthusamy’s work on OGI-TS
Pair-wise LID
From Muthusamy’s dissertation
![Page 67: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/67.jpg)
Muthusamy’s work on OGI-TS
From Muthusamy’s dissertation
![Page 68: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/68.jpg)
Some Experiments on OGI-TS Corpus
![Page 69: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/69.jpg)
System

- Prosody-based System
  - Pitch information
  - Duration information
  - Static modeling & dynamic modeling
- Phone-recognition System
  - PRLM
  - Front-end recognizer: English
- GMM-Tokenizer
![Page 70: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/70.jpg)
Prosody System

Identify languages mainly based on prosodic cues.

Rhythmic categories:

- Stress-timed languages (Morse-Code)
- Syllable-timed languages (Machine Gun)
- Tonal languages
- Mora-timed languages
![Page 71: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/71.jpg)
Prosody System: Why pitch?

In previous research, pitch has been widely investigated and found useful for the language identification task.
![Page 72: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/72.jpg)
Prosody System: Pitch Contour Extraction

- Method proposed by P. Boersma (1993)
- Autocorrelation-based
- Finds the best path through several pitch candidates with the help of dynamic programming
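The dynamic-programming step can be illustrated with a small Viterbi-style search. This is only a simplified sketch, not Boersma's full algorithm: the function name `best_pitch_path`, the `jump_cost` parameter, and the octave-distance transition cost are assumptions made for illustration, and the voiced/unvoiced decision is omitted.

```python
import numpy as np

def best_pitch_path(candidates, jump_cost=1.0):
    """candidates: one list per frame of (freq_hz, strength) pairs.
    Returns the frequency chosen in each frame."""
    # cost[j]: best accumulated cost of a path ending at candidate j of the current frame
    cost = [-s for _, s in candidates[0]]
    back = []
    for prev, cur in zip(candidates[:-1], candidates[1:]):
        new_cost, ptr = [], []
        for f, s in cur:
            # hypothetical transition cost: octave distance between consecutive pitches
            trans = [c + jump_cost * abs(np.log2(f / pf)) for c, (pf, _) in zip(cost, prev)]
            k = int(np.argmin(trans))
            new_cost.append(trans[k] - s)   # strong candidates lower the cost
            ptr.append(k)
        cost = new_cost
        back.append(ptr)
    # trace the cheapest path backwards
    j = int(np.argmin(cost))
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]
```

In the middle frame of the test below, the locally strongest candidate is an octave too high, and the path search still keeps the smooth contour.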
![Page 73: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/73.jpg)
Prosody System: Pitch Contour Segmentation

- Uses information from the smoothed version of the energy contour.
- Valley points of the energy contour are candidates for segmentation.
- Duration constraint: no less than 50 ms
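A minimal sketch of this kind of valley-based segmentation is shown below. The moving-average smoother, the 10 ms frame step, and the greedy way of enforcing the 50 ms constraint are all assumptions for illustration; the slides do not specify these details.

```python
import numpy as np

def segment_boundaries(energy, frame_ms=10.0, smooth_len=5, min_dur_ms=50.0):
    """Return frame indices of segment boundaries found at energy valleys."""
    kernel = np.ones(smooth_len) / smooth_len
    smoothed = np.convolve(energy, kernel, mode="same")   # smoothed energy contour
    min_frames = int(np.ceil(min_dur_ms / frame_ms))
    boundaries = [0]
    for i in range(1, len(smoothed) - 1):
        is_valley = smoothed[i] < smoothed[i - 1] and smoothed[i] <= smoothed[i + 1]
        # accept a valley only if the segment it closes is long enough
        if is_valley and i - boundaries[-1] >= min_frames:
            boundaries.append(i)
    return boundaries
```

The greedy rule keeps the first admissible valley; a more careful implementation might prefer the deepest valley within a window.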
![Page 74: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/74.jpg)
Prosody System: Pitch Contour Representation

- In most previous work, pitch contours are approximated by polygonal lines.
- In our recent work, we use Legendre polynomials instead.
![Page 75: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/75.jpg)
Prosody System: Pitch Contour Representation (Legendre polynomials)
![Page 76: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/76.jpg)
Prosody System: Pitch Contour Representation (Legendre polynomials)

- $P_0$: pitch height
- $P_1$: pitch slope
- $P_2$: pitch curvature
- $P_3$: pitch S-curvature
![Page 77: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/77.jpg)
Prosody System: Pitch Contour Representation

Approximated pitch contour:

$$f(x) \approx \sum_{i=0}^{M} a_i P_i(x)$$

where $a_i$ is the $i$-th order coefficient and $P_i(x)$ is the $i$-th order Legendre polynomial. In most cases, a small value of $M$ is sufficient.
![Page 78: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/78.jpg)
Prosody System: Pitch Contour Representation (Legendre polynomial)

Orthogonal property:

$$\int_{-1}^{1} P_m(x)\, P_n(x)\, dx = \frac{2}{2n+1}\, \delta_{mn}$$
![Page 79: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/79.jpg)
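Prosody System: Pitch Contour Representation

$$a_0 = \frac{\langle \tilde{f}, P_0 \rangle}{\langle P_0, P_0 \rangle}, \qquad a_1 = \frac{\langle \tilde{f} - a_0 P_0,\; P_1 \rangle}{\langle P_1, P_1 \rangle}$$

$$a_2 = \frac{\left\langle \tilde{f} - \sum_{i=0}^{1} a_i P_i,\; P_2 \right\rangle}{\langle P_2, P_2 \rangle}, \qquad a_3 = \frac{\left\langle \tilde{f} - \sum_{i=0}^{2} a_i P_i,\; P_3 \right\rangle}{\langle P_3, P_3 \rangle}$$

where $\langle \cdot, \cdot \rangle$ is the inner product operator and $\tilde{f}$ is derived from the pitch contour $f$.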
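The successive projections can be tried numerically. The sketch below is an illustration under assumptions: the contour is mapped onto $[-1, 1]$, the inner product is approximated by Gauss-Legendre quadrature, and the function name `legendre_coefficients` and its parameters are hypothetical. Any normalization of the contour before projection is left out.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def legendre_coefficients(contour, order=3, n_quad=64):
    """Project a pitch contour onto P_0..P_order by successive inner products."""
    contour = np.asarray(contour, dtype=float)
    x = np.linspace(-1.0, 1.0, len(contour))       # map the contour onto [-1, 1]
    nodes, weights = leg.leggauss(n_quad)           # quadrature rule for the inner product
    residual = np.interp(nodes, x, contour)         # contour resampled at the nodes
    coeffs = []
    for i in range(order + 1):
        P_i = leg.legval(nodes, [0.0] * i + [1.0])  # i-th Legendre polynomial
        a_i = np.sum(weights * residual * P_i) / np.sum(weights * P_i * P_i)
        coeffs.append(a_i)
        residual = residual - a_i * P_i             # remove the part already explained
    return np.array(coeffs)
```

Because the polynomials are orthogonal, subtracting the lower-order terms does not change the result in exact arithmetic; it is kept here only to mirror the formulas. For a straight-line contour the height and slope coefficients are recovered and the higher-order ones vanish.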
![Page 80: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/80.jpg)
Prosody System: Pitch Contour Representation

In our previous work (ICASSP 2005), the most useful features are:

- Duration of the pitch contour
- Coefficient of the first-order Legendre polynomial
- Coefficient of the second-order Legendre polynomial
![Page 81: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/81.jpg)
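Prosody System: Pitch Contour Representation

Each pitch contour is represented by a set of Legendre polynomial coefficients and its duration:

$$v_t = \begin{bmatrix} d_t \\ a_{1t} \\ a_{2t} \end{bmatrix}$$

where $d_t$ is the pitch duration, $a_{1t}$ the pitch slope, and $a_{2t}$ the pitch curvature of the $t$-th pitch contour.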
![Page 82: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/82.jpg)
Prosody System: Models for LID

- In ICASSP 2005, the feature vectors $v_t$ of each language are modeled by a GMM.
- In ICASSP 2006, an ergodic Markov model is used to further improve the performance.
- Static model -> dynamic model
![Page 83: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/83.jpg)
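Prosody System: Models for LID - GMM

$$L = \sum_{t=1}^{T} \log p(v_t) = \sum_{t=1}^{T} \log \sum_{n=1}^{N} w_n\, \mathcal{N}(v_t;\, \mu_n, \Sigma_n), \qquad v_t = \begin{bmatrix} d_t \\ a_{1t} \\ a_{2t} \end{bmatrix}$$

- $t$: index of pitch contour ($t = 1, \dots, T$)
- $n$: index of mixture; $N = 64$ here
- $l$: index of language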
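The utterance-level GMM score can be sketched as follows. Diagonal covariances and the function name `gmm_log_likelihood` are assumptions for illustration; the slides do not state the covariance type. In use, one such score would be computed per language model and the highest-scoring language picked.

```python
import numpy as np

def gmm_log_likelihood(V, weights, means, variances):
    """Sum over pitch contours of log sum_n w_n N(v_t; mu_n, diag(var_n)).
    V: (T, D); weights: (N,); means, variances: (N, D)."""
    V = np.asarray(V, float)[:, None, :]                  # (T, 1, D)
    means = np.asarray(means, float)[None]                # (1, N, D)
    variances = np.asarray(variances, float)[None]        # (1, N, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=2)
    log_exp = -0.5 * np.sum((V - means) ** 2 / variances, axis=2)
    log_joint = np.log(weights)[None] + log_norm + log_exp    # (T, N)
    # log-sum-exp over mixtures for numerical stability
    m = log_joint.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return float(log_p.sum())
```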
![Page 84: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/84.jpg)
Prosody System: Models for LID – Ergodic Markov Model

| State | Duration $d_t$ |
|---|---|
| D1 | 50 ms ~ 100 ms |
| D2 | 100 ms ~ 150 ms |
| D3 | 150 ms ~ 200 ms |
| D4 | 200 ms ~ 250 ms |
| D5 | 250 ms ~ 300 ms |
| D6 | 300 ms ~ |

$$\hat{d}_t = Q_D(d_t), \quad \text{where } \hat{d}_t \in \{D_1, D_2, D_3, D_4, D_5, D_6\}$$

$Q_D$: duration quantizer; $\hat{d}_t$: quantized duration index.
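The duration quantizer amounts to a simple binning rule. In the sketch below the function name `quantize_duration` is hypothetical, and treating each range as half-open (so that exactly 100 ms falls in D2) is an assumption, since the slide does not say which state owns a boundary value.

```python
def quantize_duration(d_ms):
    """Map a pitch-contour duration in ms to a state index 1..6 (D1..D6)."""
    if d_ms < 50:
        # such contours should not occur given the 50 ms duration constraint
        raise ValueError("duration below 50 ms")
    # 50-ms wide bins starting at 50 ms; everything from 300 ms up goes to D6
    return min(int((d_ms - 50) // 50) + 1, 6)
```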
![Page 85: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/85.jpg)
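Prosody System: Models for LID – Markov Model

Example: the duration quantizer $Q_D$ assigns three consecutive pitch contours to states.

| Input vector | Duration | State | Vector modeled in the state |
|---|---|---|---|
| $v_1 = [d_1, a_{11}, a_{21}]^T$ | $d_1$ = 55 ms | D1 | $v_1 = [a_{11}, a_{21}]^T$ |
| $v_2 = [d_2, a_{12}, a_{22}]^T$ | $d_2$ = 185 ms | D3 | $v_2 = [a_{12}, a_{22}]^T$ |
| $v_3 = [d_3, a_{13}, a_{23}]^T$ | $d_3$ = 130 ms | D2 | $v_3 = [a_{13}, a_{23}]^T$ |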
![Page 86: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/86.jpg)
Prosody System: Models for LID – Markov Model

[Figure: ergodic Markov model with states D1 to D6]

- Each state is modeled by an 8-component GMM.
- Transition probabilities are estimated by the ML criterion; they can be bi-gram, tri-gram, or a mixture of bi-grams.
![Page 87: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/87.jpg)
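Prosody System: Models for LID – Bi-gram

$$\begin{aligned}
L_{Bi} &= \sum_{t=1}^{T} \log p(v_t) \\
&= \sum_{t=1}^{T} \left[ \gamma \log p(v_t;\, D = \hat{d}_t) + (1-\gamma) \log p(\hat{d}_t \mid \hat{d}_{t-1}) \right] \\
&= \sum_{t=1}^{T} \left[ \gamma \log \sum_{n=1}^{N} w_n^{\hat{d}_t}\, \mathcal{N}\!\left(v_t;\, \mu_n^{\hat{d}_t}, \Sigma_n^{\hat{d}_t}\right) + (1-\gamma) \log p(\hat{d}_t \mid \hat{d}_{t-1}) \right]
\end{aligned}$$

where $0 \le \gamma \le 1$, $\gamma$ is a function of $L_1$ and $L_2$, and $v_t = [a_{1t}, a_{2t}]^T$.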
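A minimal sketch of the bi-gram score, assuming the state-conditional log-likelihoods have already been computed by the per-state GMMs. The weight name `gamma`, the initial-state term `log_init` used for the first contour, and the function name are assumptions for illustration.

```python
import numpy as np

def bigram_score(obs_loglik, states, log_trans, log_init, gamma=0.5):
    """obs_loglik[t]: log p(v_t; D = states[t]);
    states[t]: quantized duration index (0-based);
    log_trans[i, j]: log p(state j | previous state i)."""
    total = 0.0
    for t, s in enumerate(states):
        # the first contour has no predecessor, so an initial distribution is assumed
        trans = log_init[s] if t == 0 else log_trans[states[t - 1], s]
        total += gamma * obs_loglik[t] + (1.0 - gamma) * trans
    return total
```

Setting `gamma=1.0` ignores the transitions and keeps only the state-conditional GMM terms.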
![Page 88: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/88.jpg)
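Prosody System: Models for LID – Tri-gram

$$\begin{aligned}
L_{Tri} &= \sum_{t=1}^{T} \log p(v_t) \\
&= \sum_{t=1}^{T} \left[ \gamma \log p(v_t;\, D = \hat{d}_t) + (1-\gamma) \log p(\hat{d}_t \mid \hat{d}_{t-1}, \hat{d}_{t-2}) \right]
\end{aligned}$$

where $v_t = [a_{1t}, a_{2t}]^T$.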
![Page 89: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/89.jpg)
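Prosody System: Models for LID – Mixture of Bi-grams

- Approximate the tri-gram with a mixture of bi-grams.
- This overcomes the problem of insufficient training data when training the tri-gram model.

$$p(\hat{d}_t \mid \hat{d}_{t-1}, \hat{d}_{t-2}) \approx \sum_{n=1}^{N} \lambda_n\, p(\hat{d}_t \mid \hat{d}_{t-n})$$

where $0 \le \lambda_n \le 1$ for all $n$ and $\sum_{n} \lambda_n = 1$.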
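The mixture of bi-grams can be sketched as a weighted sum of lag-specific transition tables. The names `lag_trans` and `lambdas` and the table layout are assumptions for illustration; how the weights are trained is not covered here.

```python
import numpy as np

def mixture_transition_prob(history, cur, lag_trans, lambdas):
    """Approximate p(d_t = cur | history) by a weighted sum of lag-n bi-grams.
    history: previous states, most recent last;
    lag_trans[n-1][i, j]: p(d_t = j | d_{t-n} = i); lambdas sum to 1."""
    return sum(lam * A[history[-n], cur]
               for n, (lam, A) in enumerate(zip(lambdas, lag_trans), start=1))
```

With two lags this needs two 6x6 tables instead of one 6x6x6 tri-gram table, which is why it is easier to estimate from limited data.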
![Page 90: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/90.jpg)
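Prosody System: Models for LID – Mixture of Bi-grams

$$L_{Mix} = \sum_{t=1}^{T} \log p(v_t)$$

Each term combines the state-conditional likelihood $\log p(v_t;\, D = \hat{d}_t)$ with two weighted bi-gram terms, $\log p(\hat{d}_t \mid \hat{d}_{t-1})$ and $\log p(\hat{d}_t \mid \hat{d}_{t-2})$.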
![Page 91: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/91.jpg)
Prosody System: Models for LID – Mixture of Bi-grams

[Figure: mixture of bi-grams linking the quantized durations $\hat{d}_{t-3}$, $\hat{d}_{t-2}$, $\hat{d}_{t-1}$, and $\hat{d}_t$]
![Page 92: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/92.jpg)
Prosody System: Pair-wise LID Task

- 45 pair-wise language identification tasks (all pairs of the 10 languages).
- 10-sec and 45-sec utterances:
  - 10-sec: HTC, HTL, ROO, MEA (domain-specific utterances)
  - 45-sec: STB (unrestricted-domain utterances)
![Page 93: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/93.jpg)
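Prosody System: Pair-wise LID Task (avg. 45 pairs)

| | GMM | Dynamic/Bigram | Dynamic/Trigram | Dynamic/Mix |
|---|---|---|---|---|
| 45-sec | 68.91% | 80.23% (16.43%) | 79.62% (15.54%) | 81.35% (18.05%) |
| 10-sec | 65.45% | 69.83% (6.69%) | 68.84% (5.18%) | 70.02% (6.98%) |

Values in parentheses are relative improvements over the GMM:

$$\text{Relative Improvement} = \frac{\text{Rate}_{Dynamic} - \text{Rate}_{GMM}}{\text{Rate}_{GMM}}$$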
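As a worked check of the relative-improvement figures, take the 45-sec Dynamic/Bigram entry (80.23%) against the 45-sec GMM baseline (68.91%):

$$\frac{80.23 - 68.91}{68.91} = \frac{11.32}{68.91} \approx 0.1643 = 16.43\%$$

The same computation on the 10-sec Dynamic/Mix entry gives $(70.02 - 65.45)/65.45 \approx 6.98\%$.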
![Page 94: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/94.jpg)
Prosody System: Pair-wise LID Task on 45-sec ( L vs {others} )

| 45s | GMM | DMix | Rel. |
|---|---|---|---|
| EN | 67.03 | 81.84 | 22.09 |
| FA | 74.48 | 85.05 | 14.18 |
| FR | 61.00 | 71.51 | 17.23 |
| GE | 63.77 | 84.65 | 32.75 |
| JA | 79.10 | 86.08 | 8.82 |
| KO | 67.67 | 82.71 | 22.22 |
| MA | 76.54 | 83.41 | 8.97 |
| SP | 61.26 | 73.31 | 19.67 |
| TA | 63.05 | 76.75 | 21.73 |
| VI | 74.82 | 88.21 | 17.90 |
![Page 95: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/95.jpg)
Prosody System: Pair-wise LID Task on 10-sec ( L vs {others} )

| 10s | GMM | DMix | Rel. |
|---|---|---|---|
| EN | 59.31 | 65.99 | 11.26 |
| FA | 68.05 | 70.98 | 4.30 |
| FR | 60.00 | 66.91 | 11.52 |
| GE | 61.32 | 70.02 | 14.19 |
| JA | 81.60 | 79.48 | -3.83 |
| KO | 64.47 | 69.57 | 7.91 |
| MA | 71.69 | 73.09 | 1.94 |
| SP | 58.32 | 64.23 | 10.13 |
| TA | 61.53 | 67.71 | 10.05 |
| VI | 68.20 | 73.24 | 7.39 |
![Page 96: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/96.jpg)
Prosody System: Pair-wise LID Task

- Stress-timed languages, like English and German, benefit from this dynamic topology.
- Syllable-timed languages, like French and Spanish, also benefit from this topology, but their identification rates are still not good enough.
- Pitch-accent and tonal languages improve only a little.
![Page 97: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/97.jpg)
PRLM System: Front-end Phone Recognizer

- Design a front-end English phone recognizer.
- 48 phonetic units are selected from the TIMIT database.
- Each phonetic unit is modeled by a 3-state left-to-right monophone HMM.
![Page 98: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/98.jpg)
PRLM System: 48 phonetic units from TIMIT

| Category | Count | Units |
|---|---|---|
| Stops | 6 | b d g p t k |
| Affricates | 2 | jh ch |
| Fricatives | 8 | s sh z zh f th v dh |
| Nasals | 6 | m n ng em en eng |
| Semivowels & Glides | 6 | l r w y hh el |
| Vowels | 18 | iy ih eh ey ae aa aw ay ah ao oy ow uh uw er ax ix axr |
| Non-speech | 2 | sil, non-phonetic (pau, epi, h#) |
![Page 99: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/99.jpg)
PRLM System: Training Phase

- Use the English phone recognizer mentioned above, with a null-gram language model, to decode the utterances from the training set of the OGI-TS corpus.
- For each language, the corresponding language model is trained on the decoded phone sequences of that language.
![Page 100: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/100.jpg)
PRLM System: Training Phase

[Figure: training-phase block diagram. Utterances of each language (English, French, ..., Mandarin) from the OGI-TS training set are decoded by the English phone recognizer into phone sequences such as /a/, /m/, …; /aa/, /en/, …; /jh/, /ey/, …; /b/, /ae/, …. These sequences train the English, French, Spanish, and Mandarin language models. A separate block is labeled "Other Corpus".]

- Do NOT use any language model while decoding.
- These language models will be used in the evaluation phase.
![Page 101: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/101.jpg)
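PRLM System: Evaluation Phase

[Figure: evaluation-phase block diagram. An utterance in an unknown language is decoded by the English phone recognizer into a phone sequence (/a/, /m/, /sh/, …), which is scored by each language model (English LM, French LM, Spanish LM, Mandarin LM, …). The resulting LM scores (EN, FR, SP, MA, …) go to a pick-max rule or a back-end classifier, which outputs the decision, e.g. "It’s French!".]

- Do NOT use any language model while decoding.
- Only the phone sequence is used; the accompanying acoustic scores contribute little.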
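The train-then-score flow of PRLM can be sketched with plain phone bi-gram models. This is an illustration under assumptions: the phone recognizer is taken as given (the inputs are already decoded phone sequences), add-one smoothing is used only to keep the example short, and the function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram_lm(sequences, vocab):
    """Return a function giving add-one-smoothed log P(cur | prev),
    estimated from decoded phone sequences of one language."""
    pair_counts, prev_counts = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq[:-1], seq[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    size = len(vocab)
    return lambda prev, cur: math.log(
        (pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + size))

def identify(phones, models):
    """Score a decoded phone sequence with every language model and pick the max."""
    scores = {lang: sum(lm(p, c) for p, c in zip(phones[:-1], phones[1:]))
              for lang, lm in models.items()}
    return max(scores, key=scores.get), scores
```

A back-end classifier, as mentioned on the slide, would take the whole `scores` dictionary as input instead of only its maximum.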
![Page 102: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/102.jpg)
PRLM System: Pair-wise LID Task

| | PRLM Bigram (phone) | Dynamic/Mix (pitch) | Muthusamy’s work (broad phonetic) |
|---|---|---|---|
| 45-sec | 91.05% | 81.35% | 85.2% |
| 10-sec | 83.14% | 70.02% | 75.8% |
![Page 103: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/103.jpg)
PRLM System: Pair-wise LID Task (avg. 45 pairs) ( L vs {others} )

| 45s | PRLM | DMix |
|---|---|---|
| EN | 94.55% | 81.84% |
| FA | 94.49% | 85.05% |
| FR | 87.94% | 71.51% |
| GE | 92.64% | 84.65% |
| JA | 91.80% | 86.08% |
| KO | 90.36% | 82.71% |
| MA | 90.62% | 83.41% |
| SP | 85.54% | 73.31% |
| TA | 95.02% | 76.75% |
| VI | 86.39% | 88.21% |
![Page 104: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/104.jpg)
PRLM System: Pair-wise LID Task (avg. 10 pairs) ( L vs {others} )

| 10s | PRLM | DMix |
|---|---|---|
| EN | 85.70% | 65.99% |
| FA | 84.05% | 70.98% |
| FR | 83.38% | 66.91% |
| GE | 85.32% | 70.02% |
| JA | 86.51% | 79.48% |
| KO | 84.50% | 69.57% |
| MA | 87.16% | 73.09% |
| SP | 79.13% | 64.23% |
| TA | 87.19% | 67.71% |
| VI | 81.57% | 73.24% |
![Page 105: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/105.jpg)
Prosody vs PRLM: 45-sec utterances

[Figure: bar chart "Performance on 45s Utterances"; x-axis: Language (EN, FA, FR, GE, JA, KO, MA, SP, TA, VI); y-axis: Identification Rate (0 to 100); series: PRLM, Prosody]
![Page 106: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/106.jpg)
Prosody vs PRLM: 10-sec utterances

[Figure: bar chart "Performance on 10s Utterances"; x-axis: Language (EN, FA, FR, GE, JA, KO, MA, SP, TA, VI); y-axis: Identification Rate (0 to 100); series: PRLM, Prosody]
![Page 107: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/107.jpg)
Prosody vs PRLM

Error reduction rate when the length of the testing utterance increases from 10s to 45s:

| Language | PRLM | Prosody |
|---|---|---|
| EN | 61.89% | 46.60% |
| FA | 65.45% | 48.48% |
| FR | 27.43% | 13.90% |
| GE | 49.86% | 48.80% |
| JA | 39.21% | 32.16% |
| KO | 37.80% | 43.18% |
| MA | 26.95% | 38.35% |
| SP | 30.71% | 25.38% |
| TA | 61.12% | 27.99% |
| VI | 26.15% | 55.94% |

PRLM benefits more from longer utterances for most languages (7 of 10; the exceptions are KO, MA, and VI).
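The slides do not spell out how the error reduction rate is computed. The sketch below assumes the usual definition, the relative drop in error rate (100 minus the identification rate); this assumption reproduces the EN entries from the 10-sec and 45-sec results, but the function name and the definition itself are not taken from the source.

```python
def error_reduction(rate_10s, rate_45s):
    """Relative reduction (in %) of the error rate when going from 10s to 45s.
    Both arguments are identification rates in percent."""
    err_10 = 100.0 - rate_10s
    err_45 = 100.0 - rate_45s
    return 100.0 * (err_10 - err_45) / err_10
```

For example, the EN PRLM rates of 85.70% (10s) and 94.55% (45s) correspond to error rates of 14.30% and 5.45%.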
![Page 108: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/108.jpg)
GMM-Tokenizer System: Introduction

- Simplified version of PRLM
- Uses a GMM-Tokenizer instead of a phone recognizer as front-end processing.
- Does not need any transcription in the training set.
![Page 109: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/109.jpg)
GMM-Tokenizer SystemIntroduction
GMM Tokenizer
![Page 110: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/110.jpg)
GMM-Tokenizer System: Introduction

- 38-dim MFCC
  - 12 cepstra
  - 12 delta cepstra
  - 12 delta-delta cepstra
  - Delta energy
  - Delta-delta energy
- 30-dim shifted-delta-cepstrum (SDC), (N, d, p, k) = (10, 1, 3, 3)
![Page 111: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/111.jpg)
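GMM-Tokenizer System: Introduction

SDC parameters:

- N: dimension of the cepstral parameters computed from a single frame
- d: frame span used for the delta (difference) computation
- p: frame distance between the concatenated delta vectors
- k: number of concatenated delta vectors

Delta parameter:

$$\Delta c_t = c_{t+d} - c_{t-d}$$

Shifted delta parameter: the $k$ delta vectors $\Delta c_t, \Delta c_{t+p}, \dots, \Delta c_{t+(k-1)p}$ are concatenated into one feature vector.

[Figure: example with N = 10, d = 2, p = 3, k = 3, showing the cepstra $c_{t-2}, \dots, c_{t+2}$ and the deltas $\Delta c_{t-2}, \dots, \Delta c_{t+4}$]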
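The stacking of shifted deltas can be sketched as follows, assuming the common SDC formulation in which the $i$-th block is $c_{t+ip+d} - c_{t+ip-d}$. The function name and the choice to drop frames near the edges (rather than pad) are assumptions for illustration.

```python
import numpy as np

def shifted_delta_cepstra(c, d=1, p=3, k=3):
    """c: (T, N) cepstral vectors. Returns an array of SDC vectors of
    dimension N*k for every frame where all required neighbours exist."""
    c = np.asarray(c, float)
    T = len(c)
    out = []
    # the last block needs frame t + (k-1)*p + d, the first needs t - d
    for t in range(d, T - (k - 1) * p - d):
        blocks = [c[t + i * p + d] - c[t + i * p - d] for i in range(k)]
        out.append(np.concatenate(blocks))
    return np.array(out)
```

With N = 10 and k = 3 this yields 30-dimensional vectors, matching the SDC dimension listed for this system.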
![Page 112: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/112.jpg)
GMM-Tokenizer System: Introduction

[Figure: Speech is converted into a token sequence, e.g. 2221321113323111123213213…]

$$\hat{P}(w_t \mid w_{t-1}) = \alpha_2\, P(w_t \mid w_{t-1}) + \alpha_1\, P(w_t) + \alpha_0$$
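One common way to realize a GMM tokenizer is to label each frame with the index of its highest-scoring mixture component. The sketch below assumes that rule, diagonal covariances, and 1-based token indices (to match the digits in the token sequence above); none of these details are confirmed by the slides.

```python
import numpy as np

def gmm_tokenize(frames, weights, means, variances):
    """Return, for each frame, the 1-based index of the mixture component
    with the highest weighted likelihood."""
    X = np.asarray(frames, float)[:, None, :]          # (T, 1, D)
    means = np.asarray(means, float)[None]             # (1, N, D)
    variances = np.asarray(variances, float)[None]     # (1, N, D)
    log_score = (np.log(weights)[None]
                 - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=2)
                 - 0.5 * np.sum((X - means) ** 2 / variances, axis=2))
    return (np.argmax(log_score, axis=1) + 1).tolist()
```

The resulting token sequences play the role of the decoded phone sequences in PRLM: one language model per language is trained on them.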
![Page 113: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/113.jpg)
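GMM-Tokenizer System: Introduction

[Figure: the input voice data is divided at the boundaries $b_0, b_1, b_2, \dots, b_{m-1}, b_m$ into segments with center values $c_1, c_2, \dots, c_m$, which yield the new token index]

- $b_i$: segmentation position
- $m$: number of segments
- $c_i$: center value of the segment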
![Page 114: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/114.jpg)
GMM-Tokenizer System: Results on 45s utterances

| GMM Tokenizer | 3 Lang | 6 Lang | 11 Lang |
|---|---|---|---|
| No Seg. | 90.00% | 62.50% | 49.55% |
| Seg. | 91.67% | 65.83% | 57.73% |
![Page 115: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/115.jpg)
Future Work
Try to combine these two systems in order to further improve the performance.
![Page 116: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/116.jpg)
The Future of LID task
![Page 117: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/117.jpg)
LID task in the future

The 1990s were the golden age of LID research. Many methods were proposed and shown to be successful during that time. However, some problems still need further investigation.
![Page 118: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/118.jpg)
LID task in the future

[Figure: block diagram. Speech passes through Pre-processing, the LID System, and Post-processing to yield the Hypothesized Language; a Language-Dependent Knowledge block is also shown.]
![Page 119: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/119.jpg)
LID task in the future: pre-processing phase

- Segmentation methods applied to the language identification task.
  - Does a language-independent segmentation method exist?
  - If it exists, it can be further argued whether this kind of segmentation is suitable for the LID task.
![Page 120: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/120.jpg)
LID task in the future

- More language-specific knowledge.
  - So far, the highest-performing LID systems rely heavily on a HUGE amount of training data.
![Page 121: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/121.jpg)
LID task in the future

- Prosody-based systems
  - No suitable hierarchy for prosody exists so far.
  - How should prosodic information be modeled?
![Page 122: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/122.jpg)
LID task in the future: post-processing phase

- Apply decision fusion strategies to combine several systems.
  - The complexity of the sub-systems can be reduced.
  - Fusion strategies are somewhat different from those in the distributed detection problem.
![Page 123: NGSST 2006 冬季講習會 Automatic Language Identification Overview & Some Experiments on OGI-TS Corpus National Tsing Hua University Chi-Yueh Lin 2006/1/19](https://reader030.vdocuments.pub/reader030/viewer/2022032707/56649e375503460f94b2699d/html5/thumbnails/123.jpg)
Thank You !