VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Đàm Tiến Dũng

A STUDY OF TECHNIQUES IN SPEECH RECOGNITION

UNDERGRADUATE THESIS (FULL-TIME PROGRAM)

Major: Computer Science

HANOI - 2010


  • VIETNAM NATIONAL UNIVERSITY, HANOI

    UNIVERSITY OF ENGINEERING AND TECHNOLOGY

    Đàm Tiến Dũng

    A STUDY OF TECHNIQUES IN SPEECH RECOGNITION

    UNDERGRADUATE THESIS (FULL-TIME PROGRAM)

    Major: Information Technology

    Supervisor: Dr. Le Anh Cuong

    HANOI - 2010

  • ACKNOWLEDGMENTS

    First, I would like to express my deep gratitude to Dr. Le Anh Cuong (University of Engineering and Technology, Vietnam National University, Hanoi), who closely and devotedly guided me throughout the work on this thesis.

    Next, I sincerely thank Dr. Le Sy Vinh (University of Engineering and Technology, Vietnam National University, Hanoi), who enthusiastically helped me and gave me useful advice for completing the thesis.

    I would also like to thank Ms. Luong Chi Mai and Mr. Vu Tat Thang (Vietnam Academy of Science and Technology), who readily answered my questions and shared their experience whenever I was stuck in my research.

    My thanks also go to my research partner, Ha Thanh Tung, who stood by me through many difficulties while this thesis was being completed.

    Finally, I am endlessly grateful to my parents, my brothers and sisters, and my friends, who were always at my side to encourage and support me through the difficulties of this work.

    Hanoi, May 2010.

    Student

    Đàm Tiến Dũng

  • Speech recognition is an important research field with many practical applications. Many approaches to speech recognition have been proposed, and each technique has its own strengths and weaknesses. This thesis introduces several representative techniques: MFCC feature extraction, recognition with hidden Markov models, and recognition by template matching. Alongside the theoretical study, I built an experimental speech recognition system based on these techniques in order to check their correctness and to compare them. Finally, based on this study, I propose some directions for further development and research, together with applications of the work to practical problems.

  • TABLE OF CONTENTS

    Chapter 1. INTRODUCTION
    1.1. INTRODUCTION TO THE SPEECH RECOGNITION PROBLEM
    1.2. OBJECTIVES OF THE STUDY

    Chapter 2. SPEECH AND ITS REPRESENTATION
    2.1. PHONEMES AND SPEECH PRODUCTION
    2.2. PHONETIC TRANSCRIPTION
    2.3. REPRESENTING SPEECH SIGNALS IN A COMPUTER

    Chapter 3. MFCC FEATURE EXTRACTION IN SPEECH RECOGNITION
    3.1. DEFINITION
    3.2. MFCC FEATURE EXTRACTION
    3.2.1. Pre-emphasis
    3.2.2. Windowing
    3.2.3. DFT (Discrete Fourier Transform)
    3.2.4. Mel filter bank and log
    3.2.5. DCT (Discrete Cosine Transform)
    3.2.6. Feature extraction

    Chapter 4. SPEECH RECOGNITION USING HIDDEN MARKOV MODELS
    4.1. INTRODUCTION TO HIDDEN MARKOV MODELS
    4.2. THE MAIN PROBLEMS OF HMMs
    4.2.1. Evaluating the model probability
    4.2.2. Recognition
    4.2.3. Training
    4.3. HIDDEN MARKOV MODELS FOR SPEECH RECOGNITION
    4.3.1. Building a hidden Markov model for speech recognition
    4.3.2. Computing the acoustic likelihood bj(ot)
    4.3.3. Embedded training
    4.4. COMPUTING PROBABILITIES IN PRACTICE

    Chapter 5. SPEECH RECOGNITION BY TEMPLATE MATCHING
    5.1. THE DYNAMIC TIME WARPING (DTW) ALGORITHM
    5.2. APPLICATION TO SPEECH RECOGNITION

    Chapter 6. EXPERIMENTAL RESULTS
    6.1. DESCRIPTION OF THE EXPERIMENTS
    6.2. RESULTS
    6.3. EVALUATION OF THE RESULTS

    Chapter 7. CONCLUSION
    7.1. SUMMARY
    7.2. FUTURE WORK

    APPENDIX
    REFERENCES

  • LIST OF TABLES

    Table 1: Results of some speech recognition systems around the world
    Table 2: Initialization of the parameter π
    Table 3: Initialization of the parameters aij
    Table 4: Dictionary and transcriptions of the Vietnamese digits 0-9
    Table 5: Experimental results on the high-uniformity speech data set
    Table 6: Experimental results on the low-uniformity speech data set

  • LIST OF FIGURES

    Figure 1: Digitizing an audio signal
    Figure 2: The steps of MFCC feature extraction
    Figure 3: Cutting the signal with a sliding window
    Figure 4: Relationship between the linear frequency scale and the Mel scale
    Figure 5: An HMM with N=5
    Figure 6: An HMM with N=3, T=5
    Figure 7: Computing the Forward probability [2]
    Figure 8: Computing the Backward probability [2]
    Figure 9: HMM for the word ONE
    Figure 10: Variation within the phoneme ah
    Figure 11: Three-state HMM for the word ONE
    Figure 12: Spectrum of the two words ONE TWO spoken in succession
    Figure 13: Combined HMM
    Figure 14: HMM for the word ONE
    Figure 15: The DTW algorithm

  • LIST OF ABBREVIATIONS

    Abbreviation    Full name
    DTW             Dynamic Time Warping
    HMM             Hidden Markov Model
    IPA             International Phonetic Alphabet
    MFCC            Mel-Frequency Cepstral Coefficients
    NNs             Neural Networks

  • Chapter 1. INTRODUCTION

    This chapter presents the need for this research topic, its objectives, and its scientific and practical significance.

    1.1. INTRODUCTION TO THE SPEECH RECOGNITION PROBLEM

    In everyday life, natural speech is the simplest, most effective and most common means of communication between people. Speech is familiar to us from birth, and its importance in daily life is undeniable. Yet today, although machines are everywhere around us, the most basic form of communication between people and machines is still commands and instructions typed on a keyboard. Such commands are rigid and hard to remember, and typing is usually slower than speaking. Imagine how much easier life would be if we could communicate with machines in natural speech: we could dictate documents, dial phone numbers, or search the Internet by voice instead of by hand.

    From a machine learning point of view, speech recognition is a complex pattern recognition problem. Its goal is to classify the input speech signal into a sequence of previously learned patterns, where a pattern may be a word or a phoneme (the smallest distinguishable unit from which words are built). Speech recognition problems are usually divided along the following lines [19]:

    - Isolated-word / continuous speech recognition

    - Speaker-dependent / speaker-independent speech recognition

    - Small-vocabulary / large-vocabulary speech recognition

    Speech recognition has attracted growing attention in recent years. Many techniques have been developed, such as LPC and MFCC feature extraction, and recognition techniques such as hidden Markov models (HMM), neural networks (NNs) and dynamic time warping (DTW). The table below shows the results achieved by some current speech recognition systems around the world [7]:

    Table 1: Results of some speech recognition systems around the world

    System                                  Vocabulary size          Error rate (%)
    TI Digits                               11 (English digits)      0.5
    Wall Street Journal read speech         5,000                    3
    Wall Street Journal read speech         20,000                   3
    Broadcast News                          64,000+                  10
    Conversation Telephone Speech (CTS)     64,000+                  20

    Although many techniques have been proposed, what has been achieved is still not enough for speech to fully replace typed commands in human-machine communication. Even so, the results so far already solve quite a few practical problems. Some mobile phones can dial automatically when the user speaks the name of a contact in the phone book. People can control robots by voice, although the instructions are usually short and belong to a finite set of trained commands.

    1.2. OBJECTIVES OF THE STUDY

    Today, the main research direction in speech recognition is to make recognition systems independent of the speaker, the vocabulary and the environment. In Vietnam, there is currently little research on speech recognition, and it usually focuses on small vocabularies aimed at specific practical problems.

    Building on the existing theory, this study has the following objectives:

    - To study the techniques used in speech recognition, focusing on two main parts: MFCC feature extraction and recognition with hidden Markov models.

    - To implement a speech recognition system for the Vietnamese digits 0 to 9, using both hidden Markov models and template matching on top of MFCC features, and then to compare the results and draw conclusions about these methods.

    - To propose directions for further development after this study.

  • Chapter 2. SPEECH AND ITS REPRESENTATION

    Recognition systems, whether for handwriting or for speech, all try to model as closely as possible the process by which the object to be recognized is produced in reality. Therefore, before studying speech recognition techniques, we need some background on speech itself: how speech is produced and how it is represented.

    2.1. PHONEMES AND SPEECH PRODUCTION

    In phonetics, a phoneme is the smallest segmental unit of speech used to build meaningful words. In other words, a phoneme is the smallest distinguishable unit of speech. A spoken word is therefore a sequence of consecutive phonemes.

    Human speech is produced by the combined action of the organs of the vocal apparatus, such as the tongue, throat, lips, teeth and nose. When these organs are in different positions, different sounds are produced. We can therefore distinguish one sound from another by the way the articulators are combined, or by their positions during pronunciation.

    2.2. PHONETIC TRANSCRIPTION (1)

    The International Phonetic Association devised the International Phonetic Alphabet (IPA), based on the properties of phonemes and of speech production. IPA is a standard system of Latin-based symbols used to represent speech, in which each symbol corresponds to one phoneme. Representing speech with these symbols is called phonetic transcription. For example, the English word PEN is transcribed as /pɛn/. Phonetic transcription is, in other words, the representation of speech in written form.

    On a computer, however, plain ASCII text cannot represent all of these symbols, so each IPA symbol is written as a single ASCII character or a group of ASCII characters written together. For American English, the ARPABET alphabet is used for transcription: each ASCII character or character group in ARPABET corresponds to one IPA symbol. For example, the word PEN is transcribed in ARPABET as /p eh n/.

    (1) See Chapter 6 of Speech and Language Processing [7] and Chapter 5 of Pattern Recognition in Speech and Language Processing [4].

  • 2.3. REPRESENTING SPEECH SIGNALS IN A COMPUTER

    An audio signal in general, and speech in particular, can be drawn as a graph with time on the horizontal axis and intensity on the vertical axis. The value at a point on the graph is the intensity of the sound at that instant.

    Real-world audio is a continuous, or analog, signal. Before any processing can be done, the signal must be digitized. Recording devices do this automatically by taking the value of the input signal at successive instants, which is called sampling. After sampling, the audio signal becomes a digital, discrete signal.

    Figure 1: Digitizing an audio signal

    Thus any audio signal fed into a computer is a sequence of consecutive samples, each being the amplitude of the signal at a particular instant. An important parameter of sampling is the sampling frequency, Fs, the number of samples taken per second. To measure a component accurately, at least two samples are needed per cycle of the analog input, so the sampling frequency must be more than twice the highest frequency in the input signal. In practice, most of the information in human speech lies at frequencies below 10,000 Hz, so a sampling frequency of 20,000 Hz is sufficient for high-accuracy recognition. For speech recognition over the telephone, a sampling frequency of 8,000 Hz is enough, because the telephone network only transmits frequencies below 4,000 Hz. Recording devices commonly use a sampling frequency of 16,000 Hz.

  • Chapter 3. MFCC FEATURE EXTRACTION IN SPEECH RECOGNITION (2)

    In pattern recognition problems in general, the feature extraction method plays a decisive role in the accuracy of the system, so choosing a good one deserves particular attention. This chapter introduces MFCC, a widely used and effective feature extraction method for speech recognition.

    3.1. DEFINITION

    In speech recognition, feature extraction parameterizes the input audio signal: it converts the signal into a sequence of feature vectors, each consisting of n real values, where n depends on the extraction method [6]. Many feature extraction methods exist today, such as LPC (Linear Predictive Coding), AMDF (Average Magnitude Difference Function), MFCC (Mel-Frequency Cepstral Coefficients), and combinations of these. The next section describes MFCC feature extraction in detail.

    3.2. MFCC FEATURE EXTRACTION

    MFCC (Mel-frequency cepstral coefficients) is the most widely used feature extraction technique in speech recognition. It maps the input audio data onto the Mel frequency scale, a scale that better reflects the sensitivity of the human ear to sound.

    The technique consists of a chain of transformations, shown in Figure 2, in which the output of each step is the input of the next. The input to the whole process is a segment of digitized speech. Each processing step is described in detail below. (3)

    (2) Part of the content of this chapter was studied jointly with the student Ha Thanh Tung, in the thesis "A Study of Features in Vietnamese Speech Recognition", 2010.

    (3) See Chapter 9 of Speech and Language Processing [7].

  • Figure 2: The steps of MFCC feature extraction

    3.2.1. Pre-emphasis

    Pre-emphasis is the first step of MFCC feature extraction. It boosts the amplitude of the high-frequency components in order to increase the energy in the high-frequency region. This makes the information in that region clearer to the acoustic model and improves the accuracy of recognizing individual sounds.
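    The thesis does not state which filter is used for pre-emphasis. A minimal sketch, assuming the common first-order filter y[n] = x[n] - alpha * x[n-1] with alpha = 0.97 (both the filter form and the value of alpha are assumptions, as is the function name `pre_emphasis`):

```python
def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] (assumed filter form).

    The first sample is passed through unchanged because it has
    no predecessor.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (low-frequency) signal is strongly attenuated,
# while a sign-alternating (high-frequency) signal is amplified.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
fast = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

    In this sketch the constant signal shrinks to about 3% of its amplitude after the first sample, while the alternating signal nearly doubles, which is the high-frequency boost described above.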

    3.2.2. Windowing

    To improve recognition accuracy, each word in an utterance is analysed into phonemes and the system recognizes the individual phonemes. Features must therefore be extracted at the level of phonemes rather than over a whole long utterance. Windowing cuts the input audio signal into short pieces called frames. Each frame is later recognized as (part of) a phoneme. A second reason for windowing is that the audio signal changes very quickly, so properties such as amplitude and period are not stable over long stretches. Within a short segment, however, the signal can be treated as stationary: its characteristics do not change over time.

    To do this, we slide a window along the audio signal and cut out the part of the signal that falls inside it. A window is defined by the following parameters:

    - Frame size: the width of the window, which is also the length of each frame that is cut out.

    - Frame shift: the step of the window, that is, the distance the window slides before cutting out the next frame.
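    The two parameters above can be illustrated with a short sketch. The function name `split_into_frames`, the policy of dropping an incomplete final frame, and the example values (25 ms frames with a 10 ms shift at 16,000 Hz) are assumptions chosen for illustration, not values taken from the thesis:

```python
def split_into_frames(samples, frame_size, frame_shift):
    """Cut `samples` into overlapping frames.

    frame_size  -- number of samples per frame (window width)
    frame_shift -- number of samples the window advances each step
    A trailing piece shorter than frame_size is dropped (assumption).
    """
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frames.append(samples[start:start + frame_size])
        start += frame_shift
    return frames


# One second of audio at 16,000 Hz, 25 ms frames, 10 ms shift.
sampling_rate = 16000
signal = [0.0] * sampling_rate
frames = split_into_frames(signal, frame_size=400, frame_shift=160)
```

    Because the shift is smaller than the frame size, consecutive frames overlap, so no part of the signal is seen only at the edge of a window.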

  • Figure 3: Cutting the signal with a sliding window

    Each frame is then multiplied, sample by sample, by a coefficient whose value depends on the type of window:

    $$y[n] = x[n]\,w[n]$$

    where x[n] is the value of the n-th sample, y[n] is the value of the n-th sample after weighting, and w[n] is the coefficient for the n-th sample.

    The simplest window is the rectangular window, whose coefficients w[n] are given by

    $$w[n] = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    where N is the number of samples in a frame.

    A window more commonly used in MFCC feature extraction is the Hamming window. With this window, the signal values taper toward zero near the two edges of the frame. The Hamming window coefficients are

    $$w[n] = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

    3.2.3. DFT (Discrete Fourier Transform)

    The next step applies the discrete Fourier transform to each frame that was cut out. This transform takes the signal into the frequency domain.

    The discrete Fourier transform is defined as

    $$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$$

    where x[n] is the value of the n-th sample in the frame, X[k] is a complex number representing the magnitude and phase of one frequency component of the original signal, and N is the number of samples in a frame. In practice the FFT (Fast Fourier Transform) is used instead of evaluating the DFT directly. The FFT is much faster, but the common radix-2 form of the algorithm requires N to be a power of 2.
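    As a rough illustration of the windowing and transform steps, the sketch below builds a Hamming window, evaluates the DFT directly from its definition, and also implements a recursive radix-2 FFT that requires a power-of-two length. The function names are hypothetical, and the Hamming window uses the symmetric form with N-1 in the denominator, which is an assumption about the exact variant intended:

```python
import cmath
import math


def hamming(N):
    """Hamming coefficients w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1)), N > 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of 2")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


# Window one 8-sample frame, then transform it.
frame = [math.cos(2 * math.pi * n / 8) for n in range(8)]
windowed = [s * w for s, w in zip(frame, hamming(8))]
spectrum = fft(windowed)
```

    Both functions compute the same transform; the FFT only reorganizes the sums so that the cost grows as N log N instead of N squared.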

    Because X[k] is a complex number, its magnitude is used:

    $$|X[k]| = \sqrt{\operatorname{Re}(X[k])^2 + \operatorname{Im}(X[k])^2}$$

    3.2.4. Mel filter bank and log

    The result of the Fourier transform gives the energy of the signal in different frequency bands. The human ear, however, is not equally sensitive to all frequency bands, so modelling this property of the ear during feature extraction improves the recognition ability of the system. In MFCC feature extraction, frequencies are converted to the Mel scale with the formula

    $$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

    where f is the frequency on the linear scale and f_mel is the frequency on the Mel scale. A bank of filters is used to compute the Mel coefficients. The number of Mel coefficients equals the number of filters, and these coefficients are the input to the next step of MFCC feature extraction. The figure below shows the filter bank on the linear frequency scale and on the Mel scale [11]:

    Figure 4: Relationship between the linear frequency scale and the Mel scale
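    The Mel conversion formula can be written directly in code. The sketch below also shows one common way (an assumption, not stated in the thesis) to place filter boundaries: equally spaced on the Mel scale and mapped back to Hz with the inverse formula. The function names are hypothetical:

```python
import math


def hz_to_mel(f):
    """f_mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_frequencies(f_low, f_high, n_filters):
    """Return n_filters + 2 boundary frequencies (Hz), equally spaced in Mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


edges = mel_spaced_frequencies(0.0, 8000.0, 10)
```

    Because the spacing is uniform in Mel, the boundaries are dense at low frequencies and sparse at high frequencies, which mirrors the lower sensitivity of the ear to differences between high frequencies.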

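  • With M filters, the Mel coefficients are computed as

    $$Y[m] = \log\left(\sum_{k=0}^{N-1} |X[k]|\, H_m[k]\right), \quad m = 1, 2, \dots, M$$

    where H_m[k] is the weight applied to the k-th value of the frame for the m-th coefficient, and f[m] denotes the boundary points of the filters on the frequency axis. The weights are given by [13][16]:

    $$H_m[k] = \begin{cases} 0, & k < f[m-1] \\ \dfrac{2\,(k - f[m-1])}{(f[m+1] - f[m-1])\,(f[m] - f[m-1])}, & f[m-1] \le k \le f[m] \\ \dfrac{2\,(f[m+1] - k)}{(f[m+1] - f[m-1])\,(f[m+1] - f[m])}, & f[m] \le k \le f[m+1] \\ 0, & k > f[m+1] \end{cases}$$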
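    A small sketch of the triangular weights and the log filter outputs follows. It assumes that `f` is a strictly increasing list of boundary points expressed as DFT bin indices, with filters indexed m = 1 .. len(f) - 2, and that the filter input is the magnitude spectrum; these conventions and the function names are assumptions made for illustration:

```python
import math


def mel_filter_weight(k, m, f):
    """Triangular weight H_m[k] for boundary points f[m-1] < f[m] < f[m+1]."""
    left, center, right = f[m - 1], f[m], f[m + 1]
    if k < left or k > right:
        return 0.0
    if k <= center:
        return 2.0 * (k - left) / ((right - left) * (center - left))
    return 2.0 * (right - k) / ((right - left) * (right - center))


def log_mel_energies(magnitudes, f):
    """Log of the weighted sum of magnitudes for every filter m."""
    energies = []
    for m in range(1, len(f) - 1):
        total = sum(mag * mel_filter_weight(k, m, f)
                    for k, mag in enumerate(magnitudes))
        energies.append(math.log(total))  # total must be > 0
    return energies


boundaries = [0, 4, 8, 16]            # hypothetical bin indices
flat_spectrum = [1.0] * 17
coeffs = log_mel_energies(flat_spectrum, boundaries)
```

    The factor 2 / (f[m+1] - f[m-1]) normalizes each triangle to unit area, so wide high-frequency filters do not dominate narrow low-frequency ones.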

    3.2.5. DCT (Discrete Cosine Transform)

    The next step in feature extraction is the discrete cosine transform (DCT). After the Fourier transform, the signal is in the frequency domain, and filtering with the Mel filter bank condenses each frequency band into one real coefficient. These coefficients carry information about the fundamental frequency and the glottal pulses, that is, about the sound source. Such information is not important for distinguishing different sounds. What matters for a recognition system is the configuration of the vocal apparatus, such as the positions of the oral cavity, the nasal cavity and the tongue. The cosine transform separates the characteristics of the vocal apparatus from those of the sound source. In practice, only about the first 12 coefficients after the DCT are needed for speech recognition. The DCT is given by [14]:

    $$c[n] = \sum_{i=0}^{L-1} \ln\left(\tilde{S}[i]\right) \cos\left(\frac{\pi n}{2L}(2i+1)\right)$$

    Here $\tilde{S}[i]$ is the output of the i-th Mel filter and L is the number of filters.

    3.2.6. Feature extraction

    The coefficients obtained after the DCT are taken as features. Usually only the first 12 coefficients are used for recognition, because they are sufficient to tell two different sounds apart.
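    The cosine transform of the log filter outputs, truncated to 12 coefficients, can be sketched as follows. The function name is hypothetical, and taking n = 0 .. 11 as "the first 12 coefficients" is an assumption (some systems skip n = 0):

```python
import math


def cepstral_coefficients(filter_outputs, n_coeffs=12):
    """c[n] = sum_i ln(S[i]) * cos(pi * n * (2i + 1) / (2L)), n = 0..n_coeffs-1.

    filter_outputs -- positive Mel filter outputs S[0..L-1]
    """
    L = len(filter_outputs)
    logs = [math.log(s) for s in filter_outputs]
    return [sum(logs[i] * math.cos(math.pi * n * (2 * i + 1) / (2 * L))
                for i in range(L))
            for n in range(n_coeffs)]


# A perfectly flat filter-bank output has no spectral shape:
# only c[0] is non-zero.
flat_bank = [math.e] * 8
coeffs = cepstral_coefficients(flat_bank)
```

    The flat example shows why truncation works: c[0] carries the overall level, low-order coefficients carry the smooth spectral envelope, and higher orders carry fine detail that is discarded.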

    The 13th feature is the energy of the signal. It can be extracted right after the windowing step and is computed as

    $$\text{Energy} = \sum_{n=0}^{N-1} x^2[n]$$

    The features extracted so far are called the base features. To these 13 base features we add 13 delta features, which capture the rate of change of the signal between frames. The delta features are computed as

    $$d_t[n] = \frac{c_{t+1}[n] - c_{t-1}[n]}{2}$$

    where c_t[n] is the n-th of the 13 base features of frame t, and d_t[n] is the n-th delta feature of frame t.
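    The delta formula can be sketched as below. The thesis does not say how the first and last frames are handled; repeating the edge frame is an assumption made here, and `delta_features` is a hypothetical name:

```python
def delta_features(frames):
    """Compute d_t[n] = (c_{t+1}[n] - c_{t-1}[n]) / 2 for every frame t.

    frames -- list of feature vectors, one per frame
    At the edges the missing neighbour is replaced by the edge frame
    itself (assumption).
    """
    T = len(frames)
    deltas = []
    for t in range(T):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, T - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return deltas


base = [[0.0, 1.0], [2.0, 1.0], [4.0, 5.0]]
delta = delta_features(base)
# Applying the same function to the deltas gives double-delta values.
double_delta = delta_features(delta)
```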

    To further improve recognition accuracy, 13 double-delta features can be added, which capture the acceleration of the signal between frames. They are computed in the same way as the delta features, treating the delta features as the base features.

    In total we therefore extract 39 features. After these transformations each frame yields a 39-dimensional feature vector in which every component is a real value. These vectors are used directly for training and recognition.

    Chapter 4. SPEECH RECOGNITION USING HIDDEN MARKOV MODELS

    If the feature extraction method is regarded as the necessary condition for the accuracy of a speech recognition system, then the recognition model is the sufficient condition. This chapter describes in detail speech recognition with hidden Markov models, a model that has been applied successfully in software such as the HTK speech recognition toolkit (4) and the Sphinx speech recognizer [17].

    4.1. INTRODUCTION TO HIDDEN MARKOV MODELS

    The theory of speech recognition keeps advancing, but a system that is independent of the speaker and of environmental conditions is still a long way off. Many approaches to speech recognition have been proposed, such as neural networks, DTW (Dynamic Time Warping) and hidden Markov models (HMM). Among them, recognition with hidden Markov models is a promising approach, especially for continuous speech recognition with large vocabularies.

    Hidden Markov models have been studied for a long time; the first successful work on HMMs was carried out by Baum in the late 1960s and early 1970s. An HMM is a statistical machine learning method in which a system is modelled by a set of hidden states (or labels) and state transition probabilities. For each input sequence of observations, the model finds the most probable sequence of states corresponding to that observation sequence.

    An HMM is specified by the following components: (5)

    - $Q = q_1 q_2 \dots q_N$: the set of N states.

    - $A = a_{11} a_{12} \dots a_{1N} \dots a_{N1} \dots a_{NN}$: the state transition probability matrix, where each $a_{ij}$ is the probability of moving from state i to state j, and $\sum_{j=1}^{N} a_{ij} = 1$ for every i.

    - $O = o_1 o_2 \dots o_T$: a sequence of T observations.

    - $B = b_i(o_t)$: the set of observation likelihoods, also called emission probabilities, where each $b_i(o_t)$ is the probability that observation $o_t$ is generated by state i.

    - $q_0, q_{end}$: the start state and the end state. These two states are present in every HMM and are not associated with any observation.

    - $\pi = \pi_1 \pi_2 \dots \pi_N$: the initial state probabilities, where $\pi_i$ is the probability that state i is in the first position, and $\sum_{i=1}^{N} \pi_i = 1$.

    For a sequence of T observations, an HMM defined by the parameter sets A, B and $\pi$ is written compactly as

    $$\lambda = (A, B, \pi)$$

    (4) Website: http://htk.eng.cam.ac.uk

    (5) See Chapter 6 of Speech and Language Processing [7].

    Figure 5: An HMM with N=5

    The hidden Markov model above rests on the following assumptions [7]:

    - The probability of moving to the next state depends only on the current state:

      $$P(q_{t+1} \mid q_1 \dots q_t) = P(q_{t+1} \mid q_t)$$

    - The transition probabilities do not depend on time:

      $$a_{ij} = P(q_{t+1} = j \mid q_t = i) \quad \forall t$$

    - The probability of an output observation $o_t$ depends only on the state $q_t$ that generated it, and not on any other state or observation:

      $$P(o_t \mid q_1, \dots, q_t, \dots, q_T, o_1, \dots, o_t, \dots, o_T) = P(o_t \mid q_t)$$

    4.2. THE MAIN PROBLEMS OF HMMs

    Three main problems must be addressed when using an HMM [2]:

    - Evaluating the model probability

      Given: a model $\lambda = (A, B, \pi)$ and an observation sequence $O = o_1 o_2 \dots o_T$.
      Task: compute the probability of the observation sequence under the model, $P(O \mid \lambda)$.

    - Recognition

      Given: a model $\lambda = (A, B, \pi)$ and an observation sequence $O = o_1 o_2 \dots o_T$.
      Task: find the optimal state sequence $Q = q_1 q_2 \dots q_T$ that is most likely to have generated O under the given model $\lambda$.

    - Training

      Given: a model $\lambda = (A, B, \pi)$ and training observation sequences $O^k = o^k_1 o^k_2 \dots o^k_T$, where $O^k$ is the observation sequence of the k-th training example.
      Task: re-fit the parameters of the model $\lambda$ so that $P(O \mid \lambda)$ is maximized.

    The following sections describe how each of these problems is solved.

    4.2.1. Evaluating the model probability (6)

    The first problem when using an HMM is evaluating the model probability, that is, the probability that a given model $\lambda = (A, B, \pi)$ generates an observation sequence, written $P(O \mid \lambda)$. We have:

    $$P(O \mid \lambda) = \sum_{i=1}^{p} P(O, Q_i \mid \lambda) = \sum_{i=1}^{p} P(O \mid Q_i, \lambda)\, P(Q_i \mid \lambda)$$

    $$P(O \mid Q_i, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t)$$

    $$P(Q_i \mid \lambda) = \pi_{q_1} \prod_{t=1}^{T-1} a_{q_t q_{t+1}}$$

    Here p is the number of possible state sequences $Q = q_1 q_2 \dots q_T$ by which the model $\lambda$ can generate the observation sequence O. Consider a simple example: an HMM with 3 states and an observation sequence $O = o_1 o_2 o_3 o_4 o_5$ of 5 observations. The figure below depicts this model; the vertical direction shows the states and the horizontal direction the observations, and each observation is generated by one of the 3 states. Consider one case, the sequence $Q_1 = q_1 q_1 q_2 q_3 q_3$, shown as the bold path in the figure.

    (6) See The Concepts of Hidden Markov Model in Speech Recognition Systems [2] and A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition [12].

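    Figure 6: An HMM with N=3, T=5

    The probabilities $P(O \mid Q_1, \lambda)$ and $P(Q_1 \mid \lambda)$ are then computed as follows:

    $$P(O \mid Q_1, \lambda) = \prod_{t=1}^{5} P(o_t \mid q_t) = b_1(o_1)\, b_1(o_2)\, b_2(o_3)\, b_3(o_4)\, b_3(o_5)$$

    $$P(Q_1 \mid \lambda) = \pi_{q_1} \prod_{t=1}^{4} a_{q_t q_{t+1}} = \pi_1\, a_{11}\, a_{12}\, a_{23}\, a_{33}$$

    $$P(O \mid Q_1, \lambda)\, P(Q_1 \mid \lambda) = b_1(o_1)\, b_1(o_2)\, b_2(o_3)\, b_3(o_4)\, b_3(o_5)\, \pi_1\, a_{11}\, a_{12}\, a_{23}\, a_{33}$$

    Computed this way, the probability of every possible path must be evaluated. With N hidden states and T observations there are up to $N^T$ paths, so the number of probabilities to compute grows exponentially as N and T increase. In practical problems the numbers of hidden states and observations are usually large, so this direct computation is not efficient.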
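    The direct computation can be sketched as a loop over all $N^T$ state sequences. The sketch assumes discrete observation symbols, so that B[i][k] is the probability that state i emits symbol k; the parameter values and the function name are made up for illustration:

```python
from itertools import product


def brute_force_likelihood(pi, A, B, obs):
    """Sum P(O | Q, lambda) * P(Q | lambda) over every state sequence Q.

    pi[i]   -- initial probability of state i
    A[i][j] -- transition probability from state i to state j
    B[i][k] -- probability that state i emits symbol k (discrete, assumed)
    obs     -- list of observed symbol indices
    """
    n_states = len(pi)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state model with two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
likelihood = brute_force_likelihood(pi, A, B, [0, 1])
```

    The loop visits len(pi) ** len(obs) paths, which is manageable for this toy model and quickly becomes infeasible for realistic sizes.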

    4.2.1.1. The Forward algorithm

    A more efficient dynamic programming algorithm has been proposed: the Forward algorithm. The probability of the observation sequence $P(O \mid \lambda)$ is still the sum of the probabilities of all paths that can generate the sequence, but the algorithm is efficient because it folds all paths into a single forward trellis.

    The Forward algorithm fills an N x T array in which the value $\alpha_t(i)$ is the probability of having seen the first t observations and being in hidden state i at position t:

    $$\alpha_t(i) = P(o_1, o_2, \dots, o_t, q_t = i \mid \lambda)$$

    Figure 7: Computing the Forward probability [2]

    The value $\alpha_t(i)$ is obtained by summing the probabilities of all paths that lead to that position. The Forward algorithm is as follows:

    1. Initialization:

    $$\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$$

    2. Induction:

    $$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N$$

    3. Termination:

    $$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

    4.2.1.2. The Backward algorithm

    The Backward algorithm is similar to the Forward algorithm, except that the observations are processed in the reverse direction, from observation T back to the first observation. The probability at each position is therefore the sum of the probabilities of all paths from the end back to that position. In this algorithm, the probability at observation t and hidden state i is denoted $\beta_t(i)$, where:

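    $$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid q_t = i, \lambda)$$

    Figure 8: Computing the Backward probability [2]

    The Backward algorithm is as follows:

    1. Initialization:

    $$\beta_T(i) = 1, \quad 1 \le i \le N$$

    2. Induction:

    $$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \dots, 1, \; 1 \le i \le N$$

    3. Termination:

    $$P(O \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(o_1)\, \beta_1(j)$$

    The complexity of both algorithms is $O(N^2 T)$. In addition, the probability of the observation sequence $P(O \mid \lambda)$ can be computed with the Forward-Backward algorithm, which combines the two algorithms above. For any position t:

    $$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)$$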
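    The Forward and Backward recursions can be sketched in a few lines. As a simplification, the sketch assumes discrete observation symbols (B[i][k] is the probability that state i emits symbol k), uses 0-based indices, and initializes the last backward row to 1; the model values and function names are hypothetical:

```python
def forward(pi, A, B, obs):
    """Return the table alpha[t][i] = P(o_1..o_t, q_t = i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """Return the table beta[t][i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]      # last row stays at 1
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


# Hypothetical two-state model with two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0]

alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)
p_forward = sum(alpha[-1])
p_backward = sum(pi[j] * B[j][obs[0]] * beta[0][j] for j in range(len(pi)))
p_combined = [sum(a * b for a, b in zip(alpha[t], beta[t]))
              for t in range(len(obs))]
```

    In this sketch `p_forward`, `p_backward` and every entry of `p_combined` agree up to rounding, which is a convenient self-check when implementing the recursions.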

    4.2.2. Recognition

    The task of recognition is to find the best hidden state sequence for a given observation sequence under a given model $\lambda = (A, B, \pi)$. Several algorithms exist for this; this section presents only the Viterbi algorithm, a well-known dynamic programming algorithm for decoding HMMs.

    The Viterbi algorithm works like the Forward algorithm: it moves from left to right over the observation sequence and fills an N x T array. Let $v_t(j)$ be the value in row j, column t. It is the largest probability among all paths that lead to that position. The algorithm proceeds recursively, computing $v_t(j)$ from the values in the previous column:

    $$v_t(j) = \max_{1 \le i \le N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)$$

    The recognition result is the path with the highest probability from the start to the end of the model.
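    A minimal sketch of Viterbi decoding with back-pointers follows. It assumes discrete observation symbols, 0-based state indices, and the usual initialization $v_1(j) = \pi_j b_j(o_1)$, which the text above does not spell out; the model values are made up:

```python
def viterbi(pi, A, B, obs):
    """Return (best_path, best_probability) for the observation sequence."""
    n = len(pi)
    v = [[pi[j] * B[j][obs[0]] for j in range(n)]]   # assumed initialization
    back = []                                        # back-pointers per step
    for t in range(1, len(obs)):
        row, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: v[-1][i] * A[i][j])
            row.append(v[-1][best_i] * A[best_i][j] * B[j][obs[t]])
            ptr.append(best_i)
        v.append(row)
        back.append(ptr)
    last = max(range(n), key=lambda j: v[-1][j])
    path = [last]
    for ptr in reversed(back):          # follow pointers from end to start
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[-1][last]


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
best_path, best_prob = viterbi(pi, A, B, [0, 1, 1])
```

    The only difference from the Forward recursion is that the sum over predecessor states is replaced by a maximum, plus the bookkeeping needed to recover the winning path.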

    4.2.3. Training

    This section deals with training an HMM, which is considered the hardest of the three problems. The task is to adjust the model parameters $(A, B, \pi)$ to obtain the best model for the training samples. Several techniques have been proposed; this section presents only one widely used technique, training with the Baum-Welch algorithm [18], also known as the Forward-Backward algorithm, a special case of the Expectation Maximization algorithm [5]. The algorithm iterates toward a local maximum of the likelihood function $P(O \mid \lambda)$. In each iteration the parameters are re-estimated, and the new model is at least as good as the old one, as proved by Baum and others. The algorithm stops when it converges, that is, when $P(O \mid \lambda)$ no longer increases or increases only very slightly, or when a computational limit is reached. The procedure always converges, but the value of $P(O \mid \lambda)$ reached is only guaranteed to be a local maximum.

    Before describing the algorithm, two probabilities must be defined: $\gamma_t(i)$ and $\xi_t(i,j)$. The first, $\gamma_t(i)$, is the probability of being in state i at observation t, given the observation sequence and the model:

    $$\gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{P(q_t = i, O \mid \lambda)}{P(O \mid \lambda)}$$

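    Since $P(q_t = i, O \mid \lambda) = \alpha_t(i)\,\beta_t(i)$ and $P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)$, we have:

    $$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$$

    The second probability, $\xi_t(i,j)$, is the probability of being in state i at observation t and in state j at observation t+1, given the model $\lambda$ and the observation sequence O:

    $$\xi_t(i,j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{P(q_t = i, q_{t+1} = j \mid O, \lambda)\, P(O \mid \lambda)}{P(O \mid \lambda)}$$

    By Bayes' rule:

    $$P(q_t = i, q_{t+1} = j \mid O, \lambda)\, P(O \mid \lambda) = P(q_t = i, q_{t+1} = j, O \mid \lambda)$$

    Using the Forward and Backward algorithms, we obtain:

    $$P(q_t = i, q_{t+1} = j, O \mid \lambda) = \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$

    Therefore:

    $$\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$$

    Summing $\gamma_t(i)$ over all values of t except t=T gives the expected number of transitions made from state i over the whole observation sequence. Likewise, summing $\xi_t(i,j)$ over all values of t except t=T gives the expected number of transitions from state i to state j. Based on these quantities, the model parameters are re-estimated as follows:

    $$\hat{\pi}_i = \text{expected frequency of state } i \text{ at time } t=1 = \gamma_1(i)$$

    $$\hat{a}_{ij} = \frac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} = \frac{\sum_{t=1}^{T-1} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{t=1}^{T-1} \alpha_t(i)\,\beta_t(i)}$$

    $$\hat{b}_j(v_k) = \frac{\text{expected number of times in state } j \text{ observing } v_k}{\text{expected number of times in state } j} = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} = \frac{\sum_{t:\, o_t = v_k} \alpha_t(j)\,\beta_t(j)}{\sum_{t=1}^{T} \alpha_t(j)\,\beta_t(j)}$$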

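    Here the hat (^) marks the new parameters after re-estimation. After updating the parameters in this way, we obtain a new model $\hat{\lambda}$ that fits the observation sequence O at least as well as the old model $\lambda$:

    $$P(O \mid \hat{\lambda}) \ge P(O \mid \lambda)$$

    with equality only when the model has already reached a stationary point.

    In summary, the Baum-Welch algorithm is:

    1. Initialize A, B and $\pi$.

    2. Repeat:

    - Expectation step: compute the values $\gamma_t(i)$ and $\xi_t(i,j)$.

    - Maximization step: re-estimate the parameters A, B and $\pi$.

    The details of the Baum-Welch algorithm as applied to speech recognition are described in Section 4.3.3.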
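    One iteration of the re-estimation can be sketched as below. This is a simplified illustration under stated assumptions, not the training procedure of the thesis: it handles a single observation sequence, assumes discrete observation symbols, works with raw probabilities (no scaling or logarithms, so it only suits short sequences), and assumes every state has non-zero occupancy. All names and values are hypothetical:

```python
def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def baum_welch_step(pi, A, B, obs):
    """One EM iteration; returns (new_pi, new_A, new_B, old_likelihood)."""
    n, T, n_symbols = len(pi), len(obs), len(B[0])
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # Expectation step: state and transition posteriors.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # Maximization step: re-estimate the parameters.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(n_symbols)] for j in range(n)]
    return new_pi, new_A, new_B, likelihood


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1, 1, 1, 0]
history = []
for _ in range(5):
    pi, A, B, likelihood = baum_welch_step(pi, A, B, obs)
    history.append(likelihood)
```

    The list `history` records the likelihood before each update, so it can be inspected to watch the convergence behaviour described above.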

    4.3. HIDDEN MARKOV MODELS FOR SPEECH RECOGNITION

    4.3.1. Building a hidden Markov model for speech recognition

    The model described above is a special form of HMM that is commonly used in speech recognition. It is called a left-to-right HMM (or Bakis network) because the states are traversed from left to right, and there is no transition from a state with a higher index to a state with a lower index ($a_{ij} = 0$ if i > j). This model is used for speech recognition because it represents well the way speech unfolds over time.
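    The constraint $a_{ij} = 0$ for i > j can be made concrete with a small constructor for a transition matrix. The sketch allows only a self-loop and a step to the next state, and lets the last state loop on itself instead of modelling an explicit end state; both choices, the self-loop probability and the function name are assumptions for illustration:

```python
def left_to_right_transitions(n_states, self_loop=0.5):
    """Build an upper-triangular transition matrix for a left-to-right HMM.

    Each state either stays where it is (probability self_loop) or
    moves to the next state. The last state only loops (assumption).
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0
        else:
            A[i][i] = self_loop
            A[i][i + 1] = 1.0 - self_loop
    return A


A = left_to_right_transitions(3, self_loop=0.6)
```

    Entries that start at zero stay at zero under Baum-Welch re-estimation, so initializing the matrix this way preserves the left-to-right structure during training.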

    In practice, HMMs for speech recognition can be set up in several different ways, depending on how the problem is analysed. The model described here is only one of them. As explained in Chapter 2, a spoken word can be regarded as a sequence of consecutive phonemes. For example, the word ONE is transcribed in ARPABET as w-ah-n; when pronouncing it, the speaker utters the phonemes w, ah and n in turn. Chapter 3 then discussed MFCC feature extraction, after which each piece of speech data becomes a sequence of consecutive feature vectors, each holding the feature values for a short span of time. In speech recognition we therefore treat the hidden states of an HMM as phonemes and the observations as feature vectors, and we build one HMM per word. The result of recognition is a sequence of states corresponding to the observation sequence, and from that state sequence we can determine which word was spoken.

    Figure 9: HMM for the word ONE

    As noted above, this left-to-right hidden Markov model for the word ONE has no backward paths between hidden states. This structure is appropriate because a speaker who pronounces ONE must utter the phonemes w-ah-n in that order; if the order were w-n-ah, the result would be a different word. HMMs for speech recognition also include a transition from each state to itself, which lets a state repeat any number of times. In natural speech the duration of a word or a phoneme always varies, so these self-transitions allow the model to fit input speech of varying length.

    For simple speech recognition tasks with a small vocabulary, such as recognizing the digits 0-9, it is reasonable for one hidden state of the HMM to represent one phoneme. Continuous speech recognition with a large vocabulary, however, needs a finer-grained representation. In natural speech a phoneme can last up to 100 frames (frames are defined in Section 3.2.2), which makes the self-transition probability very high and reduces recognition accuracy. In addition, acoustic features such as energy vary considerably within a single phoneme. Because of this non-uniformity, HMM-based large-vocabulary continuous speech recognition systems usually model a phoneme with more than one hidden state, typically three independent hidden states. Each hidden state of the HMM then represents a sub-phoneme: the beginning, the middle or the end of a phoneme. We refer to this model as the three-state HMM [7].

    Figure 10: Variation within the phoneme ah
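    The link between phoneme duration and the self-transition probability can be made explicit with a short calculation. It is a sketch that assumes a single state with a fixed self-transition probability $a_{ii}$, so that the number of frames $d$ spent in the state follows a geometric distribution ($d$ is a symbol introduced here for illustration):

```latex
P(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad
E[d] = \sum_{d=1}^{\infty} d\, a_{ii}^{\,d-1}\,(1 - a_{ii}) = \frac{1}{1 - a_{ii}}
```

    Under this assumption, a state that should last 100 frames on average needs $a_{ii} = 0.99$, which leaves only 0.01 for the transition to the next state. Splitting the phoneme into three states lets each state cover a shorter stretch with a less extreme self-transition probability.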

    To convert the simple HMM described at the start of this section into a three-state HMM, we only need to replace each hidden state with three hidden states: beginning, middle and end. The end state of one phoneme is connected to the beginning state of the next phoneme. Figure 11 depicts the three-state HMM, where the letters b, m and f denote the beginning, middle and end states of a phoneme.

    Figure 11: Three-state HMM for the word ONE

    The HMM described above is built for each word in the vocabulary, so it can only recognize input consisting of isolated words. Additional techniques are needed for the system to recognize continuous speech. Below, I introduce two methods for continuous speech recognition:

    The first method cuts the continuous input speech into individual words and then recognizes each word separately with the model above. It relies on a property of continuous speech: between words there is usually a silence, a short interval in which the speaker pauses before pronouncing the next word. The input audio signal is therefore cut at these silences into several shorter signals, which are then recognized one by one. Figure 12 shows an audio signal after spectral analysis (a Fourier transform); the silence between the two words ONE and TWO is clearly visible.
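    Cutting at silences can be sketched with a simple frame-energy threshold. The following Java sketch is an assumption-laden illustration, not the word splitter of the experimental program: the class `SilenceSplitter`, the parameters `frameLen`, `energyThreshold` and `minSilenceFrames`, and the use of time-domain energy instead of a spectrum are all choices made here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a signal into word segments at low-energy gaps (illustrative sketch). */
final class SilenceSplitter {

    /**
     * Returns [startSample, endSample) ranges of detected words.
     * A word ends once minSilenceFrames consecutive frames fall below energyThreshold.
     */
    static List<int[]> split(double[] samples, int frameLen,
                             double energyThreshold, int minSilenceFrames) {
        List<int[]> segments = new ArrayList<>();
        int numFrames = samples.length / frameLen;
        int start = -1;     // first frame of the current word, or -1 while in silence
        int silentRun = 0;  // consecutive silent frames seen inside the current word
        for (int f = 0; f < numFrames; f++) {
            // Mean energy of frame f.
            double energy = 0.0;
            for (int s = f * frameLen; s < (f + 1) * frameLen; s++) {
                energy += samples[s] * samples[s];
            }
            energy /= frameLen;

            if (energy > energyThreshold) {
                if (start < 0) start = f;  // a new word begins
                silentRun = 0;
            } else if (start >= 0 && ++silentRun >= minSilenceFrames) {
                int endFrame = f - silentRun + 1;  // first frame of the silent run
                segments.add(new int[] {start * frameLen, endFrame * frameLen});
                start = -1;
                silentRun = 0;
            }
        }
        // Close a word that is still open when the signal ends.
        if (start >= 0) {
            segments.add(new int[] {start * frameLen, (numFrames - silentRun) * frameLen});
        }
        return segments;
    }
}
```

    Each returned range can then be passed through feature extraction and recognized as an isolated word. Requiring several consecutive silent frames is a guard against cutting inside a word at a brief dip in energy; suitable values depend on the recording conditions.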

    Figure 12: Spectrum of the two words ONE and TWO pronounced in succession

    In the second method, a new HMM is built from the HMMs already constructed for the individual words. The new model combines all the existing models: the per-word HMMs are placed side by side, and all their start states and end states are merged into a single start state and a single end state. A path from the end state back to the start state is added, which makes it possible to recognize a word sequence of arbitrary length. A silence state can also be added at the end of each word to model the pause between two words pronounced in succession. Figure 13 shows this model, where sil is the silence state.

    Figure 13: Combined HMM

    4.3.2. Computing the acoustic likelihood $b_j(o_t)$

    In this section, I describe how to compute the acoustic likelihood $b_j(o_t)$ using the Gaussian (normal) distribution. The method is covered in Two Pass Hidden Markov Model for Speech Recognition [1] and Speech and Language Processing [7]. Consider the simplest case, in which each feature vector has a single dimension, and let $o_t$ denote its only value. The value $o_t$ is assumed to follow a Gaussian distribution, and each hidden state j of the HMM is associated with a mean $\mu_j$ and a variance $\sigma_j^2$. The probability $b_j(o_t)$ is computed with the Gaussian probability density function:

    $$b_j(o_t) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{(o_t - \mu_j)^2}{2\sigma_j^2}\right)$$

    As described in Section 4.2.3, training with the Baum-Welch algorithm is iterative, so the means and variances must also be re-estimated at every iteration. Section 4.2.3 also introduced the probability $\gamma_t(i)$, the probability of being in state i at time t, and how to compute it. The mean $\mu_i$ and variance $\sigma_i^2$ are updated as follows:

    $$\hat{\mu}_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, o_t}{\sum_{t=1}^{T} \gamma_t(i)}$$

    $$\hat{\sigma}_i^2 = \frac{\sum_{t=1}^{T} \gamma_t(i)\, (o_t - \mu_i)^2}{\sum_{t=1}^{T} \gamma_t(i)}$$

    In speech recognition, however, the feature vectors are multidimensional, so a multivariate Gaussian distribution is needed to compute $b_j(o_t)$. If a vector has D dimensions, each state j has a D-dimensional mean vector $\mu_j$ and a covariance matrix $\Sigma_j$ of size D x D. The covariance matrix here is diagonal: the entry in row d, column d is the variance of dimension d of the feature vectors. Similarly, the mean vector is obtained by computing the mean of each dimension of the feature vectors. The acoustic likelihood $b_j(o_t)$ is then:

    $$b_j(o_t) = \frac{1}{\sqrt{(2\pi)^D\, |\Sigma_j|}} \exp\left(-\frac{1}{2}\,(o_t - \mu_j)^{\mathsf{T}}\, \Sigma_j^{-1}\, (o_t - \mu_j)\right) = \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi\sigma_{jd}^2}} \exp\left(-\frac{(o_{td} - \mu_{jd})^2}{2\sigma_{jd}^2}\right)$$
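    With a diagonal covariance matrix, the density factorizes over dimensions, so it can be evaluated with a single loop. The Java sketch below illustrates this under stated assumptions: the class name `DiagonalGaussian` is hypothetical, the variances are passed as an array holding the diagonal of the covariance matrix, and the value is accumulated in the log domain to stay numerically stable.

```java
/** Gaussian emission density with a diagonal covariance matrix (illustrative sketch). */
final class DiagonalGaussian {

    /**
     * Natural log of b_j(o_t) for one state.
     * mean[d] and var[d] are the mean and variance of dimension d for that state.
     */
    static double logDensity(double[] o, double[] mean, double[] var) {
        double logP = 0.0;
        for (int d = 0; d < o.length; d++) {
            double diff = o[d] - mean[d];
            // Log of the one-dimensional Gaussian density for dimension d.
            logP += -0.5 * Math.log(2.0 * Math.PI * var[d]) - diff * diff / (2.0 * var[d]);
        }
        return logP;
    }

    /** b_j(o_t) itself; may underflow to 0 for high-dimensional vectors. */
    static double density(double[] o, double[] mean, double[] var) {
        return Math.exp(logDensity(o, mean, var));
    }
}
```

    For a one-dimensional vector the loop runs once and reduces to the univariate formula given earlier in this section.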

    4.3.3. Embedded training

    This section covers training in a speech recognition system based on hidden Markov models. The goal of training is to build an HMM for each word. The data required for training consists of audio sample files for the words in the vocabulary.

    4.3.3.1. Hand-labeled word training

    As described in Section 4.2.3, hidden Markov models are trained with the Baum-Welch algorithm. Before turning to that algorithm, I introduce a simpler training method based on supervised learning: hand-labeled word training. In this method the training data is labeled in advance, that is, the duration of each phoneme in every training sample is known, so we also know which phoneme each frame belongs to. Since feature vectors are extracted per frame and phonemes are the states of the HMM, we know which state emits each observation. The Gaussian distribution of each state can then be computed directly from the formulas:

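    $$\hat{\mu}_i = \frac{1}{T_i} \sum_{t=1,\, q_t = i}^{T} o_t$$

    $$\hat{\sigma}_i^2 = \frac{1}{T_i} \sum_{t=1,\, q_t = i}^{T} (o_t - \mu_i)^2$$

    where $q_t$ is the state assigned to frame t and $T_i$ is the number of frames labeled with state i.

    The parameter B then follows directly from the probability density function given in the previous section. Next, the parameters A and $\pi$ are set according to the following rules:

    - $\pi_i$ is 1 if the model starts in state i, and 0 otherwise.

    - $a_{ij}$ is estimated by counting transitions in the labeled data:

    $$a_{ij} = \frac{\text{number of transitions from state } i \text{ to state } j}{\text{number of transitions from state } i}$$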
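    The counting rules above translate almost directly into code. The Java sketch below is a simplified illustration under stated assumptions: observations are one-dimensional, `labels[t]` holds the index of the state assigned to frame t, and the class `HandLabeledTraining` with its method names is hypothetical.

```java
/** Parameter estimates from frame-level labels (illustrative sketch, 1-D features). */
final class HandLabeledTraining {

    /** Mean of the observations labeled with each state. */
    static double[] stateMeans(double[] obs, int[] labels, int numStates) {
        double[] sum = new double[numStates];
        int[] count = new int[numStates];
        for (int t = 0; t < obs.length; t++) {
            sum[labels[t]] += obs[t];
            count[labels[t]]++;
        }
        for (int i = 0; i < numStates; i++) {
            if (count[i] > 0) sum[i] /= count[i];
        }
        return sum;
    }

    /** Variance of the observations labeled with each state, around the given means. */
    static double[] stateVariances(double[] obs, int[] labels, double[] means) {
        double[] sum = new double[means.length];
        int[] count = new int[means.length];
        for (int t = 0; t < obs.length; t++) {
            double diff = obs[t] - means[labels[t]];
            sum[labels[t]] += diff * diff;
            count[labels[t]]++;
        }
        for (int i = 0; i < means.length; i++) {
            if (count[i] > 0) sum[i] /= count[i];
        }
        return sum;
    }

    /** a[i][j] = (transitions i -> j) / (transitions out of i); rows with no transitions stay 0. */
    static double[][] transitionMatrix(int[] labels, int numStates) {
        double[][] a = new double[numStates][numStates];
        double[] outgoing = new double[numStates];
        for (int t = 0; t + 1 < labels.length; t++) {
            a[labels[t]][labels[t + 1]] += 1.0;
            outgoing[labels[t]] += 1.0;
        }
        for (int i = 0; i < numStates; i++) {
            for (int j = 0; j < numStates; j++) {
                if (outgoing[i] > 0) a[i][j] /= outgoing[i];
            }
        }
        return a;
    }
}
```

    For multidimensional feature vectors the same sums would be kept per dimension. A state that is only seen in the last frame has no outgoing transitions, so its row is left at zero here; how to handle that case is a design decision outside this sketch.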

    4.3.3.2. Embedded training

    In practice, labeling one hour of speech data can take up to 400 hours, so hand-labeled word training is not feasible for tasks with a large vocabulary. Embedded training is an alternative training technique that does not require pre-labeled data and is built on the Baum-Welch algorithm (Section 4.2.3). It consists of the following two steps:

    1. Build the hidden Markov model $\lambda = (A, B, \pi)$ for the word to be trained. The parameters A, B and $\pi$ are initialized as follows:

    - $\pi_i$ is 1 if the model starts in state i, and 0 otherwise.

    - $a_{ij}$ is 0.5 if i = j or if the transition from i to j exists in the model, and 0 otherwise. The exception is $a_{NN}$, which is 1.

    - Initialize the means $\mu_j$ and variances $\sigma_j^2$: the mean and variance of every state are set to the mean and variance of all input vectors. Then compute $b_j(o_t)$ from $\mu_j$ and $\sigma_j^2$.

    2. Run the Baum-Welch algorithm on the model $\lambda$.
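    The initialization in step 1 can be sketched as follows for a left-to-right model in which state i can only stay in place or move to state i + 1. The class `EmbeddedInit` and its method names are hypothetical, and the sketch assumes the model starts in its first state.

```java
/** Flat-start initialization for a left-to-right HMM (illustrative sketch). */
final class EmbeddedInit {

    /** pi_i = 1 for the first state, 0 for all others. */
    static double[] initialPi(int numStates) {
        double[] pi = new double[numStates];
        pi[0] = 1.0;
        return pi;
    }

    /** a_ii = a_{i,i+1} = 0.5 for every state but the last; a_NN = 1. */
    static double[][] initialTransitions(int numStates) {
        double[][] a = new double[numStates][numStates];
        for (int i = 0; i < numStates - 1; i++) {
            a[i][i] = 0.5;
            a[i][i + 1] = 0.5;
        }
        a[numStates - 1][numStates - 1] = 1.0;
        return a;
    }

    /** Per-dimension mean over all input feature vectors; shared by every state. */
    static double[] globalMean(double[][] frames) {
        double[] mean = new double[frames[0].length];
        for (double[] frame : frames) {
            for (int d = 0; d < mean.length; d++) mean[d] += frame[d];
        }
        for (int d = 0; d < mean.length; d++) mean[d] /= frames.length;
        return mean;
    }

    /** Per-dimension variance over all input feature vectors; shared by every state. */
    static double[] globalVariance(double[][] frames, double[] mean) {
        double[] var = new double[mean.length];
        for (double[] frame : frames) {
            for (int d = 0; d < var.length; d++) {
                double diff = frame[d] - mean[d];
                var[d] += diff * diff;
            }
        }
        for (int d = 0; d < var.length; d++) var[d] /= frames.length;
        return var;
    }
}
```

    For a three-state model such as w-ah-n, `initialTransitions(3)` yields the rows (0.5, 0.5, 0), (0, 0.5, 0.5) and (0, 0, 1). Because every state starts from the same global mean and variance, it is the Baum-Welch iterations of step 2 that make the states diverge.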

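    The following figure and tables show the HMM for the word ONE and the initial values of A and $\pi$.

    Figure 14: HMM for the word ONE

    Table 2: Initial values of the parameter $\pi$

    | | w | ah | n |
    |---|---|---|---|
    | $\pi$ | 1 | 0 | 0 |

    Table 3: Initial values of the parameters $a_{ij}$

    | | w | ah | n |
    |---|---|---|---|
    | w | 0.5 | 0.5 | 0 |
    | ah | 0 | 0.5 | 0.5 |
    | n | 0 | 0 | 1 |

    4.4. COMPUTING PROBABILITIES IN CODE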

    When implementing HMM-based recognition systems, the computation of probabilities always needs attention. When the number of observations is large, and because probabilities are always smaller than 1, underflow occurs very easily, especially when computing the forward probabilities $\alpha_t(i)$, the backward probabilities $\beta_t(i)$ or the model probability $P(O \mid \lambda)$. To address this, this section presents a method described in Numerically Stable Hidden Markov Model Implementation [8]. Instead of computing the probabilities directly, we compute their natural logarithms. The logarithm preserves the ordering of probabilities when they are compared, and it avoids underflow. Because some languages throw an exception or return NaN when taking the logarithm of 0, the value LOGZERO is used in place of the result of log(0). The formulas are rewritten by taking the logarithm of both sides, using the following four basic functions:

    1. Extended exponential function eexp(x)

    - For real x: $\mathrm{eexp}(x) = e^{x}$

    - For x = LOGZERO: $\mathrm{eexp}(x) = 0$

    2. Extended natural logarithm function eln(x)

    - For x > 0: $\mathrm{eln}(x) = \ln(x)$

    - For x = 0: $\mathrm{eln}(x) = \mathrm{LOGZERO}$

    3. Logarithmic sum function elnsum(eln(x), eln(y))

    - For x > 0, y > 0: $\mathrm{elnsum}(\mathrm{eln}(x), \mathrm{eln}(y)) = \mathrm{eln}(x + y) = \mathrm{eln}(x) + \mathrm{eln}\left(1 + e^{\mathrm{eln}(y) - \mathrm{eln}(x)}\right)$

    - For x = 0: $\mathrm{elnsum}(\mathrm{LOGZERO}, \mathrm{eln}(y)) = \mathrm{eln}(y)$

    - For y = 0: $\mathrm{elnsum}(\mathrm{eln}(x), \mathrm{LOGZERO}) = \mathrm{eln}(x)$

    4. Logarithmic product function elnproduct(eln(x), eln(y))

    - For x > 0, y > 0: $\mathrm{elnproduct}(\mathrm{eln}(x), \mathrm{eln}(y)) = \mathrm{eln}(x) + \mathrm{eln}(y)$

    - For x = 0: $\mathrm{elnproduct}(\mathrm{LOGZERO}, \mathrm{eln}(y)) = \mathrm{LOGZERO}$

    - For y = 0: $\mathrm{elnproduct}(\mathrm{eln}(x), \mathrm{LOGZERO}) = \mathrm{LOGZERO}$
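    The four functions can be written in a few lines of Java. This is a minimal sketch rather than the SRMath class of the experimental program: the class name `LogMath` is hypothetical, and representing LOGZERO by `Double.NEGATIVE_INFINITY` is an assumption made here because Java's `Math.log(0)` already returns that value; any other sentinel would work with the same case analysis.

```java
/** Extended logarithm helpers for log-domain probability arithmetic (illustrative sketch). */
final class LogMath {

    /** Stand-in for ln(0); using negative infinity is an assumption of this sketch. */
    static final double LOGZERO = Double.NEGATIVE_INFINITY;

    /** Extended exponential: eexp(LOGZERO) = 0, otherwise e^x. */
    static double eexp(double x) {
        return x == LOGZERO ? 0.0 : Math.exp(x);
    }

    /** Extended natural logarithm: eln(0) = LOGZERO, ln(x) for x > 0. */
    static double eln(double x) {
        if (x == 0.0) return LOGZERO;
        if (x > 0.0) return Math.log(x);
        throw new IllegalArgumentException("eln is undefined for negative input: " + x);
    }

    /** Given eln(x) and eln(y), returns eln(x + y) without leaving the log domain. */
    static double elnsum(double elnX, double elnY) {
        if (elnX == LOGZERO) return elnY;
        if (elnY == LOGZERO) return elnX;
        // Factor out the larger term so the exponent is never positive.
        return elnX > elnY
                ? elnX + Math.log1p(Math.exp(elnY - elnX))
                : elnY + Math.log1p(Math.exp(elnX - elnY));
    }

    /** Given eln(x) and eln(y), returns eln(x * y). */
    static double elnproduct(double elnX, double elnY) {
        if (elnX == LOGZERO || elnY == LOGZERO) return LOGZERO;
        return elnX + elnY;
    }
}
```

    Multiplying 1000 probabilities of 0.00001 directly underflows to 0 in double precision, whereas accumulating them with `elnproduct` gives the finite value 1000 · ln(0.00001). In the forward algorithm, every product of probabilities becomes an `elnproduct` call and every sum over states becomes a chain of `elnsum` calls.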

    Chapter 5. SPEECH RECOGNITION BY TEMPLATE MATCHING

    In this chapter, I present another speech recognition technique, template matching, based on the Dynamic Time Warping (DTW) algorithm and on the MFCC feature extraction method introduced in Chapter 3. The material in this chapter is compiled from Dynamic Time Warping Algorithm Review [13] and Cross-words Reference Template for DTW-based Speech Recognition Systems [3].

    5.1. THE DYNAMIC TIME WARPING (DTW) ALGORITHM

    Dynamic time warping is a dynamic programming algorithm for measuring the similarity between two sequences of different lengths. It works particularly well for two sequences that vary over time.

    Given two sequences $X = (x_1, x_2, \ldots, x_N)$ and $Y = (y_1, y_2, \ldots, y_M)$ of lengths N and M respectively, let $c(x_i, y_j)$ be the distance between the two elements $x_i \in X$ and $y_j \in Y$. An array D of size N x M is created to store intermediate values. The value D(i, j) (i = 1, ..., N; j = 1, ..., M) is the distance between the two subsequences $X_i = (x_1, x_2, \ldots, x_i)$ of X and $Y_j = (y_1, y_2, \ldots, y_j)$ of Y. It is computed as follows:

    $$D(i,j) = \begin{cases} \sum_{k=1}^{j} c(x_1, y_k), & i = 1 \\ \sum_{k=1}^{i} c(x_k, y_1), & j = 1 \\ \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\} + c(x_i, y_j), & \text{otherwise} \end{cases}$$

    The following figure shows how the value of each cell is computed in the DTW algorithm:

    Figure 15: Model of the DTW algorithm

    The distance between the two sequences X and Y, denoted DTW(X, Y), is the value in row N, column M of the array D:

    $$DTW(X, Y) = D(N, M)$$

    5.2. APPLICATION TO SPEECH RECOGNITION

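    The dynamic time warping algorithm is used to classify samples in isolated-word speech recognition, that is, recognition of individual words. After MFCC feature extraction, each speech sample yields a sequence of multidimensional vectors. The distance between any two sequences $X = (x_1, x_2, \ldots, x_N)$ and $Y = (y_1, y_2, \ldots, y_M)$ is computed with the DTW algorithm described above, where $c(x_i, y_j)$ is the Euclidean distance between the two vectors $x_i \in X$ and $y_j \in Y$. If D is the number of dimensions of a vector (not to be confused with the array D of Section 5.1), then $c(x_i, y_j)$ is computed as follows:

    $$c(x_i, y_j) = \sqrt{\sum_{d=1}^{D} (x_{id} - y_{jd})^2}$$

    For recognition, a set of speech samples $(X_1, X_2, \ldots, X_n)$ that have been labeled in advance serves as the set of reference templates. Let X be the sample to recognize: if $X_i$ is the template closest to X (the DTW distance between X and $X_i$ is the smallest), then X is assigned to the same class as $X_i$.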
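    The recurrence of Section 5.1 and the nearest-template rule can be sketched together in Java. This is an illustration under stated assumptions, not the DTWDecoder class of the experimental program: the class name `Dtw` and its method names are hypothetical, and each sequence is a `double[][]` whose rows are feature vectors of equal dimension.

```java
/** Dynamic time warping with Euclidean local distance (illustrative sketch). */
final class Dtw {

    /** Euclidean distance c(x_i, y_j) between two feature vectors. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** DTW(X, Y): the value in the last row and last column of the table D. */
    static double distance(double[][] x, double[][] y) {
        int n = x.length, m = y.length;
        double[][] d = new double[n][m];
        d[0][0] = euclidean(x[0], y[0]);
        // First row and first column are cumulative sums of local distances.
        for (int j = 1; j < m; j++) d[0][j] = d[0][j - 1] + euclidean(x[0], y[j]);
        for (int i = 1; i < n; i++) d[i][0] = d[i - 1][0] + euclidean(x[i], y[0]);
        // Every other cell extends the cheapest of its three predecessors.
        for (int i = 1; i < n; i++) {
            for (int j = 1; j < m; j++) {
                double best = Math.min(d[i - 1][j], Math.min(d[i][j - 1], d[i - 1][j - 1]));
                d[i][j] = best + euclidean(x[i], y[j]);
            }
        }
        return d[n - 1][m - 1];
    }

    /** Index of the template with the smallest DTW distance to x. */
    static int nearestTemplate(double[][] x, double[][][] templates) {
        int bestIndex = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < templates.length; i++) {
            double dist = distance(x, templates[i]);
            if (dist < bestDistance) {
                bestDistance = dist;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

    The label of the template at the returned index is the recognition result. Each call to `distance` fills an N x M table, and `nearestTemplate` repeats this for every template, so recognition time grows with the size of the template set.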

    Chapter 6. EXPERIMENTAL RESULTS

    In this chapter, I report and evaluate the experimental results obtained after building a speech recognition program based on the techniques described in the previous chapters, and I propose some directions for further development.

    6.1. EXPERIMENTAL SETUP

    The experimental program was built to compare two speech recognition techniques, hidden Markov models and template matching, on isolated-word speech with a small vocabulary. To that end, the experiments were run on two data sets of the Vietnamese digits 0-9 (KHÔNG to CHÍN) spoken by a single speaker. Both data sets consist of the following components:

    - Two sets of speech samples: one used for training (for the HMM method) or as reference templates (for the DTW template matching method), and one used for testing.

    - A file listing the phonemes.

    - A dictionary file listing the words and their transcriptions. There are 10 words in total, listed in Table 4.

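    Table 4: Dictionary and transcriptions of the words in the Vietnamese digit set 0-9

    | Word | Transcription |
    |---|---|
    | KHÔNG | KH OO NGZ |
    | MỘT | M OO TC |
    | HAI | H A I |
    | BA | B A |
    | BỐN | B OO NZ |
    | NĂM | N AW MZ |
    | SÁU | S AW U |
    | BẢY | B AA YI |
    | TÁM | T A MZ |
    | CHÍN | CH I NZ |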

    All speech samples in both data sets were recorded by the author on a laptop with the open-source software Audacity 1.2.6 (website: http://audacity.sourceforge.net), in a closed room with no background noise.

    The experimental program is written in Java and was developed in the Eclipse IDE (website: http://eclipse.org). It is divided into two main modules, frontend and core (see footnote 7):

    - The frontend module contains the processing steps of MFCC feature extraction as described in Chapter 3, some audio signal preprocessing stages, and a simple word splitter.

    - The core module contains the implementations of the two recognition algorithms, HMM and DTW. The DTW recognizer follows the algorithm described in Chapter 5, while the HMM recognizer follows the theory in Chapter 4 and relies on the open-source library JaHMM (website: http://code.google.com/p/jahmm/).

    6.2. RESULTS

    The program was run on each data set and evaluated on two criteria: the recognition rate and the performance of the algorithm (running time).

    The first data set contains highly uniform speech samples, because the speaker pronounced the words at a steady pace. The training set contains 100 speech samples divided evenly among the 10 words in the dictionary, 10 samples per word. The test set contains 50 speech samples, 5 per word. The recognition results on this data set are as follows:

    Table 5: Experimental results on the highly uniform speech data set

    | | HMM | DTW |
    |---|---|---|
    | Total samples | 50 | 50 |
    | Correctly recognized | 50 | 50 |
    | Recognition rate | 100% | 100% |
    | Program running time | 2.082 s | 93.204 s |

    The second data set contains less uniform speech samples, because the speaker's pace was not steady. It consists of 200 speech samples in the training set and 100 speech samples in the test set, with the same number of samples for each word. The experimental results are as follows:

    7 See the Appendix for details.

    Table 6: Experimental results on the less uniform speech data set

    | | HMM | DTW |
    |---|---|---|
    | Total samples | 100 | 100 |
    | Correctly recognized | 97 | 95 |
    | Recognition rate | 97% | 95% |
    | Program running time | 3.793 s | 232.194 s |

    6.3. EVALUATION OF THE RESULTS

    - Accuracy: on a small, isolated-word data set recorded by a single speaker, both methods achieve a fairly high recognition rate. In the experiment with the less uniform data set, the HMM method has a slightly higher recognition rate than the DTW method (97% versus 95%).

    - Performance: on both data sets, the running time of DTW recognition is many times higher than that of HMM recognition (roughly 45 times on the first data set and 61 times on the second). This is to be expected. With DTW, the preparation time before recognition is very small, because the program only has to extract features for the reference templates; at recognition time, however, it must compare the input speech with every reference template. With the HMM method, the preparation time is much larger, because the program has to run the Baum-Welch training algorithm, an iterative algorithm, for each word to compute the parameters of the HMM. Since one HMM is built per word, the 10 digits need only 10 models, so recognition is very fast.
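    The gap in running time is consistent with a rough count of the comparisons made at recognition time. The following back-of-the-envelope calculation assumes that every test sample is aligned against every reference template (DTW) or scored against every word model (HMM), using the data set sizes from Section 6.2; the symbols $n_{\text{test}}$, $n_{\text{tmpl}}$ and $n_{\text{model}}$ are introduced here for illustration only:

```latex
\begin{aligned}
\text{DTW, data set 1:}\quad & n_{\text{test}} \cdot n_{\text{tmpl}} = 50 \cdot 100 = 5000 \ \text{alignments} \\
\text{DTW, data set 2:}\quad & n_{\text{test}} \cdot n_{\text{tmpl}} = 100 \cdot 200 = 20000 \ \text{alignments} \\
\text{HMM, data set 1:}\quad & n_{\text{test}} \cdot n_{\text{model}} = 50 \cdot 10 = 500 \ \text{model evaluations} \\
\text{HMM, data set 2:}\quad & n_{\text{test}} \cdot n_{\text{model}} = 100 \cdot 10 = 1000 \ \text{model evaluations}
\end{aligned}
```

    The number of DTW alignments grows with the size of the template set, whereas the number of HMM evaluations depends only on the vocabulary size. These counts do not fully explain the measured times, which also depend on utterance lengths and implementation details.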

    Based on these results, HMM-based speech recognition can tentatively be judged better than DTW-based speech recognition. With its high recognition speed, the HMM method is well suited to online speech recognition tasks.

    Chapter 7. CONCLUSION

    7.1. SUMMARY

    The contributions of this thesis are as follows:

    - It presents several representative techniques with which a complete speech recognition system can be built: MFCC feature extraction, recognition by the statistical approach (HMM), and recognition by the template matching approach (DTW).

    - It successfully builds a simple speech recognition system to illustrate and verify the correctness of each technique and to compare the effectiveness of the recognition techniques.

    Although the results of the experiments and of the theoretical study are not a major achievement, they gave me a basic understanding of speech recognition and a foundation of knowledge for deeper research in this field.

    7.2. FUTURE WORK

    Based on the results of this study, I propose the following directions for further development:

    - The experimental results show that speech recognition with hidden Markov models is very promising. However, HMMs have many variants, while the theory presented in this study remains at a basic level. The next study will therefore focus on finding a form of HMM that is effective for speech recognition.

    - Investigate feature extraction for Vietnamese: because Vietnamese is a tonal language, features that capture tone are needed.

    - Although the recognition rate is fairly high, the experiments were carried out on a small speech data set from a single speaker, so the experimental program only serves to test and evaluate the algorithms and is not yet usable in practice. Moreover, the recognition system is intended for online tasks, specifically online information search by voice. The program therefore needs to be extended to a vocabulary of more than about 2000 Vietnamese words, and the training data needs to be extended to cover the voices of many speakers, both male and female.

    APPENDIX: MAIN CLASSES OF THE EXPERIMENTAL PROGRAM

    The experimental program is written in Java and consists of two main packages:

    Package sr.frontend contains the classes that implement MFCC feature extraction:

    - AudioDataReader: methods for reading .wav files

    - feature.AudioFrame: represents one frame of the audio signal

    - feature.FeatureVector: represents one feature vector produced by feature extraction; it holds an array of real numbers containing the values of the vector

    - feature.WindowingProcessor: implements the windowing step, which cuts the input audio signal into AudioFrame objects

    - feature.DFTProcessor: implements the Fourier transform step, applying a fast Fourier transform to each frame

    - feature.MelFilterBankProcessor: filters frequencies on the Mel frequency scale

    - feature.DCTProcessor: applies the discrete cosine transform to each frame

    - feature.FeatureExtractor: extracts the basic features together with the delta and double-delta features for each frame and returns FeatureVector objects. The number of features to extract is set by the user.

    Package sr.core contains the following classes:

    - SRMath: helper methods for the computations in the system, such as computing the logarithms of probabilities in hidden Markov models [8]

    - Dictionary: manages the vocabulary together with the transcription of each word

    - HMMGraph: represents the hidden Markov model of a word based on its transcription and provides methods for initializing the parameters (A, B, $\pi$) […] training

    - trainer.TrainerDataReader: methods for reading the training data

    - trainer.TrainerDataWriter: methods for writing the model parameters once training is complete

    - decoder.HMMDecoder: the recognizer based on hidden Markov models, which uses the open-source library JaHMM (http://code.google.com/p/jahmm)

    - decoder.DTWDecoder: the recognizer based on template matching, implemented with the dynamic time warping (DTW) algorithm

    REFERENCES

    [1] Abdulla W. H., Kasabov N. K., Two Pass Hidden Markov Model for Speech Recognition Systems, Proc. ICICS'99, Singapore, 1999.

    [2] Abdulla W. H., Kasabov N. K., The Concepts of Hidden Markov Model in Speech Recognition, Technical Report TR99/09, University of Otago, July 1999.

    [3] Abdulla W. H., Chow D., Sin G., Cross-words Reference Template for DTW-based Speech Recognition Systems, Proc. IEEE TENCON, Bangalore, India, 2003.

    [4] Chou W., Juang B. H., Pattern Recognition in Speech and Language Processing, CRC Press, 2003, Ch. 5.

    [5] Duda R. O., Hart P. E., Stork D. G., Pattern Classification, Wiley-Interscience, 2nd edition, 2000, pp. 32-53.

    [6] Englund C., Speech recognition in the JAS 39 Gripen aircraft adaption to speech at different G-loads, Master Thesis in Speech Technology, 2004, pp. 1-5.

    [7] Jurafsky D., Martin J. H., Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition, Prentice Hall, 2nd edition, 2008, Ch. 6, 7, 9.

    [8] Mann T. P., Numerically Stable Hidden Markov Model Implementation, 2006.

    [9] Molau S., Pitz M., Schlüter R., Ney H., Computing Mel-Frequency Cepstral Coefficients on the Power Spectrum, Proc. Acoustics, Speech and Signal Processing, 2001.

    [10] Rabiner L. R., A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE, 1989.

    [11] Roch M., Cepstral Processing, Lecture of San Diego State University.

    [12] Seltzer M., SPHINX III Signal Processing Front End Specification, CMU Speech Group, 1999.

    [13] Senin P., Dynamic Time Warping Algorithm Review, Information and Computer Science Department, University of Hawaii, Honolulu, 2008.

    [14] Sigurdsson S., Petersen K. B., Lehn-Schiøler T., Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music, Proceedings of the Seventh International Conference on Music Information Retrieval (ISMIR), 2006.

    [15] Walker W., Lamere P., Kwok P., Raj B., Singh R., Gouvea E., Wolf P., Woelfel J., Sphinx-4: A Flexible Open Source Framework for Speech Recognition, Technical Report, Sun Microsystems Inc., 2004.

    [16] Welch L. R., Hidden Markov Models and the Baum-Welch Algorithm, IEEE Information Theory Society Newsletter, 2003, pp. 1, 10-13.

    [17] Nhận dạng tiếng nói bằng mạng Neuron [Speech recognition with neural networks], Ch. 1, 2, 3.

    [18] Ngô Minh Dũng, Đặng Văn Chuyết, Khảo sát tính ổn định của một số đặc trưng ngữ âm trong nhận dạng người nói [A study of the stability of some phonetic features in speaker recognition], 2003.