dsp hw2-1 - speech.ee.ntu.edu.tw · -c lib/config.cfg set format of input feature (mfcc_z_e_d_a)-o...

39
DSP HW2-1 HMM Training and Testing 教授:李琳山 助教:王君璇

Upload: others

Post on 26-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

DSP HW2-1HMM Training and Testing

教授:李琳山助教:王君璇

Page 2: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Outline1. Introduction2. HiddenMarkovModelToolkit(HTK)3. HomeworkProblems4. SubmissionRequirements

Page 3: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Introduction● Constructadigitrecognizer- monophone

ling|yi|er|san|si|wu|liu|qi|ba|jiu

● FreetoolsofHMM:HiddenMarkovToolkit(HTK)http://htk.eng.cam.ac.uk/

● Trainingdata,testingdata,scripts,andotherresourcesallareavailableonhttp://speech.ee.ntu.edu.tw/DSP2019Spring/

Page 4: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Flowchart

Page 5: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Hidden Markov Model Toolkit (HTK)

Page 6: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Feature Extraction

Page 7: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Feature Extraction - HCopy

Convertwaveto39dimensionMFCC.-C lib/hcopy.cfg● inputandoutputformat● parametersoffeatureextraction● Chapter7- SpeechSignalsandFront-endProcessing

-S scripts/training_hcopy.scp● amappingfromInputfilenametooutputfilename

speechdata/training/N110022.wav

MFCC/training/N110022.mfc

Page 8: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Training Flowchart

Page 9: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Training Flowchart

Page 10: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Initialize model - HCompV

Computeglobalmeanandvarianceoffeatures-C lib/config.cfg

● setformatofinputfeature(MFCC_Z_E_D_A)-o hmmdef -M hmm

● setoutputname:hmm/hmmdef-S scripts/training.scp

● alistoftrainingdatalib/proto

● adescriptionofaHMMmodel,HTKMMFformat⇨ youcanmodifytheModelFormathere(#states)!

Page 11: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Initial MMF PrototypeMMF:HTKBookchapter7

Page 12: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Initial HMM● bin/macro

ProduceMMFcontainsvFloor

● bin/models_1mixsiladdsilenceHMM

hmm/hmmdef

hmm/models

Page 13: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Training Flowchart

Page 14: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Adjust HMMs - HERest

Basicproblem3forHMM● GivenOandaninitialmodelλ=(A,B,π),adjustλtomaximizeP(O|λ)

Page 15: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Adjust HMMs - HERest

AdjustparametersλtomaximizeP(O|λ)● oneiterationofEMalgorithm● runthiscommandthreetimes=>threeiterations

–I labels/Clean08TR.mlf● setlabelfileto“labels/Clean08TR.mlf”

-o lib/models.lst● alistofwordmodels(liN(零),#i(一),#er(二),…jiou(九),sil)

Page 16: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Add SP Model

Add ”sp”(shortpause)HMMdefinitiontoMMFfile“hmm/hmmdef”

Page 17: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Modify HMMs - HHEd

lib/sil1.hed● alistofcommandtomodifyHMMdefinitions

lib/models_sp.lst● anewlistofmodel(liN(零),#i(一),#er(二),…jiou(九),sil,sp)

Page 18: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Training Flowchart

Page 19: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Adjust HMMs Again - HERest

Page 20: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Increase Number of Mixtures - HHEd

Page 21: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Modification of Models

You can modify # of Gaussian mixture here.

This value tells HTK to change the mixture number from state 2 to state 4. If you want to change # state, check lib/proto.

You can increase # Gaussian mixture here.

Page 22: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Adjust HMMs Again - HERest

Page 23: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Training Flowchart

Hint:Increasemixtureslittlebylittle!

Page 24: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Testing Flowchart

Page 25: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Construct Word Net - HParse

lib/grammar_sp● regularexpression● easyforusertoconstruct

lib/wdnet_sp● outputwordnet● theformatthatHTKunderstand

Page 26: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Viterbi Search - HVite

-w lib/wdnet_sp● inputwordnet

-i result/result.mlf● outputMLFfile

lib/dict● dictionary:amappingfromwordtophonesequences

ling->liN,er->#er,….一 ->sic_ii,七->chi_ii

Page 27: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Compared With Answer - HResults

LongestCommonSubsequence(LCS)

Ref:See HTK book 3.2.2 (p. 33)

Page 28: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Report - Part 1 (40%) - Run Baseline

1. DownloadHTKtools(recommend:compiledbinary)andhomeworkpackage

2. SetPATHforHTKtools:set_htk_path.sh

3. Execute(bashshellscript)01_run_HCopy.sh02_run_HCompV.sh03_training.sh04_testing.sh

Page 29: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Report - Part 1 (40%) - Run Baseline (cont.)

3. Youcanfindaccuracyin“result/accuracy”thebaselineaccuracyis74.34%

4. Putthescreenshotofyourresultonthereport.

Page 30: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Useful tips1. Tounzipfiles

unzipXXXX.ziptar-zxvfXXXX.tar.gz

2. Tosetpathin“set_htk_path.sh”PATH=$PATH:“~/XXXX/XXXX”

3. Incaseshellscriptisnotpermittedtorun…chmod744XXXX.sh

Page 31: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Useful tips

4. IfyouencounterNosuchfileordirectoryonthecompiledbinary files,itisbecauseyouaretryingtoruna32-bitbinaryona64-bitsystemthatdoesn'thave32-bitsupportinstalled.Youmayneedtoinstalllibrarypackagessuchaslibc6:i386,libncurses5:i386,andlibstdc++6:i386.

Page 32: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Report - Part 2 (40%) - Improve Accuracy● Acc>95%forfullcredit;90~95%forpartialcredit

andputthescreenshotofyourresultonthereport.

proto 03_training.sh,mix2_10.hed...

Page 33: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Part 2 - Attention 1● Executing03_training.shtwiceisdifferentfrom

doublingthenumberoftrainingiterations.Toincreasethenumberoftrainingiterations,pleasemodifythescript,ratherthanrunitmanytimes.

Page 34: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Part 2 - Attention 2● Everytimeyoumodifiedanyparameterorfile,you

shouldrun00_clean_all.sh toremoveallthefilesthatwereproducedbefore,andrestartalltheprocedures.Ifnot,thenewsettingswillbeperformedonthepreviousfiles,andhenceyouwillbenotabletoanalyzethenewresults.(Ofcourse,youshouldrecordyourcurrentresultsbeforestartingthenextexperiment.)

Page 35: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Report - Part 3 (30%)● Writeareportdescribingyourtrainingprocessand

accuracy.Numberofstates,Gaussianmixtures,iterations,…HowsomechangeseffecttheperformanceOtherinterestingdiscoveries

● Well-writtenreportmayget+10%bonus.

Page 36: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Submission Requirements1. 4shellscripts

yourmodified01~04_XXXX.sh

2. 1accuracyfilewithonlyyourbestaccuracy(Thebaselineresultisnotneeded.)

3. proto,mix2_10.hedyourmodifiedhmmprototype andfilewhichspecifiesthenumberofGMMsofeachstate

4. hw2-1_bXXXXXXXX.pdfscreenshotforbaselineandthebestresult,orotherinteresting.

Page 37: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Submission Requirements (cont.)5. Putthose8 filesinafolder,compressthefolderto1

zipfileanduploadittoCEIBA.● FoldernameshouldbebXXXXXXXX (e.g.b04901000orr07922000)

● .ziponly● 20% ofthefinalscorewillbetakenoffforwrongformat

6. Deadline:2019/5/323:59:59● LatePenalty:10%offevery24hoursafterdeadline

(lessthan24hourswillbeviewedas24hours).● Submissionafter3dayswillgetzeropoint.

Page 38: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

If you have any problem…● Checkforhintsinthelinuxandshellscripts.ex:鳥哥● ChecktheHTKbook.● AskfriendswhoarefamiliarwithLinuxcommandsor

Cygwin. (link:howtoHTKonCygwin)

Page 39: dsp hw2-1 - speech.ee.ntu.edu.tw · -C lib/config.cfg set format of input feature (MFCC_Z_E_D_A)-o hmmdef -M hmm set output name: hmm/hmmdef-S scripts/training.scp a list of training

Contact TA● email:[email protected]

title:[HW2-1]ProblemDescription● OfficeHour:Monday14:30-15:30電二531王君璇

(Pleasesendanemailbeforecoming!)