Searching in audio files
Audio search
Andrzej Dudziec
Outline
● Introduction
● Speech recognition
● Phonetic algorithms
● Evaluation
● Results
● Conclusions
Introduction
[Diagram: audio -> text]
Speech recognition
● Words consist of letters, e.g. ‘ONE’ - ‘O’, ‘N’, ‘E’
● Speech consists of phonemes, e.g. /wʌn/ - ‘W’, ‘AH’, ‘N’
[Diagram: AM (acoustic model) -> phonemes]
Pronunciation dictionary (dict):
● one W AH N
● two T UW
● three TH R IY
● four F AO R
● five F AY V
● six S IH K S
● seven S EH V AH N
● eight EY T
● nine N AY N
● ten T EH N
[Diagram: AM (acoustic model) -> phonemes -> dict -> words -> LM (language model) -> sentences]
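The dictionary stage maps phoneme sequences to words. A minimal sketch of that lookup, using only the ten entries from the slide; the `decode` helper and its greedy longest-match strategy are assumptions made for illustration, since a real recognizer scores many competing hypotheses with the acoustic and language models rather than matching greedily:

```python
# Pronunciation dictionary copied from the slide (ARPAbet-style symbols).
PRONUNCIATIONS = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
    "three": ["TH", "R", "IY"],
    "four": ["F", "AO", "R"],
    "five": ["F", "AY", "V"],
    "six": ["S", "IH", "K", "S"],
    "seven": ["S", "EH", "V", "AH", "N"],
    "eight": ["EY", "T"],
    "nine": ["N", "AY", "N"],
    "ten": ["T", "EH", "N"],
}

# Reverse index: phoneme sequence -> word (hypothetical helper structure).
PHONEMES_TO_WORD = {tuple(p): w for w, p in PRONUNCIATIONS.items()}
MAX_LEN = max(len(p) for p in PHONEMES_TO_WORD)


def decode(phonemes):
    """Greedily split a phoneme list into dictionary words (longest match first)."""
    words = []
    i = 0
    while i < len(phonemes):
        for size in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + size])
            if chunk in PHONEMES_TO_WORD:
                words.append(PHONEMES_TO_WORD[chunk])
                i += size
                break
        else:
            # No dictionary entry starts at this position.
            raise ValueError(f"cannot decode phonemes from position {i}")
    return words


print(decode(["W", "AH", "N", "T", "UW", "TH", "R", "IY"]))
# -> ['one', 'two', 'three']
```

Greedy matching is enough for this toy vocabulary, but it cannot resolve ambiguous sequences, which is one reason the pipeline adds a language model after the dictionary.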
Issues
● Acoustic level
○ background noise
○ multiple speakers
○ accent, dialect, sex, mood
○ coarticulation
● Dictionary level
○ homonyms (be & bee, I scream & ice cream)
Phonetic algorithms
Thompson -> thompson
thompson -> th3mps3n
th3mps3n -> th3mpS3n
th3mpS3n -> Th3mpS3n
Th3mpS3n -> Th3mPS3n
Th3mPS3n -> Th3MPS3n
Th3MPS3n -> Th3MPS3N
Th3MPS3N -> T23MPS3N
T23MPS3N -> TMPSN
TMPSN111111 -> TMPSN1
Word      Soundex   Metaphone   Caverphone
sixteen   S235      SKST        SKTN11
sixty     S230      SKST        SKTA11
● Soundex
● Metaphone
● Caverphone
Phonetic algorithms
Name        Soundex
Ackermann   A265
Azuron      A265
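The Soundex codes on these slides can be reproduced with a short function. This is a minimal sketch of the common American Soundex variant (first letter kept, consonants mapped to digits, adjacent equal codes collapsed, H and W transparent, result padded to four characters); the talk does not say which variant was used, so that choice is an assumption:

```python
# Consonant groups of American Soundex; vowels, H, W and Y carry no code.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        code = SOUNDEX_CODES.get(letter, "")
        # Emit a digit only when it differs from the preceding code.
        if code and code != previous:
            result += code
        # H and W do not separate two letters that share a code.
        if letter not in "HW":
            previous = code
    # Pad with zeros and truncate to letter + three digits.
    return (result + "000")[:4]


for name in ("sixteen", "sixty", "Ackermann", "Azuron"):
    print(name, soundex(name))
# sixteen S235
# sixty S230
# Ackermann A265
# Azuron A265
```

The last two lines show the weakness illustrated on the slide: two unrelated names collapse to the same code, so a Soundex match alone is weak evidence that two words sound alike.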
Metaphone code computation algorithm
Remove all repeated adjacent letters, except the letter C.
Transform the beginning of the word using the following rules:
KN → N
GN → N
PN → N
AE → E
WR → R
Remove the letter B at the end of the word if it follows the letter M.
Replace C using the rules below:
With X: CIA → XIA, SCH → SKH, CH → XH
With S: CI → SI, CE → SE, CY → SY
With K: C → K
Replace D using the following rules:
With J: DGE → JGE, DGY → JGY, DGI → JGI
With T: D → T
Replace GH → H, except when it is at the end of the word or before a vowel.
Replace GN → N and GNED → NED, if they are at the end.
Replace G using the following rules:
With J: GI → JI, GE → JE, GY → JY
With K: G → K
Remove every H that follows a vowel and is not itself followed by a vowel.
Perform the following transformations:
CK → K
PH → F
Q → K
V → F
Z → S
Replace S with X:
SH → XH
SIO → XIO
SIA → XIA
Replace T using the following rules:
With X: TIA → XIA, TIO → XIO
With 0: TH → 0
Remove: TCH → CH
Transform WH → W at the beginning of the word. Remove W if no vowel follows it.
If X is at the beginning, replace X → S; otherwise replace X → KS.
Remove every Y that is not before a vowel.
Remove all vowels except a vowel at the start of the word.
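To show how such rewrite rules translate into code, here is a minimal sketch of only the first three steps listed above (collapsing doubled letters except C, rewriting the word's initial pair, dropping a final B after M). The function name `metaphone_prepare` is hypothetical, and the remaining rules are deliberately left out, so the output is not a full Metaphone code:

```python
import re

# Initial letter pairs and their replacements, as listed in the rules above.
INITIAL_REWRITES = {"KN": "N", "GN": "N", "PN": "N", "AE": "E", "WR": "R"}


def metaphone_prepare(word):
    """Apply the first three Metaphone rules; later rules are not implemented."""
    text = word.upper()
    # Rule 1: collapse runs of the same letter, leaving runs of C untouched.
    text = re.sub(r"([A-BD-Z])\1+", r"\1", text)
    # Rule 2: rewrite the first two letters if they match a listed pair.
    prefix = text[:2]
    if prefix in INITIAL_REWRITES:
        text = INITIAL_REWRITES[prefix] + text[2:]
    # Rule 3: drop a final B that directly follows M.
    if text.endswith("MB"):
        text = text[:-1]
    return text


print(metaphone_prepare("knight"))   # NIGHT
print(metaphone_prepare("thumb"))    # THUM
print(metaphone_prepare("accept"))   # ACCEPT
```

Each rule is applied in the order given in the list, because later rules assume the earlier ones have already run.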
Daitch-Mokotoff Soundex
Letter combination | At the start | After a vowel | Other
SCHTSCH, SCHTSH, SCHTCH, SHTCH, SHCH, SHTSH, STCH, STSCH, STRZ, STRS, STSH, SZCZ, SZCS | 2 | 4 | 4
SHT, SCHT, SCHD, ST, SZT, SHD, SZD, SD | 2 | 43 | 43
CSZ, CZS, CS, CZ, DRZ, DRS, DSH, DS, DZH, DZS, DZ, TRZ, TRS, TRCH, TSH, TTSZ, TTZ, TZS, TSZ, SZ, TTCH, TCH, TTSCH, ZSCH, ZHSH, SCH, SH, TTS, TC, TS, TZ, ZH, ZS | 4 | 4 | 4
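Reading the table means finding the longest letter combination at a given position and then picking one of three codes depending on context. A minimal sketch of that lookup, using only a handful of combinations from the three rows shown; `dm_lookup` is a hypothetical helper, and treating A, E, I, O, U and Y as the vowels is an assumption:

```python
# (at the start, after a vowel, other) codes; a small subset of the table rows.
DM_TABLE = {
    "SZCZ": ("2", "4", "4"),
    "SHCH": ("2", "4", "4"),
    "SCHT": ("2", "43", "43"),
    "ST": ("2", "43", "43"),
    "SCH": ("4", "4", "4"),
    "SH": ("4", "4", "4"),
    "CZ": ("4", "4", "4"),
    "TS": ("4", "4", "4"),
}
VOWELS = set("AEIOUY")  # assumption: vowel set used for the context check
MAX_COMBO = max(len(combo) for combo in DM_TABLE)


def dm_lookup(word, index):
    """Return (combination, code) for the longest table entry at index, or None."""
    text = word.upper()
    for size in range(MAX_COMBO, 0, -1):
        combo = text[index:index + size]
        if combo in DM_TABLE:
            at_start, after_vowel, other = DM_TABLE[combo]
            if index == 0:
                return combo, at_start
            if text[index - 1] in VOWELS:
                return combo, after_vowel
            return combo, other
    return None


print(dm_lookup("Stern", 0))    # ('ST', '2')
print(dm_lookup("Foster", 2))   # ('ST', '43')
print(dm_lookup("Kirsch", 3))   # ('SCH', '4')
```

Trying the longest combination first matters: at the start of "Szczepanski" the entry SZCZ must win over any shorter entry that shares its first letters.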
Evaluation
Results
help ≠ helped
hell ≠ heaven
[Diagram: preprocessing pipeline with XML, text and audio snippets]
Conclusions
● a good recognition model and audio preprocessing are crucial; consider the speed vs. accuracy trade-off
● phonetic filtering increases recall but decreases precision
● use phonetic filters as an improvement, not as a standalone solution
● consider fuzzy search
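The recall/precision trade-off named above can be made concrete with a small calculation. The snippet IDs below are invented purely for illustration and are not results from this talk; `precision_recall` is a hypothetical helper:

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: snippets that truly contain the query word.
relevant = {"s1", "s2", "s3", "s4"}
# Hypothetical hits of an exact-match search on the recognized text.
exact_match = {"s1", "s2"}
# Hypothetical hits after adding a phonetic filter: more true hits, more noise.
phonetic_match = {"s1", "s2", "s3", "s7", "s8", "s9"}

print(precision_recall(exact_match, relevant))     # (1.0, 0.5)
print(precision_recall(phonetic_match, relevant))  # (0.5, 0.75)
```

In this made-up case the phonetic filter finds one more relevant snippet (recall rises from 0.5 to 0.75) at the cost of three false matches (precision drops from 1.0 to 0.5).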
Use cases
● audio archive
● looking up broadcasts
○ opinion mining
○ collecting information
● voice control
● dictation
○ short notes
○ voice mail -> text messages
Discussion?