2nd Progress Meeting For Sphinx 3.6 Development
Arthur Chan, David Huggins-Daines, Yitao Sun
Carnegie Mellon University
Jun 7, 2005


This meeting (2nd Progress Meeting of 3.6)

Purpose of this meeting
  A work-in-progress report on various aspects of the development
  A briefing on embedded Sphinx 2 (by David)
  A briefing on Sphinx 3’s “crazy branch” (by Arthur)
    Kept as a branch in CVS
    Includes several interesting features
    Includes a bunch of mild changes
  Discussion before another check-in

Outline of this talk

Review of 1st Progress Meeting
Progress of the embedded version of Sphinx 2 (by Dave, 7-10 pages)
Progress of Sphinx 3’s crazy branch (15-20 pages)
  Architecture diagram of Sphinx 3.6
  Changes in search abstraction (7 pages)
  Progress on search implementation (8 pages)
    GMM computation
    FSG mode, Word Switching Tree Search mode
  Mild re-factoring (not “gentle” any more) (3 pages)
    LM
    S3.0 family of tools
Hieroglyph (1 page)

Review of 1st Progress Meeting

Last time...
  Two separate layers were defined
    Low-level implementation of search
    Possible abstractions of search
  The layering had just been introduced; its advantage was not yet revealed.
  Implementation of Mode 5 was still under development (only 10% complete)
  libs3decoder had just been modularized into 8 sub-modules

Progress of Architecture in Sphinx 3.6

Motivation of Architecting Sphinx 3.X

Need for new search algorithms
  Developing a new search algorithm carries risk.
  We don’t want to throw away the old one.
  Mere replacement could cause backward compatibility problems.
Code has grown to a stage where some changes could be very hard.
Multiple programmers become active at the same time
  CVS conflicts could become frequent if things are controlled by “if-else” structures

Architecture of Sphinx 3.X (X<6)

Batch sequential architecture (Shaw 96)
Each executable customizes its own sub-routines

[Diagram: the executables decode, livepretend, decode_anytopo, align and allphone are drawn above four parallel stacks of components, numbered 1 to 4:
  Command Line 1, 2, 3, 4
  Initialization 1 (kb and kbcore), Initialization 2, 3, 4
  GMM Computation 1 (approx_cont_mgau), GMM Computation 2, 3, 4 (using gauden & senone, Methods 1, 2, 3)
  Search 1, 2, 3, 4
  Process Controller 1, 2, 3, 4]

Pros/Cons of Batch Sequential Architecture

Pros:
  Great flexibility for individual programmers
    No assumptions are imposed; data structures are usually optimized for the application.
    Align and allphone have their own optimizations.
  Crafting in each individual application is of high quality
Cons:
  Tremendous difficulty in maintenance
    Most changes need to be carried out 5-6 times.
  Spreads the disease of code duplication
    Code with the same functionality was duplicated multiple times
  Scared a lot of programmers in the past
    Beginners tend to love a general architecture

Big Picture of Software Architecture in Sphinx 3.6

Layered and object-oriented
  Implemented in C
Major high-level routines
  Initializer (kb.c or kbcore.c)
    A kind of clipboard for other controllers
  Process controller (corpus.c)
    Governs the protocol of processing a sentence
  Search abstraction routine (srch.c)
    Governs how search is done
    Implemented as pipelines and filters with shared memory
    Each filter can be overridden, similar to what OO languages do
  Command line processor (cmd_ln_macro.c and cmd_ln.c), implemented as macros

Software Architecture Diagram of Sphinx 3.6

[Diagram: four layers]
Applications: decode, livepretend, align, allphone, dag, astar, livedecode API, decode (anytopo), User Defined Applications
Controllers/Abstractions: Search Controller, Process Controller, Search Initializer, Command Line Processor
Implementations: Fast Single Stream GMM Computation, Multi Stream GMM Computation, Mode 0: Align, Mode 1: Allphone, Mode 2: FSG, Mode 3: Anytopo, Mode 4: Magic Wheel, Mode 5: WSFT
Libraries: Dictionary Library, Search Library, LM Library, AM Library, Utility Library, Feature Library, Miscellaneous Library

Search Abstraction

Search abstraction is implemented as objects
Search operations are implemented as filters with shared memory
Each filter is a kind of unique operation for search
Ideally, each filter or a set of filters can be replaced.

[Diagram: Search For One Frame, built from the filters
  Select Active CD Senone
  Compute Approx. GMM Score (CI senone)
  Compute Detail GMM Score (CD senone)
  Compute Detail HMM Score (CD)
  Propagate Graph (Phone-Level)
  Rescoring at Word End using High-Level KS (e.g. LM)
  Propagate Graph (Word-Level)]
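The per-frame pipeline in the diagram can be sketched in C roughly as follows. This is only an illustration under assumptions: the type and function names (frame_ctx_t, filter_fn, run_frame_pipeline) are hypothetical and are not the srch.c interface, the filters just record that they ran, and the filter order simply follows the order in which the labels appear on the slide.

```c
#include <stddef.h>

#define MAX_TRACE 8

/* Hypothetical per-frame state that every filter reads and writes
   (the "shared memory" of the pipeline); not the real srch.c layout. */
typedef struct {
    int frame;              /* index of the frame being processed */
    int steps_run;          /* number of filters executed so far  */
    int trace[MAX_TRACE];   /* ids of the filters, in run order   */
} frame_ctx_t;

/* Every filter has the same signature, so any of them can be swapped. */
typedef int (*filter_fn)(frame_ctx_t *ctx);

enum {
    F_SELECT_ACTIVE = 1, F_APPROX_GMM, F_DETAIL_GMM, F_DETAIL_HMM,
    F_PROPAGATE_PHONE, F_RESCORE_WORD_END, F_PROPAGATE_WORD
};

/* Record that a filter ran; a real filter would update scores here. */
static int record(frame_ctx_t *ctx, int id)
{
    if (ctx->steps_run >= MAX_TRACE)
        return -1;
    ctx->trace[ctx->steps_run++] = id;
    return 0;
}

static int select_active(frame_ctx_t *c)    { return record(c, F_SELECT_ACTIVE); }
static int approx_gmm(frame_ctx_t *c)       { return record(c, F_APPROX_GMM); }
static int detail_gmm(frame_ctx_t *c)       { return record(c, F_DETAIL_GMM); }
static int detail_hmm(frame_ctx_t *c)       { return record(c, F_DETAIL_HMM); }
static int propagate_phone(frame_ctx_t *c)  { return record(c, F_PROPAGATE_PHONE); }
static int rescore_word_end(frame_ctx_t *c) { return record(c, F_RESCORE_WORD_END); }
static int propagate_word(frame_ctx_t *c)   { return record(c, F_PROPAGATE_WORD); }

/* Default pipeline; replacing one entry replaces one filter. */
static const filter_fn default_pipeline[] = {
    select_active, approx_gmm, detail_gmm, detail_hmm,
    propagate_phone, rescore_word_end, propagate_word
};
#define N_DEFAULT_FILTERS (sizeof(default_pipeline) / sizeof(default_pipeline[0]))

/* Run all filters for one frame; stop at the first failure. */
int run_frame_pipeline(frame_ctx_t *ctx, const filter_fn *pipeline, size_t n)
{
    size_t i;

    ctx->steps_run = 0;
    for (i = 0; i < n; i++) {
        if (pipeline[i](ctx) != 0)
            return -1;
    }
    return 0;
}
```

A caller would fill in a frame_ctx_t and pass default_pipeline, or a copy of it with one entry swapped, to run_frame_pipeline once per frame.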

Different ways to implement search

1. Use the default implementation
     Just specify all atomic search operations (ASOs) provided
2. Override “search_one_frame”
     Only need to specify GMM computation and how to “search_one_frame”
3. Override the whole mechanism
     For people who dislike the default so much
     Override how to “search”
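The three options above could be expressed with function pointers along these lines. This is a minimal sketch under assumptions, not the actual srch.c structure: srch_sketch_t, its fields and srch_sketch_run are hypothetical names, and the three ASOs shown stand in for the full set.

```c
#include <stddef.h>

typedef struct srch_sketch srch_sketch_t;

/* Hypothetical search object; unset (NULL) hooks fall back to defaults. */
struct srch_sketch {
    /* Option 3: replaces the whole utterance-level loop when set. */
    int (*search)(srch_sketch_t *s, int n_frames);
    /* Option 2: replaces the per-frame search when set. */
    int (*search_one_frame)(srch_sketch_t *s, int frame);
    /* Option 1: atomic search operations (ASOs) used by the default. */
    int (*compute_gmm)(srch_sketch_t *s, int frame);
    int (*compute_hmm)(srch_sketch_t *s, int frame);
    int (*propagate)(srch_sketch_t *s, int frame);
    int aso_calls;     /* demonstration counters */
    int frame_calls;
};

/* Example ASO: only counts how often it was invoked. */
int count_aso(srch_sketch_t *s, int frame)
{
    (void)frame;
    s->aso_calls++;
    return 0;
}

/* Example per-frame override, as an FSG-like mode might supply. */
int custom_one_frame(srch_sketch_t *s, int frame)
{
    (void)frame;
    s->frame_calls++;
    return 0;
}

int srch_sketch_run(srch_sketch_t *s, int n_frames)
{
    int f;

    if (s->search != NULL)              /* option 3: full override */
        return s->search(s, n_frames);

    for (f = 0; f < n_frames; f++) {
        /* GMM computation is shared by options 1 and 2. */
        if (s->compute_gmm(s, f) != 0)
            return -1;
        if (s->search_one_frame != NULL) {   /* option 2 */
            if (s->search_one_frame(s, f) != 0)
                return -1;
        } else {                             /* option 1: default ASOs */
            if (s->compute_hmm(s, f) != 0 || s->propagate(s, f) != 0)
                return -1;
        }
    }
    return 0;
}
```

In this sketch a mode that uses the default fills in every ASO and leaves both override hooks NULL, while a mode that overrides the per-frame search supplies only compute_gmm and search_one_frame.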

Concrete Examples

Mode 4 (Magic Wheel) and Mode 5 (WST) use the default implementation
Mode 2 (FSG) overrides the “search_one_frame” implementation
  But shares the GMM implementation.
Likely, Mode 0 (align), Mode 1 (allphone) and Mode 3 (flat lexicon decoding) will do the same.

Future work

Re-factoring of align, allphone and decode_anytopo is not yet completed.
Search abstraction needs to consider more flexible mechanisms
  Doing the search backward (for backward search)
  Approximate search in the first stage (for phoneme and word look-ahead)
  (Optional) Parallel and distributed decoding
Command-line and internal modules could still mismatch
  Might learn from the mechanisms of Sphinx 2 and Sphinx 4
Controlling an utterance could require 5 different files
  A better control format?
Does not yet fully anticipate the fixed-point front-end and GMM computation in Sphinx 2

Progress of Search Implementation in Sphinx 3.6

GMM Computation

Decode can now use SCHMM, specified by .semi.
  Implemented and tested by Dave
GMM computation in align, allphone, decode and livepretend is now common
Sphinx 2’s fixed-point version of GMM computation is not yet incorporated
  It looks very delicious.

Finite State Machine Search (Mode 2) - Implementation

Largely completed (70% complete)
Recipe:
  Search function pointer implementation adapted from the Sphinx 2 FSG_* family of routines
  GMM computation
    Uses Sphinx 3 GMM computation
    Already allows CIGMMS

Finite State Machine Search (Mode 2) - Problems for the Users

Not yet seriously tested
  Finding test cases is hard
Still no way to write a grammar
  Yitao’s goal in Q3 and Q4 2005
    Either directly incorporate the CFG’s score into the search
    Or implement an approximate converter from CFG to FSM (HTK’s method)

Finite State Machine Search (Mode 2) - Other Problems

Problems inherited from Sphinx 2 (copied from Ravi’s slide)
  No lextree implementation (What?)
  Static allocation of all HMMs; not allocated “on demand” (Oh, no!)
  FSG transitions represented by NxN matrix (You can’t be serious!!)
Other wish list
  No histogram pruning (Houston, we’ve got a problem.)
  No state-based implementation (Wilson! I am sorry!!)
    We need it for unification of BW, alignment, allphone and FSG search.
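To illustrate why an NxN transition matrix is a concern: a grammar with N states needs N*N cells even when only a few arcs exist, whereas an arc list indexed by source state needs storage proportional to the number of arcs. The following is a sketch of such a sparse layout, assuming hypothetical names (fsg_arc_t, fsg_sparse_t and the functions below are not Sphinx 2 or Sphinx 3 code) and assuming at least one state and one arc.

```c
#include <stdlib.h>
#include <string.h>

/* One transition of the grammar (illustrative fields only). */
typedef struct {
    int from;
    int to;
    int word_id;
} fsg_arc_t;

/* Arcs grouped by source state: the arcs leaving state s are
   arcs[first_arc[s]] .. arcs[first_arc[s + 1] - 1]. */
typedef struct {
    int n_states;
    int n_arcs;
    int *first_arc;     /* n_states + 1 offsets */
    fsg_arc_t *arcs;    /* n_arcs entries, grouped by source state */
} fsg_sparse_t;

void fsg_sparse_free(fsg_sparse_t *g)
{
    if (g == NULL)
        return;
    free(g->first_arc);
    free(g->arcs);
    free(g);
}

/* Build the sparse form from an unsorted arc list (counting sort).
   Assumes n_states > 0, n_arcs > 0 and valid state indices. */
fsg_sparse_t *fsg_sparse_build(int n_states, const fsg_arc_t *in, int n_arcs)
{
    fsg_sparse_t *g = calloc(1, sizeof(*g));
    int *cursor = malloc((size_t)n_states * sizeof(int));
    int i, s;

    if (g == NULL || cursor == NULL) {
        free(cursor);
        free(g);
        return NULL;
    }
    g->n_states = n_states;
    g->n_arcs = n_arcs;
    g->first_arc = calloc((size_t)n_states + 1, sizeof(int));
    g->arcs = malloc((size_t)n_arcs * sizeof(fsg_arc_t));
    if (g->first_arc == NULL || g->arcs == NULL) {
        free(cursor);
        fsg_sparse_free(g);
        return NULL;
    }

    /* Count arcs per source state, then turn counts into offsets. */
    for (i = 0; i < n_arcs; i++)
        g->first_arc[in[i].from + 1]++;
    for (s = 0; s < n_states; s++)
        g->first_arc[s + 1] += g->first_arc[s];

    /* Place each arc in the next free slot of its source state. */
    memcpy(cursor, g->first_arc, (size_t)n_states * sizeof(int));
    for (i = 0; i < n_arcs; i++)
        g->arcs[cursor[in[i].from]++] = in[i];

    free(cursor);
    return g;
}

/* Number of arcs leaving state s. */
int fsg_out_degree(const fsg_sparse_t *g, int s)
{
    return g->first_arc[s + 1] - g->first_arc[s];
}
```

With this layout, expanding a state touches only its own outgoing arcs instead of scanning a full matrix row.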

Time Switching Tree Search (Mode 4)

Name changes:
  It was “lucky wheel”
  Now it is “magic wheel”
In the last check-in, after test-full, results are exactly the same for 6 corpora
  We could sleep.
Future work:
  Change the word-end triphone implementation from composite triphones to full triphones

Word Switching Tree Search (Mode 5)

Can now run on the Communicator task
  With the same performance as mode 4
Major reasons why it doesn’t approach decode_anytopo’s result
  Bigram probability is not yet factored
    Not an easy task. Still considering how to do it.
  Triphone implementation is not yet exact
30% complete
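For readers unfamiliar with factoring, one common way to factor language model probabilities into a lexical tree is to give each node the best LM score of any word reachable below it, and to apply only the difference when a path moves from a node to its child. The sketch below shows that idea for a single fixed history; it is not the mode 5 design (which the slide says is still undecided), and lex_node_t and both functions are hypothetical names. It assumes integer log-probabilities, words only at leaves, and at least one child per internal node.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Node of a toy lexical tree. */
typedef struct lex_node {
    int word_id;                       /* >= 0 at a word-end leaf, -1 otherwise */
    int n_children;
    struct lex_node *children[MAX_CHILDREN];
    int factored;                      /* best LM log-prob of any word below */
} lex_node_t;

/* Fill in node->factored for the whole subtree and return it.
   lm_logprob[w] is the LM log-probability of word w given the
   history for which this tree copy is being scored. */
int lextree_factor(lex_node_t *node, const int *lm_logprob)
{
    int best = INT_MIN;
    int i;

    if (node->word_id >= 0) {
        node->factored = lm_logprob[node->word_id];
        return node->factored;
    }
    for (i = 0; i < node->n_children; i++) {
        int v = lextree_factor(node->children[i], lm_logprob);
        if (v > best)
            best = v;
    }
    node->factored = best;
    return best;
}

/* LM score added when a path moves from parent to child. The values
   are never positive, and along any root-to-leaf path they sum to
   lm_logprob[word] - root->factored. */
int lextree_lm_increment(const lex_node_t *parent, const lex_node_t *child)
{
    return child->factored - parent->factored;
}
```

The root's factored value is applied when a path enters the tree, so the full LM score of a word has been applied by the time its leaf is reached, while pruning can already use partial LM information at the first phones.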

Future work on Mode 5

N-gram look-ahead
Full trigram tree implementation
Phoneme and word look-ahead
Share the full triphone implementation with mode 4 in future.

Big picture of All Search Implementations

A finite state machine data structure could unify align, allphone, Baum-Welch and FSG search
  Time will show whether it is also applicable to tree search.
Search implementation has more short-term demand.
  Mode 5 will be our new flagship
  By October, 3 out of 4 goals in mode 5 should be completed.
Between different searches, code should be shared as much as possible

Some other mild refactorings

Summary of Re-factorings

Not gentle any more
  But it is mild
Several useful things to know
  Language model routine revamping
  S3.0 family of tools
  Overall status of merging

LM routine

Current capability
  Reads both text-based and DMP-based LMs
  Allows switching of LMs
  Allows inter-conversion between the text and DMP formats of an LM
  Provides a single interface to all applications
Tool of the month: lm_convert
  lm3g2dmp++
  Will be the application for future language model inter-conversion
  Other formats? CMULMTK’s format?

S3.0 family of tools

Architecture drives many changes in the code
  Align, allphone and decode_anytopo now use
    kbcore
    The same version of the multi-stream GMM computation routine
    Simplified search structure
    ctl_process mechanism
  Next step is to use the srch.c interface.
All tools now share sets of common command-line macros
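The idea of sharing command-line definitions through macros can be sketched as below. The option names, defaults and help strings are placeholders invented for illustration; they are not copied from cmd_ln_macro.c, and arg_def_t and arg_lookup are hypothetical as well.

```c
#include <stddef.h>
#include <string.h>

/* One command-line option definition (illustrative layout). */
typedef struct {
    const char *name;
    const char *deflt;   /* default value, or NULL if required */
    const char *doc;
} arg_def_t;

/* Groups of options defined once and reused by every tool. */
#define ACOUSTIC_MODEL_ARGS \
    { "-acoustic_model", NULL, "Acoustic model directory" }, \
    { "-beam_width", "1e-60", "Main pruning beam" }

#define LANGUAGE_MODEL_ARGS \
    { "-language_model", NULL, "Language model file" }, \
    { "-language_weight", "1.0", "Language model weight" }

/* A decoder-like tool takes both groups plus its own option. */
const arg_def_t decode_args[] = {
    ACOUSTIC_MODEL_ARGS,
    LANGUAGE_MODEL_ARGS,
    { "-hypothesis_out", NULL, "Output hypothesis file" },
    { NULL, NULL, NULL }
};

/* An aligner-like tool reuses only the acoustic model group. */
const arg_def_t align_args[] = {
    ACOUSTIC_MODEL_ARGS,
    { "-transcript_in", NULL, "Input transcript file" },
    { NULL, NULL, NULL }
};

/* Find an option by name in a NULL-terminated table. */
const arg_def_t *arg_lookup(const arg_def_t *defs, const char *name)
{
    for (; defs->name != NULL; defs++) {
        if (strcmp(defs->name, name) == 0)
            return defs;
    }
    return NULL;
}
```

A change to a shared option then happens in one macro and reaches every tool that includes the group, which is the maintenance benefit the slide points at.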

Code Merging

Sphinx 3.0, Sphinx 3.X and share are now unified.
  Alex: “It’s time to fix the training algorithms!”
  Ravi: “It’s time to add full n-gram and full n-phones to the recognizer!!”
  Dave: “It’s time to work on pronunciation modeling!”
  Yitao: “It’s time to implement a CFG-based search!!”
  Evandro: “It’s time to do more regression tests!”
  Alan: “Don’t merge Sphinx with festival!!”
Next step:
  It’s time to clean up SphinxTrain.
  We will keep the pace at <4 tool check-ins per month.

Hieroglyphs

Halves of Chapters 3 and 5 are finished
  Chapter 3: “Introduction to Speech Recognition”
    Missing: description of DTW, HMM and LM
  Chapter 5: “Roadmap of building speech recognition system”
    Missing:
      How to evaluate the system?
      How to train a system? (Evandro’s tutorial will be perfect)
Still ~4 chapters (out of 12) of material to go before the 1st draft is written

Conclusion

We have done something.
Embedded Sphinx 2
  Its completion will benefit both Sphinx 2 and Sphinx 3
Sphinx 3.6
  Its completion will benefit
    Long-term development
    Short-term needs in funded projects
  Tentative deadline: beginning of October