
Page 1:

NETtalk: A parallel network that learns to read aloud

Korea Maritime and Ocean University NLP

Jung Tae LEE [email protected]

Terrence J. Sejnowski and Charles R. Rosenberg (1986)

Page 2:

01 Introduction

02 Network Architecture

03 Performance

04 Summary

Page 3:

01 Introduction of NETtalk

1. Introduction of NETtalk

NETtalk

- One method for converting text to speech (TTS).
- An automated learning procedure for a parallel network of deterministic processing units.
- The conventional approach converts text by applying phonological rules and handling exceptions with a look-up table.
- After training, NETtalk achieves good performance and generalizes to novel words.

Page 4:

Characteristics of TTS in English

English is among the most difficult languages to read aloud: speech sounds have exceptions that are often context-sensitive.

- Ex) the "a" in almost all words ending in "ave", such as "brave" and "gave", is a long vowel, but not in "have"; and some words can vary in pronunciation with their syntactic role.

01 Introduction of NETtalk

This is the problem with the conventional rule-based approach.

Page 5:

DECtalk: the commercial product DECtalk used two methods for converting text to phonemes.

1. A word is first looked up in a pronunciation dictionary of common words; if it is not found there, a set of phonological rules is applied (to cover novel words that would otherwise not be pronounced correctly). See the sketch below.

2. An alternative approach is based on massively parallel network models. Knowledge in these models is distributed over many processing units, and decisions are made by the exchange of information between the processing units.
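A minimal sketch of method 1's lookup-then-rules flow in Python (hypothetical names: `PRONUNCIATIONS` and `apply_phonological_rules` are illustrative stand-ins, not DECtalk's actual dictionary or rule set):

```python
# Hypothetical pronunciation dictionary of common words.
PRONUNCIATIONS = {"have": "hav", "brave": "brAv", "gave": "gAv"}

def apply_phonological_rules(word):
    """Placeholder for an ordered set of letter-to-sound rewrite rules."""
    return word  # a real rule set would rewrite the spelling into phonemes

def to_phonemes(word):
    # 1. Look the word up in a pronunciation dictionary of common words;
    # 2. if it is not found there, fall back to the phonological rules.
    return PRONUNCIATIONS.get(word, apply_phonological_rules(word))

print(to_phonemes("have"))   # dictionary hit
print(to_phonemes("zorn"))   # novel word -> handled by the rules
```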

01 Introduction of NETtalk


Page 6:

In this paper:

- A network learning algorithm with three layers.
- NETtalk can be trained on any dialect of any language.
- Demonstrates that a relatively small network can capture most of the significant regularities in English pronunciation as well as absorb many of the irregularities.

01 Introduction of NETtalk

Page 7:

2. Network Architecture

02 Network Architecture

Processing Unit

The network is composed of processing units that non-linearly transform their summed, continuous-valued inputs.

The connection strength, or weight, linking one unit to another unit can be a positive or negative real value.

Page 8:

Processing Unit

02 Network Architecture

The output of the ith unit is determined by first summing all of its inputs,

$E_i = \sum_j w_{ij} s_j$

where $w_{ij}$ is the weight from the jth to the ith unit, and then applying a sigmoidal transformation:

$s_i = P(E_i) = \dfrac{1}{1 + e^{-E_i}}$
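A minimal sketch of one such unit in Python (assuming NumPy; the function name and arguments are illustrative):

```python
import numpy as np

def unit_output(weights, inputs):
    """Output s_i of one processing unit.

    weights: the incoming weights w_ij (1-D array over j)
    inputs:  the outputs s_j of the units feeding this one (1-D array)
    """
    E = np.dot(weights, inputs)        # E_i = sum_j w_ij * s_j
    return 1.0 / (1.0 + np.exp(-E))    # s_i = P(E_i), squashed into (0, 1)
```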

Page 9:

Processing Unit

02 Network Architecture

This value represents either an excitatory (positive) or an inhibitory (negative) influence of the first unit on the output of the second unit.

NETtalk is hierarchically arranged into three layers of units: an input layer, a hidden layer, and an output layer.

Page 10:

Representations of Letters and Phonemes

02 Network Architecture

There are seven groups of units in the input layer:
- Each input group encodes one letter of the input text.
- Seven letters are presented to the input units at any one time.

There is one group of units in each of the other two layers:
- The desired output of the network is the correct phoneme, or contrastive speech sound, associated with the center, or fourth, letter.
- The letters other than the center letter provide a partial context for this decision.
- The text is stepped through the window letter-by-letter.

At each step, the network computes a phoneme, and after each word the weights are adjusted according to how closely the computed pronunciation matches the correct one.

Page 11:

Representations of Letters and Phonemes

02 Network Architecture

The letters are represented with one unit per letter of the alphabet, plus an additional 3 units to encode punctuation and word boundaries.

The phonemes are represented in terms of 23 articulatory features, such as point of articulation, voicing, vowel height, and so on.

Three additional units encode stress and syllable boundaries.

The goal of the learning algorithm is to adjust the weights between the units in the network in order to make the hidden units good feature detectors.
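A minimal sketch of this input encoding (assumptions: 29 one-hot units per letter group and a window of seven groups, giving 7 × 29 = 203 input activations; the mapping chosen for the 3 extra units is illustrative, not the paper's scheme):

```python
import numpy as np
import string

LETTER_UNITS = 29   # 26 letters + 3 units for punctuation and word boundaries
WINDOW = 7          # seven letter groups in the input layer

def letter_index(ch):
    """Map a (lowercase) character to one of the 29 units in a letter group."""
    if ch in string.ascii_lowercase:
        return string.ascii_lowercase.index(ch)
    return 26 if ch == " " else 27 if ch in ".,;:!?" else 28

def encode_window(text, center):
    """One-hot encode the seven-letter window centered at `center`.

    Returns 7 * 29 = 203 input activations; positions outside the
    text are treated as word-boundary blanks.
    """
    x = np.zeros(WINDOW * LETTER_UNITS)
    for g in range(WINDOW):
        pos = center - WINDOW // 2 + g
        ch = text[pos] if 0 <= pos < len(text) else " "
        x[g * LETTER_UNITS + letter_index(ch)] = 1.0
    return x

# Step the window through the text letter by letter, as the slides describe:
text = "phone"
inputs = [encode_window(text, i) for i in range(len(text))]
```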

Page 12:

Learning Algorithm

02 Network Architecture

Two texts were used to train the network:
- Phonetic transcriptions from informal, continuous speech of a child.
- A 20,012-word corpus from a dictionary.

A subset of 1,000 words was chosen from this dictionary, taken from the Brown corpus of the most common words in English.

Letters and phonemes were aligned like this: "phone" - /f-on-/ (each letter is paired with one phoneme symbol, with "-" marking a silent letter).
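The same alignment written out as training pairs (a sketch; only the word and its phoneme string come from the slide):

```python
# The aligned example from the slide: "phone" -> /f-on-/.
# Each letter is paired with exactly one phoneme symbol; "-" marks a
# letter that is silent (here, the "h" of "ph" and the final "e").
word, phoneme_string = "phone", "f-on-"

training_pairs = list(zip(word, phoneme_string))
print(training_pairs)
# [('p', 'f'), ('h', '-'), ('o', 'o'), ('n', 'n'), ('e', '-')]
# The network must output 'f' for the window centered on 'p',
# '-' for the window centered on 'h', and so on.
```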

Page 13:

Learning Algorithm

02 Network Architecture

Training proceeds according to the discrepancy between the desired and actual values of the output units.

This error was "back-propagated" from the output to the input layer.

Each weight is adjusted to minimize its contribution to the total mean square error.

Briefly, the weights were updated according to:

$\Delta w_{ij}^{(n)} = \varepsilon\, \delta_i^{(n+1)} s_j^{(n)} + \alpha\, \Delta w_{ij}^{(n)}(\text{previous})$

- $w_{ij}^{(n)}$: the weight from the jth unit in layer n to the ith unit in layer n + 1
- $\alpha$: smooths the gradient by over-relaxation (momentum)
- $\varepsilon$: the learning rate

Page 14:

Learning Algorithm

02 Network Architecture

The error at the output layer was computed as

$\delta_i = (s_i^{*} - s_i)\, P'(E_i)$

and the differences were recursively back-propagated to lower layers:

$\delta_j^{(n)} = P'(E_j) \sum_i \delta_i^{(n+1)} w_{ij}$

- $P'(E)$: the first derivative of $P(E)$
- $s_i^{*}$: the desired value of the ith unit in the output layer
- $s_i$: the actual value obtained from the network

Errors were back-propagated only when they exceeded a margin of 0.1; the weights were initialized uniformly in the range -0.3 to 0.3.
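A minimal sketch of one training step for the three-layer network under the update rule above (NumPy; the hidden-layer size and the exact values of the learning rate and smoothing constant are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 203 inputs (7 x 29), a hidden layer, 26 outputs (23 + 3).
N_IN, N_HID, N_OUT = 203, 80, 26
EPS, ALPHA, MARGIN = 1.0, 0.9, 0.1   # learning rate, smoothing, error margin

# Weights initialized uniformly in [-0.3, 0.3], as the slide states.
W1 = rng.uniform(-0.3, 0.3, (N_HID, N_IN))
W2 = rng.uniform(-0.3, 0.3, (N_OUT, N_HID))
dW1 = np.zeros_like(W1)   # previous weight changes, for the momentum term
dW2 = np.zeros_like(W2)

def sigmoid(E):
    return 1.0 / (1.0 + np.exp(-E))

def train_step(x, target):
    """One weight update for one seven-letter window."""
    global W1, W2, dW1, dW2
    # Forward pass: s = P(E) at each layer.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # Output error: delta_i = (s*_i - s_i) * P'(E_i), with P'(E) = s (1 - s).
    err = target - y
    err[np.abs(err) < MARGIN] = 0.0    # back-propagate only beyond the margin
    delta_out = err * y * (1.0 - y)

    # Recursively back-propagate the differences to the hidden layer.
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)

    # Update with over-relaxation: dw = eps * delta * s + alpha * previous dw.
    dW2 = EPS * np.outer(delta_out, h) + ALPHA * dW2
    dW1 = EPS * np.outer(delta_hid, x) + ALPHA * dW1
    W2 += dW2
    W1 += dW1
```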

Page 15:

3. Performance

03 Performance

Two measures of performance were computed:

- Best guess: the phoneme whose feature vector makes the smallest angle with the output vector.
- Perfect match: the value of each articulatory feature is within a margin of 0.1 of its correct value.
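A minimal sketch of the two measures (NumPy; `PHONEME_FEATURES` is a random stand-in for the paper's table of per-phoneme feature vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in: each phoneme maps to its 26-dimensional target
# vector (23 articulatory features + 3 stress/boundary units).
PHONEME_FEATURES = {p: rng.uniform(0, 1, 26) for p in ["f", "o", "n", "-"]}

def best_guess(output):
    """Best guess: the phoneme whose feature vector makes the smallest
    angle with the output vector (i.e. the highest cosine similarity)."""
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(PHONEME_FEATURES, key=lambda p: cosine(PHONEME_FEATURES[p], output))

def perfect_match(output, target, margin=0.1):
    """Perfect match: every articulatory feature is within the 0.1
    margin of its correct value."""
    return bool(np.all(np.abs(output - target) < margin))
```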

Page 16:

Continuous Informal Speech

03 Performance

After learning on 50,000 words, perfect matches were at 55%.

Page 17:

Continuous Informal Speech

03 Performance

[Figure: examples of raw output from the simulator, showing the text with its computed phonemes and stresses for the 200-word corpus after 1 and 25 training iterations (cont'd)]

Page 18:

Continuous Informal Speech

03 Performance

[Figure: graphical summary of the weights between the letter units and some of the hidden units; negative weights are inhibitory, positive weights are excitatory]

Page 19:

Continuous Informal Speech

03 Performance

[Figure: damage to the network and recovery from damage]

Page 20:

Dictionary

03 Performance

Used the 1,000 most common words in English.

[Figure: learning curves comparing words with hard and soft pronunciations]

Page 21:

04 Summary

4. Summary

• Seven groups of nodes in the input layer.
• Strings of seven letters were presented to the input layer at any one time.
• The text was stepped through the window on a letter-by-letter basis.
• Trained with the standard back-propagation algorithm.

Page 22:

Thank You

Korea Maritime and Ocean University NLP

Jung Tae LEE [email protected]