
Page 1:

NETtalk: A parallel network that learns to read aloud

Korea Maritime and Ocean University NLP

Jung Tae LEE [email protected]

Terrence J. Sejnowski and Charles R. Rosenberg (1986)

Page 2:

01 Introduction

02 Network Architecture

03 Performance

04 Summary

Page 3:

01 Introduction of NETtalk

1. Introduction of NETtalk

NETtalk

- One method for converting text to speech (TTS).
- An automated learning procedure for a parallel network of deterministic processing units.
- The conventional approach converts text by applying phonological rules and handling exceptions with a look-up table.
- After training, NETtalk achieves good performance and generalizes to novel words.

Page 4:

Characteristics of TTS in English

English is among the most difficult languages to read aloud: speech sounds have exceptions that are often context-sensitive.

- Ex) the "a" in almost all words ending in "ave", such as "brave" and "gave", is a long vowel, but not in "have"; and some words can vary in pronunciation with their syntactic role.

01 Introduction of NETtalk

This is the problem with the conventional rule-based approach.

Page 5:

DECtalk: the commercial product DECtalk used two methods for converting text to phonemes.

1. A word is first looked up in a pronunciation dictionary of common words; if it is not found there, a set of phonological rules is applied (to cover novel words that would otherwise not be pronounced correctly). See the sketch below.

2. An alternative approach is based on massively parallel network models. Knowledge in these models is distributed over many processing units, and decisions are made by the exchange of information between the processing units.
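A minimal sketch of method 1's lookup-then-rules flow in Python (hypothetical names: `PRONUNCIATIONS` and `apply_phonological_rules` are illustrative stand-ins, not DECtalk's actual dictionary or rule set):

```python
# Hypothetical pronunciation dictionary of common words.
PRONUNCIATIONS = {"have": "hav", "brave": "brAv", "gave": "gAv"}

def apply_phonological_rules(word):
    """Placeholder for an ordered set of letter-to-sound rewrite rules."""
    return word  # a real rule set would rewrite the spelling into phonemes

def to_phonemes(word):
    # 1. Look the word up in a pronunciation dictionary of common words;
    # 2. if it is not found there, fall back to the phonological rules.
    return PRONUNCIATIONS.get(word, apply_phonological_rules(word))

print(to_phonemes("have"))   # dictionary hit
print(to_phonemes("zorn"))   # novel word -> handled by the rules
```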

01 Introduction of NETtalk


Page 6:

In this paper:

- A network learning algorithm with three layers.
- NETtalk can be trained on any dialect of any language.
- Demonstrates that a relatively small network can capture most of the significant regularities in English pronunciation as well as absorb many of the irregularities.

01 Introduction of NETtalk

Page 7:

2. Network Architecture

02 Network Architecture

Processing Unit

The network is composed of processing units that non-linearly transform their summed, continuous-valued inputs.

The connection strength, or weight, linking one unit to another unit can be a positive or negative real value.

Page 8:

Processing Unit

02 Network Architecture

The output of the ith unit is determined by first summing all of its inputs,

$E_i = \sum_j w_{ij} s_j$

where $w_{ij}$ is the weight from the jth to the ith unit, and then applying a sigmoidal transformation:

$s_i = P(E_i) = \dfrac{1}{1 + e^{-E_i}}$
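A minimal sketch of one such unit in Python (assuming NumPy; the function name and arguments are illustrative):

```python
import numpy as np

def unit_output(weights, inputs):
    """Output s_i of one processing unit.

    weights: the incoming weights w_ij (1-D array over j)
    inputs:  the outputs s_j of the units feeding this one (1-D array)
    """
    E = np.dot(weights, inputs)        # E_i = sum_j w_ij * s_j
    return 1.0 / (1.0 + np.exp(-E))    # s_i = P(E_i), squashed into (0, 1)
```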

Page 9:

Processing Unit

02 Network Architecture

This value represents either an excitatory (positive) or an inhibitory (negative) influence of the first unit on the output of the second unit.

NETtalk is hierarchically arranged into three layers of units: an input layer, a hidden layer, and an output layer.

Page 10:

Representations of Letters and Phonemes

02 Network Architecture

There are seven groups of units in the input layer:
- Each input group encodes one letter of the input text.
- Seven letters are presented to the input units at any one time.

There is one group of units in each of the other two layers:
- The desired output of the network is the correct phoneme, or contrastive speech sound, associated with the center, or fourth, letter.
- The letters other than the center letter provide a partial context for this decision.
- The text is stepped through the window letter-by-letter.

At each step, the network computes a phoneme, and after each word the weights are adjusted according to how closely the computed pronunciation matches the correct one.

Page 11:

Representations of Letters and Phonemes

02 Network Architecture

The letters are represented with one unit per letter of the alphabet, plus an additional 3 units to encode punctuation and word boundaries.

The phonemes are represented in terms of 23 articulatory features, such as point of articulation, voicing, vowel height, and so on.

Three additional units encode stress and syllable boundaries.

The goal of the learning algorithm is to adjust the weights between the units in the network in order to make the hidden units good feature detectors.
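A minimal sketch of this input encoding (assumptions: 29 one-hot units per letter group and a window of seven groups, giving 7 × 29 = 203 input activations; the mapping chosen for the 3 extra units is illustrative, not the paper's scheme):

```python
import numpy as np
import string

LETTER_UNITS = 29   # 26 letters + 3 units for punctuation and word boundaries
WINDOW = 7          # seven letter groups in the input layer

def letter_index(ch):
    """Map a (lowercase) character to one of the 29 units in a letter group."""
    if ch in string.ascii_lowercase:
        return string.ascii_lowercase.index(ch)
    return 26 if ch == " " else 27 if ch in ".,;:!?" else 28

def encode_window(text, center):
    """One-hot encode the seven-letter window centered at `center`.

    Returns 7 * 29 = 203 input activations; positions outside the
    text are treated as word-boundary blanks.
    """
    x = np.zeros(WINDOW * LETTER_UNITS)
    for g in range(WINDOW):
        pos = center - WINDOW // 2 + g
        ch = text[pos] if 0 <= pos < len(text) else " "
        x[g * LETTER_UNITS + letter_index(ch)] = 1.0
    return x

# Step the window through the text letter by letter, as the slides describe:
text = "phone"
inputs = [encode_window(text, i) for i in range(len(text))]
```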

Page 12:

Learning Algorithm

02 Network Architecture

Two texts were used to train the network:
- Phonetic transcriptions from informal, continuous speech of a child.
- A 20,012-word corpus from a dictionary.

A subset of 1,000 words was chosen from this dictionary, taken from the Brown corpus of the most common words in English.

Letters and phonemes were aligned like this: "phone" - /f-on-/ (each letter is paired with one phoneme symbol, with "-" marking a silent letter).
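The same alignment written out as training pairs (a sketch; only the word and its phoneme string come from the slide):

```python
# The aligned example from the slide: "phone" -> /f-on-/.
# Each letter is paired with exactly one phoneme symbol; "-" marks a
# letter that is silent (here, the "h" of "ph" and the final "e").
word, phoneme_string = "phone", "f-on-"

training_pairs = list(zip(word, phoneme_string))
print(training_pairs)
# [('p', 'f'), ('h', '-'), ('o', 'o'), ('n', 'n'), ('e', '-')]
# The network must output 'f' for the window centered on 'p',
# '-' for the window centered on 'h', and so on.
```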

Page 13:

Learning Algorithm

02 Network Architecture

Training proceeds according to the discrepancy between the desired and actual values of the output units.

This error was "back-propagated" from the output to the input layer.

Each weight is adjusted to minimize its contribution to the total mean square error.

Briefly, the weights were updated according to:

$\Delta w_{ij}^{(n)} = \varepsilon\, \delta_i^{(n+1)} s_j^{(n)} + \alpha\, \Delta w_{ij}^{(n)}(\text{previous})$

- $w_{ij}^{(n)}$: the weight from the jth unit in layer n to the ith unit in layer n + 1
- $\alpha$: smooths the gradient by over-relaxation (momentum)
- $\varepsilon$: the learning rate

Page 14:

Learning Algorithm

02 Network Architecture

The error at the output layer was computed as

$\delta_i = (s_i^{*} - s_i)\, P'(E_i)$

and the differences were recursively back-propagated to lower layers:

$\delta_j^{(n)} = P'(E_j) \sum_i \delta_i^{(n+1)} w_{ij}$

- $P'(E)$: the first derivative of $P(E)$
- $s_i^{*}$: the desired value of the ith unit in the output layer
- $s_i$: the actual value obtained from the network

Errors were back-propagated only when they exceeded a margin of 0.1; the weights were initialized uniformly in the range -0.3 to 0.3.
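A minimal sketch of one training step for the three-layer network under the update rule above (NumPy; the hidden-layer size and the exact values of the learning rate and smoothing constant are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 203 inputs (7 x 29), a hidden layer, 26 outputs (23 + 3).
N_IN, N_HID, N_OUT = 203, 80, 26
EPS, ALPHA, MARGIN = 1.0, 0.9, 0.1   # learning rate, smoothing, error margin

# Weights initialized uniformly in [-0.3, 0.3], as the slide states.
W1 = rng.uniform(-0.3, 0.3, (N_HID, N_IN))
W2 = rng.uniform(-0.3, 0.3, (N_OUT, N_HID))
dW1 = np.zeros_like(W1)   # previous weight changes, for the momentum term
dW2 = np.zeros_like(W2)

def sigmoid(E):
    return 1.0 / (1.0 + np.exp(-E))

def train_step(x, target):
    """One weight update for one seven-letter window."""
    global W1, W2, dW1, dW2
    # Forward pass: s = P(E) at each layer.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # Output error: delta_i = (s*_i - s_i) * P'(E_i), with P'(E) = s (1 - s).
    err = target - y
    err[np.abs(err) < MARGIN] = 0.0    # back-propagate only beyond the margin
    delta_out = err * y * (1.0 - y)

    # Recursively back-propagate the differences to the hidden layer.
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)

    # Update with over-relaxation: dw = eps * delta * s + alpha * previous dw.
    dW2 = EPS * np.outer(delta_out, h) + ALPHA * dW2
    dW1 = EPS * np.outer(delta_hid, x) + ALPHA * dW1
    W2 += dW2
    W1 += dW1
```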

Page 15:

3. Performance

03 Performance

Two measures of performance were computed:

- Best guess: the phoneme whose feature vector makes the smallest angle with the output vector.
- Perfect match: the value of each articulatory feature is within a margin of 0.1 of its correct value.
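A minimal sketch of the two measures (NumPy; `PHONEME_FEATURES` is a random stand-in for the paper's table of per-phoneme feature vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in: each phoneme maps to its 26-dimensional target
# vector (23 articulatory features + 3 stress/boundary units).
PHONEME_FEATURES = {p: rng.uniform(0, 1, 26) for p in ["f", "o", "n", "-"]}

def best_guess(output):
    """Best guess: the phoneme whose feature vector makes the smallest
    angle with the output vector (i.e. the highest cosine similarity)."""
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(PHONEME_FEATURES, key=lambda p: cosine(PHONEME_FEATURES[p], output))

def perfect_match(output, target, margin=0.1):
    """Perfect match: every articulatory feature is within the 0.1
    margin of its correct value."""
    return bool(np.all(np.abs(output - target) < margin))
```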

Page 16:

Continuous Informal Speech

03 Performance

After learning on 50,000 words, perfect matches were at 55%.

Page 17:

Continuous Informal Speech

03 Performance

[Figure: examples of raw output from the simulator, showing the text with its computed phonemes and stresses for the 200-word corpus after 1 and 25 training iterations (cont'd)]

Page 18:

Continuous Informal Speech

03 Performance

[Figure: graphical summary of the weights between the letter units and some of the hidden units; negative weights are inhibitory, positive weights are excitatory]

Page 19:

Continuous Informal Speech

03 Performance

[Figure: damage to the network and recovery from damage]

Page 20:

Dictionary

03 Performance

Used the 1,000 most common words in English.

[Figure: learning curves comparing words with hard and soft pronunciations]

Page 21:

04 Summary

4. Summary

• Seven groups of nodes in the input layer.
• Strings of seven letters were presented to the input layer at any one time.
• The text was stepped through the window on a letter-by-letter basis.
• Trained with the standard back-propagation algorithm.

Page 22:

Thank You

Korea Maritime and Ocean University NLP

Jung Tae LEE [email protected]