semi-supervised learning by machine speech chain …...semi-supervised learning by machine speech...

15
http://www.naist.jp/ 無限の可能性、ここが最先端 -Outgrow your limits- Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech Interpretation Satoshi Nakamura, Sakriani Sakti, and Katsuhito Sudoh Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan 1 Dec. 6. 2019

Upload: others

Post on 26-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Semi-supervised Learning by Machine Speech Chain for

Multilingual Speech Processing, and Recent Progress

on Automatic Speech Interpretation

Satoshi Nakamura,

Sakriani Sakti, and Katsuhito Sudoh

Graduate School of Science and Technology,

Nara Institute of Science and Technology, Japan

LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

1

Dec. 6. 2019

Page 2: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Topics

Recent advances in speech processing

– ASR and TTS research

– Machine Speech Chain unifies ASR and TTS

– Application to code switching speech

Speech Translation

– Recent Progress on Automatic Speech Interpretation

Dec. 6. 2019 LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

2

Page 3: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Motivation Background

In human communication → A closed-loop speech chain mechanism has a critical auditory feedback

mechanism (“Speech Chain”, Denes, Pinson 1973)

Dec. 6. 2019 LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

3

Sensory nerves

Motornerves

Sensory nerves

Auditory feedback

Speaking Listening

Page 4: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Machine Speech Chain

Proposed Method

LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

4

Develop a closed-loop speech chain model based on deep learning

“Good afternoon”

Sensory nerves

Motornerves

Auditory feedback

Speaking

“How are you?”

Speaking

Auditory feedback

Use the closed-loop

for ASR and TTS

Dec. 6. 2019

Page 5: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Machine Speech Chain

Definition:

– 𝑥 = original speech, 𝑦 = original text

– ො𝑥 = predicted speech, ො𝑦 = predicted text

– 𝐴𝑆𝑅(𝑥): 𝑥 → ො𝑦 (seq2seq model transforms speech to text)

– 𝑇𝑇𝑆 𝑦 : 𝑦 → ො𝑥 (seq2seq model transforms text to speech)

LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

5

Dec. 6. 2019

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, “Listening while Speaking: Speech Chain by Deep Learning”, Proc. IEEE ASRU 2017

𝐿𝐴𝑆𝑅(𝑦, ො𝑦)𝐿𝑇𝑇𝑆(𝑥, ො𝑥)

Page 6: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Speech Chain with One-shot Speaker Adaptation

Proposed model

– Train ASR and TTS models using unpaired data and small amount of paired data.

– Speaker individuality is generated by SPKREC embedding.

6

Dec. 6. 2019 LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, “Machine Speech Chain with One-shot Speaker Adaptation”, Proc. INTERSPEECH 2018

Page 7: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits- LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

7

Code-switching Challenges For ASR

“これはstill waterですか?”

Standard ASR is monolingual

ASR?Output text

Japanese

ASR“こんにちは”

English

ASR “Hello”

Japanese

EnglishJapanese

• Typical case where paired speech and transcription are difficult to collect.

Challenge with Code-switching : Mixed multilingual input

Dec. 6. 2019

Page 8: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Code-switching Challenges

LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

8

ASR

TTS

෧𝑡𝑒𝑥𝑡

𝑡𝑒𝑥𝑡

𝐿𝑜𝑠𝑠

𝑠𝑝𝑒𝑒𝑐ℎASR

TTS

෧𝑡𝑒𝑥𝑡𝐿𝑜𝑠𝑠unfold

ASR

TTS

෧𝑡𝑒𝑥𝑡

෧𝑡𝑒𝑥𝑡

t𝑒𝑥𝑡

Given only text Given only speech

• Typical case where paired speech and transcription are difficult to collect.

“これはstill waterですか?”

ASR?Output text

Japanese

EnglishJapanese

Challenge with Code-switching : Mixed multilingual input

Dec. 6. 2019

S. Nakayama, A. Tjandra, S. Sakti, S. Nakamura, “Speech chain for semi-supervised learning of Japanese-English code-switching ASR and TTS”. In Proc. IEEE SLT, 2018

18.11% CER-> 5.08% CER

Page 9: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Topics

Recent advances in speech processing

– ASR and TTS research

– Machine Speech Chain unifies ASR and TTS

– Application to code switching speech

Speech Translation

– Recent Progress on Automatic Speech Translation

LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

9

Dec. 6. 2019

Page 10: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Speech Translation

2019/06/15 CLI9 Keynote Satoshi Nakamura, NAIST10

MultilingualSpeech

Recognition

Spoken Language

Translation

MultilingualSpeech

SynthesisJapanese English

I go to school「私は学校に行く: Watashi wa Gakko he iku」

Watashi wa Gakko he iku

I go to school

• Cascaded process of speech recognition, machine translation, and speech synthesis.

• Machine translation of ASR transcripts.

Page 11: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Cross-lingual Communication

LT4ALL Dec.6 2019 ©Satoshi Nakamura, AHC Lab, NAIST, Japan

11

Input:Text

SpeechVideo

Gesture

Speech⇒TextASR

RealtimeIncremental

MTConversion

Dialog Control

LinguisticInformation

ParalinguisticEmotion,

Style, Personality,

Prosody, Gesture

ParalinguisticEmotion,

Style, Personality,

Prosody, Gesture

Output:Text

SpeechVideo

Gesture

Source Language Target Language

Speech“to o kyo e i ku”

MT results/I/go/to/Tokyo/

TTS results“ai go tu tokyo/

Personality, Prosody Personality, Prosody

DiscourceContext

Domain knowledge

Text

Image⇒textPR

Text

Text⇒SpeechTTS

Text⇒ImageImage Generation

End-to-end Process

Communication

① Simultaneity, Incremental, Latency,

② +Para/non linguistic information

LinguisticInformation

Dec. 6. 2019

Page 12: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Human Interpreter [A.Mizuno 2016]

Dec. 6. 2019 LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

12

E-J Translation Example

(1) The relief workers (2) say (3) they don’t have (4) enough food, water, shelter, and medical supplies (5) to deal with (6) the gigantic wave of refugees (7) who are ransacking the countryside (8) in search of the basics (9) to stay alive.

(1) 救援担当者は (9) 生きるための (8) 食料を求めて (7) 村を荒らし回っている (6) 大量の難民達の (5) 世話をするための (4) 十分な食料や水,宿泊施設,医療品が (3) 無いと (2) 言っています.

Necessary #Chunk>3!

(1) 救援担当者達の (2) 話では (4)食料,水,宿泊施設,医薬品が, (3) 足りず (6) 大量の難民達の (5) 世話が出来ないとのことです.(7) 難民達は今村々を荒らし回って, (9) 生きるための (8) 食料を求めているのです.

Necessary #Chunk<3!

Memory Chunk

Page 13: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits- Dec. 6. 2019 LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

13

Katsuki Chousa, Katsuhito Sudoh, and Satoshi Nakamura. 2019. Simultaneous Neural Machine Translation using Connectionist Temporal Classification. arXiv preprint , 1911.1193

Simultaneous Speech Translation with Adaptive Delay

Define a special token <wait>

ブッシュ大統領 は プーチン と 会談 する

President Bush <wait> meets with Putin<wait><wait>

Page 14: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Paralinguistic Speech Translation

Dec. 6. 2019 LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

14

Q. T. Do, S. Sakti, S. Nakamura, “Sequence-to-Sequence Models for Emphasis Speech Translation”. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(10):1873–1883, 2018

Page 15: Semi-supervised Learning by Machine Speech Chain …...Semi-supervised Learning by Machine Speech Chain for Multilingual Speech Processing, and Recent Progress on Automatic Speech

http://www.naist.jp/無限の可能性、ここが最先端 -Outgrow your limits-

Summary

Machine Speech Chain

– Semi-supervised training using unpaired data

– Code switching speech, under-resource language, and continuous learning

Speech Translation

– Simultaneous speech translation

• Segmentation, anticipation, rewording, evaluation

– Paralinguistic speech translation

• Emphasis, and emotions

Future works

– Understanding and interpretation

– Context, situation and multi-modality

– Common sense, knowledge, and cross-cultural knowledge

LT4ALL ©Satoshi Nakamura, AHC Lab, NAIST, Japan

15

Dec. 6. 2019