towards machine comprehension of spoken content machine... · 2017-11-09 · understand the...


Page 1: Towards Machine Comprehension of Spoken Content Machine... · 2017-11-09 · Understand the question W 2 … T Question W 1 V Q : vector representation for question Vs : consider

Teaching Machines to Understand Human Speech (讓機器聽懂人說話)

Hung-yi Lee (李宏毅)

Page 2

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

• Understanding a sentence

• Understanding a whole conversation

• The machine decides what to say (or do): “I am fine” (我很好)

Everything is based on Deep Learning

Page 3

Deep Learning in One Slide

Many kinds of networks (input 𝑥 → network → output 𝑦):

• Fully connected feedforward network (input: Vector)

• Convolutional neural network (CNN) (input: Matrix)

• Recurrent neural network (RNN) (input: Vector Seq)

They are functions.

How to find the function?

Given the examples of inputs/outputs as training data: {(x1,y1),(x2,y2), ……, (x1000,y1000)}

Page 4

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

Page 5

Speech Recognition

Spoken Content → Speech Recognition → Text

Speech recognition is a function f that maps audio to text, e.g. “How are you”, “Hi”, “I am fine”, “Good bye”

Page 6

Typical Deep Learning Approach

• The hierarchical structure of human languages

what do you think

Phoneme: hh w aa t d uw y uw th ih ng k

Tri-phone: …… t-d+uw d-uw+y uw-y+uw y-uw+th ……

State: t-d+uw1 t-d+uw2 t-d+uw3, d-uw+y1 d-uw+y2 d-uw+y3
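The phoneme, tri-phone and state levels can be expanded mechanically. The sketch below is only an illustration: the function names `to_triphones` and `to_states` are hypothetical, skipping the first and last phoneme (which lack one neighbour) is an assumption, and three states per tri-phone simply follows the numbering shown on the slide.

```python
def to_triphones(phonemes):
    """Label each inner phoneme with its neighbours as left-center+right."""
    triphones = []
    for i in range(1, len(phonemes) - 1):
        left, center, right = phonemes[i - 1], phonemes[i], phonemes[i + 1]
        triphones.append(f"{left}-{center}+{right}")
    return triphones


def to_states(triphone, n_states=3):
    """Split one tri-phone into numbered states (three, as on the slide)."""
    return [f"{triphone}{k}" for k in range(1, n_states + 1)]


# Phoneme sequence for "what do you think", copied from the slide.
phonemes = "hh w aa t d uw y uw th ih ng k".split()
triphones = to_triphones(phonemes)
states = [state for tri in triphones for state in to_states(tri)]
```

The number of distinct states produced this way is what fixes the size of the output layer of the acoustic model.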

Page 7

Typical Deep Learning Approach

• The first stage of speech recognition

• Classification: input → acoustic feature, output → state

Determine the state each acoustic feature belongs to

Acoustic feature: ……

States: a a a b b c c

Page 8

Typical Deep Learning Approach

DNN input: One acoustic feature xi

DNN output: Probability of each state: P(a|xi), P(b|xi), P(c|xi), ……

Size of output layer = No. of states

Network: DNN / CNN
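As a rough illustration of this classification step, the sketch below scores one acoustic feature vector against three states and converts the scores into probabilities with a softmax. A single linear layer stands in for the DNN, and the weights, biases and feature values are made-up numbers, not taken from the talk.

```python
import math

STATES = ["a", "b", "c"]  # the output layer has one unit per state


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def state_posteriors(x, weights, biases):
    """Return P(state | x) for one acoustic feature vector x."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return dict(zip(STATES, softmax(scores)))


# Hypothetical parameters: 3 states, 2-dimensional acoustic feature.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
BIASES = [0.0, 0.0, 0.0]

posteriors = state_posteriors([2.0, 0.5], WEIGHTS, BIASES)
best_state = max(posteriors, key=posteriors.get)
```

Running the same classifier on every frame yields a state sequence such as a a a b b c c.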

Page 9

Very Deep

MSR

Page 10

Human Parity!

• Microsoft speech recognition reaches a major milestone: conversational recognition at human level! (微軟語音辨識技術突破重大里程碑:對話辨識能力達人類水準!) (2016.10). Word error rate: Machine 5.9% v.s. Human 5.9%

• https://www.bnext.com.tw/article/41414/bn-2016-10-19-020437-216

• Dong Yu, Wayne Xiong, Jasha Droppo, Andreas Stolcke, Guoli Ye, Jinyu Li, Geoffrey Zweig, “Deep Convolutional Neural Networks with Layer-wise Context Expansion and Attention”, Interspeech 2016

• IBM vs Microsoft: 'Human parity' speech recognition record changes hands again (2017.03). Word error rate: Machine 5.5% v.s. Human 5.1%

• http://www.zdnet.com/article/ibm-vs-microsoft-human-parity-speech-recognition-record-changes-hands-again/

• George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall, “English Conversational Telephone Speech Recognition by Humans and Machines”, arXiv preprint, 2017

Page 11

End-to-end Approach - Connectionist Temporal Classification (CTC)

• Connectionist Temporal Classification (CTC) [Alex Graves, ICML’06][Alex Graves, ICML’14][Haşim Sak, Interspeech’15][Jie Li, Interspeech’15][Andrew Senior, ASRU’15]

Input: (vector sequence)

Output: (character sequence) 好 好 好 棒 棒 棒 棒 棒

Trimming → “好棒”

Problem? Why can’t it be “好棒棒”

Page 12

End-to-end Approach - Connectionist Temporal Classification (CTC)

• Connectionist Temporal Classification (CTC) [Alex Graves, ICML’06][Alex Graves, ICML’14][Haşim Sak, Interspeech’15][Jie Li, Interspeech’15][Andrew Senior, ASRU’15]

Add an extra symbol “φ” representing “null”

好 φ φ 棒 φ φ φ φ → “好棒”

好 φ φ 棒 φ 棒 φ φ → “好棒棒”
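The trimming rule with the null symbol can be sketched in a few lines: first merge consecutive repeats, then drop every “φ”. This shows only the output-side collapse, not CTC training, and the function name `ctc_collapse` is a hypothetical choice for this example.

```python
from itertools import groupby

BLANK = "φ"  # the extra "null" symbol


def ctc_collapse(frames):
    """Merge consecutive repeated symbols, then remove the null symbol."""
    merged = [symbol for symbol, _ in groupby(frames)]
    return "".join(symbol for symbol in merged if symbol != BLANK)


without_null = ctc_collapse("好 好 好 棒 棒 棒 棒 棒".split())
one_bang = ctc_collapse("好 φ φ 棒 φ φ φ φ".split())
two_bang = ctc_collapse("好 φ φ 棒 φ 棒 φ φ".split())
```

Without the null symbol, repeated frames of 棒 can only collapse into a single 棒; a “φ” between two 棒 frames keeps them apart, which is what makes “好棒棒” reachable.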

Page 13

More Approaches

• DNN + structured SVM • [Meng & Lee, ICASSP 10]

• DNN + structured DNN • [Liao & Lee, ASRU 15]

• Neural Turing Machine • [Ko & Lee, ICASSP 17]

[Figure: (a) use DNN phone posterior as acoustic vector; (b) structured SVM; (c) structured DNN. Labels: speech signal, feature extraction, x (acoustic vector sequence), input layer, hidden layers h1, h2, …, hL-1, hL with weights W0,0, W0,L, W1, W2, WL, output layer, Ψ(x,y), y (phoneme label sequence, e.g. a c b a), F1(x, y; θ1), F2(x, y; θ2)]

Page 14

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

Page 15

Word Embedding

• Machine learns the meaning of words from reading a lot of documents without supervision

[Figure: words placed in the word embedding space: dog, cat, rabbit / jump, run / flower, tree]

Page 16

Word Embedding

• Machine learns the meaning of words from reading a lot of documents without supervision

• A word can be understood by its context

蔡英文 (Tsai Ing-wen) was sworn in on May 20 (520宣誓就職)

馬英九 (Ma Ying-jeou) was sworn in on May 20 (520宣誓就職)

蔡英文 and 馬英九 are something very similar

You shall know a word by the company it keeps

Page 17

Demo

• Machine learns the meaning of words from reading a lot of documents without supervision

Page 18

Can a machine learn netizen slang? (機器能不能學會鄉民用語)

Words closest in meaning to 「好棒」 (“great”):

超讚、真不錯、真好、好有趣、好感動

Words closest in meaning to 「好棒棒」 (a sarcastic “oh, how wonderful”):

不就好棒棒、阿不就好棒、好清高、好高尚、不就好棒

Words closest in meaning to 「廢宅」 (“useless shut-in”):

宅宅、臭宅、魯宅、魯蛇、窮酸宅

Words closest in meaning to 「本魯」 (self-deprecating “this loser”, i.e. “I”):

小魯、魯妹、魯蛇小弟、魯弟、小弟

Page 19

Can a machine learn netizen slang? (機器能不能學會鄉民用語)

V(魯夫) − V(海賊王) ≈ V(鳴人) − V(火影忍者)

魯夫 (Luffy):海賊王 (One Piece) = 鳴人 (Naruto):? Ans: 火影忍者 (Naruto, the series)

魯蛇 (“loser” in slang):loser = 溫拿 (“winner” in slang):? Ans: winner

魯蛇:窮 (poor) = 溫拿:? Ans: 有錢 (rich)

研究生 (graduate student):期刊 (journal) = 漫畫家 (manga artist):? Ans: 少年Jump (Shonen Jump)
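The analogy questions follow from vector arithmetic: to solve a : b = c : ?, look for the word whose vector is closest to V(b) − V(a) + V(c). The sketch below uses hand-made 3-dimensional toy vectors, which are purely illustrative assumptions; real embeddings are learned from text and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm


def analogy(a, b, c, vectors):
    """Solve a : b = c : ? with the target vector V(b) - V(a) + V(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors (assumed): [is a character, One Piece, Naruto].
VECTORS = {
    "魯夫": [1.0, 1.0, 0.0],
    "海賊王": [0.0, 1.0, 0.0],
    "鳴人": [1.0, 0.0, 1.0],
    "火影忍者": [0.0, 0.0, 1.0],
    "少年Jump": [0.0, 0.5, 0.5],
}

answer = analogy("魯夫", "海賊王", "鳴人", VECTORS)
```

Excluding the three query words from the candidates is a common convention, since the target vector is otherwise often closest to one of its own inputs.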

Page 20

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

• Understanding a sentence

Page 21

Sentiment Analysis

Input (character sequence): 我 覺 得 太 糟 了 (“I think it is terrible”) → RNN (Recurrent Neural Network) → output class

Output classes: 超好雷 (very positive), 好雷 (positive), 普雷 (neutral), 負雷 (negative), 超負雷 (very negative)

Training examples:

• 看了這部電影覺得很高興 ……. (“I felt happy after watching this movie”) → Positive (正雷)

• 這部電影太糟了 ……. (“This movie is terrible”) → Negative (負雷)

• 這部電影很棒 ……. (“This movie is great”) → Positive (正雷)

……

Page 22

Recurrent Neural Network

• Recurrent Structure: usually used when the input is a sequence

Func f: ht = f(ht-1, xt)

h0, x1 → f → h1; h1, x2 → f → h2; h2, x3 → f → h3; h3 → g → y

No matter how long the input sequence is, we only need one function f
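The recurrence can be written as a short loop that applies the same function f at every step and a separate function g at the end. This is a scalar toy version for illustration: the tanh update, the sigmoid output and the weights `w_h` and `w_x` are assumptions, not the functions used in the talk.

```python
import math


def f(h_prev, x, w_h=0.5, w_x=1.0):
    """One recurrent step: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    return math.tanh(w_h * h_prev + w_x * x)


def g(h):
    """Map the final hidden state to an output y in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-h))


def run_rnn(xs, h0=0.0):
    """Apply the same f to every element, whatever the sequence length."""
    h = h0
    for x in xs:
        h = f(h, x)
    return g(h)


y_short = run_rnn([1.0, -0.5, 2.0])
y_long = run_rnn([0.1] * 50)
```

Both calls reuse the same parameters, which is the point of the slide: the length of the input changes only how many times f is applied.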

Page 23

LSTM, GRU

Func f: ht = f(ht-1, xt)

Page 24

Sentiment Analysis

• It is bad. 0.05

• It is not bad. 0.90

• AI is hard to learn, but it is powerful. 0.86

• AI is powerful, but it is hard to learn. 0.35

• AI is powerful even though it is hard to learn. 0.73

Smaller number means more negative.

Larger number means more positive.

Thanks to 陳冠宇 for providing the experimental results.

Page 25

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

• Understanding a sentence

• Understanding a whole conversation

Page 26

New task for Machine Comprehension of Spoken Content

• TOEFL Listening Comprehension Test by Machine

Question: “ What is a possible origin of Venus’ clouds? ”

Audio Story:

Choices:

(A) gases released as a result of volcanic activity

(B) chemical reactions caused by high surface temperatures

(C) bursts of radio energy from the planet's surface

(D) strong winds that blow dust into the atmosphere

(The original story is 5 min long.)

Page 27

New task for Machine Comprehension of Spoken Content

• TOEFL Listening Comprehension Test by Machine

Inputs to the Neural Network:

• Question: “what is a possible origin of Venus' clouds?”

• Audio Story (ASR transcriptions)

• 4 Choices

Output: answer, e.g. (A)

Using previous exams to train the network

Page 28

Model Architecture

Question: “what is a possible origin of Venus' clouds?” → Semantic Analysis → Question Semantics

Audio Story → Speech Recognition → Semantic Analysis

Speech recognition output: …… It be quite possible that this be due to volcanic eruption because volcanic eruption often emit gas. If that be the case volcanism could very well be the root cause of Venus 's thick cloud cover. And also we have observe burst of radio energy from the planet 's surface. These burst be similar to what we see when volcano ……

Question Semantics + story semantics → Attention → Answer

Select the choice most similar to the answer

The whole model learned end-to-end.

Page 29

Model Architecture - Attention Mechanism

Understand the question: the question words W1, W2, …, WT go through the Module for Vector Representation, a bi-directional GRU with forward outputs yf(1), yf(2), …, yf(T) and backward outputs yb(1), yb(2), …, yb(T).

VQ : vector representation for question. Concatenate the output of last hidden layer in bi-directional GRU: yb(1) and yf(T).

Story (through ASR): words w1 … w8 (Sentence 1, Sentence 2) are represented as S1 … S8. Concatenate the output of hidden layer at each time step.

Attention weight (similarity score): $\alpha_t = \frac{V_Q \cdot S_t}{|V_Q| \cdot |S_t|}$

Vs : consider both Question and Story with attention weight α: $V_S = \sum_{t=1}^{8} \alpha_t S_t$
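The two attention formulas, the cosine similarity between VQ and each St followed by a weighted sum of the St, can be checked with a few lines of plain Python. The vectors below are toy 2-dimensional values chosen for illustration; in the model they would come from the bi-directional GRU.

```python
import math


def cosine(u, v):
    """Similarity score (u . v) / (|u| |v|); 0.0 if either vector is zero."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm


def attend(v_q, story):
    """Return the attention weights alpha_t and the weighted sum V_S."""
    alphas = [cosine(v_q, s_t) for s_t in story]
    v_s = [sum(alpha * s_t[i] for alpha, s_t in zip(alphas, story))
           for i in range(len(v_q))]
    return alphas, v_s


# Toy vectors (assumed values): one question vector, three story vectors.
V_Q = [1.0, 0.0]
STORY = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

alphas, v_s = attend(V_Q, STORY)
```

Story positions that point in the same direction as the question receive weights near 1, while orthogonal positions contribute nothing to V_S.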

Page 30

Model Architecture

Attention Mechanism recap

• Process Question by VecRep module to get VQ

• Get Vs through attention module (Att) over the Story (through ASR)

• To keep question info, add VQ and VS

A hop means the machine considers question and story jointly once

Do more hops for considering story again (hop 1, hop 2, ……, hop n)

Get 4 choices representation through VecRep module: VA, VB, VC, VD (Choice A, Choice B, Choice C, Choice D)

Compare similarity between choices and VQn (example scores on the slide: 0.6 0.1 0.2 0.1)

Take the choice with the highest score as answer
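The hop procedure can be sketched as a loop: compute Vs by attention, add it to VQ, repeat, then score the four choices against the final question vector. The toy vectors and the use of cosine similarity for the final comparison are assumptions made for this example; the function names `hop` and `answer` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm


def hop(v_q, story):
    """One hop: attend over the story, then add V_S to V_Q."""
    alphas = [cosine(v_q, s_t) for s_t in story]
    v_s = [sum(alpha * s_t[i] for alpha, s_t in zip(alphas, story))
           for i in range(len(v_q))]
    return [q + s for q, s in zip(v_q, v_s)]


def answer(v_q, story, choices, hops=2):
    """Run n hops, then pick the choice most similar to the final V_Q."""
    for _ in range(hops):
        v_q = hop(v_q, story)
    scores = {label: cosine(v_q, vec) for label, vec in choices.items()}
    return max(scores, key=scores.get), scores


# Toy vectors (assumed values).
V_Q = [1.0, 0.2]
STORY = [[1.0, 0.0], [0.0, 1.0]]
CHOICES = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0], "D": [-1.0, 0.0]}

best, scores = answer(V_Q, STORY, CHOICES)
```

Adding Vs to VQ rather than replacing it is what keeps the question information available in later hops.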

Page 31

Sentence Representation

• Bi-directional RNN

• Tree-structured Neural Network

• Attention on all phrases

[Figure: a sentence w1 w2 w3 w4 and its representations S1 S2 S3 S4]

Page 32

Experimental Results

[Figure: bar chart of Accuracy (%) for: random, Naïve approaches]

Example Naïve approach:

1. Find the paragraph containing most key terms in the question.

2. Select the choice containing most key terms in that paragraph.
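The two steps of the example naïve approach translate almost directly into code. In the sketch below, “key terms” are approximated as lower-cased words minus a small hand-picked stopword list, and the split of the transcript into two paragraphs is an assumption; the talk does not specify how key terms or paragraphs were defined.

```python
STOPWORDS = {"a", "an", "the", "of", "is", "what", "that", "as", "by",
             "from", "into", "be", "and"}


def key_terms(text):
    """Lower-cased words with edge punctuation stripped, minus stopwords."""
    words = {w.strip(".,?!\"'").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def naive_answer(question, paragraphs, choices):
    """Step 1: paragraph sharing most key terms with the question.
    Step 2: choice sharing most key terms with that paragraph."""
    q_terms = key_terms(question)
    paragraph = max(paragraphs, key=lambda p: len(q_terms & key_terms(p)))
    p_terms = key_terms(paragraph)
    return max(choices, key=lambda c: len(p_terms & key_terms(choices[c])))


QUESTION = "What is a possible origin of Venus' clouds?"
PARAGRAPHS = [
    "It be quite possible that this be due to volcanic eruption because "
    "volcanic eruption often emit gas. If that be the case volcanism could "
    "very well be the root cause of Venus 's thick cloud cover.",
    "And also we have observe burst of radio energy from the planet 's surface.",
]
CHOICES = {
    "A": "gases released as a result of volcanic activity",
    "B": "chemical reactions caused by high surface temperatures",
    "C": "bursts of radio energy from the planet's surface",
    "D": "strong winds that blow dust into the atmosphere",
}

picked = naive_answer(QUESTION, PARAGRAPHS, CHOICES)
```

Exact word matching is brittle: “clouds” in the question does not match “cloud” in the transcript, which hints at why such baselines stay close to the random level.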

Page 33

Experimental Results

[Figure: bar chart of Accuracy (%) for: random, Naïve approaches, and the two models below]

• 42.2% [Tseng, Shen, Lee, Lee, Interspeech’16]

• 48.8% [Fan, Hsu, Lee, Lee, SLT’16]

Page 34

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

• Understanding a sentence

• Understanding a whole conversation

• The machine decides what to say (or do): “I am fine” (我很好)

Page 35

How can we make a machine converse with people?

• People all know how to converse with others. Can we write the rules down and have the machine follow them? (Hand-crafted rules)

• For example, you want to build a chat-bot

• If there is “推薦” (recommend) and “音樂” (music) in the input, then chat-bot says “我推薦五月天” (“I recommend Mayday”)

• You can say “請推薦我一些音樂” (“Please recommend me some music”) and “你推薦誰的音樂?” (“Whose music do you recommend?”). Smart?

• What if someone says “你不推薦誰的音樂?” (“Whose music do you not recommend?”) ……

• Problems

• The rules of human conversation are too complex to enumerate exhaustively

• The chat-bot has no “free style”; every response is preset
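The hand-crafted rule from this slide fits in one function, and running it on the three example utterances shows the weakness directly. The function name `rule_based_reply` is hypothetical; the keywords and the canned response are the ones quoted on the slide.

```python
def rule_based_reply(utterance):
    """Hand-crafted rule: keywords in, canned answer out."""
    if "推薦" in utterance and "音樂" in utterance:
        return "我推薦五月天"
    return None  # no rule matched


UTTERANCES = ["請推薦我一些音樂", "你推薦誰的音樂?", "你不推薦誰的音樂?"]
replies = [rule_based_reply(utterance) for utterance in UTTERANCES]

for utterance, reply in zip(UTTERANCES, replies):
    print(utterance, "->", reply)
```

The third utterance asks the opposite question yet receives the same recommendation, because keyword matching ignores the negation “不”.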

Page 36

Is deciding what to say difficult?

• When you input a sentence, how many possible sentences can the machine choose from?

• A Chinese sentence consists of a string of Chinese characters

• There are about 4000 commonly used Chinese characters

• Assume the sentence length is fixed at 15 characters

• Ans: 4000 to the power of 15 ≈ 1.07 × 10^54

• Far more than the number of water molecules in all the oceans on Earth
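The count can be reproduced with exact integer arithmetic. The comparison with the oceans uses rough figures that are assumptions of this sketch, not numbers from the talk: an ocean mass of about 1.4 × 10^24 g, 18 g/mol for water, and Avogadro's number.

```python
COMMON_CHARACTERS = 4000
SENTENCE_LENGTH = 15

# Every position can hold any of the common characters.
possible_sentences = COMMON_CHARACTERS ** SENTENCE_LENGTH

# Assumed figures for a rough estimate of water molecules in the oceans.
OCEAN_MASS_GRAMS = 1.4e24
WATER_MOLAR_MASS = 18.0  # grams per mole
AVOGADRO = 6.022e23      # molecules per mole
ocean_molecules = OCEAN_MASS_GRAMS / WATER_MOLAR_MASS * AVOGADRO

print(f"possible sentences: {possible_sentences:.2e}")
print(f"ocean water molecules (rough estimate): {ocean_molecules:.1e}")
```

Since 4000^15 = 4^15 × 10^45 and 4^15 = 1073741824, the value 1.07 × 10^54 follows directly.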

Page 37

If a machine can answer a sentence correctly, it is not fishing a needle out of the sea; it is fishing a single water molecule out of the sea.

Page 38

Sequence-to-sequence Learning

• Sequence to sequence learning: input and output are both sequences, with different lengths.

• Speech recognition (語音辨識): speech → Seq2seq → 機 器 學 習

• Translation (翻譯): Seq2seq maps between 機 器 學 習 and “Machine Learning”

• Dialogue (對話): 你 好 嗎 (“How are you?”) → Seq2seq → 我 很 好 (“I am fine”)

Training data for dialogue: TV series and movie lines (電視影集、電影台詞)

freestyle!

Page 39

Sequence-to-sequence Learning

Input sequence (Chinese) → RNN Encoder (can read Chinese) → vector (semantics) → RNN Decoder (can write English) → output sequence (English)

(A language shared by the Encoder and the Decoder?)

Page 40

Sequence-to-sequence Learning

• Input and output are both sequences, with different lengths. → Sequence to sequence learning

• E.g. Machine Translation (machine learning→機器學習)

Input: machine learning → vector containing all information about input sequence

Page 41

Sequence-to-sequence Learning

• Input and output are both sequences, with different lengths. → Sequence to sequence learning

• E.g. Machine Translation (machine learning→機器學習)

Input: machine learning

Output: 機 器 學 習 慣 性 ……

Don’t know when to stop

Page 42

Sequence-to-sequence Learning

推 (push) tlkagk: =========斷 (break)==========

Chain comments (接龍推文) are a playful practice in PTT comment threads. They are somewhat similar to, but different from, aligned comments (推齊): each comment continues the words of the comment above it, so that the thread builds up a continuous meaning. The exact origin of the practice is unknown. (Source: 鄉民百科)

Page 43

Sequence-to-sequence Learning

• Input and output are both sequences, with different lengths. → Sequence to sequence learning

• E.g. Machine Translation (machine learning→機器學習)

Add a symbol “===” (斷, “break”)

[Ilya Sutskever, NIPS’14][Dzmitry Bahdanau, arXiv’15]

Input: machine learning

Output: 機 器 學 習 ===
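The effect of the stop symbol can be shown with a toy decoding loop. A lookup table stands in for the RNN decoder here, which is a deliberate simplification: both tables and the `max_len` safety cap are assumptions of this sketch.

```python
STOP = "==="  # the added symbol (斷)


def decode(next_symbol, first, max_len=20):
    """Emit symbols until the decoder produces STOP or max_len is reached."""
    output = []
    symbol = first
    while symbol != STOP and len(output) < max_len:
        output.append(symbol)
        symbol = next_symbol(symbol)
    return output


# Toy stand-ins for the decoder: each maps the last symbol to the next one.
WITH_STOP = {"機": "器", "器": "學", "學": "習", "習": STOP}
WITHOUT_STOP = {"機": "器", "器": "學", "學": "習",
                "習": "慣", "慣": "性", "性": "機"}

translated = decode(WITH_STOP.get, "機")
runaway = decode(WITHOUT_STOP.get, "機", max_len=10)
```

With the stop symbol the output ends after 機 器 學 習; without it the chain continues (習 慣 性 ……) until the length cap cuts it off.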

Page 44

Chat-bot

• Input and output are both sequences, with different lengths. → Sequence to sequence learning

Training data: TV series and movie lines (電視影集、電影台詞)

Source of image: https://github.com/farizrahman4u/seq2seq

Page 45

Chat-bot with GAN

Conditional GAN

• Chatbot (Encoder, Decoder): Input sentence/history h → response sentence x

• Discriminator: Input sentence/history h and response sentence x → Real or fake (trained with human dialogues)

Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky, “Adversarial Learning for Neural Dialogue Generation”, arXiv preprint, 2017

Ref: 一日搞懂 GAN (“Understanding GAN in One Day”) https://www.slideshare.net/tw_dsconf/ss-78795326

Page 46

Example Results

Thanks to 段逸林 for providing the experimental results.

Page 47

Towards Characterization

• Thanks to 王耀賢 for providing the experimental results.

• https://github.com/yaushian/simple_sentiment_dialogue

Input: How do you feel?

• I am good.

• I am so embarrassed.

Input: I love you.

• I love you!

• I wish I wish I wish I could go.

Page 48

Cycle GAN

Negative sentence to positive sentence:

it's a crappy day -> it's a great day

i wish you could be here -> you could be here

it's not a good idea -> it's good idea

i miss you -> i love you

i don't love you -> i love you

i can't do that -> i can do that

i feel so sad -> i happy

it's a bad day -> it's a good day

it's a dummy day -> it's a great day

sorry for doing such a horrible thing -> thanks for doing a great thing

my doggy is sick -> my doggy is my doggy

my little doggy is sick -> my little doggy is my little doggy

Thanks to 王耀賢 for providing the experimental results.

Page 49

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

• Understanding a sentence

• Understanding a whole conversation

• The machine decides what to say (or do): Summarization

Page 50

Summarization

Extractive Summaries

Audio File to be summarized → “This is the summary.”

Select the most informative segments to form a compact version

…… deep learning is powerful …… …… ……

[Lee, et al., Interspeech 12] [Lee, et al., ICASSP 13] [Shiang, et al., Interspeech 13]

Machine does not write summaries in its own words

Page 51

Abstractive Summarization

• Now machine can do abstractive summary (write summaries in its own words)

Training Data: documents paired with Title 1, Title 2, Title 3, ……

Output: title generated by machine (in its own words), without hand-crafted rules

Page 52

Abstractive Summarization

• Input: transcriptions of audio, output: title

RNN Encoder: read through the input w1 w2 w3 w4 (transcriptions of audio from automatic speech recognition (ASR)) and produce ℎ1 ℎ2 ℎ3 ℎ4

RNN generator: states 𝑧1 𝑧2 …… produce the output words wA wB ……

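Page 53

Abstractive Summarization

Document: 刑事局偵四隊今天破獲一個中日跨國竊車集團,根據調查國內今年七月開放重型機車上路後 …… (“The Fourth Investigation Corps of the Criminal Investigation Bureau today broke up a transnational 中日 car-theft ring. According to the investigation, after heavy motorcycles were allowed on domestic roads this July ……”)

H: 跨國竊車銷贓情形猖獗直得國內警方注意 (“Rampant transnational car theft and fencing deserves the attention of domestic police”)

Machine: 刑事局破獲中國車集 (“Criminal Investigation Bureau breaks up China car ring”; the machine output is truncated)

Document: 據印度報業托拉斯報道印度北方邦22 日發生一起小公共汽車炸彈爆炸事件造成 15 人死亡 3 人受傷 …… (“According to the Press Trust of India, a minibus bomb explosion in Uttar Pradesh, India, on the 22nd killed 15 people and injured 3 ……”)

H: 印度汽車炸彈爆炸造成15人死亡 (“Car bomb explosion in India kills 15”)

Machine: 印度發生汽車爆炸事件 (“Car explosion occurs in India”)

Thanks to 盧柏儒 for providing the experimental results.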

Page 54

Unsupervised Abstractive Summarization

• Machine 1: long text → short text, which is treated as the summary

• Machine 2: short text → the original long text

Example: 台灣大學 … → 灣學 … → 台灣大學 …

The machines are plotting to rule humanity ……

Image source: http://laughl.com/archives/7131/【問號哪裡來】「黑人問號哥」原來大有來頭?!/

Page 55

Unsupervised Abstractive Summarization

• Machine 1: long text → short text, which is treated as the summary

• Machine 2: short text → the original long text

• Machine 3: given a large number of human-written sentences, judges whether a sentence was written by a human

Machine 1 must make Machine 3 believe that the generated sentence was written by a human

Example: 台灣大學 … → 台大 … → 台灣大學 … (instead of 灣學 …)

Page 56

Unsupervised Abstractive Summarization

Thanks to 王耀賢 for providing the experimental results.

Page 57

Outline

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

• Understanding a sentence

• Understanding a whole conversation

• The machine decides what to say (or do): Can it draw?

“Girl with red hair and red eyes”

“Girl with yellow ribbon”

Page 58

Data Collection

http://konachan.net/post/show/239400/aikatsu-clouds-flowers-hikami_sumire-hiten_goane_r

Thanks to teaching assistants 曾柏翔 and 樊恩宇 for collecting the data.

Page 59

Released Training Data

• Data download link: https://drive.google.com/open?id=0BwJmB7alR-AvMHEtczZZN0EtdzQ

• Anime Dataset:

• training data: 33.4k (image, tags) pairs

• image size: 96 x 96

• Training tags file (tags.csv) format:

• img_id <comma> tag1 <colon> #_post <tab> tag2 <colon> …

• Example tags: blue eyes, red hair, short hair
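A line in the described tags file could be parsed as follows. The sample line, including its image id and post counts, is invented for illustration, and the function name `parse_tags_line` is hypothetical; only the separators (comma, colon, tab) come from the slide.

```python
def parse_tags_line(line):
    """Parse 'img_id,tag1:#_post<TAB>tag2:#_post...' into (img_id, {tag: count})."""
    img_id, _, tag_field = line.rstrip("\n").partition(",")
    tags = {}
    for item in tag_field.split("\t"):
        if not item:
            continue  # tolerate a trailing tab or an empty tag field
        tag, _, count = item.rpartition(":")
        tags[tag.strip()] = int(count)
    return img_id, tags


# Made-up sample line following the slide's format.
SAMPLE = "0,blue eyes:12\tred hair:7\tshort hair:3\n"
img_id, tags = parse_tags_line(SAMPLE)
```

Splitting on the last colon with `rpartition` keeps a tag intact even if the tag text itself contains a colon.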

Page 60

Conditional GAN

• Draw anime character portraits from text descriptions

MLDS Assignment 3, teaching assistant in charge: 曾柏翔

Black hair, blue eyes

Blue hair, green eyes

Red hair, long hair

Page 61

Concluding Remarks

Speech: “How are you?” (你好嗎?)

• From speech signal to text: “How are you?” (你好嗎?)

• Understanding a word

• Understanding a sentence

• Understanding a whole conversation

• The machine decides what to say (or do): “I am fine” (我很好)

Everything is based on Deep Learning