Introduction to seq2seq (Sequence to Sequence) and RNN

Introduction to Sequence to Sequence Model. 2017.03.16 Seminar. Presenter: Hyemin Ahn


TRANSCRIPT

Page 1

Introduction to Sequence to Sequence Model

2017.03.16 Seminar. Presenter: Hyemin Ahn

Page 2

Recurrent Neural Networks : For what?

Humans remember and use the patterns of sequences.
• Try ‘a b c d e f g …’
• But how about ‘z y x w v u t s …’?

The idea behind RNNs is to make use of sequential information.

Let's learn the pattern of a sequence, and utilize it (estimate, generate, etc.)!

But HOW?

Page 3

Recurrent Neural Networks : Typical RNNs

[Figure: a typical RNN. The input $x_t$ feeds the hidden state $h_t$ through weights $U$; the hidden state feeds back into itself through a one-step delay with weights $W$; the output $y_t$ is read out through weights $V$.]

RNNs are called “recurrent” because they perform the same task for every element of a sequence, with the output depending on the previous computations.

RNNs have a “memory” which captures information about what has been calculated so far.

The hidden state captures some information about a sequence.

If we use a plain recurrence such as $h_t = \tanh(U x_t + W h_{t-1})$, the vanishing/exploding gradient problem happens.

To overcome this, we use LSTM/GRU.

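To make the picture concrete, here is a minimal NumPy sketch of this recurrence (the sizes and the tanh activation are illustrative assumptions for this sketch, not from the slides):

```python
import numpy as np

# Illustrative sizes -- assumptions for this sketch, not from the slides.
input_size, hidden_size, output_size = 8, 16, 4

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the one-step delay loop)
V = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output

def rnn_forward(xs):
    """Apply h_t = tanh(U x_t + W h_{t-1}), y_t = V h_t over a sequence."""
    h = np.zeros(hidden_size)         # the "memory", empty at the start
    ys = []
    for x in xs:                      # the SAME weights are reused at every step: "recurrent"
        h = np.tanh(U @ x + W @ h)    # hidden state summarizes everything seen so far
        ys.append(V @ h)
    return ys, h

xs = [rng.normal(size=input_size) for _ in range(10)]
ys, h_final = rnn_forward(xs)
```

Backpropagating through this loop multiplies the gradient by $W$ (and the tanh derivative) once per time step, which is why it vanishes or explodes over long sequences, as noted above.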

Page 4

Recurrent Neural Networks : LSTM

Let's think about a machine which guesses the dinner menu from the things in a shopping bag.

Umm, Carbonara!

Page 5

Recurrent Neural Networks : LSTM

$C_t$: the cell state, an internal memory unit, like a conveyor belt!

[Figure: LSTM cell with cell state $C_t$, hidden state $h_t$, and input $x_t$. The memory holds: pancake, oil spaghetti, fried food.]

Page 6

Recurrent Neural Networks : LSTM

[Figure: the forget step. Given the new input, the cell state $C_t$ forgets some memories. Memory: pancake, oil spaghetti, fried food.]

Page 7

Recurrent Neural Networks : LSTM

[Figure: the forget step, continued. Memory: pancake, oil spaghetti, fried food.]

LSTM learns (1) how to forget a memory when $h_{t-1}$ and the new input $x_t$ are given, and (2) then how to add the new memory, given $h_{t-1}$ and $x_t$.

Page 8

Recurrent Neural Networks : LSTM

[Figure: the insert step. New memories are added to $C_t$; the memory now holds: oil / cream / tomato spaghetti.]

Page 9

Recurrent Neural Networks : LSTM

[Figure: the memory now holds: oil / cream / tomato spaghetti, carbonara.]

Page 10

Recurrent Neural Networks : LSTM

[Figure: after more forgetting, the memory holds: oil / cream spaghetti, carbonara.]

Page 11

Recurrent Neural Networks : LSTM

[Figure: the output step. From the hidden state $h_t$, the output $y_t$ is produced: Carbonara!]

Page 12

Recurrent Neural Networks : LSTM

LSTM learns (1) how to forget a memory when $h_{t-1}$ and the new input $x_t$ are given, and (2) then how to add the new memory, given $h_{t-1}$ and $x_t$.

Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
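Those figures walk through the standard LSTM gate equations, which (in the notation of that post, with $[h_{t-1}, x_t]$ denoting the concatenated previous hidden state and input, and $*$ the elementwise product) are:

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && \text{forget gate: what to erase from } C_{t-1}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && \text{input gate: what to write}\\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) && \text{candidate new memory}\\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t && \text{the conveyor-belt update}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && \text{output gate}\\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$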

Page 13

Recurrent Neural Networks : LSTM

[Figure: the gate-by-gate walkthrough of the LSTM, from the post cited above.]

Page 14

Recurrent Neural Networks : LSTM

[Figure: the gate-by-gate walkthrough, continued.]

Page 15

Recurrent Neural Networks : GRU


Maybe we can simplify this structure efficiently!

GRU

Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
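The GRU is exactly that simplification: it merges the cell state and hidden state, and uses two gates instead of three. Its standard update (same notation as above) is:

$$
\begin{aligned}
z_t &= \sigma(W_z \cdot [h_{t-1}, x_t]) && \text{update gate}\\
r_t &= \sigma(W_r \cdot [h_{t-1}, x_t]) && \text{reset gate}\\
\tilde{h}_t &= \tanh(W \cdot [r_t * h_{t-1}, x_t]) && \text{candidate state}\\
h_t &= (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t
\end{aligned}
$$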

Page 16

Sequence to Sequence Model: What is it?

[Figure: a sequence-to-sequence model. An LSTM/GRU encoder reads the input into hidden states $h_e(1), \dots, h_e(5)$; an LSTM/GRU decoder then generates the output through its own hidden states $h_d(1), \dots, h_d(T_e)$. The running example: a ‘Western food’ to ‘Korean food’ transition.]

Page 17

Sequence to Sequence Model: Implementation

The simplest way to implement a sequence-to-sequence model is to just pass the last hidden state of the encoder to the first GRU cell of the decoder!

However, this method's power gets weaker when the decoder needs to generate a longer sequence.
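As a concrete illustration, here is a minimal PyTorch sketch of this "pass the last hidden state" scheme (not the presenter's code; the sizes, vocabularies, and single-layer GRUs are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder's last hidden state
    initializes the decoder -- exactly the simple scheme described above."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: keep only the final hidden state h.
        _, h = self.encoder(self.src_emb(src))        # h: (1, batch, hid_dim)
        # Decode: the decoder starts from the encoder's summary of the input.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)                      # logits per target step

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sequences, length 7
tgt = torch.randint(0, 1000, (2, 5))   # corresponding target prefixes, length 5
logits = model(src, tgt)               # shape: (2, 5, 1000)
```

Everything the decoder knows about the input must squeeze through the single vector $h$, which is exactly why this scheme weakens on longer sequences and motivates the attention decoder on the next slide.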

Page 18

Sequence to Sequence Model: Attention Decoder

[Figure: a bidirectional GRU encoder feeding an attention GRU decoder; at each step $t$ the decoder receives its own context vector $c_t$.]

For each GRU cell constituting the decoder, let's pass the encoder's information differently! Each encoder state concatenates the forward and backward GRU states, and each decoder step $i$ gets its own weighted sum of them:

$$
h_i = \left[\,\overrightarrow{h}_i\,;\,\overleftarrow{h}_i\,\right], \qquad c_i = \sum_{j=1}^{T_x} \alpha_{ij}\, h_j
$$
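The slide leaves the weights $\alpha_{ij}$ undefined; in the attention formulation of Bahdanau et al. (which this encoder-decoder diagram appears to follow), they are a softmax over learned alignment scores between the previous decoder state $s_{i-1}$ and each encoder state $h_j$:

$$
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j)
$$

where $a$ is a small feed-forward network trained jointly with the rest of the model.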
