Introduction to seq2seq (Sequence to Sequence) and RNN

Introduction to Sequence to Sequence Model. 2017.03.16 Seminar. Presenter: Hyemin Ahn


TRANSCRIPT

Page 1

Introduction to Sequence to Sequence Model

2017.03.16 Seminar. Presenter: Hyemin Ahn

Page 2

Recurrent Neural Networks : For what?

Humans remember and use the patterns of sequences.
• Try ‘a b c d e f g …’
• But how about ‘z y x w v u t s …’?

The idea behind RNNs is to make use of sequential information.

Let's learn the pattern of a sequence, and utilize it (estimate, generate, etc.)!

But HOW?

Page 3

Recurrent Neural Networks : Typical RNNs

[Figure: a typical RNN. The input $x_t$ feeds the hidden state $h_t$ through weights $U$; the hidden state feeds back into itself through a one-step delay with weights $W$; the output $y_t$ is read out through weights $V$.]

RNNs are called “recurrent” because they perform the same task for every element of a sequence, with the output depending on the previous computations.

RNNs have a “memory” which captures information about what has been calculated so far.

The hidden state captures some information about a sequence.

If we use a plain recurrence such as $h_t = \tanh(U x_t + W h_{t-1})$, the vanishing/exploding gradient problem happens.

To overcome this, we use LSTM/GRU.

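To make the picture concrete, here is a minimal NumPy sketch of this recurrence (the sizes and the tanh activation are illustrative assumptions for this sketch, not from the slides):

```python
import numpy as np

# Illustrative sizes -- assumptions for this sketch, not from the slides.
input_size, hidden_size, output_size = 8, 16, 4

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the one-step delay loop)
V = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output

def rnn_forward(xs):
    """Apply h_t = tanh(U x_t + W h_{t-1}), y_t = V h_t over a sequence."""
    h = np.zeros(hidden_size)         # the "memory", empty at the start
    ys = []
    for x in xs:                      # the SAME weights are reused at every step: "recurrent"
        h = np.tanh(U @ x + W @ h)    # hidden state summarizes everything seen so far
        ys.append(V @ h)
    return ys, h

xs = [rng.normal(size=input_size) for _ in range(10)]
ys, h_final = rnn_forward(xs)
```

Backpropagating through this loop multiplies the gradient by $W$ (and the tanh derivative) once per time step, which is why it vanishes or explodes over long sequences, as noted above.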

Page 4

Recurrent Neural Networks : LSTM

Let's think about a machine which guesses the dinner menu from the things in a shopping bag.

Umm, Carbonara!

Page 5

Recurrent Neural Networks : LSTM

$C_t$: the cell state, an internal memory unit, like a conveyor belt!

[Figure: LSTM cell with cell state $C_t$, hidden state $h_t$, and input $x_t$. The memory holds: pancake, oil spaghetti, fried food.]

Page 6

Recurrent Neural Networks : LSTM

[Figure: the forget step. Given the new input, the cell state $C_t$ forgets some memories. Memory: pancake, oil spaghetti, fried food.]

Page 7

Recurrent Neural Networks : LSTM

[Figure: the forget step, continued. Memory: pancake, oil spaghetti, fried food.]

LSTM learns (1) how to forget a memory when $h_{t-1}$ and the new input $x_t$ are given, and (2) then how to add the new memory, given $h_{t-1}$ and $x_t$.

Page 8

Recurrent Neural Networks : LSTM

[Figure: the insert step. New memories are added to $C_t$; the memory now holds: oil / cream / tomato spaghetti.]

Page 9

Recurrent Neural Networks : LSTM

[Figure: the memory now holds: oil / cream / tomato spaghetti, carbonara.]

Page 10

Recurrent Neural Networks : LSTM

[Figure: after more forgetting, the memory holds: oil / cream spaghetti, carbonara.]

Page 11

Recurrent Neural Networks : LSTM

[Figure: the output step. From the hidden state $h_t$, the output $y_t$ is produced: Carbonara!]

Page 12

Recurrent Neural Networks : LSTM

LSTM learns (1) how to forget a memory when $h_{t-1}$ and the new input $x_t$ are given, and (2) then how to add the new memory, given $h_{t-1}$ and $x_t$.

Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
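Those figures walk through the standard LSTM gate equations, which (in the notation of that post, with $[h_{t-1}, x_t]$ denoting the concatenated previous hidden state and input, and $*$ the elementwise product) are:

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && \text{forget gate: what to erase from } C_{t-1}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && \text{input gate: what to write}\\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) && \text{candidate new memory}\\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t && \text{the conveyor-belt update}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && \text{output gate}\\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$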

Page 13

Recurrent Neural Networks : LSTM

[Figure: the gate-by-gate walkthrough of the LSTM, from the post cited above.]

Page 14

Recurrent Neural Networks : LSTM

[Figure: the gate-by-gate walkthrough, continued.]

Page 15

Recurrent Neural Networks : GRU


Maybe we can simplify this structure efficiently!

GRU

Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
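The GRU is exactly that simplification: it merges the cell state and hidden state, and uses two gates instead of three. Its standard update (same notation as above) is:

$$
\begin{aligned}
z_t &= \sigma(W_z \cdot [h_{t-1}, x_t]) && \text{update gate}\\
r_t &= \sigma(W_r \cdot [h_{t-1}, x_t]) && \text{reset gate}\\
\tilde{h}_t &= \tanh(W \cdot [r_t * h_{t-1}, x_t]) && \text{candidate state}\\
h_t &= (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t
\end{aligned}
$$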

Page 16

Sequence to Sequence Model: What is it?

[Figure: a sequence-to-sequence model. An LSTM/GRU encoder reads the input into hidden states $h_e(1), \dots, h_e(5)$; an LSTM/GRU decoder then generates the output through its own hidden states $h_d(1), \dots, h_d(T_e)$. The running example: a ‘Western food’ to ‘Korean food’ transition.]

Page 17

Sequence to Sequence Model: Implementation

The simplest way to implement a sequence-to-sequence model is to just pass the last hidden state of the encoder to the first GRU cell of the decoder!

However, this method's power gets weaker when the decoder needs to generate a longer sequence.
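As a concrete illustration, here is a minimal PyTorch sketch of this "pass the last hidden state" scheme (not the presenter's code; the sizes, vocabularies, and single-layer GRUs are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder's last hidden state
    initializes the decoder -- exactly the simple scheme described above."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: keep only the final hidden state h.
        _, h = self.encoder(self.src_emb(src))        # h: (1, batch, hid_dim)
        # Decode: the decoder starts from the encoder's summary of the input.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)                      # logits per target step

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sequences, length 7
tgt = torch.randint(0, 1000, (2, 5))   # corresponding target prefixes, length 5
logits = model(src, tgt)               # shape: (2, 5, 1000)
```

Everything the decoder knows about the input must squeeze through the single vector $h$, which is exactly why this scheme weakens on longer sequences and motivates the attention decoder on the next slide.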

Page 18

Sequence to Sequence Model: Attention Decoder

[Figure: a bidirectional GRU encoder feeding an attention GRU decoder; at each step $t$ the decoder receives its own context vector $c_t$.]

For each GRU cell constituting the decoder, let's pass the encoder's information differently! Each encoder state concatenates the forward and backward GRU states, and each decoder step $i$ gets its own weighted sum of them:

$$
h_i = \left[\,\overrightarrow{h}_i\,;\,\overleftarrow{h}_i\,\right], \qquad c_i = \sum_{j=1}^{T_x} \alpha_{ij}\, h_j
$$
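The slide leaves the weights $\alpha_{ij}$ undefined; in the attention formulation of Bahdanau et al. (which this encoder-decoder diagram appears to follow), they are a softmax over learned alignment scores between the previous decoder state $s_{i-1}$ and each encoder state $h_j$:

$$
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j)
$$

where $a$ is a small feed-forward network trained jointly with the rest of the model.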
