
Page 1

Deconstructing Data Science
David Bamman, UC Berkeley

Info 290

Lecture 17: Neural networks (2)

Mar 21, 2017

Page 2

The perceptron, again

[Diagram: perceptron with inputs x1, x2, x3, weights β1, β2, β3, and output y]

          x      β
not       1     -0.5
bad       1     -1.7
movie     0      0.3

$$\hat{y} = \begin{cases} 1 & \text{if } \sum_{i=1}^{F} x_i \beta_i \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
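A minimal sketch of this decision rule in Python (not from the slides), plugging in the x and β values from the table above:

```python
# Perceptron decision rule applied to the "not bad movie" example.
weights = {"not": -0.5, "bad": -1.7, "movie": 0.3}   # β, from the slide
features = {"not": 1, "bad": 1, "movie": 0}          # x, from the slide

score = sum(features[w] * weights[w] for w in weights)
y_hat = 1 if score >= 0 else -1
print(score, y_hat)   # -2.2 -1
```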

Page 3

[Diagram: feedforward network with input layer (x1, x2, x3), "hidden" layer (h1, h2), and output y; weight matrix W (entries W1,1 … W3,2) connects the input to the hidden layer, and vector V (V1, V2) connects the hidden layer to the output]

Page 4

[Diagram: the same network with concrete values]

x (not, bad, movie) = [1, 1, 0]

W =
  -0.5   1.3
   0.4   0.08
   1.7   3.1

V = [4.1, -0.9]

y = -1

Page 5

[Diagram: feedforward network with weights W and V]

$$h_j = f\left(\sum_{i=1}^{F} x_i W_{i,j}\right)$$

The hidden nodes are completely determined by the input and the weights.

Page 6

[Diagram: feedforward network with weights W and V]

$$h_1 = \sigma\left(\sum_{i=1}^{F} x_i W_{i,1}\right) \qquad h_2 = \sigma\left(\sum_{i=1}^{F} x_i W_{i,2}\right)$$

$$y = V_1 h_1 + V_2 h_2$$

Page 7

[Diagram: feedforward network with weights W and V]

$$y = V_1 \underbrace{\sigma\left(\sum_{i=1}^{F} x_i W_{i,1}\right)}_{h_1} + V_2 \underbrace{\sigma\left(\sum_{i=1}^{F} x_i W_{i,2}\right)}_{h_2}$$

We can express y as a function only of the input x and the weights W and V.

Page 8

$$y = V_1 \underbrace{\sigma\left(\sum_{i=1}^{F} x_i W_{i,1}\right)}_{h_1} + V_2 \underbrace{\sigma\left(\sum_{i=1}^{F} x_i W_{i,2}\right)}_{h_2}$$

This is hairy, but differentiable.

Backpropagation: Given training samples of <x,y> pairs, we can use stochastic gradient descent to find the values of W and V that minimize the loss.
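As a sketch (in numpy, not the course's own code), here is the full forward pass with the concrete x, W, and V from page 4, assuming a sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 1.0, 0.0])        # (not, bad, movie)
W = np.array([[-0.5, 1.3],
              [ 0.4, 0.08],
              [ 1.7, 3.1]])
V = np.array([4.1, -0.9])

h = sigmoid(x @ W)   # hidden layer: h_j = sigma(sum_i x_i W_ij)
y = V @ h            # output: y = V_1 h_1 + V_2 h_2
print(h, y)
```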

Page 9

Neural network structures

[Diagram: inputs x1–x3 → hidden h1, h2 → a single output node y = 1]

Output one real value.

Page 10

Neural network structures

[Diagram: inputs x1–x3 → hidden h1, h2 → three output nodes y, with training values 1, 0, 0]

Multiclass: output 3 values, only one = 1 in training data.

Page 11

Neural network structures

[Diagram: inputs x1–x3 → hidden h1, h2 → three output nodes y, with training values 1, 1, 0]

Output 3 values, several = 1 in training data.

Page 12

Autoencoder

[Diagram: inputs x1, x2, x3 → hidden h1, h2 → outputs x1, x2, x3]

• Unsupervised neural network, where y = x
• Learns a low-dimensional representation of x

Page 13

Word embeddings

• Learning low-dimensional representations of words by framing a prediction task: using context to predict words in a surrounding window

the black cat jumped on the table

Page 14

Dimensionality reduction

the → [ … the 1, a 0, an 0, for 0, in 0, on 0, dog 0, cat 0, … ]   (one-hot)
the → [4.1, -0.9]

"the" is a point in V-dimensional space; "the" is a point in 2-dimensional space.

Page 15

Word embeddings

• Transform this into a supervised prediction problem:

x        y
the      cat
black    cat
jumped   cat
on       cat
the      cat
table    cat
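A sketch of how such pairs can be generated; the function name and window size are illustrative assumptions, not from the slides:

```python
# Generate (context word, target word) training pairs from a window
# around the target, reproducing the table above.
def context_pairs(tokens, target_index, window=4):
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    return [(tokens[i], tokens[target_index])
            for i in range(lo, hi) if i != target_index]

sentence = "the black cat jumped on the table".split()
print(context_pairs(sentence, sentence.index("cat")))
# [('the', 'cat'), ('black', 'cat'), ('jumped', 'cat'),
#  ('on', 'cat'), ('the', 'cat'), ('table', 'cat')]
```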

Page 16

[Diagram: one-hot input over (cat, table, globe) → hidden h1, h2 → three outputs over (cat, table, globe)]

x (cat, table, globe) = [0, 1, 0]

W =
  -0.5   1.3
   0.4   0.08
   1.7   3.1

V =
   4.1   0.7   0.1
  -0.9   1.3   0.3

y (cat, table, globe) = [1, 0, 0]

Page 17

[Diagram: the same network, with weights W and V]

W =
  -0.5   1.3
   0.4   0.08
   1.7   3.1

V =
   4.1   0.7   0.1
  -0.9   1.3   0.3

Only one of the inputs is nonzero, so the hidden values are really just W_table: the row of W corresponding to the input word "table".
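A sketch of why this is so: multiplying a one-hot vector by W simply selects one row of W, so the hidden layer can be computed as a direct lookup.

```python
# One-hot input times W equals a row lookup into W.
import numpy as np

vocab = ["cat", "table", "globe"]
W = np.array([[-0.5, 1.3],
              [ 0.4, 0.08],
              [ 1.7, 3.1]])

x = np.zeros(len(vocab))
x[vocab.index("table")] = 1.0

print(x @ W)                     # [0.4  0.08]
print(W[vocab.index("table")])   # the same row, as a direct lookup
```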

Page 18

Word embeddings

• Can you predict the output word from a vector representation of the input word?

Page 19

Word embeddings

• Output: low-dimensional representations of words, read directly off the weight matrices.

V:
        cat   table  globe
        4.1   0.7    0.1
       -0.9   1.3    0.3

Page 20

[Scatter plot of two-dimensional word embeddings: cat, puppy, and dog near one another; wrench and screwdriver near one another]

Page 21

the black cat jumped on the table

the black dog jumped on the table

the black puppy jumped on the table

the black skunk jumped on the table

the black shoe jumped on the table

• Why this behavior? dog, cat show up in similar positions

Page 22

• Why this behavior? dog, cat show up in similar positions

the black [0.4, 0.08] jumped on the table

the black [0.4, 0.07] jumped on the table

the black puppy jumped on the table

the black skunk jumped on the table

the black shoe jumped on the table

To make the same predictions, these numbers need to be close to each other.

Page 23

Dimensionality reduction

the → [ … the 1, a 0, an 0, for 0, in 0, on 0, dog 0, cat 0, … ]   (one-hot)
the → [4.1, -0.9]

"the" is a point in V-dimensional space; representations for all words are completely independent.
"the" is a point in 2-dimensional space; representations are now structured.

Page 24

Euclidean distance

$$\sqrt{\sum_{i=1}^{F} (x_i - y_i)^2}$$

Page 25

Cosine similarity

$$\cos(x, y) = \frac{\sum_{i=1}^{F} x_i y_i}{\sqrt{\sum_{i=1}^{F} x_i^2}\,\sqrt{\sum_{i=1}^{F} y_i^2}}$$

x1  x2  x3
 1   1   1
 1   1   1
 1   1   0
 1   0   0
 1   0   1
 0   1   1
 0   1   1
 1   1   1

• Euclidean distance measures the magnitude of distance between two points
• Cosine similarity measures their orientation
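A sketch of both measures in numpy, applied to the two-dimensional embeddings for cat and table read off V on page 19:

```python
import numpy as np

def euclidean(x, y):
    # magnitude of the distance between two points
    return np.sqrt(np.sum((x - y) ** 2))

def cosine(x, y):
    # similarity of orientation, ignoring magnitude
    return np.dot(x, y) / (np.sqrt(np.sum(x ** 2)) * np.sqrt(np.sum(y ** 2)))

cat = np.array([4.1, -0.9])
table = np.array([0.7, 1.3])
print(euclidean(cat, table), cosine(cat, table))
```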

Page 26

[Scatter plot: the same points (cat, puppy, dog, wrench, screwdriver), compared by Euclidean distance]

Page 27

[Scatter plot: the same points (cat, puppy, dog, wrench, screwdriver), compared by cosine similarity]

Page 28

Analogical inference

• Mikolov et al. 2013 show that vector representations have some potential for analogical reasoning through vector arithmetic.

king - man + woman ≈ queen
apple - apples ≈ car - cars

Mikolov et al. (2013), "Linguistic Regularities in Continuous Space Word Representations" (NAACL)
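A sketch of this kind of inference, assuming a hypothetical dict of learned word vectors (not provided by the slides):

```python
# Analogical inference by vector arithmetic: find the word whose vector
# is most similar (by cosine) to king - man + woman.
import numpy as np

def most_similar(query, vectors, exclude=()):
    best, best_sim = None, -1.0
    for word, vec in vectors.items():
        if word in exclude:
            continue
        sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# vectors = {"king": np.array([...]), ...}   # learned embeddings
# query = vectors["king"] - vectors["man"] + vectors["woman"]
# most_similar(query, vectors, exclude={"king", "man", "woman"})  # ≈ "queen"
```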

Page 29

Bolukbasi et al. (2016)

• Word vectors are trained on real data (web pages, news, etc.) and reflect the inherent biases in how language is used

Page 30

Code

http://mybinder.org/repo/dbamman/dds


code/vector_similarity

Page 31

Assumptions

[Diagram: the word-embedding network again: inputs x1–x3 → hidden h1, h2 → outputs y over (cat, table, globe)]

Page 32

Lexical Variation

• People use different words in different regions.

• Lexical variation in social media (Eisenstein et al. 2010; O'Connor et al. 2010; Eisenstein et al. 2011; Hong et al. 2012; Doyle 2014)

• Text-based geolocation (Wing and Baldridge 2011; Roller et al. 2012; Ikawa et al. 2012)

Page 33

“yinz”

Normalized document frequencies from 93M geotagged tweets (1.1B words).

Page 34

“bears”

Normalized document frequencies from 93M geotagged tweets (1.1B words).

Page 35

“bears” (IL)

• who's all watching the bears game ? :)

• watching bears game . no need to watch my lions . saw the score at the bottom of the screen . stafford and johnson are taking care of things .

• @USERNAME packers fans would be screaming that at bears fans if it had happened to chicago , all while laughing . schadenfreude .

Page 36

"bears" (AK)

• troopers tracking brown bears on k beach 6/22/13 troopers ask that local_residents do not call law_enforcement ... @URL

• sci-tech : webcams make alaska bears accessible .

• angel rocks trail open ; dead calf moose might have attracted bears : fairbanks — state parks rangers on thursday ... @URL

Page 37

Problem

How can we learn lexical representations that are sensitive to geographical variation not simply in word frequency, but in meaning?

• Vector-space models of lexical semantics Lin 1998; Turney and Pantel 2010, Reisinger and Mooney 2010, Socher et al. 2013, Mikolov et al. 2013, inter alia

• Low-dimensional “embeddings” (w ∈ ℝK) Bengio et al. 2006, Collobert and Weston 2008, Mnih and Hinton 2008, Turian et al. 2010, Socher et al. 2011ff., Collobert et al. 2011, Mikolov et al. 2013; Levy and Goldberg 2014.

Page 38

"bears"

• who's all watching the bears game ? :)

• watching bears game . no need to watch my lions . saw the score at the bottom of the screen . stafford and johnson are taking care of things .

• troopers tracking brown bears on k beach troopers ask that local_residents do not call law_enforcement ... @URL

• sci-tech : webcams make alaska bears accessible .

Page 39

"bears"

• who's all watching the bears game ? :)  [IL]

• watching bears game . no need to watch my lions . saw the score at the bottom of the screen . stafford and johnson are taking care of things .  [IL]

• troopers tracking brown bears on k beach 6/22/13 troopers ask that local_residents do not call law_enforcement ... @URL  [AK]

• sci-tech : webcams make alaska bears accessible .  [AK]

Page 40

Context + Metadata

my boy's wicked smart   [MA]

x = {wicked, wicked+MA}

y = smart
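A sketch of this feature construction; the function name is an illustrative assumption:

```python
# Each context word contributes two features: a global one and a
# region-specific one built from the tweet's metadata.
def features_with_metadata(context_word, region):
    return [context_word, context_word + "+" + region]

print(features_with_metadata("wicked", "MA"))   # ['wicked', 'wicked+MA']
# x = {wicked, wicked+MA}, y = smart
```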

Page 41

[Diagram: the word-embedding network with metadata: alongside the global inputs (cat, table, globe), state-specific inputs (CA+cat, CA+table, CA+globe) feed into the same hidden layer h1, h2, with outputs y over (cat, table, globe)]

Page 42

Model

Each matrix row is a K-dimensional embedding:

Word
  bears    3.76   10.4   -1.3
  red      0.3     4.10  13.3
  the      0.1     3.3   -1.2
  zoo    -10.3   -13.1    1.4

Word+Alabama
  bears   -0.30   -3.1    1.04
  red      4.5     0.3   -1.3
  the      1.3    -1.2    0.1
  zoo      5.2     7.2    1.5

Word+Alaska
  bears    5.6     8.3   -0.8
  red      3.1     0.14   6.8
  the     -0.1    -0.7    1.4
  zoo      6.7     2.1   -3.7

Page 43

[3D embedding plot: bears, football, NFL, moose, elk; "bears in IL" lies near football and NFL, "bears in AK" lies near moose and elk]

Page 44

“let’s go into the city”

Page 45

[3D embedding plot: "city in MA" lies near Boston; "city in WA" lies near Seattle]

Page 46

city

Terms with the highest cosine similarity to city in California

• valley • bay • downtown • chinatown • south bay • area • east bay • neighborhood • peninsula

Page 47

city

Terms with the highest cosine similarity to city in New York.

• suburbs • town • hamptons • big city • borough • neighborhood • downtown • upstate • big apple

Page 48

wicked

Terms with the highest cosine similarity to wicked in Kansas

• evil • pure • gods • mystery • spirit • king • above • righteous • magic

Page 49

wicked

Terms with the highest cosine similarity to wicked in Massachusetts

• super • ridiculously • insanely • extremely • goddamn • surprisingly • kinda • #sarcasm • sooooooo

Page 50

Deeper networks

[Diagram: inputs x1–x3 → first hidden layer (weights W1) → second hidden layer (weights W2) → output y (weights V)]

Page 51

http://neuralnetworksanddeeplearning.com/chap1.html

Page 52

Higher-order features learned for image recognition (Lee et al. 2009, ICML)

Page 53

Autoencoder

[Diagram: inputs x1, x2, x3 → hidden h1, h2 → outputs x1, x2, x3]

• Unsupervised neural network, where y = x
• Learns a low-dimensional representation of x

Page 54

Feedforward networks

[Diagram: inputs x1–x3 → hidden h1, h2 → output y]

Page 55

Densely connected layer

[Diagram: every input x1–x7 is connected to every hidden node through W]

$$h = \sigma(xW)$$
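A sketch of a densely connected layer in numpy, with random placeholder weights sized to match the diagram:

```python
# Densely connected layer, h = sigma(xW): every input dimension is
# connected to every hidden node.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=7)        # x1 … x7, as in the diagram
W = rng.normal(size=(7, 3))   # fully connected weights (placeholders)

h = sigmoid(x @ W)
print(h)                      # the hidden values
```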

Page 56

Convolutional networks

• With convolutional networks, the same operation (i.e., the same set of parameters) is applied to different regions of the input

Page 57

Convolutional networks

[Diagram: inputs x1–x7; each hidden node sees a window of three inputs, all sharing the same weights W]

$$h_1 = \sigma(x_1 W_1 + x_2 W_2 + x_3 W_3)$$
$$h_2 = \sigma(x_3 W_1 + x_4 W_2 + x_5 W_3)$$
$$h_3 = \sigma(x_5 W_1 + x_6 W_2 + x_7 W_3)$$
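A sketch of this 1D convolution in numpy; the stride of 2 reproduces the three equations above, while the weight values are placeholders:

```python
# The same three weights slide across the input with stride 2,
# producing h1, h2, h3 from overlapping windows of three inputs.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d(x, w, stride=2):
    width = len(w)
    return np.array([sigmoid(np.dot(x[i:i + width], w))
                     for i in range(0, len(x) - width + 1, stride)])

x = np.arange(1.0, 8.0)          # x1 … x7
w = np.array([0.5, -0.2, 0.1])   # shared weights W1, W2, W3 (placeholders)
print(conv1d(x, w))              # h1, h2, h3
```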

Page 58

2D convolution

0 0 0 0 0
0 1 1 1 0
0 1 1 1 0
0 1 1 1 0
0 0 0 0 0

blurring

http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
https://docs.gimp.org/en/plug-in-convmatrix.html

Page 59

Pooling

• Down-samples a layer by selecting a single point from some set
• Max-pooling selects the largest value

[7 3 1] [9 2 1] [0 5 3]  →  [7 9 5]
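A sketch of max-pooling over non-overlapping windows of size 3, reproducing the example above:

```python
# Max-pooling: keep only the largest value in each window.
import numpy as np

def max_pool(x, size=3):
    return np.array([x[i:i + size].max() for i in range(0, len(x), size)])

x = np.array([7, 3, 1, 9, 2, 1, 0, 5, 3])
print(max_pool(x))   # [7 9 5]
```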

Page 60

Feedforward networks

[Diagram: inputs x1–x3 → hidden h1, h2 → output y]

Page 61

Recurrent networks

[Diagram: input x → hidden layer h → label y]

Page 62

Recurrent networks

[Diagram: the network unrolled over time steps 1 and 2; at each step, inputs x1–x3 feed the hidden layer h1, h2 through W, and the hidden layer feeds the output y through V]

Page 63

Recurrent networks

[Diagram: at time step 2, the hidden layer h1, h2 receives the current input through W and the previous time step's hidden layer through U; the output y is computed through V]

$$h^{t}_1 = \sigma\Bigg(\underbrace{\sum_{i=1}^{F} W_{i,1}\, x_i}_{\text{input}} + \underbrace{\sum_{j=1}^{H} U_{j,1}\, h^{t-1}_j}_{\text{hidden layer}}\Bigg)$$
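A sketch of one recurrent step in numpy, with random placeholder weights:

```python
# One recurrent step: the new hidden state depends on the current input
# (through W) and the previous hidden state (through U).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x, h_prev, W, U):
    return sigmoid(x @ W + h_prev @ U)

rng = np.random.default_rng(0)
F, H = 3, 2                          # input size, hidden size
W = rng.normal(size=(F, H))          # input → hidden (placeholders)
U = rng.normal(size=(H, H))          # hidden → hidden (placeholders)

h = np.zeros(H)
for x in rng.normal(size=(5, F)):    # a sequence of 5 inputs
    h = rnn_step(x, h, W, U)
print(h)
```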

Page 64

Recurrent networks

[Diagram: the recurrent network unrolled over the sequence "tim cook loves his apple"; the label ORG for "apple" depends on "tim cook" at the start of the sequence]

RNNs often have a problem with long-distance dependencies.

Page 65

LSTMs

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 66

Recurrent networks/LSTMs

task                      x                     y
language models           words in sequence     the next word in the sequence
part-of-speech tagging    words in sequence     part of speech
machine translation       words in sequence     translation

Page 67

Midterm report, due Friday

• 4 pages, citing 10 relevant sources

• Be sure to consider feedback!

• Data collection should be completed

• You should specify a validation strategy to be performed at the end

• Present initial experimental results