[PyCon 2015] Start Experimenting with Deep Learning Today (submission copy)
Start Experimenting with Deep Learning Today
2015-08-30
김현호

About me
김현호
- Computer science major, UST
- Automatic Interpretation Lab, ETRI (Electronics and Telecommunications Research Institute)
- Mobile developer, Team Popong
- Interests: artificial intelligence, machine learning, natural language processing
- stray.leone@gmail.com
Outline
1. Understanding Neural Networks
2. Deep Neural Networks
a. Pretraining
b. Rectified Linear Unit
c. Dropout
3. The Theano library
4. Deep learning code using Theano
5. Deep Learning for Natural Language Processing
a. The Gensim library
b. Automatic word spacing with a Recurrent Neural Network
Interest in deep learning these days

Many deep learning talks

[Several image-only slides followed]
Artificial Neural Network
A machine learning system inspired by the way neurons, the building blocks of the human nervous system, operate.
Real neurons vs. artificial neurons
[Figure, revealed step by step: a biological neuron next to an artificial neuron; the signal travels in one direction, and each connection carries a weight]
Artificial Neural Network
[Figure: artificial neurons arranged into input, hidden, and output layers]

Artificial Neural Network Learning
[Figure: three weight layers]
Forward propagation: the input is passed through the weights, layer by layer, to produce an output.
Backward propagation: the output error is sent back through the weights to update them.
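To make forward and backward propagation concrete, here is a tiny numpy sketch of one update through a single sigmoid layer (my illustration, not from the talk):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -0.2])      # input
W = np.array([[0.1], [0.4]])   # weights
t = np.array([1.0])            # target

# forward propagation: weighted sum, then activation
y = sigmoid(x.dot(W))
# backward propagation: push the squared-error gradient back
# through the sigmoid and the weights, then take a descent step
delta = (y - t) * y * (1 - y)
W -= 0.1 * np.outer(x, delta)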
Deep Neural Network

What is a Deep Neural Network?
An Artificial Neural Network with three or more hidden layers.
Difficulties with early deep learning
"...deeper than two or three level networks yielded poorer results"

Why deep learning was hard:
- Overfitting: deep nets have lots of parameters
- Underfitting: vanishing gradients during gradient descent
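Why gradients vanish: the sigmoid derivative is at most 0.25, so backpropagating through many sigmoid layers keeps multiplying the gradient by small factors. A tiny illustration (mine, not the talk's):

import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

grad = 1.0
for layer in range(10):
    grad *= sigmoid_grad(0.0)  # 0.25, the sigmoid's steepest slope
print grad  # ~9.5e-07: almost no signal reaches the early layers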
Breakthroughs in deep learning
- Pretraining
- Dropout
- Rectified Linear Unit
Pretraining performance

"Why Does Unsupervised Pre-training Help Deep Learning?" (Erhan et al., 2010):
- Pretrained initialization starts from a better local minimum than random initialization.

[Figure: learned features without pretraining vs. with pretraining]

Pretraining methods:
1) Contrastive Divergence
a) http://www.quora.com/What-is-contrastive-divergence
b) https://www.youtube.com/watch?v=p4Vh_zMw-HQ&index=36&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
2) AutoEncoder
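As a rough idea of the autoencoder approach, here is a minimal tied-weight autoencoder layer in Theano (a sketch under my own assumptions, not the talk's code): each layer is pretrained to reconstruct its input, and the learned W then initializes that layer of the DNN.

import numpy as np
import theano
import theano.tensor as T

n_in, n_hidden = 784, 500
rng = np.random.RandomState(0)
W = theano.shared(np.asarray(rng.uniform(-0.1, 0.1, (n_in, n_hidden)),
                             dtype=theano.config.floatX))
b_h = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX))
b_v = theano.shared(np.zeros(n_in, dtype=theano.config.floatX))

x = T.matrix('x')                             # inputs assumed in [0, 1], e.g. MNIST pixels
h = T.nnet.sigmoid(T.dot(x, W) + b_h)         # encode
x_hat = T.nnet.sigmoid(T.dot(h, W.T) + b_v)   # decode with tied weights
cost = T.mean(T.nnet.binary_crossentropy(x_hat, x).sum(axis=1))

params = [W, b_h, b_v]
grads = T.grad(cost, params)
updates = [(p, p - 0.01 * g) for p, g in zip(params, grads)]
pretrain = theano.function([x], cost, updates=updates)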
Dropout
[Figure: during training, randomly chosen units are dropped from the network]
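A minimal sketch of the standard dropout formulation in Theano (my addition): during training each unit is kept with probability p_keep, and at test time the full layer is used with activations scaled instead.

import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

srng = RandomStreams(seed=42)

def dropout(h, p_keep):
    # sample a 0/1 mask over the activations during training
    mask = srng.binomial(n=1, p=p_keep, size=h.shape,
                         dtype=theano.config.floatX)
    return h * mask

def scale_at_test(h, p_keep):
    # at test time, keep every unit but scale by the keep probability
    return h * p_keep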
Rectified Linear Unit (ReLU)

Activation functions
[Figure: the sigmoid function next to the rectified linear unit]
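For reference, the two activations being compared, in code (standard definitions, my addition):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # saturates; derivative is at most 0.25

def relu(x):
    return np.maximum(x, 0.0)        # no saturation for x > 0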
ReLU experiment results (test accuracy on MNIST)

Epoch   sigmoid   ReLU
1       0.7053    0.9433
2       0.8302    0.9647
3       0.8684    0.9723
3       0.8837    0.9737
4       0.89      0.9763
5       0.895     0.9792
...     ...       ...
11      0.9116    0.9829
12      0.9127    0.9838
13      0.9142    0.9821
14      0.9152    0.9838
15      0.9159    0.9832

Setup:
- code: https://github.com/Newmu/Theano-Tutorials
- data: MNIST
Data Sets

MNIST [Figure: sample handwritten digits]
Cifar-10 [Figure: sample images from the 10 classes]

- MNIST: the MNIST database of handwritten digits; 28x28 grayscale images; 10 classes
- Cifar10: the CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class
- word2vec
Deep learning experiments

Starting the deep learning experiments

Where the name Theano comes from:
- a female mathematician
- the wife of Pythagoras
Deep learning library comparison
Source: http://t-robotics.blogspot.kr/2015/06/hw-sw.html#.Vd59KPntlBe
Theano
- Q) Does it build a DNN for me automatically?
- A) No, you have to build the Deep Neural Network yourself…

- a DNN model training library (x)
- a library that is useful for matrix operations and the like (o)
Why Theano
- Definition: Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. (http://deeplearning.net/software/theano/)
- "Optimizing GPU-meta-programming code generating array oriented optimizing math compiler in Python" (https://github.com/josephmisiti/awesome-machine-learning)
Why Theano
- runs computations on the GPU from Python code, with no CUDA code to write
- grad(), updates, function()
- symbolic functions
Why Theano - grad(), updates, function()
T.grad() computes the gradient for you.
e.g.
x = T.scalar()
gx = T.grad(x**2, x)  # the gradient of x**2 with respect to x (= 2x)
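A complete runnable version of this example (my addition, using the same API):

import theano
import theano.tensor as T

x = T.dscalar('x')
gx = T.grad(x ** 2, x)        # symbolic derivative: 2*x
f = theano.function([x], gx)  # compile the gradient into a callable
print f(3.0)                  # prints 6.0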
Why Theano - grad(), updates, function()

updates = [
    (param, param - learning_rate * gparam)
    for param, gparam in zip(classifier.params, gparams)
]
...
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
Why Theano - grad(), updates, function()

This module provides function(), commonly accessed as theano.function, the interface for compiling graphs into callable objects.
You've already seen example usage in the basic tutorial... something like this:
>>> x = theano.tensor.dscalar()
>>> f = theano.function([x], 2*x)
>>> print f(4) # prints 8.0
http://deeplearning.net/software/theano/library/compile/function.html

In the call above, [x] is the input list and 2*x is the output expression.
Why Theano - grad(), updates, function()

x = theano.tensor.dscalar('x')
y = theano.tensor.dscalar('y')
z = x + y
f = theano.function([x, y], z)
print f(4, 3)  # array(7.0)

Theano represents symbolic mathematical computations as graphs.
[Figure: two scalar inputs x and y feeding a scalar node z = x + y]
Install Theano
- Environment: Ubuntu 14.04 64-bit
- Install document: http://deeplearning.net/software/theano/install_ubuntu.html#install-ubuntu

$ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
$ sudo pip install Theano
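A quick way to check the install and see which device Theano will use (my suggestion):

$ python -c "import theano; print theano.config.device"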
Download Tutorial code
$ git clone https://github.com/lisa-lab/DeepLearningTutorials.git
Cloning into 'DeepLearningTutorials'...
remote: Counting objects: 3652, done.
remote: Total 3652 (delta 0), reused 0 (delta 0), pack-reused 3652
Receiving objects: 100% (3652/3652), 7.79 MiB | 2.32 MiB/s, done.
Resolving deltas: 100% (2161/2161), done.
Checking connectivity... done.
$ ls
DeepLearningTutorials
Run DBN
DeepLearningTutorials$ cd code
DeepLearningTutorials/code$ python DBN.py
Using gpu device 0: GeForce GTX 770
Downloading data from
http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
... loading data
... building the model
... getting the pretraining functions
... pre-training the model
Pre-training layer 0, epoch 0, cost -98.5296
Pre-training layer 0, epoch 1, cost -83.842
Pre-training layer 0, epoch 2, cost -80.688
Pre-training layer 0, epoch 3, cost -79.0362
Pre-training layer 0, epoch 4, cost -77.9295
DBN.py

from logistic_sgd import LogisticRegression, load_data
...
datasets = load_data(dataset)

train_set_x, train_set_y = datasets[0]
valid_set_x, valid_set_y = datasets[1]
test_set_x, test_set_y = datasets[2]
DBN.py

# start-snippet-1
class DBN(object):
    ...

print '... building the model'
# construct the Deep Belief Network
dbn = DBN(numpy_rng=numpy_rng, n_ins=28 * 28,
          hidden_layers_sizes=[1000, 1000, 1000],
          n_outs=10)
[Figure: the network just constructed, with 28 * 28 = 784 input units, three hidden layers of 1000 units, and 10 output units]

DBN.py
pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_x,
                                            batch_size=batch_size,
                                            k=k)
...
print '... getting the finetuning functions'
train_fn, validate_model, test_model = dbn.build_finetune_functions(
    datasets=datasets,
    batch_size=batch_size,
    learning_rate=finetune_lr
)

DBN.py
train_fn = theano.function(
    inputs=[index],
    outputs=self.finetune_cost,
    updates=updates,
    givens={
        self.x: train_set_x[
            index * batch_size: (index + 1) * batch_size
        ],
        self.y: train_set_y[
            index * batch_size: (index + 1) * batch_size
        ]
    }
)

DBN.py
while (epoch < training_epochs) and (not done_looping):
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):

        minibatch_avg_cost = train_fn(minibatch_index)
        iter = (epoch - 1) * n_train_batches + minibatch_index

        if (iter + 1) % validation_frequency == 0:

            validation_losses = validate_model()
            this_validation_loss = numpy.mean(validation_losses)
DNN using ReLU

import theano
from theano import tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
import numpy as np
from load import mnist
DNN using ReLU

def floatX(X):
    # cast to Theano's configured float type
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    # small random initial weights in a shared variable
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def rectify(X):
    # ReLU activation
    return T.maximum(X, 0.)

def softmax(X):
    # numerically stable softmax: subtract the row max before exp
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')
DNN using ReLU

def model(X, w_h, w_h2, w_o):
    # two ReLU hidden layers followed by a softmax output layer
    h = rectify(T.dot(X, w_h))
    h2 = rectify(T.dot(h, w_h2))
    py_x = softmax(T.dot(h2, w_o))
    return h, h2, py_x

def prop(cost, params, lr=0.001):
    # plain stochastic gradient descent updates
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append((p, p - lr * g))
    return updates
trX, teX, trY, teY = mnist(onehot=True)

X = T.fmatrix()
Y = T.fmatrix()

w_h = init_weights((784, 625))
w_h2 = init_weights((625, 625))
w_o = init_weights((625, 10))

h, h2, py_x = model(X, w_h, w_h2, w_o)
y_x = T.argmax(py_x, axis=1)  # predicted class

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
params = [w_h, w_h2, w_o]
updates = prop(cost, params, lr=0.001)

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates)
predict = theano.function(inputs=[X], outputs=y_x)

for i in range(100):
    # minibatches of 128 examples
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
        cost = train(trX[start:end], trY[start:end])
    print np.mean(np.argmax(teY, axis=1) == predict(teX))  # test accuracy per epoch
Play with data
load_data()

def load_data(dataset):
    ...
    if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
        dataset = new_path
    ...
    print '... loading data'

    # Load the dataset
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()
Building the data (1)

train_set.x.txt
[Figure: a text file of comma-separated values; each row is one input vector, and the row width is the input vector length]

train_set.y.txt
[Figure: one label per row, matching the input vectors]
from numpy import genfromtxt
import gzip, cPickle
...
train_set_x = genfromtxt(dir_path + "train_set.x.txt", delimiter=",")
...
# pair each input set with its labels
train_set = train_set_x, train_set_y
valid_set = valid_set_x, valid_set_y
test_set = test_set_x, test_set_y

print "writing to pkl.gz..."
data_set = [train_set, valid_set, test_set]
print "zip data into a file"
f = gzip.open(output_dir + str(i) + "_" + pkl_filename + ".pkl.gz", 'wb')
print "zip data file name is " + str(i) + "_" + pkl_filename + ".pkl.gz"
cPickle.dump(data_set, f, protocol=2)
f.close()
Building the data (2)

for n, sentence in enumerate(file_lines):
    ...
    data_batch_fpath = vector_dir + "data_batch_" + str(n) + ".npz"
    ...
    # save vector list
    numpy.savez(data_batch_fpath,
                data=numpy.asarray(sentence_vector_list),
                labels=label_vector,
                length=max_length,
                dim=dimension)
Saving and loading a model

[The slide showed two code screenshots: "save model" and "load model"]
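The screenshots are not recoverable from the transcript; below is a minimal sketch, assuming the model's parameters are Theano shared variables (the names save_model and load_model are my own):

import cPickle

def save_model(params, path):
    # params: a list of Theano shared variables, e.g. [w_h, w_h2, w_o]
    with open(path, 'wb') as f:
        cPickle.dump([p.get_value() for p in params], f, protocol=2)

def load_model(params, path):
    with open(path, 'rb') as f:
        values = cPickle.load(f)
    for p, v in zip(params, values):
        p.set_value(v)  # restore each shared variable in place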
Theano modes

.bashrc:
# Theano Settings
export THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,exception_verbosity=high
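The same flags can also be passed per run rather than set globally in .bashrc; for example, to run the DBN script on the CPU:

$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python DBN.py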
Deep Learning for Natural Language Processing

- 나는밥을먹는다 ("I eat a meal")
- Split into morphemes: 나 / 는 / 밥 / 을 / 먹 / 는 / 다, then represent each as a vector

one-hot (1-of-K) representation: each morpheme becomes a vector with a single 1
- 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]

index  0(나)  1(가)  2(는)  ...  ...  ...  ...  999(.)
나      1      0      0     0    0    0    0    0
는      0      0      1     0    0    0    0    0
..      0      0      0     0    0    1    0    0
..      0      0      0     0    1    0    0    0
다      0      0      0     0    0    0    1    0

word2vec representation: each morpheme becomes a dense vector learned by a Word2Vec model
- 밥 = [0.323112, -0.021232, …….. , 0.82123123]

The two representations side by side (see the sketch below):
- one-hot: 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]
- word2vec: 밥 = [0.323112, -0.021232, …….. , 0.82123123]
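A small numpy sketch of the one-hot representation (my addition; the vocabulary indices are hypothetical). A word2vec vector, in contrast, is dense, low-dimensional, and learned from context:

# -*- coding: utf-8 -*-
import numpy as np

vocab = {u'나': 0, u'가': 1, u'는': 2, u'밥': 500, u'.': 999}  # hypothetical indices

def one_hot(morpheme, vocab_size=1000):
    v = np.zeros(vocab_size)
    v[vocab[morpheme]] = 1.0  # a single 1 at the morpheme's index
    return v

print one_hot(u'밥')[500]  # 1.0; every other entry is 0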
Gensim
- Definition: Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.
- word2vec class:
  - word vector representations
  - multi-threading
  - Skip-Gram
  - Continuous Bag of Words
Gensim - imports, settings

# imports
from gensim.models.word2vec import LineSentence
from gensim.models import word2vec
from gensim import utils

# settings
THREADS = 8  # number of worker threads for training
DIMENSION = 50
SKIPGRAM = 1  # 1 is skip-gram, 0 is CBOW
WINDOW_SIZE = 8
NTimes = 10  # number of passes over the sentences
min_count_of_word = 5
Gensim - training, saving the model

# load raw sentences
sentences = LineSentence(input_train_file_path)
# model settings
model = word2vec.Word2Vec(size=dimension, workers=THREADS,
                          min_count=min_count_of_word, sg=SKIPGRAM,
                          window=WINDOW_SIZE)

# build vocabulary and train
number_iter = NTimes  # number of iterations (epochs) over the corpus
model.build_vocab(sentences)

ss = utils.RepeatCorpusNTimes(sentences, number_iter)
model.train(ss)
# save the model
model.save(model_file_name)
model.save_word2vec_format(model_file_name + '.bin', binary=True)
Gensim - loading the model, testing

try:
    model = utils.SaveLoad.load(fname=model_file_name)
except:
    print "failed to load. Retrying with load_word2vec_format()!"
    model = word2vec.Word2Vec.load_word2vec_format(fname=model_file_name + ".bin", binary=True)

...
x = model[w.decode('utf-8')]
mw, score = model.most_similar(positive=[x])[0]
print "most similar : ", mw
print "target vector :", x
Most similar words to '서울' (Seoul)

most similar word     similarity
대구 (Daegu)          0.4282917082309723
광주 (Gwangju)        0.4046330451965332
부산 (Busan)          0.40132588148117065
울산 (Ulsan)          0.3863871693611145
수원 (Suwon)          0.38555505871772766
청주 (Cheongju)       0.35919708013534546
안양 (Anyang)         0.35622960329055786
주왕산 (Juwangsan)    0.3543151617050171
평택 (Pyeongtaek)     0.3505415618419647
cebu                  0.34598737955093384
Auto word spacing with a Recurrent Neural Network

- input: the unspaced sentence 나는밥을먹는다, one vector per character, e.g. its word2vec representation [0.323112, -0.021232, …….. , 0.82123123]
- output: 0 0 1 0 1 0 0, one binary label per character, with 1 marking where a space is inserted
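One way to frame the task (my sketch, consistent with the labels on the slide): take correctly spaced text, strip the spaces to get the input, and emit a 0/1 label per character marking where a space belongs.

# -*- coding: utf-8 -*-
def make_spacing_labels(spaced_sentence):
    chars, labels = [], []
    for word in spaced_sentence.split():
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == 0 else 0)  # 1: a space comes before this char
    labels[0] = 0  # no space before the very first character
    return u''.join(chars), labels

print make_spacing_labels(u'나는 밥을 먹는다')
# (u'나는밥을먹는다', [0, 0, 1, 0, 1, 0, 0])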
Things that were hard while experimenting with deep learning
- There are many parameters to choose: the number of layers, the number of nodes per layer, the learning rate, the number of epochs, the batch size, the activation function, and so on.
- After changing a parameter, it takes a long time to see the experimental result.
- GPU memory problems, because the data is big.
Thank you

Appendix slides:
- Setting up the GPU
- Building LMDB for Caffe
- Softmax function
- Bias
- Negative Log Likelihood

Feedback: http://goo.gl/forms/IR45liXoQ3