[PyCon 2015] Start Experimenting with Deep Learning Today (submission copy)
Start Experimenting with Deep Learning Today
2015-08-30
김현호

About me
김현호
- Computer science major, UST
- Automatic Interpretation Lab, ETRI (Electronics and Telecommunications Research Institute)
- Mobile developer, Team Popong
- Interests: artificial intelligence, machine learning, natural language processing
- stray.leone@gmail.com
Outline
1. Understanding Neural Networks
2. Deep Neural Networks
a. Pretraining
b. Rectified Linear Unit
c. Dropout
3. The Theano library
4. Deep learning code using Theano
5. Deep Learning for Natural Language Processing
a. The Gensim library
b. Automatic word spacing with a Recurrent Neural Network
Interest in deep learning these days

Many deep learning talks

[Several image-only slides followed]
Artificial Neural Network
A machine learning system inspired by the way neurons, the building blocks of the human nervous system, operate.
Real neurons vs. artificial neurons
[Figure, revealed step by step: a biological neuron next to an artificial neuron; the signal travels in one direction, and each connection carries a weight]
Artificial Neural Network
[Figure: artificial neurons arranged into input, hidden, and output layers]

Artificial Neural Network Learning
[Figure: three weight layers]
Forward propagation: the input is passed through the weights, layer by layer, to produce an output.
Backward propagation: the output error is sent back through the weights to update them.
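To make forward and backward propagation concrete, here is a tiny numpy sketch of one update through a single sigmoid layer (my illustration, not from the talk):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -0.2])      # input
W = np.array([[0.1], [0.4]])   # weights
t = np.array([1.0])            # target

# forward propagation: weighted sum, then activation
y = sigmoid(x.dot(W))
# backward propagation: push the squared-error gradient back
# through the sigmoid and the weights, then take a descent step
delta = (y - t) * y * (1 - y)
W -= 0.1 * np.outer(x, delta)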
Deep Neural Network

What is a Deep Neural Network?
An Artificial Neural Network with three or more hidden layers.
Difficulties with early deep learning
"...deeper than two or three level networks yielded poorer results"

Why deep learning was hard:
- Overfitting: deep nets have lots of parameters
- Underfitting: vanishing gradients during gradient descent
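Why gradients vanish: the sigmoid derivative is at most 0.25, so backpropagating through many sigmoid layers keeps multiplying the gradient by small factors. A tiny illustration (mine, not the talk's):

import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

grad = 1.0
for layer in range(10):
    grad *= sigmoid_grad(0.0)  # 0.25, the sigmoid's steepest slope
print grad  # ~9.5e-07: almost no signal reaches the early layers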
Breakthroughs in deep learning
- Pretraining
- Dropout
- Rectified Linear Unit
Pretraining performance

"Why Does Unsupervised Pre-training Help Deep Learning?" (Erhan et al., 2010):
- Pretrained initialization starts from a better local minimum than random initialization.

[Figure: learned features without pretraining vs. with pretraining]

Pretraining methods:
1) Contrastive Divergence
a) http://www.quora.com/What-is-contrastive-divergence
b) https://www.youtube.com/watch?v=p4Vh_zMw-HQ&index=36&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
2) AutoEncoder
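As a rough idea of the autoencoder approach, here is a minimal tied-weight autoencoder layer in Theano (a sketch under my own assumptions, not the talk's code): each layer is pretrained to reconstruct its input, and the learned W then initializes that layer of the DNN.

import numpy as np
import theano
import theano.tensor as T

n_in, n_hidden = 784, 500
rng = np.random.RandomState(0)
W = theano.shared(np.asarray(rng.uniform(-0.1, 0.1, (n_in, n_hidden)),
                             dtype=theano.config.floatX))
b_h = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX))
b_v = theano.shared(np.zeros(n_in, dtype=theano.config.floatX))

x = T.matrix('x')                             # inputs assumed in [0, 1], e.g. MNIST pixels
h = T.nnet.sigmoid(T.dot(x, W) + b_h)         # encode
x_hat = T.nnet.sigmoid(T.dot(h, W.T) + b_v)   # decode with tied weights
cost = T.mean(T.nnet.binary_crossentropy(x_hat, x).sum(axis=1))

params = [W, b_h, b_v]
grads = T.grad(cost, params)
updates = [(p, p - 0.01 * g) for p, g in zip(params, grads)]
pretrain = theano.function([x], cost, updates=updates)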
Dropout
[Figure: during training, randomly chosen units are dropped from the network]
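A minimal sketch of the standard dropout formulation in Theano (my addition): during training each unit is kept with probability p_keep, and at test time the full layer is used with activations scaled instead.

import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

srng = RandomStreams(seed=42)

def dropout(h, p_keep):
    # sample a 0/1 mask over the activations during training
    mask = srng.binomial(n=1, p=p_keep, size=h.shape,
                         dtype=theano.config.floatX)
    return h * mask

def scale_at_test(h, p_keep):
    # at test time, keep every unit but scale by the keep probability
    return h * p_keep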
Rectified Linear Unit (ReLU)

Activation functions
[Figure: the sigmoid function next to the rectified linear unit]
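For reference, the two activations being compared, in code (standard definitions, my addition):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # saturates; derivative is at most 0.25

def relu(x):
    return np.maximum(x, 0.0)        # no saturation for x > 0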
ReLU experiment results (test accuracy on MNIST)

Epoch   sigmoid   ReLU
1       0.7053    0.9433
2       0.8302    0.9647
3       0.8684    0.9723
3       0.8837    0.9737
4       0.89      0.9763
5       0.895     0.9792
...     ...       ...
11      0.9116    0.9829
12      0.9127    0.9838
13      0.9142    0.9821
14      0.9152    0.9838
15      0.9159    0.9832

Setup:
- code: https://github.com/Newmu/Theano-Tutorials
- data: MNIST
Data Sets

MNIST [Figure: sample handwritten digits]
Cifar-10 [Figure: sample images from the 10 classes]

- MNIST: the MNIST database of handwritten digits; 28x28 grayscale images; 10 classes
- Cifar10: the CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class
- word2vec
Deep learning experiments

Starting the deep learning experiments

Where the name Theano comes from:
- a female mathematician
- the wife of Pythagoras
Deep learning library comparison
Source: http://t-robotics.blogspot.kr/2015/06/hw-sw.html#.Vd59KPntlBe
Theano
- Q) Does it build a DNN for me automatically?
- A) No, you have to build the Deep Neural Network yourself…

- a DNN model training library (x)
- a library that is useful for matrix operations and the like (o)
Why Theano
- Definition: Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. (http://deeplearning.net/software/theano/)
- "Optimizing GPU-meta-programming code generating array oriented optimizing math compiler in Python" (https://github.com/josephmisiti/awesome-machine-learning)
Why Theano
- runs computations on the GPU from Python code, with no CUDA code to write
- grad(), updates, function()
- symbolic functions
Why Theano - grad(), updates, function()
T.grad() computes the gradient for you.
e.g.
x = T.scalar()
gx = T.grad(x**2, x)  # the gradient of x**2 with respect to x (= 2x)
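A complete runnable version of this example (my addition, using the same API):

import theano
import theano.tensor as T

x = T.dscalar('x')
gx = T.grad(x ** 2, x)        # symbolic derivative: 2*x
f = theano.function([x], gx)  # compile the gradient into a callable
print f(3.0)                  # prints 6.0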
Why Theano - grad(), updates, function()

updates = [
    (param, param - learning_rate * gparam)
    for param, gparam in zip(classifier.params, gparams)
]
...
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
Why Theano - grad(), updates, function()

This module provides function(), commonly accessed as theano.function, the interface for compiling graphs into callable objects.
You've already seen example usage in the basic tutorial... something like this:
>>> x = theano.tensor.dscalar()
>>> f = theano.function([x], 2*x)
>>> print f(4) # prints 8.0
http://deeplearning.net/software/theano/library/compile/function.html

In the call above, [x] is the input list and 2*x is the output expression.
Why Theano - grad(), updates, function()

x = theano.tensor.dscalar('x')
y = theano.tensor.dscalar('y')
z = x + y
f = theano.function([x, y], z)
print f(4, 3)  # array(7.0)

Theano represents symbolic mathematical computations as graphs.
[Figure: two scalar inputs x and y feeding a scalar node z = x + y]
Install Theano
- Environment: Ubuntu 14.04 64-bit
- Install document: http://deeplearning.net/software/theano/install_ubuntu.html#install-ubuntu

$ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
$ sudo pip install Theano
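A quick way to check the install and see which device Theano will use (my suggestion):

$ python -c "import theano; print theano.config.device"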
Download Tutorial code
$ git clone https://github.com/lisa-lab/DeepLearningTutorials.git
Cloning into 'DeepLearningTutorials'...
remote: Counting objects: 3652, done.
remote: Total 3652 (delta 0), reused 0 (delta 0), pack-reused 3652
Receiving objects: 100% (3652/3652), 7.79 MiB | 2.32 MiB/s, done.
Resolving deltas: 100% (2161/2161), done.
Checking connectivity... done.
$ ls
DeepLearningTutorials
Run DBN
DeepLearningTutorials$ cd code
DeepLearningTutorials/code$ python DBN.py
Using gpu device 0: GeForce GTX 770
Downloading data from
http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
... loading data
... building the model
... getting the pretraining functions
... pre-training the model
Pre-training layer 0, epoch 0, cost -98.5296
Pre-training layer 0, epoch 1, cost -83.842
Pre-training layer 0, epoch 2, cost -80.688
Pre-training layer 0, epoch 3, cost -79.0362
Pre-training layer 0, epoch 4, cost -77.9295
DBN.py

from logistic_sgd import LogisticRegression, load_data
...
datasets = load_data(dataset)

train_set_x, train_set_y = datasets[0]
valid_set_x, valid_set_y = datasets[1]
test_set_x, test_set_y = datasets[2]
DBN.py

# start-snippet-1
class DBN(object):
    ...

print '... building the model'
# construct the Deep Belief Network
dbn = DBN(numpy_rng=numpy_rng, n_ins=28 * 28,
          hidden_layers_sizes=[1000, 1000, 1000],
          n_outs=10)
[Figure: the network just constructed, with 28 * 28 = 784 input units, three hidden layers of 1000 units, and 10 output units]

DBN.py
pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_x,
                                            batch_size=batch_size,
                                            k=k)
...
print '... getting the finetuning functions'
train_fn, validate_model, test_model = dbn.build_finetune_functions(
    datasets=datasets,
    batch_size=batch_size,
    learning_rate=finetune_lr
)

DBN.py
train_fn = theano.function(
    inputs=[index],
    outputs=self.finetune_cost,
    updates=updates,
    givens={
        self.x: train_set_x[
            index * batch_size: (index + 1) * batch_size
        ],
        self.y: train_set_y[
            index * batch_size: (index + 1) * batch_size
        ]
    }
)

DBN.py
while (epoch < training_epochs) and (not done_looping):
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):

        minibatch_avg_cost = train_fn(minibatch_index)
        iter = (epoch - 1) * n_train_batches + minibatch_index

        if (iter + 1) % validation_frequency == 0:

            validation_losses = validate_model()
            this_validation_loss = numpy.mean(validation_losses)
DNN using ReLU

import theano
from theano import tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
import numpy as np
from load import mnist
DNN using ReLU

def floatX(X):
    # cast to Theano's configured float type
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    # small random initial weights in a shared variable
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def rectify(X):
    # ReLU activation
    return T.maximum(X, 0.)

def softmax(X):
    # numerically stable softmax: subtract the row max before exp
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')
DNN using ReLU

def model(X, w_h, w_h2, w_o):
    # two ReLU hidden layers followed by a softmax output layer
    h = rectify(T.dot(X, w_h))
    h2 = rectify(T.dot(h, w_h2))
    py_x = softmax(T.dot(h2, w_o))
    return h, h2, py_x

def prop(cost, params, lr=0.001):
    # plain stochastic gradient descent updates
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append((p, p - lr * g))
    return updates
trX, teX, trY, teY = mnist(onehot=True)

X = T.fmatrix()
Y = T.fmatrix()

w_h = init_weights((784, 625))
w_h2 = init_weights((625, 625))
w_o = init_weights((625, 10))

h, h2, py_x = model(X, w_h, w_h2, w_o)
y_x = T.argmax(py_x, axis=1)  # predicted class

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
params = [w_h, w_h2, w_o]
updates = prop(cost, params, lr=0.001)

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates)
predict = theano.function(inputs=[X], outputs=y_x)

for i in range(100):
    # minibatches of 128 examples
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
        cost = train(trX[start:end], trY[start:end])
    print np.mean(np.argmax(teY, axis=1) == predict(teX))  # test accuracy per epoch
Play with data
load_data()

def load_data(dataset):
    ...
    if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
        dataset = new_path
    ...
    print '... loading data'

    # Load the dataset
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()
Building the data (1)

train_set.x.txt
[Figure: a text file of comma-separated values; each row is one input vector, and the row width is the input vector length]

train_set.y.txt
[Figure: one label per row, matching the input vectors]
from numpy import genfromtxt
import gzip, cPickle
...
train_set_x = genfromtxt(dir_path + "train_set.x.txt", delimiter=",")
...
# pair each input set with its labels
train_set = train_set_x, train_set_y
valid_set = valid_set_x, valid_set_y
test_set = test_set_x, test_set_y

print "writing to pkl.gz..."
data_set = [train_set, valid_set, test_set]
print "zip data into a file"
f = gzip.open(output_dir + str(i) + "_" + pkl_filename + ".pkl.gz", 'wb')
print "zip data file name is " + str(i) + "_" + pkl_filename + ".pkl.gz"
cPickle.dump(data_set, f, protocol=2)
f.close()
Building the data (2)

for n, sentence in enumerate(file_lines):
    ...
    data_batch_fpath = vector_dir + "data_batch_" + str(n) + ".npz"
    ...
    # save vector list
    numpy.savez(data_batch_fpath,
                data=numpy.asarray(sentence_vector_list),
                labels=label_vector,
                length=max_length,
                dim=dimension)
Saving and loading a model

[The slide showed two code screenshots: "save model" and "load model"]
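The screenshots are not recoverable from the transcript; below is a minimal sketch, assuming the model's parameters are Theano shared variables (the names save_model and load_model are my own):

import cPickle

def save_model(params, path):
    # params: a list of Theano shared variables, e.g. [w_h, w_h2, w_o]
    with open(path, 'wb') as f:
        cPickle.dump([p.get_value() for p in params], f, protocol=2)

def load_model(params, path):
    with open(path, 'rb') as f:
        values = cPickle.load(f)
    for p, v in zip(params, values):
        p.set_value(v)  # restore each shared variable in place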
Theano modes

.bashrc:
# Theano Settings
export THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,exception_verbosity=high
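The same flags can also be passed per run rather than set globally in .bashrc; for example, to run the DBN script on the CPU:

$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python DBN.py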
Deep Learning for Natural Language Processing

- 나는밥을먹는다 ("I eat a meal")
- Split into morphemes: 나 / 는 / 밥 / 을 / 먹 / 는 / 다, then represent each as a vector

one-hot (1-of-K) representation: each morpheme becomes a vector with a single 1
- 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]

index  0(나)  1(가)  2(는)  ...  ...  ...  ...  999(.)
나      1      0      0     0    0    0    0    0
는      0      0      1     0    0    0    0    0
..      0      0      0     0    0    1    0    0
..      0      0      0     0    1    0    0    0
다      0      0      0     0    0    0    1    0

word2vec representation: each morpheme becomes a dense vector learned by a Word2Vec model
- 밥 = [0.323112, -0.021232, …….. , 0.82123123]

The two representations side by side (see the sketch below):
- one-hot: 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]
- word2vec: 밥 = [0.323112, -0.021232, …….. , 0.82123123]
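A small numpy sketch of the one-hot representation (my addition; the vocabulary indices are hypothetical). A word2vec vector, in contrast, is dense, low-dimensional, and learned from context:

# -*- coding: utf-8 -*-
import numpy as np

vocab = {u'나': 0, u'가': 1, u'는': 2, u'밥': 500, u'.': 999}  # hypothetical indices

def one_hot(morpheme, vocab_size=1000):
    v = np.zeros(vocab_size)
    v[vocab[morpheme]] = 1.0  # a single 1 at the morpheme's index
    return v

print one_hot(u'밥')[500]  # 1.0; every other entry is 0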
Gensim
- Definition: Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.
- word2vec class:
  - word vector representations
  - multi-threading
  - Skip-Gram
  - Continuous Bag of Words
Gensim - imports, settings

# imports
from gensim.models.word2vec import LineSentence
from gensim.models import word2vec
from gensim import utils

# settings
THREADS = 8  # number of worker threads for training
DIMENSION = 50
SKIPGRAM = 1  # 1 is skip-gram, 0 is CBOW
WINDOW_SIZE = 8
NTimes = 10  # number of passes over the sentences
min_count_of_word = 5
Gensim - training, saving the model

# load raw sentences
sentences = LineSentence(input_train_file_path)
# model settings
model = word2vec.Word2Vec(size=dimension, workers=THREADS,
                          min_count=min_count_of_word, sg=SKIPGRAM,
                          window=WINDOW_SIZE)

# build vocabulary and train
number_iter = NTimes  # number of iterations (epochs) over the corpus
model.build_vocab(sentences)

ss = utils.RepeatCorpusNTimes(sentences, number_iter)
model.train(ss)
# save the model
model.save(model_file_name)
model.save_word2vec_format(model_file_name + '.bin', binary=True)
Gensim - loading the model, testing

try:
    model = utils.SaveLoad.load(fname=model_file_name)
except:
    print "failed to load. Retrying with load_word2vec_format()!"
    model = word2vec.Word2Vec.load_word2vec_format(fname=model_file_name + ".bin", binary=True)

...
x = model[w.decode('utf-8')]
mw, score = model.most_similar(positive=[x])[0]
print "most similar : ", mw
print "target vector :", x
Most similar words to '서울' (Seoul)

most similar word     similarity
대구 (Daegu)          0.4282917082309723
광주 (Gwangju)        0.4046330451965332
부산 (Busan)          0.40132588148117065
울산 (Ulsan)          0.3863871693611145
수원 (Suwon)          0.38555505871772766
청주 (Cheongju)       0.35919708013534546
안양 (Anyang)         0.35622960329055786
주왕산 (Juwangsan)    0.3543151617050171
평택 (Pyeongtaek)     0.3505415618419647
cebu                  0.34598737955093384
Auto word spacing with a Recurrent Neural Network

- input: the unspaced sentence 나는밥을먹는다, one vector per character, e.g. its word2vec representation [0.323112, -0.021232, …….. , 0.82123123]
- output: 0 0 1 0 1 0 0, one binary label per character, with 1 marking where a space is inserted
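One way to frame the task (my sketch, consistent with the labels on the slide): take correctly spaced text, strip the spaces to get the input, and emit a 0/1 label per character marking where a space belongs.

# -*- coding: utf-8 -*-
def make_spacing_labels(spaced_sentence):
    chars, labels = [], []
    for word in spaced_sentence.split():
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == 0 else 0)  # 1: a space comes before this char
    labels[0] = 0  # no space before the very first character
    return u''.join(chars), labels

print make_spacing_labels(u'나는 밥을 먹는다')
# (u'나는밥을먹는다', [0, 0, 1, 0, 1, 0, 0])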
Things that were hard while experimenting with deep learning
- There are many parameters to choose: the number of layers, the number of nodes per layer, the learning rate, the number of epochs, the batch size, the activation function, and so on.
- After changing a parameter, it takes a long time to see the experimental result.
- GPU memory problems, because the data is big.
Thank you

Appendix slides:
- Setting up the GPU
- Building LMDB for Caffe
- Softmax function
- Bias
- Negative Log Likelihood

Feedback: http://goo.gl/forms/IR45liXoQ3