Introduction to Chainer (LL Ring Recursive)
-
Introduction to Chainer
Preferred Networks
2015/9/5 LL Ring Recursive @ 1stRing
-
(@delta2323_)
2012.3 PFI, 2014.10 PFN
Chainer
http://delta2323.github.io
NIPS 2014, ICML 2015
1!
2
-
git clone https://github.com/pfnet/chainer.git
-
Chainer: http://chainer.org
PFN, PFI
Released: 2015/6/9
Latest version: 1.3.0 (2015/9/2)
Planned: 1.3.1 (9/16), 1.4.0 (9/30)
License: MIT (Expat)
HP: http://chainer.org
GitHub: https://github.com/pfnet/chainer
Twitter: @ChainerOfficial
Google Group: Chainer User Group
Contribution Guide: http://docs.chainer.org/en/stable/contribution.html
Powerful: CUDA, GPU
Flexible
Intuitive: Python
-
[Figure: neural network with inputs x1..xN, hidden units h1..hH and k1..kM, and outputs y1..yM; Forward and Backward passes]
-
AI
+
+
QSAR
()
e
-
Deep Q Network*
* Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
** Caffe, Deep Q-Network: http://d.hatena.ne.jp/muupan/20141021/1413850461
*** PFI, 2014: http://www.ustream.tv/recorded/53153399
-
Kingma, Diederik P., et al. "Semi-supervised learning with deep generative models." Advances in Neural Information Processing Systems. 2014.
Eye Scream Project http://soumith.ch/eyescream/
A Neural Algorithm of Artistic Style [Gatys+'15]
-
https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/
http://rll.berkeley.edu/deeplearningrobotics/
-
PubChem
55
MPI
MPI
TSUBAME 824GPU(K40) MPI
-
Neural Network
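[Figure: network with inputs x1..xN, hidden units h1..hH and k1..kM, outputs y1..yM, and targets t1..tM; layers f1, f2, f3 with parameters W1/b1 and W2/b2; Forward and Backward passes]
W1: layer 1 weights, b1: layer 1 bias, W2: layer 2 weights, b2: layer 2 bias
Forward:
$h = f_1(x) = \mathrm{Sigmoid}(W_1 x + b_1)$
$k = f_2(h) = \mathrm{Sigmoid}(W_2 h + b_2)$
$y = f_3(k) = \mathrm{SoftMax}(k)$, where $f_{3,i}(k) = \exp(k_i) / \sum_j \exp(k_j)$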
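The forward equations on this slide can be sketched directly in NumPy. This is a minimal illustration, not Chainer code: the layer sizes (N = 4, H = 3, M = 2), the random initialization, and the helper names `sigmoid`, `softmax`, and `forward` are assumptions made for the example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(k):
    # Subtract the maximum before exponentiating for numerical stability.
    e = np.exp(k - k.max())
    return e / e.sum()


def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1.dot(x) + b1)   # h = f1(x)
    k = sigmoid(W2.dot(h) + b2)   # k = f2(h)
    y = softmax(k)                # y = f3(k)
    return y


# Hypothetical sizes: N = 4 inputs, H = 3 hidden units, M = 2 outputs.
rng = np.random.RandomState(0)
N, H, M = 4, 3, 2
W1, b1 = rng.randn(H, N), np.zeros(H)
W2, b2 = rng.randn(M, H), np.zeros(M)
x = rng.randn(N)
y = forward(x, W1, b1, W2, b2)
print(y, y.sum())
```

Because the last layer is a softmax, the entries of `y` are positive and sum to 1, so they can be read as class probabilities.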
-
Deep Learning framework components
                          Caffe     Chainer
n-dimensional array       Blob      Variable
Layer                     Layer     Function
Network (DAG)             Net       (FunctionSet)
Optimization              Solver    Optimizer
-
Forward Propagation
Forward(Loss)
Loss
Forward
-
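Chain Rule
Forward Propagation: y = f(x; θ)
θ: parameters of the Layer
L: loss
Chain rule: $\partial L / \partial x = (\partial y / \partial x)^\top \, \partial L / \partial y$, $\partial L / \partial \theta = (\partial y / \partial \theta)^\top \, \partial L / \partial y$
x, y: input and output of the Layer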
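As a worked instance of the chain rule, the sketch below back-propagates through a single layer and checks the result numerically. The affine layer y = Wx + b, the squared-error loss, and all function names are assumptions chosen for illustration; the slide itself only states the general form y = f(x; θ).

```python
import numpy as np


def layer_forward(x, W, b):
    return W.dot(x) + b


def loss_fn(y, t):
    return 0.5 * np.sum((y - t) ** 2)


def layer_backward(x, W, gy):
    # Chain rule: given gy = dL/dy, return dL/dx, dL/dW, dL/db.
    gx = W.T.dot(gy)
    gW = np.outer(gy, x)
    gb = gy.copy()
    return gx, gW, gb


rng = np.random.RandomState(0)
x, t = rng.randn(3), rng.randn(2)
W, b = rng.randn(2, 3), rng.randn(2)

y = layer_forward(x, W, b)
gy = y - t                      # dL/dy for the squared-error loss
gx, gW, gb = layer_backward(x, W, gy)

# Finite-difference check of dL/dx.
eps = 1e-6
gx_num = np.zeros_like(x)
for i in range(x.size):
    d = np.zeros_like(x)
    d[i] = eps
    gx_num[i] = (loss_fn(layer_forward(x + d, W, b), t)
                 - loss_fn(layer_forward(x - d, W, b), t)) / (2 * eps)
print(np.allclose(gx, gx_num, atol=1e-5))
```

The backward function only needs the incoming gradient dL/dy and the values seen in the forward pass, which is why each layer can implement its own backward step independently.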
-
Backward Propagation
Backward
Backward
-
Backward
Optimizers: SGD / Momentum / AdaGrad / ADADELTA / RMSprop / Adam, etc.
http://imgur.com/a/Hqolp
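Two of the update rules listed above, plain SGD and Momentum SGD, can be sketched as follows. The toy objective L(w) = 0.5 * ||w||^2, the learning rate, and the momentum coefficient are assumptions for illustration and are not taken from the slides.

```python
import numpy as np


def sgd_update(param, grad, lr=0.1):
    # Plain SGD: step against the gradient.
    return param - lr * grad


def momentum_update(param, grad, velocity, lr=0.1, momentum=0.9):
    # Momentum SGD: accumulate a velocity, then move along it.
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity


# Toy objective L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w_sgd = np.array([1.0, -2.0])
w_mom = w_sgd.copy()
v = np.zeros_like(w_mom)
for _ in range(100):
    w_sgd = sgd_update(w_sgd, w_sgd)
    w_mom, v = momentum_update(w_mom, w_mom, v)
print(np.linalg.norm(w_sgd), np.linalg.norm(w_mom))
```

Both runs drive the parameters toward the minimum at zero; the other optimizers in the list differ in how they scale or accumulate the gradient before applying it.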
-
OS: Linux (Ubuntu 14.04)
MacOS, Windows
Python (CPython): 2.7+ / 3.4+
NumPy 1.9+, Six 1.9+
CUDA: CUDA 6.5+
pip install chainer
-
GitHub Stars (2015/5)
Theano
PyLearn2
https://twitter.com/fchollet/status/635891305084796929
-
GitHub Stars (2015/8)
https://twitter.com/fchollet/status/635891305084796929
PyLearn2
Theano
-
2
Chainer
Python
CuPy: GPU version of NumPy (NumPy-compatible)
[Figure: software stack: CPU and GPU; BLAS, CUDA Toolkit, cuDNN; NumPy and CuPy; Chainer; Python]
NumPyPythonPythonNumPy
-
GoogLeNet, NTM, Recursive Net, LSTM
GoogLeNet: Chainer 167, Caffe 2058
(2012) AlexNet*: 7 layers
(2014) GoogLeNet**: 22 layers
* ImageNet Classification with Deep Convolutional Neural Networks http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
** Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
Chainer: Define-by-Run
-
Define-and-Run
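Graph definition: prototxt, yaml, Lua, etc.
Frameworks: Caffe/Torch/Theano
[Figure: the graph of f and g is defined first; the data x is then fed through f and g]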
-
Define-by-Run
for
x yf
x = chainer.Variable(...)
y = f(x)
z = g(x)
zg
=
Chainer
-
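Forward
import numpy as np
import chainer

x = chainer.Variable(np.array(1))
y = chainer.Variable(np.array(1))
z = x**2 + 2*x*y + y   # the graph is built while this line runs
z.backward()
[Figure: computational graph built during Forward: x (via Split) and y feed the nodes _ ** 2, 2 * _, _ * _, and two _ + _ nodes, producing z. Each node is either a chainer.Variable or a chainer.Function.]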
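To make the idea of recording the graph during the forward computation concrete, here is a toy scalar autograd sketch in pure Python. `ToyVariable` and `as_variable` are hypothetical names invented for this example; this is not how Chainer is implemented, only a minimal illustration of the same principle applied to the expression from the slide.

```python
class ToyVariable(object):
    """Toy scalar variable that remembers how it was computed."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        # Each parent is (input_variable, local_gradient d(self)/d(input)).
        self.parents = parents

    def __add__(self, other):
        other = as_variable(other)
        return ToyVariable(self.data + other.data,
                           ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = as_variable(other)
        return ToyVariable(self.data * other.data,
                           ((self, other.data), (other, self.data)))

    __rmul__ = __mul__

    def __pow__(self, n):
        return ToyVariable(self.data ** n,
                           ((self, n * self.data ** (n - 1)),))

    def backward(self):
        # Visit nodes in reverse topological order so that each node's
        # gradient is complete before it is pushed to its inputs.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in v.parents:
                p.grad += local * v.grad


def as_variable(v):
    return v if isinstance(v, ToyVariable) else ToyVariable(v)


x = ToyVariable(1)
y = ToyVariable(1)
z = x ** 2 + 2 * x * y + y   # the graph is recorded while this line runs
z.backward()
print(z.data, x.grad, y.grad)
```

The analytic gradients are dz/dx = 2x + 2y = 4 and dz/dy = 2x + 1 = 3 at x = y = 1, which is what the recorded graph yields. No separate graph definition step exists: ordinary Python expressions build the graph as a side effect.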
-
MNIST
from chainer import FunctionSet, Variable, optimizers
from chainer.cuda import to_gpu
import chainer.functions as F

# (1) Model definition
model = FunctionSet(
    l1=F.Linear(784, 100),
    l2=F.Linear(100, 100),
    l3=F.Linear(100, 10)).to_gpu()
opt = optimizers.SGD()
opt.setup(model)

# (2) Forward computation
def forward(x, t):
    h1 = F.relu(model.l1(x))
    h2 = F.relu(model.l2(h1))
    y = model.l3(h2)
    return F.softmax_cross_entropy(y, t)

# (3) Training loop
for epoch in xrange(n_epoch):
    for i in xrange(0, N, batchsize):
        x = Variable(to_gpu(...))
        t = Variable(to_gpu(...))
        opt.zero_grads()
        loss = forward(x, t)
        loss.backward()
        opt.update()
[Figure: network with layer sizes 784 -> 100 -> 100 -> 10; example output probabilities 0: 2%, 1: 5%, 2: 90%, ..., 9: 1%]
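The structure of the training loop (minibatch slicing, forward, backward, update) can also be tried without Chainer or a GPU. The sketch below is an assumption-laden stand-in: it uses a single linear layer instead of the three-layer model, synthetic data instead of MNIST, and a hand-written `softmax_cross_entropy` that returns both the loss and its gradient.

```python
import numpy as np


def softmax_cross_entropy(y, t):
    # y: (batch, classes) scores, t: (batch,) integer labels.
    y = y - y.max(axis=1, keepdims=True)
    log_p = y - np.log(np.exp(y).sum(axis=1, keepdims=True))
    loss = -log_p[np.arange(len(t)), t].mean()
    grad = np.exp(log_p)
    grad[np.arange(len(t)), t] -= 1.0
    return loss, grad / len(t)


rng = np.random.RandomState(0)
N, n_in, n_out, batchsize, n_epoch, lr = 200, 20, 3, 50, 20, 0.5
data = rng.randn(N, n_in)
labels = data[:, :n_out].argmax(axis=1)   # synthetic, learnable labels
W, b = np.zeros((n_in, n_out)), np.zeros(n_out)

losses = []
for epoch in range(n_epoch):
    perm = rng.permutation(N)
    for i in range(0, N, batchsize):
        idx = perm[i:i + batchsize]
        x, t = data[idx], labels[idx]
        loss, gy = softmax_cross_entropy(x.dot(W) + b, t)  # forward
        W -= lr * x.T.dot(gy)                              # backward + SGD update
        b -= lr * gy.sum(axis=0)
        losses.append(loss)
print(losses[0], losses[-1])
```

With zero-initialized weights the first loss equals log(3), the loss of a uniform guess over three classes, and it decreases as the updates proceed. In the Chainer version, `loss.backward()` and `opt.update()` take over the two hand-written gradient lines.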
-
FunctionSet
# Model definition
model = FunctionSet(
    l1=F.Linear(784, 100),
    l2=F.Linear(100, 100),
    l3=F.Linear(100, 10)).to_gpu()
FunctionSet: a container that bundles Functions
-
Optimizer
opt = optimizers.SGD()
opt.setup(model)
# Training loop
for epoch in xrange(n_epoch):
    for i in xrange(0, N, batchsize):
        x = Variable(to_gpu(...))
        t = Variable(to_gpu(...))
        opt.zero_grads()
        loss = forward(x, t)
        loss.backward()
        opt.update()
Optimizer: updates the parameters using the computed gradients
-
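Python control flow (if / for / while, etc.) can be used
For loops: RNN
def forward(x, t, train=True):
    h = F.relu(model.l1(x))
    y = model.l2(h)
    if train:
        loss = F.softmax_cross_entropy(y, t)
        return loss
    else:
        prob = F.softmax(y)
        acc = F.accuracy(y, t)
        return acc
[Figure: training: y -> softmax cross entropy (sce) -> loss; testing: y -> softmax (sm) -> prob, and y -> acc]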
-
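unchain_backward: cuts off the graph before y, so that errors do not propagate past y
Useful for implementing truncated BPTT
[Figure: x -> f -> y -> g -> z becomes y -> g -> z after y.unchain_backward()]
x = Variable(...)
y = f(x)
z = g(y)
y.unchain_backward()
BPTT: Back Propagation Through Time, the training method for RNNs. Truncated BPTT limits how far back BPTT propagates.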
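The effect of cutting the graph can be illustrated with a toy chain in pure Python. The `Node` class, the recurrence h_t = w * h_{t-1} + x_t, and the truncation interval of 5 steps are all assumptions for this sketch; only the method name `unchain_backward` mirrors the Chainer call shown above.

```python
class Node(object):
    """Toy scalar node in a linear chain h_t = w * h_{t-1} + x_t."""

    def __init__(self, data, prev=None, local_grad=0.0):
        self.data = data
        self.prev = prev              # previous node in the chain, or None
        self.local_grad = local_grad  # d(self) / d(prev)
        self.grad = 0.0

    def unchain_backward(self):
        # Forget the history before this node; backward stops here.
        self.prev = None

    def backward(self):
        # Walk the chain backwards; return how many steps were traversed.
        self.grad = 1.0
        node, steps = self, 0
        while node.prev is not None:
            node.prev.grad = node.grad * node.local_grad
            node = node.prev
            steps += 1
        return steps


def step(h, x, w=0.5):
    return Node(w * h.data + x, prev=h, local_grad=w)


h = Node(0.0)
first = h
for t in range(10):
    h = step(h, 1.0)
    if (t + 1) % 5 == 0 and t + 1 < 10:
        # Truncate: gradients will not flow past this point.
        h.unchain_backward()
        boundary = h

steps = h.backward()
print(steps, boundary.grad, first.grad)
```

Backward traverses only the 5 steps after the cut: the node at the boundary still receives a gradient, while the initial state before the cut receives none. Cutting the history also lets the earlier part of the graph be freed, which keeps memory bounded on long sequences.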
-
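Caffe Reference Model
Reference models from the Caffe Model Zoo (e.g. the BVLC Reference Model) can be used as a Chainer function
func = CaffeFunction('path/to/bvlc_reference_caffenet.caffemodel')
x = Variable(...)
y, = func(inputs={'data': x}, outputs=['fc8'])
* Caffe: a deep learning framework written in C++. Model Zoo: trained models published on the Caffe Wiki.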
-
CuPy: GPU version of NumPy
cupy.ndarray
numpy.ndarray
etc.
Elementwise, Reduction
CPU/GPU
def softmax(x):
    xp = get_array_module(x)   # xp = numpy/cupy
    y = x - x.max(axis=1, keepdims=True)
    y = xp.exp(y)
    return y / y.sum(axis=1, keepdims=True)
x can be either a numpy or a cupy array
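The softmax above can be exercised on the CPU even when CuPy is not installed. In the sketch below, the local `get_array_module` is a stand-in written for this example: it delegates to `cupy.get_array_module` when CuPy can be imported and otherwise falls back to NumPy. The sample input is likewise an assumption.

```python
import numpy

try:
    import cupy  # optional; only needed for GPU arrays
except ImportError:
    cupy = None


def get_array_module(x):
    # Stand-in for the slide's get_array_module: pick cupy for GPU
    # arrays when CuPy is installed, otherwise fall back to numpy.
    if cupy is not None:
        return cupy.get_array_module(x)
    return numpy


def softmax(x):
    xp = get_array_module(x)
    y = x - x.max(axis=1, keepdims=True)
    y = xp.exp(y)
    return y / y.sum(axis=1, keepdims=True)


x_cpu = numpy.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
p = softmax(x_cpu)
print(p.sum(axis=1))
```

Each row of `p` sums to 1, and the all-zero row maps to a uniform distribution. Passing a `cupy.ndarray` instead would run the same function body on the GPU, since only `xp` changes.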
-
Chainer: Define-by-Run, Python
Python
CuPy: CPU/GPU
HP: http://chainer.org
GitHub: https://github.com/pfnet/chainer
Twitter: @ChainerOfficial
Google Group: Chainer User Group
Contribution Guide: http://docs.chainer.org/en/stable/contribution.html
git clone https://github.com/pfnet/chainer.git
Your contributions are welcome!!
-
Mocha: Julia
Chiyuan Zhang (MIT)
v0.0.9 (2015/7/21)
MIT Expat License
train LeNet with MNIST
https://github.com/pluskid/Mocha.jl#hello-world
Caffe
Caffe
Caffe
Pure Julia / C++ / GPU