[DL Reading Group] DL Hacks paper reading

DL Hacks paper reading

2016/11/25 黒滝紘生

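Purpose

- Can the structure of a network be decided automatically, at least to some extent?

- Introduces four ICLR 2017 papers along with related work

- Categories

  - Hyperparameter estimation ("HyperBand", "Neural Architecture Search with RL")

  - Generation by a meta-network ("HyperNetworks")

  - Layer skipping, roughly the ResNet family ("DCNN Design Pattern")

  - Other (pruning, adding units, etc.)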


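Contents

- Hyperparameter estimation

- Generation by a meta-network

- Layer skipping (ResNet family)

- Other (pruning, etc.)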


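Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

- https://arxiv.org/abs/1603.06560 , https://openreview.net/forum?id=ry18Ww5ee , ICLR 2017 under review (the shorter OpenReview version is easier to read)

- The task is hyperparameter tuning for networks trained on SVHN and CIFAR-10.

- Formulated as a bandit problem: limited resources (number of data points, number of batches, etc.) are allocated across hyperparameter combinations.

- The prior method, "Successive Halving", could not adjust the trade-off between allocating broadly and shallowly vs. narrowly and deeply.

- By additionally grid-searching over the hyper-hyperparameter of Successive Halving, Hyperband obtains results that match or exceed the state-of-the-art method (SMAC_early).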
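
The allocation idea above can be sketched roughly as follows. This is a simplified illustration, not the paper's exact algorithm: the bracket sizes are set to `eta**s` for brevity, and `toy_loss`, `sample_config`, and `evaluate` are hypothetical stand-ins for training a real network under a given budget.

```python
import random

def toy_loss(config, budget):
    # Hypothetical objective: the best "hyperparameter" is 0.3, and a
    # larger budget lowers the loss (stands in for training longer).
    return (config - 0.3) ** 2 + 1.0 / budget

def successive_halving(configs, evaluate, min_budget, eta=3):
    """Keep the best 1/eta of the configs each round; survivors get eta times more budget."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))  # lower is better
        survivors = ranked[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]

def hyperband_like(sample_config, evaluate, max_budget=81, eta=3):
    """Run successive halving under several broad-vs-deep trade-offs (grid over s)."""
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:
        s_max += 1
    best, best_loss = None, float("inf")
    for s in range(s_max, -1, -1):
        n = eta ** s                   # large s: many configs ...
        min_budget = max_budget // n   # ... each starting with a small budget
        configs = [sample_config() for _ in range(n)]
        winner = successive_halving(configs, evaluate, min_budget, eta)
        loss = evaluate(winner, max_budget)
        if loss < best_loss:
            best, best_loss = winner, loss
    return best, best_loss
```

Calling `hyperband_like(random.random, toy_loss)` tries brackets ranging from 81 configurations at budget 1 down to a single configuration at budget 81, which is the "hyper-hyperparameter" being searched.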


Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves

- IJCAI 2015

- https://pdfs.semanticscholar.org/044f/0b1d5d0b421abbc7569ba4cc4bf859fd9801.pdf

- The paper that proposed SMAC_early, the baseline for Hyperband on the previous slide

- At the time of this paper, there were three approaches to hyperparameter search:

  - Spearmint, based on Bayesian optimization

  - SMAC, based on random forests

  - Tree Parzen Estimator (TPE), based on density estimation

- This paper adds early stopping that mimics a human expert to SMAC and TPE, and achieves good performance.


Neural Network Architecture Optimization through Submodularity and Supermodularity

- http://arxiv.org/abs/1609.00074 , Sep 2016

- State of the art among optimization methods based on Bayesian optimization


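Neural Architecture Search with Reinforcement Learning

- Google Brain, ICLR 2017 under review

- https://arxiv.org/abs/1611.01578

- Uses reinforcement learning and an RNN to decide the architecture one item at a time (illustrated in the slide's figure)

- Generated networks for CIFAR-10 and Penn Treebank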


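Online Adaptation of Deep Architectures with Reinforcement Learning

- ECAI 2016

- https://arxiv.org/abs/1608.02292

- Learns the structure of a denoising autoencoder with reinforcement learning

- (The slide's image is from the baseline paper. Its merge and increment operations are treated as actions for RL.)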


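HyperNetworks

- https://arxiv.org/abs/1609.09106 , Sep 2016, ICLR 2017 under review

- http://blog.otoro.net/2016/09/28/hyper-networks/

- RNNs had the constraint that the weights do not change from one time step to the next.

- This is resolved by having a small LSTM output the weights of the main LSTM at every time step.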
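
A minimal sketch of this idea, under simplifying assumptions: plain tanh RNN cells are used instead of LSTMs, and the small network only rescales the rows of a shared recurrent matrix rather than emitting every weight. The function name, sizes, and initialization are all hypothetical.

```python
import numpy as np

def hyper_rnn(xs, n_main=8, n_hyper=3, seed=0):
    """Run a main RNN whose recurrent weights are modulated per time step by a small RNN."""
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    W = rng.normal(0, 0.3, (n_main, n_main))            # shared base recurrent weights
    U = rng.normal(0, 0.3, (n_main, n_in))              # main input weights
    Wz = rng.normal(0, 0.3, (n_hyper, n_hyper))         # small (hyper) RNN weights
    Uz = rng.normal(0, 0.3, (n_hyper, n_in + n_main))
    P = rng.normal(0, 0.3, (n_main, n_hyper))           # projects hyper state to row scales
    h, z = np.zeros(n_main), np.zeros(n_hyper)
    hs, scales = [], []
    for x in xs:
        # The small RNN sees the current input and the main hidden state.
        z = np.tanh(Wz @ z + Uz @ np.concatenate([x, h]))
        d = 1.0 + P @ z                                  # one scale per row of W
        W_t = d[:, None] * W                             # time-dependent recurrent weights
        h = np.tanh(W_t @ h + U @ x)
        hs.append(h)
        scales.append(d)
    return np.array(hs), np.array(scales)
```

Because `d` depends on the hyper state `z`, the effective matrix `W_t` differs at each step, which is the relaxation of weight sharing described above.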


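HyperNetworks

- Text generation

- Generating the weights of a large ResNet

- Handwriting generation (a 2D Gaussian mixture distribution is generated by the HyperNetwork)

- Can be used as an ordinary RNNCell in TensorFlow.

- The idea of generating network weights from another network comes from HyperNEAT (described later).

- State of the art on Character-Level Penn Treebank and Hutter Prize Wikipedia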


Evolving Neural Networks through Augmenting Topologies

- Evolutionary Computation 2002 Vol.10-2

- http://dx.doi.org/10.1162/106365602320169811

- Uses a genetic algorithm, plus some extensions, to vary the branching between input nodes and output nodes.


A Hypercube-based Encoding for Evolving Large-scale Neural Networks

- Artificial Life 2009 Vol.15-2

- http://www.mitpressjournals.org/doi/abs/10.1162/artl.2009.15.2.15202#.WDd3JKJ95TY

- Given (start point, end point) of an edge as input, the meta-network outputs (the weight of that edge).

- A small network (CPPN) can represent a wide variety of main-network structures.
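
The (start, end) to weight mapping can be illustrated with a toy example. This is only a sketch: the two-layer function below, its sine and Gaussian activations, and the parameter names `A`, `B`, `c` are assumptions for illustration, and the evolutionary search that HyperNEAT uses to find such a network is omitted.

```python
import numpy as np

def cppn_weight(src, dst, params):
    """Hypothetical tiny CPPN: maps edge endpoints (x1, y1, x2, y2) to one weight."""
    v = np.concatenate([src, dst])
    hidden = np.sin(params["A"] @ v)                  # periodic activation
    hidden = np.exp(-(params["B"] @ hidden) ** 2)     # Gaussian activation
    return float(np.tanh(params["c"] @ hidden))      # weight in [-1, 1]

def build_weight_matrix(src_coords, dst_coords, params):
    """Query the CPPN once per edge to fill the main network's weight matrix."""
    return np.array([[cppn_weight(s, d, params) for s in src_coords]
                     for d in dst_coords])

rng = np.random.default_rng(0)
params = {"A": rng.normal(size=(6, 4)),
          "B": rng.normal(size=(6, 6)),
          "c": rng.normal(size=6)}
# Nodes of the main network are placed on a 5x5 grid of coordinates.
grid = np.array([[x, y] for x in np.linspace(-1, 1, 5)
                        for y in np.linspace(-1, 1, 5)])
weights = build_weight_matrix(grid, grid, params)     # 25 x 25 weights from 66 parameters
```

The same `params` could be queried on a finer grid to produce a larger main network without adding parameters.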


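Convolution by Evolution

- http://mlanctot.info/files/papers/gecco16-dppn.pdf

- Google DeepMind

- Proposes the "DPPN", which makes the CPPN trainable by differentiation

- The structure is evolved, while the weight values are learned by backpropagation.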


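Other NEAT work

- Allocating more network branches to regions where pixels are dense

- Using it as preprocessing for a CNN

- Applying it to ATARI tasks

- Applying it to control tasks

- However, the computational cost of the genetic algorithm was a bottleneck.

- HyperNetworks resolve this by training everything with backpropagation and by targeting different applications.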


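Deep Convolutional Neural Network Design Patterns

- https://arxiv.org/abs/1611.00847 , ICLR 2017 under review

- A survey of papers from the last few years that modify CNN architectures

- In addition, it proposes design patterns for the ideas behind these architectural modifications.

- Design pattern: a frequently used technique that has been given a name so it is easier to talk about.

- Based on the design patterns, it proposes several new networks.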




Training Very Deep Networks

- https://arxiv.org/abs/1507.06228 , ICML 2015 DL Workshop -> NIPS 2015 Highlighted Paper

- The Highway Networks paper

- Compared with ResNet, the identity shortcut is controlled by a gate.
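
As a rough illustration, a single highway layer can be written as below. The choice of tanh for the transform and the parameter names are assumptions of this sketch.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, with a learned transform gate T."""
    h = np.tanh(W_h @ x + b_h)     # candidate transform H(x)
    t = sigmoid(W_t @ x + b_t)     # gate T(x), each element in (0, 1)
    return t * h + (1.0 - t) * x   # gate closed (t near 0): x passes through unchanged
```

With a strongly negative gate bias `b_t` the layer behaves like an identity shortcut, and with a strongly positive one it behaves like an ordinary layer.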


Deep Networks with Stochastic Depth

- https://arxiv.org/abs/1603.09382

- ResNet blocks are randomly dropped during training only. All blocks are used at test time.

- Dropout in the depth direction.
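
A minimal sketch of the train/test behaviour, assuming a toy residual branch (`residual_block` is a hypothetical single ReLU layer) and omitting the activation that normally follows the addition. At test time each branch is scaled by its survival probability.

```python
import numpy as np

def residual_block(x, W):
    # Residual branch f(x); the shortcut is added by the caller.
    return np.maximum(0.0, W @ x)

def forward(x, weights, survival_probs, training, rng=None):
    """Stack of residual blocks with stochastic depth."""
    for W, p in zip(weights, survival_probs):
        if training:
            if rng.random() < p:
                x = x + residual_block(x, W)   # block survives this pass
            # otherwise the block is skipped and only the identity remains
        else:
            x = x + p * residual_block(x, W)   # all blocks used, scaled by p
    return x
```

During training, `rng` would be something like `np.random.default_rng()`, so each forward pass sees a randomly shortened network.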


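Densely Connected Convolutional Networks

- https://arxiv.org/abs/1608.06993

- Same authors as ResNet with Stochastic Depth

- The output of a conv layer is fed not only to the next layer but also to the layers beyond it (a so-called concat layer).

- Later layers therefore grow wider, but:

  - 1. The growth is reset every 4 layers (Dense Block).

  - 2. The per-layer increase in width (Growth Rate) is kept small.

- These two measures keep the number of parameters from growing too much. This is thought to work because information from lower layers can be reused.

- State of the art on SVHN and CIFAR-{10,100}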
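
The channel growth inside one dense block can be sketched as follows. Convolutions are replaced by a random channel-mixing matrix purely for illustration; `dense_block` and its arguments are hypothetical names.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """x has shape (channels, positions). Each layer reads the concat of all earlier outputs."""
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)                # everything produced so far
        W = rng.normal(0, 0.1, (growth_rate, inp.shape[0]))   # stand-in for a conv layer
        features.append(np.maximum(0.0, W @ inp))             # adds growth_rate new channels
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 10))
out = dense_block(x, num_layers=4, growth_rate=12, rng=rng)   # 16 + 4 * 12 = 64 channels
```

Each layer only creates `growth_rate` new channels, so a small growth rate keeps the per-layer parameter count modest even though the input to later layers is wide.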


Resnet in Resnet: Generalizing Resnet Architectures

- ICLR 2016 Workshop

- http://arxiv.org/abs/1603.08029


Residual Networks are Exponential Ensembles of Relatively Shallow Networks

- NIPS 2016

- https://arxiv.org/abs/1605.06431

- Shows that the ResNet on the left of the slide's figure is in fact equivalent to the unrolled form on the right.
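
The unrolling can be written out for two residual blocks. Here $f_1$ and $f_2$ denote the residual branches and $x$ the input (notation chosen for this sketch):

```latex
\begin{aligned}
y_1 &= x + f_1(x) \\
y_2 &= y_1 + f_2(y_1) \\
    &= x + f_1(x) + f_2\bigl(x + f_1(x)\bigr)
\end{aligned}
```

Each block can either be passed through or skipped, so two blocks give $2^2 = 4$ paths from input to output, and $n$ blocks give $2^n$ paths, most of which are much shorter than $n$.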




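FractalNet: Ultra-Deep Neural Networks without Residuals

- http://arxiv.org/abs/1605.07648

- Unlike ResNet, which bypasses with an identity mapping, the bypass is formed by joining a single layer with a path that composes the same function (layer) twice. Starting from a small number of layers, the depth therefore doubles with each expansion.

- In addition, like ResNet with stochastic depth, it proposes "Drop-path", which stochastically drops layers so that a feed-forward route remains.

- Achieves roughly the same accuracy as VGG-16 and ResNet.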
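
The doubling of depth can be seen in a small recursive sketch. It assumes the join is an element-wise mean and reuses one `layer` callable everywhere, both simplifications of this example; Drop-path is not shown.

```python
def fractal(c, x, layer):
    """f_1 = layer; f_{c+1}(x) = join(f_c(f_c(x)), layer(x))."""
    if c == 1:
        return layer(x)
    deep = fractal(c - 1, fractal(c - 1, x, layer), layer)   # same block applied twice
    shallow = layer(x)                                       # single-layer bypass
    return (deep + shallow) / 2.0                            # join, taken here as a mean

def longest_path(c):
    """Number of layers on the deepest path after c - 1 expansions."""
    return 1 if c == 1 else 2 * longest_path(c - 1)
```

For example, `longest_path(4)` is 8 while the shortest path through `fractal(4, ...)` is still a single layer.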


Xception: Deep Learning with Depthwise Separable Convolutions

- https://arxiv.org/abs/1610.02357

- Generalizes and extends Inception


Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

- ICLR 2016 Best Paper Award

- https://arxiv.org/abs/1510.00149 , https://github.com/songhan/Deep-Compression-AlexNet

- Compresses a network by combining pruning, quantization, and Huffman coding




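Blockout: Dynamic Model Selection for Hierarchical Deep Networks

- https://arxiv.org/abs/1512.05246

- A generalization of Dropout and DropConnect.

- Interprets both as "stochastic assignment of connections to groups of nodes" (Dropout corresponds to one group, DropConnect to N groups).

- For a fixed number of groups, the "probability of connecting to the i-th group" is learned by backpropagation.

- Achieved good performance on CIFAR.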


Deconstructing the Ladder Network Architecture

- https://arxiv.org/abs/1511.06430 , ICML 2016; the last author is Y. Bengio

- Examines various open questions about the original Ladder paper, "Semi-Supervised Learning with Ladder Networks (Rasmus, 2015)", and improves the architecture.

- Resembles an autoencoder.




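Using Fast Weights to Attend to the Recent Past

- https://arxiv.org/abs/1610.06258 ; the second author is G. Hinton

- Performance improves by introducing "fast weights", which are updated at a speed between that of activations and that of ordinary weights.

- Fast weights are computed from the hidden state h(t) and can be viewed as a kind of attention. They also have a biological basis.

- Concretely, between h(t) and h(t+1) of the RNN, S transitions of intermediate hidden states h_s(t) (s=0..S) are considered (Eq. 2, Figure 1).

- In these transitions, the connection A (fast weights), which is based on h(t) h(t)^T, is mixed with the ordinary connection W (slow weights) (Eq. 1).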
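
The two updates can be sketched in a few lines. This is an illustration only: tanh is used as the nonlinearity, layer normalization is left out, and the decay `lam`, the fast learning rate `eta`, and all sizes are assumed values.

```python
import numpy as np

def fast_weights_rnn(xs, n_hidden=6, S=3, lam=0.95, eta=0.5, seed=0):
    """RNN with slow weights W, C and fast weights A built from recent hidden states."""
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    W = rng.normal(0, 0.3, (n_hidden, n_hidden))   # slow recurrent weights
    C = rng.normal(0, 0.3, (n_hidden, n_in))       # slow input weights
    h = np.zeros(n_hidden)
    A = np.zeros((n_hidden, n_hidden))             # fast weights
    for x in xs:
        # Fast-weight update: decay the old A and add the outer product h h^T.
        A = lam * A + eta * np.outer(h, h)
        boundary = W @ h + C @ x                   # slow contribution, fixed in the inner loop
        h_s = np.tanh(boundary)                    # h_0
        for _ in range(S):
            # Inner transitions mix the slow contribution with A applied to h_s.
            h_s = np.tanh(boundary + A @ h_s)
        h = h_s
    return h, A
```

Because `A` is a decayed sum of outer products of past hidden states, `A @ h_s` weights each past state by its similarity to `h_s`, which is why it can be read as attention over the recent past.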
