An Introduction to ResNet and Its Derivative Research

2016-06-04

Masataka Nishimori

Purpose

● What is ResNet?
● What kinds of derivative research on ResNet exist?
● What I noticed when implementing it in TensorFlow

What is ResNet?

● Overview
  - Short for Deep Residual Network [1]
  - Developed by MSRA; the winning algorithm of ImageNet 2015
  - Introducing residuals reduces the performance degradation seen in very deep networks
  - Very deep at 152 layers on ImageNet (previous networks had around 20 layers)

[1]. He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).
Source: He, Kaiming, et al. "Identity mappings in deep residual networks." arXiv preprint arXiv:1603.05027 (2016).

How deep is it?

Source: Deep Residual Learning MSRA @ ILSVRC & COCO 2015 competitions

- Nearly 7 times as many layers as the 2014 winning algorithm.
- Networks with more than 1000 layers are also proposed in the paper.

http://kaiminghe.com/ilsvrc15/ilsvrc2015_deep_residual_learning_kaiminghe.pdf

Is deeper always better?

● At the very least, deeper appears to be better than wider. [1]

[1]. Eldan, Ronen, and Ohad Shamir. "The Power of Depth for Feedforward Neural Networks." arXiv preprint arXiv:1512.03965 (2015).

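What happens if you simply stack more layers...

Source: He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).

● With conventional networks, performance gets worse
● Example on CIFAR 10 (left: conventional, right: ResNet)
● With conventional networks, error increases as layers are added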

Why is it hard to go deep?

● Vanishing gradients
  ○ Cause
    ■ During backpropagation, small weights are multiplied together many times [1]
  ○ Mitigations
    ■ Careful Initialization [2]
    ■ Hidden Layer Supervision [3]
    ■ Batch Normalization [4]
    ■ ResNet's Identity Mapping (described later)

[1]. Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).
[2]. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics. (2010) 249–256
[3]. Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. arXiv preprint arXiv:1409.5185 (2014)
[4]. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
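The cause listed above (small weights multiplied many times during backpropagation) can be illustrated with a toy chain of scalar layers. This is only a sketch: the weight value 0.5 and the function name `gradient_through_chain` are assumptions chosen for illustration, not values from the cited papers.

```python
# Toy illustration of vanishing gradients in a chain of scalar linear layers.
# Assumption: every layer multiplies its input by the same weight.

def gradient_through_chain(weight: float, depth: int) -> float:
    """Return d(output)/d(input) for y = w * w * ... * w * x (depth factors)."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight  # chain rule: each layer contributes one factor w
    return grad

# The gradient shrinks exponentially with depth when |w| < 1.
for depth in (5, 20, 110):
    print(depth, gradient_through_chain(0.5, depth))
```

With a weight below 1 the gradient reaching the first layer shrinks exponentially in the depth, which is why the early layers of a plain deep network barely learn.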

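Why is it hard to go deep?

● Degradation of feature information
  ○ Cause
    ■ In the feed-forward pass, randomly initialized weights wash out the features, so they do not reach the later layers [1]
  ○ Mitigations
    ■ ResNet's Identity Mapping (described later)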

[1]. Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).

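Why is it hard to go deep?

● Training takes a long time
  ○ Cause
    ■ Computation time grows with the number of layers.
    ■ Even ResNet spends several weeks training for ImageNet [1].
    ■ On a single TITAN X, CIFAR10 takes about 2 hours for 20 layers and about half a day for 110 layers
  ○ Mitigations
    ■ Money and time (ResNet) [2]
    ■ Use Dropout to vary the number of layers stochastically [1] (described later)

[1]. Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).
[2]. He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).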

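What is ResNet's Identity Mapping?

Conventional network

Create a shortcut path and add in the information from a layer several layers back. The part that performs this addition is called the Identity Mapping.

Source: Deep Residual Learning MSRA @ ILSVRC & COCO 2015 competitions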

http://kaiminghe.com/ilsvrc15/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
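The shortcut-and-add structure can be sketched in a few lines. This is a minimal illustration, assuming two weight layers written as plain matrix-vector products with no Batch Normalization; the names `linear` and `residual_block` are hypothetical and not taken from the slides or the repository.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]

def linear(weights: Matrix, v: Vector) -> Vector:
    """Matrix-vector product, standing in for a weight layer."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    out = relu(linear(w1, x))                # first weight layer + ReLU
    out = linear(w2, out)                    # second weight layer: F(x)
    out = [o + xi for o, xi in zip(out, x)]  # shortcut: F(x) + x
    return relu(out)

# With all-zero weights, F(x) = 0 and only the shortcut contributes.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zeros, zeros))  # [1.0, 2.0]
```

When the weight layers output zero, the block simply passes its (non-negative) input through, so an extra block can never do worse than the identity.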

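Why does this solve the problem?

When training is going well:
● If x is already optimal, the weight layers can go to 0 and the shortcut alone is sufficient.
● If x is near optimal, the weights only need to be updated slightly.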

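Why does this solve the problem?

● Adding in the layer from two layers back prevents feature information from being lost during the feed-forward pass.

● The formulation also makes gradients less likely to vanish during backpropagation.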
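The backpropagation point can be made concrete with the formulation used in the "Identity mappings in deep residual networks" paper cited in these slides. The following is a sketch, assuming the shortcut is a pure identity and no activation is applied after the addition; the symbols x_l (input to block l), F (the weight layers), W_l (their weights) and E (the loss) are notation introduced here.

```latex
x_{l+1} = x_l + F(x_l, W_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)

\frac{\partial E}{\partial x_l}
= \frac{\partial E}{\partial x_L}
  \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right)
```

The constant term 1 carries the gradient from a deep block L straight back to block l, so that part of the signal is not attenuated by repeated multiplication with weights.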

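Experiments on CIFAR 10

Left: conventional method, right: ResNet. Bold lines: test error, dashed lines: validation error

Experiments with ResNet on CIFAR 10 also show that accuracy improves as the number of layers increases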

Still, a number of questions remain

● Model structure
  ○ Is that structure really the best? [1,2,3]
● Optimization method
  ○ Is SGD+Momentum the best? [3]
● Training time
  ○ Can it be reduced somehow? [4]

As a result, a large body of derivative research has appeared.

[1]. He, Kaiming, et al. "Identity mappings in deep residual networks." arXiv preprint arXiv:1603.05027 (2016).
[2]. Szegedy, Christian, Sergey Ioffe, and Vincent Vanhoucke. "Inception-v4, inception-resnet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016).
[3]. Training and investigating Residual Nets, http://torch.ch/blog/2016/02/04/resnets.html
[4]. Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).

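Derivative research: model structure

● Follow-up experiments by the authors of ResNet.
● Evaluates performance depending on where BN (Batch Norm) and ReLU are placed
  ○ Reports that applying BN and ReLU before the convolution performs best

Source: He, Kaiming, et al. "Identity mappings in deep residual networks." arXiv preprint arXiv:1603.05027 (2016).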
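The difference between the two orderings can be sketched as follows. This is a toy illustration only: the "convolution" is replaced by a single scalar weight applied to a batch of scalars, Batch Norm has no learned scale or shift, and the names `original_unit` and `pre_activation_unit` are hypothetical.

```python
import math
from typing import List

def batch_norm(batch: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize a batch of scalars to zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((b - mean) ** 2 for b in batch) / len(batch)
    return [(b - mean) / math.sqrt(var + eps) for b in batch]

def relu(batch: List[float]) -> List[float]:
    return [max(0.0, b) for b in batch]

def conv(batch: List[float], w: float) -> List[float]:
    """Stand-in for a convolution: multiply by one scalar weight."""
    return [w * b for b in batch]

def original_unit(x: List[float], w1: float, w2: float) -> List[float]:
    # conv -> BN -> ReLU -> conv -> BN -> add -> ReLU
    out = relu(batch_norm(conv(x, w1)))
    out = batch_norm(conv(out, w2))
    return relu([o + xi for o, xi in zip(out, x)])

def pre_activation_unit(x: List[float], w1: float, w2: float) -> List[float]:
    # BN -> ReLU -> conv -> BN -> ReLU -> conv -> add (nothing after the add)
    out = conv(relu(batch_norm(x)), w1)
    out = conv(relu(batch_norm(out)), w2)
    return [o + xi for o, xi in zip(out, x)]

x = [-1.0, 2.0]
print(original_unit(x, 0.0, 0.0))        # shortcut passes through a final ReLU
print(pre_activation_unit(x, 0.0, 0.0))  # shortcut is left untouched
```

In the pre-activation ordering nothing is applied after the addition, so the shortcut path stays a pure identity from block to block.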

Derivative research: model structure

A report that the final ReLU is not needed in the first place.
Note: NSize=18 means 110 layers; BN: Batch Norm

Source: Training and investigating Residual Nets


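Experiment: model structure

● Applied a 32-layer network to CIFAR 10
● The structure from the original paper was best
● With more layers, placing both BN and ReLU before the convolution may be better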

Derivative research: model structure

● A paper from Google
● Describes modifications made to surpass ResNet's accuracy on ImageNet classification
● Top-5 error
  ○ ResNet: 3.57%
  ○ This paper: 3.08%

[1]. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Do it like this

then this turns into this

...

and there you have it!

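Findings

● Once the number of filters exceeds 1000, training becomes unstable, so it helps to multiply the Inception part by a factor of 0.1 to 0.3 before adding it to the shortcut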
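The scaling trick amounts to one multiplication before the addition. A minimal sketch, assuming the Inception branch output is already computed and passed in as a list; the function name `scaled_residual_add` and the default scale of 0.2 are illustrative choices within the 0.1 to 0.3 range, and the activation applied after the addition is omitted.

```python
from typing import List

def scaled_residual_add(x: List[float], residual: List[float],
                        scale: float = 0.2) -> List[float]:
    """Add a scaled-down residual branch to the shortcut input x."""
    return [xi + scale * ri for xi, ri in zip(x, residual)]

# A large residual is damped before it reaches the shortcut path.
print(scaled_residual_add([1.0, 2.0], [10.0, -10.0], scale=0.1))  # [2.0, 1.0]
```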

Derivative research: changing the optimization method

● Applied a 110-layer ResNet to CIFAR 10
● The setting from the paper was best

Source: Training and investigating Residual Nets


Experiment: changing the optimization method

In my own experiments as well, the setting from the paper was best (32-layer ResNet applied to CIFAR 10)
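For reference, the SGD+Momentum update that these comparisons favor can be sketched as below. This is a generic textbook form of the update, not code from the repository; the function name `sgd_momentum_step`, the default learning rate 0.1 and momentum 0.9, and the toy objective f(w) = w² are assumptions for illustration.

```python
from typing import Tuple

def sgd_momentum_step(param: float, grad: float, velocity: float,
                      lr: float = 0.1, momentum: float = 0.9) -> Tuple[float, float]:
    """One SGD+Momentum update: accumulate velocity, then move the parameter."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

# Minimize f(w) = w^2, whose gradient is 2w.
w, v = 1.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2.0 * w, v)
print(abs(w) < 1e-2)  # True
```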

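Derivative research: reducing training time

● Reduces training time by stochastically keeping only the shortcut (dropping the rest of the block).

● Also improves accuracy over the conventional ResNet

Source: Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).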
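The idea of stochastically keeping only the shortcut can be sketched as follows. This is a simplified illustration, assuming a residual branch given as a plain Python function and omitting the activation after the addition; the names `stochastic_depth_block`, `residual_fn` and `survival_prob` are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]

def stochastic_depth_block(x: Vector,
                           residual_fn: Callable[[Vector], Vector],
                           survival_prob: float,
                           training: bool,
                           rng: random.Random) -> Vector:
    if training:
        if rng.random() < survival_prob:
            # Block survives: ordinary residual computation.
            return [xi + ri for xi, ri in zip(x, residual_fn(x))]
        # Block dropped: shortcut only, residual_fn is never evaluated,
        # which is where the training-time saving comes from.
        return list(x)
    # Inference: keep every block, weighting the residual by its survival probability.
    return [xi + survival_prob * ri for xi, ri in zip(x, residual_fn(x))]

double = lambda v: [2.0 * a for a in v]
rng = random.Random(0)
print(stochastic_depth_block([1.0, 2.0], double, 0.5, training=False, rng=rng))  # [2.0, 4.0]
```

During training, each dropped block costs neither a forward nor a backward pass, so the expected depth, and with it the training time, goes down.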

What I noticed while implementing

● Be careful with weight initialization.
  ○ Casually initializing from a Gaussian with standard deviation 0.01 does not work.
  ○ Initialize with std = √(2/(k*k*c)) (k = kernel size, c = number of channels) [1]
● Do not add a bias in the convolution layers.
● Do not assume that Adam is always good enough.
● For Global Average Pooling, see [2]

[1]. He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[2]. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
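The initialization formula above can be written out directly. A minimal sketch using only the standard library; the helper names `he_std` and `he_init` and the seed value are assumptions for illustration, and in a real network the samples would be reshaped into the convolution kernel.

```python
import math
import random
from typing import List

def he_std(kernel_size: int, channels: int) -> float:
    """std = sqrt(2 / (k * k * c)), as given in the slide."""
    return math.sqrt(2.0 / (kernel_size * kernel_size * channels))

def he_init(kernel_size: int, channels: int, count: int, seed: int = 0) -> List[float]:
    """Draw `count` weights from a zero-mean Gaussian with the std above."""
    rng = random.Random(seed)
    std = he_std(kernel_size, channels)
    return [rng.gauss(0.0, std) for _ in range(count)]

# For a 3x3 kernel with 16 channels the std is about 0.118,
# more than ten times larger than a fixed 0.01.
print(he_std(3, 16))
weights = he_init(3, 16, count=3 * 3 * 16)
print(len(weights))  # 144
```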

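Conclusions

● ResNet
  ○ Residuals make stable training possible even with 100 or more layers
● Derivative research
  ○ Model structure
    ■ BN+ReLU before the convolution looks promising
  ○ Optimization method
    ■ SGD+Momentum is currently the best
  ○ Reducing training time
    ■ Use Dropout.
● Repository
  ○ https://github.com/namakemono/cifar10-tensorflow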


References

[1]. He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).
    The ResNet paper
[2]. Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).
    The Batch Norm paper
[3]. He, Kaiming, et al. "Identity mappings in deep residual networks." arXiv preprint arXiv:1603.05027 (2016).
    A study of ResNet's model structure
[4]. He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.
    Describes the weight initialization method used for ResNet

References

[5]. Training and investigating Residual Nets
    Performance comparison of ResNet under changes to the model and the optimization method
[6]. CS231n Convolutional Neural Networks for Visual Recognition, https://cs231n.github.io/neural-networks-3/#update
    Discussion of the effect of changing the learning rate
[7]. Eldan, Ronen, and Ohad Shamir. "The Power of Depth for Feedforward Neural Networks." arXiv preprint arXiv:1512.03965 (2015).
    Explains why deeper performs better than wider
[8]. Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).
    Reduces training time by introducing Dropout
