![Page 1: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/1.jpg)
Efficient CNN
Samsung Electronics
Jinwon Lee
![Page 2: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/2.jpg)
![Page 3: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/3.jpg)
MoneyBall
![Page 4: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/4.jpg)
XLNet (Yang et al., arXiv, 19 Jun 2019)
• Parameters: 340 million
• Training: 512 TPU v3 chips for 500K steps (2.5 days)
• Cost: 512 TPUs x 2.5 days (60 hours) x $8 per TPU-hour ≈ $245,000
Slide credit: 윤승현's presentation @ NVIDIA KR Conference 2019
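The cost figure above only works out if the $8 rate is charged per TPU per hour; the slide does not state the unit, so that is an assumption here. A minimal sketch of the arithmetic (the function name `training_cost` is illustrative):

```python
# Rough training-cost estimate for the XLNet figures on the slide.
# Assumption: the $8 rate is per TPU chip per hour (the unit is not given on the slide).
def training_cost(num_chips: int, days: float, usd_per_chip_hour: float) -> float:
    hours = days * 24  # wall-clock hours of training
    return num_chips * hours * usd_per_chip_hour

xlnet_cost = training_cost(num_chips=512, days=2.5, usd_per_chip_hour=8.0)
print(f"${xlnet_cost:,.0f}")
```

Under that assumption the product comes to $245,760, which the slide rounds down to $245,000.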
![Page 5: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/5.jpg)
Convolutional Neural Networks
• GPipe: 556 million parameters
• AmoebaNet: 450 K40 GPUs, 7 days of training
• NAS: 800 GPUs, 28 days of training
![Page 6: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/6.jpg)
Toward More Accurate Models
• The impressive results of current object category recognition systems are obtained on a limited number (20~1000) of basic-level object categories.
• On a large-scale dataset with 22,000 categories, performance drops to 30%.
Slide Credit : Prof. Sung Ju Hwang
![Page 7: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/7.jpg)
Efficient Models for On-device AI
![Page 8: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/8.jpg)
So today...
Convolutional Neural Networks,
and among them,
which models count as Efficient CNNs...
We will go through them,
focusing on the key ideas of each paper.
And a few bonus insights too...
![Page 9: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/9.jpg)
What we will not cover today
• AutoML (sadly)
• Pruning
• Knowledge Distillation
• Experimental results of the papers introduced today
![Page 10: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/10.jpg)
Papers we will look at today...
1. Going Deeper with Convolutions
2. Very Deep Convolutional Networks for Large-Scale Image Recognition
3. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and ...
4. SqueezeNext: Hardware-Aware Neural Network Design
5. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision ...
6. MobileNetV2: Inverted Residuals and Linear Bottlenecks
7. Searching for MobileNetV3
8. ShuffleNet: An Extremely Efficient Convolutional Neural Network for ...
9. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
10. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
11. CondenseNet: An Efficient DenseNet using Learned Group ...
12. All You Need is a Few Shifts: Designing Efficient Convolutional Neural ...
13. Fully Learnable Group Convolution for Acceleration of Deep Neural ...
![Page 11: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/11.jpg)
A quick plug – 12PR
![Page 12: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/12.jpg)
Convolution
• When H=E and W=F:
Number of parameters: R x S x C x M
Computation: R x S x C x H x W x M
Memory access cost: R x S x C x M + (H x W) x (C + M)
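The three formulas on this slide can be evaluated directly. The sketch below assumes the usual reading of the symbols, which the extracted text does not spell out: an R x S filter, C input channels, M output channels (filters), and an H x W feature map whose size is unchanged by the convolution (H=E, W=F). The names `ConvCost` and `conv_cost` are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ConvCost:
    params: int         # number of weights
    ops: int            # multiply-accumulate operations
    memory_access: int  # weights + input feature map + output feature map

def conv_cost(R: int, S: int, C: int, M: int, H: int, W: int) -> ConvCost:
    """Cost of a standard convolution, assuming the output is also H x W."""
    params = R * S * C * M
    ops = params * H * W                    # every weight is used once per output position
    memory_access = params + H * W * (C + M)
    return ConvCost(params, ops, memory_access)

# Hypothetical layer: 3 x 3 filters, 64 -> 128 channels, 56 x 56 feature map.
cost = conv_cost(R=3, S=3, C=64, M=128, H=56, W=56)
print(cost)
```

Note that the computation grows with the feature-map area while the parameter count does not, which is why early high-resolution layers tend to dominate the operation count.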
![Page 13: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/13.jpg)
1. Bottleneck Layer, Global Average Pooling
![Page 14: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/14.jpg)
2. Filter Factorization
![Page 15: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/15.jpg)
2. Filter Factorization
![Page 16: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/16.jpg)
3. Depthwise Separable Convolution
![Page 17: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/17.jpg)
3. Depthwise Separable Convolution
Depthwise convolution Pointwise convolution
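A depthwise separable convolution replaces one standard convolution with a depthwise step (one R x S filter per input channel) followed by a pointwise 1 x 1 step. The sketch below reuses the R, S, C, M, H, W notation from the Convolution slide and compares the two; the layer sizes and function names are made up for illustration.

```python
def standard_conv(R, S, C, M, H, W):
    """Return (parameters, multiply-accumulates) of a standard convolution."""
    params = R * S * C * M
    return params, params * H * W

def depthwise_separable_conv(R, S, C, M, H, W):
    """Return (parameters, multiply-accumulates) of depthwise + pointwise convolution."""
    dw_params = R * S * C   # depthwise: one R x S filter per input channel
    pw_params = C * M       # pointwise: 1 x 1 convolution that mixes channels
    params = dw_params + pw_params
    return params, params * H * W

# Hypothetical layer: 3 x 3 filters, 64 -> 128 channels, 56 x 56 feature map.
std_params, std_ops = standard_conv(3, 3, 64, 128, 56, 56)
sep_params, sep_ops = depthwise_separable_conv(3, 3, 64, 128, 56, 56)
ratio = sep_ops / std_ops   # equals 1/M + 1/(R*S)
print(std_params, sep_params, round(ratio, 4))
```

The ratio simplifies to 1/M + 1/(R x S), so with 3 x 3 filters the separable form needs roughly one eighth to one ninth of the computation.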
![Page 18: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/18.jpg)
MobileNetV3
![Page 19: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/19.jpg)
4. Group Convolution
![Page 20: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/20.jpg)
Grouped Convolution
Figure from https://blog.yani.io/filter-group-tutorial/
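As a rough illustration of why grouping helps: with G groups, each group convolves C/G input channels into M/G output channels, so parameters and computation both shrink by a factor of G. The sketch below uses the notation of the Convolution slide; `grouped_conv_cost` and the layer sizes are hypothetical.

```python
def grouped_conv_cost(R, S, C, M, H, W, G):
    """Return (parameters, multiply-accumulates) of a convolution with G groups."""
    if C % G or M % G:
        raise ValueError("C and M must both be divisible by G")
    # Each of the G groups is a small standard convolution on C/G -> M/G channels.
    params = G * (R * S * (C // G) * (M // G))
    ops = params * H * W
    return params, ops

# Hypothetical layer: 3 x 3 filters, 64 -> 128 channels, 56 x 56 feature map.
for groups in (1, 4, 64):
    print(groups, grouped_conv_cost(3, 3, 64, 128, 56, 56, groups))
```

With G=1 this is the standard convolution; with G equal to the number of input channels, each group sees a single input channel, which is the depthwise case.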
![Page 21: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/21.jpg)
ShuffleNet
![Page 22: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/22.jpg)
CondenseNet
![Page 23: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/23.jpg)
FLGC – CVPR 2019
![Page 24: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/24.jpg)
5. Shift
![Page 25: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/25.jpg)
ShiftNet
![Page 26: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/26.jpg)
All You Need is a Few Shifts – CVPR 2019
![Page 27: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/27.jpg)
6. Use Direct Metric
![Page 28: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/28.jpg)
ShuffleNet V2 – Guide 1
• Equal channel width minimizes memory access cost (MAC).
• Let $h$ and $w$ be the spatial size of the feature map, and $c_1$ and $c_2$ the numbers of input and output channels. The FLOPs of the 1 x 1 convolution are $B = hwc_1c_2$.
• The memory access cost (MAC), or the number of memory access operations, is $MAC = hw(c_1 + c_2) + c_1c_2$.
• MAC has a lower bound given by FLOPs. It reaches the lower bound when the numbers of input and output channels are equal.
$$MAC \geq 2\sqrt{hwB} + \frac{B}{hw}$$
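The bound follows from the AM-GM inequality, c1 + c2 ≥ 2√(c1·c2), applied to the MAC formula. A small numerical check, holding the FLOPs B fixed while varying the channel split, can make this concrete; the channel pairs and feature-map size below are arbitrary choices for illustration.

```python
import math

def mac_1x1(h: int, w: int, c1: int, c2: int) -> int:
    """Memory access cost of a 1 x 1 convolution: feature maps plus weights."""
    return h * w * (c1 + c2) + c1 * c2

def mac_lower_bound(h: int, w: int, flops: int) -> float:
    """Lower bound on MAC for a given FLOP count B."""
    return 2 * math.sqrt(h * w * flops) + flops / (h * w)

h = w = 56
# All pairs have the same product c1 * c2, so B = h * w * c1 * c2 is identical.
channel_pairs = [(128, 128), (64, 256), (32, 512), (16, 1024)]
results = []
for c1, c2 in channel_pairs:
    flops = h * w * c1 * c2
    results.append((c1, c2, mac_1x1(h, w, c1, c2), mac_lower_bound(h, w, flops)))

for c1, c2, mac, bound in results:
    print(f"c1={c1:5d} c2={c2:5d} MAC={mac:9d} bound={bound:11.1f}")
```

Only the balanced pair (128, 128) meets the bound; the more unbalanced the split, the further MAC rises above it at the same FLOP count.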
![Page 29: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/29.jpg)
ShuffleNet V2
1. Use "balanced" convolutions (equal channel width).
2. Be aware of the cost of using group convolution.
3. Reduce the degree of fragmentation.
4. Reduce element-wise operations.
![Page 30: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/30.jpg)
Most DNN Accelerators
![Page 31: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/31.jpg)
Energy is very Important
![Page 32: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/32.jpg)
Energy?!
![Page 33: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/33.jpg)
Key Insights
![Page 34: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/34.jpg)
Where is the Energy Consumed?
![Page 35: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/35.jpg)
EfficientNet
![Page 36: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/36.jpg)
EfficientNet
![Page 37: Efficient CNN - GitHub Pages · Samsung Electronics Jinwon Lee. MoneyBall. XLNet(Yang, arxiv 19 Jun 2019) •Parameters 340 million parameters •Training 512 TPU v3 chips for 500K](https://reader034.vdocuments.pub/reader034/viewer/2022042109/5e8a11f39e4075121b1cd066/html5/thumbnails/37.jpg)
Thank you