Chung, Suk won / AI & HPC Category Manager
PART 1. Before We Begin:
Review: What is Deep Learning?
What is Machine Learning?
*Total: 42 Pages / 3 Parts
More complex function (function of functions)
[Figure: inputs s, be, ba, f pass through chained functions f1, f2, f3]
F: (s,be,ba,f) -> price
price = F(s,be,ba,f) = f3(f2(f1(s,be,ba,f)))
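To make the composition concrete, here is a minimal Python sketch of F = f3(f2(f1(·))). The slide does not say what f1, f2 and f3 compute, so each one below is an assumed, hypothetical layer (a weighted sum followed by a ReLU) with made-up weights; only the chaining structure reflects the slide.

```python
from typing import Callable, List

Vector = List[float]

def relu(v: Vector) -> Vector:
    # Element-wise max(0, x)
    return [max(0.0, x) for x in v]

def make_layer(weights: List[Vector], bias: Vector) -> Callable[[Vector], Vector]:
    # Hypothetical layer: y = relu(W x + b)
    def layer(x: Vector) -> Vector:
        return relu([sum(w * xi for w, xi in zip(row, x)) + b
                     for row, b in zip(weights, bias)])
    return layer

def compose(*fs: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    # compose(f1, f2, f3)(x) == f3(f2(f1(x)))
    def composed(x: Vector) -> Vector:
        for f in fs:
            x = f(x)
        return x
    return composed

# Made-up (hypothetical) weights, for illustration only
f1 = make_layer([[0.01, 0.5, 0.3, 0.1], [0.02, -0.2, 0.4, 0.0]], [0.0, 0.1])
f2 = make_layer([[1.0, 0.5], [0.3, 2.0]], [0.0, 0.0])
f3 = make_layer([[100000.0, 50000.0]], [0.0])  # single output: price

F = compose(f1, f2, f3)
price = F([100, 2, 2, 9])[0]  # s=100, be=2, ba=2, f=9
print(price)
```

In a neural network, the weights of every layer are learned from data rather than chosen by hand as they are here.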
Rule-based AI, traditional ML and DL
Artificial Intelligence: Rule-based AI | Machine Learning
if (s==100 and be==2 and ba==2 and f==9)
then price = $1000000;
else if (…) then …
else if (…) then …
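A direct Python transcription of the hand-written rule above. Only the first rule comes from the slide; the remaining branches are elided there and stay elided here, and returning `None` when no rule matches is an assumption of this sketch. Every case the system can handle has to be anticipated by a person.

```python
from typing import Optional

def rule_based_price(s: float, be: int, ba: int, f: int) -> Optional[float]:
    # Hand-written rule taken from the slide
    if s == 100 and be == 2 and ba == 2 and f == 9:
        return 1_000_000.0
    # Further hand-written "else if" rules would go here;
    # they are elided ("…") in the original slide.
    return None  # assumption: no matching rule means no answer

print(rule_based_price(100, 2, 2, 9))
print(rule_based_price(120, 3, 2, 1))
```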
Machine Learning: Traditional ML | Deep Learning
List of features:
- ‘s’ : surface
- ‘be’ : # bedrooms
- ‘ba’ : # bathrooms
- ‘f’ : # floors
Define a ‘model’:
F: (s,be,ba,f) -> price
price = F(s,be,ba,f) = w1*s + w2*be + w3*ba + w4*f
Train the model: find the best values of w1, w2, w3, w4
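“Training” here means choosing w1 to w4 so that predicted prices match known sale prices. A minimal sketch, assuming a squared-error loss and solving the least-squares normal equations (XᵀX)w = Xᵀy with plain Gaussian elimination. The six houses and the “true” weights are synthetic values invented for this example, so a correct fit should recover them.

```python
from typing import List

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares weights w minimizing sum((X w - y)^2) via the normal equations."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic training data: (s, be, ba, f) per house; prices are generated
# from made-up "true" weights so we can check that training recovers them.
true_w = [3000.0, 10000.0, 5000.0, 2000.0]
X = [[100, 2, 2, 9], [80, 1, 1, 3], [150, 4, 3, 12], [60, 1, 1, 1],
     [120, 3, 2, 5], [200, 5, 4, 20]]
y = [sum(w * x for w, x in zip(true_w, row)) for row in X]

w = fit_linear(X, y)
print([round(wi, 2) for wi in w])  # should be close to true_w
```

In practice a library solver, or gradient descent on large datasets, would replace the hand-written elimination; the point is that the weights come from data rather than from hand-written rules.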
Tasks that are complex for humans, but easily formalized through rules
Traditional machine learning: requires feature engineering
[Figure: Training: training data → feature engineering → machine learning algorithm → learned model (prediction function). Prediction: data → feature extraction → learned model → prediction.]
Deep learning ⊂ artificial neural networks ⊂ machine learning ⊂ artificial intelligence
Deep learning
[Figure: Training: training data → deep learning algorithm → learned model (transformation and prediction function). Prediction (inference): data → learned model → prediction.]
Deep learning ⊂ artificial neural networks ⊂ machine learning ⊂ artificial intelligence
Efficient data representations, no more feature engineering
Deep Learning
Training: learning a new capability from existing data
Untrained neural network model + training dataset (images labeled “dog” and “cat”) → trained model
Inference: applying this capability to new data
Trained model (new capability, optimized for performance) + new data (“?”) → app or service featuring the capability (“cat”)
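The training/inference split can be shown with a deliberately tiny classifier. This sketch swaps in a nearest-centroid classifier instead of a neural network to stay short, and the two numeric features per image are hypothetical stand-ins for whatever a real model would extract from pixels.

```python
from math import dist
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # two hypothetical features per image

def train(dataset: List[Tuple[Sample, str]]) -> Dict[str, Sample]:
    """Training: learn one centroid per label from labeled examples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for (x0, x1), label in dataset:
        acc = sums.setdefault(label, [0.0, 0.0])
        acc[0] += x0
        acc[1] += x1
        counts[label] = counts.get(label, 0) + 1
    return {label: (acc[0] / counts[label], acc[1] / counts[label])
            for label, acc in sums.items()}

def infer(model: Dict[str, Sample], x: Sample) -> str:
    """Inference: apply the trained model to new, unlabeled data."""
    return min(model, key=lambda label: dist(model[label], x))

training_dataset = [((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"),
                    ((0.2, 0.9), "dog"), ((0.3, 0.8), "dog")]
model = train(training_dataset)    # done once, offline
print(infer(model, (0.85, 0.25)))  # a new "?" image
```

Training runs once over the labeled dataset; inference then reuses the fixed model on each new input, which is why the two phases are often deployed on different hardware.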
Summary: AI algorithms
Artificial Intelligence
1. Top-down (deductive): laws/rules, handcrafted
2. Bottom-up (inductive)
Input Ranking
Machine learning and statistics
- Unsupervised (unlabeled data): clustering
- Supervised (labeled data): classification (predict a category), regression (predict a point)
- Reinforcement
Algorithm families:
o Neural networks: deep learning, etc.
o Decision trees: random forest, etc.
o Bayesian: naïve Bayes, etc.
o K-methods: k-means, etc.
o Regression: linear, etc.
o etc.
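As one concrete unsupervised method from this list, k-means groups unlabeled points into k clusters. A minimal sketch of Lloyd's algorithm in pure Python; initializing with the first k points and capping the number of iterations are simplifying assumptions of this sketch (real implementations typically use k-means++ initialization).

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], k: int, iters: int = 100) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = list(points[:k])  # naive init: first k points
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest centroid
            i = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[i].append(p)
        new = [  # update step: mean of each cluster (keep old centroid if empty)
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids

data = [(1.0, 1.0), (10.0, 10.0), (1.5, 2.0), (9.0, 11.0), (0.5, 1.5), (11.0, 9.5)]
print(sorted(kmeans(data, k=2)))
```

No labels are used anywhere: the two groups emerge purely from distances between points, which is what separates clustering from the supervised methods above.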
PART 2. AI Business Trends
AI Business Trends: What is your tool?
Where can we utilize it? What can we expect in the future?
Where would the road take us?
Advances in artificial intelligence will transform modern life by reshaping transportation, health, science, finance, and the military.
“High-level machine intelligence” (HLMI) is achieved when unaided machines can accomplish every task better and more cheaply than human workers.
Grace et al., “When Will AI Exceed Human Performance? Evidence from AI Experts”
Years by which surveyed AI experts expect machines to outperform humans:
Driving a truck: 2027
Retail: 2031
Surgeon: 2043
Writing a bestseller: 2049
Math research: 2060
Full automation of labor: 2140
AI Business Growth Forecast
HPE’s View: What is the Market Size for AI?
IDC forecasts spending on AI-focused hardware, software, and services to reach $58bn by 2021
HPE’s View: AI Global IT Spend by Industry: Who’s got the money?
Sample AI Use Cases Across Different Industry Verticals
Banking & Securities: 19%
Government: 17%
Manufacturing & Natural Res.: 17%
Comms, Media & Services: 16%
Retail: 7%
Insurance: 7%
Utilities: 5%
Healthcare Providers: 4%
Transportation: 4%
Education: 2%
Wholesale Trade: 2%
AI-Based Image/Video Analytics
[Chart: installed security cameras worldwide, 2016 vs. 2020 (scale 0M to 1,000M)]
1 billion security cameras installed worldwide (2020)
30 billion frames to analyze per day
Conventional video analytics is unreliable in real-world conditions
[Chart: image classification accuracy, 2010 to 2016, comparing humans, hand-coded computer vision (CV), and deep learning]
AI-powered intelligent video analytics has surpassed human recognition ability
GPU-Accelerated Video Analytics Performance
Max. cameras per system with NVIDIA P4/T4 (720p, 15 fps, H.264):
System | Detection | Detection + Attribute | Intrusion | Line Crossing
HPE Edgeline EL1000 | 9 | 6 | 9 | 9
HPE Edgeline EL4000 | 36 | 24 | 36 | 36
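Quality inspection against a reference configuration, based on automated image analysis
Cable not connected
Battery missing
Cable connected in the wrong position
Pipe deformation inspection using augmented reality
Analysis of ultra-fine regions that cannot be distinguished by the naked eye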
2016 StackGAN (Generative Adversarial Network)
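Two AI models compete with each other: one generates new data, and the other judges whether that data is real or fake. Through this competition, the generator learns to produce synthetic results that look real.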
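The competition described above is commonly written as the minimax objective from the original GAN paper (Goodfellow et al., 2014). Here D(x) is the discriminator's estimated probability that x is real data, and G(z) is the generator's output for a random noise vector z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to maximize V (tell real data from generated data), while G is trained to minimize it (make D misclassify its outputs).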
PART 3. Systems and Solutions
Infrastructure: How can we make it?
Deep learning frameworks: WHAT are these?
Layers of the deep learning software stack, from top to bottom:
- UI, development tools: NVIDIA DIGITS (Caffe, Torch, TensorFlow)
- High-level APIs: Brew (Caffe2), TF Layers (TensorFlow), Gluon (MXNet), CNTK
- Deep learning and machine learning frameworks: TensorFlow, CNTK, Theano, MXNet
- Hardware-specific libraries for basic deep neural network operations (BLAS + FFT, convolutions, etc.): MIOpen
- Optimized linear algebra libraries, many supporting the BLAS interface, hardware-specific: cuBLAS, MKL, OpenBLAS, rocBLAS, MIOpenGEMM
- Accelerator-specific drivers and software: NVIDIA drivers, CUDA, ROCm
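Much of the work these libraries do reduces to dense matrix multiplication. The BLAS GEMM routine computes C ← αAB + βC; the naive pure-Python version below shows only those semantics (no transposes, no blocking). cuBLAS, MKL, OpenBLAS and rocBLAS implement the same operation with hardware-specific tiling, vectorization and threading, which is where their speed comes from. The small layer example is illustrative data, not taken from the slide.

```python
from typing import List

Matrix = List[List[float]]

def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Reference semantics of BLAS GEMM: returns alpha*A@B + beta*C (no transposes)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)] for i in range(m)]

# A fully connected layer's forward pass can be expressed as one GEMM:
# Y = X @ W + bias (bias passed in as C with beta = 1)
X = [[1.0, 2.0], [3.0, 4.0]]     # batch of 2 inputs, 2 features each
W = [[0.5, -1.0], [0.25, 1.0]]   # weights: 2 inputs -> 2 outputs
bias = [[0.1, 0.2], [0.1, 0.2]]  # bias repeated for each row of the batch
print(gemm(1.0, X, W, 1.0, bias))
```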
Most popular frameworks
Source: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
[Software names and affiliated companies appeared as logos]
Supported HW | Written in | Interface | Good for
x86, NVIDIA GPUs, AMD GPUs | C++, Python | Python, C++, Java, Go, Swift | All use cases
x86, NVIDIA GPUs | C++ | Python | Natural language processing
x86, NVIDIA GPUs | C++, Python | Python, C++, Scala, Julia, Perl, R | All use cases
x86, NVIDIA GPUs, AMD GPUs | C++ | Python, bash | Image processing
x86, NVIDIA GPUs | C++ | bash | Speech recognition
Functional & Applications View: An End-to-End Data Pipeline
Pipeline stages:
- “IoT”: edge processing of data in motion
- “Fast Data”: core processing of data in motion
- “Big Data”: analysis of data at rest
- “AI”: deep learning/machine learning
Components: Data Acquisition, Local Data Mgmt., Edge Infrastructure Management, Container Management, Distributed Data Flow Mgmt., Parallel Data Flow Management, “Data Lake”, NoSQL, HPC Storage, Parallel Analytic Framework, Analytic Services, Deep Learning, Models, Model Serving
Data Science Toolchains: Data Flow Design, Data Science Workbench, Model Management, Application Deployment
Business Systems
Services and Solutions
HPE enables AI from “intelligent edge to core data center”
Intelligent edge (inference) → core data center (deep learning)
Data: edge data, training data
Storage: cost-optimized storage, performance-optimized storage
Products and technologies shown: IoT devices, HPE Aruba, HPE Edgeline, HPE Networking, HPE Apollo 6500 Gen10, HPE Apollo, HPE ProLiant, HPE Synergy, NVIDIA® Tesla® GPU accelerators, InfiniBand, HPE System Management Software, HPE OneView, HPE DMF, HPE Pointnext, WekaIO, Qumulo, Scality, Ceph
HPE Organizational View: An End-to-End Data Pipeline
Same pipeline stages and components as the functional view, annotated with HPE organizations:
- Aruba
- HPE Storage (Enterprise Solutions and Performance)
- HPC & AI BU
- Pointnext
Examples of Artificial Neural Networks
[Figure: example networks, each with an input layer, one or two hidden layers, and an output layer]
Network type | Typical data
Convolutional | Images
Fully connected | Speech, text, sensor
Recurrent | Speech, text, sensor
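A convolutional layer slides a small kernel over the image and computes a weighted sum at each position. A minimal single-channel sketch in pure Python, assuming stride 1 and no padding; as in most deep learning frameworks, the “convolution” is strictly a cross-correlation (the kernel is not flipped). The kernel here is hand-picked to detect vertical edges, whereas a CNN learns its kernel values during training.

```python
from typing import List

Grid = List[List[float]]

def conv2d(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2-D convolution as used in CNN layers (cross-correlation,
    kernel not flipped), stride 1, no padding, single channel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[u][v] * image[i + u][j + v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)] for i in range(oh)]

# Hand-picked vertical-edge detector on a 4x4 image with a dark-to-bright edge
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response appears only in the middle column, where the edge is; reusing the same few kernel weights at every position is what keeps CNNs efficient on images.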
Commonly Used Deep Learning Models
Name | Type | Model size (# params) | Model size (MB) | GFLOPs (forward pass)
AlexNet | CNN | 60,965,224 | 233 MB | 0.7
GoogleNet | CNN | 6,998,552 | 27 MB | 1.6
VGG-16 | CNN | 138,357,544 | 528 MB | 15.5
VGG-19 | CNN | 143,667,240 | 548 MB | 19.6
ResNet50 | CNN | 25,610,269 | 98 MB | 3.9
ResNet101 | CNN | 44,654,608 | 170 MB | 7.6
ResNet152 | CNN | 60,344,387 | 230 MB | 11.3
Eng Acoustic Model | RNN | 34,678,784 | 132 MB | 0.035
TextCNN | CNN | 151,690 | 0.6 MB | 0.009
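The “Model size (MB)” column is consistent with storing each parameter as a 32-bit float (4 bytes) and counting 1 MB as 2^20 bytes; that reading is inferred from the numbers rather than stated on the slide. The sketch below reproduces several table entries from the parameter counts.

```python
def fp32_model_size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of a model's weights in MiB (default: FP32, 4 bytes each)."""
    return num_params * bytes_per_param / 2**20

models = {"AlexNet": 60_965_224, "VGG-16": 138_357_544, "ResNet50": 25_610_269}
for name, n in models.items():
    print(f"{name}: {fp32_model_size_mib(n):.0f} MB")
# AlexNet: 233 MB, VGG-16: 528 MB, ResNet50: 98 MB (matches the table)
```

The same function with `bytes_per_param=2` gives the weight size for FP16 storage. Training needs considerably more memory than this figure, because activations, gradients and optimizer state are stored as well.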
Recommendations by Application
Infrastructure: x86, GPUs, FPGAs, TPU ?, …
Frameworks: TensorFlow, Caffe 2, CNTK, Torch, …
Typical layers: CNN, fully connected, RNN
Data type: speech, images, sensor data, video
Data: small, moderate, large
Verticals: manufacturing, oil & gas, autonomous driving, speech, social media
[Diagram annotation: “Neural Network sits here”]
HPE Deep Learning Cookbook
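Provides benchmark test data
Application guide for deep learning workloads
– Information on 8 HPE hardware configurations for 11 workloads based on 8 deep learning frameworks
Open-sourcing of benchmarking and architecture tools
– Deep learning benchmarking tool to be published on GitHub
– Deep learning performance analysis tool
– Reference architecture information published on hpe.com
– Performance estimates per workload, providing a basis for optimal system sizing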
Benchmarking Suite Architecture
[Diagram: benchmarking suite, framework launchers, benchmarks, and frameworks]
- Benchmarks specification and default parameters: which benchmarks to run
- Benchmarking suite: configures benchmarks, runs one at a time
- Launchers (TensorFlow, Caffe, Caffe2, TensorRT, MXNet, PyTorch): mediators between the experimenter and the frameworks
- Benchmarks (TF CNN Benchmark, NVCNN, NVCNN Horovod, Tensor2Tensor, Caffe, Caffe2, TensorRT, MXNet, and PyTorch benchmarks): run inference and training
- Frameworks (TensorFlow, Caffe, Caffe2, TensorRT, MXNet, PyTorch): standard or custom frameworks
ONNX logo was taken from onnx.ai
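The flow above (specification plus default parameters → suite → launcher → benchmark) can be sketched as follows. Every name here (`run_suite`, `DEFAULT_PARAMS`, `dummy_launcher`, the config keys) is hypothetical and does not reflect the actual HPE Deep Learning Cookbook code; the dummy launcher stands in for a real framework and only times placeholder work.

```python
import time
from typing import Callable, Dict, List

# Hypothetical launcher signature: takes a merged config, returns samples/sec.
Launcher = Callable[[Dict[str, object]], float]

# Hypothetical default parameters, overridden per benchmark by the spec
DEFAULT_PARAMS: Dict[str, object] = {"batch_size": 32, "phase": "inference", "num_batches": 10}

def run_suite(specs: List[Dict[str, object]],
              launchers: Dict[str, Launcher]) -> List[Dict[str, object]]:
    """Run each benchmark spec one at a time, in order, via its framework launcher."""
    results = []
    for spec in specs:
        config = {**DEFAULT_PARAMS, **spec}  # spec values override defaults
        launcher = launchers[str(config["framework"])]
        results.append({**config, "throughput": launcher(config)})
    return results

def dummy_launcher(config: Dict[str, object]) -> float:
    """Hypothetical stand-in for a framework launcher: times placeholder
    work and reports samples per second."""
    batch_size = int(config["batch_size"])
    num_batches = int(config["num_batches"])
    start = time.perf_counter()
    for _ in range(num_batches):
        sum(i * i for i in range(batch_size * 100))  # placeholder compute
    elapsed = time.perf_counter() - start
    return batch_size * num_batches / elapsed

specs = [{"framework": "tensorflow", "model": "resnet50", "batch_size": 64},
         {"framework": "pytorch", "model": "alexnet", "phase": "training"}]
results = run_suite(specs, {"tensorflow": dummy_launcher, "pytorch": dummy_launcher})
for r in results:
    print(r["framework"], r["model"], r["phase"], f'{r["throughput"]:.0f} samples/s')
```

Keeping framework-specific logic inside launchers lets the suite treat every framework the same way, which is the role the slide assigns to the “mediators”.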
HPE Apollo 6500 XL270d server: 8-GPU server dedicated to deep learning training
HPE ProLiant XL270d Gen10, SXM2 NVLink type
HPE ProLiant XL270d Gen10, PCIe type
Red Hat/CentOS/SUSE/Ubuntu/Windows
Framework/library installation support
PCIe 4:1 and 8:1 topologies configurable in firmware
GPU interconnect architecture
NVLink-based connection | PCIe-based connection
Choice of accelerator topologies to suit your specific workloads
[Diagrams: three topologies, each connecting GPUs 1 to 8 to CPU 1 and CPU 2 through switches (SW)]
- NVLink 2.0: enhanced performance with a hybrid cube-mesh accelerator topology using NVLink 2.0 for deep learning/AI and HPC applications
- PCIe 4:1: traditional PCIe topology for most HPC applications, as they do not rely heavily on GPU-to-GPU communication
- PCIe 8:1: suits select HPC and deep learning training workloads, for the easiest and most efficient GPUDirect-enabled code
DEEP LEARNING COMPUTE REQUIREMENTS (IMAGE EXAMPLE)
Convolution layers (compute performance): many GPU cores
Fully connected layers (memory bandwidth): 3D-stacked memory, TSV high-bandwidth memory
Weight update of DNN weights (GPU-to-GPU data exchange performance): NVLink
Prediction error correction
Improved server-to-server communication performance via InfiniBand
Large neural network models (task parallelism) | Small neural network models (data parallelism)
[Diagram: batches split across Node A and Node B, each producing predictions (e.g., “Flower”, “House”) and errors]
Bandwidth improved with four 100 Gbps InfiniBand links
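In the data-parallel case, each node holds a full copy of the weights, computes gradients on its own slice of the batch, and the gradients are averaged across nodes before every weight update; that exchange is the traffic the interconnect carries. A minimal single-process simulation, assuming a one-weight linear model y = w·x with squared-error loss; the data, node count and learning rate are made up, and `allreduce_mean` is a hypothetical stand-in for a real collective such as an allreduce over InfiniBand.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs

def local_gradient(w: float, shard: Batch) -> float:
    """d/dw of the mean squared error (w*x - y)^2 over one node's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads: List[float]) -> float:
    """Hypothetical stand-in for an allreduce: average the gradients of all nodes."""
    return sum(grads) / len(grads)

def data_parallel_step(w: float, batch: Batch, num_nodes: int, lr: float) -> float:
    shard_size = len(batch) // num_nodes  # assumes equal-sized shards
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_nodes)]
    grads = [local_gradient(w, s) for s in shards]  # would run in parallel on real nodes
    return w - lr * allreduce_mean(grads)           # identical update applied on every node

batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # data generated with w = 3
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_nodes=2, lr=0.01)
print(round(w, 4))
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the result matches single-node training; the cost is one gradient exchange per step, which is why interconnect bandwidth and RDMA matter for training at scale.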
Deep Learning (Training) Workloads Require a High-Speed Fabric: RDMA Is Key, Bandwidth Is Key
50% Better Performance & Linear Scaling with RDMA
Data Source: Courtesy of Mellanox Benchmark Test
6.5X Faster Training with 100GbE than 10GbE
(Higher is better in both charts)
HPE Confidential – Customer NDA required
HPE InfiniBand EDR Product Family
HPE EDR IB/EN 100Gb Adapters
‒ Mellanox CX-4 ASIC
‒ 1 or 2 ports
‒ Supports EDR IB and 100GbE
Mellanox EDR 36p Switches (Mellanox IB EDR 36-port switch)
‒ 36 QSFP28 ports
‒ Managed and unmanaged models
Mellanox EDR Modular Switches
‒ 216-port (12U+1U shelf)
‒ 324-port (16U+1U shelf)
‒ 648-port (28U+1U shelf)
EDR 100Gb for HPE Apollo 6000
‒ Built into the A6000 chassis (24 downlink, 12 uplink ports)
Single, Dual, and Performance Dual Port EDR for HPE ICE-XA
HPE SGI 8600 Premium EDR IB Switch (18 downlink, 3 crosslink, 36 uplink ports)
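“Make AI Work”: AI for IT | AI for the business | AI for the future
HPE InfoSight: 24x7 resource monitoring; predicts failures and responds proactively
HPE Aruba IntroSpect: detects attacks at the intelligent edge
HPE Pointnext: consulting workshops, development support, consumption-based IT services
Dedicated AI infrastructure: GPU compute servers, high-performance storage/networking
Validated, integrated AI solutions
HPE Deep Learning Cookbook
AI infrastructure for the future: advanced computing technologies from Hewlett Packard Labs (Dot Product Engine, optical computing, etc.)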