aws cloud 2018- aws의 새로운 통합 머신러닝 플랫폼 서비스, amazon sagemaker...
TRANSCRIPT
AWS의새로운통합머신러닝플랫폼서비스Amazon�SageMaker
김무현 / AWS�솔루션즈아키텍트
목차
• 머신러닝프로세스리뷰• Amazon�SageMaker�소개• Amazon�SageMaker�주요기능• 데모
모든개발자와데이터과학자들이사용할수있는서비스를제공하는것입니다.
ML on AWS: Our mission
F R A M E W O R K S A N D I N T E R F A C E SA W S D E E P L E A R N I N G A P I
Apache MXNet TensorFlowCaffe2 Torch KerasCNTK PyTorch GluonTheano
P L A T F O R M S E R V I C E S
V I S I O N
AWS DeepLens
L A N G U A G E
A P P L I C A T I O N S E R V I C E S
Amazon Rekognit ion Amazon Pol ly Amazon Lex
Amazon Rekognit ion Video Amazon Transcr ibe Amazon Trans late Amazon Comprehend
Alexa for Bus iness
V R / I R Amazon Sumer ianAmazon Kines isVideo Streams
AWS�머신러닝스택
Amazon SageMaker
머신러닝프로세스를함께볼까요
Data Visualization &Analysis
Business Problem –
Data Collection
Data Integration
Data Preparation &Cleaning
Feature Engineering
Model Training &Parameter Tuning
Model Evaluation
Are Business Goals met?
Model Deployment
Monitoring & Debugging
– Predictions
YesNo
Dat
a A
ugm
enta
tion
Feat
ure
Aug
men
tati
on
Re-trainingML problem framing
머신러닝프로세스
Data Visualization &Analysis
Business Problem –
Data Collection
Data Integration
Data Preparation &Cleaning
Feature Engineering
Model Training &Parameter Tuning
Model Evaluation
Are Business Goals met?
Model Deployment
Monitoring & Debugging
– Predictions
YesNo
Dat
a A
ugm
enta
tion
Feat
ure
Aug
men
tati
on
Re-trainingML problem framing
탐색: 분석
•정확한질문을만들는데도움을주는것• Domain Knowledge
Data Visualization &Analysis
Business Problem –
Data Collection
Data Integration
Data Preparation &Cleaning
Feature Engineering
Model Training &Parameter Tuning
Model Evaluation
Are Business Goals met?
Model Deployment
Monitoring & Debugging
– Predictions
YesNo
Dat
a A
ugm
enta
tion
Feat
ure
Aug
men
tati
on
Re-trainingML problem framing
통합: 데이터아키텍처
•데이터플렛폼구현• Amazon S3• AWS Glue• Amazon Athena• Amazon EMR• Amazon Redshift Spectrum
Data Visualization &Analysis
Business Problem –
Data Collection
Data Integration
Data Preparation &Cleaning
Feature Engineering
Model Training &Parameter Tuning
Model Evaluation
Are Business Goals met?
Model Deployment
Monitoring & Debugging
– Predictions
YesNo
Dat
a A
ugm
enta
tion
Feat
ure
Aug
men
tati
on
Re-trainingML problem framing
왜 SageMaker를 만들었나 학습환경제공
• 노트북환경구성및관리
• 학습클러스터구성및관리
• 데이터커넥터작성
• 대용량데이터를수용할수있는확장성있는 ML 알고리즘
• 여러노드를사용할수있는분산 ML 학습알고리즘
• 모델결과물에대한보안
Data Visualization &Analysis
Business Problem –
Data Collection
Data Integration
Data Preparation &Cleaning
Feature Engineering
Model Training &Parameter Tuning
Model Evaluation
Are Business Goals met?
Model Deployment
Monitoring & Debugging
– Predictions
YesNo
Dat
a A
ugm
enta
tion
Feat
ure
Aug
men
tati
on
Re-trainingML problem framing
왜 SageMaker를 만들었나 배포환경제공
• 인퍼런스클러스터구성및운영
• 인퍼런스 API를확장성있게구성하고운영하기
• 모델예측결과에대한모니터링과디버깅
• 모델버저닝과성능추적
• 새로운모델을운영환경에A/B 테스팅형태로배포
데이터과학자와개발자들이스마트어플리케이션에사용될머신러닝기반의모델을빠르고쉽게만들도록해주는
완전관리형서비스
Amazon SageMakerNEW!
Amazon SageMaker구축
사전에빌드된
노트북인스턴스
고도로최적화된
머신러닝알고리즘들
한번클릭으로
ML, DL, 커스텀알고리즘학습
구축사전에빌드된
노트북인스턴스
하이퍼파라메터최적화를통한손쉬운학습
학습
Amazon SageMaker
고도로최적화된
머신러닝알고리즘들
엔지니어링노력이필요없는배포
확장성이있는관전관리형모델호스팅
구축사전에빌드된
노트북인스턴스
배포 학습
Amazon SageMaker
고도로최적화된
머신러닝알고리즘들
한번클릭으로
ML, DL, 커스텀알고리즘학습
하이퍼파라메터최적화를통한손쉬운학습
End-to-End 머신러닝플랫폼
제로셋업 유연한모델학습 초당과금
$
머신러닝모델을확장성이있도록빌드,학습,배포
Amazon�SageMaker
Amazon ECR
Model Training (on EC2)
Amazon SageMaker
Client application
Training code
학습코드및학습데이터세트
준비
Amazon ECR
Model Training (on EC2)
Trai
ning
dat
a
Training code Helper code
Client application
Training code
Amazon SageMaker
학습환경구성및
학습수행
Amazon ECR
Model Training (on EC2)
Trai
ning
dat
a
Mod
el a
rtifa
cts
Training code Helper code
Client application
Inference code
Training code
Amazon SageMaker
학습완료된모델저장및
예측코드준비
Amazon ECR
Model Training (on EC2)
Model Hosting (on EC2)
Trai
ning
dat
a
Mod
el a
rtifa
cts
Training code Helper code
Helper codeInference code
Client application
Inference code
Training code
Amazon SageMaker
예측환경구성및
모델호스팅
Amazon ECR
Model Training (on EC2)
Model Hosting (on EC2)
Trai
ning
dat
a
Mod
el a
rtifa
cts
Training code Helper code
Helper codeInference code
Client application
Inference code
Training code
Inference requestInference response
Inference Endpoint
Amazon SageMaker
예측엔드포인트를통한 API 서비
스제공
Amazon ECR
Model Training (on EC2)
Model Hosting (on EC2)
Trai
ning
dat
a
Mod
el a
rtifa
cts
Training code Helper code
Helper codeInference code
Gro
und
Trut
h
Client application
Inference code
Training code
Inference requestInference response
Inference Endpoint
Amazon SageMaker
새로운학습데이터수집및재학습, 배포 …
Intuit에서말하는 SageMaker의장점들
노트북환경을 Ad-hoc 하게구성하고관리해야했음
모델배포환경선택의제약
팀간의컴퓨트자원경쟁
SageMaker 노트북에서쉬운데이터탐색가능
가상화를활용한유연한배포환경구성
Auto-scale이지원되는모델호스팅환경
SageMaker 사용전 SageMaker 사용후
Model Hosting (SageMaker)
SageMaker를이용한실시간사기탐지기능
Calculate Features
Reader
Cleanser
Processor
Data
Lookup
Training
Feature StoreModel Training (SageMaker)
Model
Client ServiceAmazon EM
R
Amazon SageMaker
Amazon SageMaker
Amazon SageMaker
1 2 3 4
I I I INotebook 인스턴스 ML 알고리즘들 ML 학습서비스 ML 호스팅서비스
1
INotebook 인스턴스
데이터탐색을통한분석을바로수행
노트북작성
AWS 데이터베이스서비스에 ELT 접근
S3 데이터레이크접근
• 추전/개인화
• 이상거래탐지
• 예측
• 이미지분류
• 이탈예측
• 마케팅이메일, 캠패인타케팅
• 로그프로세스및이상탐지
• 음성을텍스트로변환
• …
“Just add data”
Zero�Setup
예제 - Jupyter 노트북만들기
SageMaker가없다면 …
1. AWS Deep Learning AMI 선택2. EC2 인스턴스생성3. Jupuyter 노트북서비스구동4. SSH 터널링설정5. Jupyter 노트북접속6. 문제생기면, 3번부터반복
SageMaker를이용하면 …
1. Jupyter 인스턴스생성요청2. AWS 콘솔에서 Jupyter 노트북열기
IAM Role
데모
2
IAlgorithms Training code
• Matrix Factorization• Regression• Principal Component Analysis• K-Means Clustering• Gradient Boosted Trees• Time-series Prediction• Image Classification• 더많은알고리즘추가예정
Amazon provided Algorithms
Bring Your Own Script (SageMaker builds the Container)
SageMaker Estimators in Apache Spark Bring Your Own Algorithm (You build the Container)
속도와큰데이터에최적화된 ML Algorithm
데이터셋스트리밍을통한저렴한학습비용
싱글패스로빠른학습
아주큰데이터셋에대한학습이가능
다양한ML 알고리즘제공
IAlgorithms
Amazon�SageMaker ML�알고리즘특징
빌트인ML�Algorithm 종류
문제 알고리즘 러닝형태
Discrete Classification,Regression
Linear Learner Supervised
XGBoost Algorithm Supervised
Discrete Recommendations Factorization Machines Supervised
Image Classification Image Classification Algorithm Supervised, CNN
Neural Machine Translation Sequence to Sequence Supervised, seq2seq
Time-series Prediction DeepAR Supervised, RNN
Discrete Groupings K-Means Algorithm Unsupervised
Dimensionality Reduction PCA (Principal Component Analysis) Unsupervised
Topic Determination Latent Dirichlet Allocation (LDA) Unsupervised
Neural Topic Model (NTM) Unsupervised, Neural Network Based
알고리즘을조금더살펴보면…
Neural�Machine�Translation
• Recurrent�Neural�Networks�(RNNs)와 Convolutional�Neural�Network�(CNN)�모델을 attention과 함께사용하는 encoder-decoder�아키텍처
• 활용예 - 기계번역, 텍스트요약, 음성을텍스트로변환
Time-series�Forecasting
• 관련된여러 Time-series�학습데이터들로부터패턴을학습해서정확한예측모델을생성하는알고리즘
• https://arxiv.org/abs/1704.04110• 특정시점에대한예측및특정기간에대한예측결과제공
SageMaker를 이용한학습의단순화
1.학습환경구성1) EC2, EBS 생성2)필요시클러스터구성
2.EC2에필요한파일들복사1) 학습스크립트복사2) 학습데이터복사
3.학습수행
4.학습완료후,1)모델을영구스토리지로이동2)학습환경삭제
Amazon�SageMaker를 Apache�Spark와 함께사용
Apache Spark
데이터전처리 모델학습 모델호스팅
Amazon SageMaker
• 연동을위한 SageMaker Spark SDK 제공• Spark ML 파이프라인에 SageMaker를통합해서학습및모델호스팅에사용할수있음
어떤머신러닝/딥러닝프레임워크와도함께사용
• ML/DL 프레임워크• 학습스크립트• 예측스크립트• API 설정
ECR
학습데이터 API 서비스
학습된모델
•학습수행•모델호스팅
Training code
• Matrix Factorization• Regression• Principal Component Analysis• K-Means Clustering• Gradient Boosted Trees• And More!
Amazon provided Algorithms
Bring Your Own Script (SageMaker builds the Container)
Bring Your Own Algorithm (You build the Container)
3
IML 학습서비스
학습데이터로딩
모델결과물저장
Fully managed –
Secured–
Amazon ECR예측코드저장
SageMaker Estimators in Apache Spark
CPU GPU HPO
유연한분산학습환경을관리형서비스로제공
4
IML Hosting Service
Amazon ECR
Amazon SageMaker
쉬운모델배포
Amazon ECR
Model Artifacts
Inference Image
모델생성
ModelName: prod
Amazon SageMaker
4
IML Hosting Service
쉬운모델배포
Amazon ECR
Model Artifacts
Inference Image
여러버전의모델
모델의 버전들생성
Amazon SageMaker
4
IML Hosting Service
쉬운모델배포
Amazon ECR
30 50
10 10
InstanceType: c3.4xlarge
InitialInstanceCount: 3
ModelName: prod
VariantName: primary
InitialVariantWeight: 50
Model Artifacts
Inference Image
가중치를적용한ProductionVariants생성
Amazon SageMaker
ProductionVariant
여러버전의모델
4
IML Hosting Service
쉬운모델배포
Amazon ECR
30 50
10 10
Model Artifacts
Inference Image
하나이상의ProductionVariant를이용해서EndpointConfiguration 생성EndpointConfiguration
Amazon SageMaker
ProductionVariant
InstanceType: c3.4xlarge
InitialInstanceCount: 3
ModelName: prod
VariantName: primary
InitialVariantWeight: 50
여러버전의모델
4
IML Hosting Service
쉬운모델배포
Amazon ECR
30 50
10 10
Model Artifacts
Inference Image
EndpointConfiguration 를이용해서 Endpoint 생성
EndpointConfiguration
Amazon SageMaker
ProductionVariant
InstanceType: c3.4xlarge
InitialInstanceCount: 3
ModelName: prod
VariantName: primary
InitialVariantWeight: 50
여러버전의모델
Inference Endpoint
4
IML Hosting Service
쉬운모델배포
데모: 학습에서배포까지
데모 어떤문제를풀어볼까요
데모 어떤문제를풀어볼까요
Species
0 – Iris Setosa1 – Iris Versicolor2 – Iris Virginica
데이터
학습데이터: 120 샘플 / iris_training.csv테스트데이터: 30 샘플 / iris_test.csv
데모 학습, 예측스크립트준비
데모 S3�버킷이름정의
데모 SageMaker�TensorFlow�객체생성
데모 학습수행
데모 학습수행
데모 학습수행
데모 모델배포및테스트
데모 모델배포및테스트
데모 모델배포및테스트
데모 모델배포및테스트
1. “Getting�started�with�Amazon�SageMaker”를 통한간단한실습
2. Amazon�SageMaker�SDK 사용법익히기• Python:�https://github.com/aws/sagemaker-python-sdk• Spark:�https://github.com/aws/sagemaker-spark
3. SageMaker�예제들
https://github.com/awslabs/amazon-sagemaker-examples
본강연이끝난후…
감사합니다