Easy Machine Learning with Oracle Autonomous Data Warehouse
Dongwook Lee / Principal Sales Consultant, Autonomous Database Team / Oracle Korea
Safe harbor statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
Copyright © 2019 Oracle and/or its affiliates
Decision Tree model implementations in each language
https://commons.wikimedia.org/wiki/File:CART_tree_titanic_survivors_KOR.png?uselang=ko
History of Oracle Machine Learning
Oracle Data Mining 9iR2 (9.2.0.1.0 - May 2002)
Oracle Advanced Analytics
Oracle R Enterprise
Oracle Data Mining
Oracle Autonomous DW
Apache Zeppelin
Oracle Data Mining(OML4SQL)
OML4Py (coming soon)
Included by default, not as an option
DB Option
DB Option
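Machine Learning process (why it is hard)
Problem definition / Feedback
Data collection & preprocessing
Exploratory data evaluation
Model application
Model building / evaluation
Pandas (data processing)
NumPy (math functions)
SciPy (scientific functions)
Python (programming)
Pillow (image processing library)
Matplotlib (the standard plotting library), Graphviz (Decision Tree graphs, etc.)
Sklearn (model library)
Typically 80% of the work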
Operationalizing and Embedding ML
Based on responses from 141 people who use ML models in actual production
How long does it take to put a defined model into operational use?
Machine Learning process (Oracle Machine Learning)
Problem definition / Feedback
Data collection & preprocessing
Exploratory data evaluation
Model application
Model building / evaluation
SQL, PL/SQL, Zeppelin Notebook
SQL, PL/SQL, OAC Visualization
Move the Algorithms, Not the Data!
Machine Learning Algorithms Require Data
Advantages of Oracle Machine Learning
More Models
Better Models
Faster, More Secure
Less Cost
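Oracle ML vs. General ML
Data Move
• Oracle ML: no data movement
• General ML: data is moved out as files, and the analysis algorithm runs on the files
Security
• Oracle ML: runs inside the encrypted DB
• General ML: data must be downloaded as plain-text files
Preparation
• Oracle ML: data cleansing with SQL, PL/SQL, etc.
• Oracle ML: data normalization and standardization through the Automatic Data Prep feature
• General ML: programming skills with Pandas, NumPy, SciPy, etc. are required
• General ML: new files must be loaded again from the data source
Data Refresh
• Oracle ML: models reflect real-time data
• Oracle ML: models are applied immediately, with no deploy step
• General ML: data is reflected irregularly or through batch jobs
• General ML: models are used only after a deploy step
Expanding the data analysis workforce
• Analysis is most efficient when business experts do it themselves
• Existing analytics team + greater analysis capability among business experts
• The company's total analysis capability increases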
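Algorithms in ML
Supervised learning
• Attribute Importance: find the attributes that correlate with the target value
• Classification: identify the target value (categorical target)
• Regression: predict the target value (continuous target)
Unsupervised learning
• Anomaly Detection: find outliers using statistical techniques
• Association Rules: association analysis (market basket analysis is the well-known example)
• Clustering: group similar records
• Feature Extraction: extract and combine features (used in text analysis)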
Oracle Machine Learning Algorithms
CLASSIFICATION: Naïve Bayes, Logistic Regression (GLM), Decision Tree, Random Forest, Neural Network, Support Vector Machine, Explicit Semantic Analysis
CLUSTERING: Hierarchical K-Means, Hierarchical O-Cluster, Expectation Maximization (EM)
ANOMALY DETECTION: One-Class SVM
TIME SERIES: Forecasting - Exponential Smoothing. Includes popular models, e.g. Holt-Winters with trends, seasonality, irregularity, missing data
REGRESSION: Linear Model, Generalized Linear Model, Support Vector Machine (SVM), Stepwise Linear Regression, Neural Network
ATTRIBUTE IMPORTANCE: Minimum Description Length, Principal Comp Analysis (PCA), Unsupervised Pair-wise KL Div, CUR decomposition for row & AI
ASSOCIATION RULES: Apriori / market basket
PREDICTIVE QUERIES: Predict, cluster, detect, features
SQL ANALYTICS: SQL Windows, SQL Patterns, SQL Aggregates
FEATURE EXTRACTION: Principal Comp Analysis (PCA), Non-negative Matrix Factorization, Singular Value Decomposition (SVD), Explicit Semantic Analysis (ESA)
TEXT MINING SUPPORT: Algorithms support text, Tokenization and theme extraction, Explicit Semantic Analysis (ESA) for document similarity
STATISTICAL FUNCTIONS: Basic statistics: min, max, median, stdev, t-test, F-test, Pearson's, Chi-Sq, ANOVA, etc.
MODEL DEPLOYMENT: SQL - 1st Class Objects, Oracle RESTful API (ORDS), OML Microservices (for Apps)
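Oracle ML video summary
Predicting 1978 Boston house prices
Review the attributes related to house price changes
Run Attribute Importance
Check the importance ratios
Check the data distribution
House price prediction (Regression)
Default settings
Model evaluation: check the R Square value
Model evaluation: check the Coefficient values
A model with more explanatory power is needed
Model evaluation: check the R Square value
Auto Feature Selection
Auto Feature Generation
House price prediction (Regression)
Additional settings
Model evaluation: check the Coefficient values
A model with more explanatory power is needed
Compare actual and predicted house prices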
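The R Square check in the steps above can be illustrated with a minimal sketch. It fits a one-variable least-squares line and scores it with R Square (1 minus residual sum of squares over total sum of squares). The room counts and prices are made-up toy values, not the Boston data, and the function names are hypothetical; the slides do not show how Oracle Machine Learning computes the value internally.

```python
# Toy illustration of "fit a regression, then check R Square".
# All data and names here are assumptions for illustration only.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(actual, predicted):
    """R Square = 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

rooms = [4, 5, 6, 7, 8]          # hypothetical predictor
price = [15, 18, 24, 29, 34]     # hypothetical target

slope, intercept = fit_line(rooms, price)
predicted = [slope * x + intercept for x in rooms]
r2 = r_squared(price, predicted)
print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r2:.4f}")
```

A value close to 1 means the line explains most of the variation in the target; a low value is the signal, in the flow above, that a model with more explanatory power is needed.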
Oracle ML video highlights
Create the Training Data using the Sample command
Create the Test Data as the set difference using the Minus command
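The Sample-then-Minus split can be sketched outside the database as follows. This is only an analogy under stated assumptions: the slide does not show the SQL, the row ids and the 80% sample size are hypothetical, and a database sample is typically probabilistic, whereas this sketch draws an exact count.

```python
import random

def split_by_sample_and_minus(row_ids, sample_pct, seed=42):
    """Draw a random sample as training rows; the remainder (set difference) is test."""
    rng = random.Random(seed)              # fixed seed so the split is repeatable
    all_ids = list(row_ids)
    k = round(len(all_ids) * sample_pct / 100)
    train = set(rng.sample(all_ids, k))    # analogous to the Sample step
    test = set(all_ids) - train            # analogous to the Minus step
    return train, test

row_ids = range(1, 501)                    # hypothetical table with 500 rows
train_ids, test_ids = split_by_sample_and_minus(row_ids, sample_pct=80)
print(len(train_ids), len(test_ids))
```

Because the test set is defined as everything not in the sample, the two sets never overlap and together cover every row.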
Oracle ML video highlights
Enable the Auto Feature Select and Auto Feature Generation settings
Oracle ML video highlights
R_SQ value of the Train model using Auto Feature Select and Auto Feature Generation: 90.13
Oracle ML video highlights
R_SQ value on the Test Data using Auto Feature Select and Auto Feature Generation: 86.49
Oracle ML video highlights
Features generated automatically by the Auto Feature Generation option
AutoML – new with OML4Py
Increase data scientist productivity – reduce overall compute time
Auto Model Selection
Much faster than exhaustive search
Auto Feature Selection
>50% reduction in features
Auto Tune
Significant score improvement
ML Model
Enables non-expert users to leverage Machine Learning
Data Table
Auto Feature Selection
– Reduce # of features by identifying most predictive
– Improve performance and accuracy
Auto Model Selection
– Identify in-database algorithm that achieves highest model quality
– Find best model faster than with exhaustive search
Auto Tune Hyperparameters
– Significantly improve model accuracy
– Avoid manual or exhaustive search techniques
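The idea of reducing the number of features by keeping the most predictive ones can be sketched with a simple filter: score each column by its absolute Pearson correlation with the target and keep the top few. This is a swapped-in stand-in, not the AutoML algorithm from the slides (which is not described here); the column names, data, and function names are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def select_top_features(columns, target, keep):
    """Rank columns by |correlation with target| and keep the strongest ones."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores

# Hypothetical toy table: two informative columns, one weak, one constant.
target = [10, 20, 30, 40, 50]
columns = {
    "rooms":    [1, 2, 3, 4, 5],
    "crime":    [5, 4, 3, 2, 1],
    "noise":    [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
}

selected, scores = select_top_features(columns, target, keep=2)
print(selected)
```

A correlation filter only captures linear, one-column-at-a-time relationships, so it is a rough proxy; it is shown here only to make the "fewer, more predictive features" idea concrete.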
Auto Feature Selection: examples
Reduce # features by identifying most relevant
Improve performance and accuracy
OpenML dataset 312 with 1925 rows, 299 columns
[Chart: ML training time (seconds), 299 vs. 9 features; 33x reduction]
[Chart: Prediction Accuracy, 299 vs. 9 features; +4%; 97% reduction]
OpenML dataset 40996 with 56K rows, 784 columns
[Chart: Prediction Accuracy, 784 vs. 309 features; +18%; 60% reduction; 1.3X time reduction to build SVM Gaussian model]
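As a quick check of the reduction figures, the sketch below computes the percentage drop in feature count. It assumes the percentage labels refer to the feature counts on the chart axes (299 to 9, and 784 to 309); that pairing is a reading of the slide, not something it states outright.

```python
def feature_reduction_pct(before, after):
    """Percentage of features removed when going from `before` to `after`."""
    return 100 * (before - after) / before

small = feature_reduction_pct(299, 9)     # OpenML dataset 312
large = feature_reduction_pct(784, 309)   # OpenML dataset 40996
print(f"{small:.1f}% {large:.1f}%")
```

The results, about 97.0% and 60.6%, are in line with the slide's "97% reduction" and "60% reduction" labels.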
Cross-Platform Machine Learning
Multiple user interfaces and APIs
Deployed in cloud and on-premises
From database to entire data management ecosystem
Oracle Cloud SQL
OML4R
OML4Python
REST
OML4SQL
SQL Developer
Popular R IDEs
Popular Python IDEs
OML Notebooks
Select User Interface, e.g.
API Options
Cloud or On-premises
Reach broader Data Sources
Oracle Object Storage
Big Data Service (HDFS)
NoSQL Databases
Kafka Streams
Amazon S3
Azure Blob Storage
Oracle Database
Data Lake
OML4Spark
Oracle Big Data SQL
OCI Data Science
Summary
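• Oracle Machine Learning is an in-database machine learning tool that makes data analysis easier and faster.
• It lets you analyze all the data consolidated in a converged DB using SQL.
• Beyond SQL, it is expanding to a range of analysis languages and tools, including Python, R, and visualization products.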
Thank you
Dongwook Lee
Executive Vice President, Applications & Product Development