Easy Machine Learning with Oracle Autonomous Data Warehouse

Dongwook Lee / Principal Sales Consultant, Autonomous Database Team / Oracle Korea


Post on 20-May-2020



Safe harbor statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.

The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

2 Copyright © 2019 Oracle and/or its affiliates

1 Oracle Machine Learning Overview


Decision Tree model: implementation in each language

https://commons.wikimedia.org/wiki/File:CART_tree_titanic_survivors_KOR.png?uselang=ko


History of Oracle Machine Learning

Oracle Data Mining 9iR2 (9.2.0.1.0 - May 2002)

Oracle Advanced Analytics

Oracle R Enterprise

Oracle Data Mining

Oracle Autonomous DW

Apache Zeppelin

Oracle Data Mining(OML4SQL)

OML4Py (coming soon)

Included by default, not as an option

DB Option

DB Option

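Machine Learning process (why it is hard)

Problem definition / feedback
Data collection & preprocessing
Exploratory data evaluation
Model deployment
Model building / evaluation

• Pandas (data processing)
• NumPy (math functions)
• SciPy (scientific functions)
• Python (programming)
• Pillow (image processing library)
• Matplotlib (the standard plotting library), Graphviz (decision tree graphs, etc.)
• Sklearn (model library)

Typically 80% of the work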


Operationalizing and Embedding ML

Based on responses from 141 people who use ML models in production

How long does it take to put a defined model into operational use?


Machine Learning process (Oracle Machine Learning)

Problem definition / feedback
Data collection & preprocessing
Exploratory data evaluation
Model deployment
Model building / evaluation

SQL, PL/SQL, Zeppelin Notebook

SQL, PL/SQL, OAC Visualization


Move the Algorithms, Not the Data!

Machine Learning Algorithms Require Data


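Advantages of Oracle Machine Learning

More Models
Better Models
Faster, More Secure
Less Cost

Oracle ML versus general ML

Data Move
• Oracle ML: no data movement
• General ML: data is moved as files, and the analysis algorithm runs afterwards

Security
• Oracle ML: runs inside the encrypted DB
• General ML: data must be downloaded as plaintext files

Preparation
• Oracle ML: data cleansing with SQL, PL/SQL, and so on
• Oracle ML: data normalization and standardization with the Automatic Data Prep feature
• General ML: requires programming skills with Pandas, NumPy, SciPy, and so on
• General ML: new files must be loaded additionally from the data source

Data Refresh
• Oracle ML: models reflect real-time data
• Oracle ML: models are applied immediately, with no deploy step
• General ML: data is reflected irregularly or through batch jobs
• General ML: models are used only after a deploy step

Expanding the data analysis workforce
• Analysis is most efficient when business experts do it themselves
• Existing analysis team + stronger analysis skills among business experts
• The company's total analysis capacity increases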


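Algorithms in ML

Supervised learning
• Attribute Importance: find the attributes correlated with the target value
• Classification: identify the target value (categorical target)
• Regression: predict the target value (continuous target)

Unsupervised learning
• Anomaly Detection: find outliers using statistical techniques
• Association Rules: association analysis (market basket analysis is the well-known example)
• Clustering: group records into clusters
• Feature Extraction: extract and combine features (used in text analysis)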

Oracle Machine Learning Algorithms

CLASSIFICATION
• Naïve Bayes
• Logistic Regression (GLM)
• Decision Tree
• Random Forest
• Neural Network
• Support Vector Machine
• Explicit Semantic Analysis

CLUSTERING
• Hierarchical K-Means
• Hierarchical O-Cluster
• Expectation Maximization (EM)

ANOMALY DETECTION
• One-Class SVM

TIME SERIES
• Forecasting - Exponential Smoothing
• Includes popular models, e.g. Holt-Winters with trends, seasonality, irregularity, missing data

REGRESSION
• Linear Model
• Generalized Linear Model
• Support Vector Machine (SVM)
• Stepwise Linear Regression
• Neural Network

ATTRIBUTE IMPORTANCE
• Minimum Description Length
• Principal Comp Analysis (PCA)
• Unsupervised Pair-wise KL Div
• CUR decomposition for row & AI

ASSOCIATION RULES
• Apriori / market basket

PREDICTIVE QUERIES
• Predict, cluster, detect, features

SQL ANALYTICS
• SQL Windows
• SQL Patterns
• SQL Aggregates

FEATURE EXTRACTION
• Principal Comp Analysis (PCA)
• Non-negative Matrix Factorization
• Singular Value Decomposition (SVD)
• Explicit Semantic Analysis (ESA)

TEXT MINING SUPPORT
• Algorithms support text
• Tokenization and theme extraction
• Explicit Semantic Analysis (ESA) for document similarity

STATISTICAL FUNCTIONS
• Basic statistics: min, max, median, stdev, t-test, F-test, Pearson's, Chi-Sq, ANOVA, etc.

MODEL DEPLOYMENT
• SQL: 1st Class Objects
• Oracle RESTful API (ORDS)
• OML Microservices (for Apps)


Oracle ML Example

2 Oracle ML Video Highlight Summary


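Oracle ML video summary

Predicting 1978 Boston house prices

Review the attributes related to house price changes
• Run Attribute Importance
• Check the relevance ratios
• Check the data distribution

House price prediction (Regression), default settings
• Model evaluation: check the R Square value
• Model evaluation: check the Coefficient values
• A model with more explanatory power is needed

House price prediction (Regression), additional settings
• Auto Feature Selection
• Auto Feature Generation
• Model evaluation: check the R Square value
• Model evaluation: check the Coefficient values
• A model with more explanatory power is needed

Compare the current house prices with the predicted house prices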


Oracle ML video highlight summary

Create the training data with the SAMPLE clause

Create the test data as the set difference with the MINUS operator
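The sample-then-subtract split described above can be sketched outside the database as well. This is only an illustration in Python, not the SQL shown in the video: the row identifiers, the 60% training fraction, and the function name `split_train_test` are assumptions, and a database SAMPLE clause typically selects each row with a given probability rather than returning an exact count.

```python
import random


def split_train_test(row_ids, train_fraction, seed=0):
    """Draw a random training sample, then take the set difference as test data."""
    rng = random.Random(seed)            # fixed seed so the split is repeatable
    ids = list(row_ids)
    k = round(len(ids) * train_fraction)
    train = set(rng.sample(ids, k))      # plays the role of SAMPLE
    test = set(ids) - train              # plays the role of MINUS
    return train, test


# Hypothetical row identifiers for a 506-row table (assumption).
train_ids, test_ids = split_train_test(range(1, 507), train_fraction=0.6)
print(len(train_ids), len(test_ids))
```

Because the test set is defined as everything not sampled, no row can land in both sets, which is the point of building it by subtraction.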


R_SQ value of the training model: 74.66
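For readers unfamiliar with the R_SQ figure, the usual definition of R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name and sample values are made up for illustration, and the slide's figure appears to be expressed as a percentage.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up prices and predictions, for illustration only.
actual = [24.0, 21.6, 34.7, 33.4, 36.2]
predicted = [25.0, 22.0, 33.0, 32.0, 35.0]
print(round(100 * r_squared(actual, predicted), 2))  # as a percentage
```

A perfect model scores 1.0 (100%), and a model that always predicts the mean of the actual values scores 0.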


The created ML model can be used right away in the form of a function


Enable the Auto Feature Selection and Auto Feature Generation settings


R_SQ value of the training model built with Auto Feature Selection and Auto Feature Generation: 90.13


R_SQ value on the test data with Auto Feature Selection and Auto Feature Generation: 86.49


Features generated automatically by the Auto Feature Generation option

3 Oracle ML Video

4 Data Visualization

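Correlation matrix graph

Visualization of the stored results of the importance analysis

Visualization of house prices and the lower-status population ratio

Visualization of the Coefficient results of the first house price prediction model

Visualization of the Coefficient results of the second house price prediction model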
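The correlation matrix behind the first chart can be sketched as pairwise Pearson correlations between columns. This is a plain Python illustration rather than the tool used in the talk; the column names and values are invented, and every column is assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented columns, for illustration only.
data = {
    "price": [24.0, 21.6, 34.7, 33.4, 36.2],
    "rooms": [6.6, 6.4, 7.2, 7.0, 7.1],
    "lower_status_ratio": [5.0, 9.1, 4.0, 2.9, 5.3],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The matrix is symmetric with ones on the diagonal, so a chart usually needs to show only one triangle of it.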

Congratulations!

Data Scientist


5 Oracle ML Road Map

AutoML – new with OML4Py
Increase data scientist productivity – reduce overall compute time

Data Table
Auto Model Selection: much faster than exhaustive search
Auto Feature Selection: >50% reduction in features
Auto Tune: significant score improvement
ML Model
Enables non-expert users to leverage Machine Learning

Auto Feature Selection
– Reduce # of features by identifying most predictive
– Improve performance and accuracy

Auto Model Selection
– Identify in-database algorithm that achieves highest model quality
– Find best model faster than with exhaustive search

Auto Tune Hyperparameters
– Significantly improve model accuracy
– Avoid manual or exhaustive search techniques
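To make the idea of keeping only the most predictive features concrete, here is a very simple filter-style sketch: rank each feature by the absolute value of its correlation with the target and keep the top few. This is a stand-in for illustration, not the method AutoML uses; the function names, feature names, and data are all assumptions.

```python
import math


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x)) * math.sqrt(
        sum((b - mean_y) ** 2 for b in y)
    )
    return abs(cov / spread) if spread else 0.0


def select_top_features(features, target, keep):
    """Return the names of the `keep` features most correlated with the target."""
    scores = {name: abs_correlation(values, target) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Invented features and target, for illustration only.
features = {
    "rooms": [4, 5, 6, 7, 8],
    "noise": [3, 1, 4, 1, 5],
    "constant": [1, 1, 1, 1, 1],
}
target = [10, 12, 15, 17, 20]
print(select_top_features(features, target, keep=2))
```

A filter like this looks at each feature in isolation, so it can miss features that are only useful in combination; it is shown here only because it is the shortest way to illustrate the reduction step.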


Auto Feature Selection: examples

Reduce # features by identifying most relevant
Improve performance and accuracy

[Charts: OpenML dataset 312 with 1925 rows, 299 columns. Panels "ML training time" (y-axis: training time in seconds) and "Prediction Accuracy" (y-axis: accuracy), each comparing 299 and 9 features. Callouts: 97% reduction, 33x reduction, +4%.]

[Chart: OpenML dataset 40996 with 56K rows, 784 columns. Panel "Prediction Accuracy" (y-axis: accuracy), comparing 784 and 309 features. Callouts: 60% reduction, 1.3X time reduction to build SVM Gaussian model, +18%.]


Cross-Platform Machine Learning
Multiple user interfaces and APIs
Deployed in cloud and on-premises
From database to entire data management ecosystem

Oracle Cloud SQL
OML4R
OML4Python
REST
OML4SQL
SQL Developer
Popular R IDEs
Popular Python IDEs
OML Notebooks
Select User Interface, e.g.
API Options
Cloud or On-premises
Reach broader Data Sources
Oracle Object Storage
Big Data Service (HDFS)
NoSQL Databases
Kafka Streams
Amazon S3
Azure Blob Storage
Oracle Database
Data Lake
OML4Spark
Oracle Big Data SQL
OCI Data Science

6 Summary

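• Oracle Machine Learning is an in-database machine learning tool that makes data analysis easier and faster.

• It lets you analyze all the data consolidated in a converged DB using SQL.

• Beyond SQL, it is expanding to other analysis languages and tools such as Python, R, and visualization products.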


Thank you

Dongwook Lee

Executive Vice President
Applications & Product Development
