ch4-a fact table for each processdatamining.uos.ac.kr/wp-content/uploads/2017/02/ch4-a... ·...

Ch 4. A Fact Table for Each Process Star Schema Data Mining Lab. University Of Seoul 김태준(Jun Kim) [email protected] 2017. 04. 03. University of Seoul, Information Technology Building


Post on 28-Jul-2020


Page 1: ch4-a fact table for each processdatamining.uos.ac.kr/wp-content/uploads/2017/02/ch4-a... · 2017-04-03 · Ch 4. A Fact Table for Each Process Star Schema Data Mining Lab. University

Ch 4. A Fact Table for Each Process

Star Schema

Data Mining Lab. University Of Seoul

김태준(Jun Kim) [email protected] 2017. 04. 03.

University of Seoul, Information Technology Building

Page 2

Overview

• The right way to split processes into separate fact tables

• Examples of poorly designed fact tables

• Drill Across

Page 3

Fact Tables and Business Processes

• A dimensional model describes how people measure the world

• Each star schema has one fact table that holds the measures describing a particular process

• The context of a fact (measure) is given by the dimensions

• Examples of business processes: orders, shipments

Page 4

Process

• A process can be broken down into several subprocesses

• If it is unclear whether to split into subprocesses, ask:
- Do the facts occur at the same time?
- Do the facts have the same level of detail (grain)?

• If the answer to either question is NO, split them

• That is, the following kinds of facts belong in different fact tables:
- Facts that have different timing
- Facts that have different grain

• Putting several facts into a single fact table can cause problems

Page 5

Facts that have Different Timing (BAD)

Orders and shipments do not happen at the same time

Page 6

Problem: generating a report that focuses on a single fact (shipment)

BAD

Even products that have not shipped are printed, with a value of 0
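The zero-row problem can be reproduced with a small sketch. The table name `sales_facts`, its columns, and product `333` are assumptions made up for illustration; only products 111 and 222 and their shipped quantities appear in the slides.

```python
import sqlite3

# Hypothetical combined fact table: order and shipment facts share one row layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_facts (
    day_key INTEGER, product TEXT,
    quantity_ordered INTEGER, quantity_shipped INTEGER
);
-- Rows written at order time carry 0 for the shipment fact, and vice versa.
INSERT INTO sales_facts VALUES
    (1, '111', 100, 0),
    (1, '222', 200, 0),
    (1, '333',  50, 0),
    (2, '111',   0, 100),
    (2, '222',   0, 200);
""")

# A report that only cares about shipments still sees the order-only product.
shipment_report = conn.execute("""
    SELECT product, SUM(quantity_shipped)
    FROM sales_facts
    GROUP BY product
    ORDER BY product
""").fetchall()

for product, shipped in shipment_report:
    print(product, shipped)
```

Product 333 has never shipped, yet it shows up in the shipment report with a quantity of 0, because its order row lives in the same fact table.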

Page 7

Generic Fact (BAD)

Hard to compare the two processes

Reports need extra formatting

PRODUCT   Quantity Shipped
111       100
222       200

Bad solution:
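One way to picture a generic fact design is the sketch below. It assumes a single `quantity` column plus a `transaction_type` column; both names, the table name `generic_facts`, and the sample rows are hypothetical, since the slide's own diagram is not part of this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical generic fact table: one quantity column, with a type column
-- saying whether the row records an order or a shipment.
CREATE TABLE generic_facts (product TEXT, transaction_type TEXT, quantity INTEGER);
INSERT INTO generic_facts VALUES
    ('111', 'ordered', 100), ('111', 'shipped', 100),
    ('222', 'ordered', 200), ('222', 'shipped', 200),
    ('333', 'ordered',  50);
""")

# A plain GROUP BY stacks the two processes on different rows.
stacked = conn.execute("""
    SELECT product, transaction_type, SUM(quantity)
    FROM generic_facts
    GROUP BY product, transaction_type
    ORDER BY product, transaction_type
""").fetchall()

# Comparing them side by side needs a pivot written with CASE expressions.
side_by_side = conn.execute("""
    SELECT product,
           SUM(CASE WHEN transaction_type = 'ordered' THEN quantity ELSE 0 END),
           SUM(CASE WHEN transaction_type = 'shipped' THEN quantity ELSE 0 END)
    FROM generic_facts
    GROUP BY product
    ORDER BY product
""").fetchall()
```

The extra pivot step is the kind of additional report formatting the slide warns about: every comparison of orders against shipments has to repeat it.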

Page 8

Separate Fact Tables (GOOD!)

Shared dimensions

Separate facts
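A minimal sketch of this design follows, assuming hypothetical table names `product`, `order_facts`, and `shipment_facts` and made-up sample rows. Each process gets its own fact table, and both reference the same product dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension used by both stars (names are illustrative).
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product TEXT);
-- One fact table per process.
CREATE TABLE order_facts (
    product_key INTEGER REFERENCES product(product_key),
    quantity_ordered INTEGER
);
CREATE TABLE shipment_facts (
    product_key INTEGER REFERENCES product(product_key),
    quantity_shipped INTEGER
);

INSERT INTO product VALUES (1, '111'), (2, '222'), (3, '333');
INSERT INTO order_facts VALUES (1, 100), (2, 200), (3, 50);
INSERT INTO shipment_facts VALUES (1, 100), (2, 200);
""")

# The shipment report now reads only shipment rows, so no zero rows appear.
shipment_report = conn.execute("""
    SELECT p.product, SUM(f.quantity_shipped)
    FROM shipment_facts AS f
    JOIN product AS p ON p.product_key = f.product_key
    GROUP BY p.product
    ORDER BY p.product
""").fetchall()
```

With this layout the shipment report contains only products 111 and 222; the unshipped product 333 no longer appears.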

Page 9

Facts that have Different Grain (BAD)

Grain of order: one order; grain of shipment: one shipment

Special row for no shipment

Page 11

NULL as Special Key (BAD)

[Not a shipper] is gone
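The sketch below assumes the slide is contrasting a special "Not a shipper" dimension row with a NULL shipper key on order rows. The table names, the shipper name `Acme Freight`, and the sample rows are hypothetical. It shows the standard SQL behavior that a NULL key never matches in an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipper (shipper_key INTEGER PRIMARY KEY, shipper_name TEXT);
CREATE TABLE combined_facts (
    shipper_key INTEGER,
    quantity_ordered INTEGER, quantity_shipped INTEGER
);
INSERT INTO shipper VALUES (1, 'Acme Freight');
INSERT INTO combined_facts VALUES
    (NULL, 100, 0),  -- order row: no shipper applies, so the key is NULL
    (1, 0, 100);     -- shipment row
""")

# Inner join: the order row with the NULL key silently drops out.
inner = conn.execute("""
    SELECT s.shipper_name, SUM(f.quantity_ordered)
    FROM combined_facts AS f
    JOIN shipper AS s ON s.shipper_key = f.shipper_key
    GROUP BY s.shipper_name
""").fetchall()

# Keeping the row requires every query to remember an outer join.
outer = conn.execute("""
    SELECT s.shipper_name, SUM(f.quantity_ordered)
    FROM combined_facts AS f
    LEFT JOIN shipper AS s ON s.shipper_key = f.shipper_key
    GROUP BY s.shipper_name
""").fetchall()
```

In the inner join the 100 ordered units vanish from the total; they reappear only under a NULL shipper name when the query is rewritten as an outer join.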

Page 12

Separate Fact Tables (GOOD!)

Role playing keys: Ch.6

Only the shipment fact table is linked to the shipper dimension

Page 13

Drilling Across

• Analyzing facts from two or more fact tables together

• Done in two steps:
1) Summarize each fact table
2) Combine the summarized results

• Joining the fact tables in a single step is dangerous

• Has nothing to do with drilling up or down

Page 14

Drilling Across Example

Drilling Across

One order that has not shipped

Orders vs. shipments

Page 15

The Peril of Joining Fact Tables (BAD)

With a cross join (Cartesian product), one row becomes two

double counted
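The double counting can be shown with a small sketch. The table names and the scenario (one order fulfilled by two partial shipments) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (product TEXT, quantity_ordered INTEGER);
CREATE TABLE shipment_facts (product TEXT, quantity_shipped INTEGER);
INSERT INTO order_facts VALUES ('111', 100);
-- The single order is fulfilled by two partial shipments.
INSERT INTO shipment_facts VALUES ('111', 60), ('111', 40);
""")

# Joining the fact tables directly pairs the order row with each shipment row.
joined = conn.execute("""
    SELECT o.product, SUM(o.quantity_ordered), SUM(s.quantity_shipped)
    FROM order_facts AS o
    JOIN shipment_facts AS s ON s.product = o.product
    GROUP BY o.product
""").fetchall()

print(joined)
```

The order row is repeated once per matching shipment row, so the report claims 200 units ordered although only 100 were.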

Page 16

ANSI SQL Join

• Separates filter logic from relationship logic

• Relationships between tables are visible at a glance

• The execution plan is the same
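The two join styles can be compared side by side. This is a sketch with hypothetical tables and a made-up `category` column; it checks only that both forms return the same rows and does not inspect the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE order_facts (product_key INTEGER, quantity_ordered INTEGER);
INSERT INTO product VALUES (1, '111', 'A'), (2, '222', 'B');
INSERT INTO order_facts VALUES (1, 100), (1, 20), (2, 200);
""")

# Older comma style: the join condition and the filter share the WHERE clause.
comma_style = conn.execute("""
    SELECT p.product, SUM(f.quantity_ordered)
    FROM order_facts f, product p
    WHERE f.product_key = p.product_key
      AND p.category = 'A'
    GROUP BY p.product
""").fetchall()

# ANSI style: the relationship lives in ON, the filter stays in WHERE.
ansi_style = conn.execute("""
    SELECT p.product, SUM(f.quantity_ordered)
    FROM order_facts AS f
    JOIN product AS p ON p.product_key = f.product_key
    WHERE p.category = 'A'
    GROUP BY p.product
""").fetchall()
```

Both queries return the same single row for product 111; the ANSI form simply makes it easier to see which condition defines the relationship and which one filters.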

Page 17

Drilling Across Overview

Step 1: Query each fact table, producing results at the same dimensional level

Step 2: Combine the results of the queries

Page 18

Step 1. Query each fact table

Page 19

Step 2. Combine the query results
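The two steps can be sketched as follows, with the merge done in application code. The table names and sample rows are hypothetical, and this is one possible arrangement rather than the exact queries shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (product TEXT, quantity_ordered INTEGER);
CREATE TABLE shipment_facts (product TEXT, quantity_shipped INTEGER);
INSERT INTO order_facts VALUES ('111', 100), ('222', 200), ('333', 50);
INSERT INTO shipment_facts VALUES ('111', 60), ('111', 40), ('222', 200);
""")

# Step 1: summarize each fact table separately, at the same level (product).
ordered = dict(conn.execute(
    "SELECT product, SUM(quantity_ordered) FROM order_facts GROUP BY product"))
shipped = dict(conn.execute(
    "SELECT product, SUM(quantity_shipped) FROM shipment_facts GROUP BY product"))

# Step 2: merge the two summaries on the shared dimension value,
# filling in 0 where a product appears in only one summary.
report = [
    (product, ordered.get(product, 0), shipped.get(product, 0))
    for product in sorted(ordered.keys() | shipped.keys())
]

for row in report:
    print(row)
```

Because each fact table is aggregated before the merge, the two partial shipments of product 111 no longer inflate its ordered quantity.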

Page 20

Drilling Across

Page 21

3 Ways to Drill Across

Case 1 Case 2 Case 3

Page 22

Software

Tableau, Power BI, Excel, Zeppelin

Hive

OLAP Engine: Kylin, Lens

ETL Spark

Page 23

Case 1 Case 2 Case 3

Network traffic

If the reporting tool does not support drilling across, development becomes very difficult

Page 24

Case 1 Case 2 Case 3

Temporary tables

They must be dropped every time a report has been generated
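A temporary-table variant might look like the sketch below. The table names `order_summary` and `shipment_summary` and the sample data are hypothetical, and the final LEFT JOIN assumes every shipped product also has an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (product TEXT, quantity_ordered INTEGER);
CREATE TABLE shipment_facts (product TEXT, quantity_shipped INTEGER);
INSERT INTO order_facts VALUES ('111', 100), ('222', 200), ('333', 50);
INSERT INTO shipment_facts VALUES ('111', 60), ('111', 40), ('222', 200);

-- Step 1: store each summary in its own temporary table.
CREATE TEMP TABLE order_summary AS
    SELECT product, SUM(quantity_ordered) AS quantity_ordered
    FROM order_facts GROUP BY product;
CREATE TEMP TABLE shipment_summary AS
    SELECT product, SUM(quantity_shipped) AS quantity_shipped
    FROM shipment_facts GROUP BY product;
""")

# Step 2: join the two small summaries (assumes shipped products were ordered).
report = conn.execute("""
    SELECT o.product, o.quantity_ordered, COALESCE(s.quantity_shipped, 0)
    FROM order_summary AS o
    LEFT JOIN shipment_summary AS s ON s.product = o.product
    ORDER BY o.product
""").fetchall()

# Cleanup that has to run after every report.
conn.executescript("DROP TABLE order_summary; DROP TABLE shipment_summary;")
remaining = conn.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table'").fetchall()
```

The explicit DROP statements at the end are the housekeeping cost the slide refers to: they have to run after each report.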

Page 25

Case 1 Case 2 Case 3

GOOD!

• The whole process is handled in SQL

• The reporting tool is only responsible for visualizing the results

Page 26

Case 3 SQL Example

Query results
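The slide's own SQL is not part of this transcript, so the following is a stand-in sketch of a single-statement drill across, not the original query. Table names and data are hypothetical. It combines the two summaries with UNION ALL and a final GROUP BY, which avoids FULL OUTER JOIN because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (product TEXT, quantity_ordered INTEGER);
CREATE TABLE shipment_facts (product TEXT, quantity_shipped INTEGER);
INSERT INTO order_facts VALUES ('111', 100), ('222', 200), ('333', 50);
INSERT INTO shipment_facts VALUES ('111', 60), ('111', 40), ('222', 200);
""")

report = conn.execute("""
    SELECT product,
           SUM(quantity_ordered) AS quantity_ordered,
           SUM(quantity_shipped) AS quantity_shipped
    FROM (
        -- Step 1a: summarize orders, padding the shipment column with 0.
        SELECT product,
               SUM(quantity_ordered) AS quantity_ordered,
               0 AS quantity_shipped
        FROM order_facts GROUP BY product
        UNION ALL
        -- Step 1b: summarize shipments, padding the order column with 0.
        SELECT product, 0, SUM(quantity_shipped)
        FROM shipment_facts GROUP BY product
    ) AS summaries
    -- Step 2: collapse the stacked summaries into one row per product.
    GROUP BY product
    ORDER BY product
""").fetchall()

for row in report:
    print(row)
```

The database does both steps in one statement, so the reporting tool receives a single result set and only has to display it.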

Page 27

Step 1. Query for each fact table

Step 2. Summarize them

Page 29

Summary

• Split fact tables so that each process can be analyzed on its own

• If it is unclear whether to split, ask two questions:
- Same time?
- Same grain?

• When drilling across, do not join the fact tables in a single step; join in two steps

Page 30

Thank you

김태준(Jun Kim)

[email protected]