Introduction to Data Warehousing: Data Warehousing and OLAP
Prof. 楊立偉, Department of Business Administration, National Taiwan University





Agenda

1. Introduction
2. Data Warehouse Theory
3. System Features
4. Demo
5. Discussions

1. Introduction

1.1 Introduction

• A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.

1.1 Introduction (cont'd)

How are organizations using data warehouses?

1. Increasing customer focus, which includes analyzing customer buying patterns.
2. Repositioning products and managing product portfolios by comparing sales performance by time or by region, in order to fine-tune production strategies.
3. Analyzing operations and looking for sources of profit.
4. Managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

1.2 Data Warehouse Characteristics

• It is a database designed for analytical tasks, using data from multiple applications.
• It supports a relatively small number of users with relatively long interactions.
• Its usage is read-intensive.
• Its content is periodically updated.

1.2 Data Warehouse Characteristics (cont'd)

• It contains current and historical data to provide a historical perspective on information.
• It contains a few large tables.
• A query frequently returns a large result set and involves full-table scans and multi-table joins.
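To make the last two bullets concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (sales, dim_time, dim_item) and all rows are hypothetical. The layout of one large fact table joined to smaller dimension tables is an assumption and is not specified by the slides. The query reads every fact row, joins two other tables, and aggregates the result, which is the access pattern described above.

```python
import sqlite3

# Hypothetical layout: one large fact table (sales) plus small dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_type TEXT);
CREATE TABLE sales    (time_id INTEGER, item_id INTEGER, dollars REAL);
INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO dim_item VALUES (10, 'computer'), (20, 'phone');
INSERT INTO sales VALUES (1, 10, 100), (1, 20, 50), (2, 10, 70), (2, 20, 30), (1, 10, 25);
""")

# Typical analytical query: scan the whole fact table, join dimensions, aggregate.
QUERY = """
SELECT t.quarter, i.item_type, SUM(s.dollars) AS total
FROM sales AS s
JOIN dim_time AS t ON s.time_id = t.time_id
JOIN dim_item AS i ON s.item_id = i.item_id
GROUP BY t.quarter, i.item_type
ORDER BY t.quarter, i.item_type
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On a real warehouse, the fact table would hold millions of rows, and this scan-join-aggregate pattern would dominate the workload. In contrast, an operational system mostly runs short single-row lookups and updates.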

1.3 Data Warehousing

[Figure: Heterogeneous data sources → Data cleaning → Data integration and consolidation (constructing the data warehouse) → Interactive analysis → Making strategic decisions (using the data warehouse)]

• The process of constructing and using data warehouses
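The construction half of this pipeline (cleaning, then integration and consolidation) can be sketched in a few lines. The sketch assumes two hypothetical operational sources whose field names, units, and date formats disagree. Each source gets its own mapping function onto one shared schema, and a simple consolidated summary is built from the result. The source names, fields, and formats are all illustrative.

```python
from datetime import date

# Two hypothetical operational sources with different schemas and conventions.
store_a = [{"cust": "Alice", "amt": "120", "day": "2015/12/01"}]
store_b = [{"customer_name": "bob ", "amount_cents": 4550, "date": "01-12-2015"}]

def from_a(rec):
    """Map a source-A record (YYYY/MM/DD, dollars as text) onto the warehouse schema."""
    y, m, d = (int(p) for p in rec["day"].split("/"))
    return {"customer": rec["cust"].strip().title(),
            "amount": float(rec["amt"]),
            "date": date(y, m, d)}

def from_b(rec):
    """Map a source-B record (DD-MM-YYYY, cents) onto the same schema."""
    d, m, y = (int(p) for p in rec["date"].split("-"))
    return {"customer": rec["customer_name"].strip().title(),
            "amount": rec["amount_cents"] / 100,
            "date": date(y, m, d)}

# Clean and integrate: every record now shares one schema and one set of units.
warehouse = [from_a(r) for r in store_a] + [from_b(r) for r in store_b]

# Consolidate: precompute a summary that analysts can query directly.
total_by_customer = {}
for row in warehouse:
    total_by_customer[row["customer"]] = total_by_customer.get(row["customer"], 0) + row["amount"]
```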

1.4 Three-tier System Architecture

[Figure: The components are the operational DBMS, the data warehouse server, and OLAP tools. The roles shown are executives or decision-making staff, and IT or data warehouse administrators.]

2. Data Warehouse Theory

2.1 Data Warehouse Theory

• Why not use the database directly?
  – The update-driven approach is inefficient.
  – It is potentially expensive for frequent queries.
• Use a data warehouse instead
  – The query-driven approach is sufficient for making strategic decisions.
  – The operational DBMS is kept separate for daily, critical operations.

2.2 Data Cube

• A multidimensional, logical view of the data
• Concept hierarchy
  – Multiple levels of data granularity
  – Data summarization
  – Data generalization
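A concept hierarchy lets the same facts be summarized at several granularities. The sketch assumes a hypothetical time hierarchy of day → month → quarter → year. It generalizes each date to the chosen level and then sums the measure within each group. The records and level names are illustrative only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records (the finest granularity).
sales = [
    (date(2015, 1, 5), 10.0),
    (date(2015, 1, 20), 5.0),
    (date(2015, 2, 3), 7.0),
    (date(2015, 4, 1), 8.0),
]

# Assumed concept hierarchy for the time dimension: day -> month -> quarter -> year.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "quarter": lambda d: f"{d.year}-Q{(d.month - 1) // 3 + 1}",
    "year": lambda d: str(d.year),
}

def summarize(records, level):
    """Generalize each date to `level`, then sum the measure within each group."""
    key = LEVELS[level]
    totals = defaultdict(float)
    for day, amount in records:
        totals[key(day)] += amount
    return dict(totals)

print(summarize(sales, "month"))    # {'2015-01': 15.0, '2015-02': 7.0, '2015-04': 8.0}
print(summarize(sales, "quarter"))  # {'2015-Q1': 22.0, '2015-Q2': 8.0}
```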

[Figure: A 3-dimensional data cube]

[Figure panels: Drill-down on time data for Q1; Roll-up on address]
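The two operations pictured can be sketched on a small cube over (address, time, item). Roll-up climbs a concept hierarchy, here an assumed city → country mapping. Drill-down descends one, here quarter → month, restricted to Q1 cells. The cities, months, hierarchies, and sales figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical base facts: (city, month, item, sales). Month is the finest time level.
facts = [
    ("Taipei", "Jan", "phone", 30), ("Taipei", "Feb", "phone", 20),
    ("Tokyo", "Jan", "phone", 40), ("Osaka", "Mar", "pc", 10),
    ("Taipei", "Apr", "pc", 25),
]
# Assumed concept hierarchies for the address and time dimensions.
CITY_TO_COUNTRY = {"Taipei": "Taiwan", "Tokyo": "Japan", "Osaka": "Japan"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

def aggregate(rows, key_fn):
    """Sum the sales measure for each cell produced by key_fn."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Starting view: (city, quarter, item).
by_city_quarter = aggregate(facts, lambda r: (r[0], MONTH_TO_QUARTER[r[1]], r[2]))

# Roll-up on address: replace city with its country.
rolled_up = aggregate(facts, lambda r: (CITY_TO_COUNTRY[r[0]], MONTH_TO_QUARTER[r[1]], r[2]))

# Drill-down on time for Q1: keep only Q1 facts and show them by month.
q1_rows = [r for r in facts if MONTH_TO_QUARTER[r[1]] == "Q1"]
drilled_down = aggregate(q1_rows, lambda r: (r[0], r[1], r[2]))
```

Roll-up produces fewer, coarser cells. Drill-down produces more, finer cells from the same base facts. This is why a cube keeps the finest granularity available.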

[Figure: Adding a supplier dimension]

2.3 Efficient Data Cube Computation

• The challenge: $2^N$ group-by combinations for N dimensions
  – Concept hierarchies and aggregations make it even more complicated!
• Materialization of the data cube (how to implement it)
  – Materialize every cuboid, none, or only some?
  – Algorithms for selecting which cuboids to materialize:
    • Based on size
    • Based on sharing
    • Based on access frequency

[Figure: Lattice of cuboids for the dimensions Address, Time, and Item]
(Address, Time, Item)
(Address, Time)   (Address, Item)   (Time, Item)
(Address)   (Time)   (Item)
ALL
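The lattice above contains $2^3 = 8$ cuboids. The sketch below enumerates them and takes the "materialize every" option, computing SUM(sales) for each cuboid over some hypothetical facts.

It also encodes a commonly cited count for the case with concept hierarchies: the number of cuboids is $\prod_i (L_i + 1)$, where $L_i$ is the number of levels of dimension $i$, excluding the top level ALL. The function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

DIMS = ("address", "time", "item")

# Hypothetical base facts with a "sales" measure.
facts = [
    {"address": "Taipei", "time": "Q1", "item": "phone", "sales": 30},
    {"address": "Taipei", "time": "Q2", "item": "pc", "sales": 25},
    {"address": "Tokyo", "time": "Q1", "item": "phone", "sales": 40},
]

def cuboids(dims):
    """Yield every subset of dims: the base cuboid first, the apex (ALL, empty tuple) last."""
    for k in range(len(dims), -1, -1):
        yield from combinations(dims, k)

def materialize_full_cube(rows, dims):
    """Compute SUM(sales) for every cuboid in the lattice."""
    cube = {}
    for group_by in cuboids(dims):
        totals = defaultdict(int)
        for r in rows:
            totals[tuple(r[d] for d in group_by)] += r["sales"]
        cube[group_by] = dict(totals)
    return cube

def cuboid_count(levels):
    """Number of cuboids when dimension i has levels[i] hierarchy levels (excluding ALL)."""
    return prod(l + 1 for l in levels)

cube = materialize_full_cube(facts, DIMS)
print(len(cube))        # 8
print(cube[()])         # {(): 95}
print(cube[("time",)])  # {('Q1',): 70, ('Q2',): 25}
```

With a time hierarchy of four levels (for example day, month, quarter, year), two address levels, and one item level, `cuboid_count([4, 2, 1])` gives 30 rather than 8. This growth is why partial materialization and the selection criteria listed above matter.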

2.4 On-Line Analytical Processing (OLAP)

• Fast online processing of data cubes or multidimensional databases
• OLAP operations:
  – Drilling
  – Pivoting
  – Slicing and dicing
  – Filtering, etc.
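Slicing, dicing, filtering, and pivoting can each be shown as a small transformation of cube cells. This is a minimal sketch: the cube is assumed to be a plain dict from (address, time, item) to sales, and all values are hypothetical. Real OLAP engines run these operations on far larger, indexed structures.

```python
# Hypothetical 3-D cube cells: (address, time, item) -> sales.
cube = {
    ("Taipei", "Q1", "phone"): 30, ("Taipei", "Q2", "pc"): 25,
    ("Tokyo", "Q1", "phone"): 40, ("Tokyo", "Q1", "pc"): 15,
    ("Osaka", "Q2", "phone"): 5,
}

# Slice: fix one dimension (time = Q1), leaving a 2-D view over (address, item).
slice_q1 = {(a, i): v for (a, t, i), v in cube.items() if t == "Q1"}

# Dice: select a sub-cube by conditions on two or more dimensions.
dice = {k: v for k, v in cube.items() if k[0] in {"Taipei", "Tokyo"} and k[2] == "phone"}

# Filter: keep only cells whose measure satisfies a condition.
big_cells = {k: v for k, v in cube.items() if v >= 25}

def pivot(cells):
    """Rotate a 2-D view {(row, col): v} into nested form keyed column-first: {col: {row: v}}."""
    rotated = {}
    for (row, col), v in cells.items():
        rotated.setdefault(col, {})[row] = v
    return rotated

# Items become the outer axis, cities the inner one.
by_item = pivot(slice_q1)
```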

2.4 On-Line Analytical Processing (cont'd)

• A multidimensional, logical view of the data
• Interactive analysis of the data (drill, pivot, slice and dice, filter) and quick response to OLAP queries
• Summarization and aggregation at every dimension intersection
• Retrieval and display of data in 2-D or 3-D cross-tabs, charts, and graphs, with easy pivoting of the axes
• Analytical modeling: deriving ratios, variance, etc., involving data across many dimensions
• Forecasting, trend analysis, and statistical analysis
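The "analytical modeling" bullet can be illustrated with two derived measures computed over a 1-D slice of a cube. The sketch reads "variance" as statistical variance, which is an assumption: in budgeting contexts, the term often means actual minus plan instead. The regions and figures are hypothetical.

```python
from statistics import pvariance

# Hypothetical sales per region for one quarter (a 1-D slice of the cube).
sales_by_region = {"North": 120.0, "Central": 80.0, "South": 200.0}

total = sum(sales_by_region.values())

# Derived ratio: each region's share of the total.
share = {region: amount / total for region, amount in sales_by_region.items()}

# Population variance of sales across regions (assumed meaning of "variance").
spread = pvariance(list(sales_by_region.values()))
```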

3. System Features

3.1 Supported Data Sources

• ODBC-compatible DBMSs
  – Oracle, Microsoft SQL Server, MySQL, IBM DB2, etc.
• Files
  – MS Access, MS Excel, etc.
  – Text files (CSV format)
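For the CSV case, Python's standard csv module is enough to turn a text export into records. This sketch assumes a hypothetical header of region, quarter, and sales, and reads from an in-memory string so that it is self-contained. In practice the stream would be a file opened with `newline=""`. ODBC sources would need a separate driver library, which is not shown.

```python
import csv
import io

# Stand-in for a CSV export file (hypothetical columns and values).
CSV_TEXT = """region,quarter,sales
North,Q1,120
South,Q1,200
North,Q2,95
"""

def load_csv(stream):
    """Read a CSV source into a list of dicts keyed by the header row."""
    rows = []
    for rec in csv.DictReader(stream):
        rec["sales"] = float(rec["sales"])  # CSV values arrive as strings
        rows.append(rec)
    return rows

rows = load_csv(io.StringIO(CSV_TEXT))
```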

3.2 Data Cleansing

• Database schema translation
  – Field selection and mapping
  – Field renaming
  – Field aggregation and derivation
• Data filtering
• Data value conversion
  – Data value mapping
  – Data value functions
  – Date value conversion and decomposition
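Each cleansing step listed above can be mapped onto a few lines of Python. This is only a sketch: the raw field names (CUST_NM, GNDR, QTY, PRICE, TXN_DT), the gender codes, and the zero-quantity filter rule are all hypothetical.

```python
from datetime import datetime

# Hypothetical raw records from an operational system.
raw = [
    {"CUST_NM": " alice ", "GNDR": "F", "QTY": "3", "PRICE": "9.5", "TXN_DT": "2015-12-14"},
    {"CUST_NM": "bob", "GNDR": "M", "QTY": "0", "PRICE": "4.0", "TXN_DT": "2015-11-02"},
]

FIELD_MAP = {"CUST_NM": "customer", "GNDR": "gender"}  # field selection, mapping, renaming
GENDER_MAP = {"F": "Female", "M": "Male"}              # data value mapping

def cleanse(rec):
    out = {new: rec[old] for old, new in FIELD_MAP.items()}
    out["customer"] = out["customer"].strip().title()   # data value function
    out["gender"] = GENDER_MAP.get(out["gender"], "Unknown")
    # Field derivation: a new measure computed from two source fields.
    out["amount"] = int(rec["QTY"]) * float(rec["PRICE"])
    # Date conversion and decomposition into year, month, and day attributes.
    dt = datetime.strptime(rec["TXN_DT"], "%Y-%m-%d")
    out.update(year=dt.year, month=dt.month, day=dt.day)
    return out

# Data filtering: drop zero-quantity records, then cleanse the rest.
clean = [cleanse(r) for r in raw if int(r["QTY"]) > 0]
```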

3.3 Building the Data Cube

• Support for multidimensional data
• Support for concept hierarchies

3.5 Interactive Front-end Tools

• User-defined dimensions
• User-defined dimension hierarchies
• User-defined data granularity
• Real-time graphing capabilities
  – Bar charts
  – Pie charts
  – Line charts

3.6 Other Features

• Web-based OLAP GUI
  – Easy to access from the Internet
• Easy to integrate with other systems
  – Import/export capability

4. Demo

5. Discussions

5.1 Roadmap

• Integration with data mining
  – Major customer group and sales analysis
  – Prospect (potential purchase) analysis and forecasting
  – Association analysis of customers and sales
  – Market segmentation recommendations
  – Other business intelligence applications
• Integration with e-marketing
  – 1-to-1 personalization and recommendation
  – Target marketing
  – Customer loyalty programs