Introduction to Data Warehousing (資料倉儲介紹): Data Warehousing and OLAP, Prof. 楊立偉, Department of Business Administration, National Taiwan University
1
Introduction to Data Warehousing
Data Warehousing and OLAP
Prof. 楊立偉, Department of Business Administration, National Taiwan University
2
Agenda
1. Introduction
2. Data Warehouse Theory
3. System Features
4. Demo
5. Discussions
3
1. Introduction
4
1.1 Introduction
• A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions
5
1.1 Introduction (cont'd)
How are organizations using data warehouses?
1. Increasing customer focus, which includes the analysis of customer buying patterns
2. Repositioning products and managing product portfolios by comparing the performance of sales by time or region, in order to fine-tune production strategies
3. Analyzing operations and looking for sources of profit
4. Managing customer relationships, making environmental corrections, and managing the cost of corporate assets
6
1.2 Data Warehouse Characteristics
• It is a database designed for analytical tasks, using data from multiple applications
• It supports a relatively small number of users with relatively long interactions
• Its usage is read-intensive
• Its content is periodically updated
7
1.2 Data Warehouse Characteristics (cont'd)
• It contains current and historical data to provide a historical perspective of information
• It contains a few large tables
• Each query frequently results in a large result set and involves frequent full table scans and multi-table joins
8
1.3 Data Warehousing
• The process of constructing and using data warehouses
[Figure: flow from heterogeneous data sources through data cleaning and data integration and consolidation (constructing the data warehouse) to interactive analysis and making strategic decisions (using the data warehouse)]
9
1.4 Three-tier System Architecture
[Figure: operational DBMS, data warehouse server, and OLAP tools; the users shown are executives or decision-making staff, and IT or data warehouse administrators]
10
2. Data Warehouse Theory
11
2.1 Data Warehouse Theory
• Why not use the database directly?
– The query-driven approach is inefficient.
– Potentially expensive for frequent queries.
• Use a data warehouse instead
– The update-driven approach is enough for making strategic decisions.
– Keep the operational DBMS separate for daily and critical operations.
12
2.2 Data Cube
• A multidimensional, logical view of the data
• Concept hierarchy
– Multiple data granularity
– Data summarization
– Data generalization
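A minimal sketch of a concept hierarchy, assuming an address dimension with the levels city, country, and all. The sales rows, the `PARENT` table, and the `generalize` and `summarize` helpers are hypothetical names chosen for illustration; they are not part of the system described in these slides.

```python
from collections import defaultdict

# Hypothetical sales rows: (city, units sold).
SALES = [("Taipei", 10), ("Tainan", 5), ("Tokyo", 9), ("Osaka", 3)]

# Assumed concept hierarchy for the address dimension, finest level first:
# city -> country -> all.
PARENT = {
    "Taipei": "Taiwan", "Tainan": "Taiwan",
    "Tokyo": "Japan", "Osaka": "Japan",
    "Taiwan": "ALL", "Japan": "ALL",
}
LEVELS = ["city", "country", "all"]


def generalize(city, level):
    """Replace a city by its ancestor at the requested hierarchy level."""
    value = city
    for _ in range(LEVELS.index(level)):
        value = PARENT[value]
    return value


def summarize(sales, level):
    """Sum units sold at the chosen granularity of the address dimension."""
    totals = defaultdict(int)
    for city, units in sales:
        totals[generalize(city, level)] += units
    return dict(totals)


by_country = summarize(SALES, "country")
grand_total = summarize(SALES, "all")
```

Generalization replaces each value by its ancestor, summarization adds up the measure for rows that now share a value, and the `level` argument selects the granularity.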
13
• A 3-dimensional data cube
14
• Drill-down on time data for Q1
• Roll-up on address
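The two operations named on this slide can be sketched as follows. This is an illustrative example only: the fact rows and the city-to-country and month-to-quarter mappings are assumptions, since the figure's actual data is not in the transcript.

```python
from collections import defaultdict

# Hypothetical facts at the finest granularity: (city, month, units sold).
FACTS = [
    ("Taipei", "Jan", 10),
    ("Taipei", "Feb", 12),
    ("Kaohsiung", "Mar", 4),
    ("Tokyo", "Jan", 9),
    ("Tokyo", "Apr", 6),
]

# Assumed concept hierarchies: city -> country and month -> quarter.
CITY_TO_COUNTRY = {"Taipei": "Taiwan", "Kaohsiung": "Taiwan", "Tokyo": "Japan"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}


def roll_up_on_address(facts):
    """Climb the address hierarchy: aggregate cities into countries, per quarter."""
    totals = defaultdict(int)
    for city, month, units in facts:
        totals[(CITY_TO_COUNTRY[city], MONTH_TO_QUARTER[month])] += units
    return dict(totals)


def drill_down_on_quarter(facts, quarter):
    """Step down the time hierarchy: break one quarter into its months, per city."""
    totals = defaultdict(int)
    for city, month, units in facts:
        if MONTH_TO_QUARTER[month] == quarter:
            totals[(city, month)] += units
    return dict(totals)


rolled_up = roll_up_on_address(FACTS)
q1_by_month = drill_down_on_quarter(FACTS, "Q1")
```

Roll-up moves to a coarser level and produces fewer, larger cells; drill-down moves to a finer level and exposes the detail behind one cell.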
15
• Adding a supplier dimension
16
2.3 Efficient Data Cube Computation
• The challenge: 2^N combinations
– Concept hierarchies and aggregations make it more complicated!
• Materialization of the data cube (how to implement it)
– Materialize every, none, or some?
– Algorithms for selection
• Based on size
• Based on sharing
• Based on access frequency
[Figure: lattice of cuboids]
Address, Time, Item
Address, Time | Address, Item | Time, Item
Address | Time | Item
ALL
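To make the 2^N count concrete, the sketch below enumerates every group-by combination (cuboid) for the three dimensions in the lattice above. The `cuboids` function name is an assumption made for this example; it uses only the standard `itertools.combinations`.

```python
from itertools import combinations

DIMENSIONS = ("Address", "Time", "Item")


def cuboids(dimensions):
    """List every group-by combination, from the base cuboid down to ALL."""
    lattice = []
    for size in range(len(dimensions), -1, -1):
        for combo in combinations(dimensions, size):
            lattice.append(combo)
    return lattice


LATTICE = cuboids(DIMENSIONS)

# Print one lattice level per line; the empty combination is the ALL cuboid.
for level in range(len(DIMENSIONS), -1, -1):
    row = [", ".join(c) if c else "ALL" for c in LATTICE if len(c) == level]
    print(" | ".join(row))
```

With N = 3 this yields 8 = 2^3 cuboids and prints the four levels of the lattice. When each dimension also has a concept hierarchy, a dimension with L levels contributes L + 1 choices instead of 2, which is why full materialization quickly becomes impractical.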
17
2.4 On-Line Analytical Processing (OLAP)
• Fast on-line processing of data cubes or multi-dimensional databases
• OLAP operations:
– Drilling
– Pivoting
– Slicing and dicing
– Filtering, etc.
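A small sketch of slicing, dicing, and pivoting on a cube held as a Python dictionary. The cell values, the `AXES` tuple, and the three helper functions are hypothetical and only illustrate the operations; a real OLAP engine would work on stored or precomputed aggregates.

```python
# Hypothetical cube cells: (address, time, item) -> units sold.
CUBE = {
    ("Taipei", "Q1", "TV"): 22,
    ("Taipei", "Q2", "TV"): 7,
    ("Taipei", "Q1", "PC"): 4,
    ("Tainan", "Q1", "PC"): 13,
    ("Tainan", "Q2", "TV"): 6,
}
AXES = ("address", "time", "item")


def slice_cube(cube, axis, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    i = AXES.index(axis)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets per axis."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[AXES.index(axis)] in values for axis, values in allowed.items())
    }


def pivot(table):
    """Pivot: swap the two axes of a 2-D cross-tab keyed by (row, column)."""
    return {(col, row): v for (row, col), v in table.items()}


q1 = slice_cube(CUBE, "time", "Q1")
tv_in_taipei = dice_cube(CUBE, {"address": {"Taipei"}, "item": {"TV"}})
q1_pivoted = pivot(q1)
```

A slice reduces the cube by one dimension, a dice returns a smaller sub-cube with all dimensions intact, and a pivot changes only how the same cells are laid out.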
18
2.4 On-Line Analytical Processing (cont'd)
• A multidimensional, logical view of the data.
• Interactive analysis of the data (drill, pivot, slice and dice, filter) and quick response to OLAP queries.
• Summarization and aggregation at every dimension intersection.
• Retrieval and display of data in 2-D or 3-D cross-tabs, charts, and graphs, with easy pivoting of the axes.
• Analytical modeling: deriving ratios, variance, etc., involving data across many dimensions.
• Forecasting, trend analysis, and statistical analysis.
19
3. System Features
20
3.1 Data sources supported
• ODBC-compatible DBMS
– Oracle, Microsoft SQL Server, MySQL, IBM DB2, etc.
• Files
– MS Access, MS Excel, etc.
– Text files (CSV format)
21
3.2 Data Cleansing
• Database schema translation
– Field selection and mapping
– Field renaming
– Field aggregating and deriving
• Data filtering
• Data value conversion
– Data value mapping
– Data value function
– Date value conversion and decomposition
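The cleansing steps listed above can be sketched on in-memory rows as follows. The source field names, the mapping tables, and the `cleanse` function are assumptions for illustration; the slides do not show how the actual system configures these steps.

```python
from datetime import date

# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"cust_nm": "Lin", "sex": "M", "qty": "2", "price": "150", "ord_dt": "2004-02-17"},
    {"cust_nm": "Chen", "sex": "F", "qty": "0", "price": "90", "ord_dt": "2004-05-03"},
]

# Field selection, mapping, and renaming (assumed target names).
FIELD_MAP = {"cust_nm": "customer", "sex": "gender", "ord_dt": "order_date"}
# Data value mapping.
GENDER_MAP = {"M": "male", "F": "female"}


def cleanse(row):
    """Apply schema translation and value conversion to one source row."""
    out = {FIELD_MAP[name]: row[name] for name in FIELD_MAP}
    out["gender"] = GENDER_MAP[out["gender"]]
    # Field deriving: compute an amount from two source fields.
    out["amount"] = int(row["qty"]) * int(row["price"])
    # Date value conversion and decomposition into year, quarter, and month.
    d = date.fromisoformat(out["order_date"])
    out["year"], out["quarter"], out["month"] = d.year, (d.month - 1) // 3 + 1, d.month
    return out


# Data filtering: drop rows with a zero quantity before loading.
CLEAN_ROWS = [cleanse(r) for r in SOURCE_ROWS if int(r["qty"]) > 0]
```

Decomposing the date at load time means the time dimension of the cube can later be rolled up or drilled down without parsing dates again.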
22
3.3 Building of Data Cube
• Support for multi-dimensional data
• Support for concept hierarchy
23
3.5 Interactive Front-end Tools
• User-defined dimensions
• User-defined dimension hierarchy
• User-defined data granularity
• Real-time graph capabilities
– Bar chart
– Pie chart
– Line chart
24
3.6 Other features
• Web-based OLAP GUI
– Easy to access from the Internet
• Easy to integrate with other systems
– Import / export capability
25
4. Demo
26
5. Discussions
27
5.1 Roadmap
• Integration with data mining
– Major customer group / sales analysis
– Prospect analysis and forecasting
– Association analysis of customers and sales
– Market segment recommendation
– Other business intelligence applications
• Integration with e-marketing
– 1-to-1 personalization and recommendation
– Target marketing
– Loyalty programs