09. Data Warehouse (DW) & On-line Analytic Processing (OLAP)
Rev: Feb, 2013
Euiho (David) Suh, Ph.D.
POSTECH Strategic Management of Information and Technology Laboratory (POSMIT: http://posmit.postech.ac.kr)
Dept. of Industrial & Management Engineering, POSTECH
Contents
1 Data Warehouse
  1) Introduction of Data Warehouse
  2) Concepts for Data Warehouse
  3) Difficulties and Trends
2 On-line Analytic Processing (OLAP)
  1) Introduction of OLAP
  2) Concepts for OLAP
3 Case Study
1. Data Warehouse
1) Introduction of Data Warehouse
Definition of Data Warehouse
■ Data Warehouse
– Stores static data that has been extracted from other databases in an organization
– Central source of data that has been cleaned, transformed, and cataloged
– Data is used for data mining, analytical processing, analysis, research, and decision support
■ A data warehouse is a collection of data in support of management’s decisions
– Integrated
– Non-volatile
– Time variant
[Figure: scattered information (Sales, HR, Cost, Finance, Bond, Customer) is cleaned, stored in the data warehouse, then queried and distributed to end users]
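The flow in the figure (scattered information, cleaning, one warehouse, query) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, their field names, and the sample values are hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical extracts from two operational systems (illustrative data only).
sales_rows = [{"cust": " Kim ", "amt": "120.5"}, {"cust": "LEE", "amt": "80"}]
finance_rows = [{"customer_name": "kim", "amount": 40.0}]

def clean(name):
    """Normalize a customer name so rows from different sources match."""
    return name.strip().title()

def load_warehouse(snapshot_day):
    """Integrate both sources into uniform rows stamped with the load date."""
    rows = []
    for r in sales_rows:
        rows.append({"customer": clean(r["cust"]),
                     "amount": float(r["amt"]),
                     "loaded_on": snapshot_day})
    for r in finance_rows:
        rows.append({"customer": clean(r["customer_name"]),
                     "amount": float(r["amount"]),
                     "loaded_on": snapshot_day})
    return rows

# The warehouse is only appended to by loads (non-volatile); each row keeps
# its load date (time variant).
warehouse = load_warehouse(date(2013, 2, 1))

# Query across the integrated data: total amount for one customer.
total_kim = sum(r["amount"] for r in warehouse if r["customer"] == "Kim")
print(total_kim)
```

Once the names are normalized, a single query can combine rows that originated in different systems, which is the point of the "integrated" property.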
Data Warehouse Architecture
■ Data Warehouse architecture
[Figure: data warehouse architecture in two parts. Building the data warehouse (infrastructure, data integration and administration): source data (external files, OLTP systems, backup files) is moved via SQL into the data warehouse and data marts (RDB, MDB) on enterprise and workgroup servers. Use of the data warehouse (application development, data access and use): query and reporting tools, OLAP tools (slice/dice), data mining applications, EIS/DSS applications, and web browsers]
Data Warehouse Architecture
■ Technical architecture for a data warehousing system
[Figure: components of a data warehousing system: Design Component, Data Acquisition Component, Data Manager Component, Management Component, Information Directory Component, Data Access Component, Middleware Component, and Data Delivery Component; data stores: source data, external data, external metadata, warehouse data, and warehouse metadata]
2) Concepts for Data Warehouse
Introduction of Database
■ Definition of database
– Integrated collection of logically related data elements
■ Common Database Structures (Types)
– Hierarchical
  • Early DBMS structure
  • Records arranged in tree-like structure
  • Relationships are one-to-many
– Network
  • Used in some mainframe DBMS packages
  • Many-to-many relationships
– Relational
  • Most widely used structure
  • Data elements are stored in tables
  • Row represents a record; column is a field
  • Can relate data in one file with data in another, if both files share a common data element
– Multidimensional
  • Variation of relational model
  • Uses multidimensional structures to organize data
  • Data elements are viewed as being in cubes
  • Popular for analytical databases that support Online Analytical Processing (OLAP)
– Object-Oriented
  • Stores data together with the appropriate methods for accessing it, i.e. encapsulation
  • Information is represented in the form of objects as used in object-oriented programming
[Figures: Relational Structure; Object-Oriented Structure]
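The relational point above (relating data in one table with data in another through a shared data element) can be illustrated with Python's built-in `sqlite3` module. The tables `department` and `employee`, their columns, and the sample rows are hypothetical; the shared element here is `dept_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employee VALUES (10, 'Kim', 1), (11, 'Lee', 2), (12, 'Park', 1);
""")

# Each row is a record and each column is a field; the join relates the two
# tables through the common data element dept_id.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```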
Metadata and Data Marts
■ Metadata
– Data about data (similar to a catalog card in a library)
– Defines the data in the data warehouse
– Makes it easier and faster to find data in the data warehouse
■ Data Marts
– Collection of databases
– Compared with a data warehouse, data marts are usually smaller and focus on a particular subject or department
– Data marts are subsets of a larger data warehouse
■ Data Warehouse vs. Data Mart
– Data in Data Warehouse
  • The data needs to be gathered from all the relevant transactional systems that produce it, cleansed and validated, and made available from a system-of-record that ensures the referential integrity of the data
– Data in Data Mart
  • The data needs to be presented in a structure that is intuitive to the users and facilitates their ability to query the data that is relevant to their needs
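A minimal sketch of both ideas, assuming a toy in-memory warehouse: the metadata catalog describes each column so it can be found, and a data mart is derived as a department-specific subset. All names (`warehouse`, `metadata`, `find_columns`, `build_data_mart`) and values are hypothetical.

```python
# Hypothetical warehouse rows covering several departments.
warehouse = [
    {"dept": "Sales", "region": "East", "amount": 100},
    {"dept": "Sales", "region": "West", "amount": 150},
    {"dept": "Finance", "region": "East", "amount": 70},
]

# Hypothetical metadata catalog: "data about data" for each warehouse column.
metadata = {
    "dept": "Department that owns the record",
    "region": "Sales region of the transaction",
    "amount": "Transaction amount",
}

def find_columns(keyword):
    """Return columns whose description mentions the keyword (case-insensitive)."""
    return [col for col, desc in metadata.items() if keyword.lower() in desc.lower()]

def build_data_mart(rows, dept, columns):
    """Keep only one department's rows and only the columns its users need."""
    return [{c: r[c] for c in columns} for r in rows if r["dept"] == dept]

sales_mart = build_data_mart(warehouse, "Sales", ["region", "amount"])
print(find_columns("department"))
print(sales_mart)
```

The mart is smaller than the warehouse and shaped for one audience, while the warehouse remains the system of record.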
Information Flow
■ Data Warehouse built on top of DB
[Figure: internal / external databases feed the data warehouse, which is described by a metadata repository and in turn feeds data marts (Finance, Management Reporting, Accounting, Sales, Marketing)]
Data Warehouse Components
■ Data Warehouse Components
[Figure: data warehouse components]
Applications and Data Marts
■ Applications and Data Marts
[Figure: applications and data marts]
3) Difficulties and Trends
Difficulties in implementing DW
■ Complete Alignment
– Make sure you have full involvement and buy-in from those who represent your users, the consumers of your data warehouse.
■ Iterative & Frequent Update
– Consider all aspects of the process: researching your data sources, capturing and transmitting that data to the data warehouse, transforming and loading it into the data warehouse, and accounting for its lineage.
■ Risk
– Make sure you develop a proper risk management plan.
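"Accounting for lineage" can be made concrete by recording each processing step alongside the data it produced. The sketch below is one possible way to do this, not a method taken from the slides; the `Record` class, the step functions, and the source name `billing_system` are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    value: float
    lineage: list = field(default_factory=list)  # ordered processing history

def capture(raw, source_name):
    """Capture a raw value and note which source system it came from."""
    rec = Record(value=float(raw))
    rec.lineage.append(f"captured from {source_name}")
    return rec

def transform(rec, factor):
    """Apply a unit conversion and record that it happened."""
    rec.value = rec.value * factor
    rec.lineage.append(f"multiplied by {factor}")
    return rec

def load(rec, warehouse):
    """Append the record to the warehouse and close its lineage trail."""
    rec.lineage.append("loaded into warehouse")
    warehouse.append(rec)

warehouse = []
load(transform(capture("100", "billing_system"), 2.0), warehouse)
print(warehouse[0].value, warehouse[0].lineage)
```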
Future Trends
■ Enterprise Data Warehouse
– The enterprise data warehouse, whether a single store or integrated data marts across a variety of platforms, yields a view of the operation previously unattainable (Don Hatcher, SAS)
■ Real-time
– Organizations move to more real-time data transformation and seek to better leverage common metadata across applications (Allan Houpt, CA)
■ Capacity
– The future of data warehousing is all about ever larger data warehouses - in fact I just read about a U.S. Government effort to create petabyte repositories (Roman Bukary, SAP Director of Market Strategy)
2. OLAP
1) Introduction of OLAP
Definition of OLAP
■ OLAP (On-Line Analytical Processing)
– The dynamic enterprise analysis required to create, manipulate, animate and synthesize information from Enterprise Data Models
  * Providing OLAP: An IT Mandate, E.F. Codd (1993)
– FASMI (Fast Analysis of Shared Multidimensional Information)
  • This definition was first used in early 1995, and has not needed revision since
  * Pendse & Creeth (1995)
[Figure: the five FASMI terms: Fast, Analysis, Shared, Multidimensional, Information]
OLAP Architecture
■ OLAP Architecture
[Figure: OLAP architecture]
2) Concepts for OLAP
From OLTP to OLAP
■ Data used in OLAP
– Sales data of June? (OLTP)
– Multi-dimensional data (having many features) (OLAP)
■ Direct Access: EUC (end-user computing) Environment
■ From What to Why
– OLTP: Storing primitive data, supporting routine business operation (What)
– OLAP: Storing cumulative data, supporting business goal (Why)
[Figure: Information Source, Information Broker, Information Consumer]
OLTP vs. OLAP
■ OLTP vs. OLAP
                OLTP                            OLAP
Definition      On-Line Transaction Processing  On-Line Analytical Processing
Objective       Operational                     Analytical
Focus           Daily repetitious work          Decision support in organization
Developer       Computer expert                 End-user
User            Simple operator                 Special analyst
Storing         Current value                   Summarized and consolidated data
Use             Repetitive                      Unstructured
Response        Immediate                       Delayed
Data            Updated                         Summarized
Update          Field                           Recomputation
Amount of Data  Small                           Much
Data Structure  Complex                         Simple
Database        RDB                             MDB
Data period     Past, Current                   Past, Current, Future
Query type      Regular                         Irregular, Analytical
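The contrast in the table can be shown with two queries against the same data. This is a sketch under assumptions: the `orders` table, its columns, and its rows are hypothetical, and a single SQLite database stands in for both kinds of system purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, month TEXT, product TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "2013-05", "A", 100.0),
    (2, "2013-06", "A", 50.0),
    (3, "2013-06", "B", 70.0),
    (4, "2013-06", "A", 30.0),
])

# OLTP-style work: update one field of one current record, then read it back.
conn.execute("UPDATE orders SET amount = 55.0 WHERE order_id = 2")
one_order = conn.execute("SELECT amount FROM orders WHERE order_id = 2").fetchone()

# OLAP-style work: summarize many records along two dimensions (month, product).
summary = conn.execute(
    "SELECT month, product, SUM(amount) FROM orders "
    "GROUP BY month, product ORDER BY month, product"
).fetchall()
print(one_order)
print(summary)
```

The first query touches one record and answers "what is the value"; the second consolidates all records and has to be recomputed when the underlying data changes.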
Enterprise IT Architecture
■ OLTP/OLAP Enterprise IT Architecture
[Figure: OLTP/OLAP enterprise IT architecture]
Data Warehouse vs. OLAP Server
■ Data Warehouse vs. OLAP Server
                    Data Warehouse                    OLAP Server
Objective           Ready for all kinds of retrieval  Specialized retrieval
Characteristics     Data Storage                      Computation Engine
Query Type          Read only                         Read/Write
Response            Flexible                          Consistent, rapid
Content             Historical, present               Historical, present, future
Data Structure      Plain                             Multi-dimensional
Amount of Data      Huge, much detail                 Much, detail
Development period  A few months, years               A few weeks, months
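Two types of OLAP
■ MOLAP (Multidimensional OLAP)
■ ROLAP (Relational OLAP)
[Figure: in MOLAP, clients send queries to, and receive responses from, multidimensional (MD) processing on top of an MDBMS; in ROLAP, clients send queries to an MD processing layer that accesses an RDBMS through SQL and returns the responses]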
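The ROLAP arrangement can be sketched as a thin MD-processing function that turns a dimensional request into SQL for a relational database. This is an assumption-laden illustration, not a description of any particular product: the `sales` table, the dimension names, and the `md_query` helper are hypothetical.

```python
import sqlite3

# Hypothetical dimension columns the MD layer is allowed to group by.
ALLOWED_DIMENSIONS = {"region", "product", "year"}

def md_query(conn, dimensions):
    """Translate 'total sales by <dimensions>' into a SQL GROUP BY query."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError("unknown or missing dimension")
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("East", "A", 2012, 10.0),
    ("East", "B", 2012, 20.0),
    ("West", "A", 2012, 5.0),
    ("East", "A", 2013, 15.0),
])

by_region = md_query(conn, ["region"])
by_region_year = md_query(conn, ["region", "year"])
print(by_region)
print(by_region_year)
```

Column names are checked against a whitelist before being placed in the SQL string, since identifiers cannot be passed as bound parameters.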
From RDB to MDB
■ Basic Data Structure of MDB & RDB
– RDB: OLTP, Data Warehouse
  • Table, Field, Row, Record, Column
– MDB: OLAP
  • Cube, Dimension, Hierarchy
■ RDB as OLAP Server
– Cannot handle and represent multi-dimensional relationships well
– Cannot summarize data well
■ MDB as OLAP Server
– Gives many managerial viewpoints
– EUC
– Supports analysis functionality
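The MDB vocabulary (cube, dimension, hierarchy) can be illustrated with a plain dictionary keyed by dimension values. This is a toy sketch, not how an MDBMS stores data; the dimensions (month, city, product), the month-to-quarter hierarchy, and all cell values are hypothetical.

```python
from collections import defaultdict

# Cube cells keyed by (month, city, product); all values are illustrative.
DIMS = ("month", "city", "product")
cube = {
    ("2013-01", "Seoul", "A"): 10,
    ("2013-02", "Seoul", "A"): 20,
    ("2013-01", "Pohang", "A"): 5,
    ("2013-01", "Seoul", "B"): 7,
}

def month_to_quarter(month):
    """Hierarchy on the time dimension: a month rolls up to its quarter."""
    year, m = month.split("-")
    return f"{year}-Q{(int(m) - 1) // 3 + 1}"

def slice_cube(cells, dim, value):
    """Fix one dimension to a single value and drop it from the cell key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def roll_up_time(cells):
    """Aggregate cells from month level up to quarter level."""
    out = defaultdict(int)
    for (month, city, product), v in cells.items():
        out[(month_to_quarter(month), city, product)] += v
    return dict(out)

product_a = slice_cube(cube, "product", "A")
quarterly = roll_up_time(cube)
print(product_a)
print(quarterly)
```

Slicing and rolling up give the "many managerial viewpoints" named above without writing a new query for each view.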
Reference
■ Euiho Suh, “EIS_DSS_OLAP_DW (PPT Slide)”, POSMIT Lab. (POSTECH Strategic Management of Information and Technology Laboratory)
■ Euiho Suh, “OLAP (PPT Slide)”, POSMIT Lab. (POSTECH Strategic Management of Information and Technology Laboratory)
■ O’Brien & Marakas, “Introduction to Information Systems – Sixteenth Edition”, McGraw-Hill, Chapter 5