dw concepts ii

Upload: asurianand

Post on 13-Apr-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 DW Concepts II

    1/52

    Data Warehousing.

    Data Warehouse basic concepts

    Data Warehouse Approach

    Data Modeling concepts

    OLAP (Online Analytical Processing)

    Data Warehouse Implementation

    Next steps in Data Warehousing

    March 2005

  • 7/27/2019 DW Concepts II

    2/52

    Data Warehouse- Concepts

    Module 1

    Data Warehouse basic concepts

  • 7/27/2019 DW Concepts II

    3/52

    What is DSS?

    Decision Support System

    Mainly used by business to take some strategic decisions basedon the trends (comparing current fiscal to previous) and projectthe numbers based on history and some parameters

    Not to run the business, OLTP systems takes care of the day today activities of a business. Example SAP Order Managementtakes care of the orders which an organization gets. In the DSSwe collect all the data to do the analysis.

  • 7/27/2019 DW Concepts II

    4/52

    Advantages of DSS

    A grocery store chain in US gives various information from DSSdirectly to store manager. Example, the system can predict the aparticular stock outage in the store. Based on the history systemknows for every 3 hours there should be sale on one particularitem, if the DSS system did not see a transaction from last 3hours it sends an SMS to current shifts manager mobile. Thatsthe level you can go with the DSS. It takes time to get there.(Now this DW is called Active DW)

    One of the retail major does the customer profiling, store salesanalysis etc on the data warehouse, its implemented on

    Teradata.

  • 7/27/2019 DW Concepts II

    5/52

    OLTP

    Online Transaction processing system

    Examples of OLTP systems are order management,payroll etc

    Always follows 3

    rd

    normal form, while designing thedatabase

    All the Data Manipulation Language (DMLInsert,Update, Delete, Select) types are active

    Deal with specific data (customer x, product z etc)

  • 7/27/2019 DW Concepts II

    6/52

    OLTP vs DSS

    More DML operations(Update, Delete,Inserts)

    Point Queries

    Very specific whileissuing queries

    Less history(approximately 6months to 1 year)

    Used for day todayactivities (must to runthe business)

    No change in the data (Noupdates and deletes)

    Queries based on time

    period, set of products, set

    of customers etc

    Maintains the history.

    Used mainly for analytics

    (trend analysis, customer

    behavior etc)

  • 7/27/2019 DW Concepts II

    7/52

    General DSS Architecture

    Source Data

    OLTP 1

    OLTP 2

    Market Place

    Web clicks

    Data

    Warehouse

    Database

    Database

    Pre

    Defined

    Reports

    Ad hocReporting

    OLAP

    Cubes

    ODS

    Staging

    DB

    ETL

    (Tool or

    TSQL)

    Close the loop (write back to OLTP about the findings in DSS

    Data

    Mining

  • 7/27/2019 DW Concepts II

    8/52

  • 7/27/2019 DW Concepts II

    9/52

    Example for a DSS

    OLTP 1OLTP 2

    OLTP 3 OLTP 4

    Data

    WarehouseOLAP

    Reporting

    Analytics

  • 7/27/2019 DW Concepts II

    10/52

    Data Warehouse- Concepts

    Module 2

    Data Warehouse Approach

  • 7/27/2019 DW Concepts II

    11/52

    Distributed Approach

    Various departments can start creating

    different data marts. Each can start working

    independently and see the ROI in a short

    span. In the long run integrating these data adds

    the complexity and Cost will be higher as

    there are more systems to maintain.

  • 7/27/2019 DW Concepts II

    12/52

    Gives only part ofthe answer

    Requires time andeffort to put thepieces together

    No guarantee its

    the right answer

    Distributed Approach

    How We Are Different

  • 7/27/2019 DW Concepts II

    13/52

    Centralized Approach

    Centralized data warehouse contains the data in one place, easyto answer any business question. In the long run this has thecost advantage over the non-centralized data warehouse. Notvery easy to implement as it needs more time and resources.ROI wont be seen until the implementation is completed. So

    recommended approach is to implement the centralized datawarehouse is, start with one subject area and keep adding onesubject area at a time, this way organization will get the see theROI at various stages.

  • 7/27/2019 DW Concepts II

    14/52

    Delivers oneversion of

    the truthfor

    increasedconfidenceand speed indecision-making

    Centralized Approach to DSS

    How We Are Different

  • 7/27/2019 DW Concepts II

    15/52

    Data Warehouse- Concepts

    Module 3

    Data Modeling concepts

  • 7/27/2019 DW Concepts II

    16/52

    Data Modeling

    OLTP3rdNormal Form

    DSSDimensional Modeling

    (Star Schema

    Snow flake schema)

    Why Dimensional Modeling?

  • 7/27/2019 DW Concepts II

    17/52

    Data Modeling for OLTP

    Usually 3rdnormal form.

    Advantages : Flexibility to modify for the

    changes. No redundancy of the data in the

    model.

    Disadvantages : Complex queries to

    generate the reports as the number of tables

    to join are usually high.

  • 7/27/2019 DW Concepts II

    18/52

    Dimensional Modeling for DSS

    Star Schema, Snowflake schema

    Based on RDBMS we have to choose what type of model suitsbetter.

    Example: Teradata is an RDBMS which can give the results inreasonable time as its a parallel processing database engine in

    the market. So we can design the Enterprise data model in the3rdnormal form. But we cant have the same approach for SQLserver or Oracle, we should think of denormalizing the datamodel.

    Star Schema makes queries run faster as the number of tables tojoin is less.

    In star schema all the hierarchies defined per dimension will bestored in single table. So the data redundancy is high. In snowflake we can have one more table for the hierarchy. Thats thedifference between the star schema and snow flake schema.

  • 7/27/2019 DW Concepts II

    19/52

    Star Schema

    Star schema is optimized for queries. You will

    have some level of redundant data available

    in star schema based data model.

  • 7/27/2019 DW Concepts II

    20/52

    Star Schema (in RDBMS)

  • 7/27/2019 DW Concepts II

    21/52

    Star Schema Example

  • 7/27/2019 DW Concepts II

    22/52

    Star Schema with Sample Data

  • 7/27/2019 DW Concepts II

    23/52

    Snow flake

    Snow flake wont have much of redundantdata as most of the dimensions will have alook table. This way the number of joins

    between the tables will become more. Both have advantages and dis advantages,

    so analyze the end users requirements andspace constraints to pick the best.

  • 7/27/2019 DW Concepts II

    24/52

    The Snowflake Schema

    STORE KEYStore Dimension

    Store Description

    City

    District ID

    State

    District_ID

    District Desc.

    Region_ID

    Region_ID

    Region Desc.

    Regional Mgr.

    STORE KEY

    PRODUCT KEY

    PERIOD KEY

    Dollars

    Units

    Price

    Store Fact Table

  • 7/27/2019 DW Concepts II

    25/52

    The Snowflake Schema No LEVEL in dimension tables

    Dimension tables are normalized by decomposing at the attribute level

    Each dimension table has one key for each level of the dimensions

    hierarchy

    The lowest level key joins the dimension table to both the fact table and

    the lower level attribute table

  • 7/27/2019 DW Concepts II

    26/52

    The Snowflake Schema

    Additional features: The original Store Dimension table, completely de-

    normalized, is kept intact, since certain queries can benefit by its all-

    encompassing content.

    In practice, start with a Star Schema and create the snowflakes withqueries. This eliminates the need to create separate extracts for each

    table, and referential integrity is inherited from the dimension table.

    Disadvantage: Complicated maintenance and metadata, explosion in the number

    of tables in the database

  • 7/27/2019 DW Concepts II

    27/52

    ETL (EExtract)

    ExtractGetting data out of the source systems.This may be just a DTS package which pulls thedata, or exporting a table to a flat file in the sourcesystem.

    In Teradata we have Fast Export utility where wecan export the data to a flat file.

    In Oracle we have SQL*Loader to export the data toa flat file.

    In SQL Server we can use a DTS package to do the

    same job

  • 7/27/2019 DW Concepts II

    28/52

    ETL (TTransform)

    TransformIts not necessary to have the samedata model in source and destination. When thedata model is different from source obviously wehave to modify the source data to destinations datamodel. This process is called transformation.

    Example : When we receive data from variousdistis about the reseller information we wont get thegeo information. So in the transformation logic wewill have some code which assigns the respective

    geo based on the country from which you aregetting the data.

    This is the simple example on transformation.

  • 7/27/2019 DW Concepts II

    29/52

    ETL (LLoad)

    LoadLoding the transformed data into thedestination datamoel (data warehouse).

    As there are export functionality available in eachRDBMS there is an utility to import the data into the

    database. TeradataFast Import

    OracleSQL*Loader

    Sybase - bcp

  • 7/27/2019 DW Concepts II

    30/52

    Data Refresh in DSS

    We have to refresh the data in DSS from varioussource systems in timely manner.

    While doing so, either we should do a full refresh ofa particular table or capture only the changed data

    (this process is called delta) Usually for fact tables we go for delta refresh and for

    dimension tables we go for full refresh. As theenvironment is getting bigger and bigger almost allthe tables will become delta loads.

  • 7/27/2019 DW Concepts II

    31/52

    Data Warehouse- Concepts

    Module 4

    OLAP (Online Analytical Processing)

  • 7/27/2019 DW Concepts II

    32/52

    What is OLAP?

    What is OLAP? Online Analytical Processing. Viewing data in a

    multi dimensional way.

    Why OLAP? Slice and dice for data warehouse.

    RDBMS is a 2 dimensional way of storing /

    viewing the data OLAP is a multi dimensional way of storing /

    viewing the data

  • 7/27/2019 DW Concepts II

    33/52

    Types in OLAP?

    Three types of OLAP in the industry.

    1. MOLAPMulti dimensional OLAP (Ex

    MSOLAP, Essbase, Cognos).

    2. ROLAPRelational OLAP ( Ex Business

    Objects, Microstrategy).

    3. HOLAPHybrid OLAP

  • 7/27/2019 DW Concepts II

    34/52

    Aggregates

    sale prodId storeId date amt

    p1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44

    p1 s2 2 4

    Add up amounts for day 1

    In SQL: SELECT sum(amt) FROM SALE

    WHERE date = 1

    81

  • 7/27/2019 DW Concepts II

    35/52

    Aggregates

    Add up amounts by day

    In SQL: SELECT date, sum(amt) FROM SALE

    GROUP BY date

    ans date sum

    1 81

    2 48

    sale prodId storeId date amt

    p1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44

    p1 s2 2 4

  • 7/27/2019 DW Concepts II

    36/52

    Another Example

    Add up amounts by day, product

    In SQL: SELECT date, sum(amt) FROM SALE

    GROUP BY date, prodId

    sale prodId date amt

    p1 1 62

    p2 1 19

    p1 2 48

    drill-down

    rollup

    sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4

  • 7/27/2019 DW Concepts II

    37/52

    Example

    Produc

    t

    TimeM T W Th F S S

    Juice

    Milk

    Coke

    Cream

    Soap

    Bread

    NYSF

    LA

    10

    34

    56

    32

    12

    56

    56 units of bread sold in LA on M

    Dimensions:

    Time, Product, Store

    Attributes:

    Product (upc, price, )

    Store

    Hierarchies:

    ProductBrand

    DayWeekQuarter

    StoreRegionCountryroll-up to week

    roll-up to brand

    roll-up to region

  • 7/27/2019 DW Concepts II

    38/52

    Summary of Operations Aggregation (roll-up)

    aggregate (summarize) data to the next higher dimension element e.g., total sales by city, year total sales by region, year

    Navigation to detailed data (drill-down)

    Selection (slice) defines a subcube

    e.g., sales where city =Gainesville and date = 1/15/90

    Calculation and ranking

    e.g., top 3% of cities by average income

    Visualization operations (e.g., Pivot)

    Time functions

    e.g., time average

  • 7/27/2019 DW Concepts II

    39/52

    Data Warehouse- Concepts

    Module 3

    Data Warehouse Implementation

    Steps

  • 7/27/2019 DW Concepts II

    40/52

    Typical Approach

    Data Modeling is a cyclic process involving thefollowing steps

    Requirement Gathering

    Requirement Analysis

    Requirement Validation Logical Modeling

    Physical Design

    Implementation

    ValidationThe above cycle repeats for any upgrades or

    enhancements

  • 7/27/2019 DW Concepts II

    41/52

    Requirement Gathering

    Identify the Business objectives

    Identify the reporting requirements

    Identify the frequency of report generation

    Granularity of Information

    Business rules

  • 7/27/2019 DW Concepts II

    42/52

    Requirement Analysis

    Study the requirements captured

    Identify the subject areas

    Identify the Measures and criteria fields

    Identify the granularity of information

    required

  • 7/27/2019 DW Concepts II

    43/52

    Requirement Validation

    Validate the analysis with the customer

    Document Sign off.

  • 7/27/2019 DW Concepts II

    44/52

    Logical Modeling

    Identify facts and dimensions

    Create Logical Model

  • 7/27/2019 DW Concepts II

    45/52

    Physical Design

    Analyze Source Systems with respect to Logical Model

    Data Quality Analysis

    Physical Design

    Data type

    Indexes

    Partitioning

    Database creation etc.,

    Source to target mapping

    Capture Transformation rules

    Capture Derivation rules for derived fields

  • 7/27/2019 DW Concepts II

    46/52

    Implementation

    Database Creation

    Staging Design (Design Extraction Jobs)

    Develop ETL Jobs

    Unit testing of ETL Jobs

    Schedule Jobs Test Load

    Data Validation

    Performance monitoring

    ETL Job tuning

    Test Database performance tuning Final loading of data from source to target

  • 7/27/2019 DW Concepts II

    47/52

    Data Warehouse- Concepts

    Module 5

    Next steps in Data Warehousing

  • 7/27/2019 DW Concepts II

    48/52

    Data Mining

    Literally, the purpose is to exploit raw data clusters as if they were gold mines,that is, to look for treasures buried inside. Actually, companies are trying tocomputerize the discovery proceeding of the various existing trends within largeamounts of data.

    allowing the forecast of future behaviours.

    Mining tools provides the sophisticated algorithms to find the specific trends with

    the data available.

    Example : MS Analysis Server provides the following algorithms. (Decision Treeand Clustering etc)

  • 7/27/2019 DW Concepts II

    49/52

    Difference between OLAP and Data

    Mining

    OLAP

    Who were my 10 best

    customers last year

    What was the response

    rate to our mailing?

    Data Mining

    Which 10 customers offer

    me the greatest profit

    potential

    What is the profile ofpeople who are likely to

    respond to future

    mailings?

  • 7/27/2019 DW Concepts II

    50/52

    Business Activity Monitoring (BAM)

    BAM is the technology which is used to monitor the DW or OLTPactively for certain value.

    The system can run the set of process when it finds theexception and sends the information to relevant owners to takethe action.

    Based on the findings immediately update the relevant OLTPsystem (conceptually its called closing the loop with DSS andOLTP)

    Example - INFORAY is a BAM tool which you can use on theDW.

  • 7/27/2019 DW Concepts II

    51/52

    Q & A

  • 7/27/2019 DW Concepts II

    52/52

    Thank You