opensource dwbi_a primer

Upload: parthasarathi-doraisamy

Post on 03-Jun-2018


TRANSCRIPT

  • 8/11/2019 OpenSource DWBI_A Primer

    1/50

    OPEN SOURCE DATA WAREHOUSE/BI - A PRIMER

    Webinar session for TechGig.com

    Presenter: Parthasarathi Doraisamy

    Enterprise BIDI Solutions

  • 8/11/2019 OpenSource DWBI_A Primer

    2/50

    CLOUD - WHAT DOES THIS MEAN?

    UC Berkeley RAD Lab definition:

    1. The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning;

    2. The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs; and

    3. The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.

  • 8/11/2019 OpenSource DWBI_A Primer

    3/50

    REFERENCES/ACKNOWLEDGEMENT

    Talend, Pentaho, BIRT (Eclipse)

    Birst, Jaspersoft, Greenplum, ASA, ODW model, Gartner research analysis, TDWI

  • 8/11/2019 OpenSource DWBI_A Primer

    4/50

    WHAT IS OPEN DW/BI?

    Beware: "Open" doesn't mean the product(s) are free!

    Open DW consists of a pre-designed, prebuilt data warehouse architecture which comes free.

    It thereby reduces overall cost and risk by reducing design, development and implementation time.

    -> Reduces the consumer's initial development cost (DQ, ETL, BI & Analytics etc.)

    But the vendors charge for the related services: maintaining the DW solution, customizing it further to the exact business need, and support & maintenance of the system.

    Mitigates risk through rapid development.

    There are technical, social, and economic reasons that will move data warehousing and, perhaps, all data models toward open solutions.

  • 8/11/2019 OpenSource DWBI_A Primer

    5/50

    NEED FOR OPEN DW/BI

    Open data warehouse and BI development has progressed rapidly over the past few years, driven by the economic downturn.

    The proposed solution needs to be deployed faster because of dynamic business changes.

    Nowadays we can get an open source product for almost every aspect of the BI/data warehouse stack, including architectures, and these are picking up pace. (A few noticeable players: Talend, Pentaho, Jaspersoft, Birst, QlikView etc.)

  • 8/11/2019 OpenSource DWBI_A Primer

    6/50

    INDUSTRY STATS ON TRADITIONAL DWBI

    The average cost of these projects was $2.2 million ($3.1 million today, adjusted for inflation).

    The average payback period was 2.3 years, with over 30% experiencing a 5+ year payback period.

    The majority of respondents reported that their data warehouses consumed enormous resources and remained works in progress for extended periods of time.

  • 8/11/2019 OpenSource DWBI_A Primer

    7/50

    NEED FOR OPEN DW/BI..

    Popular open source databases which help in these open data warehouses are MySQL (and its ecosystem of add-ons), Ingres and EnterpriseDB.

    Hardware and software costs are further reduced by extending the open solution into a hosted SaaS environment.

  • 8/11/2019 OpenSource DWBI_A Primer

    8/50

    ODW MODEL - A FRAMEWORK

    The Open Data Warehouse Model (ODWM) provides a generic framework for delivering an open data warehouse.

    This generic data warehouse model can be further fine-tuned to a specific industry.

    Domain experts work on these industry-specific solutions just as they did on typical proprietary DW/BI solutions earlier, but the approach differs in certain critical aspects, such as the pre-designed Open DWBI architecture (data model, ETL design and BI design) for the industry domains concerned.

  • 8/11/2019 OpenSource DWBI_A Primer

    9/50

    ODW MODEL PRINCIPLE

    The open data model consists of hundreds of potential dimension tables with thousands of fields, which form the foundation.

    These open data warehouses are carefully designed to ensure stability of the DW system and to facilitate the use of commercial ETL bridges/connectors (yet allow for interpretation through aggregation and by other means).

    OLAP cubes and data marts can be constructed from the foundation, as required by the business, through similar bridges/connectors.

    This is an opportunity for developers in their respective technology areas (i.e. ETL, BI & Analytics) to come up with appropriate bridge solutions that seamlessly develop the entire ODW & BI model into a functional data mart or enterprise data warehouse.

  • 8/11/2019 OpenSource DWBI_A Primer

    10/50

    ODW MODEL & ITS EXTENSIONS..

    They must allow for integration of multiple data sources of different granularity and should, in some manner, accommodate slowly changing dimensions.

    Each baseline ODW DB instance model can in turn give rise to a range of domain-specific packaged solutions (we can call each one an IndustrySlice). These packages may comprise DQ, ETL and BI solutions as outlined earlier.

    These packaged, domain-specific ODW solution(s) can be hosted in the cloud.

    These hosted Open DWBI solutions lead us to packaged data warehouse/BI appliances.
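    The slide asks the model to accommodate slowly changing dimensions without saying how. One common approach is a Type 2 change, where the old row is closed and a new version is added so history is preserved. The sketch below is only an illustration: the CustomerRow fields, the apply_scd2 helper and the sample data are hypothetical and are not part of any ODW model named in this deck.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    surrogate_key: int
    customer_id: str          # natural/business key from the source system
    city: str                 # the attribute whose history we track
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_scd2(rows: List[CustomerRow], customer_id: str,
               new_city: str, change_date: date) -> List[CustomerRow]:
    """Return a new row list with a Type 2 change applied."""
    current = next((r for r in rows
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.city == new_city:
        return list(rows)  # attribute unchanged, history stays as it is

    result = []
    for r in rows:
        if r is current:
            # Close the old version instead of overwriting it.
            result.append(replace(r, valid_to=change_date))
        else:
            result.append(r)
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    result.append(CustomerRow(next_key, customer_id, new_city, change_date, None))
    return result


# Hypothetical sample data: one customer who moves city.
history = [CustomerRow(1, "C-100", "Chennai", date(2010, 1, 1), None)]
history = apply_scd2(history, "C-100", "Bengaluru", date(2011, 6, 1))
for row in history:
    print(row)
```

    Facts loaded before the change keep pointing at the old surrogate key, so reports on past periods still show the city that was valid at the time.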

  • 8/11/2019 OpenSource DWBI_A Primer

    11/50

    OPEN DATAWAREHOUSE/BI APPLIANCE


  • 8/11/2019 OpenSource DWBI_A Primer

    12/50

    OPEN DWBI APPLIANCES

    The Open DWBI appliance combines and supports thousands of data warehouses, many of them with hundreds of millions of records, in a scalable multi-tenant environment.

    These appliances are capable of generating complex data models, with complex algorithms built into their query engine.

    Appliance vendors tie up with hardware suppliers to build the appliance so that it performs at maximum efficiency.

  • 8/11/2019 OpenSource DWBI_A Primer

    13/50

    OPEN DWBI APPLIANCES

    These appliances are designed to power an on-demand software solution that needs to support a large number of users simultaneously and has the ability to quickly increase capacity.

    They are built on a shared-nothing architecture: no data is shared across nodes (servers).

    Popular appliances include Netezza and Greenplum.
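    To make the shared-nothing idea concrete, here is a minimal sketch in which each row lives on exactly one node, chosen by hashing a distribution key, and a query is answered by combining per-node partial results. It is not how any particular appliance is implemented; NODE_COUNT, node_for, distribute and the sample sales rows are all hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

NODE_COUNT = 4  # hypothetical cluster size

Row = Tuple[str, float]  # (customer_id, sale amount)


def node_for(key: str, node_count: int = NODE_COUNT) -> int:
    """Map a distribution key to exactly one node using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def distribute(rows: List[Row]) -> Dict[int, List[Row]]:
    """Place each row on the single node that owns its key."""
    nodes: Dict[int, List[Row]] = defaultdict(list)
    for customer_id, amount in rows:
        nodes[node_for(customer_id)].append((customer_id, amount))
    return nodes


def total_sales(nodes: Dict[int, List[Row]]) -> float:
    """Each node sums only its local rows; a coordinator adds the partial sums."""
    partials = [sum(amount for _, amount in local_rows)
                for local_rows in nodes.values()]
    return sum(partials)


# Hypothetical sample data.
sales = [("C-1", 10.0), ("C-2", 20.0), ("C-3", 30.0), ("C-1", 5.0)]
cluster = distribute(sales)
print(total_sales(cluster))
```

    Because all rows for one key land on the same node, work on that key needs no data from other nodes, and capacity can be increased by adding nodes and redistributing keys.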


  • 8/11/2019 OpenSource DWBI_A Primer

    14/50

    MULTIPLE APPLIANCES FOR ENTERPRISE NEED


  • 8/11/2019 OpenSource DWBI_A Primer

    15/50

    DWBI APPLIANCES - SALIENT FEATURES

    High Availability and Failover Support: designed for operation in a high-availability clustered Open DWBI environment.

    Global Cache: provides superior query performance via its massive-scale caching capabilities.

    Simplified Software Deployment and Upgrades in Place: dramatically simplifies deployment by freeing IT from having to worry about resolving potentially complex OS compatibility issues, library dependencies or undesirable interactions with other applications.

  • 8/11/2019 OpenSource DWBI_A Primer

    16/50

    DWBI APPLIANCES - SALIENT FEATURES..

    Advanced ETL services and a complete analytical data warehouse with automated warehouse generation

    Cloud Connectors, for connecting to operational cloud applications, e.g. Salesforce.com, Google Analytics

    These connectors allow for automatic uploading of data into the appliance from various sources

    Live Access, which allows you to analyze data from on-premise data warehouses without uploading

  • 8/11/2019 OpenSource DWBI_A Primer

    17/50

    SAAS BASED OPEN BI SOLUTION


  • 8/11/2019 OpenSource DWBI_A Primer

    18/50

    SAAS OPEN BI SOLUTION..

    Low-cost, open source solution.

    End-to-end, integrated BI and ETL capabilities.

    Full enterprise-level support.

    Flexibility of on-demand and on-premise deployment.

    Support for mobile devices as a BI platform.

    Support for iterative IT and business-user report generation process.

  • 8/11/2019 OpenSource DWBI_A Primer

    19/50

    CLOUD - WHAT DOES THIS MEAN?

    Depends upon how you slice it vertically

    IaaS - AWS, GoGrid, Mosso

    PaaS - Google App Engine, Microsoft Azure

    SaaS (BaaS) - Salesforce, Talend, Jaspersoft, Pentaho, BIRT etc.

  • 8/11/2019 OpenSource DWBI_A Primer

    20/50

    AGILE BI - FASTER, CHEAPER, BETTER.

  • 8/11/2019 OpenSource DWBI_A Primer

    21/50

    CLOUD - WHAT DOES THIS MEAN?

  • 8/11/2019 OpenSource DWBI_A Primer

    22/50

    ODW - WHEN TO USE THE CLOUD?

    Transient application lifespan or use

    Quick start required

    Budget pressure

    Variable use/scale of application unknown

    IT unavailable/unresponsive

  • 8/11/2019 OpenSource DWBI_A Primer

    23/50

    SAAS OPEN DWBI


  • 8/11/2019 OpenSource DWBI_A Primer

    24/50

    KEY FINDINGS FOR BUSINESS TRANSITION TO CLOUD TECHNOLOGY (IN 2009)

    By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.

    Through 2011, less than 50% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy as part of an enterprise software asset management strategy.

    Through 2013, 50% of mainstream IT projects using open-source software (OSS) will not achieve cost savings over closed-source alternatives.

    Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.

  • 8/11/2019 OpenSource DWBI_A Primer

    25/50

    MOVING TO CLOUD-RECOMMENDATIONS

    Expect vendors to play an increasing role in the governance of many market-leading, open-source solutions during the next several years.

    Move aggressively to establish an effective enterprise adoption policy, and bring OSS and hardware under asset management controls.

    Do not expect to automatically save money with OSS or any technology without effective financial management. Do expect to carefully manage open-source solutions in the appropriate scenarios to realize total cost of ownership (TCO) advantages.

    Manage cloud-based software strategies and open-source strategies together for maximum effect. Look for synergies between both, and the ability of OSS to move your workloads to the cloud.

  • 8/11/2019 OpenSource DWBI_A Primer

    26/50

    STRATEGIC PLANNING ASSUMPTION(S)

    By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.

    Through 2011, less than 35% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy.

    Through 2013, 50% of mainstream IT projects using OSS will not achieve cost savings over closed-source alternatives.

    Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.

  • 8/11/2019 OpenSource DWBI_A Primer

    27/50

    CLOUD USAGE BY VARIOUS ORGANIZATIONS..


  • 8/11/2019 OpenSource DWBI_A Primer

    28/50

    OPENSOURCE BI TOOLS


  • 8/11/2019 OpenSource DWBI_A Primer

    29/50

    TDWI RESEARCH STUDY


  • 8/11/2019 OpenSource DWBI_A Primer

    30/50

    SAAS BI PROCESS FLOW


  • 8/11/2019 OpenSource DWBI_A Primer

    31/50

    HARDWARE ACCESS IN CLOUD OPEN DW BI

    Secure access via web, RDC, VPN or a combination

    Customized server (choose your own CPU, RAM, disk space)

    Scale up your capacity anytime

    Level 2 and 3 server support, incl. 24x7 monitoring service

    Application support on demand

    Integrate with your local & global IT groups

  • 8/11/2019 OpenSource DWBI_A Primer

    32/50

    SECURITY ASPECTS IN CLOUD OPEN DW BI

    Web, RDC, VPN or a combination

    Firewalls

    Certified data center: SAS 70 Type II

    NDA

    Virus protection

  • 8/11/2019 OpenSource DWBI_A Primer

    33/50

    MDM

    MDM success for enterprise open source DWBI implementation

    High-quality master data is extremely valuable to enterprise business processes and analytics

  • 8/11/2019 OpenSource DWBI_A Primer

    34/50

    MDM-KEY CONSIDERATIONS

    Some key considerations for creating a master reference data source are outlined below:

    Central master reference data model

    Mapping

    Populating the master

    Publish data

    Access and provisioning

    Ownership and process

  • 8/11/2019 OpenSource DWBI_A Primer

    35/50

    MDM CHECKLIST

    MDM helps the enterprise obtain a single version of truth across its various applications (despite the disparity of source systems).

    The following checklist provides functional requirements for implementing and deploying MDM in an enterprise environment:

  • 8/11/2019 OpenSource DWBI_A Primer

    36/50

    MDM CHECKLIST - FUNCTIONALITY COVERED

    Profiling

    Modeling

    Data quality

    Data Stewardship & Governance - hierarchy management & security

    Workflow administration

  • 8/11/2019 OpenSource DWBI_A Primer

    37/50

    MDM-ACTIVE DATA MODEL

    Multi-Domain capability

    Object-Oriented Data Modeling

    Domain Templates

    Basic Data Validations and Business Rules

    Graphical Modeling Tool

    Multiple Language Support


  • 8/11/2019 OpenSource DWBI_A Primer

    38/50

    MDM-DOMAIN INTEGRATION

    Complete Data Integration Functionality

    Automated Services-Based Integration

    Real-Time and Batch Integration

    SOA Manager/Console


  • 8/11/2019 OpenSource DWBI_A Primer

    39/50

    MDM-DQ INTEGRATION WITH ETL,BI

    Data Profiling

    Accurate Data Match and Merge

    Data Bucketing and Blocking

    Data Augmentation

    Advanced Data Validations and Business Rules

    Data Standardization

    Data Cleansing
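    Several items on this checklist (standardization, blocking, match and merge) can be sketched together in a few lines. The example below is only a hedged illustration and not the behaviour of any MDM product mentioned in this deck: the record fields, the 0.85 similarity threshold, the postcode-based blocking key and the "first non-empty value wins" merge rule are all assumptions chosen for brevity.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List

Record = Dict[str, str]


def standardize(name: str) -> str:
    """Data standardization: lower-case and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def block_key(record: Record) -> str:
    """Blocking: only records sharing a postcode are compared with each other."""
    return record["postcode"]


def is_match(a: Record, b: Record, threshold: float = 0.85) -> bool:
    """Matching: fuzzy comparison of standardized names (assumed threshold)."""
    ratio = SequenceMatcher(None, standardize(a["name"]),
                            standardize(b["name"])).ratio()
    return ratio >= threshold


def merge(a: Record, b: Record) -> Record:
    """Merge: per field, keep the first non-empty value (assumed rule)."""
    return {field: a.get(field) or b.get(field, "") for field in set(a) | set(b)}


def match_and_merge(records: List[Record]) -> List[Record]:
    """Group records into blocks, then fold matching records into one master."""
    blocks: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    masters: List[Record] = []
    for block in blocks.values():
        merged_block: List[Record] = []
        for rec in block:
            for i, master in enumerate(merged_block):
                if is_match(master, rec):
                    merged_block[i] = merge(master, rec)
                    break
            else:
                merged_block.append(dict(rec))  # no match: new master record
        masters.extend(merged_block)
    return masters


# Hypothetical source records from two systems.
records = [
    {"name": "Acme Corporation", "postcode": "600001", "phone": ""},
    {"name": "ACME  Corporation ", "postcode": "600001", "phone": "044-555-0100"},
    {"name": "Globex Ltd", "postcode": "560001", "phone": "080-555-0199"},
]
masters = match_and_merge(records)
for master in masters:
    print(master)
```

    Blocking keeps the number of pairwise comparisons manageable on large data sets, at the cost of missing duplicates whose blocking key itself is wrong, which is one reason cleansing and standardization usually run first.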


  • 8/11/2019 OpenSource DWBI_A Primer

    40/50

    MDM-DATA STEWARDSHIP & GOVERNANCE

    Hierarchy Management

    Multiple and Recursive Hierarchies

    Hierarchy Import and Overlays

    Business Process Management (BPM) and Workflow

    Automated Data Survivorship

    Manual Resolution through intuitive GUI interface


  • 8/11/2019 OpenSource DWBI_A Primer

    41/50

    MDM-ADMINISTRATION

    Historical Views of Hub Data

    Hub Versioning

    Master Data Audit Trail Information

    Roles-Based Security and Active Directory Integration

    Versioning


  • 8/11/2019 OpenSource DWBI_A Primer

    42/50

    TALEND MDM SOLUTION OS PRODUCTS

    IBM Eclipse; JBoss Application Server and Portal; eXist open database;

    XSD / XML Schema for the XML data models;

    XSLT for data transformation;

    Object programming following the EJB 2.1 standards ("Enterprise Java Beans") on JBoss server;

    XQuery for queries on the XML database;

    Document/literal WS-I norm ("Web Services Interoperability") for web services;

    Bonita for business process management.

  • 8/11/2019 OpenSource DWBI_A Primer

    43/50

    COST COMPARISON

    E.g.: total cost for a small project, comparing three approaches to data integration: open source, proprietary and manual coding

  • 8/11/2019 OpenSource DWBI_A Primer

    44/50

    SUMMARISED COST-SMALL ETL PROJECT


  • 8/11/2019 OpenSource DWBI_A Primer

    45/50

    SUMMARY COST FOR MEDIUM ETL PROJECT


  • 8/11/2019 OpenSource DWBI_A Primer

    46/50

    ODW/BI - WHY IT WILL SUCCEED IN THE MARKET

    ODW/BI has a lot of (financial) winner groups:

    Owners get low-cost, rapid entry into a data warehouse they can extend.

    Developers get to create/sell new ETL/BI products in a new market (tool providers).

    Source vendors can solve reporting problems and advance new ways to compete (source providers).

    Consultants get a bigger market for their services (service providers).

    Domain experts can participate by creating new open data warehouses using their deep industry knowledge (service providers).

  • 8/11/2019 OpenSource DWBI_A Primer

    47/50

    ODW/BI - WHY IT WILL SUCCEED IN THE MARKET

    Development licenses

    Training curve

    Development time

    Run-time licenses

    Deployment of hardware and operating system licenses

    IT operations

  • 8/11/2019 OpenSource DWBI_A Primer

    48/50

    ODW/BI - WHY IT WILL SUCCEED IN THE MARKET

    Maintenance/subscription

    Maintenance time

    Reliability and predictability of the data integration processes

  • 8/11/2019 OpenSource DWBI_A Primer

    49/50

    QUESTIONS?

    For any questions, please get in touch with me at

    [email protected]

    Skype -ebidisolutions

  • 8/11/2019 OpenSource DWBI_A Primer

    50/50

    Thank You!