

© Copyright 2010 EMC Corporation. All rights reserved.

EMC/Greenplum: Driving the Future of Data Warehousing and Analytics

EMC 2010 Forum Series


EMC Acquires Greenplum

“Greenplum, with expertise in the massively parallel arena, will give the storage giant a boost in big-data computing.”

– InformationWeek –

Greenplum Becomes the Foundation of EMC’s Data Computing Division


About EMC’s Data Computing Division

• Driving the Future of Data Warehousing and Analytics

• Core Products:
– Greenplum Database 4.0: true MPP architecture and features that meet mandatory requirements of enterprise-class data warehousing
– Greenplum Database Single-Node Edition: free version for data analysis power users
– Greenplum Data Computing Appliance: price/performance leadership, industry’s fastest data loading, and private cloud ready
– Greenplum Chorus: the world’s first Enterprise Data Cloud Platform

• We enable global organizations to gain greater insight and value from their data than ever before possible


Greenplum Database 4.0: Critical Mass Innovation

• 4.0 represents industry-leading innovations in:
– Workload Management
– Fault-Tolerance
– Advanced Analytics

• Culmination of more than 7 years of research and development

• First vendor to achieve critical mass and maturity across all necessary aspects of enterprise-class DBMS platforms

• Genuine floor-sweep replacement option for Teradata, Oracle, DB2, and SQL Server


Greenplum Single Node Edition

• Free, state-of-the-art, parallel analytic database

• Fully parallel execution leverages multi-core processors

• No storage capacity cap – from GBs to 10s of TBs

• Hybrid row and column-oriented processing

• Ability to expand beyond SNE to the massively parallel edition of Greenplum Database


Data Warehousing Requirements

• Fast Data Loading

• Extreme Performance & Elastic Scalability

• Unified Data Access


Key Technology Pillars

World’s fastest data loading

• Scatter / Gather streaming technology

Fast query execution with linear scalability

• Shared-nothing MPP architecture

Unified data access across the enterprise

• Dynamic query optimization and workload management


Scatter/Gather™ Streaming for the world’s fastest data loading speeds

• Parallel-everywhere approach to data loading

• Avoids the need for a “loader” tier of servers

• Supports both large batch and continuous near-real-time loading patterns
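As a rough illustration of the parallel-everywhere idea above, the following Python sketch splits incoming chunks into per-segment batches concurrently and then gathers the batches on each segment. It is a conceptual toy, not Greenplum's loader: the names `segment_for`, `scatter`, and `parallel_load`, the CRC32 hash, and the four-segment default are all assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # assumed segment count for this example


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick a target segment from a stable hash of the row key (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def scatter(chunk, num_segments=NUM_SEGMENTS):
    """Split one input chunk into one batch per segment."""
    batches = [[] for _ in range(num_segments)]
    for row in chunk:
        batches[segment_for(row[0], num_segments)].append(row)
    return batches


def parallel_load(chunks, num_segments=NUM_SEGMENTS):
    """Scatter all chunks concurrently, then gather the batches per segment."""
    segments = [[] for _ in range(num_segments)]
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        # pool.map keeps chunk order, so the result is deterministic.
        for batches in pool.map(lambda c: scatter(c, num_segments), chunks):
            for seg_id, batch in enumerate(batches):
                segments[seg_id].extend(batch)
    return segments


# Hypothetical input: two chunks of (account_key, amount) rows.
chunks = [
    [("acct-1", 100), ("acct-2", 250)],
    [("acct-3", 75), ("acct-1", 30)],
]
for seg_id, rows in enumerate(parallel_load(chunks)):
    print(seg_id, rows)
```

In this toy model no single process handles every row on its way in, which is the point of avoiding a separate "loader" tier. Rows with the same key always land on the same segment.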


Shared-Nothing Architecture: Massively Parallel Processing (MPP)

• Most scalable database architecture
– Optimized for BI and analytics

• Provides automatic parallelization
– No need for manual partitioning or tuning
– Just load and query like any database

• Tables are distributed across segments
– Each has a subset of the rows

• Extremely scalable and I/O optimized
– All nodes can scan and process in parallel
– No I/O contention between segments

• Linear scalability by adding nodes
– Each adds storage, query performance and loading performance
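The bullets above can be pictured with a small Python sketch, under the assumption that a table's rows are already split across segments: each segment scans only its own rows and returns a partial result, and a coordinating step combines the partials. The function names and sample data are hypothetical, and the real database engine is far more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def local_sum_count(segment_rows):
    """Scan one segment's rows and return a partial (sum, count)."""
    values = [amount for _, amount in segment_rows]
    return sum(values), len(values)


def parallel_average(segments):
    """Run the scan on every segment at once, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(local_sum_count, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical table of (order_id, amount) rows spread over three segments.
segments = [
    [(1, 10.0), (4, 40.0)],
    [(2, 20.0)],
    [(3, 30.0), (5, 50.0)],
]
print(parallel_average(segments))  # 30.0
```

Because each segment touches only its own subset of rows, adding a segment in this model adds both storage and scan capacity, which is the intuition behind the linear-scalability claim.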


Unified Data Access Across The Enterprise

• Workload Management

– Connection management controls how many users can be connected and assigns them to a queue

– User-based resource queues allow for control of the total number or cost of queries allowed at any point in time.

• Dynamic Query Prioritization

– Patent-pending technique for dynamically balancing resources across running queries

– Allows DBAs to control query priorities in real-time, or determine default priorities by resource queue
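One way to picture the resource-queue behaviour described above is as an admission check on two limits: the number of running queries and their summed cost. The Python sketch below is a simplified model with hypothetical names (`ResourceQueue`, `max_active`, `max_cost`), not Greenplum's implementation, and it does not attempt the dynamic prioritization of already-running queries.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResourceQueue:
    max_active: int   # assumed limit on concurrently running queries
    max_cost: float   # assumed limit on the summed cost of running queries
    running: dict = field(default_factory=dict)
    waiting: deque = field(default_factory=deque)

    def _fits(self, cost):
        """True if one more query of this cost stays within both limits."""
        return (len(self.running) < self.max_active
                and sum(self.running.values()) + cost <= self.max_cost)

    def submit(self, query_id, cost):
        """Start the query if it fits and nothing is waiting; otherwise queue it."""
        if not self.waiting and self._fits(cost):
            self.running[query_id] = cost
            return "running"
        # Note: a query whose cost alone exceeds max_cost waits forever here.
        self.waiting.append((query_id, cost))
        return "queued"

    def finish(self, query_id):
        """Release a finished query and admit waiting queries in FIFO order."""
        self.running.pop(query_id)
        started = []
        while self.waiting and self._fits(self.waiting[0][1]):
            waiting_id, cost = self.waiting.popleft()
            self.running[waiting_id] = cost
            started.append(waiting_id)
        return started


queue = ResourceQueue(max_active=2, max_cost=100.0)
print(queue.submit("q1", 60))  # running
print(queue.submit("q2", 50))  # queued (60 + 50 exceeds the cost limit)
print(queue.finish("q1"))      # ['q2']
```

Keeping waiting queries in strict FIFO order is a design choice of this sketch; it means a small query can wait behind a large one.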


Greenplum Chorus: The World’s First Enterprise Data Cloud Platform

• World’s first Enterprise Data Cloud (EDC) Platform, enabling:
– Self-service provisioning
– Data virtualization services
– Data collaboration

• Customers deploy Chorus along with VMware and the Greenplum Database to create a net-new, self-service analytic infrastructure

• Chorus can significantly reduce the time and effort companies need to extract value and insight from their data


Greenplum Chorus: Core Design Philosophies

• Secure
– Provide comprehensive and granular access control over who is authorized to view and subscribe to data within Chorus

• Collaborative
– Facilitate the publishing, discovery, and sharing of data and insight using a social computing model that appears familiar and easy to use

• Data-centric
– Focus on the necessary tooling to manage the flow and provenance of data sets as they are created/shared within a company

• MAD Skills in Action
– Build a platform capable of supporting the magnetic, agile, and deep principles of MAD Skills


• 150+ global enterprise customers

• $250+ Million saved by customers choosing Greenplum over Teradata

• 5+ Billion shares analyzed daily by Financial Markets using Greenplum

• 20+ Trillion rows being mined for business value

• 1+ Billion consumers receiving more secure and personalized services from Greenplum customers

Our Customers Include…


Customer Example: Regional Bank - Teradata Bake-Off

• Business Problem
– DW and data mart consolidation across regional bank operations
– Improved query performance for both operational and ad-hoc reporting
– In-database analytics to support advanced data mining initiatives

• Existing Solution: Oracle

• Benefits over Teradata
– Open-systems, commodity HW
– Significantly better TCO
– Incremental scalability
– Better price-performance

“We turned to Greenplum because its massively parallel data warehousing approach is the only one robust and cost effective to grow with us over time.”
- SVP Corporate Finance

[Chart: Response Time Improvement; axis: Response Time (Min)]


Customer Example: Investment Firm - Netezza Bake-Off

• Business Problem
– Exorbitant maintenance and support costs for Enterprise Data Warehouse
– Poor data load and ad-hoc query performance on existing Oracle system
– Scalable platform capable of consolidating multiple decision support DBMS

• Existing Solution: Oracle

• Benefits over Netezza
– Open-systems, commodity HW
– Support model that fit with their existing data center operations
– Incremental scalability
– Better price-performance

“Queries that timed-out after 8 hours now run in less than 10 minutes.”
- Sr. Director Data Warehousing

[Chart: Response Time Improvement; axis: Response Time (Min)]


Customer Example: Stock Exchange

• Business Problem
– Analytic database platform standard across global exchange operations

• Key Criteria
– Mission-critical reliability
– High-concurrency, mixed-workload
– Incremental scalability

• Data Size
– 10TB to multi-hundred TB systems
– Loading 1TB/day to 2TB/day

• Result
– 6 production systems deployed globally

“Greenplum offers strong scalability advantages due to its highly parallel model that enables us to simply add more servers as data volumes expand.”
- CIO

[Chart axis: TB/day]


Customer Example: Internet Media

• Business Problem
– Multi-hundred TB EDW to support $1B Internet advertising operation
– True mixed-workload environment supporting production reporting, ad-hoc data mining, and operational data services

• Competition: Teradata, HP, Oracle, Netezza, Aster Data

• Data Size
– 1 trillion row fact table, adding 3TB/day

• Results
– Running successfully in production for ~2 years
– Continuous operations mode while moving data centers across the country

“Greenplum will be an invaluable partner as we continue to put our data to work in new ways that will improve both the user and advertiser experience on our network of sites.”
- EVP of Product, Tech & Ops

[Chart: Scalability & Reliability; axis: Net Data Size (TB)]


Greenplum Industry Solutions

Mission: Drive the Adoption of Greenplum Software through the Creation of Industry-specific Analytic Solutions

Strategic Objectives:

Address Business Analytic Requirements of Specific Industries

Raise Value Proposition from Technology to Business Solutions

Develop Ecosystem of Analytic Application Service Providers and ISVs


Industry Sales Focus

[Diagram: components] Open Interfaces; Database; BI Tools; ETL Tools; Industry-Specific Data Feeds; Chorus Collaboration; Industry-specific Analytic Application Services and ISVs

[Diagram: Greenplum Analytic Application Services and ISVs] Financial Services; Retail, Telco; Media, Entertainment; Healthcare; Public Sector (Federal, SLED); Energy (Utilities, Oil & Gas)


Industry Sales & Strategic Partnerships Ecosystem

[Diagram: components] Open Interfaces; Database; BI Tools; ETL Tools; Industry-Specific Data Feeds; Chorus Collaboration; Industry-specific Analytic Application Services and ISVs

[Diagram: partners] Implementation Services Partners; Infrastructure Partners

[Diagram: Greenplum Analytic Application Services and ISVs] Financial Services; Retail, Telco; Media, Entertainment; Healthcare; Public Sector (Federal, SLED); Energy (Utilities, Oil & Gas)


G-Tick Platform: EMC Secure Tick Data Management for real-time and historical data

[Diagram: EMC components and partner components highlighted]

• Feed Handler
• Real-time data
• Historical data
• GemFire: in-memory processing database
• Greenplum: high performance analytic engine
• Trade, Position, Market Data Snapshots
• Business Intelligence & Analytics Tools
• Trade Strategies
• Trading Desks
• Algo Trading
• Price Engine
• Order Mgmt System
• Risk Modeling
• Compliance Surveillance
• EMC & Partners


Greenplum Value Prop

Scalable Performance

Efficiency Improvement

Revenue Growth


Greenplum Value Prop

Greenplum provides an agile analytics environment to address the life cycle of analytics in an enterprise.

Chorus, Greenplum’s Enterprise Data Cloud, provides a platform to consolidate and virtualize the various data mart silos into a private cloud environment.

Greenplum is building out industry-specific solution suites where a higher level of integration is required to drive better time-to-value for various lines of business.

Greenplum enables extreme scale, elastic expansion, self-service provisioning, and data collaboration.


Thank you