gp introduction 200811


Upload: iswaha

Post on 27-Jan-2017


TRANSCRIPT

Page 1: Gp Introduction 200811

Company and Product Introduction

Page 2

About Greenplum

• Greenplum the company and Greenplum Database
• Greenplum's market mission
• Greenplum's products and features
• Greenplum's market position
• Greenplum's customers and success stories

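Page 3

About Greenplum
• Greenplum is a software company providing the world's fastest, most scalable data warehousing for data analysis and data management
• Founded in 2003 in Silicon Valley; privately held
• Investors include Sun, SAP and Sierra Ventures

• Entered China in October 2008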

Page 4

Greenplum Database
• Massively Parallel Processing (MPP) DBMS
• Based on PostgreSQL 8.2.5

• Same client-side functionality
• Added technology to support parallel processing
• Added features to support data warehousing and BI

- External tables / parallel loading
- Resource management
- Query optimizer enhancements

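Page 5

Key Features of Greenplum Database
• Parallelism for queries and loading
• Massively parallel processing, shared-nothing architecture optimized specifically for BI/DW

• Dynamic provisioning of processing capacity
• Smooth scaling up to petabytes, avoiding "forklift upgrades"

• Open interfaces
• Supports many leading data integration, analytics, reporting and data mining products

• Rich SQL support
• Any schema, any query

• Open hardware support
• Designed and optimized for x86 hardware from any vendor

A new parallel database system designed specifically for very large-scale BI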

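Page 6

Greenplum Database software delivers scalable, fast analytics at low cost

Scalability
• Scales smoothly and incrementally to petabytes

High performance
• 10x to 100x query and load performance

Cost efficiency
• 2 orders of magnitude better price/performance than Teradata

Openness
• The database software runs on simple, open systems: Gigabit Ethernet, Solaris, Linux, OSX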

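Page 7

Challenges Facing BI/DW

• Reduce cost and complexity
• Reduce the time needed to integrate and load data
• Improve the performance of user queries and analysis

• Improve the timeliness and quality of information
• Respond proactively to changes in financial position, customer preferences and supply-chain operations
• Capture new opportunities and quickly turn them into business models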

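Page 8

[Figure: The data warehouse appliance in the BI architecture]
Server / database / disk layer: Sun, HP, IBM, Oracle, DB2, EMC, Hitachi
Operational systems: Oracle, IBM, SQLServer, MySQL, Postgres
Extract, transform, load (ETL): Informatica, DataStage
Reporting and analysis: MicroStrategy, Business Objects, Cognos, Hyperion, SAS
Also shown: Cisco
The appliance integrates hardware, relational database and storage into a single device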

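Page 9

Advantages of the Sun/Greenplum Data Warehouse Appliance

• TCO: rapid deployment and lower operating costs
• Price/performance: hardware, relational database and storage integrated into a single appliance
• Density: up to 100 TB per rack

• Power: 5x lower power consumption per TB
• Fast response time: a design that eliminates data-flow bottlenecks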

Page 10

Openness = an ecosystem with many partners

Partner categories: System Vendors, ISVs, Service Providers, Platform

Page 11

Gartner Magic Quadrant: Data Warehouse DBMS

[Figures: 2006 Magic Quadrant; 2007 Magic Quadrant]

Page 12

Example Greenplum Customers

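Page 13

Customer case study:

[Chart: daily trading volume at June 2006, January 2007 and August 2007; y-axis 0 to 5,000; annotations: 2 billion shares traded/day, 3 billion shares traded/day, 4.9+ billion shares traded/day]

• Compliance and security requirements mandate storing every transaction
• Data volume exceeds 200 TB
• A 2.5-year effort based on Oracle ended in failure
• Data volume doubled within one week (7/26/07)
• A data warehouse built for $10M ran out of capacity within 4 months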

Page 14

Case Study

FIM / MySpace
Profile:

• 6th most popular website in the world
• Accounts for more than 80% of visits to social networking sites
• More than 100 million user accounts

• Challenge: Perform hyper-targeted marketing against more than 100 million users. With 10 billion new records arriving daily, MySpace needed a solution capable of in-depth analytics at massive scale.

• Solution: Greenplum's petabyte-scale loading and parallel execution enabled MySpace to load data at more than 4 TB/hr and return results for complex queries against 400 TB of data in seconds.

• Result: MySpace was able to implement hyper-targeted marketing efficiently and effectively, resulting in significant revenue gains.

"With Greenplum Database, we have increased the accuracy of our targeting by two orders of magnitude. The result is that the experience both for users as well as advertisers is continually improving." - Adam Bain, VP Technology, Fox Interactive Media

Page 15

Case Study

PLDT/SMART
Profile:

• More than 22 million subscribers
• 1 billion text messages per day
• World's largest SMS carrier

• Challenge: PLDT was heavily invested in Oracle for OLTP, ERP and BI tools, but the scale of its data warehouse was simply too large.

• Solution: Greenplum's scaling capability and support for open standards allowed PLDT to extract data from its Oracle databases into Greenplum Database and use Oracle BI tools to analyze the data in Greenplum Database.

• Result: An 8x improvement in load times, and complex queries reduced from 6 hours to 15 minutes.

"The sheer amount of data involved has always been the key challenge. Greenplum technology enables us to better understand our customers and their needs." - Alexander G. Seminiano, Department Head of Convergent Systems

Page 16

Greenplum Architecture

• Basic architecture and components
• Fault tolerance (mirroring)
• Management (management scripts, monitor)
• Key concepts: data distribution and queries

Page 17

Greenplum Basic Architecture

[Figure: the Client connects over the LAN to the Master Host; the Master Host and four Segment Hosts are linked by the Interconnect (Gigabit Ethernet switch)]

Page 18

Client Programs

• psql
• pgAdmin III
• ODBC
• JDBC
• Perl DBI
• Python
• libpq

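Page 19

Master Host

• Entry point to the system
• Database listener process (postgres)
• Handles all user connections
• Builds query plans
• Coordinates work processing
• Management tools
• System catalog tables and metadata (data dictionary)
• Stores no user data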
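To make the master's coordinating role concrete, here is a toy scatter/gather sketch: the master sends the same work to every segment, each segment computes a partial sum and count over only its own rows, and the master combines the partials into an average. The table contents, the function names and the use of threads in place of separate segment processes are illustrative assumptions, not Greenplum's actual executor.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of a "sales" table held by
# one segment. The master itself holds no user rows.
SEGMENTS = [
    [("north", 120.0), ("south", 80.0)],
    [("north", 45.5), ("east", 200.0)],
    [("south", 19.5), ("east", 10.0)],
]


def segment_partial(rows):
    """Runs on one segment: a partial (sum, count) over its local rows only."""
    return sum(amount for _, amount in rows), len(rows)


def master_average(segments):
    """Runs on the master: dispatch the same work to every segment in
    parallel, then combine the partial results into the final answer."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(segment_partial, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


print(master_average(SEGMENTS))
```

Returning (sum, count) pairs instead of per-segment averages is the important choice here: segments hold slices of different sizes, so their averages cannot simply be averaged again.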

Page 20

Segments

• Each segment stores a portion of the user data
• A system can have many segments
• Users cannot access segments directly
• All access to segments goes through the master

• A database listener process (postgres) listens for connections from the master
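Because each segment holds only part of the data, every row has to be assigned to exactly one segment. Greenplum does this by hashing a distribution key; the simplest way to map a hash to a segment, used in this sketch, is modulo the segment count. The four-segment cluster is an assumption, and `zlib.crc32` is only a stable stand-in: Greenplum's real hash function and mapping differ.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment index.
    crc32 is used only because it is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_index=0, num_segments=NUM_SEGMENTS):
    """Place each row on exactly one segment based on its distribution key."""
    placement = defaultdict(list)
    for row in rows:
        placement[segment_for(row[key_index], num_segments)].append(row)
    return dict(placement)


rows = [("cust-1", 10), ("cust-2", 20), ("cust-1", 5), ("cust-3", 7)]
for seg, seg_rows in sorted(distribute(rows).items()):
    print(seg, seg_rows)
```

Rows that share a key always land on the same segment, which is what allows joins and aggregations on that key to run locally on each segment.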

Page 21

Interconnect

• The connection layer between the Greenplum database instances
• Inter-process coordination and management
• Built on Gigabit Ethernet
• Configured as a private network internal to the system
• Supports two protocols: TCP or UDP

Page 22

Greenplum High-Availability Architecture

[Figure: the client connects to the master instance on the master host; a sync process keeps the standby master on the standby master host up to date; three segment hosts, each running a primary segment and a mirror segment, communicate over a private Gigabit Ethernet LAN]

Page 23

Data Redundancy: Inter-Segment Mirroring

[Figure: the master host runs the Greenplum Master with the global catalog; segment hosts 1, 2, ..., n hold the primary and mirror copies of Segments 1 to n]
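One simple way to lay out inter-segment mirroring is a ring: segment i's primary lives on host i and its mirror on the next host (wrapping around), so no primary ever shares a host with its own mirror. This sketch assumes one primary per host and numbers hosts and segments from 0 (the figure numbers them from 1); it illustrates the placement rule and is not Greenplum's configuration tooling.

```python
def place_segments(num_hosts):
    """Return {segment: (primary_host, mirror_host)} using a ring layout:
    one primary per host, mirror on the next host, wrapping around."""
    if num_hosts < 2:
        raise ValueError("mirroring across hosts needs at least two hosts")
    return {seg: (seg, (seg + 1) % num_hosts) for seg in range(num_hosts)}


def available_after_failure(placement, failed_host):
    """For each segment, the host that can still serve it after one host
    fails (the mirror takes over if the primary's host is down)."""
    serving = {}
    for seg, (primary, mirror) in placement.items():
        if primary != failed_host:
            serving[seg] = primary
        elif mirror != failed_host:
            serving[seg] = mirror
        else:
            serving[seg] = None  # both copies lost; cannot happen in a ring
    return serving


placement = place_segments(4)
print(available_after_failure(placement, failed_host=2))
# {0: 0, 1: 1, 2: 3, 3: 3}
```

Every segment stays available after a single host failure, but the neighbouring host now serves two segments; spreading each host's mirrors across several hosts would even out that extra load.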

Page 24

Master Mirroring: Warm Standby Master

[Figure: the primary master host and the standby master host each hold transaction logs and system catalogs; a synchronization process keeps the standby in step with the primary]
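The warm standby can be modelled as log shipping: every catalog change on the primary master is applied and appended to its transaction log, and the synchronization process replays any records the standby has not yet seen. `MasterNode`, `commit` and `synchronize` are hypothetical names for this toy model; the real synchronization process works on database transaction logs, not Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class MasterNode:
    """Toy master: a system catalog plus an append-only transaction log."""
    catalog: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def apply(self, record):
        key, value = record
        self.catalog[key] = value
        self.log.append(record)


def commit(primary, key, value):
    """A catalog change on the primary master is applied and logged."""
    primary.apply((key, value))


def synchronize(primary, standby):
    """Hypothetical sync process: replay, in order, every log record the
    standby has not seen yet so both catalogs converge."""
    for record in primary.log[len(standby.log):]:
        standby.apply(record)


primary, standby = MasterNode(), MasterNode()
commit(primary, "sales", "distributed by cust_id")
commit(primary, "orders", "distributed by order_id")
synchronize(primary, standby)
print(standby.catalog == primary.catalog)  # True
```

Between synchronizations the standby can lag behind the primary. In this model it is "warm" in the sense that it keeps a current copy of the catalog but takes no part in query processing until it is promoted.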

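Page 25

Petabyte-Scale Data Loading
• Petabyte-scale loading distributes the load files to all nodes and loads them in parallel

• Loading is tens of times faster than competitors (over 4.5 TB/hr)

• Performance grows linearly as nodes are added
• A data distribution service automatically splits multiple large files and hands them to the appropriate nodes for loading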
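The automatic splitting of large load files can be sketched as cutting each file into roughly equal byte ranges, extending each cut to the next newline so no record is broken, and dealing the pieces out round-robin so every segment loads at the same time. The function names and the round-robin policy are assumptions for illustration; the slide does not describe Greenplum's actual splitting algorithm.

```python
def split_at_line_boundaries(data, num_chunks):
    """Split newline-delimited bytes into at most num_chunks pieces of roughly
    equal size, cutting only after a newline so no record is broken."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    target = -(-len(data) // num_chunks)  # ceiling division
    chunks, start = [], 0
    while start < len(data):
        end = min(len(data), start + target)
        if end < len(data) and data[end - 1:end] != b"\n":
            newline = data.find(b"\n", end)
            end = len(data) if newline == -1 else newline + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def assign_to_segments(files, num_segments):
    """Hypothetical distribution service: split every file and deal the pieces
    out round-robin so all segments load in parallel."""
    assignment = {seg: [] for seg in range(num_segments)}
    next_seg = 0
    for name, data in files.items():
        for chunk in split_at_line_boundaries(data, num_segments):
            assignment[next_seg].append((name, chunk))
            next_seg = (next_seg + 1) % num_segments
    return assignment


data = b"".join(f"{i},row-{i}\n".encode() for i in range(1000))
plan = assign_to_segments({"part1.csv": data, "part2.csv": data}, 4)
for seg, pieces in plan.items():
    print(seg, [(name, len(chunk)) for name, chunk in pieces])
```

In a real MPP loader each segment would presumably still parse its piece and route every row to the segment that owns its distribution key; that step is left out here.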

Page 26

Compatible with Leading BI & ETL for Complete DW Solutions

• External Tables
• High Speed Loader
• JDBC™
• ODBC
• SQL-92

Page 27

Greenplum Database – Product Roadmap

Proprietary & Confidential

Page 28

Hardware: "White Box"

• Greenplum and hardware
• Reference architecture
• Greenplum system validation tools
• Storage capacity estimation