© 2009 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice.
BI Forum 2009
Principles of MPP Data Warehouse Architecture
26 November 2009
Václav Hubka - Hewlett-Packard
Agenda
• Appliances for enterprise data warehouses (EDWH)
• Principles of data warehouse design on the "EDWH appliance" platform
• Introduction to the Operational DWH
• MPP data warehouse architecture for the Operational DWH
Data Warehouse Appliances
DWH appliances
• Provide:
− Systems that are packaged (tightly integrated stack, bundled, balanced, pre-tuned, pre-installed; single support contact) and optimized for BI workloads
− Low maintenance and support through a single source
− Fast installation
− Integrated management and automated system administration; increased functionality cannot mean increased administrative complexity
− Easy incremental expansion
− Guaranteed performance for specific purposes; use of new technologies to drive high performance
− Lower TCO and faster ROI
• Have established market acceptance
• 10 TB or more in first phase is common today
• Have begun to host EDWs
• Satisfy real-time, on-demand, and/or Operational BI requirements outside of EDW
• 38% of TDWI research survey respondents have deployed, are currently evaluating, or plan to evaluate a DWH appliance soon
Data Warehouse Appliance Pros
What do you think is the leading benefit of a data warehouse appliance?
39% = Pre-tuned for data warehousing
18% = Fast query performance
12% = Reduced system integration
11% = Fast installation
6% = Low cost
8% = Easy incremental expansion
6% = Other
Sources: TDWI Tech Survey, August 2005, 119 responses; TDWI Tech Survey, February 2007, 112 responses
Traditional data warehouse approach
Process: Analyze data → Physical database design → Load data → Index and aggregate → Query data → Ongoing tuning
• Performance is dependent on getting a good physical design
• Time to market with new data is limited by skills and resources
• Query performance is poor when queries don’t take advantage of the design (no index scans)
Neoview platform performance
Process: Load data → Query data
• A high power-to-data ratio lets the system operate without database tuning
− Load a TB in 1 hour
− Scan a TB in 30 seconds
• Neoview features take performance beyond scanning
− Next-generation optimizer can resolve complex queries efficiently
− pMesh dual switch fabrics provide massive bandwidth between nodes
− “Skewbusting” technology resolves the traditional skew issues of MPP systems
Simply put…
• Overpowering a workload with inexpensive power has many benefits
− The ability to perform queries that no one anticipated
− “Load-and-go” simplicity of design
− Fewer indexes, materialized views, etc., to manage and tune
− Enough power to quickly drop, reload, and restructure tables
What the appliance model is not good at…
• Random I/O
− Scan a 1 TB table to find 1,000 rows using 256 inexpensive processors in 30 seconds
− Same access with an index takes a fraction of a second using only one of the 256 inexpensive processors
• Common aggregations
− Brute force does wonders for aggregating data on the fly so that you don’t need to prebuild materialized views
− The same brute force can build an MV quickly, reducing CPU consumption 1,000x at runtime
• Both limit concurrency
− How many times can one table be scanned in a day?
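The brute-force trade-off on this slide can be put in rough numbers. The following is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes), that a full scan keeps all 256 processors busy, and that scans run strictly back to back; the function names are illustrative and not part of any Neoview tooling.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

def scan_rate_per_processor(table_bytes, seconds, processors):
    """Bytes per second each processor must sustain during a full scan."""
    return table_bytes / seconds / processors

def full_scans_per_day(seconds_per_scan):
    """Upper bound on back-to-back full scans in 24 hours."""
    return (24 * 60 * 60) // seconds_per_scan

per_cpu = scan_rate_per_processor(1 * TB, 30, 256)
print(f"{per_cpu / MB:.0f} MB/s per processor")          # prints "130 MB/s per processor"
print(full_scans_per_day(30), "full scans per day at most")  # prints "2880 full scans per day at most"
```

Under these assumptions the whole system can serve at most 2,880 full-table scans of that table per day, which is why the slide treats scan-only access as a concurrency limit.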
Beyond the database appliance with the Neoview platform
Process: Load data → Query data → Improve database design
Diagram callouts: performance repository; 10–1,000x faster; much higher concurrency
• Outperforms a pure appliance with little or no design decisions
• Permits you to “graduate” tables to a more enhanced design that supports extremely high concurrency
• Mixed workload support allows both optimized and nonoptimized workloads to coexist
Operational DWH
Operational BI: next-generation EDW
• Responding to business events as they occur
− Inputs: operating priorities, market signals, real-time streams
− Timely, operational decisions: real-time analytics, 24 x 7 availability
• Rich interactions across all touch points
− Touch points: self-service, call center, suppliers
− Know your customer and partners: 1000s of concurrent users, single version of the truth
• Leveraging information as a strategic asset
− Sources: Web, operational system, unstructured data
− Richer data inputs: large data volumes; complex, mixed workloads
Operational BI is characterized by a large number of users making many decisions
[Chart: one axis is the number of decisions (low to high), the other is the collective impact of decisions (low to high); the regions shown are Strategic Business Intelligence, Tactical Business Intelligence, and Operational Business Intelligence.]
The evolution of data warehouses in operational BI environments
Traditional BI | Operational BI
Back room analysis | Built into business processes
Reporting | Automating actions
Offline batch updates | Continuous online updates
Departmental data marts | Enterprise-wide resource
Few users doing strategic analysis | Thousands of users performing many types of tasks
Availability not critical | Mission-critical
HP Neoview & Operational Business Intelligence: Real-time insight for your business
• Enterprise data warehouse for large enterprise operational needs
• Scales to thousands of users, terabytes of data
• Rapidly deployed, easily managed, compatible with existing BI tools
• Integrated solution: HP innovation and flexible, standards-based components
[Diagram: HP Neoview enterprise data warehouse, labelled with integrated hardware, OS, and DBMS; real-time updates; query tools; data integration; concurrent users; mixed workloads.]
Architected for availability, scalability, and performance
• Shared-nothing MPP
− Each processor a unit of parallel work
• Database virtualization
− Data transparently hashed across all disks
• Parallel query execution
− Queries divided into subtasks and executed in parallel with results streamed through memory
• Real-time data warehousing
− Mixed workload & transactional heritage
• Unrivaled availability
− Continuously available in spite of any single point failure; online database operations
• Extreme processing power
− 1 Intel® Itanium® processor to 2 RAID 1 volumes
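The "parallel query execution" bullet describes a scatter-gather pattern: one subtask per unit of parallel work, with partial results merged as they arrive. The following is a minimal sketch of that pattern for a grouped sum, not Neoview's executor; the data, the function names, and the use of Python threads are all assumptions made for illustration (threads show the structure here, not real CPU parallelism).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    """Subtask: aggregate one partition locally (sum of amount per region)."""
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals

def parallel_group_sum(partitions):
    """Run one subtask per partition and merge the partial results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # Partial results are consumed as an iterator and merged in memory.
        for totals in pool.map(partial_sum, partitions):
            merged.update(totals)
    return dict(merged)

# Made-up rows of (region, amount), already split into three partitions.
partitions = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 1), ("east", 4)],
]
result = parallel_group_sum(partitions)
print(result)
```

Each subtask touches only its own partition, which is the property that lets a shared-nothing system add processors without adding contention.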
Neoview is designed for changing customer requirements
[Diagram: BI clients and ETL clients run the workloads below against 91 billion rows of data (20 TB). All workloads run concurrently.]
• Analytical queries: 120 concurrent, 220 queries
• Report queries: 300 concurrent, 1.85 million queries
• Ad hoc surprise: 4 concurrent, 8 queries
• Tactical-1: 320 concurrent, 3.6 million queries
• Tactical-2: 350 concurrent, 6.4 million queries
• Tactical-3: 400 concurrent, 13.4 million queries
• Online ingest: 7.5 million rows of data every 10 min
• SLA targets shown: 20 min/2 hours, 5 seconds, 2 seconds, 2 seconds, 200 ms, 10 min
• Neoview results shown: 2 min to 46 min, 7 min, 1.6 sec, 0.5 sec, 200 ms, 167 ms
Architecture of an MPP DWH – HP Neoview
Neoview segment architecture
• 16 nodes per segment
• Dual active (X,Y) interconnect fabrics
• Multiple fat I/O pipes
• Dual cluster switches for inter-segment I/O
[Diagram: 16 blades (Node 1 to Node 16) connected to both the X fabric and the Y fabric; node and disk labels include P01 to P42, M01 to M42, and B01 to B32.]
• RAID1 (mirrored) disk protection
• Active reading from both RAID1 copies
• Separate controller writes for integrity
• End-to-end disk checksum integrity
Neoview multi-segment architecture
• Active dual fault tolerant fabrics
• Multi-layered clustering (>128p)
• 500 MB/sec dedicated links
• Each segment adds bandwidth
• Cross sectional bandwidth up to 128 GB/sec
[Diagram: fault-tolerant (FT) clustered mesh fabric connecting 1 to 16 Neoview segments.]
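The sizing figures on the segment and multi-segment slides can be cross-checked with simple arithmetic. This is only a consistency check under stated assumptions (one dedicated 500 MB/sec link per node, decimal units); the slides do not say how the 128 GB/sec figure is derived, and the function names are illustrative.

```python
NODES_PER_SEGMENT = 16   # "16 nodes per segment"
MAX_SEGMENTS = 16        # "1 to 16 segments"
LINK_MB_PER_SEC = 500    # "500 MB/sec dedicated links"

def max_nodes(segments=MAX_SEGMENTS):
    """Total nodes in a configuration with the given number of segments."""
    return NODES_PER_SEGMENT * segments

def aggregate_link_bandwidth_gb(segments=MAX_SEGMENTS):
    """Assumption: one dedicated link per node; 1 GB = 1000 MB."""
    return max_nodes(segments) * LINK_MB_PER_SEC / 1000

print(max_nodes())                    # prints 256
print(aggregate_link_bandwidth_gb())  # prints 128.0
```

Under these assumptions a full 16-segment system has 256 nodes, and 256 links at 500 MB/sec add up to the quoted 128 GB/sec, which also illustrates the "each segment adds bandwidth" bullet.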
Highly Parallelized Database: Neoview Shared Nothing Architecture
[Diagram: the partitioning key of each row in Table A, Table B, and Table C is hashed to decide where the row is placed.]
• The key is transparently hashed to identify data placement
• Balanced data distributions across all disks
• Balanced SQL execution across all processors
• Table, index, and materialized view support
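Hashing a partitioning key to place rows can be sketched in a few lines. The example below is an illustration only: the slides do not name Neoview's hash function, so SHA-256 is used here as a stand-in, and `partition_for`, `distribution`, and the choice of 16 partitions are hypothetical.

```python
import hashlib
from collections import Counter

def partition_for(key, num_partitions):
    """Map a partitioning key to a partition number via a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def distribution(keys, num_partitions):
    """Count how many keys land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

# 100,000 made-up order numbers spread over 16 partitions.
counts = distribution(range(100_000), 16)
print(min(counts), max(counts))
```

With a well-mixed hash the smallest and largest partitions stay close to the average of 6,250 rows, which is what "balanced data distributions" relies on; a skewed partitioning key would break that balance regardless of the hash.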
Indexed Clustering for Performance
− Data indexed for fast access by the clustering key
− Clustered data for fast sequential access
• Partitioning determined by hash of partitioning key
• Data indexed and clustered by clustering key
Table | Hash by | Cluster by
Order | order # | order date, order number
Line item | order # | order date, order number, item number
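Clustering means the rows inside a partition are stored in clustering-key order, so a lookup on the leading key columns becomes a short sequential read. The following is a minimal sketch of that idea for the Order table, assuming a single partition held as a sorted Python list; the rows and the `orders_on` helper are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Rows of one partition, kept sorted by the clustering key
# (order_date, order_number). The third field is a made-up payload.
orders = sorted([
    ("2009-11-24", 1003, "C"),
    ("2009-11-25", 1001, "A"),
    ("2009-11-25", 1007, "B"),
    ("2009-11-26", 1002, "D"),
    ("2009-11-27", 1005, "E"),
])
order_dates = [row[0] for row in orders]  # leading clustering column

def orders_on(date):
    """Find the contiguous run of rows for one order date by binary search."""
    lo = bisect_left(order_dates, date)
    hi = bisect_right(order_dates, date)
    return orders[lo:hi]

print(orders_on("2009-11-25"))
```

Because matching rows are adjacent, the lookup costs two binary searches plus one sequential slice instead of a scan of the whole partition.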
Questions?
Thank you
Co-locating Index and Base Table Data
− Eliminates cross-processor messaging overhead
− Fast and efficient indexing for query speed-up
Table | Hash by | Cluster by
Line item | order # | order date, order number, item number
Index on line item | order # | item number
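Co-location follows from hashing the index on the same key as the base table: an index entry and the row it points to always land in the same partition. The following is a minimal sketch under assumptions, with SHA-256 standing in for the unspecified hash, four partitions, made-up rows, and a simplified clustering order; none of the names come from Neoview.

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative

def partition_for(order_no):
    """Stable hash of the partitioning key (order #)."""
    digest = hashlib.sha256(str(order_no).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Made-up line items: (order_no, item_no, quantity)
line_items = [(1001, 7, 2), (1001, 9, 1), (1002, 7, 5), (1003, 4, 3)]

base = {p: [] for p in range(NUM_PARTITIONS)}
index = {p: [] for p in range(NUM_PARTITIONS)}
for order_no, item_no, qty in line_items:
    p = partition_for(order_no)            # same hash for table and index
    base[p].append((order_no, item_no, qty))
    index[p].append((item_no, order_no))   # index entry: item_no -> order_no
for p in range(NUM_PARTITIONS):
    base[p].sort()    # simplified clustering: order_no, item_no
    index[p].sort()   # index clustered by item_no

def find_item(item_no):
    """Each partition answers from its own index and base rows only."""
    hits = []
    for p in range(NUM_PARTITIONS):
        for idx_item, order_no in index[p]:
            if idx_item == item_no:
                hits.extend(r for r in base[p]
                            if r[0] == order_no and r[1] == item_no)
    return sorted(hits)

print(find_item(7))
```

No partition ever needs a row held by another partition to resolve an index hit, which is the cross-processor messaging the slide says is eliminated.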
Parallel UOW drives MPP performance
• Measured performance
− Scan: 286 MB/sec/CPU, 2.34 GB/sec/segment
− Ingest: 1 MB/sec/CPU to 256 CPUs using 3 loaders
− Extract: 2.5 MB/sec/CPU to 64 CPUs
− Insert: 1 MB/sec/connection at 128 connections
− Fetch: 1.5 MB/sec/connection at 128 connections
[Diagram labels: Itanium 2 processors; data management processes; SCAN; RAID 1 volumes LDV1-P, LDV1-B, LDV3-P, LDV3-B.]
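The per-CPU and per-connection rates above can be turned into aggregate figures. This is a rough sketch, assuming throughput scales linearly up to the stated number of CPUs or connections and that units are decimal; the dictionary and function names are illustrative.

```python
MB = 10**6
TB = 10**12  # assumption: decimal units

# (rate per unit in MB/sec, number of units) as listed on the slide
measured = {
    "ingest":  (1.0, 256),   # per CPU
    "extract": (2.5, 64),    # per CPU
    "insert":  (1.0, 128),   # per connection
    "fetch":   (1.5, 128),   # per connection
}

def aggregate_mb_per_sec(rate, units):
    """Assumption: throughput scales linearly with CPUs or connections."""
    return rate * units

totals = {name: aggregate_mb_per_sec(*v) for name, v in measured.items()}
ingest_tb_per_hour = totals["ingest"] * MB * 3600 / TB

for name, mb in totals.items():
    print(f"{name}: {mb:.0f} MB/sec aggregate")
print(f"ingest: {ingest_tb_per_hour:.2f} TB/hour")
```

Under these assumptions ingest reaches about 0.92 TB per hour at 256 CPUs, which is in line with the "Load a TB in 1 hour" figure quoted earlier in the deck.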