jarek gryz
DESCRIPTION
Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor (Automating Physical Database Design). Jarek Gryz. Agenda. Motivation Indexes in DB2 Materialized query tables in DB2 Problem definition How does the DB2 Design Advisor tool work ? Experiments. Motivation. - PowerPoint PPT PresentationTRANSCRIPT
IBM Software Group
®
Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor(Automating Physical Database Design)
Jarek Gryz
IBM Software Group | DB2 Information Management Software
Agenda
Motivation Indexes in DB2 Materialized query tables in DB2 Problem definition How does the DB2 Design Advisor tool work ? Experiments
IBM Software Group | DB2 Information Management Software
Motivation
EASE OF USE
Indexes and materialized query tables (MQTs) can significantly enhance query performance.
Choosing good indexes and materialized query tables is a complex problem. This tool is intended to help automate this effort.
IBM Software Group | DB2 Information Management Software
Why have an index?
Performance, Performance, Performance Provides order
for example : Joins, GROUP BY, ORDER BY, DISTINCT Limits I/O and data retrieved due to filtering with predicates
Range of values (start/stop keys)
Join predicates Provides index-only access Enforces uniqueness or other constraints Provides statistics useful to the optimizer for cardinality estimation
for example: statistics on number of keys
IBM Software Group | DB2 Information Management Software
Why have Materialized query tables ?
The MQT feature is a powerful feature in DB2 that allows you to precompute and materialize a query result into a table
Full refresh or incremental refresh possible Subsequently it allows similar queries to automatically use the precomputed
data from the MQT to improve performance
CUSTTRANS
cust_id,cust_age, SUM(sales)GROUP BY cust_id,cust_age
cust.cust_id trans.cust_id
Materialized MQT
IBM Software Group | DB2 Information Management Software
Problem Definition
Given:Workload information
System configuration
Database characteristics Determine:
An Index and MQT set that will
•lead to good workload performance
•in a reasonable or specified maximum time
•considering disk space and maintenance constraints
•and be easy to use
The solution .......
IBM Software Group | DB2 Information Management Software
The DB2 Design Advisor
Automatically capture : A representative query workload (potentially compressing it to reduce its size)
The existing database characteristics and environment
System information Determines:
An Index and MQT set that will lead to good ESTIMATED workload response time
• Using DB2's Query Rewriter/Optimizer to suggest candidates
• Using DB2's Optimizer to provide cost / benefit information
• Using a combinatorial algorithm to perform a cost-benefit analysis observing constraints of (1) advisor execution time, (2) disk space and (3) anticipated DB2 costs of creating the entities plus overhead during INSERT / UPDATE / DELETE activity.
• Using sampled or estimated statistics of new entities
• Providing both GUI and command line options for initiating
IBM Software Group | DB2 Information Management Software
Issues in automating physical DB design selection
When to initiate the design algorithm? Health monitor with health indicators such as number of sort overflows
to initiate the advisor What data to use to make the decision?
Automatically capture workload, DB, and system information
Allow work on real data or just statistics How to make the decision?
Method to be described How are the recommendations implemented?
Little user interaction to ask if or when to initiate to gain DBA trust
Online methods to reduce implementation cost
• E.g., online index creation
IBM Software Group | DB2 Information Management Software
The various steps within the Advisor
Generate MQT Candidates
Generate Index Candidates
Estimate Statistics
Consolidate MQTs & Indexes
Combinatorial Search Algorithm
Filter out unused MQTs & Indexes
Query Workload DDL, Statistics
Recommended MQTs & Indexes
IBM Software Group | DB2 Information Management Software
Index candidate generation
During optimization generate virtual candidates when: Predicate exists but no index (e.g., R.A > 5 or R.A=S.B)
Ordering required
Uniqueness required Winning candidates are the virtuals in the final optimized query plan Provides candidates we know the optimizer will use
IBM Software Group | DB2 Information Management Software
MQT Candidate Generation Candidates are generated from original queries, logical views and common
expressions which are formed by matching multiple queries. Uses multiquery optimization (MQO)
Provides candidates we know the queries will use Candidates can contain table references in a federated DB (tables on different servers)
SELECT store_name, cust_name, SUM(sales) as ssFROM Trans T, Store S, Cust CWHERE T.store_id = S.store_id AND T.cust_id = C.cust_id WHERE cust_age >= 25GROUP BY store_name, cust_name
SELECT store_name, SUM(sales) as ssFROM Trans T, Store S, Cust CWHERE T.store_id = S.store_id AND T.cust_id = C.cust_id WHERE year = 2002 GROUP BY store_name
SELECT store_name, cust_name, cust_age, year, SUM(sales) as ssFROM Trans T, Store S, Cust CWHERE T.store_id = S.store_id AND T.cust_id = C.cust_id GROUP BY store_name, cust_name, cust_age, year
Query 1 Query 2
Candidate AST
IBM Software Group | DB2 Information Management Software
Combinatorial search algorithm
The search phase uses a knapsack algorithm and random swap method to choose the recommended index and MQT set
Requires each candidate to have a cost-benefit ratio (cbratio) Benefit based on estimated cost with and without MQT usage (updates have
negative benefit)
Cost based on disk space usage REFRESH DEFERRED or IMMEDIATE MQTs recommended.
Assumption (DEFERRED):
• estimated time for population = full refresh cost
• one refresh cost included in the calculation IMMEDIATE changes added in plans for insert/update/deletes
If indexes on candidate MQT are selected, then the MQT must be selected as well
IBM Software Group | DB2 Information Management Software
Experiments Detect what MQO candidates adds to performance improvement OLAP DB and workload Workload estimated execution time (WET)
Type of MQT Selection WET without MQTs WET with MQTs % diff in WETs Num. of MQTs
MQTs from queries 493.7 seconds 353.0 seconds 28.5% 7
MQTs from MQO 493.7 seconds 352.0 seconds 28.4% 4
IBM Software Group | DB2 Information Management Software
Self-optimizing
Self-configuring
Self-healing
Autonomic capabilities in DB2 Stinger
• Configuration Advisor• Design Advisor
• Health Monitor • Recommendation Advisor
• Automatic page write integrity checking• Automatic index reorganization• Recovery Expert • Fault Monitor• Backup
• Self-tuning • Automated
• HADR• DB2/Websphere Integration
• log and trace analyzer
• Query compiler • query rewrite • cost based optimization
• Automatic query parallelism degree • Self-configuring/optimizing utilities • Adaptive utility throttling
• Runstats• Performance Expert• Query patroller workload manager• Self-tuning load
• advises: Indexes, MDCs, MQTs, Partitioning
• Automated Table Maintenance• Runstats• Reorg• Statistics profiling
IBM Software Group | DB2 Information Management Software