IBM Software Group

Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor (Automating Physical Database Design)

Jarek Gryz

IBM Software Group | DB2 Information Management Software

Agenda

• Motivation
• Indexes in DB2
• Materialized query tables in DB2
• Problem definition
• How does the DB2 Design Advisor tool work?
• Experiments

Motivation

EASE OF USE

Indexes and materialized query tables (MQTs) can significantly enhance query performance.

Choosing good indexes and materialized query tables is a complex problem. This tool is intended to help automate this effort.

Why have an index?

• Performance, performance, performance
• Provides order, for example for joins, GROUP BY, ORDER BY, DISTINCT
• Limits I/O and data retrieved by filtering with predicates
  • Range of values (start/stop keys)
  • Join predicates
• Provides index-only access
• Enforces uniqueness or other constraints
• Provides statistics useful to the optimizer for cardinality estimation, for example statistics on the number of keys

Why have materialized query tables?

• The MQT feature is a powerful DB2 feature that lets you precompute a query result and materialize it into a table
• Full refresh or incremental refresh is possible
• Similar queries can subsequently use the precomputed data from the MQT automatically to improve performance

[Figure: tables CUST and TRANS joined on cust.cust_id = trans.cust_id; the result of cust_id, cust_age, SUM(sales) GROUP BY cust_id, cust_age is stored as the materialized MQT.]
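The idea can be illustrated with a small sketch that emulates an MQT by hand. It uses Python's built-in sqlite3 module rather than DB2, so the summary table is built and queried explicitly; in DB2 the optimizer would route matching queries to the MQT automatically. The table contents and the name cust_sales_mqt are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust  (cust_id INTEGER PRIMARY KEY, cust_age INTEGER);
    CREATE TABLE trans (cust_id INTEGER, sales REAL);
    INSERT INTO cust  VALUES (1, 30), (2, 45);
    INSERT INTO trans VALUES (1, 10.0), (1, 5.0), (2, 20.0);

    -- Hand-built stand-in for an MQT: the join and aggregation are
    -- computed once and stored (comparable to a full refresh).
    CREATE TABLE cust_sales_mqt AS
        SELECT c.cust_id, c.cust_age, SUM(t.sales) AS sum_sales
        FROM cust c JOIN trans t ON c.cust_id = t.cust_id
        GROUP BY c.cust_id, c.cust_age;
""")

# A "similar" query: total sales per age, answered from the base tables ...
base_query = """
    SELECT c.cust_age, SUM(t.sales)
    FROM cust c JOIN trans t ON c.cust_id = t.cust_id
    GROUP BY c.cust_age ORDER BY c.cust_age
"""
# ... and the same question answered by re-aggregating the summary table.
mqt_query = """
    SELECT cust_age, SUM(sum_sales)
    FROM cust_sales_mqt
    GROUP BY cust_age ORDER BY cust_age
"""
from_base = conn.execute(base_query).fetchall()
from_mqt = conn.execute(mqt_query).fetchall()
print(from_base == from_mqt)
```

Both queries return the same rows, but the second one avoids the join and scans the smaller precomputed table, which is where the performance gain comes from.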

Problem Definition

Given:
• Workload information
• System configuration
• Database characteristics

Determine an index and MQT set that will:
• lead to good workload performance
• in a reasonable or specified maximum time
• considering disk space and maintenance constraints
• and be easy to use

The solution ...

The DB2 Design Advisor

Automatically captures:
• A representative query workload (potentially compressing it to reduce its size)
• The existing database characteristics and environment
• System information

Determines an index and MQT set that will lead to good ESTIMATED workload response time:
• Using DB2's query rewriter/optimizer to suggest candidates
• Using DB2's optimizer to provide cost/benefit information
• Using a combinatorial algorithm to perform a cost-benefit analysis, observing constraints on (1) advisor execution time, (2) disk space, and (3) anticipated DB2 costs of creating the entities plus overhead during INSERT/UPDATE/DELETE activity
• Using sampled or estimated statistics of new entities
• Providing both GUI and command-line options for initiating the advisor

Issues in automating physical DB design selection

When to initiate the design algorithm?
• Health monitor with health indicators, such as the number of sort overflows, to initiate the advisor

What data to use to make the decision?
• Automatically capture workload, DB, and system information
• Allow work on real data or just statistics

How to make the decision?
• Method to be described

How are the recommendations implemented?
• Little user interaction: ask only if or when to initiate, to gain DBA trust
• Online methods to reduce implementation cost
  • E.g., online index creation

The various steps within the Advisor

Input: query workload, DDL, statistics

• Generate MQT candidates
• Generate index candidates
• Estimate statistics
• Consolidate MQTs & indexes
• Combinatorial search algorithm
• Filter out unused MQTs & indexes

Output: recommended MQTs & indexes

Index candidate generation

During optimization, generate virtual candidates when:
• A predicate exists but no index (e.g., R.A > 5 or R.A = S.B)
• Ordering is required
• Uniqueness is required

Winning candidates are the virtual indexes in the final optimized query plan.

Provides candidates we know the optimizer will use.
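A rough sketch of the candidate-proposal rule, written in Python for illustration. This is not DB2's internal logic: the real advisor creates virtual indexes inside the optimizer and keeps those that survive in the final plan. The VirtualIndex type, the propose_virtual_indexes function, and the way queries are described here are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualIndex:
    table: str
    columns: tuple
    reason: str


def propose_virtual_indexes(predicate_cols, order_cols, existing):
    """Propose virtual indexes for one query.

    predicate_cols / order_cols: lists of (table, column) pairs.
    existing: set of (table, leading_column) pairs that already have an index.
    """
    candidates = []
    # A predicate on a column with no index yields a single-column candidate.
    for table, col in predicate_cols:
        if (table, col) not in existing:
            candidates.append(VirtualIndex(table, (col,), "predicate"))
    # A required ordering yields one multi-column candidate per table.
    by_table = {}
    for table, col in order_cols:
        by_table.setdefault(table, []).append(col)
    for table, cols in by_table.items():
        if (table, cols[0]) not in existing:
            candidates.append(VirtualIndex(table, tuple(cols), "ordering"))
    return candidates


# Query with predicates on R.A and S.B and ORDER BY R.C, R.D;
# S.B is assumed to be indexed already.
cands = propose_virtual_indexes(
    predicate_cols=[("R", "A"), ("S", "B")],
    order_cols=[("R", "C"), ("R", "D")],
    existing={("S", "B")},
)
for c in cands:
    print(c)
```

In the advisor, the step that follows is the important one: the optimizer costs plans with these virtual indexes available, and only the ones it actually picks become candidates.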

MQT candidate generation

• Candidates are generated from original queries, logical views, and common expressions formed by matching multiple queries
  • Uses multiquery optimization (MQO)
• Provides candidates we know the queries will use
• Candidates can contain table references in a federated DB (tables on different servers)

Query 1:

    SELECT store_name, cust_name, SUM(sales) AS ss
    FROM Trans T, Store S, Cust C
    WHERE T.store_id = S.store_id
      AND T.cust_id = C.cust_id
      AND cust_age >= 25
    GROUP BY store_name, cust_name

Query 2:

    SELECT store_name, SUM(sales) AS ss
    FROM Trans T, Store S, Cust C
    WHERE T.store_id = S.store_id
      AND T.cust_id = C.cust_id
      AND year = 2002
    GROUP BY store_name

Candidate AST:

    SELECT store_name, cust_name, cust_age, year, SUM(sales) AS ss
    FROM Trans T, Store S, Cust C
    WHERE T.store_id = S.store_id
      AND T.cust_id = C.cust_id
    GROUP BY store_name, cust_name, cust_age, year
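The way the candidate relates to the two queries can be sketched as follows. This is a simplified illustration, not the MQO algorithm itself: it assumes both queries share the same tables, joins, and a re-aggregatable aggregate such as SUM, and it simply moves the columns used in local filters into the GROUP BY so each query's filter can be re-applied on top of the candidate. The dictionary layout and the merge_queries name are hypothetical.

```python
def merge_queries(q1, q2):
    """Return a common MQT candidate covering both queries, or None."""
    if q1["tables"] != q2["tables"] or q1["joins"] != q2["joins"]:
        return None
    if q1["aggregates"] != q2["aggregates"]:
        return None
    group_by = []
    # Grouping columns first, then columns that appear only in local filters.
    for col in (q1["group_by"] + q2["group_by"]
                + q1["filter_cols"] + q2["filter_cols"]):
        if col not in group_by:
            group_by.append(col)
    return {
        "tables": q1["tables"],
        "joins": q1["joins"],
        "aggregates": q1["aggregates"],
        "group_by": group_by,
    }


shared = {
    "tables": ("Trans", "Store", "Cust"),
    "joins": ("T.store_id = S.store_id", "T.cust_id = C.cust_id"),
    "aggregates": ("SUM(sales)",),
}
query1 = dict(shared, group_by=["store_name", "cust_name"],
              filter_cols=["cust_age"])   # cust_age >= 25
query2 = dict(shared, group_by=["store_name"],
              filter_cols=["year"])       # year = 2002

candidate = merge_queries(query1, query2)
print(candidate["group_by"])
```

The resulting grouping columns match the candidate AST above: store_name, cust_name, cust_age, year.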

Combinatorial search algorithm

The search phase uses a knapsack algorithm and a random swap method to choose the recommended index and MQT set.

• Requires each candidate to have a cost-benefit ratio (cbratio)
  • Benefit is based on estimated cost with and without MQT usage (updates have negative benefit)
  • Cost is based on disk space usage
• REFRESH DEFERRED or IMMEDIATE MQTs are recommended
  • Assumptions (DEFERRED):
    • estimated time for population = full refresh cost
    • one refresh cost is included in the calculation
  • IMMEDIATE: changes are added in the plans for inserts/updates/deletes
• If indexes on a candidate MQT are selected, then the MQT must be selected as well
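The search can be sketched as a greedy knapsack pass followed by random swaps. This is a minimal illustration under stated assumptions, not the advisor's implementation: candidates are ranked by benefit per unit of disk space, benefits are treated as independent and additive (the real tool re-costs the workload with the optimizer), and the Candidate type, the numbers, and the function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    benefit: float                 # estimated workload cost saved (made-up units)
    size: float                    # disk space required, assumed > 0
    parent: Optional[str] = None   # MQT that an index is defined on, if any


def feasible(selected, budget):
    """Within the disk budget, and every index on an MQT has its MQT selected."""
    names = {c.name for c in selected}
    fits = sum(c.size for c in selected) <= budget
    deps_ok = all(c.parent is None or c.parent in names for c in selected)
    return fits and deps_ok


def total_benefit(selected):
    return sum(c.benefit for c in selected)


def search(candidates, budget, swaps=200, seed=0):
    rng = random.Random(seed)
    ranked = sorted(candidates, key=lambda c: c.benefit / c.size, reverse=True)
    # Greedy knapsack: repeat passes so an index on an MQT can be added
    # once its MQT has been selected.
    selected, added = [], True
    while added:
        added = False
        for c in ranked:
            if c not in selected and c.benefit > 0 and feasible(selected + [c], budget):
                selected.append(c)
                added = True
    # Random swap: exchange one selected and one unselected candidate,
    # keeping the change only if it is feasible and improves the benefit.
    for _ in range(swaps):
        rest = [c for c in candidates if c not in selected]
        if not selected or not rest:
            break
        out, inn = rng.choice(selected), rng.choice(rest)
        trial = [c for c in selected if c is not out] + [inn]
        if feasible(trial, budget) and total_benefit(trial) > total_benefit(selected):
            selected = trial
    return selected


candidates = [
    Candidate("mqt1", benefit=50, size=40),
    Candidate("idx_on_mqt1", benefit=30, size=5, parent="mqt1"),
    Candidate("idx_a", benefit=12, size=10),
    Candidate("idx_b", benefit=22, size=20),
]
result = search(candidates, budget=65)
print(sorted(c.name for c in result), total_benefit(result))
```

In this example the greedy pass picks mqt1, idx_a, and idx_on_mqt1 (benefit 92), and the swap phase then replaces idx_a with the larger idx_b, which fits exactly into the budget and raises the benefit to 102. Candidates with negative benefit, such as those penalized by update cost, are never selected.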

Experiments

• Determine what MQO candidates add to the performance improvement
• OLAP DB and workload
• Metric: workload estimated execution time (WET)

Type of MQT selection   WET without MQTs   WET with MQTs   % diff in WETs   Num. of MQTs
MQTs from queries       493.7 seconds      353.0 seconds   28.5%            7
MQTs from MQO           493.7 seconds      352.0 seconds   28.4%            4
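The "% diff in WETs" column appears to be the relative reduction, (WET without MQTs - WET with MQTs) / WET without MQTs. That reading is an assumption, since the slide does not state the formula. A short check in Python:

```python
def pct_reduction(wet_without, wet_with):
    """Percentage reduction in workload estimated execution time."""
    return 100.0 * (wet_without - wet_with) / wet_without


from_queries = round(pct_reduction(493.7, 353.0), 1)
from_mqo = round(pct_reduction(493.7, 352.0), 1)
print(from_queries, from_mqo)
```

Under this formula the first row reproduces 28.5%, while the times in the second row give about 28.7% rather than 28.4%, which suggests one of the figures in that row was rounded or transcribed differently. Either way, the MQO-derived set reaches roughly the same improvement with 4 MQTs instead of 7.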

Autonomic capabilities in DB2 Stinger

Categories: Self-configuring, Self-optimizing, Self-healing

Features shown:
• Configuration Advisor
• Design Advisor
• Health Monitor
• Recommendation Advisor
• Automatic page write integrity checking
• Automatic index reorganization
• Recovery Expert
• Fault Monitor
• Backup
• Self-tuning
• Automated
• HADR
• DB2/WebSphere integration
• Log and trace analyzer
• Query compiler
• Query rewrite
• Cost-based optimization
• Automatic query parallelism degree
• Self-configuring/optimizing utilities
• Adaptive utility throttling
• Runstats
• Performance Expert
• Query Patroller workload manager
• Self-tuning load
• Advises: indexes, MDCs, MQTs, partitioning
• Automated table maintenance
• Runstats
• Reorg
• Statistics profiling
