6.3 MySQL Query Optimization, Part 2


Database Management Systems

Dư Phương Hạnh
Department of Information Systems

Faculty of Information Technology, University of Engineering and Technology

Vietnam National University, Hanoi


Query Optimization

Database Management Systems @ Department of Information Systems

Outline

– Optimization Overview
– Optimizing SQL Statement
– Optimizing Database Structure
– Query Execution Plan
– Measuring Performance
– Internal Details of MySQL Optimizations

Reading: Chap 12+13+14 of Ramakrishnan

http://dev.mysql.com/doc/refman/5.5/en/optimization.html


Query Execution Plan



The set of operations that the optimizer chooses in order to execute a query most efficiently is called the “query execution plan”.

Depending on the details of your tables, columns, indexes, and the conditions in your WHERE clause, the MySQL optimizer considers many techniques to efficiently perform the lookups involved in an SQL query.

– A query on a huge table can be performed without reading all the rows;

– A join involving several tables can be performed without comparing every combination of rows.

Your goals are to recognize the aspects of the EXPLAIN plan that indicate a query is optimized well.


Optimizing Queries with EXPLAIN

EXPLAIN SELECT select_options

MySQL displays information from the optimizer about how tables are joined and in which order.

– To give a hint to the optimizer to use a join order corresponding to the order in which the tables are named in a SELECT statement, begin the statement with SELECT STRAIGHT_JOIN rather than just SELECT.

You can see where you should add indexes to tables so that the statement executes faster.


EXPLAIN output format

EXPLAIN returns a row of information for each table used in the SELECT statement.

– In the output, the tables are listed in the order that MySQL would read them while processing the statement.

MySQL resolves all joins using a nested-loop join method.

– This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on.

– When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows.

– The next row is read from this table and the process continues with the next table.


EXPLAIN output column

Column         Meaning
id             The SELECT identifier
select_type    The SELECT type
table          The table for the output row
type           The join type
possible_keys  The possible indexes to choose
key            The index actually chosen
key_len        The length of the chosen key
ref            The columns compared to the index
rows           Estimate of rows to be examined
Extra          Additional information


EXPLAIN output column

select_type:

– SIMPLE: Simple SELECT (not using UNION or subqueries).

– PRIMARY: Outermost SELECT.

– UNION: Second or later SELECT statement in a UNION.

– DEPENDENT UNION: Second or later SELECT statement in a UNION, dependent on outer query.

– UNION RESULT: Result of a UNION.

– SUBQUERY: First SELECT in subquery.

– DEPENDENT SUBQUERY: First SELECT in subquery, dependent on outer query.

– DERIVED: Derived table SELECT (subquery in FROM clause).

– UNCACHEABLE SUBQUERY: A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query.

– UNCACHEABLE UNION: The second or later SELECT in a UNION that belongs to an uncacheable subquery.


EXPLAIN output column

type: The following list describes the join types. ALL, the worst type, is shown first for contrast; the remaining types are ordered from best to worst:

– ALL: A full table scan is done for each combination of rows from the previous tables.

– system: The table has only one row (= system table). This is a special case of the const join type.

– const: The table has at most one matching row, which is read at the start of the query. Because there is only one row, values from the columns in this row can be regarded as constants by the rest of the optimizer. const tables are very fast because they are read only once. const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values.


EXPLAIN output column

– eq_ref: One row is read from this table for each combination of rows from the previous tables. Other than the system and const types, this is the best possible join type. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index.

Examples:

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column_part1=other_table.column
  AND ref_table.key_column_part2=1;


EXPLAIN output column

– ref: All rows with matching index values are read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index (that is, the index cannot select a single row based on the key value).

Examples:

SELECT * FROM ref_table WHERE key_column=expr;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column_part1=other_table.column
  AND ref_table.key_column_part2=1;


EXPLAIN output column

– range: Only rows that are in a given range are retrieved, using an index to select the rows. The key column in the output row indicates which index is used. The key_len contains the longest key part that was used. The ref column is NULL for this type.

Examples:

SELECT * FROM tbl_name WHERE key_column = 10;

SELECT * FROM tbl_name WHERE key_column BETWEEN 10 AND 20;

SELECT * FROM tbl_name WHERE key_part1 = 10 AND key_part2 IN (10,20,30);


EXPLAIN output column

– ...

(Read more at http://dev.mysql.com/doc/refman/5.5/en/explain-output.html)


Optimizing join example

EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
               tt.ProjectReference, tt.EstimatedShipDate,
               tt.ActualShipDate, tt.ClientID,
               tt.ServiceCodes, tt.RepetitiveID,
               tt.CurrentProcess, tt.CurrentDPPerson,
               tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
               et_1.COUNTRY, do.CUSTNAME
FROM tt, et, et AS et_1, do
WHERE tt.SubmitTime IS NULL
  AND tt.ActualPC = et.EMPLOYID
  AND tt.AssignedPC = et_1.EMPLOYID
  AND tt.ClientID = do.CUSTNMBR;


Optimizing join example

Table  Column      Data Type
tt     ActualPC    CHAR(10)
tt     AssignedPC  CHAR(10)
tt     ClientID    CHAR(10)
et     EMPLOYID    CHAR(15)
do     CUSTNMBR    CHAR(15)

Table  Index
tt     ActualPC
tt     AssignedPC
tt     ClientID
et     EMPLOYID (primary key)
do     CUSTNMBR (primary key)



Initially, before any optimizations have been performed, the EXPLAIN statement produces the following information:

table  type  possible_keys                 key   key_len  ref   rows  Extra
et     ALL   PRIMARY                       NULL  NULL     NULL  74
do     ALL   PRIMARY                       NULL  NULL     NULL  2135
et_1   ALL   PRIMARY                       NULL  NULL     NULL  74
tt     ALL   AssignedPC,ClientID,ActualPC  NULL  NULL     NULL  3872



Because type is ALL for each table, this output indicates that MySQL is generating a Cartesian product of all the tables, that is, every combination of rows.

This takes quite a long time, because the product of the number of rows in each table must be examined. For the case at hand, this product is 74 × 2135 × 74 × 3872 = 45,268,558,720 rows. If the tables were bigger, it would take even longer.
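The row-combination count quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary name rows_per_table is an illustrative choice, and the numbers are the rows estimates from the EXPLAIN output.

```python
from math import prod

# Row estimates taken from the "rows" column of the first EXPLAIN output.
rows_per_table = {"et": 74, "do": 2135, "et_1": 74, "tt": 3872}

# With type = ALL on every table, every combination of rows may be examined,
# so the amount of work is bounded by the product of the estimates.
combinations = prod(rows_per_table.values())
print(f"{combinations:,}")  # 45,268,558,720
```

Multiplying the rows column in this way is a quick, rough check on how expensive a join order is likely to be.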



One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size.

In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is CHAR(15), so there is a length mismatch.

To remove the mismatch, lengthen tt.ActualPC from 10 to 15 characters: ALTER TABLE…



Executing the EXPLAIN statement again produces this result:


table  type    possible_keys                 key      key_len  ref          rows  Extra
tt     ALL     AssignedPC,ClientID,ActualPC  NULL     NULL     NULL         3872  Using where
do     ALL     PRIMARY                       NULL     NULL     NULL         2135
et_1   ALL     PRIMARY                       NULL     NULL     NULL         74
et     eq_ref  PRIMARY                       PRIMARY  15       tt.ActualPC  1

This is not perfect, but is much better: The product of the rows values is less by a factor of 74. This version executes in a couple of seconds.



A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons:


table  type    possible_keys                 key       key_len  ref            rows  Extra
et     ALL     PRIMARY                       NULL      NULL     NULL           74
tt     ref     AssignedPC,ClientID,ActualPC  ActualPC  15       et.EMPLOYID    52    Using where
et_1   eq_ref  PRIMARY                       PRIMARY   15       tt.AssignedPC  1
do     eq_ref  PRIMARY                       PRIMARY   15       tt.ClientID    1



At this point, the query is optimized almost as well as possible.

The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that is not the case for the tt table. It is easy to tell MySQL to analyze the key distribution using the ANALYZE TABLE statement.

With the additional index information, the join is perfect:




table  type    possible_keys                 key      key_len  ref            rows  Extra
tt     ALL     AssignedPC,ClientID,ActualPC  NULL     NULL     NULL           3872  Using where
et     eq_ref  PRIMARY                       PRIMARY  15       tt.ActualPC    1
et_1   eq_ref  PRIMARY                       PRIMARY  15       tt.AssignedPC  1
do     eq_ref  PRIMARY                       PRIMARY  15       tt.ClientID    1


Estimating Query Performance

You can estimate query performance by counting disk seeks.

– For small tables, you can usually find a row in one disk seek (because the index is probably cached).

– For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row:

log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1.


Estimating Query Performance

In MySQL, an index block is usually 1,024 bytes and the data pointer is usually 4 bytes. For a 500,000-row table with a key value length of 3 bytes (the size of MEDIUMINT), the formula indicates: log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.

This index would require storage of about 500,000 * 7 * 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so you probably have much of the index in memory and so need only one or two calls to read data to find the row.

For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row.
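The seek and storage estimates can be reproduced with a small Python sketch. The function names estimate_seeks and estimate_index_bytes are hypothetical helpers introduced here for illustration; the default parameter values are the ones quoted in the text (1,024-byte block, 3-byte key, 4-byte data pointer, 2/3 fill ratio).

```python
import math

def estimate_seeks(row_count, index_block_length=1024,
                   index_length=3, data_pointer_length=4):
    """Approximate number of B-tree seeks needed to find one row."""
    # Keys per index block, assuming blocks are about 2/3 full.
    keys_per_block = (index_block_length / 3 * 2
                      / (index_length + data_pointer_length))
    return math.log(row_count) / math.log(keys_per_block) + 1

def estimate_index_bytes(row_count, index_length=3, data_pointer_length=4):
    """Approximate index size in bytes at a 2/3 fill ratio."""
    return row_count * (index_length + data_pointer_length) * 3 / 2

seeks = estimate_seeks(500_000)
size = estimate_index_bytes(500_000)
print(round(seeks, 2), "seeks, which rounds to", round(seeks))
print(size, "bytes")
```

Because the seek count grows with the logarithm of the row count, doubling the table size adds only a fraction of a seek, as long as the index stays cached.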


Measuring Performance

Performance depends on so many different factors that a difference of a few percentage points might not be a decisive victory.

– The results might shift the opposite way when you test in a different environment.

Certain MySQL features help or do not help performance depending on the workload.

– For completeness, always test performance with those features turned on and turned off.


Measuring Performance

To measure the speed of a specific MySQL expression or function, invoke the BENCHMARK() function using the mysql client program. Its syntax is:

BENCHMARK(loop_count,expression)

Example:

SELECT BENCHMARK(1000000,1+1);

On a Pentium II 400MHz system, the result shows that MySQL can execute 1,000,000 simple addition expressions in 0.32 seconds.


Internal Details of MySQL Optimizations


Internal Details of MySQL Optimizations

– IS NULL Optimization
– LEFT JOIN and RIGHT JOIN Optimization
– Nested-Loop Join Algorithms
– DISTINCT Optimization
– Optimizing IN/=ANY Subqueries
– …

Read more at http://dev.mysql.com/doc/refman/5.5/en/optimization-internals.html


IS NULL Optimization

If a WHERE clause includes a col_name IS NULL condition for a column that is declared as NOT NULL, that expression is optimized away.

– This optimization does not occur in cases when the column might produce NULL anyway; for example, if it comes from a table on the right side of a LEFT JOIN.

MySQL can also optimize the combination (col_name = expr OR col_name IS NULL), a form that is common in resolved subqueries.

– EXPLAIN shows ref_or_null when this optimization is used.


IS NULL Optimization

Examples of queries that are optimized, assuming that there is an index on columns a and b of table t2:

– SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL;

– SELECT * FROM t1, t2 WHERE t1.a=t2.a OR t2.a IS NULL;

– SELECT * FROM t1, t2
  WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;

– SELECT * FROM t1, t2
  WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);

– SELECT * FROM t1, t2
  WHERE (t1.a=t2.a AND t2.a IS NULL AND ...)
  OR (t1.a=t2.a AND t2.a IS NULL AND ...);


IS NULL Optimization

ref_or_null works by first doing a read on the reference key, and then a separate search for rows with a NULL key value.

Note that the optimization can handle only one IS NULL level. In the following query, MySQL uses key lookups only on the expression (t1.a=t2.a AND t2.a IS NULL) and is not able to use the key part on b:

SELECT * FROM t1, t2
  WHERE (t1.a=t2.a AND t2.a IS NULL)
  OR (t1.b=t2.b AND t2.b IS NULL);


LEFT JOIN and RIGHT JOIN Optimization

The join optimizer calculates the order in which tables should be joined.

– The table read order forced by LEFT JOIN or STRAIGHT_JOIN helps the join optimizer do its work much more quickly, because there are fewer table permutations to check.

Example:

SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key)
  LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;

– The forced order can also hurt: in this query, MySQL does a full scan on b because the LEFT JOIN forces it to be read before d.


LEFT JOIN and RIGHT JOIN Optimization

The fix in this example is to reverse the order in which a and b are listed in the FROM clause.

Before:

SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key)
  LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;

After:

SELECT * FROM b JOIN a LEFT JOIN c ON (c.key=a.key)
  LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;


LEFT JOIN and RIGHT JOIN Optimization

For a LEFT JOIN, if the WHERE condition is always false for the generated NULL row, the LEFT JOIN is changed to a normal join. For example, the WHERE clause would be false in the following query if t2.column1 were NULL:

SELECT * FROM t1 LEFT JOIN t2 ON (column1)
  WHERE t2.column2=5;

Therefore, it is safe to convert the query to a normal join:

SELECT * FROM t1, t2
  WHERE t2.column2=5 AND t1.column1=t2.column1;

This can be made faster because MySQL can use table t2 before table t1 if doing so would result in a better query plan.
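The equivalence of the two queries can be checked on a toy data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it shows only that the results agree, not how MySQL plans the query. The table contents are invented for illustration, and the join condition is written out as t1.column1 = t2.column1 (an assumption about what the abbreviated ON (column1) stands for).

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER);
    CREATE TABLE t2 (column1 INTEGER, column2 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 5), (2, 7);
""")

# LEFT JOIN form: the row for t1.column1 = 3 gets a NULL-extended t2 row,
# which the WHERE clause then rejects because NULL = 5 is not true.
left_join = conn.execute("""
    SELECT t1.column1, t2.column2
    FROM t1 LEFT JOIN t2 ON t1.column1 = t2.column1
    WHERE t2.column2 = 5
    ORDER BY t1.column1
""").fetchall()

# Normal (inner) join form from the text.
inner_join = conn.execute("""
    SELECT t1.column1, t2.column2
    FROM t1, t2
    WHERE t2.column2 = 5 AND t1.column1 = t2.column1
    ORDER BY t1.column1
""").fetchall()

print(left_join, inner_join)
```

Both queries return the same rows because the WHERE condition on t2.column2 can never be true for a NULL-extended row, so those rows never reach the result.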


Nested-Loop Join Algorithms (NLJ)

MySQL executes joins between tables using a nested-loop algorithm or variations on it.

Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:

Table  Join Type
t1     range
t2     ref
t3     ALL


Nested-Loop Join Algorithms (NLJ)

If a simple NLJ algorithm is used, the join is processed like this:

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions,
        send to client
    }
  }
}
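The pseudocode above can be turned into a small runnable sketch. The version below is an illustration in Python, not MySQL's implementation: tables are lists of dictionaries, a dictionary keyed on a hypothetical column k stands in for the index used by the ref access on t2, and the predicates in_range and join_ok are invented for the example.

```python
def nested_loop_join(t1, t2, t3, in_range, join_ok):
    """Simple nested-loop join over three lists of dict rows."""
    # Hypothetical stand-in for an index: group t2 rows by their key "k".
    t2_by_key = {}
    for row in t2:
        t2_by_key.setdefault(row["k"], []).append(row)

    result = []
    for r1 in t1:
        if not in_range(r1):                     # t1: range access
            continue
        for r2 in t2_by_key.get(r1["k"], []):    # t2: ref access by key
            for r3 in t3:                        # t3: ALL, full scan each time
                if join_ok(r1, r2, r3):
                    result.append((r1["id"], r2["id"], r3["id"]))
    return result

# Toy data, invented for illustration.
t1 = [{"id": i, "k": i % 3} for i in range(6)]
t2 = [{"id": i, "k": i % 3, "v": i} for i in range(4)]
t3 = [{"id": i, "v": i} for i in range(4)]

rows = nested_loop_join(
    t1, t2, t3,
    in_range=lambda r: 1 <= r["id"] <= 4,
    join_ok=lambda r1, r2, r3: r2["v"] == r3["v"],
)
print(rows)
```

Note that t3 is scanned in full once for every t1, t2 combination that reaches the innermost loop, which is the cost the block nested-loop variant tries to reduce.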



A Block Nested-Loop (BNL) join algorithm uses buffering of rows read in outer loops to reduce the number of times that tables in inner loops must be read.

For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer. This reduces the number of times the inner table must be read by an order of magnitude.


Nested-Loop Join Algorithms (NLJ)

for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions,
            send to client
        }
      }
      empty buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions,
        send to client
    }
  }
}



S: the size of each stored t1, t2 combination in the join buffer

C: the number of combinations in the buffer

The number of times table t3 is scanned is:

(S * C)/join_buffer_size + 1

The number of t3 scans decreases as the value of join_buffer_size increases, up to the point when join_buffer_size is large enough to hold all previous row combinations. At that point, there is no speed to be gained by making it larger.
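The effect of the buffer size on the number of t3 scans can be seen in a short simulation. This is a simplified sketch rather than MySQL's code: the buffer capacity is counted in row combinations (buffer_rows, a hypothetical parameter) instead of bytes as join_buffer_size is, and the t1, t2 combinations are represented by plain integers.

```python
def bnl_join(outer_combos, t3, join_ok, buffer_rows):
    """Block nested-loop join; returns (matches, number of t3 scans)."""
    matches, scans, buffer = [], 0, []

    def flush():
        nonlocal scans
        scans += 1                       # one full pass over t3 per buffer
        for r3 in t3:
            for combo in buffer:
                if join_ok(combo, r3):
                    matches.append((combo, r3))
        buffer.clear()

    for combo in outer_combos:           # t1, t2 combinations from outer loops
        buffer.append(combo)
        if len(buffer) == buffer_rows:   # buffer is full
            flush()
    if buffer:                           # leftover, partially filled buffer
        flush()
    return matches, scans

combos = list(range(25))                 # 25 hypothetical t1, t2 combinations
t3 = list(range(0, 25, 5))               # inner table rows: 0, 5, 10, 15, 20
same = lambda combo, r3: combo == r3     # invented join condition

for size in (1, 10, 25, 100):
    matches, scans = bnl_join(combos, t3, same, size)
    print(size, scans, len(matches))
```

With a capacity of 1 the sketch degenerates to the simple nested-loop join (one t3 scan per combination). Once the buffer holds all 25 combinations a single scan suffices, and a larger buffer brings no further gain, in line with the remark above.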


Optimizing IN/=ANY Subqueries

To help the query optimizer better execute your queries, use these tips:

– Declare a column as NOT NULL if it really is. (This also helps other aspects of the optimizer.)

– If you don't need to distinguish a NULL from a FALSE subquery result, you can easily avoid the slow execution path. Replace a comparison that looks like this:

outer_expr IN (SELECT inner_expr FROM ...)

with this expression:

(outer_expr IS NOT NULL) AND (outer_expr IN (SELECT inner_expr…


Optimizing IN/=ANY Subqueries

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

MySQL evaluates queries “from outside to inside.”

– It first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. This is done by pushing down an appropriate equality into the subquery's WHERE clause.


Optimizing IN/=ANY Subqueries

The comparison

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

is converted to this:

EXISTS (SELECT 1 FROM ...
  WHERE subquery_where AND outer_expr=inner_expr)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
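The IN form and the EXISTS form with the pushed-down equality can be compared on a toy data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it checks only that the two forms return the same rows, not how MySQL executes them. The tables outer_t and inner_t, the flag column (playing the role of subquery_where), and the data are all invented for illustration; every column is declared NOT NULL so that NULL handling does not come into play.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (outer_expr INTEGER NOT NULL);
    CREATE TABLE inner_t (inner_expr INTEGER NOT NULL, flag INTEGER NOT NULL);
    INSERT INTO outer_t VALUES (1), (2), (3), (4);
    INSERT INTO inner_t VALUES (1, 1), (2, 0), (3, 1), (5, 1);
""")

# Original form: outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
in_form = conn.execute("""
    SELECT outer_expr FROM outer_t
    WHERE outer_expr IN (SELECT inner_expr FROM inner_t WHERE flag = 1)
    ORDER BY outer_expr
""").fetchall()

# Converted form: the equality outer_expr = inner_expr is pushed into the subquery.
exists_form = conn.execute("""
    SELECT outer_expr FROM outer_t
    WHERE EXISTS (SELECT 1 FROM inner_t
                  WHERE flag = 1 AND outer_t.outer_expr = inner_t.inner_expr)
    ORDER BY outer_expr
""").fetchall()

print(in_form, exists_form)
```

In the EXISTS form the subquery only has to look for rows matching the current outer value, which is what allows an index on the inner column to be used.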
