oracle database administration lecture 6 indexes, optimizer, hints

Oracle Database Administration

Lecture 6

Indexes, Optimizer, Hints

Indexes in Oracle• Indexes are used to:

provide faster access to data help enforce primary key and unique constraints help enforce foreign key constraints

• Index types: B-Tree indexes (default) Bitmap indexes

Indexes in Oracle• B-Tree Index can be:

unique – each index value except NULL must be unique

non unique single column – one column is indexed multiple column (compound index) – multiple

columns are indexed• NULL values in B-Tree index are ignored (are

not indexed)

Compound indexes• Column values are combined together in the

order they appear in CREATE INDEX statement

Statements:CREATE INDEX IND1 ON EMP(ID, SALARY);CREATE INDEX IND2 ON EMP(SALARY,ID);

create two different indexes• Compound index value is NULL if all the

columns are NULL

Index usage• Index can be used by a SELECT statement to:

limit number of rows to be processed by the WHERE clause

order the results of the SELECT statement join two tables

• Index can be accessed in the following ways: unique index scan (0 or 1 rows) non unique index scan (any number of rows) index range scan (any number of rows)

Oracle optimizer• Many statements can be executed in a different

way• Each statement executed is analyzed by the

Oracle optimizer• Oracle optimizer creates execution plan for the

statement

Oracle optimizer• Execution plan defines:

what tables to access and how to access them (access path)

the order of operations (join order) join method / methods the optimizer can rewrite statement to different

one, as long as the results of the statement are the same

Access paths• Access path is a way of accessing table data. • Example access paths:

FULL – full table scan (all rows are searched) ROWID – table access by rowid (the fastest) INDEX_ASC, INDEX_DES – access table

using an index INDEX_FFS – index fast full scan INDEX_SS – index skip scan

Full table scan• Full table scan is relatively expensive• It is used when:

• entire table needs to be scanned (for example in SELECT * FROM table)• table is small (for example: 100 blocks)• there is no index that can be used• there are indexes, but expected number of rows is large and total cost of execution would be larger when using indexes

Full table scan example

• LIKE '%...' operator cannot use index• Table test1 contains 2000 rows• Optimizer estimates (guesses) that 100 rows will

be returned from the query

Full table scan example• SELECT * FROM emp WHERE gender = 'F'

• approximately 50% of the rows have gender = 'F'

• using an index is inefficient when 50% of the rows are to be searched – Oracle performs full table scan• this index has low selectivity – this column is not a good candidate for B-Tree index

Index range scan

• Index range scan uses index to search for multiple rows

• Index contains:• ROWID of the actual row• indexed values

• Index can be used to:• locate row using ROWID• get indexed value directly from the index

Index range scan example 1

• Index is used to get ROWID of the table row• ROWID retrieved from index is used to get data

from table data block

Index range scan example 2

• Since SALARY column exists in the index IDX1, value is retrieved directly from the index, without accessing the table

Index fast full scan

• Index fast full scan is sometimes used instead of full table scan

• Index FFS can be used when data to be retrieved in the query is in the index

• For FFS of compound indexes, the order of columns is not important

Join operations• When two tables are joined, the optimiser decides:

join order – which table will be accessed first, which second

join method – how the tables will be joined. Available join methods include:nested loopsmerge join (backward compatibility only)hash joinanti join, semi join

Nested loops join• For each row in the first table:

find all rows in the second table that match the where clause

Nested loops join

Nested loops join is most effective when:Number of nested executions is low – when

number of rows returned from the first statement is low

Nested statement can be executed quickly, for example using a unique index scan

Merge join• Sort rows in the first table by the join key• Sort rows in the second table by the join key• Merge sorted rows (in a single pass)• Merge sort is most effective when:

– Rows can be sorted effectively, for example when using index on the join column – that way rows are already sorted without additional effort

Hash join• Similar to merge join• Instead of sorting – hash table of all rows indexed

by the join key is used

Hash join

• Hash join is most effective:– When joining large sets of data that is not

sorted by the join key (otherwise merge join is useful)

Semi join• Statement with EXISTS clauseSELECT D.NAME FROM DEPT D WHERE EXISTS (SELECT * FROM EMP E WHERE D.ID = E.DEPT_ID)

SELECT D.NAME FROM DEPT D WHERE D.ID IN (SELECT E.DEPT_ID FROM EMP E WHERE E.SALARY > 2000)

Anti join• Statement with NOT EXISTS or NOT IN clauseSELECT D.NAME FROM DEPT D WHERE NOT EXISTS (SELECT * FROM EMP E WHERE D.ID = E.DEPT_ID)

SELECT D.NAME FROM DEPT D WHERE D.ID NOT IN (SELECT E.DEPT_ID FROM EMP E WHERE E.SALARY > 2000)

Execution plan• sample EXPLAIN PLAN command output:Rows Execution Plan-------- ---------------------------------------------------- 12 SORT AGGREGATE 2 SORT GROUP BY 76563 NESTED LOOPS 76575 NESTED LOOPS 19 TABLE ACCESS FULL CN_PAYRUNS_ALL 76570 TABLE ACCESS BY INDEX ROWID CN_POSTING_DETAILS_ALL 76570 INDEX RANGE SCAN (object id 178321) 76563 TABLE ACCESS BY INDEX ROWID CN_PAYMENT_WORKSHEETS_ALL11432983 INDEX RANGE SCAN (object id 186024)

Oracle optimizer•initialization parameter OPTIMIZER_MODE can take values:

first_rows first_rows_1, first_rows_10, first_rows_100,

first_rows_1000, all_rows, choose

Cost based optimizer• Cost based optimizer tries to estimate real cost of

executing statement (cpu, memory, io operations)• CBO uses statistics to estimate execution cost• The following types of statistics exist:

• table statistics• column statistics• index statistics• column histograms

Table statistics• Table statistics include:

• number of rows in a table• total number of blocks• number of free blocks• average row length• number of chained blocks

• Table statistics can be viewed in DBA_TABLES view

Column statistics• Column statistics include:

• minimum value• maximum value• number of distinct values• number of NULL values

• Table statistics can be viewed in DBA_TAB_COLS view

Index statistics• Index statistics include:

• BTree level• number of leaf blocks• number of distinct keys• average leaf blocks per key• average data blocks per key• number of rows

• Index statistics can be viewed in DBA_INDEXES view

Column histograms• Column histograms can be computed for columns

with non-uniform distribution• Example:

select * from emp where salary between 0 and 1000

• Without histogram Oracle estimates number of rows matching the where condition using:• total number of rows (table statistics)• minimum value of salary (column statistics)• maximum value of salary (column statistics)

Column histograms• With histograms Oracle can better estimate

number of rows that match WHERE condition• Histograms are not useful if:

• column distribution is uniform (normal statistics are enough)• column is accessed using bind variable:select * from emp where salary

between ? and ?

Gathering statistics• There are many ways to gather statistics:

• manualy for a table:ANALYZE TABLE employees COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 10 PERCENT

ANALYZE TABLE employees COMPUTE STATISTICS FOR ALL INDEXED COLUMNS

manually for a user:EXECUTE dbms_stats.gather_schema_stats( ownname => 'Username', method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', cascade=>TRUE);

Automating statistics gathering• Oracle 9i and before:

create a job for computing statistics every day, week or month

• Oracle 10g and 11g: Oracle 10g and 11g automatically gather

statistics

Optimizer modes• RULE – force rule based optimizer• FIRST_ROWS – force cost based optimizer,

optimize statement based on time of returning the first row

• FIRST_ROWS_1, FIRST_ROWS_10, etc. – optimize statement based on time of returning first X rows

• ALL_ROWS - force cost based optimizer, optimize statement based on execution of entire statement

• CHOOSE – if statistics are present, use CBO, otherwise use RBO

Hints• SQL statements can have hints

hints help optimizer choose best access pathoptimizer can ignore hint(s) if:

hint does not make sense in the queryexample: using index unrelated to sort order or

join conditionhints are in conflict with each otherhints have incorrect syntax

Hints• Hints are included in SQL as comments:SELECT /*+ INDEX(employees emp_idx) */ * FROM employees WHERE id > 100;

SELECT /*+ INDEX(emp emp_idx) USE_NL(emp, dept)*/ * FROM employees emp, departments dept, WHERE emp.id > 100 AND emp.dept_id = dept.id

Hints start with /*+ Hints must immediately follow SELECT, UPDATE, etc.

Hints can be included in sub-selects and in views

Optimizer mode hints• Hint can change global optimizer mode:• ALL_ROWS • FIRST_ROWS• FIRST_ROWS_10, FIRST_ROWS_100, ... –

minimum time to return first 10, 100 rows• RULE• CHOOSE• Example

SELECT /*+ FIRST_ROWS */ * FROM EMPLOYEES ORDER BY ID ASC;

Access path hints• Access path is a way of accessing table data. The

following hints can be used:FULL – full table scan (all rows are searched)ROWID – table access by rowid (the fastest)CLUSTER – cluster scan (only for clustered

objects)HASH – hash scan for cluster objectsINDEX – access table using an indexINDEX_FFS – index fast full scan

Access path hints INDEX_ASC, INDEX_DESC INDEX_COMBINE – used for bitmap indexes NO_INDEX – disables specified index for a

query AND_EQUAL – merge several indexes

Other hints• Query transformation hints

USE_CONCAT – rewrite OR query to UNION ALL

MERGE – merge view into a query NO_MERGE – disable view merge

• Join order hints ORDERED – forces join of tables in the order

in which they appear in the FROM clause (very useful hint!)

Join operation hints• USE_NL – Nested loops• USE_MERGE – join two tables using sort-merge

join• USE_HASH – join two tables using hash join• LEADING – select table that is a first table in a

join order• HASH_AJ, MERGE_AJ, NL_AJ – hash, merge

or nested loops for anti join query• HASH_SJ, MERGE_SJ, NL_SJ – hash, merge or

nested loops for semi join query

SamplesSELECT /*+ use_hash(employees departments)*/ * FROM employees, departments WHERE employees.department_id =

departments.department_id;

SELECT * FROM departments WHERE exists (SELECT /*+ HASH_SJ*/ * FROM employees WHERE employees.department_id =

departments.department_id AND salary > 200000);

Optimizing SQL statements1. Check execution plan (EXPLAIN PLAN

command). If possible: update statistics create indexes provide optimizer hints

2. Check AUTOTRACE output

3. Optimize entire database

SQLPlus• SET TIMING ON:

shows execution time for each SQL statement SET AUTOTRACE ON:

shows detailed statistics for each statement:Statistics---------------------------------------------------------- 1 recursive calls 0 db block gets 1749 consistent gets 0 physical reads 0 redo size 395 bytes sent via SQL*Net to client 512 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed

AUTOTRACE output• recursive calls – how many additional SQL

statements were executed• db block gets – how many blocks were

processed (read) in memory• consistent gets – how many blocks were

processed (read) in memory• physical reads – number of blocks read from

disk• redo size – how much redo was generated from

this statement

AUTOTRACE output• bytes sent and received via SQL*NET – amount

of network traffic generated by the statement• sorts in memory and on disk• number of rows processed (returned) from the

statement

Resource intensive statements• long running SELECT statements can have:

large number of recursive calls (complex subselect)

large number of consistent gets large number of physical reads large amount of data received over the network many sorts in memory and on disk

Resource intensive statements• long running UPDATE and DELETE statements

can have: large number of db block gets large number of physical reads lot of redo log generated

Solving typical problems• large number of physical reads can mean:

database memory cache is too small index should be created on one or more tables

• large amount of data received over the network: query results are processed on the client instead

on the server• many sorts (especially disk):

order by, group by is used on a very large table without an index. Create index on some columns

Solving typical problems• long running UPDATE or DELETE:

try splitting the statement into several transactions

execute the statement as batch statement at night, when database traffic is low

consider rebuilding the application check if the WHERE condition is not a

complex long running query, if so – optimize the query

oracle database administration lecture 6 indexes, optimizer, hints

Documents