Oracle Database Administration
Lecture 6
Indexes, Optimizer, Hints
Indexes in Oracle
• Indexes are used to:
  - provide faster access to data
  - help enforce primary key and unique constraints
  - help enforce foreign key constraints
• Index types:
  - B-Tree indexes (default)
  - Bitmap indexes
Indexes in Oracle
• A B-Tree index can be:
  - unique: each index value except NULL must be unique
  - non-unique
  - single column: one column is indexed
  - multiple column (compound index): multiple columns are indexed
• NULL values are ignored by a B-Tree index (they are not indexed)
Compound indexes
• Column values are combined in the order in which they appear in the CREATE INDEX statement
• The statements:
    CREATE INDEX IND1 ON EMP(ID, SALARY);
    CREATE INDEX IND2 ON EMP(SALARY, ID);
  create two different indexes
• A compound index value is NULL (and therefore not indexed) only if all of its columns are NULL
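The effect of column order can be illustrated with a small conceptual sketch. It is not how Oracle stores a B-Tree; it only models an index as a sorted list of (key, rowid) pairs. The rows and rowid values are made up for illustration.

```python
# Conceptual sketch only: a compound index modeled as sorted (key, rowid) pairs.
# The EMP rows and rowid values below are hypothetical.
emp = [
    {"rowid": "r1", "id": 3, "salary": 1000},
    {"rowid": "r2", "id": 1, "salary": 3000},
    {"rowid": "r3", "id": 2, "salary": 1000},
    {"rowid": "r4", "id": None, "salary": None},  # all indexed columns are NULL
]

def build_index(rows, columns):
    """Return sorted (key, rowid) entries; rows whose key is entirely NULL are skipped."""
    entries = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if all(value is None for value in key):
            continue  # an all-NULL key is not stored in the index
        entries.append((key, row["rowid"]))
    return sorted(entries)

ind1 = build_index(emp, ["id", "salary"])   # like IND1 ON EMP(ID, SALARY)
ind2 = build_index(emp, ["salary", "id"])   # like IND2 ON EMP(SALARY, ID)

print([rowid for _, rowid in ind1])  # ordered by id first
print([rowid for _, rowid in ind2])  # ordered by salary first
```

The same rows end up in a different order in each index, which is why a condition on SALARY alone is served well by IND2 but not by IND1.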
Index usage
• An index can be used by a SELECT statement to:
  - limit the number of rows to be processed by the WHERE clause
  - order the results of the SELECT statement
  - join two tables
• An index can be accessed in the following ways:
  - unique index scan (0 or 1 rows)
  - non-unique index scan (any number of rows)
  - index range scan (any number of rows)
Oracle optimizer
• Many statements can be executed in more than one way
• Each executed statement is analyzed by the Oracle optimizer
• The Oracle optimizer creates an execution plan for the statement
Oracle optimizer
• The execution plan defines:
  - which tables to access and how to access them (access path)
  - the order of operations (join order)
  - the join method or methods
• The optimizer can rewrite a statement into a different one, as long as the results of the statement are the same
Access paths
• An access path is a way of accessing table data.
• Example access paths:
  - FULL: full table scan (all rows are searched)
  - ROWID: table access by rowid (the fastest)
  - INDEX_ASC, INDEX_DESC: access the table using an index
  - INDEX_FFS: index fast full scan
  - INDEX_SS: index skip scan
Full table scan
• A full table scan is relatively expensive
• It is used when:
  - the entire table needs to be scanned (for example in SELECT * FROM table)
  - the table is small (for example: 100 blocks)
  - there is no index that can be used
  - there are indexes, but the expected number of rows is large and the total cost of execution would be higher when using indexes
Full table scan example
• A LIKE condition whose pattern starts with a wildcard ('%...') cannot use an index range scan
• Table test1 contains 2000 rows
• The optimizer estimates (guesses) that 100 rows will be returned from the query
Full table scan example
• SELECT * FROM emp WHERE gender = 'F'
• Approximately 50% of the rows have gender = 'F'
• Using an index is inefficient when 50% of the rows are to be read, so Oracle performs a full table scan
• An index on this column has low selectivity, so the column is not a good candidate for a B-Tree index
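The reasoning in this example can be sketched with a toy cost comparison. This is an illustration only and not Oracle's cost formula. It assumes values are evenly distributed, so an equality condition matches about 1 / (number of distinct values) of the rows. It also assumes each row fetched through the index costs one block visit. All function names and numbers are hypothetical.

```python
# Toy model only: NOT Oracle's real cost formula.
def equality_selectivity(num_distinct):
    """Estimated fraction of rows matching col = value, assuming a uniform distribution."""
    return 1.0 / num_distinct

def estimated_rows(num_rows, num_distinct):
    """Estimated number of rows returned by col = value."""
    return num_rows * equality_selectivity(num_distinct)

def cheaper_access(num_rows, table_blocks, num_distinct):
    """Compare one block visit per indexed row with reading every table block once."""
    index_cost = estimated_rows(num_rows, num_distinct)  # assumed: 1 block visit per row
    full_scan_cost = table_blocks                        # assumed: each block read once
    return "index" if index_cost < full_scan_cost else "full scan"

# gender has 2 distinct values: about 50% of the rows match
print(cheaper_access(num_rows=100_000, table_blocks=1_000, num_distinct=2))
# a unique id column: about 1 row matches
print(cheaper_access(num_rows=100_000, table_blocks=1_000, num_distinct=100_000))
```

With two distinct values the index would visit far more blocks than the table contains, so the full scan wins. With a unique column the index wins easily.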
Index range scan
• An index range scan uses an index to search for multiple rows
• The index contains:
  - the ROWID of the actual row
  - the indexed values
• The index can be used to:
  - locate a row using its ROWID
  - get the indexed value directly from the index
Index range scan example 1
• The index is used to get the ROWID of the table row
• The ROWID retrieved from the index is used to get the data from the table data block
Index range scan example 2
• Since the SALARY column exists in the index IDX1, its value is retrieved directly from the index, without accessing the table
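The two examples can be sketched in a few lines. The sketch assumes that an index behaves like a sorted list of (indexed value, rowid) entries and that the table is a mapping from rowid to row. The rows, rowids and the name idx1 are hypothetical, and the real on-disk structures are more involved.

```python
import bisect

# Hypothetical table: rowid -> row.
table = {
    "r1": {"id": 1, "name": "Ann", "salary": 900},
    "r2": {"id": 2, "name": "Bob", "salary": 1500},
    "r3": {"id": 3, "name": "Eve", "salary": 1200},
    "r4": {"id": 4, "name": "Dan", "salary": 2500},
}

# Index on SALARY: sorted (indexed value, rowid) entries.
idx1 = sorted((row["salary"], rowid) for rowid, row in table.items())

def range_scan(index, low, high):
    """Return the index entries with low <= value <= high."""
    keys = [value for value, _ in index]
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return index[start:end]

# Example 1: NAME is not in the index, so each ROWID is used to visit the table.
names = [table[rowid]["name"] for _, rowid in range_scan(idx1, 1000, 2000)]

# Example 2: only SALARY is needed, so the table is never touched.
salaries = [value for value, _ in range_scan(idx1, 1000, 2000)]

print(names, salaries)
```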
Index fast full scan
• An index fast full scan (FFS) is sometimes used instead of a full table scan
• An index FFS can be used when all the data the query needs is in the index
• For an FFS of a compound index, the order of the columns is not important
Join operations
• When two tables are joined, the optimizer decides:
  - join order: which table will be accessed first, which second
  - join method: how the tables will be joined
• Available join methods include:
  - nested loops
  - merge join (backward compatibility only)
  - hash join
  - anti join, semi join
Nested loops join
• For each row in the first table:
  - find all rows in the second table that match the WHERE clause
Nested loops join
• A nested loops join is most effective when:
  - the number of nested executions is low, that is, when the number of rows returned from the first table is low
  - the nested statement can be executed quickly, for example using a unique index scan
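A minimal sketch of the nested loops idea, with and without a fast nested lookup. The sample rows are hypothetical, and a Python dict stands in for a unique index on DEPT(ID). This illustrates the algorithm and is not Oracle's implementation.

```python
# Hypothetical sample data; names follow the EMP/DEPT examples.
emp = [
    {"id": 1, "dept_id": 10},
    {"id": 2, "dept_id": 20},
    {"id": 3, "dept_id": 10},
]
dept = [{"id": 10, "name": "Sales"}, {"id": 20, "name": "IT"}]

def nested_loops_join(outer, inner, outer_key, inner_key):
    """For each outer row, scan the whole inner table for matching rows."""
    result = []
    for outer_row in outer:
        for inner_row in inner:
            if outer_row[outer_key] == inner_row[inner_key]:
                result.append((outer_row, inner_row))
    return result

def nested_loops_join_unique_index(outer, inner_index, outer_key):
    """Same join, but each nested step is a single probe of a unique index."""
    result = []
    for outer_row in outer:
        inner_row = inner_index.get(outer_row[outer_key])  # 0 or 1 rows
        if inner_row is not None:
            result.append((outer_row, inner_row))
    return result

dept_by_id = {row["id"]: row for row in dept}  # stands in for a unique index on DEPT(ID)
pairs = nested_loops_join_unique_index(emp, dept_by_id, "dept_id")
print([(e["id"], d["name"]) for e, d in pairs])
```

The cost grows with the number of outer rows multiplied by the cost of one nested lookup, which is why both conditions on the slide matter.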
Merge join
• Sort rows in the first table by the join key
• Sort rows in the second table by the join key
• Merge the sorted rows (in a single pass)
• A merge join is most effective when:
  - rows can be sorted cheaply, for example when using an index on the join column, so that rows are already sorted without additional effort
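The sort and merge steps can be sketched as follows. This is a simplified, in-memory equality join on hypothetical data, meant only to show the single merge pass over two sorted inputs.

```python
def merge_join(left, right, left_key, right_key):
    """Sort both inputs by the join key, then merge them in one pass."""
    left = sorted(left, key=lambda row: row[left_key])
    right = sorted(right, key=lambda row: row[right_key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair every left row with this key with every right row in the run.
            while i < len(left) and left[i][left_key] == lk:
                for right_row in right[j:j_end]:
                    result.append((left[i], right_row))
                i += 1
            j = j_end
    return result

# Hypothetical sample data.
dept = [{"id": 20, "name": "IT"}, {"id": 30, "name": "HR"}, {"id": 10, "name": "Sales"}]
emp = [{"id": 1, "dept_id": 20}, {"id": 2, "dept_id": 10}, {"id": 3, "dept_id": 10}]

joined = merge_join(dept, emp, "id", "dept_id")
print([(d["name"], e["id"]) for d, e in joined])
```

If an index already delivers the rows in join-key order, the two sorted() calls are the part that can be skipped.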
Hash join
• Similar to a merge join
• Instead of sorting, a hash table of all rows, keyed by the join key, is used
Hash join
• A hash join is most effective:
  - when joining large sets of data that are not sorted by the join key (otherwise a merge join is useful)
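A minimal sketch of the build and probe phases of a hash join, on hypothetical data. A real implementation also has to deal with memory limits and spilling to disk, which this sketch does not model.

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on one input, then probe it once per row of the other."""
    buckets = defaultdict(list)
    for build_row in build:                      # build phase
        buckets[build_row[build_key]].append(build_row)
    result = []
    for probe_row in probe:                      # probe phase
        for build_row in buckets.get(probe_row[probe_key], []):
            result.append((build_row, probe_row))
    return result

# Hypothetical sample data; neither input is sorted by the join key.
dept = [{"id": 20, "name": "IT"}, {"id": 30, "name": "HR"}, {"id": 10, "name": "Sales"}]
emp = [{"id": 1, "dept_id": 20}, {"id": 2, "dept_id": 10}, {"id": 3, "dept_id": 10}]

joined = hash_join(dept, emp, "id", "dept_id")
print([(d["name"], e["id"]) for d, e in joined])
```

Each input is read once and no sort is needed, which matches the case the slide describes: large, unsorted inputs.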
Semi join
• Statement with an EXISTS or IN clause:
    SELECT D.NAME FROM DEPT D
    WHERE EXISTS (SELECT * FROM EMP E WHERE D.ID = E.DEPT_ID);
    SELECT D.NAME FROM DEPT D
    WHERE D.ID IN (SELECT E.DEPT_ID FROM EMP E WHERE E.SALARY > 2000);
Anti join
• Statement with a NOT EXISTS or NOT IN clause:
    SELECT D.NAME FROM DEPT D
    WHERE NOT EXISTS (SELECT * FROM EMP E WHERE D.ID = E.DEPT_ID);
    SELECT D.NAME FROM DEPT D
    WHERE D.ID NOT IN (SELECT E.DEPT_ID FROM EMP E WHERE E.SALARY > 2000);
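The difference between a semi join and an anti join can be sketched with set membership. The data is hypothetical and NULL keys are deliberately not modeled. In SQL, NOT IN and NOT EXISTS behave differently when the subquery can return NULL.

```python
# Hypothetical sample data; names follow the DEPT/EMP statements above.
dept = [
    {"id": 10, "name": "Sales"},
    {"id": 20, "name": "IT"},
    {"id": 30, "name": "HR"},
]
emp = [
    {"id": 1, "dept_id": 10, "salary": 2500},
    {"id": 2, "dept_id": 10, "salary": 1800},
    {"id": 3, "dept_id": 20, "salary": 1500},
]

def semi_join(outer, inner, outer_key, inner_key):
    """Keep each outer row at most once if at least one inner row matches."""
    keys = {row[inner_key] for row in inner}
    return [row for row in outer if row[outer_key] in keys]

def anti_join(outer, inner, outer_key, inner_key):
    """Keep each outer row for which no inner row matches (NULLs not modeled)."""
    keys = {row[inner_key] for row in inner}
    return [row for row in outer if row[outer_key] not in keys]

well_paid = [e for e in emp if e["salary"] > 2000]

has_employees = [d["name"] for d in semi_join(dept, emp, "id", "dept_id")]
no_employees = [d["name"] for d in anti_join(dept, emp, "id", "dept_id")]
has_well_paid = [d["name"] for d in semi_join(dept, well_paid, "id", "dept_id")]
no_well_paid = [d["name"] for d in anti_join(dept, well_paid, "id", "dept_id")]
print(has_employees, no_employees, has_well_paid, no_well_paid)
```

Unlike an ordinary join, a department with two matching employees still appears only once in the semi join result.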
Execution plan
• Sample EXPLAIN PLAN command output:
    Rows      Execution Plan
    --------  ----------------------------------------------------
          12  SORT AGGREGATE
           2   SORT GROUP BY
       76563    NESTED LOOPS
       76575     NESTED LOOPS
          19      TABLE ACCESS FULL CN_PAYRUNS_ALL
       76570      TABLE ACCESS BY INDEX ROWID CN_POSTING_DETAILS_ALL
       76570       INDEX RANGE SCAN (object id 178321)
       76563     TABLE ACCESS BY INDEX ROWID CN_PAYMENT_WORKSHEETS_ALL
    11432983      INDEX RANGE SCAN (object id 186024)
Oracle optimizer
• The initialization parameter OPTIMIZER_MODE can take the values:
  - first_rows
  - first_rows_1, first_rows_10, first_rows_100, first_rows_1000
  - all_rows
  - choose
Cost based optimizer
• The cost based optimizer (CBO) tries to estimate the real cost of executing a statement (CPU, memory, I/O operations)
• The CBO uses statistics to estimate the execution cost
• The following types of statistics exist:
  - table statistics
  - column statistics
  - index statistics
  - column histograms
Table statistics
• Table statistics include:
  - number of rows in the table
  - total number of blocks
  - number of free blocks
  - average row length
  - number of chained blocks
• Table statistics can be viewed in the DBA_TABLES view
Column statistics
• Column statistics include:
  - minimum value
  - maximum value
  - number of distinct values
  - number of NULL values
• Column statistics can be viewed in the DBA_TAB_COLS view
Index statistics
• Index statistics include:
  - B-Tree level
  - number of leaf blocks
  - number of distinct keys
  - average leaf blocks per key
  - average data blocks per key
  - number of rows
• Index statistics can be viewed in the DBA_INDEXES view
Column histograms
• Column histograms can be computed for columns with a non-uniform distribution
• Example:
    select * from emp where salary between 0 and 1000
• Without a histogram Oracle estimates the number of rows matching the WHERE condition using:
  - total number of rows (table statistics)
  - minimum value of salary (column statistics)
  - maximum value of salary (column statistics)
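An estimate built from those three statistics can be sketched as below. It is a simplified illustration that assumes values are spread evenly between the minimum and the maximum. The optimizer's actual formula has further terms, and the numbers here are hypothetical.

```python
def estimate_rows_uniform(num_rows, min_value, max_value, low, high):
    """Estimate rows matching 'col BETWEEN low AND high', assuming a uniform spread."""
    low = max(low, min_value)      # clip the requested range to the known range
    high = min(high, max_value)
    if high < low:
        return 0.0                 # requested range lies outside the column's values
    if max_value == min_value:
        return float(num_rows)     # a single distinct value: every row matches
    return num_rows * (high - low) / (max_value - min_value)

# Hypothetical statistics: 10,000 rows, salary between 0 and 10,000.
print(estimate_rows_uniform(10_000, 0, 10_000, 0, 1_000))
```

If most salaries actually fall between 0 and 1000, this estimate is far too low, which is the problem histograms address.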
Column histograms
• With histograms Oracle can better estimate the number of rows that match the WHERE condition
• Histograms are not useful if:
  - the column distribution is uniform (normal statistics are enough)
  - the column is accessed using bind variables:
      select * from emp where salary between ? and ?
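How a histogram improves the estimate can be sketched with a simple bucket model. The bucket layout used here, (low, high, row count), is an assumption made for illustration and does not reproduce Oracle's histogram formats. The counts are hypothetical.

```python
def estimate_rows_histogram(buckets, low, high):
    """Estimate rows with low <= value <= high from (bucket_low, bucket_high, count) buckets.

    Rows are assumed to be spread evenly inside each bucket only.
    """
    total = 0.0
    for bucket_low, bucket_high, count in buckets:
        overlap = min(high, bucket_high) - max(low, bucket_low)
        if overlap > 0:
            total += count * overlap / (bucket_high - bucket_low)
    return total

# Hypothetical skewed salary distribution: 10,000 rows, most of them below 1000.
salary_histogram = [
    (0, 1_000, 7_000),
    (1_000, 5_000, 2_500),
    (5_000, 10_000, 500),
]

print(estimate_rows_histogram(salary_histogram, 0, 1_000))
print(estimate_rows_histogram(salary_histogram, 500, 3_000))
```

For the range 0 to 1000 the histogram gives 7000 rows, where a uniform assumption over 0 to 10,000 would give 1000. With bind variables the range is not known when the statement is parsed, so this extra detail cannot be used in the same way.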
Gathering statistics
• There are many ways to gather statistics:
  - manually for a table:
      ANALYZE TABLE employees COMPUTE STATISTICS;
      ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 10 PERCENT;
      ANALYZE TABLE employees COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;
  - manually for a user (schema):
      EXECUTE dbms_stats.gather_schema_stats(ownname => 'Username', method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', cascade => TRUE);
Automating statistics gathering
• Oracle 9i and before:
  - create a job for computing statistics every day, week or month
• Oracle 10g and 11g:
  - statistics are gathered automatically
Optimizer modes
• RULE: force the rule based optimizer (RBO)
• FIRST_ROWS: force the cost based optimizer, optimize the statement for the time needed to return the first row
• FIRST_ROWS_1, FIRST_ROWS_10, etc.: optimize the statement for the time needed to return the first X rows
• ALL_ROWS: force the cost based optimizer, optimize the statement for the execution of the entire statement
• CHOOSE: if statistics are present, use the CBO, otherwise use the RBO
Hints
Hints
• SQL statements can have hints
  - hints help the optimizer choose the best access path
  - the optimizer can ignore hints if:
    - the hint does not make sense in the query (example: using an index unrelated to the sort order or join condition)
    - hints are in conflict with each other
    - hints have incorrect syntax
Hints
• Hints are included in SQL as comments:
    SELECT /*+ INDEX(employees emp_idx) */ * FROM employees WHERE id > 100;
    SELECT /*+ INDEX(emp emp_idx) USE_NL(emp dept) */ *
    FROM employees emp, departments dept
    WHERE emp.id > 100 AND emp.dept_id = dept.id;
• Hints start with /*+
• Hints must immediately follow SELECT, UPDATE, etc.
• Hints can be included in sub-selects and in views
Optimizer mode hints
• A hint can override the global optimizer mode for a statement:
  - ALL_ROWS
  - FIRST_ROWS
  - FIRST_ROWS(10), FIRST_ROWS(100), ...: minimum time to return the first 10, 100 rows
  - RULE
  - CHOOSE
• Example:
    SELECT /*+ FIRST_ROWS */ * FROM EMPLOYEES ORDER BY ID ASC;
Access path hints
• An access path is a way of accessing table data. The following hints can be used:
  - FULL: full table scan (all rows are searched)
  - ROWID: table access by rowid (the fastest)
  - CLUSTER: cluster scan (only for clustered objects)
  - HASH: hash scan for cluster objects
  - INDEX: access the table using an index
  - INDEX_FFS: index fast full scan
Access path hints
  - INDEX_ASC, INDEX_DESC
  - INDEX_COMBINE: used for bitmap indexes
  - NO_INDEX: disables the specified index for a query
  - AND_EQUAL: merge several indexes
Other hints
• Query transformation hints
  - USE_CONCAT: rewrite an OR query to UNION ALL
  - MERGE: merge a view into the query
  - NO_MERGE: disable view merging
• Join order hints
  - ORDERED: forces the join of tables in the order in which they appear in the FROM clause (very useful hint!)
Join operation hints
• USE_NL: join two tables using nested loops
• USE_MERGE: join two tables using a sort-merge join
• USE_HASH: join two tables using a hash join
• LEADING: select the table that is the first table in the join order
• HASH_AJ, MERGE_AJ, NL_AJ: hash, merge or nested loops for an anti join query
• HASH_SJ, MERGE_SJ, NL_SJ: hash, merge or nested loops for a semi join query
Samples
    SELECT /*+ use_hash(employees departments) */ *
    FROM employees, departments
    WHERE employees.department_id = departments.department_id;

    SELECT * FROM departments
    WHERE exists (SELECT /*+ HASH_SJ */ * FROM employees
                  WHERE employees.department_id = departments.department_id
                  AND salary > 200000);
Optimizing SQL statements
1. Check the execution plan (EXPLAIN PLAN command). If possible:
  - update statistics
  - create indexes
  - provide optimizer hints
2. Check the AUTOTRACE output
3. Optimize the entire database
SQL*Plus
• SET TIMING ON:
  - shows the execution time for each SQL statement
• SET AUTOTRACE ON:
  - shows detailed statistics for each statement:
    Statistics
    ----------------------------------------------------------
              1  recursive calls
              0  db block gets
           1749  consistent gets
              0  physical reads
              0  redo size
            395  bytes sent via SQL*Net to client
            512  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed
AUTOTRACE output
• recursive calls: how many additional SQL statements were executed
• db block gets: how many blocks were read in memory in current mode (typically blocks about to be modified)
• consistent gets: how many blocks were read in memory in read-consistent mode
• physical reads: number of blocks read from disk
• redo size: how much redo was generated by this statement
AUTOTRACE output
• bytes sent and received via SQL*Net: amount of network traffic generated by the statement
• sorts in memory and on disk
• number of rows processed (returned) by the statement
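The counters can also be read programmatically. The following sketch parses the sample statistics shown earlier and derives two commonly used figures: logical reads (db block gets plus consistent gets) and the share of logical reads served from memory. The function names are hypothetical, and the ratio is only a rough indicator.

```python
import re

# The sample AUTOTRACE statistics from the SQL*Plus slide.
AUTOTRACE_STATS = """\
          1  recursive calls
          0  db block gets
       1749  consistent gets
          0  physical reads
          0  redo size
        395  bytes sent via SQL*Net to client
        512  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
"""

def parse_autotrace(text):
    """Turn 'value  statistic name' lines into a {name: value} dictionary."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats

def logical_reads(stats):
    """Blocks read in memory: current mode plus consistent mode."""
    return stats["db block gets"] + stats["consistent gets"]

def cache_hit_ratio(stats):
    """Fraction of logical reads that did not need a physical read (None if no reads)."""
    logical = logical_reads(stats)
    if logical == 0:
        return None
    return 1 - stats["physical reads"] / logical

stats = parse_autotrace(AUTOTRACE_STATS)
print(logical_reads(stats), cache_hit_ratio(stats))
```

Here 1749 blocks were read in memory to return a single row and none came from disk, so the data was cached but the statement still touched many blocks.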
Resource intensive statements
• Long running SELECT statements can have:
  - a large number of recursive calls (complex subselect)
  - a large number of consistent gets
  - a large number of physical reads
  - a large amount of data received over the network
  - many sorts in memory and on disk
Resource intensive statements
• Long running UPDATE and DELETE statements can have:
  - a large number of db block gets
  - a large number of physical reads
  - a lot of redo generated
Solving typical problems
• A large number of physical reads can mean:
  - the database memory cache is too small
  - an index should be created on one or more tables
• A large amount of data received over the network:
  - query results are processed on the client instead of on the server
• Many sorts (especially on disk):
  - ORDER BY or GROUP BY is used on a very large table without an index; create an index on the relevant columns
Solving typical problems
• Long running UPDATE or DELETE:
  - try splitting the statement into several transactions
  - execute the statement as a batch job at night, when database traffic is low
  - consider rebuilding the application
  - check whether the WHERE condition contains a complex, long running query; if so, optimize that query