Jun Wu, Department of Information Technology, National Pingtung Institute of Commerce.
Query Processing in Database Systems
Jun Wu (吳卓俊)
Email: [email protected]
Department of Information Technology
National Pingtung Institute of Commerce
February 27, 2008
Outline
• Part I: Introduction to Database Systems
  – Database
  – Query Processing
• Part II: Multiprocessor QEP Scheduling Problem
  – Query Execution Plan (QEP)
  – Critical-Path-Based Approach
Part I: Introduction to Database Systems
Database
Database: a collection of information organized in such a way that a computer program can quickly select desired pieces of data.
It is managed by powerful software that creates and manages large amounts of data efficiently and lets the data persist safely over long periods of time.
Database Systems
A database system:
• Allows users to create new databases and specify their structures.
• Gives users the ability to query and modify the data.
• Supports the storage of very large amounts of data over a long period of time, keeping it secure from accident or unauthorized use.
• Controls access to the data from many users concurrently.
Popular Database Systems
Commercial Products
• Oracle Database – http://www.oracle.com
• IBM DB2 – http://www.ibm.com/db2
• Microsoft – http://www.microsoft.com
  – SQL Server
  – Access
Open Source Projects
• MySQL – http://mysql.com
• PostgreSQL – http://www.postgreSQL.org
Databases on the WWW
• Google/Yahoo
• Wikipedia
• YouTube/無名小站 (Wretch)
• SourceForge
• UrMap/Google Map
• Blog
• Forum
Data Models
Data Model: a theory or specification describing how a database is structured and manipulated.
Common Data Models
• Hierarchical Model
• Network Model
• Relational Model
• Entity-Relationship Model
• Object-Oriented Model
• Semistructured Model (such as XML)
Relational Data Model
Relational Data Model: data are organized as a collection of relations, which are two-dimensional tables.
Relational Database (RDB): a database system that is based on the relational data model. Almost all of today's database systems are relational databases.
[Photo: Edgar Frank (Ted) Codd]
Example 1
Relations are tables. Their columns are headed by attributes, which describe the entries in the column. For instance, a relation named Accounts, recording bank accounts with their balances and types, might have one column for each of these attributes and one row per account.
Data Manipulation Language
Data Manipulation Language (DML): a computer language used by users or computer programs to retrieve, insert, delete, and update data in a database system.
The most popular DML is the Structured Query Language (SQL).
Almost all of today's database systems support SQL.
Example 2
Simple Queries in SQL
• Find the balance of account 67890.
• Find the savings accounts with negative balances.
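The two queries above can be tried out with a small sketch using Python's built-in sqlite3 module. The column names (accountNo, balance, type) and every row value are illustrative assumptions, not the contents of the table on the original slide.

```python
import sqlite3

# Hypothetical schema and rows for the Accounts relation (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL, type TEXT)"
)
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [
        (12345, 1000.00, "savings"),
        (67890, 2846.92, "checking"),
        (24680, -50.00, "savings"),
    ],
)

# Query 1: find the balance of account 67890.
balance_rows = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = 67890"
).fetchall()

# Query 2: find the savings accounts with negative balances.
negative_savings = conn.execute(
    "SELECT accountNo FROM Accounts WHERE type = 'savings' AND balance < 0"
).fetchall()

print(balance_rows)
print(negative_savings)
```

Each SELECT names the attributes to return and a WHERE condition on the rows; how the rows are actually located is left to the database system.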
Query Processing
Query processing is the sequence of procedures that a database system performs to answer a user query.
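Query Processing
[Figure: the query processing pipeline. A query in a high-level language (e.g., SQL) enters the Scanner, Parser, and Validator, which produce an intermediate form. The Query Optimizer turns the intermediate form into a Query Execution Plan (QEP). The Runtime Database Processor executes the QEP and returns the results of the query.]
While most work on QEPs concerns the derivation of "optimal" QEPs for queries, little work has been done on the scheduling of QEPs subject to their partial order constraints.
[Figure: an example QEP. An Index Nested Loop Join takes its inputs from a Merge-Join and from Index Scan Z; the Merge-Join takes its inputs from two Sort operators, one over Table Scan X and one over Table Scan Y. Each node is a physical operator, and the edges express the partial order.]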
Part II: Multiprocessor QEP Scheduling Problem
Query Execution Plan (QEP)
A Query Execution Plan (QEP) consists of:
• PO: a set of physical operators
• a partial order on PO
QEP Structures
– Tree-Structured
– DAG-Structured
[Figure: two example QEPs built from physical operators such as Table Scan, Index Scan, Sort, Merge Join, Nested Loop Join, and Index Nested Loop Join. One is tree-structured; the other is DAG-structured and is drawn with a virtual root.]
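As a minimal sketch of this definition, a QEP can be stored as a mapping from each physical operator to the operators that must finish before it. The operator names, the second plan, and the helper functions below are illustrative assumptions. The tree test only checks that every operator feeds at most one other operator, which is the property that separates the two structures in this sketch.

```python
from graphlib import TopologicalSorter

# A QEP as (PO, partial order): operator -> set of operators it depends on.
# Names are hypothetical and only loosely follow the example figure.
tree_qep = {
    "TableScanX": set(),
    "TableScanY": set(),
    "SortX": {"TableScanX"},
    "SortY": {"TableScanY"},
    "MergeJoin": {"SortX", "SortY"},
    "IndexScanZ": set(),
    "IndexNestedLoopJoin": {"MergeJoin", "IndexScanZ"},
}

# Assumed variant: IndexScanZ also feeds a second join, so its output is shared.
dag_qep = dict(tree_qep)
dag_qep["NestedLoopJoin"] = {"IndexScanZ", "TableScanY"}


def is_tree_structured(qep):
    """True if every operator feeds at most one other operator."""
    consumers = {op: 0 for op in qep}
    for inputs in qep.values():
        for op in inputs:
            consumers[op] += 1
    return all(count <= 1 for count in consumers.values())


def execution_order(qep):
    """Return one linear order that is consistent with the partial order."""
    return list(TopologicalSorter(qep).static_order())


order = execution_order(tree_qep)
print(is_tree_structured(tree_qep), is_tree_structured(dag_qep))
print(order)
```

Any order returned by execution_order is only one of many valid sequences; the scheduling problem discussed next is about choosing among them when several processors are available.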
Multiprocessor QEP Scheduling (MQEPS) Problem
Objective: schedule a given set of physical operators on multiple processors, subject to their partial order, so as to minimize the schedule length.
The MQEPS problem is NP-hard. It is equivalent to the multiprocessor precedence-constrained scheduling (MPCS) problem.
[Figure: query scheduling. Physical operators with given execution times are assigned to processors P1, P2, ..., PM; the schedule length is marked on the resulting schedule.]
Best Results
Homogeneous multiprocessor environments
• List scheduling algorithm [1]
• Approximation ratio: 2 - 1/M
Heterogeneous multiprocessor environments
• Speed-based list scheduling algorithm [2]
• Approximation ratio: O(log M)
M is the number of processors in the system.
[1] R. L. Graham, "Bounds on multiprocessing timing anomalies", SIAM Journal on Applied Mathematics, 17:263-269, 1969.
[2] Fabian A. Chudak and David B. Shmoys, "Approximation algorithms for precedence-constrained scheduling problems on parallel machines that run at different speeds", Journal of Algorithms, 30(2):581-590, February 1999.
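For the homogeneous case, the idea of list scheduling can be sketched as follows: whenever a processor is idle, start the first operator in a fixed priority list whose predecessors have all finished. This is a minimal illustration, not code from [1]; the function name, the data layout, and the example instance are assumptions.

```python
import heapq


def list_schedule(exec_time, preds, priority_list, num_procs):
    """List scheduling on identical processors under precedence constraints.

    exec_time: operator -> execution time
    preds: operator -> set of operators that must finish first
    priority_list: all operators, highest priority first
    Returns (start_times, schedule_length).
    """
    remaining = {op: len(preds.get(op, ())) for op in exec_time}
    succs = {op: [] for op in exec_time}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    rank = {op: i for i, op in enumerate(priority_list)}

    ready = [(rank[op], op) for op in exec_time if remaining[op] == 0]
    heapq.heapify(ready)
    running = []  # heap of (finish_time, operator)
    start = {}
    now, idle = 0, num_procs

    while ready or running:
        # Start ready operators on idle processors, in priority-list order.
        while ready and idle > 0:
            _, op = heapq.heappop(ready)
            start[op] = now
            heapq.heappush(running, (now + exec_time[op], op))
            idle -= 1
        # Advance to the next completion time and collect everything ending then.
        now, done = heapq.heappop(running)
        finished = [done]
        while running and running[0][0] == now:
            finished.append(heapq.heappop(running)[1])
        idle += len(finished)
        # Release successors whose predecessors have all finished.
        for op in finished:
            for s in succs[op]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    heapq.heappush(ready, (rank[s], s))
    return start, now


# Hypothetical instance: c needs a; d needs a and b; two processors.
exec_time = {"a": 2, "b": 3, "c": 1, "d": 2}
preds = {"c": {"a"}, "d": {"a", "b"}}
start, length = list_schedule(exec_time, preds, ["a", "b", "c", "d"], 2)
print(start, length)
```

In this instance the chain b, d already takes 5 time units, so the schedule of length 5 is optimal here; in general the 2 - 1/M ratio only bounds how far a list schedule can be from the optimum.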
Our Goal
• Focus on the MQEPS problem with I/O considerations for symmetric multiprocessor (SMP) database systems.
• Propose a joint scheduling algorithm
  – To improve the operating parallelism of processors and I/O subsystems.
  – To provide approximation/competitive bounds.
[Figure: an SMP database system. Processor 1, Processor 2, ..., Processor M and the shared memory are connected by a bus to an I/O controller; the I/O subsystem consists of Disk 1, Disk 2, ..., Disk N.]
Query Execution Plan Scheduling with I/O considerations
Motivation: I/O Considerations
Joint scheduling problem
• I/O activities are the bottleneck of system performance [3].
Join-based QEP scheduling [4]
• Each join requires exactly two data pages.
• Few results with approximation bounds are known.
[Figure: a physical operator a with its relevant set of data pages w, x, y, z. The I/O subsystem reads the data pages into main memory; the processor executes the operator.]
[3] M. Murphy and M.-C. Shan, "Execution Plan Balancing", In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 698-706, 1991.
[4] M. Murphy and D. Rotem, "Processor Scheduling for Multiprocessor Joins", In Proceedings of the IEEE Data Engineering Conference (ICDE), pp. 140-148, 1989.
MQEPS-I/O Problem
Problem Definition
Given a QEP with its relevant data pages, the scheduling problem is to find a compatible schedule such that the schedule length is minimized.
Compatible schedule
• It is consistent with the partial order of the physical operators.
• It does not violate the environmental constraints (e.g., the number of processors in the system).
The problem is strongly NP-hard when the QEP is DAG-structured.
A Compatible Schedule Example
[Figure: a QEP with physical operators a to g and data pages h to o, together with a compatible schedule on processors P1 and P2 and the I/O subsystem, drawn on a time axis from 0 to 20. Schedule length = 20.]
Our Approach: Critical-Path-Based Scheduling (CPS) Algorithm
INPUT: a QEP
PROCESSING:
• Ordered Number Generator (OrdGen) Algorithm: assign a unique ordered number to each physical operator and data page according to the critical path rule.
• Critical-Path-Based Scheduling (CPS) Algorithm: build the processor schedule and the I/O schedule from the ordered numbers.
OUTPUT:
• Processor Schedule: whenever a processor is available, schedule the ready physical operator with the minimum ordered number on it.
• I/O Schedule: whenever an I/O subsystem is available, read the unloaded data page with the minimum ordered number into main memory.
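The two dispatching rules can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' implementation: there is a single I/O subsystem that reads one page at a time, main memory is unbounded, all times are positive, and the ordered numbers are supplied as input. The function name cps_schedule and the example instance are hypothetical.

```python
from math import inf


def cps_schedule(exec_time, preds, pages, io_time, number, num_procs):
    """Simulate the two CPS rules; returns (start, loaded_at, schedule_length).

    exec_time: operator -> execution time
    preds: operator -> set of operators that must finish first
    pages: operator -> set of data pages it needs in main memory
    io_time: page -> time to read the page
    number: ordered number of every operator and page (smaller = earlier)
    """
    # I/O schedule: the single I/O subsystem (assumed) reads the unloaded page
    # with the minimum ordered number, one page after another.
    loaded_at, t = {}, 0
    for page in sorted(io_time, key=number.get):
        t += io_time[page]
        loaded_at[page] = t

    # Processor schedule: on each idle processor, start the ready operator
    # with the minimum ordered number.
    start, finish = {}, {}
    free_at = [0] * num_procs
    pending = set(exec_time)
    now = 0
    while pending:
        ready = sorted(
            (op for op in pending
             if all(finish.get(p, inf) <= now for p in preds.get(op, ()))
             and all(loaded_at[pg] <= now for pg in pages.get(op, ()))),
            key=number.get,
        )
        assigned = False
        for proc in range(num_procs):
            if ready and free_at[proc] <= now:
                op = ready.pop(0)
                start[op] = (now, proc)
                finish[op] = now + exec_time[op]
                free_at[proc] = finish[op]
                pending.remove(op)
                assigned = True
        if assigned:
            continue  # re-check the same instant before advancing time
        # Advance to the next operator completion or page arrival.
        events = [x for x in list(finish.values()) + list(loaded_at.values()) if x > now]
        if not events:
            raise ValueError("no operator can ever become ready (cyclic order?)")
        now = min(events)
    return start, loaded_at, max(finish.values())


# Hypothetical instance: a joins the results of b and c; one page per operator.
exec_time = {"a": 1, "b": 2, "c": 2}
preds = {"a": {"b", "c"}}
pages = {"a": {"p3"}, "b": {"p1"}, "c": {"p2"}}
io_time = {"p1": 1, "p2": 1, "p3": 1}
number = {"p1": 1, "b": 2, "p2": 3, "c": 4, "p3": 5, "a": 6}
start, loaded_at, length = cps_schedule(exec_time, preds, pages, io_time, number, 2)
print(start, loaded_at, length)
```

In the example, page p2 is read while b is already running, which is the kind of overlap between I/O and processor activity that the approach aims for.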
A CPS Schedule Example
OrdGen Algorithm
STEP 1: Locate the physical operator which has the maximum height.
STEP 2: Assign a unique ordered number, in increasing order, to every data page in its relevant set and then to the operator itself.
CPS Algorithm
A priority-driven scheduling algorithm, where priority = ordered number.
[Figure: the QEP of the previous example (physical operators a to g, data pages h to o) annotated with ordered numbers 1 to 15, and the resulting CPS schedule on processors P1 and P2 and the I/O subsystem, drawn on a time axis from 0 to 20. Schedule length = 17.]
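The two OrdGen steps can be sketched as below. Several details are assumptions made for illustration, since the slide does not state them: the height of an operator is taken as the longest execution-time path from the operator to the end of the plan, the two steps are repeated over the operators not yet numbered, a page shared by several operators keeps the first number it receives, and ties are broken by name.

```python
def ordgen(exec_time, preds, pages):
    """Assign ordered numbers to operators and data pages (illustrative sketch).

    exec_time: operator -> execution time
    preds: operator -> set of operators that must finish first
    pages: operator -> set of data pages it needs
    """
    succs = {op: [] for op in exec_time}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    # Assumed height: own execution time plus the largest height among the
    # operators that consume this operator's output.
    height = {}

    def h(op):
        if op not in height:
            height[op] = exec_time[op] + max((h(s) for s in succs[op]), default=0)
        return height[op]

    number, next_no = {}, 1
    # STEP 1, repeated: take the unnumbered operator with the maximum height.
    for op in sorted(exec_time, key=lambda o: (-h(o), o)):
        # STEP 2: number its still-unnumbered pages, then the operator itself.
        for page in sorted(pages.get(op, ())):
            if page not in number:
                number[page] = next_no
                next_no += 1
        number[op] = next_no
        next_no += 1
    return number


# Hypothetical instance: a joins the results of b and c; one page per operator.
exec_time = {"a": 1, "b": 2, "c": 2}
preds = {"a": {"b", "c"}}
pages = {"a": {"p3"}, "b": {"p1"}, "c": {"p2"}}
number = ordgen(exec_time, preds, pages)
print(number)
```

With positive execution times, an operator always has a larger height than the operators that depend on it, so under these assumptions the numbering never places an operator before one of its predecessors.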
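Properties of the CPS Algorithm
[Table: properties of the CPS algorithm for off-line and on-line usage, for tree-structured and DAG-structured QEPs, in the cases M >= W and M < W. The entries shown are: Optimal, Optimal, 3 - 2/M, 3 - 2/M, 3 - 2/M, 2.]
M: the number of processors
W: the width of a given QEP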
Performance Evaluation
• Simulation model of an SMP database
  – Number of processors: 2, 4, 8, 16
  – I/O subsystem
  – QEP scheduler (the OrdGen and the CPS algorithms)
• Experimental QEPs
  – A TPC-C benchmark database
  – Number of physical operators: 1,000-2,000
• Metric
  – Performance ratio = (CPS schedule length) / (optimal schedule length)
Experimental Results: Off-line Usage
[Figure: plot with two series, the upper bound and CPS.]
Experimental Results: On-line Usage
[Figure: plot with two series, the upper bound and CPS.]
Conclusion
• Scheduling of QEPs with I/O considerations
  – A physical operator can be executed only if all of its relevant data pages have been read into main memory.
  – A more general model of physical operators is considered.
  – A joint scheduling of physical-operator executions over multiprocessors and activities over the I/O subsystem is needed.
• Critical-Path-Based Scheduling (CPS) Algorithm
  – The rationale behind the design of the CPS algorithm is to overlap the I/O activities for data access with the executions of physical operators to reduce the schedule length.
  – The approximation ratios of our proposed algorithm are shown.
  – The capability of the proposed algorithm is verified by a series of simulation experiments.
-The End-
~Thank You Very Much~