
Query Processing in Database Systems

Jun Wu (吳卓俊)
Email: [email protected]

Department of Information Technology
National Pingtung Institute of Commerce

February 27, 2008


Outline

• Part I: Introduction to Database Systems
  – Database Query Processing
• Part II: Multiprocessor QEP Scheduling Problem
  – Query Execution Plan (QEP)
  – Critical-Path-Based Approach


Part I: Introduction to Database Systems


Database

• A database is a collection of information organized in such a way that a computer program can quickly select desired pieces of data.
• It is managed by powerful software that creates and manages large amounts of data efficiently and allows the data to persist safely over long periods of time.


Database Systems

A database system:
• Allows users to create new databases and specify their structures.
• Gives users the ability to query and modify the data.
• Supports the storage of very large amounts of data over a long period of time, keeping it secure from accidents and unauthorized use.
• Controls concurrent access to the data by many users.


Popular Database Systems

• Commercial Products
  – Oracle Database – http://www.oracle.com
  – IBM DB2 – http://www.ibm.com/db2
  – Microsoft (SQL Server, Access) – http://www.microsoft.com
• Open Source Projects
  – MySQL – http://mysql.com
  – PostgreSQL – http://www.postgreSQL.org


Databases on the WWW

• Google / Yahoo
• Wikipedia
• YouTube / Wretch (無名小站)
• SourceForge
• UrMap / Google Map
• Blogs
• Forums


Data Models

• Data Model
  – A theory or specification describing how a database is structured and manipulated.
• Common Data Models
  – Hierarchical Model
  – Network Model
  – Relational Model
  – Entity-Relationship Model
  – Object-Oriented Model
  – Semistructured Model (such as XML)


Relational Data Model

• Relational Data Model
  – Data are organized as a collection of relations, which are two-dimensional tables.
• Relational Database (RDB)
  – A database system that is based on the relational data model.
  – Most of today's database systems are relational databases.

[Photo: Edgar Frank (Ted) Codd, who introduced the relational model]


Example 1

Relations are tables. Their columns are headed by attributes, which describe the entries in the column. For instance, a relation named Accounts, recording bank accounts together with their balances and types, might look like:
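The table itself did not survive in this transcript. As a rough stand-in, the sketch below builds such a relation with Python's built-in `sqlite3` module. The attribute names (`accountNo`, `balance`, `type`) and every row value are assumptions made for illustration; only the relation name Accounts and account number 67890 come from the slides.

```python
import sqlite3

# In-memory database holding one relation (table) named Accounts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL, type TEXT)"
)

# Hypothetical tuples (rows); these values are NOT taken from the original slide.
rows = [
    (12345, 1000.00, "savings"),
    (67890, 2846.92, "checking"),
]
conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?)", rows)

# Each attribute (column) describes the entries beneath it.
stored = conn.execute(
    "SELECT accountNo, balance, type FROM Accounts ORDER BY accountNo"
).fetchall()
for row in stored:
    print(row)
```

Each tuple is one row of the relation, and the column headers play the role of the attributes described above.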


Data Manipulation Language

• Data Manipulation Language (DML)
  – A computer language used by users or computer programs to retrieve, insert, delete, and update data in a database system.
  – The most popular DML is the Structured Query Language (SQL).
  – Almost all of today's database systems support SQL.


Example 2

Simple Queries in SQL
• Find the balance of account 67890.
• Find the savings accounts with negative balances.
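The SQL text of these two queries is not preserved in the transcript. A minimal sketch of how they might be written follows, run through Python's `sqlite3` module. The schema (`accountNo`, `balance`, `type`) and all row values are assumptions for illustration, not the original slide content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL, type TEXT)"
)
# Hypothetical data, including one overdrawn savings account.
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [
        (12345, 1000.00, "savings"),
        (67890, 2846.92, "checking"),
        (24680, -50.00, "savings"),
    ],
)

# Query 1: find the balance of account 67890.
balance = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = ?", (67890,)
).fetchone()[0]

# Query 2: find the savings accounts with negative balances.
overdrawn = conn.execute(
    "SELECT accountNo FROM Accounts WHERE type = 'savings' AND balance < 0"
).fetchall()

print(balance, overdrawn)
```

Both queries are declarative: they state which tuples are wanted and leave the choice of access method to the database system.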


Query Processing

Query processing is the sequence of procedures that a database system performs to answer a user query.


Query Processing

[Figure: query processing pipeline] Query in a high-level language (e.g., SQL) → Scanner, Parser, and Validator → Intermediate form → Query Optimizer → Query Execution Plan (QEP) → Runtime Database Processor → Results of the Query

[Figure: an example QEP. Each node is a physical operator and each edge is a partial-order constraint. Table Scan X and Table Scan Y each feed a Sort, the two Sorts feed a Merge-Join, and the Merge-Join and Index Scan Z feed an Index Nested Loop Join.]

While most work on QEPs concerns the derivation of "optimal" QEPs for queries, little work has been done on scheduling QEPs subject to their partial-order constraints.


Part II: Multiprocessor QEP Scheduling Problem


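Query Execution Plan (QEP)

A Query Execution Plan (QEP) is a pair (PO, ≺):
• PO: a set of physical operators
• ≺: a partial order on PO

• QEP Structures
  – Tree-Structured
  – DAG-Structured

[Figure: two example QEPs. The tree-structured plan consists of Table Scan X, Table Scan Y, two Sorts, a Merge Join, Index Scan Z, and an Index Nested Loop Join. The DAG-structured plan consists of Table Scan W, Index Scan Z, an Index Nested Loop Join, a Sort, Table Scan V, a Nested Loop Join, Table Scan Y, and a Virtual Root.]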
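One way to make the definition concrete is to store a QEP as a mapping from each physical operator to the operators that must finish before it. The sketch below does this for the tree-structured example using `graphlib` from the Python standard library (Python 3.9+). The operator identifiers and the exact edges are an assumed reading of the figure, not something stated in the text.

```python
from graphlib import TopologicalSorter

# QEP as a partial order: operator -> set of operators that must complete first.
# Names and edges are an assumed reading of the tree-structured example.
qep = {
    "TableScanX": set(),
    "TableScanY": set(),
    "SortX": {"TableScanX"},
    "SortY": {"TableScanY"},
    "MergeJoin": {"SortX", "SortY"},
    "IndexScanZ": set(),
    "IndexNestedLoopJoin": {"MergeJoin", "IndexScanZ"},
}

# Any topological order is one sequential execution consistent with the
# partial order; a scheduler is free to pick among such orders.
order = list(TopologicalSorter(qep).static_order())
print(order)
```

A tree-structured QEP gives every operator at most one consumer, whereas a DAG-structured QEP lets an intermediate result feed several operators. The same dictionary representation covers both cases.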


Multiprocessor QEP Scheduling (MQEPS) Problem

Query Scheduling
• Objective: Schedule a given set of physical operators on multiple processors, subject to their partial order, so as to minimize the schedule length.
• The MQEPS problem is NP-hard.
• The problem is equivalent to the multiprocessor precedence-constrained scheduling (MPCS) problem.

[Figure: Gantt chart of physical operators assigned to processors P1, P2, ..., PM, with the schedule length marked]


Best Results

• Homogeneous multiprocessor environments
  – List scheduling algorithm [1]
  – Approximation ratio: 2 - 1/M
• Heterogeneous multiprocessor environments
  – Speed-based list scheduling algorithm [2]
  – Approximation ratio: O(log M)

M is the number of processors in the system.

[1] R. L. Graham, "Bounds on multiprocessing timing anomalies", SIAM Journal on Applied Mathematics, 17:263-269, 1969.
[2] F. A. Chudak and D. B. Shmoys, "Approximation algorithms for precedence-constrained scheduling problems on parallel machines that run at different speeds", Journal of Algorithms, 30(2):581-590, February 1999.
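For the homogeneous case, list scheduling can be sketched as follows: whenever a processor is idle, start the first task in a fixed priority list whose predecessors have all finished. This is a minimal illustration assuming positive task durations and identical processors. The function name `list_schedule` and its parameters are hypothetical and do not come from the cited papers.

```python
import heapq

def list_schedule(tasks, preds, duration, num_procs):
    """Greedy list scheduling on identical processors.

    tasks: priority list; preds: task -> iterable of predecessor tasks;
    duration: task -> positive execution time.
    Returns (start times, schedule length).
    """
    start, finish = {}, {}
    running = []                      # min-heap of (finish_time, task)
    pending = list(tasks)
    free, now = num_procs, 0
    while pending or running:
        # Release processors whose tasks have finished by `now`.
        while running and running[0][0] <= now:
            heapq.heappop(running)
            free += 1
        # Scan the list in priority order and start every ready task that fits.
        for t in list(pending):
            if free == 0:
                break
            if all(finish.get(p, float("inf")) <= now for p in preds.get(t, ())):
                pending.remove(t)
                start[t] = now
                finish[t] = now + duration[t]
                heapq.heappush(running, (finish[t], t))
                free -= 1
        if running:
            now = running[0][0]       # jump to the next completion
        elif pending:
            raise ValueError("no task can become ready (cyclic precedence?)")
    return start, max(finish.values(), default=0)

start, length = list_schedule(
    ["a", "b", "c", "d"],
    {"c": ["a"], "d": ["a", "b"]},
    {"a": 2, "b": 3, "c": 2, "d": 1},
    num_procs=2,
)
print(start, length)
```

The rule never leaves a processor idle while some task is ready, which is the property the 2 - 1/M bound relies on.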


Our Goal

• Focus on the MQEPS problem with I/O considerations for symmetric multiprocessor (SMP) database systems.
• Propose a joint scheduling algorithm
  – To improve the operating parallelism of the processors and the I/O subsystem.
  – To provide approximation/competitive bounds.

[Figure: An SMP Database System. Processor 1, Processor 2, ..., Processor M and the shared memory are connected by a bus to an I/O controller. The I/O subsystem contains Disk 1, Disk 2, ..., Disk N.]

Query Execution Plan Scheduling with I/O considerations


Motivation: I/O Considerations

• Joint scheduling problem
  – I/O activities are the bottleneck of system performance [3].
• Join-based QEP scheduling [4]
  – Each join requires exactly two data pages.
  – Few results with approximation bounds are known.

[Figure: a physical operator a and its relevant set of data pages w, x, y, z. The I/O subsystem reads the data pages into main memory, and the processor executes the operator.]

[3] M. Murphy and M.-C. Shan, "Execution Plan Balancing", in Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 698-706, 1991.
[4] M. Murphy and D. Rotem, "Processor Scheduling for Multiprocessor Joins", in Proceedings of the IEEE Data Engineering Conference (ICDE), pp. 140-148, 1989.


MQEPS-I/O Problem

Problem Definition
• Given a QEP with its relevant data pages, find a compatible schedule such that the schedule length is minimized.
• The problem is strongly NP-hard when the structure of the QEP is a DAG.

Compatible schedule
• Is consistent with the partial order of the physical operators.
• Does not violate the environmental constraints (e.g., the number of processors in the system).
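The two compatibility conditions can be checked mechanically. The sketch below is one possible checker. It additionally assumes that an operator may start only after all of its relevant data pages have been loaded, and that each processor runs one operator at a time. The function name `is_compatible` and the data layout are hypothetical.

```python
def is_compatible(schedule, preds, pages, loaded_at, num_procs):
    """schedule: op -> (processor index, start, finish);
    preds: op -> predecessors; pages: op -> relevant data pages;
    loaded_at: page -> time at which the page is in main memory."""
    for op, (proc, start, finish) in schedule.items():
        # Environmental constraint: only processors 0..num_procs-1 exist.
        if not 0 <= proc < num_procs or finish < start:
            return False
        # Partial order: every predecessor must have finished.
        if any(schedule[q][2] > start for q in preds.get(op, ())):
            return False
        # I/O: every relevant page must already be in main memory.
        if any(loaded_at[p] > start for p in pages.get(op, ())):
            return False
    # A processor executes at most one operator at any time.
    by_proc = {}
    for proc, start, finish in schedule.values():
        by_proc.setdefault(proc, []).append((start, finish))
    for intervals in by_proc.values():
        intervals.sort()
        for (_, f1), (s2, _) in zip(intervals, intervals[1:]):
            if s2 < f1:
                return False
    return True

preds = {"z": ["x", "y"]}
pages = {"x": ["p"], "y": ["q"]}
loaded_at = {"p": 2, "q": 4}
good = {"x": (0, 2, 5), "y": (1, 4, 7), "z": (0, 7, 8)}
bad = {"x": (0, 2, 5), "y": (1, 4, 7), "z": (0, 6, 7)}   # z starts before y ends
print(is_compatible(good, preds, pages, loaded_at, 2),
      is_compatible(bad, preds, pages, loaded_at, 2))
```

Among all schedules that pass such a check, the MQEPS-I/O problem asks for one with the smallest finish time of the last operator.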


A Compatible Schedule Example

[Figure: a QEP with physical operators a to g and data pages h to o, together with a compatible schedule on processors P1 and P2 and the I/O subsystem. Time axis from 0 to 20; schedule length = 20.]


Our Approach: Critical-Path-Based Scheduling (CPS) Algorithm

INPUT: QEP

PROCESSING:
• Ordered Number Generator (OrdGen) Algorithm
  – Assign a unique ordered number to each physical operator and data page according to the critical path rule.
• Critical-Path-Based Scheduling (CPS) Algorithm
  – Whenever a processor is available, schedule a ready physical operator with the minimum ordered number on it.
  – Whenever the I/O subsystem is available, read an unloaded data page with the minimum ordered number into main memory.

OUTPUT:
• Processor Schedule
• I/O Schedule
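The two dispatch rules can be simulated with a small event loop. The sketch below is an illustration under simplifying assumptions that the slides do not state: a single I/O channel that reads one page at a time, unlimited main memory, positive execution times, and ordered numbers supplied as input. The name `cps_schedule` and its parameters are hypothetical.

```python
import heapq

def cps_schedule(preds, exec_time, pages, io_time, order, num_procs):
    """Return (operator start times, page load times, schedule length)."""
    # I/O rule: with one channel that always picks the unloaded page with the
    # smallest ordered number, pages are read back to back in that order.
    loaded_at, clock = {}, 0
    all_pages = {p for ps in pages.values() for p in ps}
    for page in sorted(all_pages, key=order.__getitem__):
        clock += io_time[page]
        loaded_at[page] = clock

    start, finish = {}, {}
    running = []                    # min-heap of (finish_time, operator)
    waiting = set(preds)
    free, now = num_procs, 0
    while waiting or running:
        while running and running[0][0] <= now:
            heapq.heappop(running)  # a processor becomes available
            free += 1
        # Ready: all predecessors finished and all relevant pages in memory.
        ready = sorted(
            (op for op in waiting
             if all(finish.get(q, float("inf")) <= now for q in preds[op])
             and all(loaded_at[p] <= now for p in pages.get(op, ()))),
            key=order.__getitem__)
        # Processor rule: ready operators with the smallest ordered numbers first.
        for op in ready[:free]:
            waiting.remove(op)
            start[op] = now
            finish[op] = now + exec_time[op]
            heapq.heappush(running, (finish[op], op))
        free -= min(free, len(ready))
        # Advance to the next operator completion or page arrival.
        events = [t for t, _ in running] + [t for t in loaded_at.values() if t > now]
        if events:
            now = min(events)
        elif waiting:
            raise ValueError("no operator can become ready (cyclic partial order?)")
    return start, loaded_at, max(finish.values(), default=0)

start, loaded_at, length = cps_schedule(
    preds={"x": [], "y": [], "z": ["x", "y"]},
    exec_time={"x": 3, "y": 3, "z": 1},
    pages={"x": ["p"], "y": ["q"]},
    io_time={"p": 2, "q": 2},
    order={"p": 1, "x": 2, "q": 3, "y": 4, "z": 5},
    num_procs=2,
)
print(start, loaded_at, length)
```

In this small run the read of page q overlaps with the execution of operator x, which is the kind of processor and I/O overlap the approach is designed to exploit.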


A CPS Schedule Example

OrdGen Algorithm
• STEP 1: Locate the physical operator which has the maximum height.
• STEP 2: Assign a unique ordered number, in increasing order, to every data page in its relevant set and then to the operator itself.

CPS Algorithm
• Priority-driven scheduling algorithm
• Priority = Ordered Number

[Figure: the example QEP (physical operators a to g, data pages h to o) labeled with ordered numbers 1 to 15, and the resulting CPS schedule on processors P1 and P2 and the I/O subsystem. Time axis from 0 to 20; schedule length = 17.]
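The two OrdGen steps can be sketched as a loop over operators in decreasing height. The slides do not define "height", so the sketch assumes it is the longest execution-time path from an operator to the end of the plan, including the operator itself. It also assumes that the steps repeat until every operator is numbered and that ties are broken by name. The function name `ordgen` and the data layout are hypothetical.

```python
from functools import lru_cache

def ordgen(succ, exec_time, pages):
    """succ: op -> operators that consume its output; pages: op -> relevant pages.
    Returns a dict mapping every operator and data page to an ordered number."""

    @lru_cache(maxsize=None)
    def height(op):
        # Assumed definition: longest path (in execution time) from op to a sink.
        return exec_time[op] + max((height(s) for s in succ[op]), default=0)

    order, counter = {}, 0
    # STEP 1, repeated: take the unnumbered operator with the maximum height.
    for op in sorted(succ, key=lambda o: (-height(o), o)):
        # STEP 2: number its relevant data pages first, then the operator itself.
        for page in pages.get(op, ()):
            if page not in order:
                counter += 1
                order[page] = counter
        counter += 1
        order[op] = counter
    return order

order = ordgen(
    succ={"b": ["a"], "c": ["a"], "a": []},
    exec_time={"a": 1, "b": 3, "c": 1},
    pages={"b": ["p1", "p2"], "c": ["p3"]},
)
print(order)
```

With positive execution times, an operator's height always exceeds that of its consumers, so predecessors and their pages receive smaller numbers and therefore higher priority.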


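Properties of the CPS Algorithm

[Table: bounds of the CPS algorithm. Columns: Tree-Structured, DAG-Structured. Rows: Off-line and On-line usage, split by M >= W and M < W. Entries as they appear on the slide: Optimal, Optimal (Off-line); 3 - 2/M; 3 - 2/M, 3 - 2/M; 2.]

M: the number of processors
W: the width of a given QEP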


Performance Evaluation

• Simulation Model of an SMP Database
  – Number of processors: 2, 4, 8, 16
  – I/O subsystem
  – QEP Scheduler (the OrdGen and CPS algorithms)
• Experimental QEPs
  – A TPC-C benchmark database
  – Number of physical operators: 1,000-2,000
• Metrics
  – Performance Ratio = (CPS schedule length) / (Optimal schedule length)


Experimental Results: Off-line Usage

[Figure: CPS compared with its upper bound]


Experimental Results: On-line Usage

[Figure: CPS compared with its upper bound]


Conclusion

• Scheduling of QEPs with I/O considerations
  – A physical operator can be executed only if all of its relevant data pages have been read into main memory.
  – A more general model of physical operators is considered.
  – A joint scheduling of physical-operator executions over the multiprocessors and of activities over the I/O subsystem is needed.
• Critical-Path-Based Scheduling (CPS) Algorithm
  – The rationale behind the design of the CPS algorithm is to overlap the I/O activities for data access with the executions of physical operators to reduce the schedule length.
  – The approximation ratios of the proposed algorithm are shown.
  – The capability of the proposed algorithm is verified by a series of simulation experiments.

Query Execution Plan Scheduling with I/O considerations


-The End-

~Thank You Very Much~