Tutorial of Course Project: Distributed Query Engine
Jun Wang (王军), East Main Building 9-216, [email protected]
Outline
- Requirements
- Benchmark
- Discussion of Design & Implementation
- Demo
- Assignment
- Q&A
23/4/19 DDB 2
Database Management
- Compulsory Commands: SELECT
- Fragmentation
  - Horizontal Fragmentation
  - Vertical Fragmentation
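As a minimal sketch of the two fragmentation types, assuming a hypothetical EMP table held as a list of dictionaries (the table, its columns, and the helper names are illustrative and not part of the course dataset):

```python
# Hypothetical global table; the rows and column names are illustrative only.
EMP = [
    {"eno": 1, "name": "Ann", "dept": "R&D", "salary": 5000},
    {"eno": 2, "name": "Bob", "dept": "Sales", "salary": 4000},
    {"eno": 3, "name": "Eve", "dept": "R&D", "salary": 6000},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, key, columns):
    """Keep the key plus the chosen columns of every row (a projection)."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]

def rebuild_from_vertical(left, right, key):
    """Join two vertical fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal: complementary predicates, so the union rebuilds the table.
emp_rd = horizontal_fragment(EMP, lambda r: r["dept"] == "R&D")
emp_other = horizontal_fragment(EMP, lambda r: r["dept"] != "R&D")

# Vertical: both fragments keep the key, so a join on it rebuilds the table.
emp_public = vertical_fragment(EMP, "eno", ["name", "dept"])
emp_pay = vertical_fragment(EMP, "eno", ["salary"])
```

Horizontal fragments are recombined by union, vertical fragments by a join on the key they share.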
Architecture
P2P Architecture
Query Processing
- SELECT statement
  - One table & multi-tables (JOIN)
  - Types of operator in the predicate: >, <, =
- Command Parsing
- Query Processing
  - General query tree
  - Query tree optimization and reduction
  - Network traffic optimization
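A general query tree can be pictured as nested operator nodes. The following is a minimal sketch, assuming hypothetical node classes (`Scan`, `Select`, `Join`, `Project`) and a predicate restricted to the three operators listed above; it is an illustration, not the parser or tree format required by the project:

```python
import operator
from dataclasses import dataclass

# The three predicate operators the engine is required to support.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

@dataclass
class Scan:
    table: str

@dataclass
class Select:
    column: str
    op: str
    value: object
    child: object

@dataclass
class Join:
    left_col: str
    right_col: str
    left: object
    right: object

@dataclass
class Project:
    columns: list
    child: object

def matches(row, node):
    """Evaluate a Select node's predicate against one row (a dict)."""
    return OPS[node.op](row[node.column], node.value)

def render(node, depth=0):
    """Return the tree as indented text, one operator per line."""
    pad = "  " * depth
    if isinstance(node, Scan):
        return f"{pad}Scan({node.table})"
    if isinstance(node, Select):
        head = f"{pad}Select({node.column} {node.op} {node.value})"
        return head + "\n" + render(node.child, depth + 1)
    if isinstance(node, Project):
        head = f"{pad}Project({', '.join(node.columns)})"
        return head + "\n" + render(node.child, depth + 1)
    # Remaining case: a Join node with two children.
    head = f"{pad}Join({node.left_col} = {node.right_col})"
    return "\n".join([head, render(node.left, depth + 1), render(node.right, depth + 1)])

# A tree for a two-table query with the selection placed below the join.
tree = Project(["SNO"], Join("PARTS.PNO", "SUPPLY.PNO",
               Select("PRICE", "<", 6000, Scan("PARTS")), Scan("SUPPLY")))
```

Calling `render(tree)` gives an indented text view that could also serve as a simple form of query tree output.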
User Interface
- The user should be able to use the interface to interact with your Distributed Query Engine.
- Any type of interface is acceptable:
  - Command Line Interface
  - Application-based Interface
  - Web-based Interface
- Note: DO NOT focus on the interface design. The interface meets the requirements if it:
  - Lets users input the commands
  - Displays the results and additional evaluation metrics
System Outputs
- The size of the query result set
- The optimized query tree
- The time cost of the query
- The communication cost of the query
Documentation and Report
- Mid-term presentation
  - Design of the distributed database query engine
  - Project work plan
- Final report
  - Architecture
  - Query optimization method
  - Implementation of communication protocols
- System operation specification
  - Instructions for installation, configuration, and operation of the query engine
System Evaluation
- Demonstration Time: 16th Week
- System Test Environment
  - Operating system: Windows
  - Local DBMS: MySQL
Dataset
- We simulate a scenario of using distributed database systems.
- In general, the following are provided:
  - The schema of a database (global tables)
  - The fragmentation schemes
  - The allocation
Fragmentation
Horizontal Fragmentation
Vertical Fragmentation
Allocation
Overview of Query Processing
- Decomposition and Localization
  - Rewriting: a query → an algebra tree
  - Reduction
- Optimization
  - Optimize the cost of data transfer
- Execution
  - Intermediate table storage and access
  - The TOTAL response time after the user issues a query
Decomposition and Localization
- Evaluation Points:
  - The elimination of useless fragments and joins
  - The global optimization of the algebra tree
Example:
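A minimal sketch of eliminating useless fragments, assuming each horizontal fragment is described by a single range condition on one column. The fragment names follow the PARTS fragmentation used in the assignment; the interval encoding and the function names are assumptions made for illustration:

```python
import math

# Each fragment is described by a half-open interval [low, high) on PRICE.
# PARTS1 and PARTS2 follow the assignment; the encoding is an assumption.
FRAGMENTS = {
    "PARTS1": (-math.inf, 6000),   # PRICE < 6000
    "PARTS2": (6000, math.inf),    # PRICE >= 6000
}

def query_interval(op, value):
    """Turn a predicate 'PRICE op value' into a half-open interval.

    Assumes integer prices, so '> v' is '>= v + 1' and '= v' is [v, v + 1).
    """
    if op == "<":
        return (-math.inf, value)
    if op == ">":
        return (value + 1, math.inf)
    if op == "=":
        return (value, value + 1)
    raise ValueError(f"unsupported operator: {op}")

def relevant_fragments(op, value):
    """Keep only fragments whose interval overlaps the query interval."""
    q_low, q_high = query_interval(op, value)
    return [
        name
        for name, (low, high) in FRAGMENTS.items()
        if max(low, q_low) < min(high, q_high)
    ]
```

For `PRICE < 6000` only PARTS1 survives, so the subtree over PARTS2, and any join built on it, can be dropped from the algebra tree.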
Optimization
- Evaluation metric: the amount (bytes) of data transferred
- You should provide the following information:
  - The execution plan, in which all operations and data transfers are listed in sequence.
  - The amount of each data transfer and the sum of the amounts of all transfers.
- Note that the amount of data transferred is measured in data BYTES before compression (you can compress the transferred data if necessary).
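One way to account for transfer volume, sketched under the assumption that rows are serialized as JSON and optionally compressed with zlib before being sent (the serialization format, the site names, and the `TransferLog` class are hypothetical choices, not requirements):

```python
import json
import zlib

class TransferLog:
    """Records each data transfer in plan order with its uncompressed size."""

    def __init__(self):
        self.entries = []  # (source, target, description, raw_bytes)

    def send(self, source, target, description, rows, compress=True):
        raw = json.dumps(rows).encode("utf-8")
        payload = zlib.compress(raw) if compress else raw
        # The metric counts bytes BEFORE compression, so log len(raw).
        self.entries.append((source, target, description, len(raw)))
        return payload  # what would actually go over the network

    def total_bytes(self):
        """Sum of the uncompressed sizes of all recorded transfers."""
        return sum(entry[3] for entry in self.entries)

log = TransferLog()
rows = [["S1", "P3", 70], ["S6", "P4", 96]]
payload = log.send("site2", "site1", "SUPPLY1 rows", rows)
log.send("site3", "site1", "SUPPLY3 rows", [["S3", "P3", 55]], compress=False)
```

Listing `log.entries` in order gives the per-transfer amounts, and `log.total_bytes()` gives their sum.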
Execution
- Evaluation metric: total response time
- Total response time is the sum of:
  - Time of input receiving
  - Time of query processing (decomposition, localization and optimization)
  - Time of result display
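The sum above can be measured with one timer per phase. A minimal sketch, assuming hypothetical phase names and placeholder work inside each phase:

```python
import time
from contextlib import contextmanager

class ResponseTimer:
    """Accumulates wall-clock time per phase and reports the total."""

    def __init__(self):
        self.phases = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.phases[name] = self.phases.get(name, 0.0) + elapsed

    def total(self):
        """Total response time: the sum over all recorded phases."""
        return sum(self.phases.values())

timer = ResponseTimer()
with timer.phase("input receiving"):
    query = "SELECT SNO FROM SUPPLY"   # placeholder for reading user input
with timer.phase("query processing"):
    plan = query.split()               # placeholder for the real processing
with timer.phase("result display"):
    shown = ", ".join(plan)            # placeholder for rendering results
```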
Communication Protocols
- Access Level
  - Client-Server Protocols
  - Server-Server Protocols
- How to Design Communication Protocols
  - Sync vs. Async
  - Design of commands and responses
- How to Implement Communication Protocols
  - Strong vs. Economy
  - Techniques
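Designing commands and responses usually starts with a message format. The sketch below assumes a hypothetical JSON envelope with `id`, `command`, and `args` fields and a matching response; none of these field names or command names come from the course material:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Command:
    id: int                 # lets a caller match responses to requests
    command: str            # hypothetical command name, e.g. "ping"
    args: dict = field(default_factory=dict)

@dataclass
class Response:
    id: int                 # echoes the id of the command it answers
    ok: bool
    rows: list = field(default_factory=list)
    error: str = ""

def encode(message):
    """Serialize a Command or Response to bytes for the wire."""
    return json.dumps(asdict(message)).encode("utf-8")

def decode(data, cls):
    """Rebuild a Command or Response from received bytes."""
    return cls(**json.loads(data.decode("utf-8")))

def handle(command):
    """Toy server-side handler that knows a single hypothetical command."""
    if command.command == "ping":
        return Response(id=command.id, ok=True, rows=[["pong"]])
    return Response(id=command.id, ok=False, error="unknown command")
```

With a synchronous design the caller blocks until the response with the same `id` arrives; an asynchronous design can keep several commands in flight and use `id` to pair them up.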
Database Management
- Global vs. Local
  - Global Management
  - Local Management
- GDD
  - Global Information of DDB
  - Storage Issues
- Local DBMS
  - Recommendation: MySQL
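A minimal sketch of what a GDD might hold, assuming an in-memory dictionary layout. The structure and helper names are hypothetical, the site names are invented, and the fragment-to-site mapping is a placeholder rather than the course's allocation:

```python
# Hypothetical GDD layout: sites, then tables -> fragments -> (condition, site).
# The allocation below is a placeholder, not the course's allocation.
GDD = {
    "sites": {
        "site1": "127.0.0.1:40001",
        "site2": "127.0.0.1:40002",
    },
    "tables": {
        "PARTS": {
            "columns": ["PNO", "PNAME", "PRICE"],
            "fragments": {
                "PARTS1": {"condition": "PRICE < 6000", "site": "site1"},
                "PARTS2": {"condition": "PRICE >= 6000", "site": "site2"},
            },
        },
    },
}

def fragments_of(table):
    """Names of all fragments of a global table."""
    return list(GDD["tables"][table]["fragments"])

def address_of(fragment):
    """Network address of the site that stores the given fragment."""
    for table in GDD["tables"].values():
        info = table["fragments"].get(fragment)
        if info is not None:
            return GDD["sites"][info["site"]]
    raise KeyError(fragment)
```

The storage question is then where this dictionary lives, for example on a single site or replicated on every site.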
Query Processing
[Figure: Client, sites A, B, C, D; arrow labeled "commands"]
- Master site
  - Optimize the query
  - Formulate execution plan
  - Broadcast the plan
- All sites
  - Execute commands from Master site
  - Return results
- The Crucial Points
  - Global Optimization
  - Global Execution Formulation
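The master-site flow can be sketched with a thread pool standing in for the network. Here each site is a hypothetical in-process function and the plan is a dictionary of per-site commands; the data, the `select_lt` command, and all function names are illustrative assumptions, and a real system would send the commands over its server-server protocol:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy local data per site; contents are illustrative only.
SITE_DATA = {
    "A": [5000, 3000],
    "B": [10000, 8000],
    "C": [],
    "D": [7000],
}

def execute_at_site(site, command):
    """Stand-in for a remote call: run one command on one site's data."""
    if command["op"] == "select_lt":
        return [v for v in SITE_DATA[site] if v < command["value"]]
    raise ValueError(f"unknown op: {command['op']}")

def formulate_plan(value):
    """Master step: one command per site (no pruning in this sketch)."""
    return {site: {"op": "select_lt", "value": value} for site in SITE_DATA}

def broadcast(plan):
    """Master step: send all commands concurrently and gather the results."""
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = {site: pool.submit(execute_at_site, site, cmd)
                   for site, cmd in plan.items()}
        return {site: future.result() for site, future in futures.items()}

def run_query(value):
    """Formulate the plan, broadcast it, and merge the partial results."""
    results = broadcast(formulate_plan(value))
    # Merge in a fixed site order so the output is stable.
    return [v for site in sorted(results) for v in results[site]]
```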
Other Issues
- SQL Statement Parser
- Multi-Thread Mechanism
- Query Tree Layout and Visualization
Demo
- For Reference Only
- Authors: Shoubin Kong (孔守斌), Jun Wang (王军), FangQiang Yu (余芳强)
Implementation Details
- Programming Language: Java
- Local DBMS: MySQL
- Protocol: RMI
An Overview of the System
[Figure: User, Client, System; DDB Servers (P2P): A, B, C, D]
Deployment
- Client: 127.0.0.1
- Site server 1: 127.0.0.1:40001
- Site server 2: 127.0.0.1:40002
- Site server 3: 127.0.0.1:40003
- Site server 4: 127.0.0.1:40004
Database Initialization
- Use your self-defined commands to initialize the database:
  - Define the 4 sites over 4 servers
  - Create the database
  - Create the tables
  - Fragment the tables
  - Allocate each fragment to a site
Commands
- Define site
- Create table
- Fragment
- Allocate
- Import
- Insert / Delete
- Select
Summaries
- Requirement Driven
- Perfect vs. Good Enough
- Comparative Advantage
- A Central Management Scheme for a Distributed Project
Assignment: Fragmentation
Q1:
  SELECT SNO FROM PARTS, SUPPLY
  WHERE PARTS.PNO = SUPPLY.PNO AND PARTS.PRICE < 6000
Q2:
  SELECT SNAME, PNO FROM SUPPLIER, SUPPLY
  WHERE SUPPLIER.SNO = SUPPLY.SNO AND SUPPLIER.COUNTRY = 'USA'
Q3:
  SELECT SUPPLIER.SNO, SNAME, COUNT(*) FROM SUPPLIER, SUPPLY
  WHERE SUPPLIER.SNO = SUPPLY.SNO GROUP BY SUPPLIER.SNO
Assignment: Fragmentation
The Set of Complete and Minimal Simple Predicates:
  {PRICE < 6000, PRICE ≥ 6000, COUNTRY = “USA”, COUNTRY ≠ “USA”}
Assignment: Fragmentation
PARTS – Horizontal Fragmentation
  PARTS1 = σ_{PRICE < 6000}(PARTS)
  PARTS2 = σ_{PRICE ≥ 6000}(PARTS)

PARTS1
PNO  PNAME   PRICE
P3   VIDEO   5000
P4   HI-HI   3000

PARTS2
PNO  PNAME   PRICE
P1   PC      10000
P2   CAMERA  8000
Assignment: Fragmentation
SUPPLIER – Horizontal Fragmentation
  SUPPLIER1 = σ_{COUNTRY = “USA”}(SUPPLIER)
  SUPPLIER2 = σ_{COUNTRY ≠ “USA”}(SUPPLIER)

SUPPLIER1
SNO  SNAME  COUNTRY
S1   SN1    USA
S6   SN6    USA

SUPPLIER2
SNO  SNAME  COUNTRY
S2   SN2    INDIA
S3   SN3    CHINA
S4   SN4    CHINA
S5   SN5    INDIA
Assignment: Fragmentation
SUPPLY – Derived Fragmentation
  SUPPLY1 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS1
  SUPPLY2 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS2
  SUPPLY3 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS1
  SUPPLY4 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS2

SUPPLY1
SNO  PNO  QTY
S1   P3   70
S6   P4   96

SUPPLY2
SNO  PNO  QTY
S1   P1   60
S6   P2   70

SUPPLY3
SNO  PNO  QTY
S3   P3   55
S3   P4   96

SUPPLY4
SNO  PNO  QTY
S2   P2   60
S4   P2   65
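
The derived fragments can be reproduced with two semijoins each: first SUPPLY against a SUPPLIER fragment on SNO, then the result against a PARTS fragment on PNO. A minimal sketch, assuming the tables are held as lists of tuples and using a hypothetical `semijoin` helper; the rows are the ones listed in the tables above, with SUPPLY taken as the union of its four fragments:

```python
# Rows taken from the fragment tables; SUPPLY is the union of SUPPLY1..4.
SUPPLIER1 = [("S1", "SN1", "USA"), ("S6", "SN6", "USA")]
SUPPLIER2 = [("S2", "SN2", "INDIA"), ("S3", "SN3", "CHINA"),
             ("S4", "SN4", "CHINA"), ("S5", "SN5", "INDIA")]
PARTS1 = [("P3", "VIDEO", 5000), ("P4", "HI-HI", 3000)]
PARTS2 = [("P1", "PC", 10000), ("P2", "CAMERA", 8000)]
SUPPLY = [("S1", "P3", 70), ("S6", "P4", 96), ("S1", "P1", 60), ("S6", "P2", 70),
          ("S3", "P3", 55), ("S3", "P4", 96), ("S2", "P2", 60), ("S4", "P2", 65)]

def semijoin(left, right, left_idx, right_idx):
    """Rows of `left` whose join column has a match in `right`."""
    keys = {row[right_idx] for row in right}
    return [row for row in left if row[left_idx] in keys]

def derive(supplier_fragment, parts_fragment):
    """(SUPPLY semijoin SUPPLIERi on SNO) semijoin PARTSj on PNO."""
    by_supplier = semijoin(SUPPLY, supplier_fragment, 0, 0)
    return semijoin(by_supplier, parts_fragment, 1, 0)

SUPPLY1 = derive(SUPPLIER1, PARTS1)
SUPPLY2 = derive(SUPPLIER1, PARTS2)
SUPPLY3 = derive(SUPPLIER2, PARTS1)
SUPPLY4 = derive(SUPPLIER2, PARTS2)
```

Because the SUPPLIER and PARTS fragments each partition their table, every SUPPLY row lands in exactly one of the four derived fragments.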
Assignment: Allocation
a) Solution 1
Assignment: Allocation
a) Solution 2
Assignment: Allocation
b) Solution
Q & A
Thank You!