Tutorial of Course Project: Distributed Query Engine

Jun Wang (王军), East Main Building 9-216, 18901291504, wjun09@mails.tsinghua.edu.cn


Outline

Requirements
Benchmark
Discussion of Design & Implementation
Demo
Assignment
Q&A

23/4/19 DDB 2


Database Management

Compulsory Commands
  SELECT
Fragmentation
  Horizontal Fragmentation
  Vertical Fragmentation


Architecture

P2P Architecture


Query Processing

SELECT statement
  One table & multi-tables (JOIN)
  Types of operator in the predicate: >, <, =
Command Parsing
Query Processing
  General query tree
  Query tree optimization and reduction
  Network traffic optimization
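A minimal sketch of the command-parsing step, assuming queries have the restricted form SELECT ... FROM ... WHERE p1 AND p2 with only the operators >, < and =. The helper name parse_select and the layout of the returned dictionary are illustrative assumptions, not part of the project specification.

```python
import re

# Hypothetical helper: splits a restricted SELECT statement into its parts.
SELECT_RE = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<tables>.+?)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
PRED_RE = re.compile(r"^\s*(\S+?)\s*(>|<|=)\s*(\S.*?)\s*$")


def parse_select(sql):
    match = SELECT_RE.match(sql)
    if match is None:
        raise ValueError("unsupported statement: " + sql)
    predicates = []
    if match.group("where"):
        # Only conjunctions (AND) of simple comparisons are handled here.
        for part in re.split(r"\s+and\s+", match.group("where"), flags=re.IGNORECASE):
            pred = PRED_RE.match(part)
            if pred is None:
                raise ValueError("unsupported predicate: " + part)
            predicates.append(pred.groups())
    return {
        "columns": [c.strip() for c in match.group("cols").split(",")],
        "tables": [t.strip() for t in match.group("tables").split(",")],
        "predicates": predicates,
    }


query = parse_select(
    "SELECT SNO FROM PARTS, SUPPLY "
    "WHERE PARTS.PNO = SUPPLY.PNO AND PARTS.PRICE < 6000"
)
```

The parsed tables and predicates are the inputs a query tree builder would need; a real parser would also have to handle quoting, aliases and GROUP BY.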


User Interface

The user should be able to use the interface to interact with your Distributed Query Engine

Any type of interface
  Command Line Interface
  Application-based Interface
  Web-based Interface

Note: DO NOT focus on the interface design. The interface meets the requirements if it:
  Lets users input the commands
  Displays the results and additional evaluation metrics


System Outputs

The size of the query result set
The optimized query tree
The time cost of the query
The communication cost of the query


Documentation and Report

Mid-term presentation
  Design of the distributed database query engine
  Project work plan
Final report
  Architecture
  Query optimization method
  Implementation of communication protocols
System operation specification
  Instructions for installation, configuration, and operation of the query engine


System Evaluation

Demonstration Time
  16th Week
System Test Environment
  Operating system: Windows
  Local DBMS: MySQL


Dataset

We simulate a scenario of using distributed database systems.

In general, the following are provided:
  The schema of a database (global tables)
  The fragmentation schemes
  The allocation


Fragmentation

Horizontal Fragmentation

Vertical Fragmentation


Allocation


Overview of Query Processing

Decomposition and Localization
  Rewriting: a query → an algebra tree
  Reduction
Optimization
  Optimize the cost of data transfer
Execution
  Intermediate table storage and access
  The TOTAL response time after the user issues a query


Decomposition and Localization

Evaluation Points:
  The elimination of useless fragments and joins
  The global optimization of the algebra tree
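Eliminating useless fragments can be sketched as a range-overlap test between the query predicate and each fragment's defining predicate. This is only an illustration under stated assumptions: PRICE is taken to be an integer, the fragment names and the 6000 threshold follow the PARTS fragmentation from the assignment slides, and the function names are hypothetical.

```python
import math

# Each horizontal fragment is described by a closed range [low, high] on PRICE.
# Assumption: PRICE is an integer, so PRICE < 6000 is the same as PRICE <= 5999.
FRAGMENTS = {
    "PARTS1": (-math.inf, 5999),  # PRICE < 6000
    "PARTS2": (6000, math.inf),   # PRICE >= 6000
}


def predicate_range(op, value):
    """Closed integer range selected by `PRICE op value`."""
    if op == "<":
        return (-math.inf, value - 1)
    if op == ">":
        return (value + 1, math.inf)
    if op == "=":
        return (value, value)
    raise ValueError("unsupported operator: " + op)


def relevant_fragments(op, value):
    """Keep only fragments whose range overlaps the query predicate."""
    low, high = predicate_range(op, value)
    return [
        name
        for name, (frag_low, frag_high) in FRAGMENTS.items()
        if max(low, frag_low) <= min(high, frag_high)
    ]
```

With this test, a query restricted to PRICE < 6000 only needs PARTS1, so the branch of the algebra tree that reads PARTS2 can be dropped before any data is shipped.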


Optimization

Evaluation metric: the amount (bytes) of data transferred

You should provide the following information:
  The execution plan, in which all operations as well as data transfers are listed in sequence.
  The amount of each data transfer and the sum of the amounts of all transfers.

Note that the amount of data transferred is measured in BYTES before compression (you can compress the transferred data if necessary).
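The byte accounting could look roughly like the sketch below. It assumes rows are shipped as comma-separated UTF-8 text, which is only one possible wire format; the site names, the sample rows in the plan and the function names are hypothetical.

```python
def transfer_size(rows):
    """Bytes of one transfer before compression, assuming UTF-8 CSV rows."""
    payload = "\n".join(",".join(str(v) for v in row) for row in rows)
    return len(payload.encode("utf-8"))


def communication_cost(transfers):
    """Return the size of each transfer and the sum over all transfers."""
    sizes = [(src, dst, transfer_size(rows)) for src, dst, rows in transfers]
    return sizes, sum(size for _, _, size in sizes)


# Hypothetical plan: (source site, destination site, rows to ship).
transfers = [
    ("site2", "site1", [("S1", "P3", 70), ("S6", "P4", 96)]),
    ("site3", "site1", [("P3", "VIDEO", 5000)]),
]
sizes, total = communication_cost(transfers)
```

Measuring the payload before any compression step keeps the reported numbers comparable across implementations, as the note above requires.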


Execution

Evaluation metric: total response time

Total response time is the sum of:
  Time of input receiving
  Time of query processing (decomposition, localization and optimization)
  Time of result display
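One way to collect these three components is to wrap each stage in a timer and add up the results. The sketch below uses stand-in lambdas in place of the real stages; the stage names and the helper timed are assumptions made for illustration.

```python
import time


def timed(stage, func, timings, *args):
    """Run func(*args) and record its wall-clock duration under `stage`."""
    start = time.perf_counter()
    result = func(*args)
    timings[stage] = time.perf_counter() - start
    return result


timings = {}
# Stand-ins for the real stages: reading input, processing, displaying.
sql = timed("input", lambda: "SELECT SNO FROM SUPPLY", timings)
plan = timed("processing", lambda q: ["scan SUPPLY"], timings, sql)
timed("display", lambda p: len(p), timings, plan)

total_response_time = sum(timings.values())
```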


Communication Protocols

Access Level
  Client-Server Protocols
  Server-Server Protocols
How to Design Communication Protocols
  Sync vs. Async
  Design of commands and responses
How to Implement Communication Protocols
  Strong vs. Economy
  Techniques
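The sync vs. async choice and the design of commands and responses can be illustrated with a small in-process sketch. The JSON message fields (type, sql, status) and the handle function are assumptions for this example; a real system would send these messages over a network protocol.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def handle(message):
    """Stand-in for a site server: decode a command, return an encoded response."""
    command = json.loads(message)
    if command["type"] == "EXECUTE":
        body = {"status": "OK", "echo": command["sql"]}
    else:
        body = {"status": "ERROR", "reason": "unknown command"}
    return json.dumps(body)


request = json.dumps({"type": "EXECUTE", "sql": "SELECT * FROM PARTS1"})

# Synchronous: the caller blocks until the response arrives.
sync_reply = json.loads(handle(request))

# Asynchronous: submit to several sites at once, collect the replies later.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(handle, request) for _ in range(4)]
    async_replies = [json.loads(f.result()) for f in futures]
```

The synchronous form is simpler to reason about, while the asynchronous form lets the sites work in parallel, which matters once a plan touches several sites.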


Database Management

Global vs. Local
  Global Management
  Local Management
GDD
  Global Information of DDB
  Storage Issues
Local DBMS
  Recommendation: MySQL
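As a rough illustration of what the GDD might hold, the sketch below keeps schema, fragmentation and allocation information in one dictionary. The layout, the site names and the allocation shown are made up for this example; the PARTS schema and predicates follow the assignment slides.

```python
# Hypothetical GDD layout; the site names and the allocation are made up.
GDD = {
    "tables": {"PARTS": ["PNO", "PNAME", "PRICE"]},
    "fragments": {
        "PARTS1": {"table": "PARTS", "type": "horizontal", "predicate": "PRICE < 6000"},
        "PARTS2": {"table": "PARTS", "type": "horizontal", "predicate": "PRICE >= 6000"},
    },
    "allocation": {"PARTS1": "site1", "PARTS2": "site2"},
}


def sites_for_table(gdd, table):
    """Map each fragment of `table` to the site that stores it."""
    return {
        name: gdd["allocation"][name]
        for name, info in gdd["fragments"].items()
        if info["table"] == table
    }
```

Where this structure is stored (one site, or replicated on every site) is the storage issue the slide refers to, and the sketch does not settle it.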


Query Processing

[Figure: a client and sites A, B, C, and D, linked by commands]

Master site
  Optimize the query
  Formulate execution plan
  Broadcast the plan
All sites
  Execute commands from Master site
  Return results
The Crucial Points
  Global Optimization
  Global Execution Formulation
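The master/site interaction can be sketched in a single process as follows. The Site class, the plan format (a list of steps with a site name and a row filter) and the placement of PARTS rows on sites A and B are all assumptions for illustration, not the demo's actual design.

```python
class Site:
    """Stand-in for one DDB server holding some local rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def execute(self, plan):
        # Run only the steps of the plan addressed to this site.
        return [row for step in plan if step["site"] == self.name
                for row in self.rows if step["filter"](row)]


def run_on_master(sites, plan):
    """Broadcast the plan to every site and merge the returned rows."""
    results = []
    for site in sites:
        results.extend(site.execute(plan))
    return results


sites = [
    Site("A", [("P3", "VIDEO", 5000), ("P4", "HI-HI", 3000)]),
    Site("B", [("P1", "PC", 10000), ("P2", "CAMERA", 8000)]),
]
plan = [{"site": "A", "filter": lambda row: row[2] < 4000}]
rows = run_on_master(sites, plan)
```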


Other Issues

SQL Statement Parser
Multi-Thread Mechanism
Query Tree Layout and Visualization


Demo

For Reference Only

Authors:
  Shoubin Kong (孔守斌)
  Jun Wang (王军)
  FangQiang Yu (余芳强)


Implementation Details

Programming Language: Java
Local DBMS: MySQL
Protocol: RMI


An Overview of the System

[Figure: the user works through a client, which connects to the system of P2P DDB servers A, B, C, and D]


Deployment

Client: 127.0.0.1
Site server 1: 127.0.0.1:40001
Site server 2: 127.0.0.1:40002
Site server 3: 127.0.0.1:40003
Site server 4: 127.0.0.1:40004


Database Initialization

Use your self-defined commands to initialize the database:
  Define the 4 sites over 4 servers
  Create the database
  Create the tables
  Fragment the tables
  Allocate each fragment to a site


Commands

Define site
Create table
Fragment
Allocate
Import
Insert / Delete
Select


Summaries

Requirement Driven
Perfect vs. Good Enough
Comparative Advantage
A Central Management Scheme to a Distributed Project


Assignment : Fragmentation

Q1:
  Select SNO from PARTS, SUPPLY
  Where PARTS.PNO = SUPPLY.PNO and PARTS.PRICE < 6000

Q2:
  Select SNAME, PNO from SUPPLIER, SUPPLY
  Where SUPPLIER.SNO = SUPPLY.SNO and SUPPLIER.COUNTRY = "USA"

Q3:
  Select SUPPLIER.SNO, SNAME, COUNT(*) from SUPPLIER, SUPPLY
  Where SUPPLIER.SNO = SUPPLY.SNO
  group by SUPPLIER.SNO


Assignment : Fragmentation

The Set of Complete and Minimal Simple Predicates
  {PRICE < 6000, PRICE ≥ 6000, COUNTRY = "USA", COUNTRY ≠ "USA"}


Assignment : Fragmentation

PARTS – Horizontal Fragmentation
  PARTS1 = σ_{price<6000}(PARTS)
  PARTS2 = σ_{price≥6000}(PARTS)

PARTS1
  PNO  PNAME   PRICE
  P3   VIDEO   5000
  P4   HI-HI   3000

PARTS2
  PNO  PNAME   PRICE
  P1   PC      10000
  P2   CAMERA  8000
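The horizontal fragmentation of PARTS can be reproduced with a plain selection over the four rows listed in the fragment tables. The select helper is a hypothetical stand-in for the relational σ operator; rows are (PNO, PNAME, PRICE) tuples.

```python
PARTS = [
    ("P1", "PC", 10000),
    ("P2", "CAMERA", 8000),
    ("P3", "VIDEO", 5000),
    ("P4", "HI-HI", 3000),
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


PARTS1 = select(PARTS, lambda row: row[2] < 6000)
PARTS2 = select(PARTS, lambda row: row[2] >= 6000)

# Completeness and disjointness: every row lands in exactly one fragment.
assert sorted(PARTS1 + PARTS2) == sorted(PARTS)
```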


Assignment : Fragmentation

SUPPLIER – Horizontal Fragmentation
  SUPPLIER1 = σ_{country="USA"}(SUPPLIER)
  SUPPLIER2 = σ_{country≠"USA"}(SUPPLIER)

SUPPLIER1
  SNO  SNAME  COUNTRY
  S1   SN1    USA
  S6   SN6    USA

SUPPLIER2
  SNO  SNAME  COUNTRY
  S2   SN2    INDIA
  S3   SN3    CHINA
  S4   SN4    CHINA
  S5   SN5    INDIA


Assignment : Fragmentation

SUPPLY – Derived Fragmentation
  SUPPLY1 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS1
  SUPPLY2 = (SUPPLY ⋉ SUPPLIER1) ⋉ PARTS2
  SUPPLY3 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS1
  SUPPLY4 = (SUPPLY ⋉ SUPPLIER2) ⋉ PARTS2

SUPPLY1
  SNO  PNO  QTY
  S1   P3   70
  S6   P4   96

SUPPLY2
  SNO  PNO  QTY
  S1   P1   60
  S6   P2   70

SUPPLY3
  SNO  PNO  QTY
  S3   P3   55
  S3   P4   96

SUPPLY4
  SNO  PNO  QTY
  S2   P2   60
  S4   P2   65
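The derived fragmentation can be checked with two semijoins per fragment. In this sketch the global SUPPLY table is assumed to be the union of the eight fragment rows listed above, since the full table is not shown; the helper names semijoin and derive are hypothetical.

```python
SUPPLIER1 = [("S1", "SN1", "USA"), ("S6", "SN6", "USA")]
SUPPLIER2 = [("S2", "SN2", "INDIA"), ("S3", "SN3", "CHINA"),
             ("S4", "SN4", "CHINA"), ("S5", "SN5", "INDIA")]
PARTS1 = [("P3", "VIDEO", 5000), ("P4", "HI-HI", 3000)]
PARTS2 = [("P1", "PC", 10000), ("P2", "CAMERA", 8000)]
# Assumed global table: the union of the fragment rows, as (SNO, PNO, QTY).
SUPPLY = [("S1", "P3", 70), ("S6", "P4", 96), ("S1", "P1", 60), ("S6", "P2", 70),
          ("S3", "P3", 55), ("S3", "P4", 96), ("S2", "P2", 60), ("S4", "P2", 65)]


def semijoin(left, left_col, right, right_col):
    """Rows of `left` whose join attribute appears in `right`."""
    keys = {row[right_col] for row in right}
    return [row for row in left if row[left_col] in keys]


def derive(suppliers, parts):
    # (SUPPLY semijoin suppliers on SNO) semijoin parts on PNO
    return semijoin(semijoin(SUPPLY, 0, suppliers, 0), 1, parts, 0)


SUPPLY1 = derive(SUPPLIER1, PARTS1)
SUPPLY2 = derive(SUPPLIER1, PARTS2)
SUPPLY3 = derive(SUPPLIER2, PARTS1)
SUPPLY4 = derive(SUPPLIER2, PARTS2)
```

Because each SUPPLY row matches exactly one supplier fragment and one parts fragment, the four derived fragments are disjoint and together cover SUPPLY.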


Assignment : Allocation

a) Solution 1

a) Solution 2

b) Solution


Q & A

Thank You!