
Distributed DBMS University of Shanghai for Science and Technology Page 2.1

Distributed Database Design
A framework for distributed database design (overview)
The design of database fragmentation (fragmentation design)
The allocation of fragments (allocation design)


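The three dimensions of distributed system design

Level of sharing: no sharing, data sharing, data + program sharing

Access pattern behavior: static or dynamic (distributed database design and query processing)

Level of knowledge: the designer has complete or only partial knowledge of the access patterns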


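Dimensions of the Problem

[Figure: the three dimensions drawn as axes. Sharing: data, data + program. Access pattern: static, dynamic. Level of knowledge: partial information, complete information.]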


Centralized database design

1. Designing the "conceptual schema," which describes the integrated database.

2. Designing the "physical database," i.e., mapping the conceptual schema to storage areas and determining appropriate access methods.


Additional requirements of distributed database design

+ 3. Designing the fragmentation.

+ 4. Designing the allocation of fragments, i.e., how fragments are mapped to physical images; the replication of fragments is also determined here.


Notes on fragmentation and allocation

Fragmentation design has been partially analyzed in centralized systems with multiple storage devices.

The allocation problem has been studied as the "file allocation problem."

The distinction between the two problems is conceptually relevant:
one deals with the "logical criteria" which motivate the fragmentation of a global relation;
the other deals with the "physical" placement of data at the various sites.

The two problems are usually interrelated: they cannot be solved independently of each other if the optimal fragmentation and allocation are to be determined.


Application considerations

Distributed database design covers both the design of the distributed database and the design of the corresponding distributed applications. For each application we need to know:

1. The site from which the application is issued (site of origin of the application).

2. The frequency of activation of the application (i.e., the number of activations per unit time); for applications which can be issued at multiple sites, we need to know the frequency of activation of each application at each site.

3. The number, type, and statistical distribution of accesses made by each application to each required data "object."


Design objectives

Processing locality
Availability and reliability of distributed data (controlled redundancy)
Workload distribution
Storage costs and availability


Processing locality

Maximizing processing locality corresponds to the simple principle of placing data as close as possible to the applications which use them.

Maximizing processing locality (minimizing remote references) can be done by adding up the number of local and remote references corresponding to each candidate fragmentation and fragment allocation, and selecting the best solution among them.

The advantage of complete locality is not only the reduction of remote accesses, but also the increased simplicity in controlling the execution of the application.
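The counting of local and remote references can be sketched as follows. This is only an illustration: the helper name `count_references`, the two applications, the fragment names, and all frequencies and reference counts are made-up assumptions, and each application is assumed to be issued at a single site.

```python
def count_references(allocation, applications):
    """Return (local, remote) reference counts for one candidate allocation.

    allocation maps a fragment name to the set of sites holding a copy.
    Each application has a site of origin, an activation frequency, and
    the number of references it makes to each fragment per activation.
    """
    local = remote = 0
    for app in applications:
        for fragment, n_refs in app["refs"].items():
            total = app["freq"] * n_refs
            if app["site"] in allocation[fragment]:
                local += total
            else:
                remote += total
    return local, remote


# Hypothetical workload: two applications, two fragments, two sites.
applications = [
    {"site": 1, "freq": 10, "refs": {"F1": 4, "F2": 1}},
    {"site": 2, "freq": 5, "refs": {"F2": 6}},
]
# Two candidate allocations to compare.
candidates = {
    "A": {"F1": {1}, "F2": {2}},
    "B": {"F1": {2}, "F2": {1}},
}
scores = {name: count_references(a, applications) for name, a in candidates.items()}
# Pick the candidate with the fewest remote references.
best = min(scores, key=lambda name: scores[name][1])
```

With these numbers, candidate A keeps most references local, so it is the one selected.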


Availability and reliability of distributed data

A high degree of availability for read-only applications is achieved by storing multiple copies of the same information; the system must be able to switch to an alternative copy when the one that should be accessed under normal conditions is not available.

Reliability is also achieved by storing multiple copies of the same information, since it is then possible to recover from crashes or from the physical destruction of one of the copies by using the other, still available copies.


Workload distribution

Distributing the workload over the sites is an important feature of distributed computer systems.

It is done to take advantage of the different powers or utilizations of the computers at each site, and to maximize the degree of parallelism of execution of applications.

Workload distribution might negatively affect processing locality, so the trade-off between the two must be considered.


Storage costs and availability

Database distribution should reflect the cost and availability of storage at the different sites.

It is possible to have specialized sites in the network for data storage, or conversely to have sites which do not support mass storage at all.

Storage cost is usually not very significant compared with the costs of CPU, I/O, and network transmission.


Design approaches

Top-down approach
Bottom-up approach


Top-down approach

Given an existing DB…, the process of deciding how to partition the data and how to allocate these data to the different sites:

Start by designing the global schema.
Design the fragmentation of the database.
Then allocate the fragments to the sites, creating the physical images.
The approach is completed by performing, at each site, the "physical design" of the data which are allocated to it.

[Figure: Top-Down Design. Boxes and labels: User Input, Requirements Analysis, Objectives, Conceptual Design, View Design, View Integration, Access Information, ES's, GCS, Distribution Design, LCS's, Physical Design, LIS's.]


Characteristics

A first outline of the design is visible early.

Problem: When the distributed database is developed as the aggregation of existing databases, it is not easy to follow the top-down approach. The global schema is often produced as a compromise between existing data descriptions.


Bottom-up approach

Existing databases are aggregated (they may even be heterogeneous or fully autonomous); there is no design problem as such, only information integration.

The approach is based on the integration of existing schemata into a single, global schema.

Integration means the merging of common data definitions and the resolution of conflicts among different representations given to the same data.


Bottom-up approach (continued)

Horizontal fragments of the same global relation must have the same relation schema. This is easily enforced in a top-down design, while it is difficult to "discover" it in a bottom-up design. The integration process should attempt to modify the definitions of local relations, so that they can be regarded as horizontal fragments of a common, global relation.


Bottom-up design requires (in the heterogeneous case):

The selection of a common database model for describing the global schema of the database.

The translation of each local schema into the common data model.

The integration of the local schemata into a common global schema.


The two problems of DDB design

Fragmentation
  Horizontal fragmentation
  Vertical fragmentation
Allocation

Fragmentation design and allocation design usually need to be considered together.


Horizontal Fragmentation

Primary horizontal fragmentation
Derived horizontal fragmentation


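Rules for horizontal fragmentation

If R is fragmented into F = {F1, F2, …, Fn}, then:

Completeness: for every tuple t ∈ R, there exists Fi ∈ F such that t ∈ Fi.
Disjointness: for every t ∈ Fi, there is no Fj with i ≠ j such that t ∈ Fj.
Reconstruction: the operation is union (this rule can be omitted, because it is implied by completeness): R = ∪ {F1, F2, …, Fn}.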
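The three rules can be checked mechanically on small data. The following is a minimal sketch, assuming tuples are hashable Python tuples and fragments are plain lists; the function name and the sample EMP rows are made up for illustration.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Return (complete, disjoint, reconstructible) for a candidate fragmentation."""
    relation = set(relation)
    union = set().union(*fragments)
    # Completeness: every tuple of R appears in some fragment.
    complete = relation <= union
    # Disjointness: no tuple is counted in more than one fragment.
    disjoint = sum(len(set(f)) for f in fragments) == len(union)
    # Reconstruction: the union of the fragments gives back exactly R.
    reconstructible = union == relation
    return complete, disjoint, reconstructible


# Hypothetical tuples (E#, NAME, DEPT), fragmented by DEPT.
EMP = [(1, "Li", 1), (2, "Wang", 2), (3, "Zhao", 1)]
by_dept = [[t for t in EMP if t[2] == d] for d in (1, 2)]
result = check_horizontal_fragmentation(EMP, by_dept)
```

Fragmenting by DEPT satisfies all three rules here; overlapping or missing tuples make the corresponding flag false.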


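Horizontal fragmentation: example

EMP (E#, NAME, DEPT, JOB, SAL, TEL, …), with DEPT = {1, 2} and JOB = {'P', '-P'}

Assume that applications frequently query the employees who belong to department 1 and are programmers (the 80/20 rule).

Possible horizontal fragmentation qualifications:
P = {DEPT = 1}
P = {DEPT = 1, JOB = 'P'}
P = {DEPT = 1, JOB = 'P', SAL > 500}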


How to guarantee the fragmentation rules

Check "by hand," e.g., F1 = σ loc='Sa' (E); F2 = σ loc='Sb' (E)

Generate predicates that satisfy the fragmentation rules.


Some definitions

Predicate: the condition used in the selection operation that defines a fragment.

1. A simple predicate: Attribute = value, e.g., DEPT = 1.

2. A minterm predicate y: given a set of simple predicates P = {p1, p2, …, pn},

y = ∧ (pi ∈ P) pi*, that is, y = p1* ∧ p2* ∧ … ∧ pn*

where (pi* = pi or pi* = NOT pi) and y ≠ false

3. A fragment is the set of all tuples for which a minterm predicate holds.


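Predicate generation procedure

Find the simple predicates (Ai θ Value) used by the frequent application (AP) queries, such as A < 10, A > 5, Loc = Sa, Loc = Sb.

Generate the minterm predicates.

Eliminate any useless predicates that may arise.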


Example

Global relation EMP (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)

Assume that some important APs require information about employees who are members of a given department, and that other important APs require only the data of employees who are programmers; these latter APs can be issued at any site, and reference all programmers with the same probability.

Assume that there are only two departments, 1 and 2; thus, DEPT = 1 → DEPT ≠ 2, and vice versa.

Two simple predicates are DEPT = 1 and JOB = "P" (programmer). The minterm predicates for these two predicates are

DEPT = 1 AND JOB = "P"
DEPT = 1 AND JOB ≠ "P"
DEPT ≠ 1 AND JOB = "P"
DEPT ≠ 1 AND JOB ≠ "P"
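The enumeration of minterm predicates can be sketched in a few lines. This is an illustrative sketch only: the `minterms` helper and the four sample employees are assumptions, and the tuples carry DEPT and JOB attributes because the example predicates refer to them.

```python
from itertools import product


def minterms(simple_predicates):
    """Yield (label, test) pairs, one per sign combination of the simple predicates."""
    names = list(simple_predicates)
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(
            name if sign else f"NOT ({name})" for name, sign in zip(names, signs)
        )

        def holds(t, signs=signs):
            # A tuple satisfies the minterm when every simple predicate
            # evaluates to the sign chosen for it.
            return all(simple_predicates[n](t) == s for n, s in zip(names, signs))

        yield label, holds


P = {
    "DEPT = 1": lambda t: t["DEPT"] == 1,
    'JOB = "P"': lambda t: t["JOB"] == "P",
}
# Hypothetical employees; "C" stands for any non-programmer job.
EMP = [
    {"EMPNUM": 1, "DEPT": 1, "JOB": "P"},
    {"EMPNUM": 2, "DEPT": 1, "JOB": "C"},
    {"EMPNUM": 3, "DEPT": 2, "JOB": "P"},
    {"EMPNUM": 4, "DEPT": 2, "JOB": "C"},
]
fragments = {
    label: [t["EMPNUM"] for t in EMP if holds(t)] for label, holds in minterms(P)
}
```

Two simple predicates give four minterms, and every employee falls into exactly one of the resulting fragments. A minterm that can never be satisfied (y = false) would simply produce an empty fragment in this sketch.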


Discussion

All the simple predicates above are relevant.

By contrast, a predicate such as SAL > 50 is not relevant.


Complete and minimal

Let P = {p1, p2, … , pn} be a set of simple predicates.

For the fragmentation to be correct and efficient, P must be complete and minimal.

1. A set P of predicates is complete if any two tuples belonging to the same fragment are referenced with the same probability by any application.

2. P is minimal if all its predicates are relevant.


Example

P1 = {DEPT = 1} is not complete: within each fragment produced by P1, the applications reference tuples of programmers with a greater probability than other tuples.

P2 ={DEPT = 1, JOB ="P" } is complete and minimal.

P3= {DEPT = 1, JOB = "P", SAL > 50} is complete but not minimal, since SAL > 50 is not relevant.
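The three claims above can be reproduced on toy data. The sketch below rests on two loudly stated assumptions: `profile(t)` is a made-up stand-in for "how the applications reference tuple t" (here, the department and whether the employee is a programmer), and a predicate is treated as nonrelevant when dropping it from a complete set leaves the set complete.

```python
from collections import defaultdict


def is_complete(predicates, tuples, profile):
    """True if all tuples in the same minterm fragment share one access profile."""
    groups = defaultdict(set)
    for t in tuples:
        signature = tuple(p(t) for p in predicates)  # identifies the minterm of t
        groups[signature].add(profile(t))
    return all(len(profiles) == 1 for profiles in groups.values())


def is_minimal(predicates, tuples, profile):
    """True if no predicate can be dropped without losing completeness."""
    return not any(
        is_complete(predicates[:i] + predicates[i + 1:], tuples, profile)
        for i in range(len(predicates))
    )


dept1 = lambda t: t["DEPT"] == 1
prog = lambda t: t["JOB"] == "P"
high_sal = lambda t: t["SAL"] > 50

# Hypothetical employees covering every combination of department, job, salary.
EMP = [
    {"DEPT": d, "JOB": j, "SAL": s}
    for d in (1, 2) for j in ("P", "C") for s in (40, 60)
]
# Assumed access profile: applications distinguish tuples by department
# and by whether the employee is a programmer, never by salary.
profile = lambda t: (t["DEPT"], t["JOB"] == "P")

P1, P2, P3 = [dept1], [dept1, prog], [dept1, prog, high_sal]
```

Under these assumptions P1 is incomplete, P2 is complete and minimal, and P3 is complete but not minimal, matching the example.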


Fragmentation Method

Basis: Consider a predicate p1 which partitions the tuples of R into two parts which are referenced differently by at least one application. Let P = {p1}.

Method: Consider a new simple predicate pi which partitions at least one fragment of P into two parts which are referenced in a different way by at least one application.

Set P ← P ∪ {pi}. Eliminate nonrelevant predicates from P. Repeat this step until the set of the minterm fragments of P is complete.


Example

Consider: SAL>50: if programmers have average salary greater than 50, it determines two sets of employees who are referenced differently by the applications. P1= { SAL > 50}

Consider: DEPT = 1; this predicate is relevant and is added to the previous one, P2 ={ SAL > 50, DEPT = 1}.

Consider: JOB = "P". The predicate is relevant, set P3={SAL > 50, DEPT = 1, JOB = "P" }.

SAL > 50 is then no longer relevant in P3; thus, the final set is P4 = {DEPT = 1, JOB = "P"}, which is complete and minimal.


A "reasonable" way

1. Concentrating on a few important applications

2. Not distinguishing fragments whose features are very similar


Example: DEPT (DEPTNUM, NAME, AREA, MGRNUM)

Important applications:

1.Administrative applications, issued only at sites 1 and 3; administrative applications about departments in the northern area are issued at site 1; those about departments in the southern area are issued at site 3.

2.Applications about work conducted at each department; they can be issued at any department, but they reference tuples of the departments which are closer to their site of origin with higher probability than the tuples of other departments.


Set of predicates:

P1: DEPTNUM < 10
P2: 10 < DEPTNUM < 20
P3: DEPTNUM > 20
P4: AREA = "North"
P5: AREA = "South"


Possible minterm qualifications

Y1: DEPTNUM < 10 and AREA = "North"
Y2: DEPTNUM < 10 and AREA = "South"
Y3: 10 < DEPTNUM < 20 and AREA = "North"
Y4: 10 < DEPTNUM < 20 and AREA = "South"
Y5: DEPTNUM > 20 and AREA = "North"
Y6: DEPTNUM > 20 and AREA = "South"


Reduce using implications between predicates, e.g., AREA = "North" implies that DEPTNUM > 20 is false:

y1: DEPTNUM < 10
y2: (10 < DEPTNUM < 20) AND (AREA = "North")
y3: (10 < DEPTNUM < 20) AND (AREA = "South")
y4: DEPTNUM > 20


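Derived Horizontal Fragmentation

DHF: the fragmentation of a relation is derived from the attribute properties or the horizontal fragmentation of another relation.

Using DHF makes join operations between fragments easier.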


DHF example

SC (S#, C#, GRADE)
S (S#, SNAME, AGE, SEX)

Fragment design:

Define fragment SC1 as
  Select SC.S#, C#, GRADE From SC, S Where SC.S# = S.S# and SEX = 'M'
Define fragment SC2 as
  Select SC.S#, C#, GRADE From SC, S Where SC.S# = S.S# and SEX = 'F'
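The same derivation can be sketched as a semi-join in Python. The rows of S and SC below are invented sample data, and `derive_fragments` is a hypothetical helper written for this illustration.

```python
def derive_fragments(member, owner_fragments, key):
    """Semi-join `member` with each owner fragment on `key`."""
    derived = {}
    for name, owner in owner_fragments.items():
        keys = {row[key] for row in owner}
        # Keep the member tuples whose key value occurs in this owner fragment.
        derived[name] = [row for row in member if row[key] in keys]
    return derived


S = [
    {"S#": 1, "SNAME": "Li", "AGE": 20, "SEX": "M"},
    {"S#": 2, "SNAME": "Wang", "AGE": 21, "SEX": "F"},
    {"S#": 3, "SNAME": "Zhao", "AGE": 19, "SEX": "M"},
]
SC = [
    {"S#": 1, "C#": "C1", "GRADE": 90},
    {"S#": 2, "C#": "C1", "GRADE": 85},
    {"S#": 3, "C#": "C2", "GRADE": 70},
    {"S#": 1, "C#": "C2", "GRADE": 60},
]
# Primary fragmentation of S on SEX drives the derived fragmentation of SC.
S_fragments = {
    "SC1": [s for s in S if s["SEX"] == "M"],
    "SC2": [s for s in S if s["SEX"] == "F"],
}
SC_fragments = derive_fragments(SC, S_fragments, "S#")
```

Because every S# in SC refers to exactly one student, the derived fragments are complete and disjoint, and a join of SC1 with the male students never needs tuples from SC2.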


Join operations in distributed databases

Distributed join
Join graphs: total, partitioned, simple


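Join graph

[Figure: join graph between the fragments of R and S.]

Definition of a join graph:

Circles: data fragments.

Undirected edges: the two fragments contain tuples with the same value of the join attribute.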


Total join graph

[Figure: total join graph between the fragments of R and S.]

Definition: a join graph is total when it contains all possible edges between fragments of R and S.


Partitioned join graph

[Figure: partitioned join graph between the fragments of R and S.]

Definition: a reduced join graph is partitioned if the graph is composed of two or more subgraphs without edges between them.


Simple join graph

[Figure: simple join graph between the fragments of R and S.]

Definition: a reduced join graph is simple if it is partitioned and each subgraph has just one edge.
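The three definitions can be tried out on small fragment sets. In the sketch below each fragment is represented only by the list of join-attribute values it contains; that representation, the function names, and the label "neither" for graphs that fit none of the definitions are assumptions made for illustration.

```python
from collections import Counter


def join_graph(r_frags, s_frags):
    """Edges (i, j) between fragments that share at least one join value."""
    return {
        (i, j)
        for i, r in r_frags.items()
        for j, s in s_frags.items()
        if set(r) & set(s)
    }


def classify(r_frags, s_frags):
    edges = join_graph(r_frags, s_frags)
    if edges and len(edges) == len(r_frags) * len(s_frags):
        return "total"
    # Union-find over the fragments touched by edges, to find the subgraphs.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(("R", i))] = find(("S", j))
    edges_per_subgraph = Counter(find(("R", i)) for i, _ in edges)
    if len(edges_per_subgraph) < 2:
        return "neither"
    if all(n == 1 for n in edges_per_subgraph.values()):
        return "simple"
    return "partitioned"


simple = classify({"R1": [1, 2], "R2": [3, 4]}, {"S1": [1, 2], "S2": [3]})
partitioned = classify({"R1": [1], "R2": [2], "R3": [3]}, {"S1": [1, 2], "S2": [3]})
total = classify({"R1": [1, 2], "R2": [1, 2]}, {"S1": [1], "S2": [2]})
```

A simple join graph is the favorable case for a distributed join: each fragment of R needs to be joined with exactly one fragment of S.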


General example (continued)

SUPPLY (SNUM, PNUM, DEPTNUM, QUAN)

SUPPLY is always used together with another relation.

Some applications require information about supplies of given suppliers: they join SUPPLY and SUPPLIER on the SNUM attribute.

The other applications require information about supplies at a given department: they join SUPPLY and DEPT on the DEPTNUM attribute.


DEPT is horizontally fragmented according to values taken by the attribute DEPTNUM

SUPPLIER is horizontally fragmented according to values taken by the attribute SNUM.

There are two possible derived fragmentations of SUPPLY: one through the semi-join with SUPPLIER on SNUM, and one through the semi-join with DEPT on DEPTNUM; both of them are correct.

The selection between these alternatives should take into account which one of the two corresponding joins is more used by applications.


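Vertical Fragmentation

Vertical fragmentation
Vertical clustering

Objective: group together the attributes that are frequently used by the same AP; when there are several APs, trade-offs sometimes have to be weighed.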


Vertical Fragmentation

Fragmenting a global relation R vertically is not easy, because the number of possible fragments grows very quickly as the number of attributes of R increases (the number of possible clusters is even larger).

Two heuristic approaches:

The split approach, in which global relations are progressively split into fragments.
The grouping approach, in which attributes are progressively aggregated to constitute fragments.


General example (continued)

EMP(EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)

APP1: Administrative applications, concentrated at site 3, requiring NAME, SAL, and TAX of employees.

APP2: Applications about work conducted at each department, requiring NAME, MGRNUM, and DEPTNUM of employees; these applications are issued at all sites, and reference tuples of employees in the same group of departments with 80 percent probability.


Result

EMP1 (EMPNUM, NAME, TAX, SAL)
EMP2 (EMPNUM, NAME, MGRNUM, DEPTNUM)
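A vertical fragmentation is a pair of projections that can be joined back on the key. The following sketch assumes EMPNUM is the key of EMP; the two sample employees and the helper names are made up for illustration.

```python
def project(relation, attributes):
    """Projection of a relation (list of dicts) on the given attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two vertical fragments on their common key."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


EMP = [
    {"EMPNUM": 1, "NAME": "Li", "SAL": 100, "TAX": 10, "MGRNUM": 3, "DEPTNUM": 1},
    {"EMPNUM": 2, "NAME": "Wang", "SAL": 80, "TAX": 8, "MGRNUM": 3, "DEPTNUM": 2},
]
EMP1 = project(EMP, ["EMPNUM", "NAME", "TAX", "SAL"])
EMP2 = project(EMP, ["EMPNUM", "NAME", "MGRNUM", "DEPTNUM"])
rebuilt = join_on_key(EMP1, EMP2, "EMPNUM")
```

Note that NAME appears in both fragments, as in the result above: both groups of applications need it, so replicating it avoids a join for either of them.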


Mixed Fragmentation

The simplest ways:

1. Applying horizontal fragmentation to vertical fragments

2. Applying vertical fragmentation to horizontal fragments


THE ALLOCATION OF FRAGMENTS

Nonredundant allocation (easier): The simplest method is a "best-fit" approach; a measure is associated with each possible allocation, and the site with the best measure is selected.

Redundant allocation: Replication introduces further complexity, for example the degree of replication and how retrieval and update are performed.


Discussion

When performing a redundant allocation, one usually first finds the solution of the nonredundant allocation problem and then derives the redundant allocation from that solution.

The "additional replication" method is a typical heuristic approach; with this method, it is possible to take into account that the increase in the degree of redundancy is progressively less beneficial.


Two methods (for redundant allocation):

1. Determine the set of all sites where the benefit of allocating one copy of the fragment is higher than the cost, and allocate a copy of the fragment to each element of this set; this method selects "all beneficial sites."

2. Determine first the solution of the nonreplicated problem, and then progressively introduce replicated copies starting from the most beneficial; the process is terminated when no "additional replication" is beneficial. With this method, the benefit decreases progressively as the degree of redundancy increases.


HOW TO

Measure of Costs and Benefits of Fragment Allocation


General Criteria for Fragment Allocation

i is the fragment index
j is the site index
k is the application index
fkj is the frequency of application k at site j
rki is the number of retrieval references of application k to fragment i
uki is the number of update references of application k to fragment i
nki = rki + uki


Horizontal fragmentation (nonredundant)

1. Using the "best-fit" approach for a nonreplicated allocation, we place Ri at the site where the number of references to Ri is maximum. The number of local references of Ri at site j is

Bij = ∑(k) fkj · nki

Ri is allocated at site j* such that Bij* is maximum.
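The best-fit measure is a single weighted sum per site. The sketch below computes it for one fragment; the dictionary layout (`f[k][j]`, `n[k][i]`), the helper name `best_fit`, and all numbers are assumptions chosen for illustration.

```python
def best_fit(f, n, fragment):
    """Return (best_site, B) where B[j] = sum over k of f[k][j] * n[k][fragment].

    f[k][j]: frequency of application k at site j.
    n[k][i]: total references (retrievals + updates) of application k to fragment i.
    """
    sites = {j for freqs in f.values() for j in freqs}
    B = {
        j: sum(f[k].get(j, 0) * n[k].get(fragment, 0) for k in f)
        for j in sites
    }
    return max(B, key=B.get), B


# Hypothetical workload: application k1 runs mostly at site 1, k2 at sites 2 and 3.
f = {"k1": {1: 10, 2: 2}, "k2": {2: 5, 3: 5}}
n = {"k1": {"R1": 3}, "k2": {"R1": 4}}
site, B = best_fit(f, n, "R1")
```

Here site 1 issues 30 references to R1, site 2 issues 26, and site 3 issues 20, so the single copy of R1 goes to site 1.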


2. Using the "all beneficial sites" method for replicated allocation, we place Ri at all sites j where the cost of retrieval references of applications is larger than the cost of update references to Ri from applications at any other site.

Bij = ∑(k) fkj · rki - C · ∑(k) ∑(j' ≠ j) fkj' · uki

C is a constant which measures the ratio between the cost of an update and the cost of a retrieval access; typically, C > 1.

Ri is allocated at all sites j* such that Bij* is positive; when all Bij are negative, a single copy of Ri is placed at the site such that Bij is maximum.

(Redundant allocation, approach I)
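The "all beneficial sites" rule can be sketched directly from the formula. The helper name, the dictionary layout, the value C = 2, and the workload figures are all assumptions for illustration.

```python
def all_beneficial_sites(f, r, u, fragment, C=2.0):
    """Return (chosen_sites, B) for one fragment.

    B[j] = retrievals issued at j  -  C * updates issued at every other site j'.
    """
    sites = sorted({j for freqs in f.values() for j in freqs})
    B = {}
    for j in sites:
        retrieval = sum(f[k].get(j, 0) * r[k].get(fragment, 0) for k in f)
        remote_updates = sum(
            f[k].get(jp, 0) * u[k].get(fragment, 0)
            for k in f
            for jp in sites
            if jp != j
        )
        B[j] = retrieval - C * remote_updates
    chosen = [j for j in sites if B[j] > 0]
    if not chosen:
        # No site is beneficial: fall back to a single copy at the best site.
        chosen = [max(sites, key=B.get)]
    return chosen, B


# Hypothetical workload: k1 and k2 only read R1, k3 only updates it.
f = {"k1": {1: 10}, "k2": {2: 10}, "k3": {3: 1}}
r = {"k1": {"R1": 5}, "k2": {"R1": 5}, "k3": {"R1": 0}}
u = {"k1": {"R1": 0}, "k2": {"R1": 0}, "k3": {"R1": 2}}
chosen, B = all_beneficial_sites(f, r, u, "R1")
```

With these numbers, sites 1 and 2 each gain 50 local retrievals at the price of 2 remote updates weighted by C, so both receive a copy; site 3 gains nothing and receives none.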


3. Using the "additional replication" method, the benefit of an additional copy is also measured in terms of increased reliability and availability of the system.

di: the degree of redundancy of Ri
Fi: the benefit of having Ri fully replicated at each site

In [1]: β(di) = (1 - 2^(1-di)) · Fi

Note that β(1) = 0, β(2) = Fi/2, β(3) = 3Fi/4, and so on.

Bij = ∑(k) fkj · rki - C · ∑(k) ∑(j' ≠ j) fkj' · uki + β(di)

[1] V. Lum et al., "1978 New Orleans Data Base Design Workshop Report," IBM Report RJ2554(33154), 7/13/79, IBM Res. Lab., San Jose, CA; part of this report is also published in the Fifth VLDB, Rio de Janeiro, 1979.

(Redundant allocation, approach II)
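The diminishing benefit of each extra copy can be seen by evaluating β for increasing degrees of redundancy. The value F = 8 below is an arbitrary illustration value.

```python
def beta(d, F):
    """Benefit of having d copies of a fragment: (1 - 2**(1 - d)) * F."""
    return (1 - 2 ** (1 - d)) * F


F = 8  # assumed benefit of full replication, chosen only for illustration
benefits = [beta(d, F) for d in range(1, 6)]
# Extra benefit gained by going from d copies to d + 1 copies.
marginal = [beta(d + 1, F) - beta(d, F) for d in range(1, 5)]
```

Each additional copy contributes half of what the previous one did, which is why the additional replication method stops after a few copies.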


Vertical fragmentation

Fragment Ri, allocated at site r, is partitioned vertically into Rs and Rt, allocated at sites s and t.

1. As and At: sets of applications, issued at site s or t respectively, which use only attributes of Rs or Rt respectively

2. A1: set of applications local to r which use only attributes of Rs or Rt

3. A2: set of applications local to r which reference attributes of both Rs and Rt

4. A3: set of applications at sites different from r, s, or t

We evaluate the benefit of this partitioning as

Bist = BAs + BAt - BA1 - BA2 - BA3

= ∑(k ∈ As) fks · nki + ∑(k ∈ At) fkt · nki - ∑(k ∈ A1) fkr · nki - ∑(k ∈ A2) 2 · fkr · nki - ∑(k ∈ A3) ∑(j ≠ r, s, t) fkj · nki


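The DATAID-D method

Phases of distributed database design:
Requirements analysis
Conceptual design
Distribution requirements design
Global logical design
Distribution design
Local logical design
Local physical design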


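The DATAID-D method (continued): design steps

Design data dictionary
Global data schema
Global operation schema
Simplified global schema
Logical access tables
Logical schema of each site
Access tables of each site
Local logical schema
Local physical schema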


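The DATAID-D method (continued)

Distribution requirements analysis phase: collects information about the distribution, such as the partitioning predicates for horizontal fragmentation and the frequency with which each application is activated at each site.

Distribution design phase: starts from the global schema specification and the collected distribution requirements, and produces the fragmentation schema of the global data and the allocation schema giving the location of the fragments.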


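The DATAID-D method (continued)

Distribution requirements analysis phase
Frequency table: the number of activations of each application at each site.
Partitioning table: the potential horizontal fragmentation rules that can be applied to each entity in the schema.
Polarization table: the frequency with which a given application issued from a given site accesses a given fragment.

Distribution design phase
Fragmentation design
Nonredundant allocation
Redundant allocation
Reconstruction of the local schemata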


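Case study: airline reservation system

Three applications: the reservation application, the check-in application, and the departure application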


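[Figure: Global schema of the airline reservation database. Labels: Flight, Airport, Passenger, Reservation, Check-in, From, To; departure time, arrival time, symbol, city, authority, region, safety rules, seat number, checked baggage, aircraft number, date, available seats, gate, seat map, delay, category, name, telephone.]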


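[Figure: Global operation schema (reservation), activated when a passenger books a flight. Labels: Flight (2000, 3), Airport (40, 2), Passenger (10000, 1), From, To, Reservation; date [k], departure time [k], symbol [k], arrival time [k], name [w], telephone [w], available seats [o, w], category [w].]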


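Distribution result

Airport entity: horizontal fragmentation based on region: Airport 1, Airport 2, Airport 3

Flight entity: derived horizontal fragmentation based on the departure airport: Flight 1, Flight 2, Flight 3

Passenger entity: derived horizontal fragmentation based on the departures of all the flights booked by the passenger: Passenger 1, Passenger 2, Passenger 3, Passenger 4, Passenger 5, Passenger 6, Passenger 7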


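Distribution result (A)

[Figure: Local schema of site 1. Labels: Flight 1, Airport 1, Passenger 1 ∪ Passenger 4 ∪ Passenger 5 ∪ Passenger 7; From, To, Reservation, Check-in; links to B and C.]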


Distribution result (B)

[Figure: Local schema of site 2. Labels: Flight 2, Airport 2; From, To, Reservation, Check-in; links to A and C.]


Distribution result (C)

[Figure: Local schema of site 3. Labels: Flight 3, Airport 3; From, To, Reservation, Check-in; links to A and B.]


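Bottom-up design

Integrate the various existing database schemata into a global schema.

Three problems:

Selecting a common database model for describing the global schema of the database.

Translating the local schema at each site into the common data model.

Integrating the local data schemata of the sites into a common global schema.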


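View merging

[Figure: Two Flight views are merged. First view: aircraft number, date, available seats, gate, seat map, delay. Second view: aircraft number, date, available seats, aircraft type, seat map. Merged result: Flight with sub-entities Flight 1 and Flight 2, carrying the attributes aircraft number, date, available seats, seat map, gate, delay, aircraft type.]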


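Bottom-up design (continued)

Identifying similarities: if similar applications exist at different sites and each uses a copy of the data in its own DB, then the two sites have some points of similarity.

Identifying conflicts:
Naming conflicts: the same object under different names, or different objects under the same name.
Domain differences.
Scaling differences: different units of measurement.
Structural differences: the same object is described as an entity in one schema and as an attribute in another.

Handling inconsistent data during operation.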


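Examples

[Figure: Three view integration cases. Technician (View 1) and Engineer (View 2) are merged into Technician and Engineer related by Is-A. Employee (View 1) and Student (View 2) cannot be merged. Engineer (View 1) and Clerk (View 2) are merged under a common entity Employee.]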

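Examples (continued)

[Figure: View 1: Technician, Project, relationship Work (1, n). View 2: Engineer, Project, relationship Work (n, 1). Merged result: Person with Technician and Engineer, Project, relationship Work (1, n).]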


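Conceptual schema of system A

[Figure: Labels: Flight, Airport, Passenger, Reservation, Check-in, From, To; departure time, arrival time, symbol, city, authority, region, safety rules, aircraft number, date, available seats, gate, seat map, delay, category, name, telephone.]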


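Conceptual schema of system B

[Figure: Labels: Flight, Passenger, Reservation; identifier, departure, departure time, arrival, arrival time, seat map, available seats, category, name, telephone.]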


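Global schema obtained after integration

[Figure: Labels: Flight with Flight A and Flight B, Airport, Passenger, Reservation, Check-in, From, To; aircraft identifier (aircraft number), date, (1,3), available seats, seat map, gate, arrival time, arrival airport, departure time, departure airport, category, name, telephone.]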


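Data integration

[Figure: Data sources 1, 2, and 3 are each accessed through a wrapper; a mediator sits between the wrappers and the user applications.]

• XML

• Ontology

• View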


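Exercise

Two fragment allocations are given:
A> R1 at Site1, R2 at Site2, R3 at Site3.
B> R1 and R2 at Site1, R2 and R3 at Site3.

The following applications are also given (all applications have the same frequency):
A1: issued at Site1, reads 5 records of R1 and 5 records of R2
A2: issued at Site3, reads 5 records of R3 and 5 records of R2
A3: issued at Site2, reads 10 records of R2

Questions:

1. If processing locality is the main design objective, which allocation is better?

2. Suppose A3 is changed so that it updates 10 records of R2, and processing locality is still the design objective; which allocation is better then?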
