
Upload: marcia-francis

Post on 02-Jan-2016


Principles of Database Systems

Chapter 10: Logical Modeling (II)

Textbook: Chapter 10, Functional Dependencies and Normalization for Relational Databases


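Topics in this lecture

- Why "data model optimization" is needed
- Basic concepts of data model optimization
- Informal design guidelines for relation schemas
- Basic concepts of functional dependencies
- Basic concepts of normalization
- Definitions of superkeys, keys, and prime attributes
- First normal form
- Second normal form
- Third normal form
- Summary of normalization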


Review: Phases of the Database Design and Implementation Process

- Phase 1: Requirements Collection and Analysis
- Phase 2: Conceptual Database Design
- Phase 3: Choice of a DBMS
- Phase 4: Data Model Mapping (Logical Database Design)
- Phase 5: Physical Database Design
- Phase 6: Database System Implementation
- Phase 7: Database System Operation and Maintenance


Phase 2: ER diagram of the COMPANY database (figure not reproduced in this transcript)


Phase 4: Data Model Mapping (Logical Database Design)

- Data model mapping: from E-R to the relational model
- Data model optimization (the difficult part, and the topic of this lecture)
- Designing user subschemas


Schema diagram for the COMPANY relational database schema


Why "data model optimization" is needed

So far in our discussion of conceptual design and its mapping into the relational model, we have not developed any measure of the appropriateness, "goodness," or quality of the design, other than the intuition of the designer.

We need some formal measure of why one grouping of attributes into a relation schema may be better than another.


Basic concepts of data model optimization

What is relational database design? The grouping of attributes to form "good" relation schemas.

We first discuss informal guidelines for good relational design.

Then we discuss the formal concepts of functional dependencies and normal forms:
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)


Informal Design Guidelines for Relation Schemas

Semantics of the Relation Attributes

Whenever we group attributes to form a relation schema, we assume that a certain meaning is associated with the attributes.

In general, the easier it is to explain the semantics of the relation, the better the relation schema design will be.


GUIDELINE 1: Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation schema corresponds to one entity type or one relationship type, the meaning tends to be clear. Otherwise, the relation corresponds to a mixture of multiple entities and relationships and hence becomes semantically unclear. (This can be summarized as the "one fact in one place" principle: one thing goes in one table, and different things go in different tables.)


Relations that mix attributes from several entity types, such as EMP_DEPT, may be used as views, but they cause problems when used as base relations.


Redundant Information in Tuples and Update Anomalies

One goal of schema design is to minimize the storage space that the base relations (files) occupy.

Redundant storage also causes update anomalies:
- insertion anomalies
- deletion anomalies
- modification anomalies


Insertion Anomalies

To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls (if the employee does not work for a department as yet).

It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation. The only way to do this is to place null values in the attributes for employee. This causes a problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to represent an employee entity—not a department entity.
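The insertion anomaly can be reproduced with a few lines of Python and the standard-library `sqlite3` module. This is only a sketch: the column list is a shortened, assumed version of EMP_DEPT, and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMP_DEPT mixes employee and department attributes; SSN is the primary key.
# NOT NULL is spelled out because SQLite otherwise tolerates NULL in a
# non-INTEGER primary key column.
conn.execute("""
    CREATE TABLE emp_dept (
        ssn     TEXT PRIMARY KEY NOT NULL,
        ename   TEXT,
        dnumber INTEGER,
        dname   TEXT,
        dmgrssn TEXT
    )
""")

# A new employee without a department can only be stored with nulls.
conn.execute(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?, ?)",
    ("111", "Alice", None, None, None),
)

# A new department without employees has no SSN to use as the key.
try:
    conn.execute(
        "INSERT INTO emp_dept VALUES (?, ?, ?, ?, ?)",
        (None, None, 7, "Planning", "222"),
    )
    department_inserted = True
except sqlite3.IntegrityError:
    department_inserted = False

employee_count = conn.execute("SELECT COUNT(*) FROM emp_dept").fetchone()[0]
print(department_inserted, employee_count)
```

The employee row goes in with three nulls, while the department-only row is rejected because it has no key value.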


Deletion Anomalies

If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.
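A small Python sketch of the deletion anomaly, with EMP_DEPT modeled as a list of dicts. The attribute subset and the sample tuples are assumptions made for illustration.

```python
# Each dict is one EMP_DEPT tuple; the values are made up for illustration.
emp_dept = [
    {"ssn": "111", "ename": "Alice", "dnumber": 5, "dname": "Research"},
    {"ssn": "222", "ename": "Bob", "dnumber": 5, "dname": "Research"},
    {"ssn": "333", "ename": "Carol", "dnumber": 4, "dname": "Administration"},
]


def known_departments(rows):
    """Return the department names that can still be found in the relation."""
    return {row["dname"] for row in rows}


before = known_departments(emp_dept)

# Carol is the last employee of Administration; deleting her tuple
# also deletes the only record that the department exists.
emp_dept = [row for row in emp_dept if row["ssn"] != "333"]
after = known_departments(emp_dept)

print(before - after)  # {'Administration'}
```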


Modification Anomalies

In EMP_DEPT, if we change the value of one of the attributes of a particular department, say the manager of department 5, we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent.
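The following Python sketch shows how a partial update leaves EMP_DEPT inconsistent. The helper name `managers_by_department` and the sample tuples are hypothetical.

```python
emp_dept = [
    {"ssn": "111", "dnumber": 5, "dmgrssn": "900"},
    {"ssn": "222", "dnumber": 5, "dmgrssn": "900"},
    {"ssn": "333", "dnumber": 4, "dmgrssn": "800"},
]


def managers_by_department(rows):
    """Collect every manager SSN recorded for each department number."""
    managers = {}
    for row in rows:
        managers.setdefault(row["dnumber"], set()).add(row["dmgrssn"])
    return managers


# Change the manager of department 5, but touch only one of its tuples.
emp_dept[0]["dmgrssn"] = "950"

# A department with more than one recorded manager is inconsistent.
inconsistent = {
    dnumber: mgrs
    for dnumber, mgrs in managers_by_department(emp_dept).items()
    if len(mgrs) > 1
}
print(inconsistent)
```

Department 5 now appears with two different managers, which is exactly the inconsistency the slide warns about.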


GUIDELINE 2: Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.


It is important to note that these guidelines may sometimes have to be violated in order to improve the performance of certain queries. For example, if an important query retrieves information concerning the department of an employee, along with employee attributes, the EMP_DEPT schema may be used as a base relation. However, the anomalies in EMP_DEPT must be noted and well understood so that, whenever the base relation is updated, we do not end up with inconsistencies.


In general, it is advisable to use anomaly-free base relations and to specify views that include the JOINs for placing together the attributes frequently referenced in important queries. This reduces the number of JOIN terms specified in the query, making it simpler to write the query correctly, and in many cases it improves the performance.


The performance of a query specified on a view that is the JOIN of several base relations depends on how the DBMS implements the view. Many relational DBMSs materialize a frequently used view so that they do not have to perform the JOINs often. The DBMS remains responsible for updating the materialized view (either immediately or periodically) whenever the base relations are updated.
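As a sketch of this advice, the example below keeps EMPLOYEE and DEPARTMENT as base relations and exposes EMP_DEPT as a view, using Python's standard-library `sqlite3` module. The reduced column lists and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dnumber INTEGER PRIMARY KEY,
        dname   TEXT NOT NULL,
        dmgrssn TEXT
    );
    CREATE TABLE employee (
        ssn     TEXT PRIMARY KEY NOT NULL,
        ename   TEXT NOT NULL,
        dnumber INTEGER REFERENCES department(dnumber)
    );
    -- The view exposes the EMP_DEPT shape without storing it redundantly.
    CREATE VIEW emp_dept AS
        SELECT e.ssn, e.ename, d.dnumber, d.dname, d.dmgrssn
        FROM employee AS e
        JOIN department AS d ON d.dnumber = e.dnumber;
    INSERT INTO department VALUES (5, 'Research', '900');
    INSERT INTO employee VALUES ('111', 'Alice', 5), ('222', 'Bob', 5);
""")

# The manager is stored once, so one UPDATE keeps every view row consistent.
conn.execute("UPDATE department SET dmgrssn = '950' WHERE dnumber = 5")

rows = conn.execute(
    "SELECT ssn, dname, dmgrssn FROM emp_dept ORDER BY ssn"
).fetchall()
print(rows)
```

SQLite evaluates this view's JOIN each time it is queried; it does not materialize views, so the example shows only the anomaly-free design, not the materialization strategy described above.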


Null Values in Tuples

In some schema designs we may group many attributes together into a "fat" relation. If many of the attributes do not apply to all tuples in the relation, we end up with many nulls in those tuples.


GUIDELINE 3: As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only (rare cases, such as SUPERSSN) and do not apply to a majority of tuples in the relation.


An alternative mapping for 1:1 and 1:N relationships

If only 10 percent of employees have individual offices, there is little justification for including an attribute OFFICE_NUMBER in the EMPLOYEE relation; rather, a relation EMP_OFFICES(ESSN, OFFICE_NUMBER) can be created to include tuples for only the employees with individual offices.


Functional Dependencies

Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs

FDs and keys are used to define normal forms for relations

FDs are constraints that are derived from the real-world meaning and interrelationships of the data attributes


Functional Dependencies (continued)

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y

For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y]

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])
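The tuple-pair definition above translates directly into a check over one relation instance. The sketch below is illustrative: the function name `fd_holds` and the sample tuples are assumptions, and tuples are modeled as Python dicts.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance.

    rows is a list of dicts (tuples); lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical EMP_PROJ-style tuples, invented for this example.
works_on = [
    {"ssn": "111", "pnumber": 1, "hours": 32.5, "ename": "Alice"},
    {"ssn": "111", "pnumber": 2, "hours": 7.5, "ename": "Alice"},
    {"ssn": "222", "pnumber": 1, "hours": 20.0, "ename": "Bob"},
]

print(fd_holds(works_on, ["ssn"], ["ename"]))             # True
print(fd_holds(works_on, ["ssn"], ["hours"]))             # False
print(fd_holds(works_on, ["ssn", "pnumber"], ["hours"]))  # True
```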


Functional Dependencies (continued)

An FD is a property of the attributes in the schema R

The constraint must hold on every relation instance r(R)
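Because an FD is a property of the schema, a single relation instance can refute an FD but can never prove it. The sketch below illustrates this with two invented states of a two-attribute relation; `fd_holds` is a hypothetical helper that checks one instance.

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs against one relation instance (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


state_1 = [
    {"ssn": "111", "ename": "Alice"},
    {"ssn": "222", "ename": "Bob"},
]
# A later legal state: a second employee who is also named Alice.
state_2 = state_1 + [{"ssn": "333", "ename": "Alice"}]

# ENAME -> SSN happens to hold in state_1 but is refuted by state_2,
# so it is not an FD of the schema. SSN -> ENAME survives both states,
# yet only the meaning of the attributes can establish it for all states.
print(fd_holds(state_1, ["ename"], ["ssn"]))  # True
print(fd_holds(state_2, ["ename"], ["ssn"]))  # False
print(fd_holds(state_2, ["ssn"], ["ename"]))  # True
```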


Examples of FD constraints

- Social security number determines employee name: SSN -> ENAME
- Project number determines project name and location: PNUMBER -> {PNAME, PLOCATION}
- Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS


Normalization of Relations

Normalization: the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

Normal form: a condition using the keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).


Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
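When the FDs of a schema are known, superkeys and keys can be tested mechanically. The sketch below uses attribute closure, a standard technique that these slides do not cover, and relies on the fact that a set of attributes is a superkey exactly when it functionally determines every attribute of R. The function names and the reduced EMP_PROJ attribute list are assumptions.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) >= schema


def is_key(attrs, schema, fds):
    """A key is a superkey that stops being one if any attribute is removed."""
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )


# EMP_PROJ with the FDs quoted on the slides (other attributes omitted).
schema = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(is_superkey({"SSN", "PNUMBER", "ENAME"}, schema, fds))  # True
print(is_key({"SSN", "PNUMBER", "ENAME"}, schema, fds))       # False
print(is_key({"SSN", "PNUMBER"}, schema, fds))                # True
```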


Definitions of Keys and Attributes Participating in Keys (continued)

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
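The definitions above can be tried out on a small schema by enumerating candidate keys and then reading off the prime and nonprime attributes. This is a brute-force sketch based on attribute closure (a technique not covered on these slides); the schema EMP(SSN, EMPNO, SALARY) and its FDs are assumptions chosen so that there are two candidate keys.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate all minimal attribute sets whose closure is the whole schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            # Skip supersets of keys already found; they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


# Assumed FDs: both SSN and EMPNO identify an employee.
schema = {"SSN", "EMPNO", "SALARY"}
fds = [({"SSN"}, {"EMPNO", "SALARY"}), ({"EMPNO"}, {"SSN"})]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)   # members of some candidate key
nonprime = schema - prime    # members of no candidate key
print(keys, prime, nonprime)
```

The enumeration is exponential in the number of attributes, which is acceptable for classroom-sized schemas only.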


First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations

Considered to be part of the definition of a relation


FIGURE 10.8 Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Example state of relation DEPARTMENT. (c) 1NF version of same relation with redundancy.
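Since the figure itself is not part of this transcript, here is a rough Python sketch of the step from (b) to (c): a multivalued attribute is flattened into one tuple per value. The attribute names, the helper `to_1nf`, and the sample values are assumptions rather than a copy of the figure.

```python
# DEPARTMENT with a multivalued DLOCATIONS attribute (not in 1NF).
# The sample values are illustrative.
department = [
    {"dname": "Research", "dnumber": 5, "dmgrssn": "900",
     "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dname": "Headquarters", "dnumber": 1, "dmgrssn": "800",
     "dlocations": ["Houston"]},
]


def to_1nf(rows, multivalued, single):
    """Return one flat tuple per value of the multivalued attribute.

    multivalued names the list-valued attribute; single names the atomic
    attribute that replaces it.
    """
    flat = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k != multivalued}
        for value in row[multivalued]:
            flat.append({**rest, single: value})
    return flat


department_1nf = to_1nf(department, "dlocations", "dlocation")
key_values = [(r["dnumber"], r["dlocation"]) for r in department_1nf]

print(len(department_1nf))                       # 4
print(len(set(key_values)) == len(key_values))   # True
```

Every attribute value is now atomic, at the cost of repeating the department name and manager once per location, and the key grows to the combination of department number and location.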


Second Normal Form

Definitions:
- Prime attribute: an attribute that is a member of the primary key K (more generally, of any candidate key)
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds
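The distinction between full and partial dependencies can be checked mechanically once the FDs are written down. The sketch below reuses attribute closure (not covered on these slides) to decide whether an FD follows from a given set; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(lhs, rhs, fds):
    """True if lhs -> rhs follows from the given FDs."""
    return set(rhs) <= closure(lhs, fds)


def is_full_fd(lhs, rhs, fds):
    """lhs -> rhs is full if it holds and breaks when any attribute is removed."""
    lhs = set(lhs)
    if not implies(lhs, rhs, fds):
        return False
    return not any(implies(lhs - {a}, rhs, fds) for a in lhs)


fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
]

print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, fds))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, fds))  # False (partial)
```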


Second Normal Form (continued)

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
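As a rough illustration of where 2NF normalization starts, the sketch below lists the parts of a primary key that determine non-key attributes on their own. Following the slide's definition, "non-prime" is taken to mean "not in the primary key"; the function name `second_nf_violations` and the reduced EMP_PROJ attribute list are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def second_nf_violations(schema, primary_key, fds):
    """List (part_of_key, attributes) pairs that break 2NF."""
    key = sorted(primary_key)
    nonprime = schema - set(primary_key)
    violations = []
    # Try every proper, non-empty subset of the primary key.
    for size in range(1, len(key)):
        for part in combinations(key, size):
            determined = closure(part, fds) & nonprime
            if determined:
                violations.append((set(part), determined))
    return violations


schema = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

violations = second_nf_violations(schema, {"SSN", "PNUMBER"}, fds)
for part, attrs in violations:
    print(sorted(part), "->", sorted(attrs))
```

Each reported pair suggests a separate relation keyed by that part of the primary key, with the fully dependent attributes (here HOURS) staying with the whole key.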


FIGURE 10.10 Normalizing into 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.


Third Normal Form

Definition:
- Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z

Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of non-key attributes X where SSN -> X and X -> ENAME


Third Normal Form (continued)

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp#, Salary). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
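To see 3NF normalization on data, the sketch below splits EMP_DEPT at the transitive dependency SSN -> DNUMBER -> {DNAME, DMGRSSN} and joins the pieces back together. The relation names `ed1` and `ed2`, the helper functions, and the sample tuples are illustrative assumptions.

```python
def project(rows, attrs):
    """Project onto attrs and drop duplicate tuples, keeping first-seen order."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen


def natural_join(left, right, on):
    """Join two lists of dicts on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


emp_dept = [
    {"ssn": "111", "ename": "Alice", "dnumber": 5,
     "dname": "Research", "dmgrssn": "900"},
    {"ssn": "222", "ename": "Bob", "dnumber": 5,
     "dname": "Research", "dmgrssn": "900"},
    {"ssn": "333", "ename": "Carol", "dnumber": 4,
     "dname": "Administration", "dmgrssn": "800"},
]

# Employee facts stay keyed by SSN; department facts move to their own
# relation keyed by DNUMBER.
ed1 = project(emp_dept, ["ssn", "ename", "dnumber"])
ed2 = project(emp_dept, ["dnumber", "dname", "dmgrssn"])

print(len(ed2))  # 2: each department is now stored once
rejoined = natural_join(ed1, ed2, "dnumber")
print(rejoined == emp_dept)  # True: the join recovers the original tuples
```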


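Summary of normalization

- First normal form: every attribute in the relation schema is an indivisible (atomic) data item
- Second normal form: the relation schema is in first normal form, and every nonprime attribute is fully functionally dependent on the primary key
- Third normal form: the relation schema is in second normal form, and no nonprime attribute is transitively dependent on the primary key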


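A handy mnemonic for normalization (up to 3NF)

"I swear to tell the truth, the whole truth, and nothing but the truth, so help me God!"

By analogy, every nonprime attribute should depend on the key (1NF), the whole key (2NF), and nothing but the key (3NF).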
