Principles of Database System

Chapter 10 Logical Modeling (II)

Textbook: Chapter 10, Functional Dependencies and Normalization for Relational Databases

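Main contents of this lecture:
- Why "data model optimization" is needed
- Basic concepts of data model optimization
- Informal design guidelines for relation schemas
- Basic concepts of functional dependencies
- Basic concepts of normalization
- Definitions of superkeys, keys, and prime attributes
- First normal form
- Second normal form
- Third normal form
- Summary of normalization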

Review: Phases of the Database Design and Implementation Process

Phase 1: Requirements Collection and Analysis

Phase 2: Conceptual Database Design

Phase 3: Choice of a DBMS

Phase 4: Data Model Mapping (Logical Database Design)

Phase 5: Physical Database Design

Phase 6: Database System Implementation

Phase 7: Database System Operation and Maintenance

Phase 2: ER Diagram of the COMPANY Database

Phase 4: Data Model Mapping (Logical Database Design)
- Data model mapping: from the E-R model to the relational model
- Data model optimization (the difficult part; the topic of this lecture)
- Designing user subschemas

Schema diagram for the COMPANY relational database schema

Why Is "Data Model Optimization" Needed?

So far in our discussion of conceptual design and its mapping into the relational model, we have not developed any measure of the appropriateness, “goodness,” or quality of the design, other than the intuition of the designer.

We need some formal measure of why one grouping of attributes into a relation schema may be better than another.

Basic Concepts of Data Model Optimization

What is relational database design?

The grouping of attributes to form "good" relation schemas

We first discuss informal guidelines for good relational design

Then we discuss formal concepts of functional dependencies and normal forms:
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)

Informal Design Guidelines for Relation Schemas

Semantics of the Relation Attributes

Whenever we group attributes to form a relation schema, we assume that a certain meaning is associated with the attributes.

In general, the easier it is to explain the semantics of the relation, the better the relation schema design will be.

GUIDELINE 1: Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation schema corresponds to one entity type or one relationship type, the meaning tends to be clear. Otherwise, the relation corresponds to a mixture of multiple entities and relationships and hence becomes semantically unclear. (This can be summarized as the "one thing, one place" principle: put one kind of thing in one table, and different things in different tables.)

Such mixed relations (for example, EMP_DEPT) may be used as views, but they cause problems when used as base relations.

Redundant Information in Tuples and Update Anomalies

One goal of schema design is to minimize the storage space that the base relations (files) occupy.

Storing redundant information also causes update anomalies:
- insertion anomalies
- deletion anomalies
- modification anomalies

Insertion Anomalies

To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls (if the employee does not work for a department as yet).

It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation. The only way to do this is to place null values in the attributes for employee. This causes a problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to represent an employee entity—not a department entity.
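The second insertion anomaly can be sketched in a few lines of Python. This is a minimal illustration, not taken from the textbook: it assumes EMP_DEPT is held in memory as a list of dicts using the attribute names from these slides, and the sample values and the `insert_emp_dept` helper are hypothetical. Because the helper enforces that the primary key SSN is non-null, a department with no employees cannot be recorded.

```python
# Hypothetical in-memory EMP_DEPT relation; SSN is the primary key.
emp_dept = [
    {"ENAME": "Smith", "SSN": "123456789", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445555"},
]

def insert_emp_dept(relation, row):
    """Insert a tuple, enforcing that the primary key SSN is non-null and unique."""
    if row.get("SSN") is None:
        raise ValueError("primary key SSN cannot be null")
    if any(t["SSN"] == row["SSN"] for t in relation):
        raise ValueError("duplicate primary key " + row["SSN"])
    relation.append(row)

# Try to record a new department (hypothetical values) that has no employees yet.
new_dept = {"ENAME": None, "SSN": None, "DNUMBER": 6,
            "DNAME": "Marketing", "DMGRSSN": None}
try:
    insert_emp_dept(emp_dept, new_dept)
except ValueError as err:
    print("rejected:", err)   # rejected: primary key SSN cannot be null
```

The department facts have nowhere to live except inside an employee tuple, which is exactly the problem the slide describes.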

Deletion Anomalies

If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.
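A minimal sketch of the deletion anomaly, assuming a hypothetical two-tuple EMP_DEPT instance stored as Python dicts (the names and SSNs are illustrative). The `departments` helper is hypothetical; it projects the relation onto the department attributes so we can see what the database still "knows" about departments.

```python
# Hypothetical EMP_DEPT instance: Wong is the only employee of department 5.
emp_dept = [
    {"SSN": "333445555", "ENAME": "Wong", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445555"},
    {"SSN": "999887777", "ENAME": "Zelaya", "DNUMBER": 4,
     "DNAME": "Administration", "DMGRSSN": "987654321"},
]

def departments(relation):
    """Project EMP_DEPT onto the department attributes."""
    return {(t["DNUMBER"], t["DNAME"], t["DMGRSSN"]) for t in relation}

print(departments(emp_dept))   # two departments are known
# Delete the last (only) employee of department 5 ...
emp_dept = [t for t in emp_dept if t["SSN"] != "333445555"]
# ... and every fact about department 5 disappears along with that employee.
print(departments(emp_dept))   # {(4, 'Administration', '987654321')}
```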

Modification Anomalies

In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 5), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent.
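The modification anomaly can be sketched as follows, again with hypothetical sample tuples. The hypothetical `consistent` helper checks that each department number maps to a single manager SSN (in effect, that DNUMBER -> DMGRSSN still holds in the instance); a partial update breaks it, and only updating every tuple of the department restores it.

```python
# Hypothetical EMP_DEPT tuples (only the attributes needed here).
emp_dept = [
    {"SSN": "123456789", "DNUMBER": 5, "DMGRSSN": "333445555"},
    {"SSN": "666884444", "DNUMBER": 5, "DMGRSSN": "333445555"},
]

def consistent(relation):
    """True if every department number maps to exactly one manager SSN."""
    seen = {}
    for t in relation:
        if seen.setdefault(t["DNUMBER"], t["DMGRSSN"]) != t["DMGRSSN"]:
            return False
    return True

# Change the manager of department 5, but only in the first tuple.
emp_dept[0]["DMGRSSN"] = "987987987"
print(consistent(emp_dept))   # False

# A correct update must touch every tuple of the department.
for t in emp_dept:
    if t["DNUMBER"] == 5:
        t["DMGRSSN"] = "987987987"
print(consistent(emp_dept))   # True
```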

GUIDELINE 2: Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.

It is important to note that these guidelines may sometimes have to be violated in order to improve the performance of certain queries. For example, if an important query retrieves information concerning the department of an employee, along with employee attributes, the EMP_DEPT schema may be used as a base relation. However, the anomalies in EMP_DEPT must be noted and well understood so that, whenever the base relation is updated, we do not end up with inconsistencies.

In general, it is advisable to use anomaly-free base relations and to specify views that include the JOINs for placing together the attributes frequently referenced in important queries. This reduces the number of JOIN terms specified in the query, making it simpler to write the query correctly, and in many cases it improves the performance.
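To make the advice concrete, here is a minimal in-memory sketch: EMPLOYEE and DEPARTMENT are kept as separate anomaly-free base relations, and a hypothetical `emp_dept_view` function recomputes the EMP_DEPT combination by a natural join each time it is read. The sample data and both helpers are assumptions for illustration; a real DBMS would express this as a SQL view.

```python
# Hypothetical anomaly-free base relations (sample values).
employee = [
    {"SSN": "123456789", "ENAME": "Smith", "DNUMBER": 5},
    {"SSN": "999887777", "ENAME": "Zelaya", "DNUMBER": 4},
]
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333445555"},
    {"DNUMBER": 4, "DNAME": "Administration", "DMGRSSN": "987654321"},
]

def natural_join(r, s):
    """Combine tuples of r and s that agree on all shared attribute names."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result

def emp_dept_view():
    """A 'view': recomputed from the base relations whenever it is read."""
    return natural_join(employee, department)

# The manager is stored once, so a single update is seen by every view row.
department[0]["DMGRSSN"] = "987987987"
for row in emp_dept_view():
    print(row["ENAME"], row["DNAME"], row["DMGRSSN"])
# Smith Research 987987987
# Zelaya Administration 987654321
```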

The performance of a query specified on a view that is the JOIN of several base relations depends on how the DBMS implements the view. Many relational DBMSs materialize a frequently used view so that they do not have to perform the JOINs often. The DBMS remains responsible for updating the materialized view (either immediately or periodically) whenever the base relations are updated.

Null Values in Tuples

In some schema designs we may group many attributes together into a "fat" relation. If many of the attributes do not apply to all tuples in the relation, we end up with many nulls in those tuples.

GUIDELINE 3: As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only (that is, to very few tuples, as with SUPERSSN) and do not apply to a majority of tuples in the relation.

Another Way to Map 1:1 and 1:N Relationships

If only 10 percent of employees have individual offices, there is little justification for including an attribute OFFICE_NUMBER in the EMPLOYEE relation; rather, a relation EMP_OFFICES(ESSN, OFFICE_NUMBER) can be created to include tuples for only the employees with individual offices.
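A small sketch comparing the two designs by counting stored nulls. The data is hypothetical (ten employees, one of whom has an office, matching the 10 percent in the example); the SSNs and office number are made up.

```python
# Hypothetical data: 10 employees, only one has an individual office.
employees = [{"SSN": f"{i:09d}", "ENAME": f"E{i}"} for i in range(10)]
offices = {"000000003": "B-214"}  # SSN -> OFFICE_NUMBER (sample value)

# Design A: OFFICE_NUMBER as a column of EMPLOYEE (None where it does not apply).
employee_a = [{**e, "OFFICE_NUMBER": offices.get(e["SSN"])} for e in employees]
nulls_a = sum(1 for e in employee_a if e["OFFICE_NUMBER"] is None)

# Design B: a separate EMP_OFFICES(ESSN, OFFICE_NUMBER) relation.
emp_offices = [{"ESSN": ssn, "OFFICE_NUMBER": no} for ssn, no in offices.items()]
nulls_b = sum(1 for t in emp_offices if t["OFFICE_NUMBER"] is None)

print(nulls_a, nulls_b)   # 9 0
```

Design B stores one tuple and no nulls; the price is a join whenever an employee's office is needed.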

Functional Dependencies

Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs

FDs and keys are used to define normal forms for relations

FDs are constraints that are derived from the real-world meaning and interrelationships of the data attributes

Functional Dependencies (continued)

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y

For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y]

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])
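The tuple-pair definition translates directly into a check over one relation instance. A minimal sketch, assuming tuples are Python dicts; `fd_holds` and the sample rows are hypothetical. Instead of comparing every pair of tuples, it remembers the Y-value seen for each X-value, which is equivalent and needs only one pass.

```python
def fd_holds(r, X, Y):
    """Return True if instance r (a list of dicts) satisfies X -> Y:
    any two tuples that agree on X also agree on Y."""
    seen = {}
    for t in r:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical sample rows combining WORKS_ON with the employee name.
works_on = [
    {"SSN": "123456789", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith"},
    {"SSN": "123456789", "PNUMBER": 2, "HOURS": 7.5,  "ENAME": "Smith"},
    {"SSN": "453453453", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "English"},
]
print(fd_holds(works_on, ["SSN"], ["ENAME"]))             # True
print(fd_holds(works_on, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(works_on, ["SSN"], ["HOURS"]))             # False
```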

Functional Dependencies (continued)

An FD is a property of the attributes in the schema R

The constraint must hold on every relation instance r(R)
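Because an FD must hold on every legal instance, checking a single instance can only refute a candidate FD, never establish it. A sketch with hypothetical employee rows; it includes its own copy of a hypothetical `fd_holds` helper so the block runs on its own.

```python
def fd_holds(r, X, Y):
    """True if every pair of tuples in r that agrees on X also agrees on Y."""
    seen = {}
    for t in r:
        key = tuple(t[a] for a in X)
        val = tuple(t[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

employee = [
    {"SSN": "123456789", "ENAME": "John Smith"},
    {"SSN": "333445555", "ENAME": "Franklin Wong"},
]
# In this particular state, ENAME -> SSN happens to be satisfied ...
print(fd_holds(employee, ["ENAME"], ["SSN"]))   # True
# ... but a second John Smith is a perfectly legal employee, so the
# "dependency" fails on another state and is not an FD of the schema.
employee.append({"SSN": "555667777", "ENAME": "John Smith"})
print(fd_holds(employee, ["ENAME"], ["SSN"]))   # False
# SSN -> ENAME, a real constraint, still holds.
print(fd_holds(employee, ["SSN"], ["ENAME"]))   # True
```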

Examples of FD constraints:
- Social security number determines employee name: SSN -> ENAME
- Project number determines project name and location: PNUMBER -> {PNAME, PLOCATION}
- Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Normalization of Relations

Normalization: the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

Normal form: a condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
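The superkey and key definitions quantify over all legal states, and in practice those states are constrained by the schema's FDs. A common way to test the definitions from a set of FDs is attribute closure, a standard technique not covered in these slides; it is sketched below. The helper names are hypothetical, and the EMP_PROJ attributes and FD set are assumptions assembled from the FD examples earlier in this lecture.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(S, R, fds):
    return closure(S, fds) >= set(R)

def is_key(K, R, fds):
    """A key is a superkey from which no single attribute can be removed."""
    return is_superkey(K, R, fds) and all(
        not is_superkey(set(K) - {a}, R, fds) for a in K)

def candidate_keys(R, fds):
    """Brute force over attribute subsets (fine for small schemas only)."""
    return [set(c) for n in range(1, len(R) + 1)
            for c in combinations(sorted(R), n) if is_key(set(c), R, fds)]

EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
print(is_superkey({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, FDS))  # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, FDS))       # False
print([sorted(k) for k in candidate_keys(EMP_PROJ, FDS)])      # [['PNUMBER', 'SSN']]
```

Checking only single-attribute removals in `is_key` is enough: if some smaller subset were a superkey, any set obtained by removing one attribute that still contains it would be a superkey too.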

Definitions of Keys and Attributes Participating in Keys (continued)

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
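Once the candidate keys are known, separating prime from nonprime attributes is a set union and a set difference. A minimal sketch; `split_attributes` is a hypothetical helper, and the candidate keys are supplied by hand rather than computed. The first call uses the EMP(SSN, Emp#, Salary) example from the 3NF note later in this lecture.

```python
def split_attributes(R, candidate_keys):
    """Partition R into prime attributes (in some candidate key) and nonprime ones."""
    prime = set().union(*candidate_keys)
    return prime, set(R) - prime

# EMP(SSN, Emp#, Salary) has two candidate keys, {SSN} and {Emp#}.
emp_prime, emp_nonprime = split_attributes(
    {"SSN", "Emp#", "Salary"}, [{"SSN"}, {"Emp#"}])
print(sorted(emp_prime), sorted(emp_nonprime))   # ['Emp#', 'SSN'] ['Salary']

# EMP_PROJ has the single candidate key {SSN, PNUMBER}.
ep_prime, ep_nonprime = split_attributes(
    {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"},
    [{"SSN", "PNUMBER"}])
print(sorted(ep_nonprime))   # ['ENAME', 'HOURS', 'PLOCATION', 'PNAME']
```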

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations

Considered to be part of the definition of a relation

FIGURE 10.8 Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Example state of relation DEPARTMENT. (c) 1NF version of the same relation with redundancy.
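Flattening a multivalued attribute into one tuple per value, as in part (c) of the figure, repeats the other attribute values for every location. A sketch with a hypothetical DEPARTMENT state in which DLOCATIONS is stored as a Python list (the values are illustrative); `to_1nf` is a hypothetical helper.

```python
# Hypothetical non-1NF DEPARTMENT state: DLOCATIONS holds several values.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445555",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "987654321",
     "DLOCATIONS": ["Stafford"]},
]

def to_1nf(relation, multivalued):
    """Expand a multivalued attribute into one tuple per value."""
    flat = []
    for t in relation:
        for value in t[multivalued]:
            flat.append({**t, multivalued: value})
    return flat

for row in to_1nf(department, "DLOCATIONS"):
    print(row["DNUMBER"], row["DNAME"], row["DLOCATIONS"])
# 5 Research Bellaire
# 5 Research Sugarland
# 5 Research Houston
# 4 Administration Stafford
```

Every value is now atomic, but "Research" and its manager are stored three times: the redundancy the caption points out.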

Second Normal Form

Definitions:
- Prime attribute (in these primary-key-based definitions): an attribute that is a member of the primary key K
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold anymore

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds
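Whether an FD is full can be tested by dropping one attribute at a time from the left-hand side and asking whether the rest still determines Z. The sketch below uses attribute closure (a standard technique not covered in these slides) and assumes the EMP_PROJ FDs from the FD examples earlier in this lecture; `closure` and `is_full_fd` are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if removing any attribute from Y breaks the dependency."""
    if not set(rhs) <= closure(lhs, fds):
        raise ValueError("the FD does not follow from fds at all")
    return all(not set(rhs) <= closure(set(lhs) - {a}, fds) for a in lhs)

FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, FDS))   # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, FDS))   # False (SSN -> ENAME)
```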

Second Normal Form (continued)

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
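2NF normalization for a relation whose primary key is its only candidate key can be sketched as: for each proper subset of the primary key, move the non-prime attributes it determines into a new relation together with that subset, and keep the rest with the full key. This sketch assumes that single-key setting (so every attribute outside the key is non-prime) and uses attribute closure; `closure` and `decompose_2nf` are hypothetical helpers. Applied to EMP_PROJ, it should produce the three relations of Figure 10.10(a).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def decompose_2nf(R, pk, fds):
    """Split off non-prime attributes that depend on only part of the primary key."""
    nonprime = set(R) - set(pk)
    assigned = set()
    relations = []
    for size in range(1, len(pk)):                 # proper, non-empty subsets
        for part in combinations(sorted(pk), size):
            dependents = (closure(set(part), fds) & nonprime) - assigned
            if dependents:
                relations.append(set(part) | dependents)
                assigned |= dependents
    # Attributes fully dependent on the whole key stay with it.
    relations.insert(0, set(pk) | (nonprime - assigned))
    return relations

EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
FDS = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"}),
       ({"PNUMBER"}, {"PNAME", "PLOCATION"})]
for rel in decompose_2nf(EMP_PROJ, ["SSN", "PNUMBER"], FDS):
    print(sorted(rel))
# ['HOURS', 'PNUMBER', 'SSN']
# ['PLOCATION', 'PNAME', 'PNUMBER']
# ['ENAME', 'SSN']
```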

FIGURE 10.10 Normalizing into 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z

Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME
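Transitive dependencies on the primary key can be spotted from the FD list: since the key determines every attribute, any FD Y -> A whose left-hand side Y is neither a superkey nor part of the key gives a chain key -> Y -> A. A sketch under that reasoning; `closure` and `transitive_dependencies` are hypothetical helpers, and the EMP_DEPT attributes and FDs are assumed for illustration (they follow the textbook's EMP_DEPT example).

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_dependencies(R, pk, fds):
    """Map each attribute A transitively dependent on pk to an intermediate Y."""
    found = {}
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(R)
        if not is_superkey and not lhs <= set(pk):   # skip keys and partial deps
            for a in rhs - lhs - set(pk):
                found.setdefault(a, lhs)
    return found

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
FDS = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
       ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
for attr, via in sorted(transitive_dependencies(EMP_DEPT, ["SSN"], FDS).items()):
    print(f"SSN -> {attr} via {sorted(via)}")
# SSN -> DMGRSSN via ['DNUMBER']
# SSN -> DNAME via ['DNUMBER']
```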

Third Normal Form (continued)

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp#, Salary). Here, SSN -> Emp# -> Salary, and Emp# is a candidate key.
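The note agrees with the general, candidate-key-based form of 3NF: for every nontrivial FD X -> A, either X is a superkey or A is prime. That general form is not spelled out in these slides, so treat this as a sketch of it; `closure` and `violates_3nf` are hypothetical helpers, the prime attributes are passed in by hand, and the FD sets are assumptions for illustration. It accepts the EMP example from the note and flags EMP_DEPT.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_3nf(R, fds, prime):
    """List (X, A) pairs where X -> A is nontrivial, X is not a superkey, A is nonprime."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(R)
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad

# EMP(SSN, Emp#, Salary): Emp# is a candidate key, so SSN -> Emp# -> Salary is harmless.
EMP = {"SSN", "Emp#", "Salary"}
EMP_FDS = [({"SSN"}, {"Emp#"}), ({"Emp#"}, {"SSN", "Salary"})]
print(violates_3nf(EMP, EMP_FDS, prime={"SSN", "Emp#"}))   # []

# EMP_DEPT: DNUMBER is not a key, so DNUMBER -> DNAME, DMGRSSN breaks 3NF.
EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
EMP_DEPT_FDS = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
                ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(violates_3nf(EMP_DEPT, EMP_DEPT_FDS, prime={"SSN"}))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]
```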

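Summary of Normalization
- First normal form: every attribute in the relation schema is an indivisible data item (that is, it is atomic).
- Second normal form: the relation schema is in first normal form, and every non-prime attribute is fully functionally dependent on the primary key.
- Third normal form: the relation schema is in second normal form, and no non-prime attribute is transitively dependent on the primary key.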

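A "Secret Recipe" for Normalization (Reaching 3NF)

I swear to tell the truth, the whole truth, and nothing but the truth, so help me God!

In the same spirit: every non-prime attribute must depend on the key, the whole key (2NF), and nothing but the key (3NF).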
