Database Design
• Constructing a database structure that satisfies the requirements of a particular application
Steps in database design
1. Requirements formulation & analysis → output: requirements specification
2. Conceptual design → output: information structure (conceptual schema as an ERD or 3NF relations)
3. Logical (or implementation) design → output: logical database structure (DBMS-processible)
4. Physical design → output: access methods, storage structures
Logical database design ends with the DBMS-processible logical structure; physical database design then selects access methods and storage structures.
Requirements Formulation & Analysis
• Purpose: identify and describe the data that are required by the organization
• Inputs: user information requirements, data items (attributes), data associations, and processing requirements (report frequencies, response-time requirements, etc.)
• Outputs: a set of requirements specifications for conceptual design
Conceptual Design
• Purpose: synthesize the various user views and information requirements into a global database design.
• The result is a conceptual schema expressed as an ERD or as normalized relations
Implementation Design
• Purpose: map the conceptual data model into a logical schema that can be processed by a particular DBMS.
• The conceptual model is mapped into a hierarchical, network, or relational data model.
• Schemas and subschemas are developed using the DBMS's DDL (data definition language).
Physical Design
• Designing stored record formats, selecting access methods (indexes), and providing for security, integrity, and backup & recovery.
• Output: internal schema
Steps in Requirements Analysis
• Identify and document the data that users require
• Study the data flows and decision-making processes, answering in particular the following questions:
1. What are the user views?
2. Which data elements (attributes) are required in the views?
3. What are the primary keys?
4. What are the relationships among the data elements?
5. What are the operational requirements, such as security, integrity, and response time?
Data-oriented Approach
• Tools: view analysis, definition, normalization
• Steps:
1. Define database scope
2. Establish metadata collection standards
3. Identify user views
4. Build data dictionary
5. Identify data volumes and usage
6. Identify operational requirements
Steps 3 and 4 feed logical database design; steps 5 and 6 feed physical database design.
Define database scope
• Define the scope of the application (the mini-world)
• Example: within the university (Univ.), the student registration system is the mini-world
Establish metadata collection standards
• Use consensus (agreed-upon) metadata collection forms (figure: typical forms)
Identify user views
• Identify the user views in the application
• Used for logical database design
User views
• A user view is the subset of data that a particular user requires to make a decision or carry out some action
• User views can be derived from reports, displays, and files
• Determine the semantic rules
• Record information about each user view on a standard form
A user view example
Lakewood College Grade Report
Fall Semester 198x
Student#: 38214    Major: Information Systems
Student-name: Jane Bright

Course#  Course-title      Instructor-name  Instructor-location  Grade
IS 350   Database          Codd             B104                 A
IS 465   Systems Analysis  Kemp             B213                 C
Data Associations:
Student# → Student-name, Major
Course# → Course-title, Instructor-name, Instructor-location
Instructor-name → Instructor-location
(Student#, Course#) → Grade
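The data associations above are functional dependencies, and they are enough to confirm which data items identify a row of the grade report. The Python sketch below assumes each dependency is encoded as a pair of attribute-name sets; the names `FDS`, `ALL_ATTRS`, and `closure` are hypothetical. It computes the attribute closure (everything a set of attributes determines) and shows that (Student#, Course#) determines every item on the report while Student# alone does not.

```python
# Functional dependencies taken from the data associations of the grade report.
# Each FD is (determinant, dependents); the set encoding is an illustrative choice.
FDS = [
    ({"Student#"}, {"Student-name", "Major"}),
    ({"Course#"}, {"Course-title", "Instructor-name", "Instructor-location"}),
    ({"Instructor-name"}, {"Instructor-location"}),
    ({"Student#", "Course#"}, {"Grade"}),
]

ALL_ATTRS = {
    "Student#", "Student-name", "Major", "Course#", "Course-title",
    "Instructor-name", "Instructor-location", "Grade",
}


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole determinant is in the closure, its dependents join it.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


print(closure({"Student#", "Course#"}, FDS) == ALL_ATTRS)  # True
print(closure({"Student#"}, FDS) == ALL_ATTRS)             # False
```

Because the pair determines everything and neither attribute does so alone, (Student#, Course#) is the natural key for one line of the report.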
Build a data dictionary
• Each data item type that appears in a user view must be defined and described in detail.
• A standard form can be used.
Identify data volume and usage
• Data volumes and usage patterns are required for physical database design
• This step is done after the conceptual model has been completed
Identify operational requirements
• Security
• Integrity
• Response time
• Backup & recovery
• Archiving: how long must data be retained, and in what form?
View analysis using normalization
• Normalization
• After we obtain the user views (through requirements analysis, RA), each view contains multiple data items that belong to different entities or relationships
Normalization
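• Example: GRADE-REPORT contains STUDENT#, STUDENT-NAME, MAJOR, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE
• These data items belong to the STUDENT, COURSE, and INSTRUCTOR entity types (GRADE belongs to the relationship between STUDENT and COURSE)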
Normalization
• Normalization provides a foundation for logical database design
• Normalization can be used to reduce complex user views to a set of small, stable relations that make up the conceptual schema
Steps of normalization
• Identify user views → represent them as unnormalized relations → remove repeating groups (1NF) → remove partial dependencies (2NF) → remove transitive dependencies (3NF) → view integration (conceptual schema)
User views → Unnormalized relations → (remove repeating groups) → Normalized relations (1NF) → (remove partial dependencies) → Second normal form (2NF) relations → (remove transitive dependencies) → Third normal form (3NF) relations
Unnormalized relation
• The relation contains repeating groups.
• Example: Grade-report
• Notation (braces mark the repeating group): GRADE-REPORT (STUDENT#, STUDENT-NAME, MAJOR, {COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE})
Example of unnormalized relation
Student#  Student name  Major  Course#  Course title  Instructor name  Instructor location  Grade
38214     Bright        IS     IS 350   Database      CODD             B104                 A
                               IS 465   Sys anal      KEMP             B213                 C
69173     Smith         PM     IS 465   Sys anal      KEMP             B213                 A
                               PM 300   Prod mgt      LEWIS            D317                 B
                               QM 440   Op res        KEMP             B213                 C
…
Normalized relation(1NF)
• Each intersection of a row and a column holds only a single value; that is, every attribute has exactly one value
• Atomic attributes
• Also called a 1NF table
Normalization
• Through normalization analysis, a conceptual data model (schema) can be organized out of a collection of views; this conceptual schema supports all of the user views completely and simply.
Normalization
• The general purpose of normalization is to avoid the inconveniences, or anomalies, that arise when inserting, updating, and deleting rows in tables
Normalizing an unnormalized relation
• Separate the repeating group into a new relation. Example: STUDENT (STUDENT#, STUDENT-NAME, MAJOR) and STUDENT-COURSE (STUDENT#, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, GRADE)
• Carry STUDENT# into the new relation as a foreign key so that the decomposition stays lossless
• Identify the primary key: in STUDENT-COURSE, COURSE# alone cannot be the primary key; STUDENT# and COURSE# together, (STUDENT#, COURSE#), form the primary key
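The split described above can be sketched in Python, assuming the unnormalized report is held as nested records whose `courses` list is the repeating group; the field names and the helper `remove_repeating_group` are hypothetical. Each course entry becomes its own STUDENT-COURSE row with STUDENT# copied in, and the final checks illustrate why COURSE# alone cannot serve as the key.

```python
# Unnormalized GRADE-REPORT: the "courses" list is the repeating group.
GRADE_REPORT = [
    {
        "student_no": 38214, "student_name": "Bright", "major": "IS",
        "courses": [
            ("IS 350", "Database", "CODD", "B104", "A"),
            ("IS 465", "Sys anal", "KEMP", "B213", "C"),
        ],
    },
    {
        "student_no": 69173, "student_name": "Smith", "major": "PM",
        "courses": [
            ("IS 465", "Sys anal", "KEMP", "B213", "A"),
            ("PM 300", "Prod mgt", "LEWIS", "D317", "B"),
            ("QM 440", "Op res", "KEMP", "B213", "C"),
        ],
    },
]


def remove_repeating_group(records):
    """Split GRADE-REPORT into STUDENT and STUDENT-COURSE tuples."""
    student, student_course = [], []
    for rec in records:
        # Single-valued attributes go to STUDENT, keyed by STUDENT#.
        student.append((rec["student_no"], rec["student_name"], rec["major"]))
        # Each repeating-group entry becomes its own row, with STUDENT# copied in
        # as a foreign key so every course row stays linked to its student.
        for course in rec["courses"]:
            student_course.append((rec["student_no"],) + course)
    return student, student_course


student, student_course = remove_repeating_group(GRADE_REPORT)

# COURSE# alone repeats (IS 465 appears twice), so it cannot be the key...
print(len({row[1] for row in student_course}), len(student_course))  # 4 5
# ...but the pair (STUDENT#, COURSE#) is unique for every row.
print(len({(row[0], row[1]) for row in student_course}))            # 5
```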
Remove repeating group
GRADE-REPORT (STUDENT#, STUDENT NAME, MAJOR, {COURSE#, COURSE TITLE, INSTRUCTOR NAME, INSTRUCTOR LOCATION, GRADE}) is split into:

STUDENT (3NF)
STUDENT#  STUDENT NAME  MAJOR
38214     Bright        IS
69173     Smith         PM
…

STUDENT-COURSE (1NF)
STUDENT#  COURSE#  COURSE TITLE  INSTRUCTOR NAME  INSTRUCTOR LOCATION  GRADE
38214     IS 350   Database      CODD             B104                 A
38214     IS 465   Sys anal      KEMP             B213                 C
69173     IS 465   Sys anal      KEMP             B213                 A
69173     PM 300   Prod mgt      LEWIS            D317                 B
69173     QM 440   Op res        KEMP             B213                 C
…
Problems in STUDENT-COURSE
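• Data redundancy: the data for IS 465 (title, instructor, location) is stored in more than one tuple
• Insertion anomaly: we cannot insert a new course, e.g., BA 200, INTR DP, until at least one student registers for BA 200, because STUDENT# is part of the primary key
• Deletion anomaly: if only one student takes a course and that student drops it, deleting the tuple loses information. For example, deleting the tuple for student# 69173 taking Prod mgt loses the fact that LEWIS teaches Prod mgt.
• Update anomaly: to change the title of IS 465 to Sys anal & Des, every related tuple must be modified; otherwise the data becomes inconsistent.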
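A short Python sketch makes the update and deletion anomalies concrete. It assumes STUDENT-COURSE is stored as a list of tuples in the column order of the 1NF table; `rename_course` and `delete_registration` are hypothetical helpers. Renaming IS 465 has to touch two tuples, and deleting student 69173's Prod mgt registration removes the only record that LEWIS teaches anything.

```python
# STUDENT-COURSE rows: (STUDENT#, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME,
#                       INSTRUCTOR-LOCATION, GRADE)
STUDENT_COURSE = [
    (38214, "IS 350", "Database", "CODD", "B104", "A"),
    (38214, "IS 465", "Sys anal", "KEMP", "B213", "C"),
    (69173, "IS 465", "Sys anal", "KEMP", "B213", "A"),
    (69173, "PM 300", "Prod mgt", "LEWIS", "D317", "B"),
    (69173, "QM 440", "Op res", "KEMP", "B213", "C"),
]


def rename_course(rows, course_no, new_title):
    """Change a course title; return the new rows and how many tuples had to change."""
    updated, touched = [], 0
    for row in rows:
        if row[1] == course_no:
            row = row[:2] + (new_title,) + row[3:]
            touched += 1
        updated.append(row)
    return updated, touched


def delete_registration(rows, student_no, course_no):
    """Delete one registration; return the new rows and instructors no longer recorded."""
    remaining = [r for r in rows if (r[0], r[1]) != (student_no, course_no)]
    lost = {r[3] for r in rows} - {r[3] for r in remaining}
    return remaining, lost


_, touched = rename_course(STUDENT_COURSE, "IS 465", "Sys anal & Des")
print(touched)  # 2: every copy of the IS 465 title must change
_, lost = delete_registration(STUDENT_COURSE, 69173, "PM 300")
print(lost)     # {'LEWIS'}: the only record that LEWIS teaches Prod mgt is gone
```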
Reasons for anomaly
• Some attributes do not fully depend on the primary key
• The primary key of the STUDENT-COURSE table is (STUDENT#, COURSE#)
• COURSE# → COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION
• We say that COURSE-TITLE, INSTRUCTOR-NAME, and INSTRUCTOR-LOCATION are partially dependent on the primary key
Functional dependency(FD)
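• Normalization is an analysis of functional dependencies, so we first introduce functional dependency
• Definition: Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R (at any time)
• A functional dependency is a semantic rule; it cannot be established from the contents of a table at one point in time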
Functional dependency
Relation R
X    Y
x1   y1
x2   y1
X → Y: X determines Y; Y is functionally dependent on X
• The relationship between X and Y is many-to-one
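The definition can be tested against a table instance, with one caveat from the slide: data can refute an FD but never prove it. The Python sketch below assumes rows are dicts keyed by column name; `fd_holds` and `ROWS` are hypothetical, and the sample rows are the course and instructor data from the grade report. Two X-values may share a Y-value (many-to-one), but one X-value may never map to two Y-values.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if some X-value in `rows` is paired with two different Y-values.

    rows: list of dicts mapping column name -> value.
    lhs, rhs: tuples of column names (X and Y).
    True only means this instance contains no counterexample; because an FD
    is a semantic rule, sample data can refute it but never prove it.
    """
    seen = {}
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        # setdefault records the first Y seen for this X and returns it afterwards.
        if seen.setdefault(x, y) != y:
            return False
    return True


ROWS = [
    {"COURSE#": "IS 350", "INSTRUCTOR-NAME": "CODD", "INSTRUCTOR-LOCATION": "B104"},
    {"COURSE#": "IS 465", "INSTRUCTOR-NAME": "KEMP", "INSTRUCTOR-LOCATION": "B213"},
    {"COURSE#": "PM 300", "INSTRUCTOR-NAME": "LEWIS", "INSTRUCTOR-LOCATION": "D317"},
    {"COURSE#": "QM 440", "INSTRUCTOR-NAME": "KEMP", "INSTRUCTOR-LOCATION": "B213"},
]

print(fd_holds(ROWS, ("INSTRUCTOR-NAME",), ("INSTRUCTOR-LOCATION",)))  # True
print(fd_holds(ROWS, ("INSTRUCTOR-NAME",), ("COURSE#",)))  # False: KEMP has two courses
```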
Second Normal Form (2NF)
• Definition: A relation is in 2NF if it is already in 1NF and every non-key attribute is fully dependent on the primary key
• Normalization method: move the partially dependent attributes, together with the part of the key they depend on, into a separate table
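Once the FDs are written down, the 2NF test can be automated. The Python sketch below assumes FDs are (determinant set, dependent set) pairs; `closure` and `partial_dependencies` are hypothetical helpers. It checks every proper subset of the composite key and reports which non-key attributes that subset already determines; for STUDENT-COURSE it finds COURSE# → COURSE-TITLE, INSTRUCTOR-NAME, INSTRUCTOR-LOCATION, exactly the attributes the method moves out.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds, attrs):
    """Map each proper subset of a composite `key` to the non-key attributes it
    already determines; any entry means the relation is not in 2NF."""
    non_key = set(attrs) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(set(part), fds) & non_key
            if determined:
                found[part] = determined
    return found


STUDENT_COURSE = {"STUDENT#", "COURSE#", "COURSE-TITLE",
                  "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION", "GRADE"}
FDS = [
    ({"STUDENT#", "COURSE#"}, {"GRADE"}),
    ({"COURSE#"}, {"COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION"}),
    ({"INSTRUCTOR-NAME"}, {"INSTRUCTOR-LOCATION"}),
]

print(partial_dependencies(("STUDENT#", "COURSE#"), FDS, STUDENT_COURSE))
```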
Second Normal Form Example
Student-Course relation
STUDENT#  COURSE#  COURSE TITLE  INSTRUCTOR NAME  INSTRUCTOR LOCATION  GRADE
38214     IS 350   Database      CODD             B104                 A
38214     IS 465   Sys anal      KEMP             B213                 C
69173     IS 465   Sys anal      KEMP             B213                 A
69173     PM 300   Prod mgt      LEWIS            D317                 B
69173     QM 440   Op res        KEMP             B213                 C
…
Dependency diagram
• (STUDENT#, COURSE#) → GRADE
• COURSE# → COURSE TITLE, INSTRUCTOR NAME, INSTRUCTOR LOCATION (partial functional dependency)
Normalizing the table to second normal form
Student-Course (STUDENT#, COURSE#, COURSE TITLE, INSTRUCTOR NAME, INSTRUCTOR LOCATION, GRADE) is split into:

REGISTRATION (3NF)
STUDENT#  COURSE#  GRADE
38214     IS 350   A
38214     IS 465   C
69173     IS 465   A
69173     PM 300   B
69173     QM 440   C
…

COURSE-INSTRUCTOR (2NF)
COURSE#  COURSE TITLE  INSTRUCTOR NAME  INSTRUCTOR LOCATION
IS 350   Database      CODD             B104
IS 465   Sys anal      KEMP             B213
PM 300   Prod mgt      LEWIS            D317
QM 440   Op res        KEMP             B213
…
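The 2NF split is a pair of projections, and joining them back on COURSE# should reproduce STUDENT-COURSE exactly. The Python sketch below assumes tuples in the column order of the Student-Course table; `project` and `join_on_course` are hypothetical helpers, and the join relies on COURSE# being the key of COURSE-INSTRUCTOR.

```python
# STUDENT-COURSE rows: (STUDENT#, COURSE#, COURSE-TITLE, INSTRUCTOR-NAME,
#                       INSTRUCTOR-LOCATION, GRADE)
STUDENT_COURSE = [
    (38214, "IS 350", "Database", "CODD", "B104", "A"),
    (38214, "IS 465", "Sys anal", "KEMP", "B213", "C"),
    (69173, "IS 465", "Sys anal", "KEMP", "B213", "A"),
    (69173, "PM 300", "Prod mgt", "LEWIS", "D317", "B"),
    (69173, "QM 440", "Op res", "KEMP", "B213", "C"),
]


def project(rows, columns):
    """Relational projection onto column positions, dropping duplicate tuples."""
    seen, result = set(), []
    for row in rows:
        picked = tuple(row[i] for i in columns)
        if picked not in seen:
            seen.add(picked)
            result.append(picked)
    return result


def join_on_course(registration, course_instructor):
    """Natural join on COURSE#, restoring the STUDENT-COURSE column order."""
    return [
        (student, course) + ci[1:] + (grade,)
        for student, course, grade in registration
        for ci in course_instructor
        if ci[0] == course
    ]


registration = project(STUDENT_COURSE, (0, 1, 5))         # STUDENT#, COURSE#, GRADE
course_instructor = project(STUDENT_COURSE, (1, 2, 3, 4))  # COURSE#, title, name, location

# The IS 465 details are now stored once instead of twice.
print(len(registration), len(course_instructor))  # 5 4
# Joining the projections gives back exactly the original rows (lossless).
print(set(join_on_course(registration, course_instructor)) == set(STUDENT_COURSE))  # True
```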
Anomaly in 2NF relation
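• Insertion: data for a new instructor can be inserted only if the instructor teaches some course
• Deletion: deleting a course may lose an instructor's data; for example, deleting IS 350 loses CODD's data
• Update: because instructor data is repeated, changing an instructor's location is harder (it must be changed in every tuple for that instructor)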
FD in COURSE-INSTRUCTOR
COURSE# → COURSE TITLE, INSTRUCTOR NAME
INSTRUCTOR NAME → INSTRUCTOR LOCATION
Transitive dependency
• A transitive dependency occurs when a non-key attribute depends on one or more other non-key attributes
Primary key → A
A → B
Therefore primary key → B is a transitive dependency: B depends on the key only through A
3NF
• A relation is in 3NF, if it is already in 2NF and no transitive dependency exists.
Primary key → Attribute 1, Attribute 2, …, Attribute n
FD pattern of 3NF
Each non-key attribute is fully dependent on the primary key and there is no transitive dependency.
Normalization to 3NF
• Move the attributes that cause the transitive dependency into a separate relation.
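The 3NF step can be sketched the same way. The Python below assumes FDs are (determinant set, dependent set) pairs and checks only the FDs as listed, not ones derived from them; `transitive_dependencies` and `split_off` are hypothetical helpers. For COURSE-INSTRUCTOR it flags INSTRUCTOR-NAME → INSTRUCTOR-LOCATION and moves INSTRUCTOR-LOCATION out, leaving INSTRUCTOR-NAME behind as the foreign key.

```python
def transitive_dependencies(key, fds, attrs):
    """Return (determinant, dependents) pairs in which non-key attributes
    determine other non-key attributes, i.e. 3NF violations."""
    non_key = set(attrs) - set(key)
    found = []
    for lhs, rhs in fds:
        dependents = (set(rhs) & non_key) - set(lhs)
        if set(lhs) <= non_key and dependents:
            found.append((set(lhs), dependents))
    return found


def split_off(attrs, determinant, dependents):
    """Move `dependents` to a new relation; the determinant stays in both,
    so it becomes a foreign key in the original relation."""
    remaining = set(attrs) - set(dependents)
    new_relation = set(determinant) | set(dependents)
    return remaining, new_relation


COURSE_INSTRUCTOR = {"COURSE#", "COURSE-TITLE", "INSTRUCTOR-NAME", "INSTRUCTOR-LOCATION"}
FDS = [
    ({"COURSE#"}, {"COURSE-TITLE", "INSTRUCTOR-NAME"}),
    ({"INSTRUCTOR-NAME"}, {"INSTRUCTOR-LOCATION"}),
]

for lhs, rhs in transitive_dependencies({"COURSE#"}, FDS, COURSE_INSTRUCTOR):
    course, instructor = split_off(COURSE_INSTRUCTOR, lhs, rhs)
    print(sorted(course))      # ['COURSE#', 'COURSE-TITLE', 'INSTRUCTOR-NAME']
    print(sorted(instructor))  # ['INSTRUCTOR-LOCATION', 'INSTRUCTOR-NAME']
```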
Normalization to 3NF
• Conversion of a relation to third normal form (3NF) by removing the transitive functional dependency (FD) in COURSE-INSTRUCTOR
COURSE-INSTRUCTOR (COURSE#, COURSE TITLE, INSTRUCTOR NAME, INSTRUCTOR LOCATION) is split into:

COURSE (3NF)
COURSE#  COURSE TITLE  INSTRUCTOR NAME
IS 350   Database      CODD
IS 465   Sys anal      KEMP
PM 300   Prod mgt      LEWIS
QM 440   Op res        KEMP
…

INSTRUCTOR (3NF)
INSTRUCTOR NAME  INSTRUCTOR LOCATION
CODD             B104
KEMP             B213
LEWIS            D317
…
Normalization to 3NF
• INSTRUCTOR-NAME must stay in the COURSE relation to preserve the COURSE-INSTRUCTOR relationship; this way COURSE can reference INSTRUCTOR.
• INSTRUCTOR-NAME is a foreign key in COURSE
• Normalization can stop at 3NF, because 3NF removes most anomalies. Each entity is represented by its own relation, and each relation represents only one entity or relationship, so inserting, deleting, or updating an entity does not require referencing other entities. Normalization can continue to BCNF, 4NF, 5NF, DKNF, …, but this produces too many small relations and is usually unnecessary.
Results of the table analysis
Summary of 3NF relations for GRADE-REPORT

STUDENT#  STUDENT NAME  MAJOR
38214     Bright        IS
69173     Smith         PM
…
STUDENT (STUDENT#, STUDENT-NAME, MAJOR)

COURSE#  COURSE TITLE  INSTRUCTOR NAME
IS 350   Database      CODD
IS 465   Sys anal      KEMP
PM 300   Prod mgt      LEWIS
QM 440   Op res        KEMP
…
COURSE (COURSE#, COURSE-TITLE, INSTRUCTOR-NAME)

INSTRUCTOR NAME  INSTRUCTOR LOCATION
CODD             B104
KEMP             B213
LEWIS            D317
…
INSTRUCTOR (INSTRUCTOR-NAME, INSTRUCTOR-LOCATION)

STUDENT#  COURSE#  GRADE
38214     IS 350   A
38214     IS 465   C
69173     IS 465   A
69173     PM 300   B
69173     QM 440   C
…
REGISTRATION (STUDENT#, COURSE#, GRADE)
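The four 3NF relations map directly onto SQL tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the snake_case table and column names are hypothetical renderings of the relation and attribute names above, and `build_database` and `grade_report` are illustrative helpers. Keys follow the summary: INSTRUCTOR-NAME is a foreign key in COURSE, and (STUDENT#, COURSE#) is the key of REGISTRATION. A four-way join rebuilds the original grade-report lines.

```python
import sqlite3

SCHEMA = """
CREATE TABLE student (
    student_no   INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL,
    major        TEXT
);
CREATE TABLE instructor (
    instructor_name     TEXT PRIMARY KEY,
    instructor_location TEXT
);
CREATE TABLE course (
    course_no       TEXT PRIMARY KEY,
    course_title    TEXT NOT NULL,
    instructor_name TEXT REFERENCES instructor (instructor_name)  -- foreign key
);
CREATE TABLE registration (
    student_no INTEGER NOT NULL REFERENCES student (student_no),
    course_no  TEXT NOT NULL REFERENCES course (course_no),
    grade      TEXT,
    PRIMARY KEY (student_no, course_no)
);
"""


def build_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                     [(38214, "Bright", "IS"), (69173, "Smith", "PM")])
    conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                     [("CODD", "B104"), ("KEMP", "B213"), ("LEWIS", "D317")])
    conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                     [("IS 350", "Database", "CODD"), ("IS 465", "Sys anal", "KEMP"),
                      ("PM 300", "Prod mgt", "LEWIS"), ("QM 440", "Op res", "KEMP")])
    conn.executemany("INSERT INTO registration VALUES (?, ?, ?)",
                     [(38214, "IS 350", "A"), (38214, "IS 465", "C"),
                      (69173, "IS 465", "A"), (69173, "PM 300", "B"),
                      (69173, "QM 440", "C")])
    return conn


def grade_report(conn, student_no):
    """Rebuild one student's grade-report lines by joining the four relations."""
    return conn.execute("""
        SELECT s.student_no, s.student_name, s.major, c.course_no, c.course_title,
               i.instructor_name, i.instructor_location, r.grade
        FROM registration r
        JOIN student s    ON s.student_no = r.student_no
        JOIN course c     ON c.course_no = r.course_no
        JOIN instructor i ON i.instructor_name = c.instructor_name
        WHERE s.student_no = ?
        ORDER BY c.course_no
    """, (student_no,)).fetchall()


conn = build_database()
# A new course can now be stored before anyone registers for it.
conn.execute("INSERT INTO course VALUES ('BA 200', 'INTR DP', NULL)")
for line in grade_report(conn, 38214):
    print(line)
```

Because COURSE no longer depends on any registration, the BA 200 insert that was impossible in STUDENT-COURSE succeeds here.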
ER Diagram
Entity types STUDENT, COURSE, and INSTRUCTOR; STUDENT and COURSE are in an M:N relationship that carries the attribute GRADE.
Data volume analysis
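After normalization is complete, view integration can be performed. If there is only one view, the conceptual model can be built directly. For example, the conceptual model can show the mapping relationships and the data volumes. Data volume: the maximum number of tuples a relation will hold.
Example:
1. 3000 students
2. Each student registers for 3 courses on average → 9000 registrations
3. 100 instructors
4. Each instructor teaches 3 courses on average → 300 courses (sections)
5. Each section has 30 students on average (9000/300 = 30)
• Data volumes are shown in the conceptual model, and each mapping also shows the number of corresponding elements.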
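The arithmetic behind these volumes fits in a small Python helper. The function name `data_volumes` and its dictionary keys are hypothetical, and the sketch assumes the stated averages apply uniformly across students, instructors, and sections.

```python
def data_volumes(students, courses_per_student, instructors, courses_per_instructor):
    """Derive relation sizes from the averages given in the requirements."""
    registrations = students * courses_per_student
    courses = instructors * courses_per_instructor
    return {
        "STUDENT": students,
        "REGISTRATION": registrations,
        "INSTRUCTOR": instructors,
        "COURSE": courses,
        # Average class size: registrations spread over course sections.
        "students per section": registrations / courses,
    }


print(data_volumes(3000, 3, 100, 3))
# {'STUDENT': 3000, 'REGISTRATION': 9000, 'INSTRUCTOR': 100, 'COURSE': 300, 'students per section': 30.0}
```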
Data structure diagram
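Data Structure Diagram:
STUDENT (STUDENT#, STUDENT-NAME, MAJOR): 3000 tuples
REGISTRATION (STUDENT#, COURSE#, GRADE): 9000 tuples
COURSE (COURSE#, COURSE-TITLE, INSTRUCTOR-NAME): 300 tuples
INSTRUCTOR (INSTRUCTOR-NAME, INSTRUCTOR-LOCATION): 100 tuples
Mappings (average counts): each STUDENT has 3 REGISTRATION tuples; each COURSE has 30 REGISTRATION tuples; each INSTRUCTOR teaches 3 COURSEs.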
Boyce-Codd Normal Form
• Problem in 3NF: a non-key attribute may determine part of the primary key
• Determinant: an attribute (or set of attributes) on which some other attribute is fully functionally dependent
• A relation is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key
• BCNF is defined in terms of functional dependencies
Relation not in BCNF
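ST-MAJ-ADV
STUDENT#  MAJOR    ADVISOR
123       PHYSICS  EINSTEIN
123       MUSIC    MOZART
456       BIOL     DARWIN
789       PHYSICS  BOHR
999       PHYSICS  EINSTEIN
Functional dependencies:
• (STUDENT#, MAJOR) → ADVISOR
• ADVISOR → MAJOR
ADVISOR is a determinant but not a candidate key, so ST-MAJ-ADV is not in BCNF.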
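A BCNF check can be sketched the same way. The Python below assumes FDs are (determinant set, dependent set) pairs and uses the common superkey form of the test: a determinant passes if its closure covers every attribute of the relation. `closure` and `bcnf_violations` are hypothetical helpers. With (STUDENT#, MAJOR) → ADVISOR and ADVISOR → MAJOR, only ADVISOR fails, and it passes once ADV-MAJ is its own relation.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose determinant does not determine the whole relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(attrs)
    ]


ST_MAJ_ADV = {"STUDENT#", "MAJOR", "ADVISOR"}
FDS = [
    ({"STUDENT#", "MAJOR"}, {"ADVISOR"}),
    ({"ADVISOR"}, {"MAJOR"}),
]

print(bcnf_violations(ST_MAJ_ADV, FDS))  # [({'ADVISOR'}, {'MAJOR'})]
# After decomposition, ADV-MAJ keeps ADVISOR -> MAJOR, and ADVISOR is now its key.
print(bcnf_violations({"ADVISOR", "MAJOR"}, [({"ADVISOR"}, {"MAJOR"})]))  # []
```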
Anomaly
• If student# 456 changes major from BIOL to MATH, we lose the fact that DARWIN advises in BIOL
• We cannot record that WATSON advises in COMPSCI until at least one student majors in COMPSCI and is assigned WATSON as an advisor
Normalization
ST-ADV
STUDENT#  ADVISOR
123       EINSTEIN
123       MOZART
456       DARWIN
789       BOHR
999       EINSTEIN

ADV-MAJ
ADVISOR   MAJOR
EINSTEIN  PHYSICS
MOZART    MUSIC
DARWIN    BIOL
BOHR      PHYSICS
4NF
• 4NF is related to a notion called multivalued dependency (MVD)
• A multivalued dependency exists when there are three attributes A, B, C such that:
1. For each value of A, there is a well-defined set of values of B
2. For each value of A, there is a well-defined set of values of C
3. The set of B values associated with a given A value is independent of the C values
A relation not in 4NF

(a) Unnormalized relation OFFERING
COURSE      INSTRUCTOR           TEXTBOOK
Management  White, Green, Black  Drucker, Peters
Finance     Gray                 Weston, Gilford

(b) Normalized relation OFFERING
COURSE      INSTRUCTOR  TEXTBOOK
Management  White       Drucker
Management  Green       Drucker
Management  Black       Drucker
Management  White       Peters
Management  Green       Peters
Management  Black       Peters
Finance     Gray        Weston
Finance     Gray        Gilford

Problem: adding the Middleston textbook for the Management course requires inserting 3 tuples (insertion anomaly).
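The three MVD conditions can be checked on a table instance in Python. This sketch assumes rows are dicts keyed by column name; `mvd_holds`, `INSTRUCTORS`, `TEXTBOOKS`, and `new_rows` are hypothetical names. As with FDs, an instance can only show that the pattern is present; whether the dependency is a real rule is a semantic question. The last lines show the insertion anomaly: one new textbook costs one tuple per Management instructor.

```python
from collections import defaultdict

# Unnormalized OFFERING from (a): each course with its instructors and textbooks.
INSTRUCTORS = {"Management": ["White", "Green", "Black"], "Finance": ["Gray"]}
TEXTBOOKS = {"Management": ["Drucker", "Peters"], "Finance": ["Weston", "Gilford"]}

# Normalized OFFERING from (b): every instructor paired with every textbook.
OFFERING = [
    {"COURSE": c, "INSTRUCTOR": i, "TEXTBOOK": t}
    for c in INSTRUCTORS
    for i in INSTRUCTORS[c]
    for t in TEXTBOOKS[c]
]


def mvd_holds(rows, a, b, c):
    """Check the MVD A ->> B | C on this instance: for each A value, the (B, C)
    pairs present must be every combination of its B set and its C set,
    so the B values are independent of the C values."""
    b_values, c_values, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for row in rows:
        b_values[row[a]].add(row[b])
        c_values[row[a]].add(row[c])
        pairs[row[a]].add((row[b], row[c]))
    return all(
        pairs[x] == {(bv, cv) for bv in b_values[x] for cv in c_values[x]}
        for x in pairs
    )


print(len(OFFERING))                                            # 8
print(mvd_holds(OFFERING, "COURSE", "INSTRUCTOR", "TEXTBOOK"))  # True

# Insertion anomaly: one new textbook needs one tuple per Management instructor.
new_rows = [{"COURSE": "Management", "INSTRUCTOR": i, "TEXTBOOK": "Middleston"}
            for i in INSTRUCTORS["Management"]]
print(len(new_rows))                                            # 3
```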
Normalization to 4NF
• Remove multivalued dependencies
Relations in fourth normal form

TEACHER
COURSE      INSTRUCTOR
Management  White
Management  Green
Management  Black
Finance     Gray

TEXT
COURSE      TEXTBOOK
Management  Drucker
Management  Peters
Finance     Weston
Finance     Gilford
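The 4NF decomposition can be checked for losslessness in a few lines of Python. The sketch below assumes OFFERING is held as a set of (COURSE, INSTRUCTOR, TEXTBOOK) tuples; `join_on_course` and the other names are hypothetical. Projecting onto TEACHER and TEXT and joining on COURSE gives back OFFERING, and the Middleston textbook now needs only one TEXT tuple.

```python
# Course facts from the OFFERING example.
INSTRUCTORS = {"Management": ["White", "Green", "Black"], "Finance": ["Gray"]}
TEXTBOOKS = {"Management": ["Drucker", "Peters"], "Finance": ["Weston", "Gilford"]}

OFFERING = {(c, i, t) for c in INSTRUCTORS for i in INSTRUCTORS[c] for t in TEXTBOOKS[c]}

# 4NF decomposition: project OFFERING onto (COURSE, INSTRUCTOR) and (COURSE, TEXTBOOK).
TEACHER = sorted({(c, i) for c, i, _ in OFFERING})
TEXT = sorted({(c, t) for c, _, t in OFFERING})


def join_on_course(teacher, text):
    """Natural join of TEACHER and TEXT on COURSE."""
    return {(c, i, t) for c, i in teacher for c2, t in text if c == c2}


print(len(TEACHER), len(TEXT))                    # 4 4
print(join_on_course(TEACHER, TEXT) == OFFERING)  # True: the decomposition is lossless

# Adding the Middleston textbook is now a single TEXT tuple; the join still
# yields all three Management instructor/textbook combinations.
TEXT_WITH_NEW = TEXT + [("Management", "Middleston")]
print(len(join_on_course(TEACHER, TEXT_WITH_NEW)))  # 11
```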
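Relationship among normal forms
unnormalized → 1NF → 2NF → 3NF → BCNF → 4NF → 5NF (PJNF) → DKNF (a relation in any normal form in this chain also satisfies all the ones before it)
• The higher the normal form, the more relations; 3NF is usually sufficient
• BCNF is related to functional dependencies
• 4NF is related to multivalued dependencies
• 5NF is related to project-join (join dependencies)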