
2002.10.17 - SLIDE 1 - IS 202 – FALL 2002

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2002

http://www.sims.berkeley.edu/academics/courses/is202/f02/

SIMS 202: Information Organization and Retrieval

Lecture 14: Database Design

SLIDE 2

Lecture Overview

• Review

– Databases and Database Design

– Database Life Cycle

– ER Diagrams

• Database Design

• Normalization

• Web-Enabled Databases

SLIDE 3

Lecture Overview

• Review

– Databases and Database Design

– Database Life Cycle

– ER Diagrams

• Database Design

• Normalization

• Web-Enabled Databases

SLIDE 4

Models (1)

[Diagram: Applications 1-4 each supply conceptual requirements that are combined into one Conceptual Model; the Conceptual Model is mapped to a Logical Model and then to an Internal Model, and each application works through its own External Model.]

SLIDE 5

Database System Life Cycle

[Diagram: a cycle of six stages: 1 Design; 2 Physical Creation; 3 Conversion; 4 Integration; 5 Operations; 6 Growth, Change, & Maintenance.]

SLIDE 6

Another View of the Life Cycle

[Diagram: the same six stages (1 Design, 2 Physical Creation, 3 Conversion, 4 Integration, 5 Operations, 6 Growth, Change) in a different arrangement.]

SLIDE 7

Database Design Process

[Same diagram as on the Models (1) slide.]

SLIDE 8

Entity

• An Entity is an object in the real world (or even in imaginary worlds) about which we want or need to maintain information

– Persons (e.g., customers in a business, employees, authors)

– Things (e.g., purchase orders, meetings, parts, companies)

[Diagram: an entity box labeled Employee]

SLIDE 9

Attributes

• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (they are the metadata for the entity)

[Diagram: the Employee entity with the attributes Name (made up of Last, Middle, and First), SSN, Age, Birthdate, and Projects]
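As an illustration only, here is how the Employee entity and its attributes might be written down as Python dataclasses. The sketch assumes that Name is a composite attribute (First, Middle, Last), that Projects is multivalued, and that Age is derived from Birthdate rather than stored; the SSN, name, and dates are made up.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Name:
    """Composite attribute: a name made up of smaller parts."""
    first: str
    middle: str
    last: str


@dataclass
class Employee:
    ssn: str                  # identifying attribute (assumed key)
    name: Name                # composite attribute
    birthdate: date
    projects: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, on: date) -> int:
        """Derived attribute: computed from birthdate instead of stored."""
        years = on.year - self.birthdate.year
        # One year less if the birthday has not yet come round in the year of `on`
        if (on.month, on.day) < (self.birthdate.month, self.birthdate.day):
            years -= 1
        return years


emp = Employee("000-00-0000", Name("Jane", "Q", "Doe"), date(1960, 5, 20), ["P1", "P7"])
print(emp.name.last, emp.age(date(2002, 10, 17)))  # Doe 42
```

Computing Age on demand avoids storing a value that silently goes stale, which is why derived attributes are usually left out of the stored data.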

SLIDE 10

Relationships

• Relationships are the associations between entities

• They can involve one or more entities and belong to particular relationship types

– One to One

– One to Many

– Many to Many

SLIDE 11

Relationships

[Diagrams: Student Attends Class; Supplier Supplies project parts (Part) to Project, a relationship among three entities]

SLIDE 12

Types of Relationships

• Concerned only with the cardinality of the relationship

[Diagrams in Chen ER notation: Truck Assigned Employee, one-to-one (1 and 1); Project Assigned Employee, one-to-many (1 and n); Project Assigned Employee, many-to-many (n and m)]
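One common way (not the only one) to carry these cardinalities into relational tables is sketched below with Python's built-in sqlite3 module. The table and column names are hypothetical, and the sketch assumes that one project has many employees. A UNIQUE foreign key enforces the one-to-one case and a plain foreign key the one-to-many case; the many-to-many case needs a table of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE truck   (truck_no INTEGER PRIMARY KEY);
CREATE TABLE project (proj_no  INTEGER PRIMARY KEY);
CREATE TABLE employee (
    ssn      TEXT PRIMARY KEY,
    -- 1:1 Truck Assigned Employee: UNIQUE lets each truck go to at most one employee
    truck_no INTEGER UNIQUE REFERENCES truck(truck_no),
    -- 1:n Project Assigned Employee: many employees may point at the same project
    proj_no  INTEGER REFERENCES project(proj_no)
);
INSERT INTO truck   VALUES (1), (2);
INSERT INTO project VALUES (10);
INSERT INTO employee VALUES ('A', 1, 10), ('B', 2, 10);
""")

try:
    # A second employee assigned to truck 1 would break the 1:1 rule
    conn.execute("INSERT INTO employee VALUES ('C', 1, 10)")
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

print(one_to_one_enforced)  # True
```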

SLIDE 13

More Complex Relationships

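[Diagrams: a ternary Evaluation relationship among Project, Employee, and Manager, with the cardinality combinations 1/n/n, 1/1/1, and n/n/1; Project Assigned Employee with the cardinalities 4(2-10) and 1 and the labels SSN, Project, and Date; a recursive relationship on Employee, Manages / Is Managed By, with cardinalities 1 and n]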

SLIDE 14

Weak Entities

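• Weak entities owe their existence entirely to another entity

[Diagram: Order Contains Order-line, where Order-line is the weak entity; attribute labels Invoice #, Part #, Rep #, and Quantity, with Invoice # appearing twice]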
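A weak entity is often implemented with a composite primary key that includes the owner's key, plus a foreign key that cascades deletes. The sketch below reads the slide's labels as Order (Invoice #, Rep #) and Order-line (Invoice #, Part #, Quantity), which is an assumption; the table is called orders because ORDER is an SQL keyword, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to fire
conn.executescript("""
CREATE TABLE orders (
    invoice_no INTEGER PRIMARY KEY,
    rep_no     INTEGER
);
-- Weak entity: a line is identified only together with its owning order
CREATE TABLE order_line (
    invoice_no INTEGER NOT NULL REFERENCES orders(invoice_no) ON DELETE CASCADE,
    part_no    TEXT    NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (invoice_no, part_no)
);
INSERT INTO orders VALUES (500, 7);
INSERT INTO order_line VALUES (500, 'P1', 3), (500, 'P2', 1);
""")

# Removing the owner removes the lines that depend on it
conn.execute("DELETE FROM orders WHERE invoice_no = 500")
remaining = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
print(remaining)  # 0
```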

SLIDE 15

Supertype and Subtype Entities

[Diagram: the supertype Employee Is one of the subtypes Clerk, Sales-rep, and Other; the relationships Sold (with Invoice) and Manages also appear]

SLIDE 16

Many to Many Relationships

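[Diagram: the many-to-many relationship Employee Is Assigned Project (keys SSN and Proj#) replaced by an Assignment entity, linked to Employee and Project through Assigned relationships, with the attributes SSN, Proj#, and Hours]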
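The Assignment table can be sketched in SQLite as follows: the pair (SSN, Proj#) is the key, and Hours is an attribute of the assignment itself rather than of either entity. Table names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY);
CREATE TABLE project  (proj_no INTEGER PRIMARY KEY);
-- The n:m relationship becomes its own table; hours describes the pairing
CREATE TABLE assignment (
    ssn     TEXT    NOT NULL REFERENCES employee(ssn),
    proj_no INTEGER NOT NULL REFERENCES project(proj_no),
    hours   REAL,
    PRIMARY KEY (ssn, proj_no)
);
INSERT INTO employee VALUES ('A'), ('B');
INSERT INTO project  VALUES (1), (2);
-- Employee A works on two projects; project 1 has two employees
INSERT INTO assignment VALUES ('A', 1, 10), ('A', 2, 5), ('B', 1, 20);
""")

hours_per_project = dict(conn.execute(
    "SELECT proj_no, SUM(hours) FROM assignment GROUP BY proj_no"
).fetchall())
print(hours_per_project)  # {1: 30.0, 2: 5.0}
```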

SLIDE 17

Lecture Overview

• Review

– Databases and Database Design

– Database Life Cycle

– ER Diagrams

• Database Design

• Normalization

• Web-Enabled Databases

SLIDE 18

Database Design Process

[Same diagram as on the Models (1) slide.]

SLIDE 19

Database Design Process

[Same diagram as on the Models (1) slide.]

SLIDE 20

Requirements Analysis

• Conceptual Requirements

– Systems Analysis Process

• Examine all of the information sources used in existing applications

• Identify the characteristics of each data element

– Numeric

– Text

– Date/time

– Etc.

• Examine the tasks carried out using the information

• Examine results or reports created using the information

SLIDE 21

Database Design Process

[Same diagram as on the Models (1) slide.]

SLIDE 22

Conceptual Design

• Conceptual Model

– Merge the collective needs of all applications

– Determine what Entities are being used

• Some object about which information is to be maintained

– What are the Attributes of those entities?

• Properties or characteristics of the entity

• What attributes uniquely identify the entity?

– What are the Relationships between entities?

• How do the entities interact with each other?

SLIDE 23

Developing a Conceptual Model

• Overall view of the database that integrates all the needed information discovered during the requirements analysis

• Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database system or implementation details

• Can also be represented using other modeling tools (such as UML)

SLIDE 24

Database Design Process

[Same diagram as on the Models (1) slide.]

SLIDE 25

Logical Design

• Logical Model

– How is each entity and relationship represented in the Data Model of the DBMS?

• Hierarchic?

• Network?

• Relational?

• Object-Oriented?

SLIDE 26

Database Design Process

[Same diagram as on the Models (1) slide.]

SLIDE 27

Physical Design

• Internal Model

– Choices of index file structure

– Choices of data storage formats

– Choices of disk layout
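As one small internal-model example, the sketch below adds a secondary index and asks SQLite's EXPLAIN QUERY PLAN whether a lookup by name would use it. The patient table and the index name are hypothetical, and the wording of the plan text varies between SQLite versions, so only the index name should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_no INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
# Internal-model choice: a secondary index so lookups by name avoid a full scan
conn.execute("CREATE INDEX idx_patient_name ON patient(name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM patient WHERE name = ?", ("Ann Hood",)
).fetchall()
# The last column of each plan row is a readable description of that step
details = [row[-1] for row in plan]
print(details)
```

The logical model (the table and its columns) is the same with or without the index; only the physical access path changes.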

SLIDE 28

Database Design Process

[Same diagram as on the Models (1) slide.]

SLIDE 29

Database Application Design

• External Model

– User views of the integrated database

– Making the old (or updated) applications work with the new database design
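In a relational DBMS, one way to give an application its own external model is a view. The sketch below assumes a hypothetical employee table and a phone-directory application that should see names and departments only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, salary REAL, dept TEXT);
INSERT INTO employee VALUES
    ('A', 'Jane Doe', 50000, 'Sales'),
    ('B', 'John Roe', 62000, 'Research');

-- External model for a hypothetical phone-directory application:
-- it sees names and departments, never SSNs or salaries
CREATE VIEW directory AS SELECT name, dept FROM employee;
""")

rows = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
print(rows)  # [('Jane Doe', 'Sales'), ('John Roe', 'Research')]
```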

SLIDE 30

Lecture Overview

• Review

– Databases and Database Design

– Database Life Cycle

– ER Diagrams

• Database Design

• Normalization

• Web-Enabled Databases

SLIDE 31

Normalization

• Normalization theory is based on the observation that relations with certain properties are more effective for inserting, updating, and deleting data than other sets of relations containing the same data

• Normalization is a multi-step process beginning with an “unnormalized” relation

– Hospital example from Atre, S., Data Base: Structured Techniques for Design, Performance, and Management

SLIDE 32

Normal Forms

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce-Codd Normal Form (BCNF)

• Fourth Normal Form (4NF)

• Fifth Normal Form (5NF)

SLIDE 33

Normalization

• 1NF: Functional dependency of nonkey attributes on the primary key; atomic values only

• 2NF: Full functional dependency of nonkey attributes on the primary key

• 3NF: No transitive dependency between nonkey attributes

• Boyce-Codd and higher: All determinants are candidate keys; single multivalued dependency

SLIDE 34

Unnormalized Relations

• First step in normalization is to convert the data into a two-dimensional table

• In unnormalized relations data can repeat within a column

SLIDE 35

Unnormalized Relations

Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever

SLIDE 36

First Normal Form

• To be in First Normal Form, a relation must contain only atomic values at each row and column

– No repeating groups

– A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation

SLIDE 37

First Normal Form

Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none
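The conversion from the unnormalized table to 1NF can be sketched in Python: each repeating group becomes its own row, with the patient columns copied into every row. The record layout is an assumption made for the sketch, and only two of the patients are included.

```python
# Unnormalized: one record per patient; surgery details form a repeating group
# (surgeon #, date, surgeon name, surgery, drug, side effect)
unnormalized = [
    {"patient_no": 1111, "name": "John White", "address": "15 New St. New York, NY",
     "surgeries": [
         (145, "01-Jan-95", "Beth Little", "Gallstones removal", "Penicillin", "rash"),
         (311, "12-Jun-95", "Michael Diamond", "Kidney stones removal", "none", "none"),
     ]},
    {"patient_no": 2345, "name": "Charles Brown", "address": "Dogwood Lane Harrison, NY",
     "surgeries": [
         (189, "08-Jan-96", "David Rosen", "Open Heart Surgery", "Cephalosporin", "none"),
     ]},
]


def to_first_normal_form(records):
    """Repeat the patient columns once per surgery so every cell is atomic."""
    rows = []
    for rec in records:
        for surgeon_no, date, surgeon, surgery, drug, side_effect in rec["surgeries"]:
            rows.append((rec["patient_no"], surgeon_no, date, rec["name"],
                         rec["address"], surgeon, surgery, drug, side_effect))
    return rows


for row in to_first_normal_form(unnormalized):
    print(row)
```

The copied patient name and address in every row are exactly what causes the 1NF storage anomalies.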

SLIDE 38

1NF Storage Anomalies

• Insertion: A new patient who has not yet undergone surgery has no surgeon #; since surgeon # is part of the key, we can’t insert the patient

• Insertion: If a surgeon is newly hired and hasn’t operated yet, there is no way to include that person in the database

• Update: If a patient comes in for a new procedure, and has moved, we need to change multiple address entries

• Deletion (type 1): Deleting a patient record may also delete all info about a surgeon

• Deletion (type 2): When there are functional dependencies (like that of side effects on drug), changing one item eliminates other information

SLIDE 39

Second Normal Form

• A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key

– That is, every nonkey attribute needs the full primary key for unique identification

SLIDE 40

Second Normal Form

Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY

SLIDE 41

Second Normal Form

Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold

SLIDE 42

Second Normal Form

Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever
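One way to test candidate dependencies against sample data is sketched below. It assumes the 1NF key is (Patient #, Surgeon #, Surgery Date) and uses a subset of the hospital rows. Data can only disprove a functional dependency; whether one really holds is a fact about the domain, not the sample.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    The dependency holds when any two rows that agree on lhs also agree on rhs.
    """
    seen = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        dependent = tuple(row[c] for c in rhs)
        # setdefault records the first value seen; a later mismatch breaks the FD
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


columns = ("patient_no", "surgeon_no", "surgery_date", "patient_name", "surgeon_name")
rows = [dict(zip(columns, r)) for r in [
    (1111, 145, "01-Jan-95", "John White", "Beth Little"),
    (1111, 311, "12-Jun-95", "John White", "Michael Diamond"),
    (4876, 145, "05-Nov-95", "Hal Kane", "Beth Little"),
    (5123, 145, "10-May-95", "Paul Kosher", "Beth Little"),
]]

# Patient name needs only part of the key (patient_no): a partial dependency
print(fd_holds(rows, ("patient_no",), ("patient_name",)))  # True
# Surgeon name likewise depends only on surgeon_no
print(fd_holds(rows, ("surgeon_no",), ("surgeon_name",)))  # True
# patient_no alone does not determine the surgeon
print(fd_holds(rows, ("patient_no",), ("surgeon_no",)))    # False
```

Each partial dependency found this way points to a column group that 2NF moves into its own relation, as the Patient and Surgeon tables do.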

SLIDE 43

1NF Storage Anomalies Removed

• Insertion: Can now enter new patients without surgery

• Insertion: Can now enter Surgeons who haven’t operated

• Deletion (type 1): If Charles Brown dies, the corresponding tuples from the Patient and Surgery tables can be deleted without losing information on David Rosen

• Update: If John White comes in for a third time and has moved, we only need to change the Patient table

SLIDE 44

2NF Storage Anomalies

• Insertion: Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient

• Deletion: If John White receives some other drug because of the penicillin rash, and a new drug and side effect are entered, we lose the information that penicillin can cause a rash

• Update: If drug side effects change (a new formula) we have to update multiple occurrences of side effects

SLIDE 45

Third Normal Form

• A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes

– When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency

• The side effect column in the Surgery table is determined by the drug administered

– Side effect depends on drug, which in turn depends on the primary key, so side effect is transitively dependent on the key and Surgery is not in 3NF
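The 3NF step can be sketched in SQLite with the same hospital data (a subset of rows; table names are assumptions). CREATE TABLE ... AS SELECT does not declare keys, so in a real schema drug_admin would also be declared the primary key of the new drug table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE surgery_2nf (
    patient_no INTEGER, surgeon_no INTEGER, surgery_date TEXT,
    surgery TEXT, drug_admin TEXT, side_effects TEXT
);
INSERT INTO surgery_2nf VALUES
    (1111, 145, '01-Jan-95', 'Gallstones removal',     'Penicillin',   'rash'),
    (1234, 243, '05-Apr-94', 'Eye Cataract removal',   'Tetracycline', 'Fever'),
    (6845, 243, '05-Apr-94', 'Eye Cornea Replacement', 'Tetracycline', 'Fever'),
    (4876, 145, '05-Nov-95', 'Cholecystectomy',        'Demicillin',   'none');

-- The transitively dependent column moves to a relation keyed by the drug
CREATE TABLE drug AS
    SELECT DISTINCT drug_admin, side_effects FROM surgery_2nf;

-- Surgery keeps only the columns that depend on its own key
CREATE TABLE surgery_3nf AS
    SELECT patient_no, surgeon_no, surgery_date, surgery, drug_admin
    FROM surgery_2nf;
""")

# Tetracycline appeared twice in the 2NF table but is stored once here
drug_count = conn.execute("SELECT COUNT(*) FROM drug").fetchone()[0]
print(drug_count)  # 3
```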

SLIDE 46

Third Normal Form

Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline

SLIDE 47

Third Normal Form

Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever

SLIDE 48

2NF Storage Anomalies Removed

• Insertion: We can now enter the fact that a particular drug has a particular side effect in the Drug relation

• Deletion: If John White receives some other drug as a result of the rash from penicillin, the information on penicillin and rash is still maintained

• Update: The side effects for each drug appear only once

SLIDE 49

Boyce-Codd Normal Form

• Most 3NF relations are also BCNF relations

• A 3NF relation can fail to be in BCNF only if:

– Candidate keys in the relation are composite keys (they are not single attributes),

– There is more than one candidate key in the relation, and

– The keys are not disjoint, that is, some attributes in the keys are common
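Whether a relation is in BCNF can be checked mechanically with attribute closure, using the usual formulation that the left side of every nontrivial functional dependency must be a superkey. The function names are hypothetical, and the student/course/instructor relation is a standard textbook case, not part of the hospital example; it shows exactly the overlapping composite candidate keys described above.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, everything on the right follows
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if the left side of every nontrivial FD is a superkey."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(attributes):
            return False
    return True


patient = ["patient_no", "name", "address"]
print(is_bcnf(patient, [(("patient_no",), ("name", "address"))]))  # True

# 3NF but not BCNF: candidate keys {student, course} and {student, instructor}
# are composite and overlap on student
sci = ["student", "course", "instructor"]
sci_fds = [(("student", "course"), ("instructor",)),
           (("instructor",), ("course",))]
print(is_bcnf(sci, sci_fds))  # False
```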

SLIDE 50

Most 3NF Relations Are Also BCNF – Is This One?

Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY

SLIDE 51

BCNF Relations

Patient # | Patient Name
1111 | John White
1234 | Mary Jones
2345 | Charles Brown
4876 | Hal Kane
5123 | Paul Kosher
6845 | Ann Hood

Patient # | Patient Address
1111 | 15 New St. New York, NY
1234 | 10 Main St. Rye, NY
2345 | Dogwood Lane Harrison, NY
4876 | 55 Boston Post Road, Chester, CN
5123 | Blind Brook Mamaroneck, NY
6845 | Hilton Road Larchmont, NY

SLIDE 52

Fourth Normal Form

• A relation is in Fourth Normal Form if it is in BCNF and every nontrivial multivalued dependency in it has a superkey as its determinant

• Eliminate non-trivial multivalued dependencies by projecting into simpler tables

SLIDE 53

Fifth Normal Form

• A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation

• This implies that relations decomposed in the earlier normal forms can be recombined via natural joins to recreate the original relation
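The lossless-join idea can be illustrated with plain Python, representing relations as lists of dicts. Splitting the Patient relation on its key and rejoining recovers exactly the original rows, while splitting surgeries on a nonkey column produces spurious rows. This is a sketch of the join check only, under those assumptions, not a full 5NF test.

```python
def project(rows, cols):
    """Relational projection: keep only cols and drop duplicate rows."""
    unique = {tuple(r[c] for c in cols) for r in rows}
    return [dict(zip(cols, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared column names."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in shared)]


def as_set(rows):
    """Order-insensitive view of a relation, for comparing results."""
    return {tuple(sorted(r.items())) for r in rows}


patients = [
    {"patient_no": 1111, "name": "John White", "address": "15 New St. New York, NY"},
    {"patient_no": 1234, "name": "Mary Jones", "address": "10 Main St. Rye, NY"},
    {"patient_no": 2345, "name": "Charles Brown", "address": "Dogwood Lane Harrison, NY"},
]
# Split on the key, then rejoin: nothing gained, nothing lost
rejoined = natural_join(project(patients, ("patient_no", "name")),
                        project(patients, ("patient_no", "address")))
print(as_set(rejoined) == as_set(patients))  # True

surgeries = [
    {"patient_no": 1111, "surgeon_no": 145, "surgery_date": "01-Jan-95"},
    {"patient_no": 4876, "surgeon_no": 145, "surgery_date": "05-Nov-95"},
]
# Split on surgeon_no, which is not a key: the join invents two extra rows
lossy = natural_join(project(surgeries, ("patient_no", "surgeon_no")),
                     project(surgeries, ("surgeon_no", "surgery_date")))
print(len(lossy))  # 4
```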

SLIDE 54

Normalizing to Death

• Normalization splits database information across multiple tables

• To retrieve complete information from a normalized database, the JOIN operation must be used

• JOIN tends to be expensive in terms of processing time, and very large joins are very expensive
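To see the cost, the sketch below rebuilds two rows of the original 1NF table from the normalized Patient, Surgeon, Surgery, and Drug relations; even this small report needs a four-way join. Table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_no INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE surgeon (surgeon_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE drug    (drug_admin TEXT PRIMARY KEY, side_effects TEXT);
CREATE TABLE surgery (patient_no INTEGER, surgeon_no INTEGER, surgery_date TEXT,
                      surgery TEXT, drug_admin TEXT);
INSERT INTO patient VALUES (1111, 'John White', '15 New St. New York, NY');
INSERT INTO surgeon VALUES (145, 'Beth Little'), (311, 'Michael Diamond');
INSERT INTO drug    VALUES ('Penicillin', 'rash'), ('none', 'none');
INSERT INTO surgery VALUES
    (1111, 145, '01-Jan-95', 'Gallstones removal', 'Penicillin'),
    (1111, 311, '12-Jun-95', 'Kidney stones removal', 'none');
""")

# Rebuilding rows of the original 1NF table takes a four-way join
report = conn.execute("""
    SELECT p.name, s.name, su.surgery, d.side_effects
    FROM surgery su
    JOIN patient p ON p.patient_no = su.patient_no
    JOIN surgeon s ON s.surgeon_no = su.surgeon_no
    JOIN drug    d ON d.drug_admin = su.drug_admin
    ORDER BY su.surgeon_no
""").fetchall()
for row in report:
    print(row)
```

Indexes on the join columns (the primary keys here) are what keep such joins affordable as the tables grow.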

SLIDE 55

Lecture Overview

• Review

– Databases and Database Design

– Database Life Cycle

– ER Diagrams

• Database Design

• Normalization

• Web-Enabled Databases

SLIDE 56

Overview

• Why use a database system for Web design and e-commerce?

• What systems are available?

• Pros and Cons of different Web database systems?

• Text retrieval in database systems

• Search engines for Intranet and Intrasite searching

SLIDE 57

Why Use a Database System?

• Simple Web sites with only a few pages don’t need much more than static HTML files

SLIDE 58

Simple Web Applications

[Diagram: Clients reach the Web Server over the Internet; the Web Server, on the server machine, returns static Files.]

SLIDE 59

Adding Dynamic Content to the Site

• Small sites can often create dynamic content with simple HTML and CGI scripts that access data files

SLIDE 60

Dynamic Web Applications 1

[Diagram: Clients reach the Web Server over the Internet; on the server, the Web Server hands requests to CGI programs that work with Files.]

SLIDE 61

Issues For Scaling Up Web Applications

• Performance

• Scalability

• Maintenance

• Data integrity

• Transaction support

SLIDE 62

Why Use a Database System?

• Database systems have concentrated on providing solutions for all of these issues for scaling up Web applications

– Performance

– Scalability

– Maintenance

– Data integrity

– Transaction support

• While systems differ in their support, most offer some support for all of these
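Transaction support means a group of updates succeeds or fails as a unit. A minimal sketch with sqlite3, using a hypothetical account table: the connection used as a context manager commits on success and rolls back when a constraint fails partway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
)""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            # Overdrawing src violates the CHECK constraint and raises IntegrityError
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer(conn, "a", "b", 150))  # False: b's credit is rolled back too
print(dict(conn.execute("SELECT id, balance FROM account ORDER BY id")))  # {'a': 100, 'b': 0}
print(transfer(conn, "a", "b", 30))   # True
```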

SLIDE 63

Dynamic Web Applications 2

[Diagram: Clients reach the Web Server over the Internet; on the server, CGI programs work with Files and, through a DBMS, with several databases.]

SLIDE 64

Server Interfaces

Adapted from John P. Ashenfelter, Choosing a Database for Your Web Site

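[Diagram: the path from the Web Server through a Web Application Server (running the Web DB App) to the Database, labeled with the technologies used along the way: HTML, JavaScript, and DHTML; CGI and Web Server APIs; ColdFusion, PHP, Perl, Java, and ASP; SQL; and ODBC, JDBC, and native DB interfaces.]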

SLIDE 65

Web Application Server Software

• ColdFusion

• PHP

• ASP

• All of these are server-side scripting languages that embed code in HTML pages

SLIDE 66

ColdFusion

• Developing WWW sites typically involved a lot of programming to build dynamic sites

– E.g., pages generated as a result of catalog searches, etc.

• ColdFusion was designed to permit the construction of dynamic Web sites with only minor extensions to HTML through a DBMS interface

SLIDE 67

What ColdFusion Is Good For

• Putting up databases onto the Web

• Handling dynamic databases (frequent updates, etc.)

• Making databases searchable and updateable by users

2002.10.17 - SLIDE 68 | IS 202 – FALL 2002

CFML (ColdFusion Markup Language)

• Read data from, and write updates to, databases and tables

• Create dynamic data-driven pages

• Perform conditional processing

• Populate forms with live data

• Process form submissions

• Generate and retrieve email messages

• Perform HTTP and FTP functions

• Perform credit card verification and authorization

• Read and write client-side cookies
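Two of the CFML tasks listed above, populating forms with live data and processing form submissions, can be sketched in Python with the standard library. This assumes a hypothetical contents table with an `id` key and uses parameterized SQL so that submitted values are never pasted into the query text.

```python
import html
import sqlite3
from urllib.parse import parse_qs

def make_cart_db() -> sqlite3.Connection:
    """Hypothetical contents table with an id key so a row can be edited."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contents (id INTEGER PRIMARY KEY, Item TEXT, Price REAL)")
    conn.execute("INSERT INTO contents VALUES (1, 'Shiny Blue Widget', 2.53)")
    conn.commit()
    return conn

def edit_form(conn: sqlite3.Connection, item_id: int) -> str:
    """Populate an HTML form with the row's current (live) values."""
    item, price = conn.execute(
        "SELECT Item, Price FROM contents WHERE id = ?", (item_id,)
    ).fetchone()
    return (
        '<FORM METHOD="POST">\n'
        f'<INPUT TYPE="hidden" NAME="id" VALUE="{item_id}">\n'
        f'<INPUT NAME="Item" VALUE="{html.escape(item)}">\n'
        f'<INPUT NAME="Price" VALUE="{price}">\n'
        '<INPUT TYPE="submit" VALUE="Save">\n'
        "</FORM>\n"
    )

def process_submission(conn: sqlite3.Connection, body: str) -> int:
    """Apply a URL-encoded form submission as a parameterized UPDATE."""
    fields = parse_qs(body)  # e.g. {"Item": ["Shiny Red Widget"], "Price": ["2.75"], ...}
    with conn:  # commit the UPDATE, or roll it back on error
        cur = conn.execute(
            "UPDATE contents SET Item = ?, Price = ? WHERE id = ?",
            (fields["Item"][0], float(fields["Price"][0]), int(fields["id"][0])),
        )
    return cur.rowcount  # rows changed (0 if the id does not exist)

if __name__ == "__main__":
    conn = make_cart_db()
    print(edit_form(conn, 1))
    process_submission(conn, "id=1&Item=Shiny+Red+Widget&Price=2.75")
    print(edit_form(conn, 1))
```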

2002.10.17 - SLIDE 69 | IS 202 – FALL 2002

Templates

• Assume we have a database named contents_of_my_shopping_cart.mdb, with a single table called contents...

• Create an HTML page (using the .cfm extension) and, before <HEAD>, add...

<CFQUERY NAME="cart" DATASOURCE="contents_of_my_shopping_cart">
  SELECT * FROM contents;
</CFQUERY>
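In Python terms, CFQUERY is roughly "run this SQL against a named data source and keep the result rows under a name." The sketch assumes an in-memory SQLite table standing in for the Access .mdb file, filled with the three rows shown on the output slide; `sqlite3.Row` lets code read columns by name, much as the template refers to #Item#.

```python
import sqlite3

def open_cart_datasource() -> sqlite3.Connection:
    """Stand-in for the contents_of_my_shopping_cart data source (SQLite, not Access)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be indexed by column name
    conn.execute("CREATE TABLE contents (Item TEXT, Date_of_item TEXT, Price REAL)")
    conn.executemany(
        "INSERT INTO contents VALUES (?, ?, ?)",
        [
            ("Bouncy Ball with Psychedelic Markings", "12 December 1998", 0.25),
            ("Shiny Blue Widget", "14 December 1998", 2.53),
            ("Large Orange Widget", "14 December 1998", 3.75),
        ],
    )
    conn.commit()
    return conn

# Rough equivalent of <CFQUERY NAME="cart" ...> SELECT * FROM contents </CFQUERY>
cart = open_cart_datasource().execute("SELECT * FROM contents").fetchall()

if __name__ == "__main__":
    for row in cart:
        print(row["Item"], row["Date_of_item"], row["Price"])
```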

2002.10.17 - SLIDE 70 | IS 202 – FALL 2002

Templates (cont.)

<HTML>
<HEAD>
<TITLE>Contents of My Shopping Cart</TITLE>
</HEAD>
<BODY>
<H1>Contents of My Shopping Cart</H1>
<CFOUTPUT QUERY="cart">
  <B>#Item#</B> <BR>
  #Date_of_item# <BR>
  $#Price# <P>
</CFOUTPUT>
</BODY>
</HTML>
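What the server does with the CFOUTPUT block can be approximated in Python: repeat the block once per query row and replace each #Column# placeholder with that row's value, producing the page shown on the next slide. This is a simplified model for illustration, not how the ColdFusion server is implemented, and rows are plain dicts here rather than a live query.

```python
import html
import re

# The body of <CFOUTPUT QUERY="cart"> from the slide, repeated once per row
ROW_TEMPLATE = "<B>#Item#</B> <BR>\n#Date_of_item# <BR>\n$#Price# <P>\n"

PLACEHOLDER = re.compile(r"#(\w+)#")

def render_rows(template: str, rows: list[dict]) -> str:
    """Emit the template once per row, replacing each #Column# with that row's value."""
    def fill(row: dict) -> str:
        # HTML-escaping values is an added safeguard, not part of the slide's template
        return PLACEHOLDER.sub(lambda m: html.escape(str(row[m.group(1)])), template)
    return "".join(fill(row) for row in rows)

def render_page(rows: list[dict]) -> str:
    """Wrap the repeated rows in the static HTML around the CFOUTPUT block."""
    return (
        "<HTML>\n<HEAD>\n<TITLE>Contents of My Shopping Cart</TITLE>\n</HEAD>\n<BODY>\n"
        "<H1>Contents of My Shopping Cart</H1>\n"
        + render_rows(ROW_TEMPLATE, rows)
        + "</BODY>\n</HTML>\n"
    )

if __name__ == "__main__":
    cart = [
        {"Item": "Bouncy Ball with Psychedelic Markings", "Date_of_item": "12 December 1998", "Price": 0.25},
        {"Item": "Shiny Blue Widget", "Date_of_item": "14 December 1998", "Price": 2.53},
        {"Item": "Large Orange Widget", "Date_of_item": "14 December 1998", "Price": 3.75},
    ]
    print(render_page(cart))
```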

2002.10.17 - SLIDE 71 | IS 202 – FALL 2002

Templates (cont.)

Contents of My Shopping Cart

Bouncy Ball with Psychedelic Markings
12 December 1998
$0.25

Shiny Blue Widget
14 December 1998
$2.53

Large Orange Widget
14 December 1998
$3.75

2002.10.17 - SLIDE 72 | IS 202 – FALL 2002

CFIF and CFELSE

<CFOUTPUT QUERY="cart">
  Item: #Item# <BR>
  <CFIF #Picture# EQ "">
    <IMG SRC="generic_picture.jpg"> <BR>
  <CFELSE>
    <IMG SRC="#Picture#"> <BR>
  </CFIF>
</CFOUTPUT>
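The CFIF/CFELSE logic above is a per-row fallback: show the item's picture if it has one, otherwise a generic image. A Python equivalent, assuming rows arrive as dicts and treating a missing or NULL Picture the same as an empty string (an assumption about how empty values arrive):

```python
import html

GENERIC_PICTURE = "generic_picture.jpg"

def picture_block(row: dict) -> str:
    """Mirror the slide's CFIF/CFELSE: fall back to a generic image when Picture is empty."""
    item = html.escape(row["Item"])
    picture = row.get("Picture") or ""  # assumption: None (SQL NULL) counts as empty
    if picture == "":  # <CFIF #Picture# EQ "">
        src = GENERIC_PICTURE
    else:  # <CFELSE>
        src = picture
    return f'Item: {item} <BR>\n<IMG SRC="{html.escape(src)}"> <BR>\n'

def render_cart(rows: list[dict]) -> str:
    """Like <CFOUTPUT QUERY="cart">: apply the block to every row in turn."""
    return "".join(picture_block(row) for row in rows)

if __name__ == "__main__":
    print(render_cart([
        {"Item": "Bouncy Ball with Psychedelic Markings", "Picture": ""},
        {"Item": "Shiny Blue Widget", "Picture": "widget.jpg"},  # hypothetical file name
    ]))
```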

2002.10.17 - SLIDE 73 | IS 202 – FALL 2002

Photo Browser

• The current photo browser uses a combination of
– JavaScript for expandable hierarchies
– A database in MS Access
– ColdFusion to search the database when one of the facets is selected

• The database design for the photo database currently looks like…

2002.10.17 - SLIDE 74 | IS 202 – FALL 2002

Photo Browser ER

2002.10.17 - SLIDE 75 | IS 202 – FALL 2002

Photo Database

• Let's look at the photo database in the Access interface
– Multi-facet queries
– Queries for multiple descriptors in the same facet (harder)
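The "harder" case has a concrete cause: a single row cannot satisfy term = 'Beach' AND term = 'Pier', so a query for several descriptors must combine several rows per photo. One common approach, relational division with GROUP BY ... HAVING COUNT, handles both multi-facet and same-facet requests. The sketch below uses a hypothetical three-table schema and made-up rows in SQLite, not the actual photo database.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical photo/facet schema with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE descriptors (id INTEGER PRIMARY KEY, facet TEXT, term TEXT,
                                  UNIQUE (facet, term));
        CREATE TABLE photo_descriptors (photo_id INTEGER, descriptor_id INTEGER);
    """)
    conn.executemany("INSERT INTO photos VALUES (?, ?)",
                     [(1, "Beach at dusk"), (2, "City street"), (3, "Beach at noon")])
    conn.executemany("INSERT INTO descriptors VALUES (?, ?, ?)",
                     [(1, "Location", "Beach"), (2, "Location", "City"),
                      (3, "Time", "Sunset"), (4, "Time", "Noon"),
                      (5, "Location", "Pier")])
    conn.executemany("INSERT INTO photo_descriptors VALUES (?, ?)",
                     [(1, 1), (1, 3), (1, 5), (2, 2), (2, 3), (3, 1), (3, 4)])
    conn.commit()
    return conn

def photos_matching_all(conn: sqlite3.Connection, wanted: list[tuple[str, str]]) -> list[str]:
    """Titles of photos tagged with every (facet, term) pair in `wanted`."""
    wanted = list(dict.fromkeys(wanted))  # drop duplicate requests so the count is right
    if not wanted:
        return []
    # SQL text is built only from fixed fragments; the values go in as parameters
    conditions = " OR ".join("(d.facet = ? AND d.term = ?)" for _ in wanted)
    params = [value for pair in wanted for value in pair]
    sql = f"""
        SELECT p.title
        FROM photos p
        JOIN photo_descriptors pd ON pd.photo_id = p.id
        JOIN descriptors d ON d.id = pd.descriptor_id
        WHERE {conditions}
        GROUP BY p.id, p.title
        HAVING COUNT(DISTINCT d.id) = ?  -- matched every requested descriptor
        ORDER BY p.title
    """
    return [row[0] for row in conn.execute(sql, params + [len(wanted)])]

if __name__ == "__main__":
    conn = build_demo_db()
    print(photos_matching_all(conn, [("Location", "Beach"), ("Time", "Sunset")]))    # ['Beach at dusk']
    print(photos_matching_all(conn, [("Location", "Beach"), ("Location", "Pier")]))  # ['Beach at dusk']
```

The WHERE clause keeps any row matching at least one requested descriptor; the HAVING clause then keeps only photos that matched all of them.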

2002.10.17 - SLIDE 76 | IS 202 – FALL 2002

Assignment 7 (Database Design)

• Involves
– Examining a Web site that (probably) uses a DBMS for e-commerce to sell books
– Inferring the structure and kinds of entities and attributes used in that site (book info only)
– Creating your own design, using ER diagrams that show the entities and relationships you inferred

2002.10.17 - SLIDE 77 | IS 202 – FALL 2002

Next Week

• Introduction to Information Retrieval