7-1 Chapter 7 The Relational Data Model, Relational Constraints, and the Relational Algebra

Upload: holly-andrews

Post on 18-Jan-2018


7-1

Chapter 7

The Relational Data Model, Relational Constraints,

and the Relational Algebra

7-2

Outline

• Relational Model Concepts
• Characteristics of Relations
• Relational Integrity Constraints
  – Key Constraints
  – Entity Integrity Constraints
  – Referential Integrity Constraints
• Update Operations on Relations
• Relational Algebra Operations

7-3

Outline (Continued)

• Relational Algebra Operations
  – SELECT and PROJECT
  – Set Operations
  – JOIN Operations
  – Additional Relational Operations

6-2.1 7-4

1 Relational Model Concepts

Database: a collection of relations.

Relation (informally): A table of values. Each column of the table has a column header called an attribute. Each row is called a tuple and represents an entity or a relationship.

Formal Relational Concepts:

-- Domain: A set of atomic (indivisible) values. A domain is specified by a domain name and a data type (format).

6-2.2 7-5

Continued

-- Attribute: A name that suggests the meaning (role) a domain plays in a particular relation. Each attribute Ai has a domain dom(Ai).
   E.g., the domain Names (the set of names of persons) can be used by the attributes EMPNAME and MGRNAME.

-- Relation schema (intension): A relation name R and a set of attributes Ai that define the relation.
   Denoted by: R(A1, A2, …, An), where R is the relation name and A1, A2, …, An are the attributes.
   Example: STUDENT(Name, SSN, BirthDate, Addr)

6-2.3 7-6

-- Degree of a relation: Its number of attributes n (n = 4 for the STUDENT schema above).

-- Tuple t (of R(A1, A2, …, An)): An ordered list of values t = <v1, v2, …, vn>, where each value vi is an element of dom(Ai). Also called an n-tuple.

-- Relation instance r(R) (extension, or state): A set of tuples, r(R) = {t1, t2, …, tm}, or alternatively r(R) ⊆ dom(A1) × dom(A2) × … × dom(An).

Continued

6-3 7-7

Figure 7.1 The attributes and tuples of a relation STUDENT

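Annotations on the figure:
* The relation represents a student entity.
* Attributes drawn from the same domain can play different roles.
* A null can mean: the value is unknown, the attribute does not apply to this tuple, or this tuple has no value for this attribute.
* degree = 7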

6-4 7-8

2 Characteristics of Relations

Ordering of tuples in a relation r(R):
The tuples are not considered to be ordered, even though they appear to be in the tabular form (no order; cf. a sequential file).

Ordering of attributes in a relation schema R (and of values within each tuple):
We will consider the attributes in R(A1, A2, …, An) and the values in t = <v1, v2, …, vn> to be ordered.
(However, a more general alternative definition of relation does not require this ordering: a tuple is then a set of (<attr>, <value>) pairs rather than an ordered list of values t = <v1, …, vn>.)

6-4a 7-9

Values in a tuple: All values are considered atomic (indivisible). This is the first normal form assumption; it raises the question of how to map composite and multivalued attributes.
A special null value is used to represent values that are unknown or inapplicable to certain tuples.

Interpretation: a relation (entity or relationship) can be read as an assertion, or predicate; each tuple states a fact.

Notation:
-- We refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t).
-- Similarly, t[Au, Av, …, Aw] refers to the subtuple of t containing the values of attributes Au, Av, …, Aw, respectively.

Example:
t = <‘Barbara Benson’, ‘533-69-1238’, ‘839-8461’, ‘7384 Fontana Lane’, null, 19, 3.25>
t[Name] = <‘Barbara Benson’>
t[SSN, GPA, Age] = <‘533-69-1238’, 3.25, 19>
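The subtuple notation t[Au, Av, …, Aw] can be sketched in a few lines of Python. This is only an illustration: the attribute order in `STUDENT_ATTRS` is an assumption (the transcript gives the degree, 7, but not the attribute names of Figure 7.1), and `subtuple` is a hypothetical helper name.

```python
# Assumed attribute order for the STUDENT relation (not given in the transcript).
STUDENT_ATTRS = ("Name", "SSN", "HomePhone", "Address", "OfficePhone", "Age", "GPA")


def subtuple(t, attrs, *names):
    """Return t[names...] as a tuple, in the order the names are requested."""
    return tuple(t[attrs.index(name)] for name in names)


# The example tuple from the slide; None stands for the null value.
t = ("Barbara Benson", "533-69-1238", "839-8461", "7384 Fontana Lane", None, 19, 3.25)

name = subtuple(t, STUDENT_ATTRS, "Name")
ssn_gpa_age = subtuple(t, STUDENT_ATTRS, "SSN", "GPA", "Age")
```

Note that the result follows the order of the requested attributes, not their order in the schema, which is why GPA precedes Age in the second call.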

6-5 7-10

3 Relational Integrity Constraints

Constraints are conditions that must hold on all valid relation instances. There are four main types of constraints: domain constraints, key constraints, entity integrity constraints, and referential integrity constraints. (Other kinds exist as well, e.g. functional dependencies.)

Domain Constraints: the value of each attribute A must be an element of the domain of A: v(A) ∈ dom(A).

6-5a 7-11

3.1 Key Constraints

Superkey of R: A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]. The set of all attributes of R always forms a superkey.

Key of R: A “minimal” superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey.

Example: The CAR relation schema
CAR(State, Reg#, SerialNo, Make, Model, Year)
has two keys, Key1 = {State, Reg#} and Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a superkey but not a key.

A relation schema may have more than one key. If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.
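The superkey and key definitions can be tried out on a single relation instance. The sketch below is only a test against one instance: a real key constraint is a property of the schema and must hold for every valid instance. The function names and the CAR rows are made up for illustration.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """True if attrs is a superkey of this instance and no proper subset is."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


# Illustrative CAR tuples (invented values); State and Reg# each repeat on their own.
cars = [
    {"State": "TX", "Reg#": "ABC-739", "SerialNo": "A69352", "Make": "Ford"},
    {"State": "FL", "Reg#": "TVP-347", "SerialNo": "B43696", "Make": "Oldsmobile"},
    {"State": "NY", "Reg#": "ABC-739", "SerialNo": "X83554", "Make": "Oldsmobile"},
    {"State": "TX", "Reg#": "RSK-629", "SerialNo": "U02836", "Make": "Toyota"},
]

key1_ok = is_key(cars, ("State", "Reg#"))
key2_ok = is_key(cars, ("SerialNo",))
superkey_only = is_superkey(cars, ("SerialNo", "Make")) and not is_key(cars, ("SerialNo", "Make"))
```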

6-6 7-12

3.2 Entity Integrity

Relational Database Schema: A set S of relation schemas that belong to the same database, S = {R1, R2, …, Rn}, and a set of integrity constraints IC. S is the name of the database.

Relational database instance: DB = {r1, r2, …, rn} such that the ri satisfy IC (see 7-13, 7-14).

Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples.

t[PK] ≠ null for any tuple t in r(R)

6-7 7-13

Figure 7.5 The COMPANY relational database schema; primary keys are underlined.

Attributes that represent the same real-world concept may or may not have identical names in different relations.

Attributes that represent different concepts may have the same name in different relations.

(to 7-16)

6-8 7-14

Figure 7.6 A relational database instance (state) of COMPANY schema

6-8 7-15

Continued

6-6a 7-16

Note: Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.

Key Constraints and Entity Integrity Constraints are specified on individual relations.

6-9 7-17

3.3 Referential Integrity

A tuple in one relation that refers to another relation must refer to an existing tuple in it.

- A constraint involving two relations (the previous constraints involve a single relation). It arises from relationships among entities, e.g. EMPLOYEE(DNo) → DEPARTMENT(DNUMBER).

- Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation.

- Tuples in the referencing relation R1 (e.g. EMPLOYEE) have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2 (e.g. DEPARTMENT). FK and PK have the same domain.

- A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK]. FK can be null.
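A referential integrity check can be sketched as a search for referencing tuples whose foreign key is neither null nor present among the referenced primary key values. The helper name `fk_violations` and the sample rows are illustrative assumptions, not part of the slides.

```python
def fk_violations(referencing, fk, referenced, pk):
    """Return the referencing rows whose non-null FK value matches no PK value."""
    pk_values = {row[pk] for row in referenced}
    return [
        row for row in referencing
        if row[fk] is not None and row[fk] not in pk_values
    ]


# Illustrative data: DEPARTMENT is the referenced relation, EMPLOYEE the referencing one.
department = [{"DNUMBER": 1}, {"DNUMBER": 4}, {"DNUMBER": 5}]
employee = [
    {"SSN": "123456789", "DNO": 5},
    {"SSN": "999887777", "DNO": 4},
    {"SSN": "677678989", "DNO": None},  # a null FK is allowed
    {"SSN": "111111111", "DNO": 7},     # made-up row: there is no department 7
]

bad = fk_violations(employee, "DNO", department, "DNUMBER")
```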

6-9a 7-18

A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2 (see 7-19).

[Diagram: an arc from FK in R1 (the referencing relation) to PK in R2 (the referenced relation)]

Semantic Integrity Constraints

˙ the salary of an employee should not exceed the salary of his boss
˙ the number of work hours should not exceed the maximum specified in labor law

6-10 7-19

Note: a foreign key can refer to its own relation (e.g. SUPERSSN in EMPLOYEE references EMPLOYEE).

(back to 7-18)

6-11 7-20

4 Update Operations on Relations

Operations on relations are either retrievals or updates. The update operations are:

-- INSERT a tuple.
-- DELETE a tuple.
-- MODIFY a tuple.

-- Integrity constraints should not be violated by the update operations.

-- Several update operations may have to be grouped together.

-- Updates may propagate to cause other updates automatically. This may be necessary to maintain integrity constraints.

-- In case of integrity violation, several actions can be taken:
   - cancel the operation that causes the violation
   - perform the operation but inform the user of the violation
   - trigger additional updates so the violation is corrected
   - execute a user-specified error-correction routine

6-11a 7-21

Insert operation

An insertion can violate any of the four types of constraints:

Domain Constraint: if an attribute value is given that does not appear in the corresponding domain.

Key Constraint: if a key value in the new tuple t already exists in another tuple in the relation.

Entity Integrity: if the primary key of the new tuple t is null.

Referential Integrity: if the value of any foreign key in t refers to a tuple that does not exist in the referenced relation.
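The four checks for an insertion can be sketched as one function that reports which constraints a new tuple would violate. Everything here is a simplification under stated assumptions: a single-attribute primary key, domains given as Python predicates, and foreign keys given as the set of values they may reference. `check_insert` and the reduced two-attribute EMPLOYEE rows are hypothetical.

```python
def check_insert(new, relation, pk, domains, foreign_keys):
    """Return the names of the constraints that inserting `new` would violate.

    relation     -- list of dict rows already stored
    pk           -- name of the (single-attribute) primary key
    domains      -- dict: attribute -> predicate that accepts valid non-null values
    foreign_keys -- dict: attribute -> set of primary key values it may reference
    """
    violations = []
    if any(v is not None and not domains[a](v) for a, v in new.items()):
        violations.append("domain")
    if new[pk] is None:
        violations.append("entity integrity")
    elif any(row[pk] == new[pk] for row in relation):
        violations.append("key")
    if any(new[a] is not None and new[a] not in allowed
           for a, allowed in foreign_keys.items()):
        violations.append("referential integrity")
    return violations


# Reduced EMPLOYEE relation (only SSN and DNO) with illustrative contents.
employee = [{"SSN": "999887777", "DNO": 4}, {"SSN": "987654321", "DNO": 4}]
domains = {
    "SSN": lambda v: isinstance(v, str) and len(v) == 9,
    "DNO": lambda v: isinstance(v, int),
}
fks = {"DNO": {1, 4, 5}}  # assumed existing DEPARTMENT numbers

r1 = check_insert({"SSN": "677678989", "DNO": 4}, employee, "SSN", domains, fks)
r2 = check_insert({"SSN": "999887777", "DNO": 4}, employee, "SSN", domains, fks)
r3 = check_insert({"SSN": None, "DNO": 4}, employee, "SSN", domains, fks)
r4 = check_insert({"SSN": "677678989", "DNO": 7}, employee, "SSN", domains, fks)
```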

6-12 7-22

1. Insert <‘Cecilia’, ‘F’, ‘Kolowsky’, ‘677678989’, ‘05-APR-50’, ‘6357 Windy Lane, Katy, TX’, F, 28000, null, 4> into EMPLOYEE.
   Acceptable.

2. Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘05-APR-50’, ‘6357 Windy Lane, Katy, TX’, F, 28000, ‘987654321’, 4> into EMPLOYEE.
   Violates the key constraint: a tuple with SSN ‘999887777’ already exists.

3. Insert <‘Cecilia’, ‘F’, ‘Kolowsky’, null, ‘05-APR-50’, ‘6357 Windy Lane, Katy, TX’, F, 28000, null, 4> into EMPLOYEE.
   Violates the entity integrity constraint: the primary key SSN is null.

4. Insert <‘Cecilia’, ‘F’, ‘Kolowsky’, ‘677678989’, ‘05-APR-50’, ‘6357 Windswept, Katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE.
   Violates the referential integrity constraint: no DEPARTMENT tuple has DNUMBER = 7.

(See 7-14)

[Diagram: EMPLOYEE (SSN, …, SUPERSSN, DNo) and DEPARTMENT (DNUM, …, MGRSSN)]

6-13 7-23

If an insertion violates a constraint, two options are available:

1. Reject the insertion.

2. Correct the reason for rejecting the insertion:
   - In (3): provide an acceptable SSN.
   - In (4): change the value of DNo, or insert a DEPARTMENT tuple with DNUMBER = 7 (this may cascade back to the EMPLOYEE relation, since the MGRSSN of the new DEPARTMENT tuple references EMPLOYEE).

6-14 7-24

Delete operation

Only the referential integrity constraint may be violated by a deletion; the domain, key, and entity integrity constraints cannot be.

1. Delete the WORKS_ON tuple with ESSN=‘999887777’ and PNO=10.
   Acceptable.

2. Delete the EMPLOYEE tuple with SSN=‘999887777’.
   Unacceptable: two tuples in WORKS_ON refer to this tuple.

3. Delete the EMPLOYEE tuple with SSN=‘333445555’.
   Unacceptable: the tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT, WORKS_ON and DEPENDENT relations.

6-14a 7-25

(back to 7-24)

6-15 7-26

If a deletion causes a violation, three options are available:

1. Reject the deletion.

2. Attempt to cascade (or propagate) the deletion. E.g., delete the two offending WORKS_ON tuples in (2).

3. Modify the referencing attribute values that cause the violation: change them to reference another valid tuple, or set them to null.

When a referential integrity constraint is specified, the DBMS should allow the users to specify which of the three options applies in case of a violation of the constraint.

The three alternatives can also be combined. E.g., in operation (3):
- referencing tuples in WORKS_ON and DEPENDENT => delete automatically
- referencing tuples in EMPLOYEE => set to null or change to another tuple
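The options for handling a deletion can be sketched as a policy argument. In this simplified illustration only two referencing relations are modelled: WORKS_ON rows are cascaded (their ESSN is part of the primary key, so it could not be set to null), while SUPERSSN references inside EMPLOYEE are set to null. The function name, the policy names, and the sample rows are assumptions made for the example.

```python
def delete_employee(ssn, employee, works_on, policy="reject"):
    """Delete the EMPLOYEE row with the given SSN; return True if it was deleted.

    policy "reject": refuse the deletion if any tuple references the employee.
    policy "repair": cascade to WORKS_ON and set referencing SUPERSSN values to null.
    """
    if policy not in ("reject", "repair"):
        raise ValueError("unknown policy: " + policy)
    referenced = (any(w["ESSN"] == ssn for w in works_on)
                  or any(e["SUPERSSN"] == ssn for e in employee))
    if referenced and policy == "reject":
        return False
    # Cascade: remove WORKS_ON rows that reference the deleted employee.
    works_on[:] = [w for w in works_on if w["ESSN"] != ssn]
    # Set to null: supervisees keep their rows but lose the reference.
    for e in employee:
        if e["SUPERSSN"] == ssn:
            e["SUPERSSN"] = None
    employee[:] = [e for e in employee if e["SSN"] != ssn]
    return True


# Illustrative rows only.
employee = [
    {"SSN": "333445555", "SUPERSSN": "888665555"},
    {"SSN": "123456789", "SUPERSSN": "333445555"},
]
works_on = [{"ESSN": "333445555", "PNO": 2}, {"ESSN": "123456789", "PNO": 1}]

rejected = delete_employee("333445555", employee, works_on)             # policy "reject"
deleted = delete_employee("333445555", employee, works_on, "repair")
```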

6-16 7-27

Modify operation

1. Modify the SALARY of the EMPLOYEE tuple with SSN=‘999887777’ to 28000. Acceptable

2. Modify the DNO of the EMPLOYEE tuple with SSN=‘999887777’ to 1. Acceptable

3. Modify the DNO of the EMPLOYEE tuple with SSN=‘999887777’ to 7. Unacceptable (Violate referential integrity)

4. Modify the SSN of the EMPLOYEE tuple with SSN=‘999887777’ to ‘987654321’. Unacceptable (Violate primary key and referential integrity constraints)

6-16a 7-28


6-17 7-29

Modify attributes other than primary key and foreign key:

check correct data type & domain

Modify primary key:

check domain constraint, key constraint, entity integrity constraint, referential integrity constraint.

Modify foreign key:

check referential integrity constraint

6-18 7-30

Defining Relations

- Deciding which attributes belong together in each relation.
- Choosing appropriate names for the relations and their attributes.
- Specifying the domains and data types of the various attributes.
- Identifying the candidate keys and choosing a primary key for each relation.
- Specifying all foreign keys.

6-19 7-31

Name of the relational database schema:

DECLARE SCHEMA COMPANY;

Domain names and data types:

DECLARE DOMAIN PERSON_SSNS TYPE FIXED_CHAR (9);
DECLARE DOMAIN PERSON_NAMES TYPE VARIABLE_CHAR (15);
DECLARE DOMAIN PERSON_INITIALS TYPE ALPHABETIC_CHAR (1);
DECLARE DOMAIN DATES TYPE DATE;
DECLARE DOMAIN ADDRESSES TYPE VARIABLE_CHAR (35);
DECLARE DOMAIN PERSON_SEX TYPE ENUMERATED {M, F};
DECLARE DOMAIN PERSON_SALARIES TYPE MONEY;
DECLARE DOMAIN DEPT_NUMBERS TYPE INTEGER_RANGE [1,10];
DECLARE DOMAIN DEPT_NAMES TYPE VARIABLE_CHAR (20);

Relations:

DECLARE RELATION EMPLOYEE
FOR SCHEMA COMPANY
ATTRIBUTES FNAME DOMAIN PERSON_NAMES,
           MINIT DOMAIN PERSON_INITIALS,
           LNAME DOMAIN PERSON_NAMES,
           SSN DOMAIN PERSON_SSNS,

6-19a 7-32

           BDATE DOMAIN DATES,
           ADDRESS DOMAIN ADDRESSES,
           SEX DOMAIN PERSON_SEX,
           SALARY DOMAIN PERSON_SALARIES,
           SUPERSSN DOMAIN PERSON_SSNS,
           DNO DOMAIN DEPT_NUMBERS
CONSTRAINTS PRIMARY_KEY (SSN),
            FOREIGN_KEY (SUPERSSN) REFERENCES EMPLOYEE,
            FOREIGN_KEY (DNO) REFERENCES DEPARTMENT;

DECLARE RELATION DEPARTMENT
FOR SCHEMA COMPANY
ATTRIBUTES DNAME DOMAIN DEPT_NAMES,
           DNUMBER DOMAIN DEPT_NUMBERS,
           MGRSSN DOMAIN PERSON_SSNS,
           MGRSTARTDATE DOMAIN DATES
CONSTRAINTS PRIMARY_KEY (DNUMBER),
            KEY (DNAME),
            FOREIGN_KEY (MGRSSN) REFERENCES EMPLOYEE;

6-20 7-33

5 The Relational Algebra

- Operations to manipulate relations.
- Used to specify retrieval requests (queries).
- Query result is in the form of a relation.

Relational Operations:

5.1 SELECT (σ) and PROJECT (Π) operations.
5.2 Set operations: These include UNION (∪), INTERSECTION (∩), DIFFERENCE (−), CARTESIAN PRODUCT (×).
5.3 JOIN operations (⋈).
5.4 Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.

The operations in 5.2 are set operations; the others are specific to relational databases.

6-21 7-34

5.1 SELECT (σ) and PROJECT (Π)

SELECT operation (denoted by σ), also called selection:

- Selects the tuples (rows) from a relation R that satisfy a certain selection condition c.
- Form of the operation: σ_c(R)
- The condition c is an arbitrary Boolean expression on the attributes of R, built from clauses of the form
    <attr. name> <comp. op> <constant value>
    <attr. name> <comp. op> <attr. name>
  where <comp. op> is one of =, <, ≤, >, ≥, ≠, and clauses are combined with AND, OR, NOT.
- The resulting relation has the same attributes as R.
- The resulting relation includes each tuple in r(R) whose attribute values satisfy the condition c.

6-21 7-35

Continued

Examples:

To select the subset of EMPLOYEE tuples who work in department 4:
σ_{DNO=4}(EMPLOYEE)

To select the subset of EMPLOYEE tuples whose salary is greater than 30000:
σ_{SALARY>30000}(EMPLOYEE)

To select tuples for all employees who either work in department 4 and make over $25000 per year, or work in department 5:
σ_{(DNO=4 AND SALARY>25000) OR DNO=5}(EMPLOYEE)
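The three selections above can be mimicked in Python by treating a relation as a list of rows and the condition as a Boolean function on a row. This is a minimal sketch; `select` is a hypothetical helper and the EMPLOYEE rows are invented for the example.

```python
def select(relation, condition):
    """σ_condition(relation): keep the rows for which condition(row) is true."""
    return [row for row in relation if condition(row)]


# Illustrative EMPLOYEE rows (made-up values).
employee = [
    {"LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"LNAME": "Zelaya", "DNO": 4, "SALARY": 25000},
    {"LNAME": "Wallace", "DNO": 4, "SALARY": 43000},
    {"LNAME": "Borg", "DNO": 1, "SALARY": 55000},
]

dept4 = select(employee, lambda r: r["DNO"] == 4)
high_paid = select(employee, lambda r: r["SALARY"] > 30000)
mixed = select(
    employee,
    lambda r: (r["DNO"] == 4 and r["SALARY"] > 25000) or r["DNO"] == 5,
)
```

The result rows keep all attributes of the operand, as the definition requires; only the number of rows changes.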

6-21a 7-36

The SELECT operation is commutative:

σ_<cond 1>(σ_<cond 2>(R)) = σ_<cond 2>(σ_<cond 1>(R))

A cascade of SELECT operations can be combined into a single SELECT operation with a conjunction:

σ_<cond 1>(σ_<cond 2>(…(σ_<cond n>(R))…)) = σ_{<cond 1> AND <cond 2> AND … AND <cond n>}(R)

6-22 7-37

PROJECT operation (denoted by Π), also called projection:

- Keeps only certain attributes (columns) from a relation R, specified in an attribute list L.
- Form of the operation: Π_L(R)
- The resulting relation has only those attributes of R specified in L. Its degree is equal to the number of attributes in L.

Example: List each employee’s first and last names and salary.

Π_{FNAME,LNAME,SALARY}(EMPLOYEE)

6-22 7-38

Continued

- The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains a mathematical set (no duplicate elements). This is called duplicate elimination.

Example: Π_{SEX,SALARY}(EMPLOYEE)

If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation.

- Consequently: # of tuples in the resulting relation ≤ # of tuples in the original relation.
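Duplicate elimination is the part of PROJECT that is easiest to forget when moving from the algebra to code. The sketch below keeps the first occurrence of each projected value combination; `project` is a hypothetical helper and the rows are illustrative.

```python
def project(relation, attrs):
    """Π_attrs(relation): keep only attrs and drop duplicate result rows."""
    result = []
    seen = set()
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:  # duplicate elimination
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Illustrative rows: two male employees share the same salary.
employee = [
    {"FNAME": "John", "SEX": "M", "SALARY": 30000},
    {"FNAME": "Ahmad", "SEX": "M", "SALARY": 30000},
    {"FNAME": "Alicia", "SEX": "F", "SALARY": 25000},
]

sex_salary = project(employee, ["SEX", "SALARY"])
```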

6-22a 7-39

The PROJECT operation is not commutative:

Π_<list 1>(Π_<list 2>(R)) ≠ Π_<list 2>(Π_<list 1>(R)) in general

Π_<list 1>(Π_<list 2>(R)) = Π_<list 1>(R) when <list 1> ⊆ <list 2>

6-23 7-40

Sequences of operations:

-- Several operations can be combined to form a relational algebra expression (query).

Example: Retrieve the names and salaries of employees who work in department 4:

Π_{FNAME,LNAME,SALARY}(σ_{DNO=4}(EMPLOYEE))

-- Alternatively, we specify explicit intermediate relations for each step:

DEPT4_EMPS ← σ_{DNO=4}(EMPLOYEE)
R ← Π_{FNAME,LNAME,SALARY}(DEPT4_EMPS)

6-23a 7-41

Continued

Rename the attributes.

-- Attributes can optionally be renamed in the resulting left-hand-side relation (this may be required for some operations that will be presented later, e.g. UNION, JOIN):

DEPT4_EMPS ← σ_{DNO=4}(EMPLOYEE)
R(FIRSTNAME, LASTNAME, SALARY) ← Π_{FNAME,LNAME,SALARY}(DEPT4_EMPS)

-- Without renaming, the resulting relation has the same attribute names as the operand.

6-24 7-42

5.2 Set Operations

A relation is a set of tuples, so set operations apply.

- Binary operations from mathematical set theory:
  UNION: R1 ∪ R2
  INTERSECTION: R1 ∩ R2
  SET DIFFERENCE: R1 − R2
  CARTESIAN PRODUCT: R1 × R2

- For ∪, ∩, and −, the operand relations R1(A1, A2, …, An) and R2(B1, B2, …, Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai) = dom(Bi) for i = 1, 2, …, n. This condition is called union compatibility.

- The resulting relation for ∪, ∩, or − has the same attribute names as the first operand R1 (by convention).

6-25 7-43

Figure 7.11 Two union compatible relations

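Panels: STUDENT ∪ INSTRUCTOR, STUDENT ∩ INSTRUCTOR, STUDENT − INSTRUCTOR, INSTRUCTOR − STUDENT

Properties (see 7-42):
- UNION and INTERSECTION are commutative: R ∪ S = S ∪ R, R ∩ S = S ∩ R
- UNION and INTERSECTION are associative: R ∪ (S ∪ T) = (R ∪ S) ∪ T, (R ∩ S) ∩ T = R ∩ (S ∩ T)
- SET DIFFERENCE is not commutative: in general R − S ≠ S − R, e.g. STUDENT − INSTRUCTOR ≠ INSTRUCTOR − STUDENT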

6-26 7-44

CARTESIAN PRODUCT (CROSS PRODUCT, CROSS JOIN)

R(A1, A2, …, Am, B1, B2, …, Bn) ← R1(A1, A2, …, Am) × R2(B1, B2, …, Bn)

- R1 has m attributes and nR1 tuples; R2 has n attributes and nR2 tuples; R has m + n attributes and nR1 × nR2 tuples.

- A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2 such that t[A1, A2, …, Am] = t1 and t[B1, B2, …, Bn] = t2.

- If R1 has n1 tuples and R2 has n2 tuples, then R will have n1 * n2 tuples.

6-26a 7-45

- CARTESIAN PRODUCT is a meaningless operation on its own. It can combine related tuples from two relations if followed by the appropriate SELECT operation (=> JOIN).

Example: Combine each DEPARTMENT tuple with the EMPLOYEE tuple of the manager.

DEP_EMP ← DEPARTMENT × EMPLOYEE
DEPT_MANAGER ← σ_{MGRSSN=SSN}(DEP_EMP)

6-27 7-46

5.3 JOIN Operations

THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a SELECT. The condition c is called a join condition.

R(A1, A2, …, Am, B1, B2, …, Bn) ← R1(A1, A2, …, Am) ⋈_c R2(B1, B2, …, Bn)

- R1 has m attributes and nR1 tuples; R2 has n attributes and nR2 tuples; R has m + n attributes and at most nR1 × nR2 tuples.
- c: <cond> AND … AND <cond>
- cond: Ai θ Bj, where θ ∈ {=, <, ≤, >, ≥, ≠}

6-27a 7-47

EQUIJOIN: The join condition c includes one or more equality comparisons involving attributes from R1 and R2. That is, c is of the form:

(Ai = Bj) AND … AND (Ah = Bk); 1 ≤ i, h ≤ m, 1 ≤ j, k ≤ n

In the above EQUIJOIN operation:
- Ai, …, Ah are called the join attributes of R1
- Bj, …, Bk are called the join attributes of R2

6-27b 7-48

Example of using EQUIJOIN:

Retrieve each DEPARTMENT’s name and its manager’s name:

T ← DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE
RESULT ← Π_{DNAME,FNAME,LNAME}(T)

A CARTESIAN PRODUCT followed by a SELECT, such as

EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ_{SSN=ESSN}(EMP_DEPENDENTS)

can be replaced by a single JOIN:

ACTUAL_DEPENDENTS ← EMPNAMES ⋈_{SSN=ESSN} DEPENDENT
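A join can be sketched directly from its definition as a Cartesian product filtered by the join condition. The helper `theta_join` is hypothetical; it assumes the two relations have no attribute names in common, and the sample rows are a small illustrative subset.

```python
def theta_join(r1, r2, condition):
    """R1 ⋈_c R2: pair every row of r1 with every row of r2, keep pairs where c holds."""
    return [{**a, **b} for a in r1 for b in r2 if condition(a, b)]


# Illustrative rows; attribute names of the two relations are disjoint.
department = [
    {"DNAME": "Research", "MGRSSN": "333445555"},
    {"DNAME": "Administration", "MGRSSN": "987654321"},
]
employee = [
    {"SSN": "333445555", "LNAME": "Wong"},
    {"SSN": "987654321", "LNAME": "Wallace"},
    {"SSN": "123456789", "LNAME": "Smith"},
]

# Equijoin on MGRSSN = SSN, followed by a projection on DNAME and LNAME.
t = theta_join(department, employee, lambda d, e: d["MGRSSN"] == e["SSN"])
result = [(row["DNAME"], row["LNAME"]) for row in t]

# With a condition that is always true the join degenerates to the Cartesian product.
product = theta_join(department, employee, lambda d, e: True)
```

A nested loop like this touches all nR1 × nR2 pairs; it is meant to mirror the definition, not to suggest how a DBMS evaluates joins.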

6-28 7-49

NATURAL JOIN (*):

In an EQUIJOIN R ← R1 ⋈_c R2, the join attribute of R2 appears redundantly in the result relation R. In a NATURAL JOIN, the redundant join attributes of R2 are eliminated from R. The equality condition is implied and need not be specified.

R ← R1 *_{(join attributes of R1),(join attributes of R2)} R2

If R1 has nR1 tuples and R2 has nR2 tuples, the number of tuples nR in the result satisfies 0 ≤ nR ≤ nR1 × nR2.

Example: Retrieve each EMPLOYEE’s name and the name of the DEPARTMENT he/she works for:

T ← EMPLOYEE *_{(DNO),(DNUMBER)} DEPARTMENT
RESULT ← Π_{FNAME,LNAME,DNAME}(T)

6-28a 7-50

If the join attributes have the same names in both relations, they need not be specified and we can write R ← R1 * R2.

Example: Retrieve each EMPLOYEE’s name and the name of his/her SUPERVISOR:

SUPERVISOR(SUPERSSN, SFN, SLN) ← Π_{SSN,FNAME,LNAME}(EMPLOYEE)
T ← EMPLOYEE * SUPERVISOR
RESULT ← Π_{FNAME,LNAME,SFN,SLN}(T)
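The supervisor query can be sketched with a natural join that matches on whatever attribute names the two relations share and keeps a single copy of them. `natural_join` is a hypothetical helper, and the three EMPLOYEE rows are a small illustrative subset.

```python
def natural_join(r1, r2):
    """R1 * R2: equijoin on all shared attribute names, keeping one copy of each."""
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[c] == b[c] for c in common)
    ]


employee = [
    {"SSN": "888665555", "FNAME": "James", "LNAME": "Borg", "SUPERSSN": None},
    {"SSN": "333445555", "FNAME": "Franklin", "LNAME": "Wong", "SUPERSSN": "888665555"},
    {"SSN": "123456789", "FNAME": "John", "LNAME": "Smith", "SUPERSSN": "333445555"},
]

# SUPERVISOR(SUPERSSN, SFN, SLN) ← Π_{SSN,FNAME,LNAME}(EMPLOYEE), with renaming.
supervisor = [
    {"SUPERSSN": e["SSN"], "SFN": e["FNAME"], "SLN": e["LNAME"]} for e in employee
]

t = natural_join(employee, supervisor)  # the only shared attribute is SUPERSSN
result = [(r["FNAME"], r["LNAME"], r["SFN"], r["SLN"]) for r in t]
```

The employee with a null SUPERSSN drops out because no SUPERVISOR row carries a null there. In general Python's `None == None` is true, so a fuller version would have to exclude nulls from matching explicitly.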

6-29 7-51

Note: In the original definition of NATURAL JOIN, the join attributes were required to have the same names in both relations.

There can be more than one set of join attributes, each with a different meaning, between the same two relations. For example:

JOIN ATTRIBUTES                        RELATIONSHIP
EMPLOYEE.SSN = DEPARTMENT.MGRSSN       EMPLOYEE manages the DEPARTMENT
EMPLOYEE.DNO = DEPARTMENT.DNUMBER      EMPLOYEE works for the DEPARTMENT

Example: Retrieve each EMPLOYEE’s name and the name of the DEPARTMENT he/she works for:

T ← EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
RESULT ← Π_{FNAME,LNAME,DNAME}(T)

6-30 7-52

A relation can have a set of join attributes to join it with itself:

JOIN ATTRIBUTES                              RELATIONSHIP
EMPLOYEE(1).SUPERSSN = EMPLOYEE(2).SSN       EMPLOYEE(2) supervises EMPLOYEE(1)

- One can think of this as joining two distinct copies of the relation, although only one relation actually exists.
- In this case, renaming can be useful.

Example: Retrieve each EMPLOYEE’s name and the name of his/her SUPERVISOR:

SUPERVISOR(SSSN, SFN, SLN) ← Π_{SSN,FNAME,LNAME}(EMPLOYEE)
T ← EMPLOYEE ⋈_{SUPERSSN=SSSN} SUPERVISOR
RESULT ← Π_{FNAME,LNAME,SFN,SLN}(T)

6-31 7-53

DIVISION Operation

T(Y) ← R(Z) ÷ S(X), where X ⊆ Z and Y = Z − X

The result is a relation T(Y) that includes a tuple t if tuples tR with tR[Y] = t appear in R with tR[X] = tS for every tuple tS in S.

DIVISION can be expressed as a sequence of Π, ×, and − operations:

T1 ← Π_Y(R)
T2 ← Π_Y((S × T1) − R)
T ← T1 − T2
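The three-step formulation T1, T2, T translates almost line by line into set operations. The sketch below represents projected tuples as Python tuples in (Y, X) order so that S × T1 and R are directly comparable; `divide` is a hypothetical helper and the WORKS_ON-style rows are illustrative.

```python
def divide(r, s, x_attrs, y_attrs):
    """R(Z) ÷ S(X), computed as T1 − T2 following the three steps."""
    def proj(rows, attrs):
        return {tuple(row[a] for a in attrs) for row in rows}

    r_pairs = proj(r, y_attrs + x_attrs)            # R as (Y, X) value tuples
    t1 = proj(r, y_attrs)                           # T1 ← Π_Y(R)
    s_x = proj(s, x_attrs)
    all_pairs = {y + x for y in t1 for x in s_x}    # S × T1, in (Y, X) order
    missing = all_pairs - r_pairs                   # (S × T1) − R
    t2 = {pair[:len(y_attrs)] for pair in missing}  # T2 ← Π_Y(...)
    return t1 - t2                                  # T ← T1 − T2


# Illustrative data: which employees work on every project in smith_pnos?
ssn_pnos = [
    {"ESSN": "123456789", "PNO": 1},
    {"ESSN": "123456789", "PNO": 2},
    {"ESSN": "453453453", "PNO": 1},
    {"ESSN": "453453453", "PNO": 2},
    {"ESSN": "666884444", "PNO": 3},
    {"ESSN": "333445555", "PNO": 2},
]
smith_pnos = [{"PNO": 1}, {"PNO": 2}]

ssns = divide(ssn_pnos, smith_pnos, ["PNO"], ["ESSN"])
```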

6-31a 7-54

DIVISION Operation

Retrieve the names of employees who work on all the projects that ‘John Smith’ works on.

(1) Retrieve the list of project numbers that ‘John Smith’ works on.
    SMITH ← σ_{FNAME=‘John’ AND LNAME=‘Smith’}(EMPLOYEE)
    SMITH_PNOS ← Π_{PNO}(WORKS_ON ⋈_{ESSN=SSN} SMITH)

(2) Create a relation that includes tuples <PNO, ESSN> from WORKS_ON.
    SSN_PNOS ← Π_{PNO,ESSN}(WORKS_ON)

(3) Apply the DIVISION operation.
    SSNS(SSN) ← SSN_PNOS ÷ SMITH_PNOS
    RESULT ← Π_{FNAME,LNAME}(SSNS * EMPLOYEE)

6-31b 7-55

Figure 7.15 DIVISION: SSNS ← SSN_PNOS ÷ SMITH_PNOS; T ← R ÷ S

6-32 7-56

Complete Set of Relational Algebra Operations:

-- All the operations discussed so far can be described as a sequence of only the operations SELECT, PROJECT, UNION, SET DIFFERENCE, and CARTESIAN PRODUCT. For example:
   R ∩ S ≡ (R ∪ S) − ((R − S) ∪ (S − R))
   (natural) join ≡ Π_L(σ_C(R × S))

-- Hence, the set {σ, Π, ∪, −, ×} is called a complete set of relational algebra operations. Any query language equivalent to these operations is called relationally complete.

-- For database applications, additional operations are needed that were not part of the original relational algebra (a language that offers them is more than complete). These include:
   1. Aggregate functions and grouping.
   2. OUTER JOIN and OUTER UNION.

6-33 7-57

5.4 Additional Relational Operations

AGGREGATE FUNCTIONS:

-- Functions such as SUM, COUNT, AVERAGE, MIN, MAX are often applied to sets of values or sets of tuples in database applications.

<grouping attributes> ℱ <function list> (R)

where ℱ is a script F, <grouping attributes> is a list of attributes of the relation specified in R, and <function list> is a list of (<function>, <attribute>) pairs.

-- The grouping attributes are optional.

Example 1: Retrieve the average salary of all employees (no grouping needed):

R(AVGSAL) ← ℱ AVERAGE SALARY (EMPLOYEE)

The result has degree 1 and contains a single tuple only.

6-33a 7-58

Continued

Example 2: For each department, retrieve the department number, the number of employees, and the average salary (in the department):

R(DNO, NUMEMPS, AVGSAL) ← DNO ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE)

DNO is called the grouping attribute in the above example. The result has degree 3.
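Grouping and aggregation can be sketched by first partitioning the rows on the grouping attributes and then applying each (function, attribute) pair to every group. `group_aggregate` and `average` are hypothetical helpers, the rows are invented, and COUNT is approximated by `len`, which also counts nulls.

```python
from collections import defaultdict


def group_aggregate(relation, grouping_attrs, functions):
    """<grouping attrs> ℱ <function list>(R).

    functions is a list of (function, attribute) pairs; each function receives
    the list of that attribute's values within one group.
    """
    groups = defaultdict(list)
    for row in relation:
        groups[tuple(row[a] for a in grouping_attrs)].append(row)
    return [
        key + tuple(f([row[a] for row in rows]) for f, a in functions)
        for key, rows in groups.items()
    ]


def average(values):
    return sum(values) / len(values)


# Illustrative EMPLOYEE rows.
employee = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000},
    {"SSN": "333445555", "DNO": 5, "SALARY": 40000},
    {"SSN": "999887777", "DNO": 4, "SALARY": 26000},
]

# Example 1: no grouping attributes, so the whole relation forms one group.
overall = group_aggregate(employee, [], [(average, "SALARY")])

# Example 2: group by DNO, then COUNT SSN and AVERAGE SALARY per group.
per_dept = group_aggregate(employee, ["DNO"], [(len, "SSN"), (average, "SALARY")])
```

One difference from the algebra: on an empty relation this sketch returns no tuple at all rather than a single tuple of aggregate values.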

6-34 7-59

7.9b 7.16b

6-35 7-60

Recursive Closure Operation

Applied to a recursive relationship between tuples of the same type, e.g. employee and supervisor.

Retrieve all supervisees of an employee e at all levels.
- Needs a looping mechanism.
- Cannot be specified in relational algebra.

A fixed number of levels can be specified, e.g. the supervisees of ‘James Borg’ down to the second level:

BORG_SSN ← Π_{SSN}(σ_{FNAME=‘James’ AND LNAME=‘BORG’}(EMPLOYEE))
SUPERVISION(SSN1, SSN2) ← Π_{SSN,SUPERSSN}(EMPLOYEE)

1st level:
RESULT1(SSN) ← Π_{SSN1}(SUPERVISION ⋈_{SSN2=SSN} BORG_SSN)

2nd level:
RESULT2(SSN) ← Π_{SSN1}(SUPERVISION ⋈_{SSN2=SSN} RESULT1)

1st + 2nd level:
RESULT3 ← (RESULT1 ∪ RESULT2)
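The looping mechanism that relational algebra lacks is easy to express in a host language: repeat the one-level join until no new supervisees appear. `all_supervisees` is a hypothetical helper and the supervision pairs are illustrative.

```python
def all_supervisees(supervision, boss_ssn):
    """Return the SSNs of everyone supervised by boss_ssn at any level.

    supervision is a set of (SSN1, SSN2) pairs: SSN2 directly supervises SSN1.
    """
    result = set()
    frontier = {boss_ssn}
    while frontier:
        # One more level: Π_SSN1(SUPERVISION ⋈_{SSN2=SSN} frontier)
        next_level = {s1 for (s1, s2) in supervision if s2 in frontier}
        frontier = next_level - result  # continue only from newly found supervisees
        result |= next_level
    return result


# Illustrative supervision pairs (supervisee, supervisor).
supervision = {
    ("333445555", "888665555"),
    ("987654321", "888665555"),
    ("123456789", "333445555"),
    ("999887777", "987654321"),
}

under_borg = all_supervisees(supervision, "888665555")
under_wong = all_supervisees(supervision, "333445555")
```

Because each round continues only from supervisees not seen before, the loop terminates even if the data accidentally contains a supervision cycle.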

6-36 7-61

OUTER JOIN

-- In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result (e.g. Employee * Department).

-- Some queries require all tuples in R1 (or R2 or both) to appear in the result.

-- When no matching tuples are found, nulls are placed for the missing attributes.

Example: List all employee names and the name of the department they manage.

Employee (name, SSN):
James Borg        888665555
John Smith        123456789
Franklin Wong     333445555
Alicia Zelaya     999887777
Jennifer Wallace  987654321
Ramesh Narayan    666884444
Joyce English     453453453
Ahmad Jabbar      987987987

Department (name, MGRSSN):
Research          333445555
Administration    987654321
Headquarters      888665555


6-36a 7-62

--LEFT OUTER JOIN: R1 ⟕ R2 lets every tuple in R1 appear in the result.

TEMP ← (EMPLOYEE ⟕_{SSN=MGRSSN} DEPARTMENT)

RESULT ← π_{FNAME,MINIT,LNAME,DNAME}(TEMP)

--RIGHT OUTER JOIN: R1 ⟖ R2 lets every tuple in R2 appear in the result.

TEMP ← (EMPLOYEE ⟖_{SSN=MGRSSN} DEPARTMENT)

RESULT ← π_{FNAME,MINIT,LNAME,DNAME}(TEMP)

--FULL OUTER JOIN: R1 ⟗ R2 lets every tuple in R1 or R2 appear in the result.

TEMP ← (EMPLOYEE ⟗_{SSN=MGRSSN} DEPARTMENT)

RESULT ← π_{FNAME,MINIT,LNAME,DNAME}(TEMP)

(See 6-38)

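Suppose DEPARTMENT also contains the tuple (Sell, 555555555), whose MGRSSN matches no employee.

RIGHT OUTER JOIN: 4 tuples

FULL OUTER JOIN: 9 tuples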
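
The three outer joins can be compared side by side with a small Python sketch. It uses the employee and department lists from the OUTER JOIN slide and treats the extra "Sell" department with MGRSSN 555555555 as an unmatched tuple; that reading of the slide annotation, the corrected spelling of the names, and the helper `outer_join` are assumptions of this sketch. Nulls are represented by `None`.

```python
EMPLOYEE = [  # (NAME, SSN)
    ("James Borg", "888665555"),
    ("John Smith", "123456789"),
    ("Franklin Wong", "333445555"),
    ("Alicia Zelaya", "999887777"),
    ("Jennifer Wallace", "987654321"),
    ("Ramesh Narayan", "666884444"),
    ("Joyce English", "453453453"),
    ("Ahmad Jabbar", "987987987"),
]
DEPARTMENT = [  # (DNAME, MGRSSN); the "Sell" row is the assumed unmatched tuple
    ("Research", "333445555"),
    ("Administration", "987654321"),
    ("Headquarters", "888665555"),
    ("Sell", "555555555"),
]

def outer_join(left, right, lkey, rkey, keep_left=False, keep_right=False):
    """Equijoin on left[lkey] == right[rkey]; unmatched tuples are padded with None.

    Both inputs are assumed to be non-empty lists of equal-width tuples.
    """
    left_nulls = (None,) * len(left[0])
    right_nulls = (None,) * len(right[0])
    result, matched_right = [], set()
    for l in left:
        hits = [j for j, r in enumerate(right) if l[lkey] == r[rkey]]
        matched_right.update(hits)
        result.extend(l + right[j] for j in hits)
        if keep_left and not hits:
            result.append(l + right_nulls)
    if keep_right:
        result.extend(left_nulls + r for j, r in enumerate(right)
                      if j not in matched_right)
    return result

inner = outer_join(EMPLOYEE, DEPARTMENT, 1, 1)
left_oj = outer_join(EMPLOYEE, DEPARTMENT, 1, 1, keep_left=True)
right_oj = outer_join(EMPLOYEE, DEPARTMENT, 1, 1, keep_right=True)
full_oj = outer_join(EMPLOYEE, DEPARTMENT, 1, 1, keep_left=True, keep_right=True)
print(len(inner), len(left_oj), len(right_oj), len(full_oj))
```

With this data the regular equijoin keeps only the three managers, while the left, right, and full outer joins keep every employee, every department, or both.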


6-37 7-63

OUTER UNION

Take the union of tuples from two relations that are partially compatible.

STUDENT(Name, SSN, Department, Advisor) OUTER UNION FACULTY(Name, SSN, Department, Rank)

= R(Name, SSN, Department, Advisor, Rank)

7 2 4
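
A minimal Python sketch of the outer union follows. It assumes that tuples agreeing on all shared attributes (Name, SSN, Department) are merged into one result tuple and that missing attributes are filled with `None`. The sample rows and the helper `outer_union` are hypothetical.

```python
STUDENT_ATTRS = ("Name", "SSN", "Department", "Advisor")
FACULTY_ATTRS = ("Name", "SSN", "Department", "Rank")

# Made-up sample rows; "Bob" appears in both relations.
STUDENT = [("Ann", "1", "CS", "Dr. X"), ("Bob", "2", "EE", "Dr. Y")]
FACULTY = [("Bob", "2", "EE", "Lecturer"), ("Cy", "3", "CS", "Professor")]

def outer_union(rows1, attrs1, rows2, attrs2):
    """Union of partially compatible relations, padded with None."""
    shared = [a for a in attrs1 if a in attrs2]
    attrs = list(attrs1) + [a for a in attrs2 if a not in attrs1]
    merged = {}
    for rows, own_attrs in ((rows1, attrs1), (rows2, attrs2)):
        for row in rows:
            record = dict(zip(own_attrs, row))
            # Tuples with the same values on the shared attributes are combined.
            key = tuple(record[a] for a in shared)
            merged.setdefault(key, dict.fromkeys(attrs)).update(record)
    return attrs, [tuple(m[a] for a in attrs) for m in merged.values()]

R_ATTRS, R = outer_union(STUDENT, STUDENT_ATTRS, FACULTY, FACULTY_ATTRS)
print(R_ATTRS)
for row in R:
    print(row)
```

A person who is only a student has a null Rank, a person who is only a faculty member has a null Advisor, and a person in both relations has values for all five attributes.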


6-38 7-64


6-39 7-65

Examples of Queries in Relational Algebra

QUERY 1: Retrieve the name and address of all employees who work for the ‘Research’ department.

RESEARCH_DEPT ← σ_{DNAME=‘Research’}(DEPARTMENT)

RESEARCH_DEPT_EMPS ← (RESEARCH_DEPT ⋈_{DNUMBER=DNO} EMPLOYEE)

RESULT ← π_{FNAME,LNAME,ADDRESS}(RESEARCH_DEPT_EMPS)
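
The select, join, and project steps of Query 1 can be traced with a short Python sketch. The rows below and the helper functions `select`, `join`, and `project` are illustrative assumptions, not a definitive implementation of the algebra.

```python
# Made-up sample rows, using the attribute names from Query 1.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
]
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "ADDRESS": "Houston, TX", "DNO": 5},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "ADDRESS": "Spring, TX", "DNO": 4},
    {"FNAME": "Joyce", "LNAME": "English", "ADDRESS": "Houston, TX", "DNO": 5},
]

def select(rows, predicate):
    """SELECT: keep the tuples that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def join(left, right, condition):
    """JOIN: combine every pair of tuples that satisfies the condition."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]

def project(rows, attrs):
    """PROJECT: keep the listed attributes and drop duplicate tuples."""
    out = []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in out:
            out.append(t)
    return out

research_dept = select(DEPARTMENT, lambda d: d["DNAME"] == "Research")
research_dept_emps = join(research_dept, EMPLOYEE,
                          lambda d, e: d["DNUMBER"] == e["DNO"])
result = project(research_dept_emps, ["FNAME", "LNAME", "ADDRESS"])
print(result)
```

Each intermediate variable corresponds to one of the named intermediate relations in the query.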

QUERY 2: For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, address, and birthdate.

STAFFORD_PROJS ← σ_{PLOCATION=‘Stafford’}(PROJECT)

CONTR_DEPT ← (STAFFORD_PROJS ⋈_{DNUM=DNUMBER} DEPARTMENT)

PROJ_DEPT_MGR ← (CONTR_DEPT ⋈_{MGRSSN=SSN} EMPLOYEE)

RESULT ← π_{PNUMBER,DNUM,LNAME,ADDRESS,BDATE}(PROJ_DEPT_MGR)


6-40 7-66

QUERY 3: Find the names of employees who work on all the projects controlled by department number 5.

DEPT5_PROJS(PNO) ← π_{PNUMBER}(σ_{DNUM=5}(PROJECT))

EMP_PROJ(SSN, PNO) ← π_{ESSN,PNO}(WORKS_ON)

RESULT_EMP_SSNS ← EMP_PROJ ÷ DEPT5_PROJS

RESULT ← π_{LNAME,FNAME}(RESULT_EMP_SSNS * EMPLOYEE)
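
The DIVISION step in Query 3 is the least intuitive one, so a small Python sketch may help. The pairs in `EMP_PROJ`, the project numbers in `DEPT5_PROJS`, and the function name `divide` are made up for illustration.

```python
# Hypothetical EMP_PROJ(SSN, PNO) pairs and DEPT5_PROJS(PNO) values.
EMP_PROJ = {
    ("e1", 1), ("e1", 2), ("e1", 3),
    ("e2", 1),
    ("e3", 2), ("e3", 1),
}
DEPT5_PROJS = {1, 2}

def divide(r, s):
    """R(SSN, PNO) / S(PNO): the SSNs paired in R with every PNO in S."""
    s = set(s)
    projects_by_ssn = {}
    for ssn, pno in r:
        projects_by_ssn.setdefault(ssn, set()).add(pno)
    # Keep an SSN only if its project set contains all of S.
    return {ssn for ssn, pnos in projects_by_ssn.items() if s <= pnos}

result_emp_ssns = divide(EMP_PROJ, DEPT5_PROJS)
print(sorted(result_emp_ssns))
```

Employee e2 is excluded because project 2 is missing, even though e2 works on one of the department's projects.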

QUERY 4: Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project.

SMITHS(ESSN) ← π_{SSN}(σ_{LNAME=‘Smith’}(EMPLOYEE))

SMITH_WORKER_PROJS ← π_{PNO}(WORKS_ON * SMITHS)

MGRS ← π_{LNAME,DNUMBER}(EMPLOYEE ⋈_{SSN=MGRSSN} DEPARTMENT)

SMITH_MGRS ← σ_{LNAME=‘Smith’}(MGRS)

SMITH_MANAGED_DEPTS(DNUM) ← π_{DNUMBER}(SMITH_MGRS)

SMITH_MGR_PROJS(PNO) ← π_{PNUMBER}(SMITH_MANAGED_DEPTS * PROJECT)

RESULT ← (SMITH_WORKER_PROJS ∪ SMITH_MGR_PROJS)


6-41 7-67

QUERY 5: List the names of all employees with two or more dependents.

T1(SSN, NO_OF_DEPS) ← ESSN ℱ COUNT DEPENDENT_NAME (DEPENDENT)

T2 ← σ_{NO_OF_DEPS≥2}(T1)

RESULT ← π_{LNAME,FNAME}(T2 * EMPLOYEE)

QUERY 6: Retrieve the names of employees who have no dependents.

ALL_EMPS ← π_{SSN}(EMPLOYEE)

EMPS_WITH_DEPS(SSN) ← π_{ESSN}(DEPENDENT)

EMPS_WITHOUT_DEPS ← (ALL_EMPS − EMPS_WITH_DEPS)

RESULT ← π_{LNAME,FNAME}(EMPS_WITHOUT_DEPS * EMPLOYEE)

QUERY 7: List the names of managers who have at least one dependent.

MGRS(SSN) ← π_{MGRSSN}(DEPARTMENT)

EMPS_WITH_DEPS(SSN) ← π_{ESSN}(DEPENDENT)

MGRS_WITH_DEPS ← (MGRS ∩ EMPS_WITH_DEPS)

RESULT ← π_{LNAME,FNAME}(MGRS_WITH_DEPS * EMPLOYEE)
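
Queries 6 and 7 rest on set difference and set intersection over projected SSN values, which map directly onto Python sets. The data below is invented for the sketch; only the relation and attribute names come from the queries.

```python
# Hypothetical data: SSN -> (LNAME, FNAME), dependents, and department managers.
EMPLOYEE = {
    "e1": ("Smith", "John"),
    "e2": ("Wong", "Franklin"),
    "e3": ("Wallace", "Jennifer"),
    "e4": ("Borg", "James"),
}
DEPENDENT = [("e1", "Alice"), ("e2", "Joy"), ("e2", "Theodore")]  # (ESSN, DEPENDENT_NAME)
DEPARTMENT_MGRSSN = ["e2", "e3"]                                  # MGRSSN values

all_emps = set(EMPLOYEE)                          # pi_SSN(EMPLOYEE)
emps_with_deps = {essn for essn, _ in DEPENDENT}  # pi_ESSN(DEPENDENT)
mgrs = set(DEPARTMENT_MGRSSN)                     # pi_MGRSSN(DEPARTMENT)

# Query 6: set difference, then join back to EMPLOYEE for the names.
emps_without_deps = all_emps - emps_with_deps
query6 = sorted(EMPLOYEE[ssn] for ssn in emps_without_deps)

# Query 7: set intersection, then join back to EMPLOYEE for the names.
mgrs_with_deps = mgrs & emps_with_deps
query7 = sorted(EMPLOYEE[ssn] for ssn in mgrs_with_deps)

print(query6)
print(query7)
```

Building `emps_with_deps` as a set mirrors the duplicate elimination that PROJECT performs, so an employee with several dependents is counted once.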