understanding about relational database m-square systems inc

Understanding about Relational DATABASE – IBM DB2 Knowledge Sharing Session October 27, 2014 Presented By: Muthukumaran Natarajan

Upload: muthu-natarajan

Post on 12-Jul-2015

Page 1: Understanding about relational database m-square systems inc

Understanding about Relational

DATABASE – IBM DB2

Knowledge Sharing Session

October 27, 2014

Presented By: Muthukumaran Natarajan

Introduction to Relational Database Management Systems (RDBMS)

• A relational database is built from a collection of one or more relations (tables) and the data they hold.
• Leading commercial vendors of relational database products:
✓ Oracle
✓ Microsoft (MS Access, SQL Server)
✓ IBM (DB2 LUW and z/OS, Informix)
✓ Sybase
• A language called SQL (Structured Query Language) was developed to work with relational databases.
• All data is stored in tables (relations) made up of rows (records) and columns (fields).

INTRODUCTION TO RELATIONAL DATABASE – DB2

RELATIONAL DATABASE

A well-designed database should:

✓ Eliminate data redundancy: the same piece of data should not be stored in more than one place. Duplicate data not only wastes storage space but also easily leads to inconsistencies and performance issues.

✓ Ensure data integrity and accuracy.

There are several types of database relationships:

✓ One-to-one relationships
✓ One-to-many and many-to-one relationships
✓ Many-to-many relationships
✓ Self-referencing relationships

ONE TO MANY RELATIONSHIP

• A one-to-many (1:m) relationship is one where, for each instance of table A, many instances of table B exist, but for each instance of table B, only one instance of table A exists.

• For example:
◦ For each artist there are many paintings, but each painting has only one artist, so the relationship is one-to-many and not many-to-many.

Continued...

author(id, otherAttributes)
books(id, authorid, otherAttributes)

Or

author(id, otherAttributes)
books(id, otherAttributes)
authorConnectsBooks(authorid, booksid)
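
The two layouts above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for DB2, so this is an assumption of the example and not part of the original session. The table name books2 and all sample rows are hypothetical, added only so both layouts can live in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout 1: one-to-many, each book row carries its author's id.
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books  (id INTEGER PRIMARY KEY,
                         authorid INTEGER NOT NULL REFERENCES author(id),
                         title TEXT NOT NULL);

    -- Layout 2: a linking table, which also permits many-to-many.
    -- (books2 is a hypothetical name so both layouts fit in one database.)
    CREATE TABLE books2 (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE authorConnectsBooks (
        authorid INTEGER NOT NULL REFERENCES author(id),
        booksid  INTEGER NOT NULL REFERENCES books2(id),
        PRIMARY KEY (authorid, booksid));

    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books  VALUES (10, 1, 'First'), (11, 1, 'Second');
    INSERT INTO books2 VALUES (20, 'Shared');
    INSERT INTO authorConnectsBooks VALUES (1, 20), (2, 20);
""")

# Layout 1: one author, many books.
titles_by_ann = [row[0] for row in conn.execute(
    "SELECT title FROM books WHERE authorid = 1 ORDER BY id")]

# Layout 2: one book can be linked to several authors.
authors_of_shared = [row[0] for row in conn.execute(
    """SELECT a.name FROM author a
       JOIN authorConnectsBooks ab ON ab.authorid = a.id
       WHERE ab.booksid = 20 ORDER BY a.id""")]

print(titles_by_ann, authors_of_shared)
```

The first layout is enough when a book has exactly one author; the linking table is the usual choice once a book may have several.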

Examples of one-to-many relationships

This is the most commonly used relationship type. Consider an e-commerce website with the following:

• Customers can make many orders.
• Orders can contain many items.
• Items can have descriptions in many languages.

Relationship & Understanding

One-to-many relationships
◦ The most common relationship used when creating relational databases. A row in one table can be associated with one or (likely) more rows in another table. An example of a one-to-many relationship is a single order that has many items on it. Since relationships work both ways, it is not uncommon to hear references to many-to-one relationships as well.

One-to-one relationship
◦ A row in a table is associated with one and only one row in another table. An example of a one-to-one relationship is that a person can have one social security number and a social security number can be assigned to only one person.
◦ In most cases there is no need for a one-to-one relationship, as the contents of the two tables can be combined into one table.

Many-to-many relationships
◦ One or more rows in a table are associated with one or more rows in another table. An example of a many-to-many relationship is a table of customers who can purchase many different products and a table of products that can be purchased by many different customers.

CONSTRAINTS AND KEYS

PRIMARY KEY

• A primary key is a field (or set of fields) in a table that uniquely identifies each row/record in that table.
• Primary keys must contain unique values.
• A primary key column cannot have NULL values.
• For example, a unique number, customerID, can be used as the primary key for the Customers table.

Types of Constraints

• The following types of constraints are available:

✓ NOT NULL constraints
✓ Unique (or unique key) constraints
✓ Primary key constraints
✓ Foreign key (or referential integrity) constraints
✓ (Table) check constraints
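
As a minimal sketch of how the first three constraint types behave, the following runs against SQLite through Python's sqlite3 module, used here as an assumed stand-in for DB2. The email column and the sample rows are hypothetical; only the Customers table and customerID come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        customerID INTEGER NOT NULL PRIMARY KEY,  -- primary key constraint
        name       TEXT    NOT NULL,              -- NOT NULL constraint
        email      TEXT    UNIQUE                 -- unique constraint (hypothetical column)
    )""")
conn.execute("INSERT INTO Customers VALUES (1, 'Ann', 'ann@example.com')")


def rejected(sql):
    """Return True if the statement is refused because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "duplicate primary key": rejected(
        "INSERT INTO Customers VALUES (1, 'Bob', 'bob@example.com')"),
    "NULL in NOT NULL column": rejected(
        "INSERT INTO Customers VALUES (2, NULL, 'x@example.com')"),
    "duplicate unique value": rejected(
        "INSERT INTO Customers VALUES (3, 'Cy', 'ann@example.com')"),
    "valid row": rejected(
        "INSERT INTO Customers VALUES (4, 'Dee', 'dee@example.com')"),
}
print(results)
```

Only the last insert goes through; the other three are refused by the database itself, without any application code checking the values.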

Referential integrity

• Referential integrity is a relational database concept in which multiple tables share a relationship based on the data stored in the tables, and that relationship must remain consistent.

• Referential integrity enforces the following rules:

• Referential Integrity Rule: Each foreign key value must match a primary key value in the referenced (parent) table.
◦ You can insert a row with a foreign key in the child table only if the value exists in the parent table.
◦ If the value of the key changes in the parent table (e.g., the row is updated or deleted), all rows with this foreign key in the child table(s) must be handled accordingly. You could either (a) disallow the change; (b) cascade the change (or delete the records) in the child tables; or (c) set the key value in the child tables to NULL.
◦ Most RDBMSs can be set up to perform the check and ensure referential integrity in the specified manner.

• Business Logic Integrity: Besides the general integrity rules above, there can be integrity (validation) rules pertaining to the business logic, e.g., a zip code must be 5 digits within a certain range; the delivery date and time must fall within business hours; the quantity ordered must be less than or equal to the quantity in stock. These can be enforced by a validation rule (for the specific column) or in programming logic.
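
The insert rule and a column-level business rule can be illustrated with a short sketch. It assumes SQLite (via Python's sqlite3) in place of DB2, and the Orders table, its columns and the sample values are hypothetical. The check on quantity is a simplified stand-in for the stock rule above, which would need data from a second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customers (customerID INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE Orders (
        orderID    INTEGER NOT NULL PRIMARY KEY,
        customerID INTEGER NOT NULL REFERENCES Customers(customerID),
        zip        TEXT    NOT NULL CHECK (length(zip) = 5),  -- business rule
        quantity   INTEGER NOT NULL CHECK (quantity > 0)      -- business rule
    );
    INSERT INTO Customers VALUES (1);
""")


def try_insert(row):
    """Insert one order row and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO Orders VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = [
    try_insert((100, 1, "20151", 2)),   # parent exists, rules satisfied
    try_insert((101, 99, "20151", 2)),  # no customer 99 in the parent table
    try_insert((102, 1, "2015", 2)),    # zip is not 5 characters long
    try_insert((103, 1, "20151", 0)),   # quantity must be positive
]
print(outcomes)
```

Rules that span tables, such as comparing the ordered quantity with the stock level, typically end up in triggers or application logic instead of a column check.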

SET RULES – RI Types

Referential actions can be defined for UPDATE and DELETE. There are four different options available:

1. SET NULL: The column is set to NULL when the referenced column is updated or deleted.
2. CASCADE: The column is updated when the referenced column is updated, and rows are deleted when the referenced rows are deleted.
3. SET DEFAULT: The column is set to its DEFAULT value when an UPDATE or DELETE is performed on the referenced rows.
4. NO ACTION: This is the default behaviour. If a DELETE or UPDATE is executed on referenced rows that still have dependent child rows, the operation is denied and an error is raised.

Note: DB2 itself supports NO ACTION, RESTRICT, CASCADE and SET NULL for ON DELETE. It does not support SET DEFAULT, and it does not cascade updates.
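
The four options can be compared side by side with a small sketch. It runs on SQLite through Python's sqlite3 module, which is an assumption of this example; SQLite accepts all four ON DELETE rules, which need not be true of every RDBMS. The Department and Customer tables are cut down to their key columns, and the row values are hypothetical.

```python
import sqlite3


def children_after_parent_delete(rule):
    """Delete a parent row under the given ON DELETE rule.

    Returns the remaining child rows, or 'denied' if the delete is refused.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for enforcement in SQLite
    conn.executescript(f"""
        CREATE TABLE Department (BranchID INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE Customer (
            CustID   INTEGER NOT NULL PRIMARY KEY,
            BranchID INTEGER DEFAULT 0
                     REFERENCES Department(BranchID) ON DELETE {rule}
        );
        INSERT INTO Department VALUES (0), (1);
        INSERT INTO Customer VALUES (10, 1);
    """)
    try:
        conn.execute("DELETE FROM Department WHERE BranchID = 1")
    except sqlite3.IntegrityError:
        return "denied"
    return conn.execute("SELECT CustID, BranchID FROM Customer").fetchall()


outcomes = {rule: children_after_parent_delete(rule)
            for rule in ("SET NULL", "CASCADE", "SET DEFAULT", "NO ACTION")}
for rule, result in outcomes.items():
    print(rule, "->", result)
```

With SET NULL the child row survives with an empty BranchID, with CASCADE it disappears, with SET DEFAULT it falls back to branch 0, and with NO ACTION the delete itself is refused.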

Parent Table - RI

• A FOREIGN KEY in one table points to a PRIMARY KEY in another table.

Consider the structure of the two tables, Customer and Department, as follows:

CREATE TABLE Department (
    BranchID        INTEGER     NOT NULL,
    BranchName      VARCHAR(20) NOT NULL,
    BranchStartDate DATE,
    PRIMARY KEY (BranchID)
);

Or, to create a PRIMARY KEY constraint on the "BranchID" column when the Department table already exists, use the following SQL syntax:

• ALTER TABLE Department ADD PRIMARY KEY (BranchID);

Parent-Child Table

Parent-Child Relations (RI)

CREATE TABLE Customer (
    CustID   INTEGER     NOT NULL,
    Name     VARCHAR(20) NOT NULL,
    AccNo    VARCHAR(20),
    BranchID INTEGER
);

db2 alter table Customer add foreign key (BranchID) references Department on delete cascade

Delete Rule
◦ The Delete Rule indicates the rule for handling rows in the child table when a row in the parent table is deleted or updated.
◦ Cascade Delete: All child rows are deleted when the parent row is deleted.
◦ Cascade Set Null: Foreign key columns are set to NULL when the parent row is deleted.
◦ Note: When you delete or update a row in a parent table for which a Cascade Delete or Cascade Set Null rule is defined, the related rows in the child table are adjusted appropriately, whether or not they are explicitly included in the Access Definition or process.

Table Name
• Table Name identifies the table affected by the delete or update of parent rows.

Normalization

• Normalization is a technique for organizing the data in a table.
◦ It is mainly used for two purposes:
• Eliminating redundant data
• Ensuring data dependencies make sense, i.e. the data is stored logically.

Problem without normalization: It becomes difficult to handle and update the database without facing data loss.

Normalization

• First Normal Form (1NF):
◦ No row of data may contain a repeating group of information.
◦ Each set of columns must have a unique value.
◦ Each row should have a primary key (unique column).

• In First Normal Form, no row may have a column in which more than one value is stored, such as comma-separated values. Such data should be separated into multiple rows.

Student | Age | Subject
--------+-----+---------------------
Adam    | 15  | Biology, Mathematics
Alex    | 14  | Mathematics
Stuart  | 17  | Mathematics

First Normal Form

Student | Age | Subject
--------+-----+------------
Adam    | 15  | Biology
Adam    | 15  | Mathematics
Alex    | 14  | Mathematics
Stuart  | 17  | Mathematics

With First Normal Form, data redundancy increases, because the same column values are repeated in multiple rows.

Second Normal Form

• A table in Second Normal Form must not have any partial dependency of a column on the primary key. Each column that is not part of the primary key must depend on the entire concatenated key for its existence. In the example, Age depends only on Student and not on Subject, so it moves to a separate Student table:

Student | Age
--------+----
Adam    | 15
Alex    | 14
Stuart  | 17

New Subject Table for 2NF:

Student | Subject
--------+------------
Adam    | Biology
Adam    | Mathematics
Alex    | Mathematics
Stuart  | Mathematics

Both tables above qualify for Second Normal Form. However, in Second Normal Form updates and insertions can be more complex in a few cases, because a change may have to be made in two places.
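
The move from the single 1NF table to the two 2NF tables can be sketched as follows. The example uses SQLite through Python's sqlite3 module as an assumed stand-in for DB2, and the table names StudentSubject1NF, Students and Subjects are hypothetical; the rows are the ones from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1NF table from the slides: Age is repeated for every subject.
    CREATE TABLE StudentSubject1NF (
        Student TEXT, Age INTEGER, Subject TEXT,
        PRIMARY KEY (Student, Subject));
    INSERT INTO StudentSubject1NF VALUES
        ('Adam', 15, 'Biology'), ('Adam', 15, 'Mathematics'),
        ('Alex', 14, 'Mathematics'), ('Stuart', 17, 'Mathematics');

    -- 2NF: Age depends on Student alone, so it moves to its own table.
    CREATE TABLE Students (Student TEXT PRIMARY KEY, Age INTEGER);
    CREATE TABLE Subjects (
        Student TEXT REFERENCES Students(Student), Subject TEXT,
        PRIMARY KEY (Student, Subject));
    INSERT INTO Students SELECT DISTINCT Student, Age FROM StudentSubject1NF;
    INSERT INTO Subjects SELECT Student, Subject FROM StudentSubject1NF;
""")

# Changing Adam's age touches two rows in 1NF but only one row in 2NF.
rows_1nf = conn.execute(
    "UPDATE StudentSubject1NF SET Age = 16 WHERE Student = 'Adam'").rowcount
rows_2nf = conn.execute(
    "UPDATE Students SET Age = 16 WHERE Student = 'Adam'").rowcount

# A join reassembles the combined view from the two 2NF tables.
rebuilt = conn.execute("""
    SELECT s.Student, s.Age, j.Subject
    FROM Students s JOIN Subjects j ON j.Student = s.Student
    ORDER BY s.Student, j.Subject""").fetchall()

print(rows_1nf, rows_2nf)
print(rebuilt)
```

The age is now stored once per student, at the cost of a join whenever the combined view is needed.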

Third Normal Form

• Every non-prime attribute of a table must depend directly on the primary key, not on another non-key attribute.

Original table:
St-id | St-Name | DOB | Add1 | Add2 | City | State | Zipcode

Tables after applying 3NF:
St-id | St-Name | DOB | ZIP
ZIP | Add1 | Add2 | City | State

In the original table, street, city and state depend on the zip code; this is called a transitive dependency. We need to apply 3NF to move street, city and state to a new table with ZIP as the primary key.

Normal Form

• Higher Normal Forms: 3NF has its inadequacies, which lead to higher normal forms such as Boyce-Codd Normal Form (BCNF) and Fourth Normal Form (4NF).

Schema

• A schema is a collection of named database objects.
• Schemas provide a way to logically classify objects such as tables, views, triggers, routines, or packages.
• A schema name is used as the first part of a two-part object name.
• A schema is itself a database object that is created using the CREATE SCHEMA statement. The syntax of the CREATE SCHEMA statement is as follows:

CREATE SCHEMA { <schema-name> | AUTHORIZATION <authorization-name> | <schema-name> AUTHORIZATION <authorization-name> } [ <schema-SQL-statement> ... ]

Performance

• DB2 has a number of performance optimization capabilities that give you the insight and ability to optimize workload execution.

• These capabilities can save money and lower your risks by helping you do more work with your existing hardware, ensure Service Level Agreements (SLAs) are met or exceeded, and increase DBA productivity.

There are different types:
• Server Performance
• Database Performance
• Query Performance
◦ Index scan
◦ Table scan
◦ Sorting
◦ Access methods

Naming Standards

Database Naming Conventions:

• Database object naming standards should be developed in conjunction with all other IT naming standards in your organization.

• In all cases, database naming standards should be developed in cooperation with the data administration department (if one exists) and, wherever possible, should peacefully coexist with other IT standards, but not at the expense of impairing the database environment.

Data Definition Language (DDL)

The DDL statements are:
• Create
• Drop
• Rename

Data Manipulation Language (DML)

The DML statements are:
• Select
• Insert
• Delete
• Update

JOINS

The different types of joins are:
• Inner Join
• Outer Join
◦ Left Outer Join
◦ Full Outer Join

Inner Join

Inner Join Example

• An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.
• An outer join of A and B gives the result of A union B, i.e. the outer parts of a Venn diagram union.

◦ Examples
◦ Suppose you have two tables, with a single column each, and data as follows:

A | B
--+--
1 | 3
2 | 4
3 | 5
4 | 6

Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.

◦ Inner join
◦ An inner join gives the intersection of the two tables, i.e. the two rows they have in common.
◦ select * from a INNER JOIN b on a.a = b.b;

a | b
--+--
3 | 3
4 | 4

Left Outer Join

Left Outer Join Example

• Left outer join
• A left outer join will give all rows in A, plus any common rows in B. Where a row in A has no match in B, the B column is null.
• select * from a LEFT OUTER JOIN b on a.a = b.b;

a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4

FULL OUTER JOIN

FULL OUTER JOIN EXAMPLE

• Full outer join
• A full outer join will give you the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
• select * from a FULL OUTER JOIN b on a.a = b.b;

a    | b
-----+-----
1    | null
2    | null
3    | 3
4    | 4
null | 6
null | 5
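
The three join results can be reproduced with a short sketch, run here on SQLite through Python's sqlite3 module as an assumed stand-in for DB2. Older SQLite builds do not accept FULL OUTER JOIN, so the full outer join is emulated with a left join plus the unmatched rows of b. That emulation is a choice made for this example; DB2 accepts FULL OUTER JOIN directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (3), (4), (5), (6);
""")

inner = conn.execute(
    "SELECT * FROM a INNER JOIN b ON a.a = b.b ORDER BY a.a").fetchall()

left = conn.execute(
    "SELECT * FROM a LEFT OUTER JOIN b ON a.a = b.b ORDER BY a.a").fetchall()

# Emulated full outer join: every row of a (matched or not),
# plus the rows of b that have no partner in a.
full = conn.execute("""
    SELECT a.a, b.b FROM a LEFT OUTER JOIN b ON a.a = b.b
    UNION ALL
    SELECT NULL, b.b FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.a = b.b)
""").fetchall()

print("inner:", inner)
print("left: ", left)
print("full: ", full)
```

Python shows SQL NULL as None, so the unmatched side of each outer-join row appears as None in the printed tuples.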

Column Selection
• Specify only the columns needed
• Avoid SELECT *
• Extra columns increase the row size of the result set
• Retrieving very few columns can encourage index-only access

Use FOR FETCH ONLY
• When a SELECT statement is used only for data retrieval, use FOR FETCH ONLY
• The FOR READ ONLY clause provides the same function

Avoid Sorting

• DISTINCT: usually results in a sort, to remove duplicates
• UNION: usually results in a sort, to remove duplicates
• UNION ALL: does not sort, but retains any duplicates
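
A small sketch of the difference in results, using SQLite through Python's sqlite3 module as an assumed stand-in for DB2. The tables t1 and t2 and their values are hypothetical. It shows only which rows come back; whether a sort happens is decided by the optimizer of the database in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (val TEXT);
    CREATE TABLE t2 (val TEXT);
    INSERT INTO t1 VALUES ('x'), ('y'), ('x');
    INSERT INTO t2 VALUES ('y'), ('z');
""")

# UNION removes duplicates across both inputs.
union_rows = conn.execute(
    "SELECT val FROM t1 UNION SELECT val FROM t2").fetchall()

# UNION ALL simply appends the second result to the first.
union_all_rows = conn.execute(
    "SELECT val FROM t1 UNION ALL SELECT val FROM t2").fetchall()

# DISTINCT removes duplicates within a single result.
distinct_rows = conn.execute("SELECT DISTINCT val FROM t1").fetchall()

print(len(union_rows), len(union_all_rows), len(distinct_rows))
```

When duplicates are impossible or acceptable, UNION ALL returns the same information without the de-duplication step.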

SQL TUNING TIPS

• ORDER BY
◦ may be faster if the columns are indexed
◦ use it to guarantee the sequence of the data
• GROUP BY
◦ specify only the columns that need to be grouped
◦ may be faster if the columns are indexed
◦ do not include extra columns in the SELECT list or GROUP BY, because DB2 must sort the rows

Indexes
• Create indexes for columns you frequently:
◦ ORDER BY
◦ GROUP BY (better than a DISTINCT)
◦ SELECT DISTINCT
◦ JOIN

Join Predicates

• Response time is determined mostly by the number of rows participating in the join
• Provide accurate join predicates
• Never use a JOIN without a predicate
• Join ON indexed columns
• Use joins over subqueries

Use BETWEEN
• BETWEEN is usually more efficient than combining a <= predicate with a >= predicate

Use IN Instead of LIKE
• If you know that only a certain number of values exist and can be put in a list, use IN or BETWEEN:
◦ IN ('Value1', 'Value2', 'Value3')
◦ BETWEEN :valuelow AND :valuehigh
• Rather than:
◦ LIKE 'Value'

Avoid Percentage

• Avoid the % or the _ wildcard at the beginning of a LIKE pattern, because it prevents DB2 from using a matching index scan and may cause a table scan.

• Use the % or the _ at the end of the pattern to encourage index usage.

Avoid NOT

• Predicates formed using NOT are not indexable
• For subqueries, when using negative logic:
◦ Use NOT EXISTS

Use EXISTS
• Use EXISTS to test for a condition and get a True or False returned by DB2, without returning any rows from the subquery to the query:

SELECT col1 FROM table1
WHERE EXISTS
    (SELECT 1 FROM table2
     WHERE table2.col2 = table1.col1)
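
The query above, together with its NOT EXISTS counterpart, can be tried with a few sample rows. This is a sketch on SQLite through Python's sqlite3 module, assumed here as a stand-in for DB2; the row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col2 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (2), (3);  -- value 2 appears twice
""")

# EXISTS only answers "is there at least one match?", so the
# duplicate 2 in table2 does not duplicate the row from table1.
with_match = [row[0] for row in conn.execute("""
    SELECT col1 FROM table1
    WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.col2 = table1.col1)
    ORDER BY col1""")]

# NOT EXISTS expresses the negative form of the same test.
without_match = [row[0] for row in conn.execute("""
    SELECT col1 FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.col2 = table1.col1)
    ORDER BY col1""")]

print(with_match, without_match)
```

A plain join on the same condition would return the value 2 twice, which is one reason EXISTS is a natural fit for pure existence tests.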

Avoid Arithmetic in Predicates
• An index is not used for a column when the column is part of an arithmetic expression.

SELECT col1 FROM table1
WHERE col2 = :hostvariable + 10
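
The effect of arithmetic on the column side of a predicate can be observed in a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, so it shows the behaviour of SQLite's planner and is only an analogy for DB2's optimizer. The index name ix_col2 and the plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER);
    CREATE INDEX ix_col2 ON table1 (col2);
""")


def plan(sql, params):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last field of each plan row is the human-readable detail text.
    return " ".join(row[-1] for row in rows)


# Arithmetic applied to the column: the index on col2 cannot be matched.
on_column = plan("SELECT col1 FROM table1 WHERE col2 + 10 = ?", (25,))

# Column left bare, arithmetic moved to the value side: the index is usable.
on_value = plan("SELECT col1 FROM table1 WHERE col2 = ? - 10", (25,))

print(on_column)
print(on_value)
```

On DB2 the equivalent check would be done with its own explain facility; the rewrite that keeps the column bare is the same.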

Limit Scalar Function Usage
• Predicates that apply scalar functions to a column are not indexable
• But you can use scalar functions to offload work from the application program
• Examples:
◦ DATE functions
◦ SUBSTR
◦ CHAR
◦ etc.

Other Cautions

• Predicates that contain concatenated columns are not indexable
• SELECT COUNT(*) can be expensive
• CASE expressions are powerful but can be expensive

Difference between OLAP VS OLTP.

Database Design Process

Steps in designing a database:
1. Determine the purpose of your database
2. Determine the tables you need
3. Determine the fields, data types, sizes and primary/foreign key constraints required for each table
4. Determine the relationships
5. Refine your design

Discussions & Scenarios

Scenario: 1

SELECT Name, NVL(Salary, 0)
FROM TBL_EMP
WHERE Salary IS NULL
ORDER BY Name

Question: What is displayed when the salary is NULL?

Discussions & Scenarios …

Scenario: 2

SELECT Name
FROM TBL_EMP
WHERE Name LIKE '_a%'

Question: Which names are displayed?
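
One way to explore Scenario 2 is to run the pattern against a few sample names. The sketch below uses SQLite through Python's sqlite3 module as an assumed stand-in for DB2, and the names and salaries are hypothetical. In a LIKE pattern, _ matches exactly one character and % matches zero or more characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL_EMP (Name TEXT, Salary INTEGER);
    INSERT INTO TBL_EMP VALUES
        ('Sam', 100), ('Adam', 200), ('David', NULL),
        ('Maria', 300), ('Al', 50), ('Pa', 75);
""")

# '_a%': any single first character, then a lowercase 'a', then anything.
names = [row[0] for row in conn.execute(
    "SELECT Name FROM TBL_EMP WHERE Name LIKE '_a%' ORDER BY Name")]

print(names)
```

Adam and Al are left out because their second letter is not an a, while Pa qualifies because % also matches an empty remainder. SQLite compares ASCII letters case-insensitively in LIKE by default, whereas DB2 comparisons are case-sensitive; the sample names were chosen so that this difference does not affect the result.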

Discussions & Scenarios …

Scenario: 3
Which two relationships exist for patient and doctor if a patient can have many doctors, a doctor can have many patients, and a doctor can be a patient?

Scenario: 4
Which type of entity relationship exists between patient and doctor if a patient can have only one doctor but a doctor can have many patients?
Note: A doctor cannot be a patient.

Discussions & Scenarios …

Scenario: 5
List the employee names, their roles and their respective managers for the employees working under each manager, grouped by department.

Scenario: 6
Write a query to analyze how long your orders take to ship from the date the order was placed. Create a report that displays the customer number, order date, date shipped and the number of months, in whole numbers, from the time the order is placed to the time the order is shipped.

SELECT Customer_ID, Order_Dt, Ship_Dt,
       ROUND(MONTHS_BETWEEN(Ship_Dt, Order_Dt)) AS "Months Taken"
FROM TBL_Order

Getting Started or Support –

Muthu Natarajan [email protected]

www.msquaresystems.com Phone: 703-222-5500/212-941-6000
