understanding about relational database m-square systems inc
Post on 12-Jul-2015
Understanding about Relational
DATABASE – IBM DB2
Knowledge Sharing Session
October 27, 2014
Presented By: Muthukumaran Natarajan
1
Introduction to Relational Database Management Systems (RDBMS)
• A relational database is built from a collection of one or more relations (tables) and the data they hold.
• Leading commercial vendors of relational database products:
  ✓ Oracle
  ✓ Microsoft (MS Access, SQL Server)
  ✓ IBM (DB2 for LUW and z/OS, Informix)
  ✓ Sybase
• A language called SQL (Structured Query Language) was developed to work with relational databases.
• All data is stored in the form of tables (relations) comprised of rows (records) and columns (fields).
2
INTRODUCTION TO RELATIONAL DATABASES – DB2
3
RELATIONAL DATABASE
A well-designed database should:
✓ Eliminate data redundancy: the same piece of data should not be stored in more than one place, because duplicate data not only wastes storage space but also easily leads to inconsistencies and performance issues.
✓ Ensure data integrity and accuracy.
4
There are several types of database relationships.
✓ One-to-One Relationships
✓ One-to-Many and Many-to-One Relationships
✓ Many-to-Many Relationships
✓ Self-Referencing Relationships
5
ONE TO MANY RELATIONSHIP
• A one-to-many (1:m) relationship is where, for each instance of table A, many instances of table B exist, but for each instance of table B, only one instance of table A exists.
• For example:
  ◦ For each artist there are many paintings, but each painting belongs to only one artist, so the relationship is one-to-many and not many-to-many.
6
Continued...
author(id, otherAttributes)
books(id, authorid, otherAttributes)
Or
author(id, otherAttributes)
books(id, otherAttributes)
authorConnectsBooks(authorid, booksid)
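The first layout above (a foreign key column on the "many" side) can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for DB2; the sample rows and the name and title columns are assumptions added for the example.

```python
import sqlite3

# In-memory SQLite stands in for DB2; sample data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One-to-many: each book row carries the id of its single author.
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        authorid INTEGER NOT NULL REFERENCES author(id),
        title TEXT NOT NULL
    );
    INSERT INTO author VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO books VALUES (10, 1, 'Book X'), (11, 1, 'Book Y'), (12, 2, 'Book Z');
""")

# Count the books that belong to each author.
books_per_author = conn.execute("""
    SELECT a.name, COUNT(b.id)
    FROM author a JOIN books b ON b.authorid = a.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
print(books_per_author)
```

The second layout, with a separate authorConnectsBooks table, is the usual choice when a book may also have several authors (many-to-many).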
7
Examples of One-to-Many Relationships
This is the most commonly used relationship type. Consider an e-commerce website, with the following:
• Customers can make many orders.
• Orders can contain many items.
• Items can have descriptions in many languages.
8
Relationship & Understanding
One-to-many relationships
◦ The most common relationship used when creating relational databases. A row in one table can be associated with one or (likely) more rows in another table. An example of a one-to-many relationship is a single order that has many items on it. Since relationships work both ways, it is not uncommon to hear references to many-to-one relationships as well.
One-to-one relationships
◦ A row in a table is associated with one and only one row in another table. An example of a one-to-one relationship is that a person can have one social security number and a social security number can only be assigned to one person.
◦ In most cases there is no need for a one-to-one relationship, as the contents of the two tables can be combined into one table.
Many-to-many relationships
◦ One or more rows in a table are associated with one or more rows in another table. An example of a many-to-many relationship is a table of customers who can purchase many different products and a table of products that can be purchased by many different customers.
9
CONSTRAINTS AND KEYS
10
PRIMARY KEY
• A primary key is a field (or combination of fields) that uniquely identifies each row/record in a database table.
• Primary keys must contain unique values.
• A primary key column cannot have NULL values.
• For example, a unique number customerID can be used as the primary key for the Customers table.
11
Types of Constraints
• The following types of constraints are available:
✓ NOT NULL constraints
✓ Unique (or unique key) constraints
✓ Primary key constraints
✓ Foreign key (or referential integrity) constraints
✓ (Table) check constraints
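The five constraint types listed above can be exercised in a small sketch. It uses Python's sqlite3 module as a stand-in for DB2, and the customers and orders tables and the violates helper are hypothetical names chosen for the example. Each statement that breaks a constraint is rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER NOT NULL PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        order_id INTEGER NOT NULL PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    );
    INSERT INTO customers VALUES (1, 'a@example.com');
""")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "not_null": violates("INSERT INTO customers VALUES (2, NULL)"),
    "unique": violates("INSERT INTO customers VALUES (3, 'a@example.com')"),
    "primary_key": violates("INSERT INTO customers VALUES (1, 'b@example.com')"),
    "foreign_key": violates("INSERT INTO orders VALUES (100, 99, 1)"),
    "check": violates("INSERT INTO orders VALUES (101, 1, 0)"),
    "valid": violates("INSERT INTO orders VALUES (102, 1, 5)"),
}
print(results)
```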
12
Referential integrity
• Referential integrity is a relational database concept in which multiple tables share a relationship based on the data stored in the tables, and that relationship must remain consistent.
• Referential integrity enforces the following rules:
13
• Referential Integrity Rule: Each foreign key value must be matched to a primary key value in the referenced (parent) table.
  ◦ You can insert a row with a foreign key in the child table only if the value exists in the parent table.
  ◦ If the value of the key changes in the parent table (e.g., the row is updated or deleted), all rows with this foreign key in the child table(s) must be handled accordingly. You could either (a) disallow the change; (b) cascade the change (or delete the records) in the child tables accordingly; or (c) set the key value in the child tables to NULL.
  ◦ Most RDBMSs can be set up to perform the check and ensure referential integrity in the specified manner.
• Business Logic Integrity: Besides the above general integrity rules, there could be integrity (validation) rules pertaining to the business logic, e.g., a zip code shall be 5 digits within a certain range; delivery date and time shall fall within business hours; quantity ordered shall be equal to or less than quantity in stock. These could be carried out in a validation rule (for the specific column) or in programming logic.
14
SET RULES – RI Types
Referential actions can be defined for UPDATE and DELETE. There are four different options available:
• 1. SET NULL: The foreign key column is set to NULL when the referenced column is updated/deleted.
• 2. CASCADE: The column is updated when the referenced column is updated, and rows are deleted when the referenced rows are deleted.
• 3. SET DEFAULT: The column is set to its DEFAULT value when an UPDATE/DELETE is performed on referenced rows.
• 4. NO ACTION: This is the default behaviour. If a DELETE/UPDATE is executed on referenced rows, the operation is denied and an error is raised.
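The delete side of these rules can be observed in a short sketch. It uses Python's sqlite3 module as a stand-in for DB2, and the parent and child_* tables are hypothetical. Which actions are accepted in ON UPDATE and ON DELETE clauses differs between database products, so check the DB2 documentation for your platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child_cascade (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE
    );
    CREATE TABLE child_set_null (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id) ON DELETE SET NULL
    );
    CREATE TABLE child_no_action (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id)  -- default rule: NO ACTION
    );
    INSERT INTO parent VALUES (1), (2);
    INSERT INTO child_cascade VALUES (10, 1);
    INSERT INTO child_set_null VALUES (20, 1);
    INSERT INTO child_no_action VALUES (30, 2);
    DELETE FROM parent WHERE id = 1;
""")

# CASCADE removed the child row; SET NULL kept it but cleared the key.
cascade_rows = conn.execute("SELECT * FROM child_cascade").fetchall()
set_null_rows = conn.execute("SELECT * FROM child_set_null").fetchall()

# NO ACTION: deleting a parent that is still referenced is denied.
try:
    conn.execute("DELETE FROM parent WHERE id = 2")
    delete_denied = False
except sqlite3.IntegrityError:
    delete_denied = True

print(cascade_rows, set_null_rows, delete_denied)
```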
15
Parent Table - RI
• A FOREIGN KEY in one table points to a PRIMARY KEY in another table.
Consider the structure of the two tables as follows: the Customer and Department tables
CREATE TABLE Department (
  BranchID INTEGER NOT NULL,
  BranchName VARCHAR(20) NOT NULL,
  BranchStartDate DATE,
  PRIMARY KEY (BranchID)
);
or
To create a PRIMARY KEY constraint on the "BranchID" column when the Department table already exists, use the following SQL syntax:
• ALTER TABLE Department ADD PRIMARY KEY (BranchID);
16
Parent-Child Table
17
Parent-Child Relations (RI)
CREATE TABLE Customer (
  CustID INTEGER NOT NULL,
  Name VARCHAR(20) NOT NULL,
  AccNo VARCHAR(20),
  BranchID INTEGER
);
db2 "alter table Customer add foreign key (BranchID) references Department on delete cascade"
18
Delete Rule
◦ The delete rule specifies what happens to rows in the child table when a row in the parent table is deleted or updated.
◦ Cascade Delete: all child rows are deleted when the parent row is deleted.
◦ Cascade Set Null: foreign key columns are set to NULL when the parent row is deleted.
◦ Note: When you delete or update a row in a parent table for which a Cascade Delete or Cascade Set Null rule is defined, the related rows in the child table are adjusted accordingly, whether or not they are explicitly included in the Access Definition or process.
Table Name
• Table Name identifies the table affected by the delete or update of parent rows.
19
Normalization
• Normalization is a technique for organizing the data in a table.
  ◦ It is mainly used for two purposes:
    • Eliminating redundant data
    • Ensuring data dependencies make sense, i.e. data is logically stored
Problem without normalization: it becomes difficult to handle and update the database without facing data loss.
20
Normalization
• First Normal Form (1NF):
  ◦ No two rows of data may contain a repeating group of information.
  ◦ Each column must hold a single value.
  ◦ Each row should have a primary key (unique column).
• In First Normal Form, no row may have a column in which more than one value is stored, such as a comma-separated list. Such data should be separated into multiple rows.
Student   Age   Subject
Adam      15    Biology, Mathematics
Alex      14    Mathematics
Stuart    17    Mathematics
21
First Normal Form
Student   Age   Subject
Adam      15    Biology
Adam      15    Mathematics
Alex      14    Mathematics
Stuart    17    Mathematics
Applying First Normal Form increases data redundancy, because the same column values now repeat across multiple rows.
22
Second Normal Form
• A table in Second Normal Form must not have any partial dependency of a column on the primary key. Each column that is not part of the primary key must depend on the entire concatenated key for its existence.
New Student table for 2NF:
Student   Age
Adam      15
Alex      14
Stuart    17
23
New Subject Table for 2NF:
Student   Subject
Adam      Biology
Adam      Mathematics
Alex      Mathematics
Stuart    Mathematics
Both of the above tables qualify for Second Normal Form. However, in Second Normal Form some updates and insertions become more complex, because they may have to be applied in two places.
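The split into a Student table and a Subject table can be tried out with a small sketch. It uses Python's sqlite3 module as a stand-in for DB2, and the lower-case table and column names are assumptions for the example. Because age is stored once per student, a single UPDATE changes it everywhere, and a join rebuilds the combined view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student TEXT PRIMARY KEY, age INTEGER NOT NULL);
    CREATE TABLE subject (
        student TEXT NOT NULL REFERENCES student(student),
        subject TEXT NOT NULL,
        PRIMARY KEY (student, subject)
    );
    INSERT INTO student VALUES ('Adam', 15), ('Alex', 14), ('Stuart', 17);
    INSERT INTO subject VALUES
        ('Adam', 'Biology'), ('Adam', 'Mathematics'),
        ('Alex', 'Mathematics'), ('Stuart', 'Mathematics');
""")

# Age lives in exactly one row per student, so one UPDATE is enough.
conn.execute("UPDATE student SET age = 16 WHERE student = 'Adam'")

# Joining the two tables reconstructs the combined student/age/subject rows.
rows = conn.execute("""
    SELECT s.student, s.age, j.subject
    FROM student s JOIN subject j ON j.student = s.student
    ORDER BY s.student, j.subject
""").fetchall()
print(rows)
```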
24
Third Normal Form
• Every non-prime attribute of a table must depend directly on the primary key, not on another non-prime attribute.
Original Student table:
St-id | St-Name | DOB | Add1 | Add2 | City | State | Zipcode
After applying 3NF:
St-id | St-Name | DOB | ZIP
ZIP | Add1 | Add2 | City | State
In the original table, street, city and state depend on the zip code; this is called a transitive dependency. Applying 3NF moves the street, city and state to a new table with ZIP as its primary key.
25
Normal Form
• Higher Normal Forms: 3NF has its inadequacies, which lead to higher normal forms such as Boyce-Codd Normal Form (BCNF) and Fourth Normal Form (4NF).
26
Schema
• A schema is a collection of named database objects.
• Schemas provide a way to logically classify objects such as tables, views, triggers, routines, or packages.
• A schema name is used as the first part of a two-part object name.
• A schema is itself a database object that is created using the CREATE SCHEMA statement. The syntax of the CREATE SCHEMA statement is as follows:
• CREATE SCHEMA { <schema-name> | AUTHORIZATION <authorization-name> | <schema-name> AUTHORIZATION <authorization-name> } [ <schema-SQL-statement> ... ]
27
Performance
• DB2 has a number of performance optimization capabilities that give you the insight and ability to optimize workload execution.
• These capabilities can save money and lower your risks by helping you to do more work with your existing hardware, ensure Service Level Agreements (SLAs) are met or exceeded, and increase DBA productivity.
There are different types:
• Server Performance
• Database Performance
• Query Performance
  ◦ Index scan
  ◦ Table scan
  ◦ Sorting
  ◦ Access methods
28
Naming Standards
Database Naming Conventions:
• Database object naming standards should be developed in conjunction with all other IT naming standards in your organization.
• In all cases, database naming standards should be developed in cooperation with the data administration department (if one exists) and, wherever possible, should peacefully coexist with other IT standards, but not at the expense of impairing the database environment.
29
Data Definition Language (DDL)
The DDL statements are:
• Create
• Drop
• Rename
30
Data Manipulation Language (DML)
The DML statements are:
• Select
• Insert
• Delete
• Update
31
JOINS
The different types of joins are:
• Inner Join
• Outer Join
  ◦ Left Outer Join
  ◦ Full Outer Join
32
Inner Join
33
Inner Join Example
• An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.
• An outer join of A and B gives the result of A union B, i.e. the outer parts of a Venn diagram union.
◦ Examples
◦ Suppose you have two tables, with a single column each, and data as follows:
A    B
-    -
1    3
2    4
3    5
4    6
Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.
◦ Inner join
◦ An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the two rows they have in common.
◦ select * from a INNER JOIN b on a.a = b.b;
◦ select a.*, b.* from a, b where a.a = b.b;
a | b
--+--
3 | 3
4 | 4
34
Left Outer Join
35
Left Outer Join Example
Left outer join
• A left outer join will give all rows in A, plus any common rows in B.
• select * from a LEFT OUTER JOIN b on a.a = b.b;
a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4
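The inner and left outer joins on the sample tables A and B can be reproduced with a short sketch. It uses Python's sqlite3 module as a stand-in for DB2; SQL NULL appears as None in the Python results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (3), (4), (5), (6);
""")

# Inner join: only the rows that the two tables have in common.
inner = conn.execute(
    "SELECT a.a, b.b FROM a INNER JOIN b ON a.a = b.b ORDER BY a.a"
).fetchall()

# Left outer join: every row of A, with NULL where B has no match.
left = conn.execute(
    "SELECT a.a, b.b FROM a LEFT OUTER JOIN b ON a.a = b.b ORDER BY a.a"
).fetchall()

print(inner)
print(left)
```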
36
FULL OUTER JOIN
37
FULL OUTER JOIN EXAMPLE
• Full outer join
• A full outer join will give you the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
• select * from a FULL OUTER JOIN b on a.a = b.b;
a    | b
-----+-----
1    | null
2    | null
3    | 3
4    | 4
null | 6
null | 5
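The full outer join result can also be checked in a sketch. DB2 supports FULL OUTER JOIN directly; this example runs on Python's sqlite3 module, where older SQLite versions lack that keyword, so it builds the same result from a left outer join plus the unmatched rows of B. That rewrite is an assumption of this sketch, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (3), (4), (5), (6);
""")

# Emulated full outer join: all rows of A (matched or not),
# followed by the rows of B that have no partner in A.
full = conn.execute("""
    SELECT a.a, b.b FROM a LEFT OUTER JOIN b ON a.a = b.b
    UNION ALL
    SELECT NULL, b.b FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.a = b.b)
""").fetchall()

for row in full:
    print(row)
```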
38
Column Selection
• Specify only the columns needed
• Avoid SELECT *
• Extra columns increase the row size of the result set
• Retrieving very few columns can encourage index-only access
39
Use FOR FETCH ONLY
• When a SELECT statement is used only for data retrieval, use FOR FETCH ONLY.
• The FOR READ ONLY clause provides the same function.
40
Avoid Sorting
• DISTINCT: usually results in a sort (to remove duplicates)
• UNION: usually results in a sort (to remove duplicates)
• UNION ALL: does not sort, but retains any duplicates
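The difference in results between these three operators can be shown with a small sketch. It uses Python's sqlite3 module as a stand-in for DB2, and the tables t1 and t2 are hypothetical. It shows only which rows come back, not whether the database sorts internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (v INTEGER);
    CREATE TABLE t2 (v INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2);
    INSERT INTO t2 VALUES (2), (3);
""")

# UNION removes duplicates across both inputs.
union_rows = sorted(conn.execute("SELECT v FROM t1 UNION SELECT v FROM t2").fetchall())

# UNION ALL simply concatenates the inputs and keeps every duplicate.
union_all_rows = sorted(conn.execute("SELECT v FROM t1 UNION ALL SELECT v FROM t2").fetchall())

# DISTINCT removes duplicates within a single result set.
distinct_rows = sorted(conn.execute("SELECT DISTINCT v FROM t1").fetchall())

print(union_rows, union_all_rows, distinct_rows)
```

When duplicates are impossible or acceptable, UNION ALL avoids the duplicate-removal work.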
41
SQL TUNING TIPS
• ORDER BY
  – may be faster if columns are indexed
  – use it to guarantee the sequence of the data
• GROUP BY
  – specify only columns that need to be grouped
  – may be faster if the columns are indexed
  – do not include extra columns in the SELECT list or GROUP BY because DB2 must sort the rows
42
Indexes
• Create indexes for columns you frequently:
  – ORDER BY
  – GROUP BY (better than a DISTINCT)
  – SELECT DISTINCT
  – JOIN
43
Join Predicates
• Response time is determined mostly by the number of rows participating in the join
• Provide accurate join predicates
• Never use a JOIN without a predicate
• Join ON indexed columns
• Use joins over subqueries
44
Use BETWEEN
• BETWEEN is usually more efficient than the combination of a >= predicate and a <= predicate
45
Use IN Instead of LIKE
• If you know that only a certain number of values exist and can be put in a list, use IN or BETWEEN:
• IN ('Value1', 'Value2', 'Value3')
• BETWEEN :valuelow AND :valuehigh
  – Rather than:
• LIKE 'Value_'
46
Avoid Percentage
• Avoid the % or the _ at the beginning of a LIKE pattern because it prevents DB2 from using a matching index and may cause a table scan.
• Use the % or the _ at the end to encourage index usage
47
Avoid NOT
• Predicates formed using NOT are not indexable
• For subqueries, when using negative logic:
  – Use NOT EXISTS
48
Use EXISTS
• Use EXISTS to test for a condition and get a True or False returned by DB2, without returning any rows from the subquery to the query:
SELECT col1 FROM table1
WHERE EXISTS
  (SELECT 1 FROM table2
   WHERE table2.col2 = table1.col1)
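The EXISTS query above, together with its NOT EXISTS counterpart, can be run as a short sketch. It uses Python's sqlite3 module as a stand-in for DB2, and the sample rows are assumptions for the example. Each table1 row appears at most once, even when table2 holds several matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col2 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (3);
""")

# Rows of table1 that have at least one match in table2.
with_match = conn.execute("""
    SELECT col1 FROM table1
    WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.col2 = table1.col1)
    ORDER BY col1
""").fetchall()

# Negative logic: rows of table1 with no match in table2.
without_match = conn.execute("""
    SELECT col1 FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.col2 = table1.col1)
    ORDER BY col1
""").fetchall()

print(with_match, without_match)
```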
49
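Avoid Arithmetic in Predicates
• An index is not used for a column when the column is part of an arithmetic expression, so keep the arithmetic on the other side of the comparison:
SELECT col1 FROM table1
WHERE col2 = :hostvariable + 10
• Rather than:
WHERE col2 - 10 = :hostvariable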
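The effect of arithmetic on an indexed column can be seen in a query plan. This is a rough sketch using Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in; DB2's optimizer and its EXPLAIN facility are different, and the index name idx_col2, the plan helper and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER);
    CREATE INDEX idx_col2 ON table1 (col2);
    INSERT INTO table1 VALUES (1, 15), (2, 20), (3, 25);
""")

def plan(sql, params):
    """Return the first line of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return rows[0][-1]  # the plan text is the last column of each plan row

indexable = "SELECT col1 FROM table1 WHERE col2 = ? + 10"
not_indexable = "SELECT col1 FROM table1 WHERE col2 - 10 = ?"

# Arithmetic on the parameter side: the index on col2 can be searched.
indexable_plan = plan(indexable, (5,))
# Arithmetic on the column side: the optimizer falls back to a scan.
not_indexable_plan = plan(not_indexable, (5,))

# Both forms return the same rows; only the access path differs.
same_rows = (conn.execute(indexable, (5,)).fetchall()
             == conn.execute(not_indexable, (5,)).fetchall())

print(indexable_plan)
print(not_indexable_plan)
```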
50
Limit Scalar Function Usage
• Scalar functions are not indexable
• But you can use scalar functions to offload work from the application program
• Examples:
  – DATE functions
  – SUBSTR
  – CHAR
  – etc.
51
Other Cautions
• Predicates that contain concatenated columns are not indexable
• SELECT COUNT(*) can be expensive
• CASE expressions are powerful but can be expensive
52
Difference between OLAP VS OLTP.
53
Database Design Process
Steps in designing a database:
1. Determine the purpose of your database
2. Determine the tables you need
3. Determine the fields, data types, sizes and primary/foreign key constraints required for each table
4. Determine the relationships
5. Refine your design
54
Discussions & Scenarios
Scenario 1:
SELECT Name, NVL(Salary, 0)
FROM TBL_EMP
WHERE Salary IS NULL
ORDER BY Name
Question: What is displayed when the salary is NULL?
55
Discussions & Scenarios …
Scenario 2:
SELECT Name
FROM TBL_EMP
WHERE Name LIKE '_a%'
Question: Which names are displayed?
56
Discussions & Scenarios …
Scenario 3: Which two relationships exist for patient and doctor if a patient can have many doctors, a doctor can have many patients and a doctor can be a patient?
Scenario 4: Which type of entity relationship exists between patient and doctor if a patient can have only one doctor but a doctor can have many patients? Note: A doctor cannot be a patient.
57
Discussions & Scenarios …
Scenario 5: List the employee names, their roles and their respective managers, grouped by department.
Scenario 6: Write a query to analyze how long it takes for your orders to be shipped from the date the order was placed. Create a report that displays the customer number, order date, date shipped and the number of months, in whole numbers, from the time the order is placed to the time the order is shipped.
58
SELECT Customer_ID, Order_Dt, Ship_Dt,
       ROUND(MONTHS_BETWEEN(Ship_Dt, Order_Dt)) AS "Months Taken"
FROM TBL_Order
59
Getting Started or Support –
Muthu Natarajan info@msquaresystems.com
www.msquaresystems.com Phone: 703-222-5500/212-941-6000
60