dbmsmis1

Upload: dheeman-ghosh

Post on 06-Apr-2018


  • 8/3/2019 DBMSmis1

    1/56

    Data

    Data is a collection of facts, such as values or measurements.

    It can be numbers, words, characters, symbols, measurements, observations or even just descriptions of things.

    Data is the lowest level of abstraction, information is the next level, and knowledge is the highest level of the three.

    Data on its own carries no meaning. For data to become information, it must be interpreted and given meaning by a human or machine.

    Why data matters

    As organizations continue to struggle to maintain competitive advantage, information becomes the key component in enabling executives and decision makers to make informed decisions based on a 360-degree view of the organization and its various operational processes.

    Data files

    Each application generates a specific file type, read by an identical application produced by the same vendor.

    Some applications do have import and export facilities to allow a range of different formats to be produced or read.

    The specific issues with any data file relate to the following:

    -Version number of the application

    -Structure of data

    e.g. student data file in an institute

    Data Processing

    Data processing is the act of handling or manipulating data in some fashion. Regardless of the activities involved in it, processing tries to assign meaning to data. Thus, the ultimate goal of processing is to transform data into information.
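
    As a small illustration of processing turning data into information, the sketch below summarizes a list of raw exam marks. The student names, marks and the pass mark of 40 are invented for the example and do not come from the slides.

```python
from statistics import mean

# Hypothetical raw data: (student name, exam mark) pairs with no meaning attached yet.
raw_marks = [("Asha", 72), ("Ravi", 38), ("Meena", 55)]

PASS_MARK = 40  # assumed threshold, not from the slides


def process(marks, pass_mark=PASS_MARK):
    """Turn raw marks into information: the class average and who passed."""
    return {
        "average": mean(mark for _, mark in marks),
        "passed": [name for name, mark in marks if mark >= pass_mark],
    }


report = process(raw_marks)
print(report)
```

    The raw pairs say nothing on their own; the report answers questions a decision maker might actually ask.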

    Information

    Knowledge derived from study, experience (by the senses), or instruction.

    Communication of intelligence.

    "Information is any kind of knowledge that is exchangeable amongst people, about things, facts, concepts, etc., in some context." *

    "Information is interpreted data" (Data operated in such a way as to display information)

    e.g. if student is new to institute or not

    Why Information?

    Information is critical

    Information is a resource

    -It is scarce

    -It has a cost

    -It has alternative uses

    -There is a cost involved if one does not process information

    Ensures effective and efficient decision making, leading to prosperity of the organization

    Levels of Abstraction of Data, Information and Knowledge

    [Figure: three levels, from lowest to highest. DATA: raw facts. INFORMATION: "Information is interpreted data". KNOWLEDGE: knowledge derived from study, experience (by the senses).]

    Qualitative vs Quantitative Data

    Data can be qualitative or quantitative.

    Qualitative data is descriptive information (it describes something).

    Quantitative data is numerical information (numbers).

    Variables

    Variables hold or store data.

    Basic types of variables

    Logical

    Numeric - Integer, Float

    String or Text Variable

    Mixed Variables (Data Structures)

    - Complex data structures, used to store records of mixed types. E.g. AXEMP001 (Alphanumeric), Sailesh Singh (Text), 20,000 (Numeric), 10 (Numeric)
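
    The mixed record above could be held in a single structure along the following lines. This is only a sketch: the slide gives the values and their types but not the field names, so `emp_code`, `name`, `salary` and `dept_no` are guesses.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    # Field names are hypothetical; the slide lists only values and types.
    emp_code: str  # alphanumeric
    name: str      # text
    salary: int    # numeric
    dept_no: int   # numeric


record = EmployeeRecord("AXEMP001", "Sailesh Singh", 20000, 10)

# A logical (boolean) variable derived from the record.
is_numeric = isinstance(record.salary, int)
print(record, is_numeric)
```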

    Data files

    Data storage options:

    -Flat files

    -Database Management Systems

    Flat files

    A flat file is a plain text file.

    Before the 1960s, when the concept of a DBMS did not yet exist, flat text files were used as databases, and programmers wrote programs to store or retrieve data in data files.

    Advantages of files as databases

    Cheap - Using a flat file database costs practically nothing because data is stored as text files. No software is required other than the program that needs to access the data.

    Platform Independent - Since text files are universally accepted by all server platforms, there is no problem moving your database from one server to another.

    Very Simple to Understand - Each record in a flat file is stored on one line, with its fields separated by delimiters.
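
    To make the "one record per line, fields separated by delimiters" idea concrete, here is a minimal sketch that parses such a file with Python's standard csv module. The "|" delimiter, the sample rows and the field names are assumptions made for the example.

```python
import csv
import io

# Hypothetical flat file contents: one record per line, fields separated by "|".
FLAT_FILE = "10|Asha|Computer\n11|Ravi|Accounts\n"


def read_records(text, delimiter="|"):
    """Parse delimited lines into dictionaries keyed by assumed field names."""
    fields = ["roll_no", "name", "course"]
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(fields, row)) for row in reader]


records = read_records(FLAT_FILE)
print(records[0])
```

    Note that every value comes back as text; the file itself carries no type information, which the reading program has to supply.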

    Disadvantages of Using Flat Files

    Low Security - No security feature is built into a text file. It can be opened for viewing by anyone who happens to know where to look.

    Data Redundancy - Duplication of the same data in different files.

    -Wastage of storage space, since duplicated data is stored.

    -Errors may be generated due to updating of the same data in different files.

    -Time is wasted in entering data again and again.

    -Computer resources are needlessly used.

    -It is very difficult to combine information.

    Disadvantages of Using Flat Files

    Data Inconsistency - Conflicting data in files.

    (Example)

    Suppose that the STUDENT file indicates that Roll No. = 10 has opted for the 'Computer' course, but the RESULT file indicates that Roll No. = 10 has opted for the 'Accounts' course.

    Low Reliability & Integrity - Flat files are very prone to data corruption, especially if the size of the database grows beyond what the server resources are prepared to handle.
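
    The STUDENT/RESULT inconsistency described above is something a program has to hunt for explicitly when the data lives in separate files. A minimal sketch, with the two files modeled as hypothetical in-memory dictionaries keyed by roll number:

```python
# Hypothetical contents of the two flat files, keyed by roll number.
student_file = {10: "Computer", 11: "Accounts"}
result_file = {10: "Accounts", 11: "Accounts"}


def find_conflicts(students, results):
    """Return roll numbers whose course differs between the two files."""
    return sorted(
        roll
        for roll in students.keys() & results.keys()
        if students[roll] != results[roll]
    )


conflicts = find_conflicts(student_file, result_file)
print(conflicts)
```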

    Disadvantages of Using Flat Files

    Limited Data Structuring - As mentioned previously, records are stored as lines of text. This does not offer the flexibility of creating "relationships" between data, whether within one flat file or across several.

    Difficult to Integrate with Other Programs - Once a flat file is created for use by one program, it is difficult to have another program use it. This is because the succeeding programs need to conform to the structure of the flat file.

    What Is a DBMS?

    Database - A very large, integrated collection of data or facts. E.g. the information in a phone book is an example of a database. The database is the information stored on the pages of the book, not the book itself.

    A Database Management System (DBMS) is a software package designed to store and manage databases. Typical examples of DBMSs include Oracle, Microsoft Access,

    Advantage of Database Technology

    Redundancy controlled (normalization)

    Efficient data processing and storage

    Data integrity and avoidance of inconsistencies

    Integrity constraints

    Sharing of data by many applications - Good for decision support systems

    Data security

    Standards can be enforced in data representation, naming of variables and documentation

    Advantage of Database Technology

    Data centralization - Shared by many departments

    Data independence - Changes in the structure of data files do not affect application programs

    Disadvantage of Database Technology

    Complex - Database administrator required for maintenance

    Costly to purchase and install

    Since it is centralized, a failure has a high impact on the organization

    Structure of a DBMS

    A typical DBMS has a layered architecture.

    The figure does not show the concurrency control and recovery components.

    This is one of several possible architectures; each system has its own variations.

    [Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]

    [Figure: RDBMS Model, with boxes labeled Department, Technician, Employees, Equipment and Maintenance Records.]

    Motivation: Why database management systems?

    Database management systems (DBMSs) are very good at organizing and managing large collections of persistent data.

    e.g. consider finding a particular book in a typical university library if the library does not keep the books arranged in any particular order, or if the library has no indexes.

    Motivation: Why database management systems?

    Using a big collection of unorganized things is practically impossible. Structure turns data into information.

    Persistence means that the data exist permanently; they do not disappear when the computer is shut off.

    Motivation: Why database management systems?

    Shift from computation to information

    -at the low end: scramble to webspace (a mess!)

    -at the high end: scientific applications

    Datasets increasing in diversity and volume.

    Digital libraries, interactive video, Human Genome project.

    Motivation: Why database management systems?

    DBMSs: data all in one place and easy to get to.

    DBMSs help protect data from unauthorized access.

    DBMSs help protect data from accidental corruption or loss due to:

    -hardware failures such as power outages and computer crashes

    -software failures such as operating system crashes

    Motivation: Why relational database management systems?

    Concurrency Control

    DBMSs allow concurrent access, meaning that a single data set can be accessed by more than one user at a time.

    Virtually all commercial database applications require the data entry staff to have access to the database simultaneously. E.g. an airline reservation system cannot restrict access to the database to a single travel agent.

    Motivation: Why relational database management systems?

    Concurrent access problems can cause the database to be corrupted or a user's interface program to never complete its query.

    e.g. an intersection with no traffic lights or stop signs: chaos

    RDBMSs provide mechanisms to prevent concurrent access problems; these mechanisms are collectively called concurrency control.
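
    The flavor of concurrency control can be shown with an in-process analogy: several "travel agents" (threads) try to book from the same small pool of seats. This is not how any particular RDBMS implements it; the `SeatCounter` class and the numbers are invented for illustration, and a simple lock stands in for the database's own mechanisms.

```python
import threading


class SeatCounter:
    """Hypothetical shared record: seats left on one flight."""

    def __init__(self, seats):
        self.seats = seats
        self._lock = threading.Lock()

    def book(self):
        # Holding the lock makes check-then-decrement one indivisible step,
        # so two agents can never both take the last seat.
        with self._lock:
            if self.seats > 0:
                self.seats -= 1
                return True
            return False


counter = SeatCounter(seats=5)
outcomes = []


def agent():
    outcomes.append(counter.book())


threads = [threading.Thread(target=agent) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

booked = sum(outcomes)
print(booked, counter.seats)
```

    Eight agents compete for five seats; exactly five bookings succeed and the seat count never goes negative.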

    Motivation: Why relational database management systems?

    Concurrent data access introduces unwanted problems caused by two users manipulating exactly the same data at exactly the same time.

    Logical data independence: Protection from changes in the logical structure of data.

    Physical data independence: Protection from changes in the physical structure of data.

    Distributed RDBMS

    A distributed DBMS allows a single database to be split apart such that its pieces reside at geographically separated sites.

    -this can provide performance improvements by eliminating the need to transmit the data across a relatively slow long-distance communication channel (it is a lot faster to have the database on a hard drive than to access it across an Ethernet or via a modem)

    -this can reduce concurrency control problems by giving each user the part of the database which they need, rather than having all the users compete for access to the whole database

    RDBMS characteristics

    RDBMSs are not necessarily meant for data analysis; that is more the job of a spreadsheet or some other special-purpose analysis tool.

    RDBMSs are general-purpose tools. It is basically irrelevant to the DBMS what is stored within it. Software design principles suggest decoupling domain-specific analysis packages from the DBMS to keep the division of labor clear.

    RDBMSs are very good at retrieving a relatively small portion of the database and passing it along for detailed analysis by a tool designed for that purpose.

    RDBMS characteristics

    RDBMSs often allow integrity constraints to be imposed on the data to ensure validity and consistency. When an integrity constraint applies to a table, all data in the table must conform to the corresponding rule.

    E.g. ALTER TABLE Dept ADD PRIMARY KEY (Deptno); Then create a rule that every department listed in the employee table must match one of the values in the department table: ALTER TABLE Emp ADD FOREIGN KEY (Deptno) REFERENCES Dept (Deptno); When you add a new employee record to the table, the DBMS automatically checks that its department number appears in the department table.
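
    The department/employee rule above can be tried out with Python's built-in sqlite3 module. This is a sketch under a few assumptions: SQLite cannot add key constraints with ALTER TABLE, so they are declared in CREATE TABLE instead; it enforces foreign keys only after `PRAGMA foreign_keys = ON`; and the `Dname` and `Empno` columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Dept (Deptno INTEGER PRIMARY KEY, Dname TEXT)")
conn.execute(
    "CREATE TABLE Emp ("
    " Empno INTEGER PRIMARY KEY,"
    " Deptno INTEGER REFERENCES Dept (Deptno))"
)
conn.execute("INSERT INTO Dept VALUES (10, 'Accounts')")
conn.execute("INSERT INTO Emp VALUES (1, 10)")  # accepted: department 10 exists

try:
    conn.execute("INSERT INTO Emp VALUES (2, 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

emp_count = conn.execute("SELECT COUNT(*) FROM Emp").fetchall()[0][0]
print(rejected, emp_count)
```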

    Referential Integrity Rules

    A rule defined on a key (a column or set of columns) in one table that guarantees that the values in that key match the values in a key in a related table (the referenced value).

    Referential integrity also includes the rules that dictate what types of data manipulation are allowed on referenced values and how these actions affect dependent values. The rules associated with referential integrity are:

    Restrict: Disallows the update or deletion of referenced data.

    Set to Default: When referenced data is updated or deleted, all associated dependent data is set to a default value.

    Referential integrity rules

    Cascade: When referenced data is updated, all associated dependent data is correspondingly updated. When a referenced row is deleted, all associated dependent rows are deleted.
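
    The Restrict and Cascade rules can be compared side by side in SQLite, which spells them as ON DELETE RESTRICT and ON DELETE CASCADE. The sketch below uses hypothetical Dept and Emp tables and the helper `make_db`, which is not from the slides.

```python
import sqlite3


def make_db(on_delete):
    """Build hypothetical Dept/Emp tables whose foreign key uses the given delete rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Dept (Deptno INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Emp (Empno INTEGER PRIMARY KEY, "
        f"Deptno INTEGER REFERENCES Dept (Deptno) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO Dept VALUES (10)")
    conn.execute("INSERT INTO Emp VALUES (1, 10)")
    return conn


# Cascade: deleting the referenced department also deletes its employees.
cascade = make_db("CASCADE")
cascade.execute("DELETE FROM Dept WHERE Deptno = 10")
emps_after_cascade = cascade.execute("SELECT COUNT(*) FROM Emp").fetchall()[0][0]

# Restrict: the same delete is refused while an employee still refers to the department.
restrict = make_db("RESTRICT")
try:
    restrict.execute("DELETE FROM Dept WHERE Deptno = 10")
    delete_refused = False
except sqlite3.IntegrityError:
    delete_refused = True

print(emps_after_cascade, delete_refused)
```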

    Data integrity constraints

    Null Rule - A null rule is a rule defined on a single column that allows or disallows inserts or updates of rows containing a null (the absence of a value) in that column.

    Unique Column Values - A unique value rule defined on a column (or set of columns) allows the insert or update of a row only if it contains a unique value in that column (or set of columns).

    Primary Key Values - A primary key value rule defined on a key (a column or set of columns) specifies that each row in the table can be uniquely identified by the values in the key.
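
    A minimal sketch of the three rules above using SQLite through Python's sqlite3 module. The Student table, its columns and the helper `violates` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " roll_no INTEGER PRIMARY KEY,"  # primary key: identifies each row
    " email TEXT UNIQUE,"            # unique column values
    " name TEXT NOT NULL)"           # null rule: nulls disallowed
)
conn.execute("INSERT INTO Student VALUES (10, 'a@example.com', 'Asha')")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_rejected = violates("INSERT INTO Student VALUES (11, 'b@example.com', NULL)")
duplicate_email_rejected = violates("INSERT INTO Student VALUES (12, 'a@example.com', 'Ravi')")
duplicate_key_rejected = violates("INSERT INTO Student VALUES (10, 'c@example.com', 'Meena')")
print(null_rejected, duplicate_email_rejected, duplicate_key_rejected)
```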

    Other integrity constraints

    Validation rules, e.g. an integrity constraint that enforces the rule that no row in a table can contain a numeric value greater than 10,000 in a given column. If an INSERT or UPDATE statement attempts to violate this integrity constraint, the DBMS returns an error message.

    CHECK Integrity Constraints

    A CHECK integrity constraint on a column or set of columns requires that a specified condition be true or unknown for every row of the table. The condition is usually a Boolean expression evaluated using the values in the row being inserted or updated.
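
    The "no value greater than 10,000" rule can be written as a CHECK constraint. The sketch below uses SQLite via Python's sqlite3 module; the table name `Emp` and column `Sal` are assumptions. It also shows the "true or unknown" wording in practice: a NULL makes the condition unknown, so the row is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (Empno INTEGER PRIMARY KEY, Sal NUMERIC CHECK (Sal <= 10000))"
)
conn.execute("INSERT INTO Emp VALUES (1, 9500)")  # condition true: accepted
conn.execute("INSERT INTO Emp VALUES (2, NULL)")  # condition unknown: accepted

try:
    conn.execute("INSERT INTO Emp VALUES (3, 20000)")  # condition false
    check_failed = False
except sqlite3.IntegrityError:
    check_failed = True

row_count = conn.execute("SELECT COUNT(*) FROM Emp").fetchall()[0][0]
print(check_failed, row_count)
```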

    Levels of Abstraction in DBMS

    Many views, a single conceptual (logical) schema and a physical schema.

    Views describe how users see the data - file description, record description

    Conceptual schema defines the logical structure

    Physical schema describes how the computer views data on a secondary storage device

    [Figure: View 1, View 2 and View 3 sit above the Conceptual Schema, which sits above the Physical Schema, which sits above the Disk.]
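
    A view in the sense used above can be sketched with SQL's CREATE VIEW. In this example, run through Python's sqlite3 module, the `Emp` table stands for the conceptual schema and `EmpDirectory` is one user's view of it; both names and the sample row are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (Empno INTEGER PRIMARY KEY, Name TEXT, Sal INTEGER, Deptno INTEGER)"
)
conn.execute("INSERT INTO Emp VALUES (1, 'Asha', 9000, 10)")

# A view for users who may see names and departments but not salaries.
conn.execute("CREATE VIEW EmpDirectory AS SELECT Name, Deptno FROM Emp")

cursor = conn.execute("SELECT * FROM EmpDirectory")
columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

    Users of the view never see how or where the table is physically stored, and they do not see the salary column at all.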

    Summary

    DBMS used to maintain and query large datasets.

    Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.

    Levels of abstraction give data independence.

    A DBMS typically has a layered architecture.

    Fundamental Concepts and Terminology

    Data are facts. Some facts are more important than others. Some facts are important enough to warrant keeping track of them in a formal, organized way.

    "Data" is a broad concept that can include things such as pictures (binary images), programs, and rules. Informally, data are the things you want to store in a database.

    Data mining: applied to large volumes of data to discover trends and patterns.

    Metadata

    Meta means "about," so metadata is "about data," or, more specifically, "information about data." Metadata describes the fields and formats of databases and data warehouses. A database contains fields such as Name, Address, City, and so on. Metadata names these fields, describes the size of the fields, and may put restrictions on what can go in the field (for example, numbers only); this is the data schema.
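
    Most DBMSs let you query the metadata itself. As a sketch, SQLite exposes it through `PRAGMA table_info`, which returns one row per column. The `Contact` table below is hypothetical, built from the field names mentioned in the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contact (Name TEXT NOT NULL, Address TEXT, City TEXT)")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
metadata = [
    {"field": name, "type": col_type, "required": bool(notnull)}
    for _cid, name, col_type, notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(Contact)"
    )
]
print(metadata)
```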

    Data Repository

    A repository is a structure that stores and protects data. (Database + metadata)

    Repositories provide the following functionality:

    -add (insert) data to the repository

    -retrieve (find, select) data in the repository

    -delete data from the repository

    Some repositories allow data to be changed, to be updated.

    Data Warehouse

    Central repository of all the data which an organization's various business systems collect, e.g. financial data used for planning, marketing, contracting and decision-making.

    Data Repository

    Repositories are like a bank vault. They exist mainly to protect their contents from theft and accidental destruction.

    Security: repositories are typically password protected; many have much more elaborate security mechanisms.

    Robustness: Accidental data loss is safeguarded against via the transaction mechanism.

    A transaction is a sequence of database manipulation operations.
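
    How a transaction guards against partial updates can be sketched with Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back if an exception escapes. The `Account` table, the balances and the `transfer` function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(amount):
    """Move money from account 1 to account 2 as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Account SET balance = balance + ? WHERE id = 2", (amount,))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(500)  # would overdraw account 1, so the credit to account 2 is undone too
balances = conn.execute("SELECT balance FROM Account ORDER BY id").fetchall()
print(ok, balances)
```

    Without the transaction, the first update would have stayed in place and money would have appeared from nowhere.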

    Data warehouse is the main repository of an organization's historical data -management's

    Queries

    Many DBMSs provide a user interface consisting of some sort of formal language.

    A data definition language (DDL) is used to specify which data will be stored in the database and how they are related. E.g. create table or drop table

    A data manipulation language (DML) is used to add, retrieve, update, and delete data in the DBMS.
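
    The split between DDL and DML shows up directly in SQL statements. A short sketch using SQLite through Python's sqlite3 module, with a hypothetical `Student` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define what is stored.
conn.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, course TEXT)")

# DML: add, update, retrieve, delete.
conn.execute("INSERT INTO Student VALUES (10, 'Computer')")
conn.execute("UPDATE Student SET course = 'Accounts' WHERE roll_no = 10")
course = conn.execute("SELECT course FROM Student WHERE roll_no = 10").fetchall()[0][0]
conn.execute("DELETE FROM Student WHERE roll_no = 10")
remaining = conn.execute("SELECT COUNT(*) FROM Student").fetchall()[0][0]
conn.commit()

# DDL again: remove the definition itself.
conn.execute("DROP TABLE Student")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(course, remaining, tables)
```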

    Queries

    A query is often taken as a statement or group of statements in either a DDL or a DML or both. Some researchers, e.g. Codd, view queries as read-only operations in which no data modifications are allowed.

    A query language is a formal language that implements a DDL, a DML, or both. Examples of query languages include SQL (Structured Query Language),

    Database report

    A database report presents information retrieved from a table or query in a preformatted, attractive manner. Reporting Services uses a SQL Server database for internal storage. Microsoft Access can be used to create non-interactive HTML reports. This is a simple way to present database information on the Web.

    Data Models

    A data model is a mathematical formalism consisting of two parts:

    -A notation for describing data, and

    -A set of operations used to manipulate that data.

    A data model is a way of organizing a collection of facts pertaining to a system under investigation.

    Data models

    Different models provide different conceptualizations of the world; they have different outlooks and different perspectives.

    There is no universally agreed upon best data model. The most common ones are presented

    Overview of Database Design

    Entity-Relationship Model

    The ER model envisions the world as comprised of entities that are associated with each other by relationships. All of the entities of a particular type are collected together into entity sets. An entity-relationship model (ERM) is an abstract conceptual representation of structured data;

    Overview of Database Design

    What are the entities and relationships in the enterprise?

    What information about these entities and relationships should we store in the database?

    What are the integrity constraints that hold?

    A database "schema" in the ER Model can be represented pictorially (ER diagrams).

    Can map an ER diagram into a relational schema.

    Entities

    Entities are distinguishable real-world objects such as employees, maps, airplanes, or bus schedules.

    -Distinguishable means that all entities can be uniquely identified.

    -Entities have common attributes that define what it means to be such an entity.

    -For any given real-world object, different modelers can choose different sets of attributes of the object that are of interest to their particular situation.

    Relationship

    A relationship is an association among two or more entities. An association is a business component that defines a relationship between two entity objects based on common attributes. Relationship Set: Collection of similar relationships.

    Notation: two entity sets A and B that stand in relationship r are written A r B.

    Types of Relationship

    One-One: if A r B and r is one-one then each entity of B is in relationship with at most one entity of A and vice versa. e.g. if CAPTAIN commands VESSEL and commands is one-one then, in this model, each vessel has at most one captain and each captain commands at most one vessel at a time.

    Types of Relationship

    Many-one: if A r B and r is many-one then each entity of A is in relationship with at most one entity of B but not vice versa. e.g. if CREW assigned-to VESSEL and assigned-to is many-one then, in this model, a vessel has many crew members but a crew member is assigned to only one vessel.

    Many-many: if A r B and r is many-many then each entity of A can be in relationship with any number of B entities and vice versa. e.g. if VESSEL patrols REGION and patrols is many-many then, in our model, a vessel patrols many regions and a region is patrolled by many vessels.
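
    One common way to represent these relationship types in relational tables is sketched below with SQLite: a many-one relationship becomes a foreign-key column on the "many" side, and a many-many relationship becomes a separate table with one row per pair. The table layout and the sample vessels, crew and regions are assumptions for illustration, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Vessel (name TEXT PRIMARY KEY);
    CREATE TABLE Region (name TEXT PRIMARY KEY);
    -- many-one: each crew member row holds at most one vessel
    CREATE TABLE Crew (name TEXT PRIMARY KEY, vessel TEXT REFERENCES Vessel (name));
    -- many-many: one row per (vessel, region) pair
    CREATE TABLE Patrols (
        vessel TEXT REFERENCES Vessel (name),
        region TEXT REFERENCES Region (name),
        PRIMARY KEY (vessel, region)
    );
    INSERT INTO Vessel VALUES ('Alpha'), ('Bravo');
    INSERT INTO Region VALUES ('North'), ('South');
    INSERT INTO Crew VALUES ('Asha', 'Alpha'), ('Ravi', 'Alpha');
    INSERT INTO Patrols VALUES ('Alpha', 'North'), ('Alpha', 'South'), ('Bravo', 'North');
    """
)


def column(sql):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]


crew_on_alpha = column("SELECT name FROM Crew WHERE vessel = 'Alpha' ORDER BY name")
regions_of_alpha = column("SELECT region FROM Patrols WHERE vessel = 'Alpha' ORDER BY region")
vessels_in_north = column("SELECT vessel FROM Patrols WHERE region = 'North' ORDER BY vessel")
print(crew_on_alpha, regions_of_alpha, vessels_in_north)
```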

    ER model

    isa relationships: if A isa B then A is a specialization of B, or, conversely, B is a generalization of A.

    For example, if CAPTAIN isa CREW then, in this model, captains have all the attributes of crew members but not vice versa.

    The isa relationship allows hierarchies to be established among entity sets.
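
    The isa relationship has a close analogue in class inheritance, which gives a quick way to see the "all the attributes, but not vice versa" point. In this sketch the attribute names, including the captain-only `years_in_command`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Crew:
    name: str
    vessel: str


@dataclass
class Captain(Crew):  # CAPTAIN isa CREW
    years_in_command: int = 0  # hypothetical attribute only captains have


captain = Captain(name="Asha", vessel="Alpha", years_in_command=4)
sailor = Crew(name="Ravi", vessel="Alpha")

print(isinstance(captain, Crew), isinstance(sailor, Captain))
```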

    ER model basics

    Consider Works: An employee can work in many departments; a dept can have many employees (many-many).

    In contrast, each dept has at most one manager, according to the key constraint on Manages.

    Primary and foreign key

    What primary and foreign key constraints are and what they are used for:

    Primary Key:

    A primary key is a field or combination of fields that uniquely identifies a record in a table, so that an individual record can be located without confusion.

    Foreign Key:

    A foreign key (sometimes called a referencing key) is a key used to link two tables together. Typically you take the primary key field from one table and insert it into the other table where it

    Primary and foreign key constraints

    A primary key constraint is a rule that says that the primary key fields cannot be null and cannot contain duplicate data.

    A foreign key constraint specifies that the data in a foreign key must match the data in the primary key of the linked table. This system is called referential integrity; it ensures that the data entered is correct and not orphaned (i.e. there are no broken links between data in the tables).

    RDBMS

    A relational database management system is a DBMS based on the relational model as defined by Codd.

    There is no commercially available DBMS that fully implements the relational model as defined by (Codd 1990).

    Advantages of the Relational Model

    -queries can be automatically compiled, executed, and optimized without resorting to programming

    -correctness: the semantics of the relational