l14_ppt_ivsem

Upload: rohit-tiwari

Post on 07-Jul-2018


  • 8/18/2019 L14_PPT_IVSem

    1/33

    Lecture- 14

    File Structures

    Sequential Files,


    OBJECTIVES

    After reading this chapter, we should be able to:

    • Understand the file access methods.

    • Describe the characteristics of a sequential file.

    • Describe the characteristics of an indexed file.

    • Describe the characteristics of a hashed file.

    • Distinguish between a text file and a binary file.


    A file is a collection of related data treated as a unit.

    Files are stored on auxiliary (secondary) storage devices, such as disks and tapes.

    A file is a collection of data records, with each record consisting of one or more fields.


    Figure 13-1 Taxonomy of file structures

    The access method determines how records can be retrieved: sequentially or randomly.

    • Sequential access: one record after another, from beginning to end.

    • Random access: one specific record, without having to retrieve all the records before it.


    SEQUENTIAL FILES


    Figure 13-2 Sequential file

    Sequential file – a file in which records can only be accessed sequentially, one after another, from beginning to end.


    While Not EOF
    {
        Read the next record
        Process the record
    }
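    The loop above might look as follows in Python. This is a minimal sketch, assuming one record per line; the names `process_sequentially` and `sample` are hypothetical and not part of the lecture.

```python
import io


def process_sequentially(file_obj, process):
    """Read records one after another until end of file (EOF)."""
    count = 0
    for line in file_obj:              # iteration stops at EOF
        record = line.rstrip("\n")     # one record per line (assumption)
        process(record)
        count += 1
    return count


# Hypothetical in-memory "file" standing in for a file on disk or tape.
sample = io.StringIO("alice\nbob\ncarol\n")
seen = []
total = process_sequentially(sample, seen.append)
```

    Every record is visited in storage order, so reaching the last record means reading all the records before it.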


    Applications – sequential files suit applications that need to access all records from beginning to end.

    Because you have to process each record, sequential access is more efficient and easier than random access.

    A sequential file is not efficient for random access.


    Updating sequential files

    Sequential files must be updated periodically to reflect changes in information.

    The updating process – all of the records need to be checked and updated.

    Files involved in an update:

    • New Master File

    • Old Master File

    • Transaction File – contains changes to be applied to the master file. Each change is one of:
      Add transaction
      Delete transaction
      Change transaction

    A key is one or more fields that uniquely identify the data in the file.


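    Updating sequential files

    To make the updating process efficient, all files are sorted on the same key.

    The update process requires that you compare the transaction file key with the master file key:

    • Transaction key < master key: add the transaction to the new master file (transaction code = A (add)).

    • Transaction key = master key:
      change the content of the master file data (transaction code = R (revise)), or
      remove the data from the master file (transaction code = D (delete)).

    • Transaction key > master key: write the old master file record to the new master file.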
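    The three-way key comparison can be sketched in Python as a merge of two sorted lists. This is an illustration under stated assumptions, not the lecture's own program: records are `(key, data)` tuples, transactions are `(key, code, data)` tuples, there is at most one transaction per key, and the `errors` list for invalid transactions is an addition made here.

```python
def update_master(old_master, transactions):
    """Apply sorted transactions to a sorted old master file.

    old_master:   list of (key, data), sorted by key
    transactions: list of (key, code, data), sorted by key,
                  code is "A" (add), "R" (revise) or "D" (delete)
    Returns (new_master, errors). Assumes at most one transaction per key.
    """
    new_master, errors = [], []
    m = t = 0
    while t < len(transactions):
        t_key, code, data = transactions[t]
        if m < len(old_master) and t_key > old_master[m][0]:
            # Transaction key > master key: copy the old record unchanged.
            new_master.append(old_master[m])
            m += 1
        elif m < len(old_master) and t_key == old_master[m][0]:
            if code == "R":
                new_master.append((t_key, data))    # revise the record
            elif code == "D":
                pass                                # delete: do not copy it
            else:
                errors.append(transactions[t])      # add of an existing key
                new_master.append(old_master[m])
            m += 1
            t += 1
        else:
            # Transaction key < master key, or the old master is exhausted.
            if code == "A":
                new_master.append((t_key, data))
            else:
                errors.append(transactions[t])      # revise/delete of a missing key
            t += 1
    new_master.extend(old_master[m:])               # copy the remaining old records
    return new_master, errors


old = [(10, "Ann"), (20, "Bob"), (30, "Cy")]
tx = [(15, "A", "Dee"), (20, "R", "Bobby"), (30, "D", None), (40, "A", "Eve")]
new, errs = update_master(old, tx)
```

    Because both inputs are sorted on the same key, a single pass over each file is enough.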


    Updating process


    INDEXED FILES


    Mapping in an indexed file

    To access a record in a file randomly, you need to know the address of the record. An index file can relate the key to the record address.


    Indexed files

    An indexed file is made of a data file, which is a sequential file, and an index.

    Index – a small file with only two fields: the key of the sequential file and the address of the corresponding record.

    To access a record in the file:
    1. Load the entire index file into main memory.
    2. Search the index file to find the desired key.
    3. Retrieve the address of the record.
    4. Retrieve the data record (using the address).
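    The access steps can be sketched in Python with an in-memory byte stream standing in for the data file. The helper names `build_files` and `fetch`, the newline-terminated record layout, and the use of a `dict` as the loaded index are all assumptions made for illustration.

```python
import io


def build_files(records):
    """Write records sequentially and build an index of key -> byte offset."""
    data_file = io.BytesIO()
    index = {}
    for key, text in records:
        index[key] = data_file.tell()               # address of this record
        data_file.write((text + "\n").encode("ascii"))
    return data_file, index


def fetch(data_file, index, key):
    """Look up a record by key without reading the records before it."""
    address = index[key]          # steps 2-3: search the index, get the address
    data_file.seek(address)       # step 4: jump straight to the record
    return data_file.readline().decode("ascii").rstrip("\n")


# Step 1: the whole index is already in main memory as a dict.
data_file, index = build_files([(101, "Ann"), (205, "Bob"), (309, "Cy")])
record = fetch(data_file, index, 205)
```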

    Inverted file – you can have more than one index, each with a different key.


    Inverted file

    A file that reorganizes the structure of an existing data file to enable a rapid search to be made for all records having one field falling within set limits.

    For example, a file used by an estate agent might store records on each house for sale, using a reference number as the key field for sorting. One field in each record would be the asking price of the house. To speed up the process of drawing up lists of houses falling within certain price ranges, an inverted file might be created in which the records are rearranged according to price. Each record would consist of an asking price, followed by the reference numbers of all the houses offered at that price.
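    The estate agent example might be sketched as follows. The house data and the function names `build_inverted_file` and `refs_in_range` are made up for illustration.

```python
from collections import defaultdict


def build_inverted_file(houses):
    """Map each asking price to the reference numbers of houses at that price."""
    inverted = defaultdict(list)
    for ref, price in houses:
        inverted[price].append(ref)
    return dict(sorted(inverted.items()))           # records ordered by price


def refs_in_range(inverted, low, high):
    """Return reference numbers of all houses priced within [low, high]."""
    return [ref
            for price, refs in inverted.items()
            if low <= price <= high
            for ref in refs]


# Hypothetical data file: (reference number, asking price).
houses = [(1, 250000), (2, 180000), (3, 250000), (4, 320000)]
inverted = build_inverted_file(houses)
matches = refs_in_range(inverted, 200000, 300000)
```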


    Logical view of an indexed file


    HASHED FILES


    Mapping in a hashed file

    A hashed file uses a hash function to map the key to the address.

    Eliminates the need for an extra file (index). There is no need for an index and all of the overhead.


    Hashing methods

    Direct Hashing – the key is the address, without any algorithmic manipulation.

    Modulo Division Hashing (division remainder hashing) – divide the key by the file size and use the remainder plus 1 for the address.

    Digit Extraction Hashing – selected digits are extracted from the key and used as the address.


    Direct hashing


    Direct Hashing – the file must contain a record for every possible key.

    Disadvantage – space is wasted.

    Hashing techniques – map a large population of possible keys into a small address space.


    Modulo division

    address = key % list_size + 1

    list_size: a prime number produces fewer collisions.

    A new employee numbering system that will handle 1 million employees.
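    The formula translates directly into code. In this sketch the list size of 307 (a prime) and the sample keys are assumptions chosen for illustration.

```python
def modulo_division_hash(key, list_size):
    """address = key % list_size + 1, giving addresses 1..list_size."""
    return key % list_size + 1


LIST_SIZE = 307   # assumed prime list size

addresses = {key: modulo_division_hash(key, LIST_SIZE)
             for key in (121267, 45128, 379452)}
```

    The `+ 1` shifts the remainder range 0..306 to the address range 1..307, so address 0 is never produced.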


    Digit Extraction Hashing

    Selected digits are extracted from the key and used as the address.

    For example, extracting digits 1, 3, and 4 maps a 6-digit employee number to a 3-digit address:

    125870 → 158

    121267 → 112

    …

    123413 → 134
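    A small sketch of the same mapping, assuming 6-digit keys and 1-based digit positions counted from the left; the function name and its parameters are hypothetical.

```python
def digit_extraction_hash(key, positions=(1, 3, 4), width=6):
    """Build the address from the selected digits of the key."""
    digits = str(key).zfill(width)                  # pad short keys with zeros
    return int("".join(digits[p - 1] for p in positions))


addresses = [digit_extraction_hash(k) for k in (125870, 121267, 123413)]
```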


    Collision

    There is a possibility that more than one key will hash to the same address in the file.

    Collision – a hashing algorithm produces an address for an insertion key, and that address is already occupied.

    Home address – the address produced by the hashing algorithm.

    Prime area – the part of the file that contains all of the home addresses.


    Collision Resolution

    With the exception of direct hashing, none of the methods we discussed creates one-to-one mappings.

    Several collision resolution methods:

    • Open addressing resolution

    • Linked list resolution

    • Bucket hashing resolution


    Figure 13-11 Open addressing resolution

    Resolve collisions in the prime area. The prime area addresses are searched for an open or unoccupied record where the new data can be placed.

    One simple strategy – use the next address (home address + 1).

    Disadvantage – each collision resolution increases the possibility of future collisions.
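    The "home address + 1" strategy can be sketched as below. The table size of 7, the wrap-around at the end of the prime area, and the function name `insert` are assumptions made for illustration.

```python
def insert(table, key, data):
    """Place a record at its home address, or at the next free address."""
    size = len(table)
    home = key % size                    # 0-based slot; the address is slot + 1
    for step in range(size):
        slot = (home + step) % size      # wrap around at the end (assumption)
        if table[slot] is None:
            table[slot] = (key, data)
            return slot + 1              # 1-based address actually used
    raise RuntimeError("prime area is full")


table = [None] * 7
first = insert(table, 14, "a")    # home address 1 is free
second = insert(table, 21, "b")   # collides at address 1, moves to 2
third = insert(table, 8, "c")     # its own home address 2 is now taken
```

    The third insertion shows the disadvantage: key 8 collides only because an earlier collision was resolved into its home address.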


    Figure 13-13 Bucket hashing resolution

    Bucket – a node that can accommodate more than one record.
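    A minimal sketch of the idea, assuming five buckets of three records each; the capacity, the hash `key % len(buckets)`, and the name `bucket_insert` are hypothetical choices.

```python
def bucket_insert(buckets, key, data, capacity=3):
    """Store the record in its bucket; report False if the bucket is full."""
    bucket = buckets[key % len(buckets)]
    if len(bucket) >= capacity:
        return False                     # overflow needs another strategy
    bucket.append((key, data))
    return True


buckets = [[] for _ in range(5)]
results = [bucket_insert(buckets, k, "rec") for k in (5, 10, 15, 20)]
```

    Keys 5, 10, and 15 collide but share one bucket without being moved; only the fourth record overflows.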


    TEXT VERSUS BINARY


    Text and binary interpretations of a file

    A file stored on a storage device is a sequence of bits that can be interpreted by an application program as a text file or a binary file.


    Text vs. Binary

    Text files – A file of characters.

    Cannot contain integers, floating-point numbers, or any other data structures in their internal memory format.

    Encoding system – ASCII or EBCDIC …

    Binary files – A collection of data stored in the internal format of the computer.

    Contain data that are meaningful only if they are properly interpreted by a program.
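    The difference can be seen by storing the same integer both ways. This sketch assumes ASCII for the text form and a 4-byte little-endian integer for the binary form; the value 1025 is arbitrary.

```python
import struct

number = 1025

# Text interpretation: four character codes, one per decimal digit.
as_text = str(number).encode("ascii")

# Binary interpretation: the integer in a 4-byte little-endian format.
as_binary = struct.pack("<i", number)

# The binary bytes are meaningful only when read back with the same format.
restored = struct.unpack("<i", as_binary)[0]
```

    Both forms happen to be four bytes long here, yet the byte values differ completely, so a program must know which interpretation to apply.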