l14_ppt_ivsem

8/18/2019 L14_PPT_IVSem

1/33

Lecture- 14

File tructures

Sequential Files,

8/18/2019 L14_PPT_IVSem

2/33

OO BJECTIVES BJECTIVES After reading this chapter, we should After reading this chapter, we should be able to: be able to:

Understand theUnderstand the file access methodsfile access methods. .

Describe the characteristics of aDescribe the characteristics of a sequential filesequential file. .

Describe the characteristics of anDescribe the characteristics of an indexed fileindexed file. .

Describe the characteristics of aDescribe the characteristics of a hashed filehashed file. .

Distinguish between aDistinguish between a text filetext file and aand a binary file binary file. .

8/18/2019 L14_PPT_IVSem

3/33

e

treated as a unit.es are store n aux ary secon ary

storage devices .DiskTapes

A file is a collection of data records witheach record consisting of one or more fields .

8/18/2019 L14_PPT_IVSem

4/33

8/18/2019 L14_PPT_IVSem

5/33

Figure 13-1 Taxonomy of file structures

The access methoddetermines how records can be retrieved:sequen a y or ran om y .

• One record after another,from beginning to end

• Access one specific recordwithout having to retrieve all records before it

8/18/2019 L14_PPT_IVSem

6/33

FILEFILEFILEFILE

8/18/2019 L14_PPT_IVSem

7/33

Figure 13-2 Sequential file

Sequential file – y u y,

one after another, from beginning to end.

8/18/2019 L14_PPT_IVSem

8/33

While Not EOF . .

Read the next record

}

8/18/2019 L14_PPT_IVSem

9/33

pp ca ons –

that need to access all records from beginning to end.

Because you have to process each record,se uential access is more efficient and easier thanrandom access.

Sequential file is not efficient for random access.

8/18/2019 L14_PPT_IVSem

10/33

U datin se uential files sequential files must be updated periodically toreflect chan es in information.

The updating process –all of the records need to be checked and updated.

New Master File

Old Master FileTransaction File –contains changes to be applied to the master file.

Add transaction

e e e ransac onChange transaction A key is one or more fields that uniquely identify the data in the file.

8/18/2019 L14_PPT_IVSem

11/33

8/18/2019 L14_PPT_IVSem

12/33

U datin se uential files To make updating process efficient, all files are

.The update process requires that you compare :

.< : add transaction to new master = :

Change content of master file data (transaction code = R(revise) )Remove data from master file (transaction code = D(delete) )

> : wr e o mas er e recor o new mas er e(transaction code = A(add) )

8/18/2019 L14_PPT_IVSem

13/33

Updating process

8/18/2019 L14_PPT_IVSem

14/33

FILEFILEFILEFILE

8/18/2019 L14_PPT_IVSem

15/33

Ma in in an indexed file

To access a record in a file randoml ,

you need to know the address of the record. An index file can relate the key to the record address .

8/18/2019 L14_PPT_IVSem

16/33

Indexed files An index file is made of a data file , which is a sequential file,and an index .

Index – a small file with only two fields:The key of the sequential file .

To access a record in the file :1. Load the entire index file into main memory.2. earc e n ex e o n e es re ey.3. Retrieve the address the record.4. Retrieve the data record. (using the address)

Inverted file –ou can have more than one index each with a different ke .

8/18/2019 L14_PPT_IVSem

17/33

nver e e

A file that reorganizes the structure of an existing data file to enablea rapid search to be made for all records having one field fallingwithin set limits.

For example, a file used by an estate agent might store records oneach house for sale, using a reference number as the key field forsorting . One field in each record would be the asking price of the

ouse. o spee up e process o raw ng up s s o ouses a ngwithin certain price ranges, an inverted file might be created in whichthe records are rearranged according to price. Each record wouldconsist of an asking price, followed by the reference numbers of all

.

8/18/2019 L14_PPT_IVSem

18/33

Logical view of an indexed file

8/18/2019 L14_PPT_IVSem

19/33

FILEFILEFILEFILE

8/18/2019 L14_PPT_IVSem

20/33

Ma in in a hashed file A hashed file uses a hash function to map the key to theaddress .

Eliminates the need for an extra file ( index ).There is no need for an index and all of the overhead.

8/18/2019 L14_PPT_IVSem

21/33

Hashin methodsDirect Hashing –the key is the address without any algorithmic manipulation.

Modulo Division Hashing – ( Division remainder hashing )

use the remainder plus 1 for the address .

Digit Extraction Hashing –selected digits are extracted from the key and used as theaddress .

8/18/2019 L14_PPT_IVSem

22/33

Direct hashing

Direct Hashing –

8/18/2019 L14_PPT_IVSem

23/33

Direct Hashinthe file must contain a record for every possible key.

–

Disadv. – space is wasted.

Hashing techniques –map a large population of possible keys into

a small address space .

8/18/2019 L14_PPT_IVSem

24/33

Modulo divisionaddress = key % list_size + 1list_size : a prime number produces fewer collisions

A new em lo ee numberin s stemthat will handle 1 million employees.

8/18/2019 L14_PPT_IVSem

25/33

Di it Extraction Hashinselected digits are extracted from the keyand used as the address .

For example : 1 ,3 ,4

- g emp oyee num er → → → - g a ress125870 → 158

→

121267 → 112…

123413 → 134

8/18/2019 L14_PPT_IVSem

26/33

Collision

there is a possibility that more than one key will hash to thesame address in the file.

– .

Collision – a hashing algorithm produces an address for aninsertion key, and that address is already occupied .Prime area – the part of the file that contains all of the homeaddresses.

Home address

8/18/2019 L14_PPT_IVSem

27/33

Collision Resolution

With the exception of the directed hashing ,none of the methods we discussed creates one-to-one

.

Several collision resolution methods :Open addressing resolutionLinked list resolutionBucket hashin resolution

8/18/2019 L14_PPT_IVSem

28/33

Figure 13-11 Open addressing resolution

Resolve collisions in the prime area.The prime area addresses are searched for an open orunoccup e recor w ere t e new ata can e p ace .One simplest strategy –the next address home address + 1Disadv. –each collision resolution increases the possibility of future

.

8/18/2019 L14_PPT_IVSem

29/33

8/18/2019 L14_PPT_IVSem

30/33

Figure 13-13 Bucket hashing resolution

Bucket –a node that can accommodate more than one record.

8/18/2019 L14_PPT_IVSem

31/33

VERVERVERVERBINARYBINARYBINARYBINARY

8/18/2019 L14_PPT_IVSem

32/33

Text and binary interpretations of a file A file stored on a storage deviceis a sequence of bits that can be interpreted by an

.

8/18/2019 L14_PPT_IVSem

33/33

Text vs. Binary

Text files – A file of characters .

anno con a n n egers, oa ng-po n num ers, or any o er a a

structures in their internal memory format.Encoding system – ASCII or EBCDIC …

Binary files – A collection of data stored in the internal format of the computer.Contain data that are meaningfulonly if they are properly interpreted by a program.

l14_ppt_ivsem

Documents