csc313 hashing(1)

8/10/2019 Csc313 Hashing(1)


Data Structures

Hashing

Hikmat Farhat

December 19, 2013


Introduction

Many applications need to store a collection of elements that supports the dictionary operations:

Insert, Delete, Search

A hash table is an efficient data structure that implements the dictionary operations.

The average complexity of searching for an element in a hash table is O(1). The worst case is O(n).


What is a Hash Table?

A hash table is a generalization of an array

Suppose we have a set of n elements, each having a key k.

We keep an array T of n elements where T[k] contains the element with key k (or a pointer to it).

To find the element whose key is k we just fetch T[k], which is an O(1) operation. This is called direct addressing.

Direct addressing can be done if we can allocate an array with one position for every possible key.

When the number of possible keys is large, the array solution is not efficient.
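Direct addressing can be illustrated with a short Python sketch. The class name `DirectAddressTable` and the parameter `universe_size` are illustrative assumptions, and keys are assumed to be integers in the range 0 to `universe_size - 1`.

```python
class DirectAddressTable:
    """Direct addressing: one slot per possible key in 0..universe_size-1."""

    def __init__(self, universe_size):
        # One position for every possible key, whether or not it is used.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value      # O(1)

    def search(self, key):
        return self.slots[key]       # O(1); None means the key is absent

    def delete(self, key):
        self.slots[key] = None       # O(1)


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
table.insert(7, "seven")
print(table.search(3))   # three
table.delete(3)
print(table.search(3))   # None
```

The array always has `universe_size` slots, so the space cost grows with the number of possible keys rather than with the number of stored keys.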


[Figure: direct-address table T. Labels: U (universe of keys), K (actual keys), key, satellite data.]


Hash Tables

When the set of possible keys is large, keeping a slot for every possible key becomes impractical.

Usually the set of stored keys K is small compared to the set of possible keys U.

In such situations we use a hash function h:

$h : U \to \{0, \ldots, m-1\}$

When K is much smaller than U, a hash table requires much less space than a direct-address array.

The storage requirement is $\Theta(|K|)$.


An element with key k is stored in slot h(k) instead of slot k.

A problem arises when two keys hash to the same value: this is called a collision.

One way to resolve collisions is chaining.

Another way is open addressing, discussed later.


Chaining

[Figure: hash table with collisions resolved by chaining]


Average Complexity

The average-case complexity of the operations depends on the length of the lists.

For a hash table with m slots and n elements, define the load factor $\alpha = n/m$.

To insert an element with key k, add it to the front of the list at position h(k). This is an O(1) operation in both the worst and the average case.

To find an element with key k, we search the list stored at position h(k). This is an $O(n_k)$ operation, where $n_k$ is the length of the list at position h(k).

The worst case of search is O(n), because it is possible that all keys hash to the same value.


The average case of search is $\Theta(1 + \alpha)$, where $\alpha = n/m$ is the load factor (under the assumption of simple uniform hashing).

Deleting an element is similar to search.
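The chaining operations can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `ChainedHashTable` is made up, keys are integers, the hash function is assumed to be k mod m, and a `collections.deque` stands in for the linked list in each slot.

```python
from collections import deque


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [deque() for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return key % self.m                        # assumed hash function

    def insert(self, key):
        # Add at the front of the chain: O(1).
        self.slots[self._h(key)].appendleft(key)

    def search(self, key):
        # Scans only the chain at h(key): O(n_k).
        return key in self.slots[self._h(key)]

    def delete(self, key):
        # Like search: find the key in its chain, then unlink it.
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False

    def load_factor(self):
        n = sum(len(chain) for chain in self.slots)
        return n / self.m


t = ChainedHashTable(7)
for k in (5, 12, 19, 3):
    t.insert(k)
print(list(t.slots[5]))   # [19, 12, 5]  (5, 12 and 19 all hash to slot 5)
print(t.search(12))       # True
print(t.load_factor())    # 4/7, about 0.571
```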


Hash Functions

A good hash function satisfies simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of the other keys.

In practice this cannot be fully satisfied, so we aim for an approximation.

In what follows the key is assumed to be an integer or interpreted as an integer.

For example, if the key is a string it can be converted to an integer by using its ASCII codes in radix 128.

For example, the string "cs" has the value $99 \cdot 128 + 115 = 12787$.
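The radix-128 conversion can be checked with a few lines of Python. The function name `string_to_key` is an illustrative assumption, and the input is assumed to contain only ASCII characters.

```python
def string_to_key(s, radix=128):
    """Interpret an ASCII string as an integer written in the given radix."""
    value = 0
    for ch in s:
        value = value * radix + ord(ch)   # shift left one digit, add next code
    return value


print(ord("c"), ord("s"))    # 99 115
print(string_to_key("cs"))   # 12787, i.e. 99 * 128 + 115
```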


Division Method

In the division method each key k is mapped to one of the m slots by computing

h(k) = k mod m

Example: with m = 13 and k = 100, h(k) = 100 mod 13 = 9.

In this method there is a restriction on the possible values of m.

Take the example of m = 8 and hash the values 7, 23, 39, 55 and 71.

What do you get?
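The exercise can be run directly. The sketch below compares m = 8 with m = 13 (the value used in the earlier example); the helper name `h` simply mirrors the notation of the slides.

```python
def h(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


keys = [7, 23, 39, 55, 71]
print([h(k, 8) for k in keys])    # [7, 7, 7, 7, 7]
print([h(k, 13) for k in keys])   # [7, 10, 0, 3, 6]
```

With m = 8 every key lands in slot 7, because all five keys end in the same three bits (111). With m = 13 the same keys are spread over five different slots.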


In general it is a bad choice for m to be of the form $2^p$, unless we know that the low-order p bits are equally distributed (which is not the case when the data has a common suffix).

This is because if $b_q b_{q-1} \ldots b_p \ldots b_0$ is the binary representation of k, then $h(k) = k \bmod 2^p$ depends on the least significant p bits only.

The above is true because we can write

$x = \lfloor x/y \rfloor \cdot y + (x \bmod y)$

and since division (multiplication) by 2 is equivalent to a right (left) shift, with $y = 2^p$ we get

$\lfloor x/y \rfloor \cdot y = b_q \ldots b_p \underbrace{0 \ldots 0}_{p \text{ times}}$

Therefore $x \bmod y = b_{p-1} \ldots b_0$.
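As a worked instance of the identity $x = \lfloor x/y \rfloor \cdot y + (x \bmod y)$, take the key 23 from the earlier exercise with p = 3 (these particular numbers are chosen here only for illustration):

```latex
\begin{align*}
k &= 23 = 10111_2, \qquad p = 3, \qquad y = 2^3 = 8 \\
\lfloor k/y \rfloor \cdot y &= 2 \cdot 8 = 16 = 10000_2 \\
k \bmod y &= 23 - 16 = 7 = 111_2
\end{align*}
```

The high-order bits 10 contribute nothing to the result. Likewise $39 = 100111_2$ also gives $39 \bmod 8 = 7$, since it ends in the same three bits.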


Example

A good choice is for m to be a prime number not close to a power of 2.

As an example, suppose that we have 2000 strings to be stored.

Say we can afford 3 search operations on average.

Choose m = 701 because it is a prime number close to 2000/3 with 512 < 701 < 1024.


Multiplication Method

In the multiplication method we choose a fractional value 0 < A < 1, and the hash of a key k in a hash table containing m slots is computed as

$h(k) = \lfloor m\,(kA - \lfloor kA \rfloor) \rfloor$

Note that $kA - \lfloor kA \rfloor$ is just the fractional part of kA.

The advantage of the multiplication method is that the value of m is not critical.

Typically m is chosen as a power of 2.


Example

As an example of the multiplication method we use the same example as before:

6901, 775, 1994, 3396, 4508

With A = 0.1 and m = 512:

$6901 \times 0.1 = 690.1$, thus $h(6901) = \lfloor 512 \times 0.1 \rfloor = 51$.

Is 0.1 a good choice for A?
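The remaining keys can be hashed the same way. The sketch below is one possible way to compute the multiplication method; the name `h_mult` is an assumption, and `fractions.Fraction` is used so that A = 0.1 is represented exactly rather than as a binary floating-point approximation.

```python
from fractions import Fraction
from math import floor


def h_mult(k, m, A):
    """Multiplication method: floor(m * fractional part of k*A)."""
    frac = (k * A) % 1          # kA - floor(kA)
    return floor(m * frac)


A = Fraction(1, 10)             # exactly 0.1
keys = [6901, 775, 1994, 3396, 4508]
print([h_mult(k, 512, A) for k in keys])   # [51, 256, 204, 307, 409]
```

One observation that bears on the question: with A = 0.1 the fractional part of kA is determined by the last decimal digit of k, so at most 10 of the 512 slots can ever be used.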


Open Addressing

We saw that one way of resolving collisions is chaining.

Another way is open addressing.

In open addressing all the elements are stored in the table itself.

This means that in m slots we cannot store more than m elements.

The advantage of open addressing is that no pointers are used.

Since two or more keys can hash to the same value, open addressing uses probing to resolve collisions.


In chaining we search for a key k by using h(k) as a starting point and following pointers.

In open addressing the exact position of a key is computed.

The hash function becomes a function of two arguments, the key and the probe number:

$h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$

Given an element with key k, if slot h(k, 0) is occupied we try h(k, 1), and so on until we find an empty slot.



Open Addressing Operations

HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NULL then
            T[j] ← k
            return j
        else
            i ← i + 1
        end
    until i = m
    error "hash table overflow"


HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k then
            return j
        else
            i ← i + 1
        end
    until T[j] = NULL or i = m
    return NULL
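The two open-addressing procedures might be written in Python as below. This is a sketch under stated assumptions: empty slots are represented by `None`, the probe function `h(k, i)` is passed in as an argument, and the function `linear_probe` and the table size 7 are made up for the demonstration.

```python
def hash_insert(table, key, h):
    """Try h(key, 0), h(key, 1), ... until an empty slot is found."""
    for i in range(len(table)):
        j = h(key, i)
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def hash_search(table, key, h):
    """Follow the same probe sequence; stop at the key or at an empty slot."""
    for i in range(len(table)):
        j = h(key, i)
        if table[j] == key:
            return j
        if table[j] is None:
            return None          # key would have been placed here
    return None                  # probed all m slots without success


m = 7
table = [None] * m


def linear_probe(k, i):
    # Assumed probe sequence for the demo: (k mod m + i) mod m.
    return (k % m + i) % m


for k in (10, 17, 3):            # all three keys hash to slot 3
    hash_insert(table, k, linear_probe)

print(table)                                 # [None, None, None, 10, 17, 3, None]
print(hash_search(table, 17, linear_probe))  # 4
print(hash_search(table, 24, linear_probe))  # None
```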



Linear Probing

Given a hash function $g : U \to \{0, \ldots, m-1\}$, sometimes called an auxiliary hash function, linear probing uses the hash function

h(k, i) = (g(k) + i) mod m



Example Linear Probing

We use h(k, i) = ((k mod 10) + i) mod 10.

Suppose that we want to insert the sequence 89, 18, 49, 58, 69.

Note that (89 mod 10) = (49 mod 10) = (69 mod 10) = 9 and (18 mod 10) = (58 mod 10) = 8.

89 is inserted into slot 9 and 18 is inserted into slot 8.

To insert 49, which should go in slot 9 (not empty), we probe with i = 1, which gives h(49, 1) = ((49 mod 10) + 1) mod 10 = 0, thus 49 is inserted in slot 0.

To insert 58, slot 8 is not empty, so we probe with i = 1: h(58, 1) = ((58 mod 10) + 1) mod 10 = 9, also not empty.

Then we probe with i = 2: h(58, 2) = ((58 mod 10) + 2) mod 10 = 0, also not empty.


Finally, h(58, 3) = ((58 mod 10) + 3) mod 10 = 1 is empty, thus 58 is inserted in slot 1.

To insert 69, h(69, 0) = ((69 mod 10) + 0) mod 10 = 9, not empty.

So we probe with i = 1: h(69, 1) = ((69 mod 10) + 1) mod 10 = 0, also not empty.

So we probe with i = 2: h(69, 2) = ((69 mod 10) + 2) mod 10 = 1, also not empty.

So we probe with i = 3: h(69, 3) = ((69 mod 10) + 3) mod 10 = 2 is empty, thus 69 is inserted in slot 2.

Note that the more items we add, the more time it takes to probe.

The reason is that items tend to cluster together.
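The trace above can be reproduced with a short Python sketch. The function name `insert_linear` is an assumption; it returns the list of slots probed so that the growing probe sequences are visible.

```python
def insert_linear(table, key):
    """Insert with h(k, i) = ((k mod m) + i) mod m; return the slots probed."""
    m = len(table)
    probes = []
    for i in range(m):
        j = (key % m + i) % m
        probes.append(j)
        if table[j] is None:
            table[j] = key
            return probes
    raise OverflowError("table is full")


table = [None] * 10
trace = {}
for key in (89, 18, 49, 58, 69):
    trace[key] = insert_linear(table, key)
    print(key, trace[key])
# 89 [9]
# 18 [8]
# 49 [9, 0]
# 58 [8, 9, 0, 1]
# 69 [9, 0, 1, 2]
print(table)   # [49, 58, 69, None, None, None, None, None, 18, 89]
```

The occupied slots 8, 9, 0, 1, 2 form one contiguous run (wrapping around the end of the table), which is the clustering effect described above.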


Quadratic Probing

To solve the clustering problem we introduce quadratic probing.

$h(k, i) = (g(k) + c_1 i + c_2 i^2) \bmod m$

with $c_2 \neq 0$. As an example we choose

$h(k, i) = ((k \bmod 10) + i^2) \bmod 10$

We use the same example of inserting 89, 18, 49, 58, 69.


89 and 18 are inserted at 9 and 8 respectively.

To insert 49, slot 9 is occupied, so we probe with i = 1, which gives slot 0.

To insert 58, slot 8 is occupied, and probing with i = 1 gives slot 9, which is also occupied. The next probe is at i = 2, which gives (8 + 4) mod 10 = 2, so 58 is inserted into slot 2.

To insert 69, slot 9 is occupied. The first probe gives slot 0, also occupied. The second probe gives (9 + 4) mod 10 = 3, which is empty, so 69 is inserted into slot 3.
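A minimal Python sketch of this quadratic-probing example, assuming h(k, i) = ((k mod 10) + i^2) mod 10 as in the example; the function name `insert_quadratic` is made up for illustration.

```python
def insert_quadratic(table, key):
    """Insert with h(k, i) = ((k mod m) + i*i) mod m; return the slot used."""
    m = len(table)
    for i in range(m):
        j = (key % m + i * i) % m
        if table[j] is None:
            table[j] = key
            return j
    # Quadratic probing does not visit every slot in general, so this can
    # happen even when the table still has empty slots.
    raise OverflowError("no empty slot found within m probes")


table = [None] * 10
slots = [insert_quadratic(table, key) for key in (89, 18, 49, 58, 69)]
print(slots)   # [9, 8, 0, 2, 3]
print(table)   # [49, None, 58, 69, None, None, None, None, 18, 89]
```

Compared with linear probing on the same keys, 58 needs two probes after the initial collision instead of three, and 69 needs two instead of three.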


Double Hashing

In double hashing, probing is based on a second auxiliary hash function f:

$h(k, i) = (g(k) + i \cdot f(k)) \bmod m$

As an example consider

$g(k) = k \bmod 10$

$f(k) = 7 - (k \bmod 7)$


89 and 18 are inserted into slots 9 and 8 as before.

To insert 49, slot 9 is occupied. The first probe gives

$h(49, 1) = ((49 \bmod 10) + (7 - (49 \bmod 7))) \bmod 10 = (9 + 7) \bmod 10 = 6$

thus 49 is inserted into slot 6.

To insert 58, slot 8 is occupied. The first probe gives

$h(58, 1) = ((58 \bmod 10) + (7 - (58 \bmod 7))) \bmod 10 = (8 + 5) \bmod 10 = 3$

thus 58 is inserted into slot 3 after the first probe.



Double Hashing example
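The double hashing example can be sketched in Python, assuming g(k) = k mod 10 and f(k) = 7 - (k mod 7), which reproduce the slots 6 and 3 computed for 49 and 58. The function name `insert_double` is illustrative, and the same key sequence 89, 18, 49, 58, 69 as in the earlier probing examples is assumed.

```python
def g(k):
    return k % 10              # primary hash: starting slot


def f(k):
    return 7 - (k % 7)         # assumed second hash: step size, never 0


def insert_double(table, key):
    """Insert with h(k, i) = (g(k) + i * f(k)) mod m; return the slot used."""
    m = len(table)
    for i in range(m):
        j = (g(key) + i * f(key)) % m
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("no empty slot found within m probes")


table = [None] * 10
slots = [insert_double(table, key) for key in (89, 18, 49, 58, 69)]
print(slots)   # [9, 8, 6, 3, 0]
print(table)   # [69, None, None, 58, None, None, 49, None, 18, 89]
```

Here 69 collides at slot 9, and since f(69) = 7 - 6 = 1 its first probe lands on slot 0. Keys that collide at the same starting slot can follow different step sizes, which spreads them out more than linear or quadratic probing does.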



Rehashing

When the table gets too full operations will take a long time.

It is even possible for the insertion of an element to fail.

A solution is to build a new table roughly twice as big as the original one.

The original table is scanned and each element is rehashed into the new table.

As an example we insert 13, 15, 6, 23, 24 into a hash table of size 7 with linear probing.


We rehash with a new table of size 17 with linear probing.
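The rehashing example can be traced with the sketch below. The slides do not state the hash function used, so g(k) = k mod m is an assumption here, as are the function names `insert` and `rehash`.

```python
def insert(table, key):
    """Linear probing with the assumed hash g(k) = k mod m."""
    m = len(table)
    for i in range(m):
        j = (key % m + i) % m
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("table is full")


def rehash(old_table, new_size):
    """Scan the old table slot by slot and reinsert each key into a new one."""
    new_table = [None] * new_size
    for key in old_table:
        if key is not None:
            insert(new_table, key)
    return new_table


old = [None] * 7
for key in (13, 15, 6, 23, 24):
    insert(old, key)
print(old)   # [6, 15, 23, 24, None, None, 13]

new = rehash(old, 17)
occupied = {i: k for i, k in enumerate(new) if k is not None}
print(occupied)   # {6: 6, 7: 23, 8: 24, 13: 13, 15: 15}
```

The new size 17 is a prime roughly twice the old size 7. Every key has to be hashed again because its slot depends on the table size, so rehashing costs time proportional to the size of the old table.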
