data structures and algorithms 10€¦ · chapter 10 search search in a hash table •10.3.0 basic...

Post on 19-Jun-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Data Structures and Algorithms(10)

Instructor: Ming ZhangTextbook Authors: Ming Zhang, Tengjiao Wang and Haiyan Zhao

Higher Education Press, 2008.6 (the "Eleventh Five-Year" national planning textbook)

https://courses.edx.org/courses/PekingX/04830050x/2T2014/

Ming Zhang "Data Structures and Algorithms"

2

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Chapter 10. Search

• 10.1 Search in a linear list

• 10.2 Search in a set

• 10.3 Search in a hash table

• Summary

10.3 Search in a Hash Table

3

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Search in a Hash Table

• 10.3.0 Basic problems in hash tables

• 10.3.1 Collisions resolution

• 10.3.2 Open hashing

• 10.3.3 Closed hashing

• 10.3.4 Implementation of closed hashing

• 10.3.5 Efficiency analysis of hash methods

10.3 Search in a Hash Table

4

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Open Hashing

10.3 Search in a Hash Table

• The empty cells in the table

should be marked by special

values

• like -1 or INFINITY

• Or make the contents of hash

table to be pointers, and the

contents of empty cells are null

pointers

0

1

2

3

4

5

6

7

8

9

10

77

14

7

75

110

9562

{77、14、75、 7、110、62 、95 }

h(Key) = Key % 11

5

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Performance Analysis of Chaining Method

• Give you a table of size M which

contains n records. The hash

function (in the best case) put

records evenly into the M positions

of the table which makes each chain

contains n/M records on the average

• When M>n, the average cost of hash

method is Θ(1)

10.3 Search in a Hash Table

6

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

10.3.3 Closed Hashing

• d0=h(K) is called the base address of K.

• When a collision occurs, use some method to generate a

sequence of hash addresses for key K

d1, d

2, ... d

i , ..., d

m-1

• All the di (0<i<m) are the successive hash addresses

• With different way of probing, we get different ways to resolve

collisions.

• Insertion and search function both assume that the probing

sequence for each key has at least one empty cell

• Otherwise it may get into a endless loop

• We can also limit the length of probing sequence

10.3 Search in a Hash Table

7

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Problem may Arise - Clustering

• Clustering

• Nodes with different hash addresses

compete for the same successive hash

address

• Small clustering may merge into large

clustering

• Which leads to a very long probing

sequence

10.3 Search in a Hash Table

8

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Several General Closed Hashing Methods

• 1. Linear probing

• 2. Quadratic probing

• 3. Pseudo-random probing

• 4. Double hashing

10.3 Search in a Hash Table

9

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

1. Linear probing

• Basic idea:

• If the base address of a record is occupied, check

the next address until an empty cell is found

• Probe the following cells in turn: d+1, d+2, ......, M-1,

0, 1, ......, d-1

• A simple function used for the linear probing:

p(K,i) = I

• Advantages:

• All the cell of the table can be candidate cells for

the new record inserted

10.3 Search in a Hash Table

10

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Instance of Hash Table

10.3 Search in a Hash Table

• M = 15, h(key) = key%13

• In the ideal case, all the empty cells in the table should

have a chance to accept the record to be inserted

• The probability of the next record to be inserted at the 11th cell

is 2/15

• The probability to be inserted at the 7th cell is 11/15

0 1 2 3 4 5 6 7 8 9 10 22212 11 12 13 14

26 25 41 15 68 44 6 36 38 12 51

11

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Enhanced Linear Probing

• Every time skip constant c cells rather than 1

• The ith cell of probing sequence is

(h(K) + ic) mod M

• Records with adjacent base address would not get the

same probing sequence

• Probing function is p(K,i) = i*c

• Constant c and M must be co-prime

10.3 Search in a Hash Table

12

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Example: Enhance Linear Probing

• For instance, c = 2, The keys to be

inserted, k1 and k

2. h(k

1) = 3, h(k

2) = 5

• Probing sequences

• The probing sequence of k1: 3, 5, 7, 9, ...

• The probing sequence of k2: 5, 7, 9, ...

• The probing sequences of k1

and k2

are

still intertwine with each other, which

leads to clustering.

10.3 Search in a Hash Table

13

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

2. Quadratic probing

• Probing increment sequence: 12, -1

2, 2

2,

-22, ..., The address formula is

d2i-1

= (d +i2) % M

d2i

= (d – i2) % M

• A function for simple linear probing:p(K, 2i-1) = i*i

p(K, 2i) = - i*i

10.3 Search in a Hash Table

14

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Example: Quadratic Probing

• Example: use a table of size M = 13

• Assume for k1

and k2, h(k

1)=3, h(k

2)=2

• Probing sequences

• The probing sequence of k1: 3, 4, 2, 7, ...

• The probing sequence of k2: 2, 3, 1, 6, ...

• Although k2

would take the base address of

k1

as the second address to probe, but their

probing sequence will separate from each

other just after then

10.3 Search in a Hash Table

15

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

3. Pseudo-Random Probing

• Probing function p(K,i) = perm[i - 1]

• here perm is an array of length M – 1

• It contains a random permutation of numbers between 1 and M

// generate a pseudo-random permutation of n numbers

void permute(int *array, int n) {

for (int i = 1; i <= n; i ++)

swap(array[i-1], array[Random(i)]);

}

10.3 Search in a Hash Table

16

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Example: Pseudo-Random Probing

• Example: consider a table of size M = 13,

perm[0] = 2, perm[1] = 3, perm[2] = 7.

• Assume 2 keys k1

and k2, h(k

1)=4, h(k

2)=2

• Probing sequences

• The probing sequence of k1: 4, 6, 7, 11, ...

• The probing sequence of k2: 2, 4, 5, 9, ...

• Although k2

would take the base address

of k1 as the second address to probe, but

their probing sequence will separate from

each other just after then

10.3 Search in a Hash Table

17

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Secondary Clustering

• Eliminate the primary clustering

• Probing sequences of keys with different base address

overlap

• Pseudo-random probing and quadratic probing can eliminate

it

• Secondary clustering

• The clustering is caused by two keys which are hashed to one

base address, and have the same probing sequence

• Because the probing sequence is merely a function that

depends on the base address but not the original key.

• Example: pseudo-random probing and quadratic probing

10.3 Search in a Hash Table

18

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

4. Double Probing

• Avoid secondary clustering

• The probing sequence is a function that depends

on the original key

• Not only depends on the base address

• Double probing

• Use the second hash function as a constant

• p(K, i) = i * h2

(key)

• Probing sequence function

• d = h1(key)

• di = (d + i h

2(key)) % M

10.3 Search in a Hash Table

19

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Basic ideas of Double Probing

• The double probing uses two hash functions h1

and h2

• If collision occurs at address h1(key) = d, then compute

h2(key), the probing sequence we get is :

(d+h2(key)) % M,(d+2h

2(key)) %M ,(d+3h

2(key)) % M ,...

• It would be better if h2

(key) and M are co-prime

• Makes synonyms that cause collision distributed evenly in the table

• Or it may cause circulation computation of addresses of synonyms

• Advantages: hard to produce “clustering”

• Disadvantages: more computation

10.3 Search in a Hash Table

20

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Method of choosing M and h2(k)

• Method1: choose a prime M, the return values of h2

is in the

range of

1 ≤ h2(K) ≤ M – 1

• Method2: set M = 2m

,let h2

returns an odd number between 1

and 2m

• Method3: If M is a prime, h1(K) = K mod M

• h2

(K) = K mod(M-2) + 1

• or h2(K) = [K / M] mod (M-2) + 1

• Method4: If M is a arbitrary integer, h1(K) = K mod p (p is the

maximum prime smaller than M)

• h2

(K) = K mod q + 1 (q is the maximum prime smaller than p)

10.3 Search in a Hash Table

21

目录页

Ming Zhang “Data Structures and Algorithms”

Chapter 10

Search

Thinking• When inserting synonyms, how to

organize synonyms chain?

• What kind of relationship do the

function of double hashing h2

(key) and

h1

(key) have?

10.3 Search in a Hash Table

Data Structures and Algorithms

Thanks

the National Elaborate Course (Only available for IPs in China)http://www.jpk.pku.edu.cn/pkujpk/course/sjjg/

Ming Zhang, Tengjiao Wang and Haiyan ZhaoHigher Education Press, 2008.6 (awarded as the "Eleventh Five-Year" national planning textbook)

Ming Zhang “Data Structures and Algorithms”

top related