csc313 hashing(1)

Upload: christopher-miller

Post on 02-Jun-2018


  • 8/10/2019 Csc313 Hashing(1)

    1/32

    Data Structures

    Hashing

    Hikmat Farhat

    December 19, 2013

    Hikmat Farhat Data Structures December 19, 2013 1 / 32


    Introduction

    Many applications need to store a collection of elements that supports the dictionary operations:

    Insert, Delete, Search

    A hash table is an efficient data structure that implements the dictionary operations.

    The average complexity of searching for an element in a hash table is O(1). The worst case is O(n).


    What is a Hash Table?

    A hash table is a generalization of an array.

    Suppose we have a set of n elements, each having a key k.

    We keep an array T of n elements where T[k] contains the element with key k (or a pointer to it).

    To find the element whose key is k we just fetch T[k], which is an O(1) operation. This is called direct addressing.

    Direct addressing can be done if we can allocate an array with one position for every possible key.

    When the number of possible keys is large the array solution is not efficient.
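As a rough illustration of direct addressing, the following minimal sketch stores each value at the index given by its key. The class name `DirectAddressTable` and the parameter `universe_size` are hypothetical names chosen for this example, and keys are assumed to be integers in the range 0 to `universe_size - 1`.

```python
class DirectAddressTable:
    """Direct-address table: one slot per possible key (assumed 0..universe_size-1)."""

    def __init__(self, universe_size):
        # The array needs a slot for every possible key, used or not.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value   # O(1)

    def search(self, key):
        return self.slots[key]    # O(1); None means the key is absent

    def delete(self, key):
        self.slots[key] = None    # O(1)


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
table.insert(7, "seven")
print(table.search(3))  # three
print(table.search(5))  # None
```

Note that the memory used depends on `universe_size`, not on how many keys are actually stored, which is why this approach breaks down when the set of possible keys is large.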


    [Figure: direct-address table, with labels U (universe of keys), K (actual keys), key, and satellite data]


    Hash Tables

    When the set of possible keys is large, storing all possible keys becomes impractical.

    Usually the set of stored keys K is small compared to the set of possible keys U.

    In such a situation we use a hash function h:

    $h : U \to \{0, \ldots, m-1\}$

    When K is much smaller than U, a hash table requires much less space than an array.

    The storage requirement will be $\Theta(|K|)$.


    An element with key k will be stored in slot h(k) instead of slot k.

    A problem arises when two keys hash to the same value: this is called a collision.

    One way to resolve collisions is to use chaining.

    Another way is open addressing, to be discussed later.


    Chaining


    Average Complexity

    The average-case complexity of operations depends on the length of the lists.

    For a hash table with m slots and n elements define the load factor $\alpha = n/m$.

    To insert an element with key k, it is added to the front of the list at position h(k). This is an O(1) operation in the worst and average case.

    To find an element with key k we need to search in the list stored in slot h(k). This is an $O(n_k)$ operation where $n_k$ is the length of the list at position h(k).

    The worst case of search is O(n) because it is possible that all keys hash to the same value.


    The average case of search is $\Theta(1 + \alpha)$, where $\alpha$ is the load factor (under the assumption of simple uniform hashing).

    Deleting an element is similar to search.
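The chaining operations described above can be sketched as follows. This is only an illustrative sketch: the class name `ChainedHashTable`, the use of `collections.deque` for the chains, and the choice of `key % m` as the hash function are assumptions made for the example, not something the slides prescribe.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with chaining; keys are assumed to be non-negative integers."""

    def __init__(self, m):
        self.m = m
        self.chains = [deque() for _ in range(m)]
        self.n = 0

    def _slot(self, key):
        return key % self.m  # assumed hash function for illustration

    def insert(self, key):
        # Add to the front of the chain: O(1) in the worst case.
        self.chains[self._slot(key)].appendleft(key)
        self.n += 1

    def search(self, key):
        # Scans one chain only: O(n_k), where n_k is that chain's length.
        return key in self.chains[self._slot(key)]

    def delete(self, key):
        chain = self.chains[self._slot(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1
            return True
        return False

    def load_factor(self):
        return self.n / self.m


table = ChainedHashTable(m=7)
for key in (10, 17, 3, 24):   # all four keys are congruent to 3 mod 7
    table.insert(key)
print(list(table.chains[3]))  # [24, 3, 17, 10]
print(table.search(17))       # True
print(table.load_factor())
```

The four keys were picked so that they all land in the same chain, which is the situation that makes the worst case of search O(n).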


    Hash Functions

    A good hash function satisfies simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of other keys.

    In practice this cannot be fully satisfied so we aim for an approximation.

    In what follows the key is assumed to be an integer or interpreted as an integer.

    For example, if the key is a string it can be converted to an integer by using its ASCII codes as digits in radix 128.

    For example, the string "cs" will have the value $99 \cdot 128 + 115 = 12787$.
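The radix-128 conversion can be checked with a short sketch. The function name `string_to_int` is a hypothetical name for this example, and the input is assumed to contain only 7-bit ASCII characters.

```python
def string_to_int(s, radix=128):
    """Interpret the character codes of s as digits of a number in the given radix."""
    value = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            raise ValueError(f"character {ch!r} is not a valid digit in radix {radix}")
        value = value * radix + code
    return value


print(string_to_int("cs"))  # 99 * 128 + 115 = 12787
```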


    Division Method

    In the division method each key k is mapped to one of the m slots by computing

    $h(k) = k \bmod m$

    Example: m = 13 and k = 100, then h(k) = 100 mod 13 = 9.

    In this method there is a restriction on the possible values of m.

    Take the example of m = 8 and hash the values 7, 23, 39, 55 and 71.

    What do you get?
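One way to explore the question is to compute the hashes directly. The sketch below uses a hypothetical helper `division_hash` and compares m = 8 with m = 13 (the value from the earlier example) on the same keys.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


keys = [7, 23, 39, 55, 71]
print([division_hash(k, 8) for k in keys])   # [7, 7, 7, 7, 7]
print([division_hash(k, 13) for k in keys])  # [7, 10, 0, 3, 6]
```

With m = 8 every key collides in slot 7, while m = 13 spreads the same keys over five different slots.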


    In general it is a bad choice for m to be of the form $2^p$, unless we know that the low-order p bits are equally distributed (which is not the case when the data has a common suffix).

    This is because if $b_q b_{q-1} \ldots b_p \ldots b_0$ is the binary representation of k, then $h(k) = k \bmod 2^p$ will depend on the least significant p bits only.

    The above is true because we can write

    $x = \lfloor x/y \rfloor \, y + (x \bmod y)$

    and since division (multiplication) by 2 is equivalent to a right (left) shift and $y = 2^p$, then

    $\lfloor x/y \rfloor \, y = b_q \ldots b_p \underbrace{0 \ldots 0}_{p \text{ times}}$

    Therefore $x \bmod y = b_{p-1} \ldots b_0$.
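The claim that k mod 2^p keeps only the p low-order bits can be checked numerically. In this sketch `low_bits` is a hypothetical helper that masks off the low p bits; the keys are the ones from the m = 8 exercise, so p = 3.

```python
def low_bits(k, p):
    """Return the p least significant bits of k as an integer."""
    return k & ((1 << p) - 1)


p = 3
for k in (7, 23, 39, 55, 71):
    # k mod 2^p and the bit mask agree for every non-negative integer k.
    assert k % (1 << p) == low_bits(k, p)
    print(f"{k:3d} = {k:07b} -> low {p} bits = {low_bits(k, p):03b}")
```

All five keys end in the bits 111, so they all hash to slot 7 no matter what their higher-order bits are.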


    Example

    A good choice is for m to be a prime number not close to a power of 2.

    As an example, suppose that we have 2000 strings to be stored.

    Say we can afford 3 search operations on average.

    Choose m = 701 because it is a prime number with 512 < 701 < 1024, and the load factor is 2000/701 ≈ 2.85.


    Multiplication Method

    In the multiplication method we choose a fractional value 0 < A < 1, and the hash of a key k in a hash table containing m slots is computed as

    $h(k) = \lfloor m \, (kA - \lfloor kA \rfloor) \rfloor$

    Note that $kA - \lfloor kA \rfloor$ is just the fractional part of kA.

    The advantage of the multiplication method is that the value of m is not critical.

    Typically m is chosen as a power of 2.
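A minimal floating-point sketch of the multiplication method follows. The function name `multiplication_hash` is hypothetical, and the constant A = (√5 − 1)/2 is an assumption for the example (a commonly suggested value); the slides do not fix a particular A at this point.

```python
import math


def multiplication_hash(k, m, A):
    """Multiplication method: floor(m * fractional part of k*A), with 0 < A < 1."""
    fractional_part = (k * A) % 1.0   # kA - floor(kA)
    return math.floor(m * fractional_part)


# Assumption: A = (sqrt(5) - 1) / 2; any 0 < A < 1 fits the definition.
A = (math.sqrt(5) - 1) / 2
m = 512
hashes = [multiplication_hash(k, m, A) for k in range(1, 11)]
print(hashes)
```

Because this uses floating-point arithmetic, results for very large keys may be affected by rounding; an integer-only formulation would avoid that, but is not shown here.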


    Example

    As an example of the multiplication method we choose the same example as before:

    6901, 775, 1994, 3396, 4508

    With A = 0.1 and m = 512,

    6901 × 0.1 = 690.1, thus $h(6901) = \lfloor 512 \times 0.1 \rfloor = 51$.

    Is 0.1 a good choice for A?
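One way to investigate the question is to hash many keys with A = 0.1 and count how many distinct slots are ever used. The sketch below uses `fractions.Fraction` so that 1/10 is represented exactly; the helper name `multiplication_hash` and the range of 10,000 test keys are choices made for this illustration.

```python
from fractions import Fraction
from math import floor


def multiplication_hash(k, m, A):
    """Multiplication method computed exactly: floor(m * (kA - floor(kA)))."""
    product = k * A
    return floor(m * (product - floor(product)))


A = Fraction(1, 10)   # exactly 0.1, avoiding floating-point rounding
m = 512
keys = [6901, 775, 1994, 3396, 4508]
print([multiplication_hash(k, m, A) for k in keys])  # [51, 256, 204, 307, 409]

# Count how many of the 512 slots are reachable over a large range of keys.
distinct = {multiplication_hash(k, m, A) for k in range(10_000)}
print(len(distinct))  # 10
```

With A = 0.1 the fractional part of kA depends only on the last decimal digit of k, so at most 10 of the 512 slots can ever be used.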


    Open Addressing

    We saw that one way of resolving collisions is chaining.

    Another way is open addressing.

    In open addressing all the elements are stored in the table itself.

    This means that in m slots we cannot store more than m elements.

    The advantage of open addressing is that no pointers are used.

    Since two or more keys can hash to the same value, open addressing uses probing to resolve collisions.


    In chaining we search for a key k by using h(k) as a starting point and following pointers.

    In open addressing the exact position of a key is computed.

    The hash function becomes a function of the key and the probe number:

    $h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$

    Given an element with key k, if slot h(k, 0) is used we try h(k, 1) and so on until we find an empty slot.


    Open Addressing Operations

    HASH-INSERT(T, k)
        i ← 0
        repeat
            j ← h(k, i)
            if T[j] = NULL then
                T[j] ← k
                return j
            else
                i ← i + 1
            end
        until i = m
        error "hash table overflow"


    HASH-SEARCH(T, k)
        i ← 0
        repeat
            j ← h(k, i)
            if T[j] = k then
                return j
            else
                i ← i + 1
            end
        until T[j] = NULL or i = m
        return NULL
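The insert and search procedures for open addressing can be sketched in Python as below. This is an illustrative sketch, not a definitive implementation: the function names, the use of `None` for an empty slot, and the `probe` callback (standing in for h(k, i)) are all assumptions for the example. Deletion is not handled.

```python
def hash_insert(table, key, probe):
    """Try slots probe(key, 0), probe(key, 1), ... until an empty one is found."""
    m = len(table)
    for i in range(m):
        j = probe(key, i)
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def hash_search(table, key, probe):
    """Follow the same probe sequence; stop at the key or at an empty slot."""
    m = len(table)
    for i in range(m):
        j = probe(key, i)
        if table[j] is None:
            return None
        if table[j] == key:
            return j
    return None


m = 5


def probe(k, i):
    # Assumed probe sequence for the demo: (k mod m + i) mod m.
    return (k % m + i) % m


table = [None] * m
for key in (12, 7, 22):
    hash_insert(table, key, probe)
print(table)                          # [None, None, 12, 7, 22]
print(hash_search(table, 22, probe))  # 4
print(hash_search(table, 17, probe))  # None
```

Both loops are bounded by the probe number i reaching m, so they terminate even when the table is full.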


    Linear Probing

    Given a hash function $g : U \to \{0, \ldots, m-1\}$, sometimes called an auxiliary hash function, linear probing uses the hash function

    $h(k, i) = (g(k) + i) \bmod m$


    Example Linear Probing

    We use h(k, i) = ((k mod 10) + i) mod 10.

    Suppose that we want to insert the sequence 89, 18, 49, 58, 69.

    Obviously (89 mod 10) = (49 mod 10) = (69 mod 10) = 9 and (18 mod 10) = (58 mod 10) = 8.

    89 is inserted into slot 9 and 18 is inserted into slot 8.

    To insert 49, which should go in slot 9, which is not empty, we probe for i = 1, which gives h(49, 1) = ((49 mod 10) + 1) mod 10 = 0, thus 49 is inserted in slot 0.

    To insert 58, slot 8 is not empty so we probe i = 1: h(58, 1) = ((58 mod 10) + 1) mod 10 = 9, also not empty.

    Then we probe i = 2: h(58, 2) = ((58 mod 10) + 2) mod 10 = 0, also not empty.


    Finally h(58, 3) = ((58 mod 10) + 3) mod 10 = 1 is empty, thus 58 is inserted in slot 1.

    To insert 69, h(69, 0) = ((69 mod 10) + 0) mod 10 = 9, not empty.

    So we probe i = 1: h(69, 1) = ((69 mod 10) + 1) mod 10 = 0, also not empty.

    So we probe i = 2: h(69, 2) = ((69 mod 10) + 2) mod 10 = 1, also not empty.

    So we probe i = 3: h(69, 3) = ((69 mod 10) + 3) mod 10 = 2, which is empty, thus 69 is inserted in slot 2.

    Note that the more items we add, the more time it takes to probe.

    The reason is that items tend to cluster together.
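The linear probing example can be replayed with a short sketch. The helper name `linear_probe_insert` is hypothetical, `None` marks an empty slot, and the auxiliary hash is taken to be k mod m as in the example.

```python
def linear_probe_insert(table, key):
    """Insert with h(k, i) = ((k mod m) + i) mod m; return (slot, i)."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot, i
    raise OverflowError("hash table overflow")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    slot, probes = linear_probe_insert(table, key)
    print(f"{key} -> slot {slot} after {probes} extra probe(s)")
print(table)
# [49, 58, 69, None, None, None, None, None, 18, 89]
```

The occupied slots 8, 9, 0, 1, 2 form one contiguous run (wrapping around the end of the table), which is the clustering effect mentioned above.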


    Quadratic Probing

    To solve the clustering problem we introduce quadratic probing.

    $h(k, i) = (g(k) + c_1 i + c_2 i^2) \bmod m$

    with $c_2 \neq 0$. As an example we choose

    $h(k, i) = ((k \bmod 10) + i^2) \bmod 10$

    We use the same example of inserting 89, 18, 49, 58, 69.


    89 and 18 are inserted at 9 and 8 respectively.

    To insert 49, slot 9 is occupied so we probe i = 1, which is slot 0.

    To insert 58, slot 8 is occupied and probing with i = 1 gives slot 9, which is also occupied. The next probe is at i = 2, which gives (8 + 4) mod 10 = 2, so 58 is inserted into slot 2.

    To insert 69, slot 9 is occupied; the first probe gives slot 0, also occupied; the second probe gives (9 + 4) mod 10 = 3, so 69 is inserted into slot 3.
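The quadratic probing example can be replayed the same way. In this sketch `quadratic_probe_insert` is a hypothetical helper using h(k, i) = ((k mod m) + i²) mod m, that is, c1 = 0 and c2 = 1 as in the example; the loop simply gives up after m probes, since a quadratic sequence is not guaranteed to visit every slot.

```python
def quadratic_probe_insert(table, key):
    """Insert with h(k, i) = ((k mod m) + i*i) mod m; return (slot, i)."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot, i
    raise OverflowError("no empty slot found within m probes")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    slot, probes = quadratic_probe_insert(table, key)
    print(f"{key} -> slot {slot} after {probes} extra probe(s)")
print(table)
# [49, None, 58, 69, None, None, None, None, 18, 89]
```

Compared with linear probing on the same keys, 58 and 69 each need two extra probes instead of three.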


    Double Hashing

    In double hashing probing is done based on another auxiliary hash function:

    $h(k, i) = (g(k) + i \, f(k)) \bmod m$

    As an example consider

    $g(k) = k \bmod 10$

    $f(k) = 7 - (k \bmod 7)$


    89 and 18 are inserted into slots 9 and 8 as before.

    To insert 49, slot 9 is occupied; the first probe gives

    $h(49, 1) = ((49 \bmod 10) + (7 - (49 \bmod 7))) \bmod 10 = 6$

    thus 49 is inserted into slot 6.

    To insert 58, slot 8 is occupied; the first probe gives

    $h(58, 1) = ((58 \bmod 10) + (7 - (58 \bmod 7))) \bmod 10 = 3$

    thus 58 is inserted into slot 3 after the first probe.


    Double Hashing example
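As a sketch of double hashing on the running sequence 89, 18, 49, 58, 69, the code below assumes g(k) = k mod 10 and f(k) = 7 − (k mod 7), with a table of size 10. The helper names are hypothetical, and the placement of 69 is computed by the code rather than taken from the slides.

```python
def g(k):
    return k % 10            # primary hash


def f(k):
    return 7 - (k % 7)       # step size; always between 1 and 7


def double_hash_insert(table, key):
    """Insert with h(k, i) = (g(k) + i * f(k)) mod m; return (slot, i)."""
    m = len(table)
    for i in range(m):
        slot = (g(key) + i * f(key)) % m
        if table[slot] is None:
            table[slot] = key
            return slot, i
    raise OverflowError("no empty slot found within m probes")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    slot, probes = double_hash_insert(table, key)
    print(f"{key} -> slot {slot} after {probes} extra probe(s)")
print(table)
# [69, None, None, 58, None, None, 49, None, 18, 89]
```

Every collision in this run is resolved with a single extra probe, because keys that share the same g(k) generally have different step sizes f(k).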


    Rehashing

    When the table gets too full, operations will take a long time.

    It is even possible for the insertion of an element to fail.

    A solution is to build a new table roughly twice as big as the original one.

    The original table is scanned and each element is rehashed into the new table.

    As an example we insert 13, 15, 6, 23, 24 into a hash table of size 7 with linear probing.


    We rehash with a new table of size 17 with linear probing.
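The rehashing example can be sketched as follows, assuming the hash function is k mod m with linear probing in both tables. The names `insert_linear` and `rehash` are hypothetical, and the resulting slot layouts are computed by the code under that assumption.

```python
def insert_linear(table, key):
    """Linear probing insert with h(k, i) = ((k mod m) + i) mod m."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table overflow")


def rehash(old_table, new_size):
    """Scan the old table in slot order and reinsert every key into a new table."""
    new_table = [None] * new_size
    for key in old_table:
        if key is not None:
            insert_linear(new_table, key)
    return new_table


small = [None] * 7
for key in (13, 15, 6, 23, 24):
    insert_linear(small, key)
print(small)  # [6, 15, 23, 24, None, None, 13]

big = rehash(small, 17)
print({slot: key for slot, key in enumerate(big) if key is not None})
# {6: 6, 7: 23, 8: 24, 13: 13, 15: 15}
```

After rehashing, the load factor drops from 5/7 to 5/17, so later insertions and searches should need fewer probes on average.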
