stia2023 hashing

Upload: amimul-ihsan

Post on 03-Jun-2018

  • 8/12/2019 STIA2023 Hashing

    1/36

    Data Structures & Algorithm Analysis

    Week 13: Hashing

    Objectives:

    At the end of this lesson, the student will be able to:

    Describe the basic idea of hashing,

    Describe the purpose of a hash table and a hash function,

    Describe how a hash function compresses a hash code into an index into the hash table,

    Explain what collisions are and why they occur,

    Describe open addressing as a method to resolve collisions,

    Describe linear probing and quadratic probing as particular open addressing schemes,

    Describe separate chaining as a method to resolve collisions, and

    Describe the relative efficiencies of various collision resolution techniques.

    Chapter Contents

    What is Hashing?

    Hash Functions

      Computing Hash Codes

      Compressing a Hash Code into an Index for the Hash Table

    Resolving Collisions

      Open Addressing with Linear Probing

      Open Addressing with Quadratic Probing

      Separate Chaining

    Chapter Contents (ctd.)

    Efficiency

      The Load Factor

      The Cost of Open Addressing

      The Cost of Separate Chaining

    What is Hashing?

    A technique that determines an index, or location, for storing an item in a data structure

    The hash function receives the search key and returns the index of an element in an array called the hash table

    The index is known as the hash index

    A perfect hash function maps each search key into a different integer suitable as an index to the hash table
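To make "perfect hash function" concrete, here is a small hypothetical Java example. For the particular keys 10, 21, 32 and 43 and a table of size 4, the function h(k) = k % 4 sends every key to a different index, so it is perfect for that key set, but not in general: 10 and 14 both map to index 2. The class and method names are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example: h(k) = k % 4 is perfect for the keys {10, 21, 32, 43}.
class PerfectHashDemo {
    static final int TABLE_SIZE = 4;

    // The hash function: maps a search key to a hash index.
    static int hash(int key) {
        return key % TABLE_SIZE;
    }

    // Returns true if no two of the given keys share a hash index.
    static boolean isPerfectFor(int[] keys) {
        Set<Integer> used = new HashSet<>();
        for (int key : keys) {
            if (!used.add(hash(key))) {
                return false; // two keys map to the same index
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] table = new String[TABLE_SIZE];
        int[] keys = {10, 21, 32, 43};
        for (int key : keys) {
            table[hash(key)] = "value for " + key; // store each item at its hash index
        }
        System.out.println(isPerfectFor(keys));               // true
        System.out.println(isPerfectFor(new int[] {10, 14})); // false: both hash to 2
    }
}
```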

    What is Hashing?

    Fig. 1: A hash function indexes its hash table.

    What is Hashing?

    The hash function has two steps:

    Convert the search key into an integer called the hash code

    Compress the hash code into the range of indices for the hash table

    Typical hash functions are not perfect: they can allow more than one search key to map into a single index

    This is known as a collision
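A minimal sketch of the two steps, assuming String keys, `String.hashCode()` for step 1 and a hypothetical table size of 7 for step 2. It shows that a collision can arise in either step: "Aa" and "BB" already share a hash code, while 10 and 17 are different hash codes that compress to the same index.

```java
// Sketch of the two steps of a hash function; TABLE_SIZE = 7 is a hypothetical choice.
class TwoStepHashDemo {
    static final int TABLE_SIZE = 7;

    // Step 1: convert the search key into an integer hash code.
    static int hashCodeOf(String key) {
        return key.hashCode();
    }

    // Step 2: compress the hash code into the range 0 .. TABLE_SIZE - 1.
    static int compress(int hashCode) {
        return Math.floorMod(hashCode, TABLE_SIZE);
    }

    static int hashIndex(String key) {
        return compress(hashCodeOf(key));
    }

    public static void main(String[] args) {
        // Collision in step 1: different strings, identical hash codes.
        System.out.println(hashCodeOf("Aa") + " " + hashCodeOf("BB")); // 2112 2112
        // Collision in step 2: different hash codes, same index after compression.
        System.out.println(compress(10) + " " + compress(17));         // 3 3
    }
}
```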

    What is Hashing?

    Fig. 2: A collision caused by the hash function h

    Hash Functions

    General characteristics of a good hash function:

    Minimize collisions

    Distribute entries uniformly throughout the hash table

    Be fast to compute
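One rough way to check "distribute entries uniformly" is to count how many buckets a set of keys actually reaches. This hypothetical sketch hashes the 100 keys "key0" through "key99" into 11 buckets with two functions: one that uses only the string length, and `String.hashCode()`, which depends on every character.

```java
import java.util.function.ToIntFunction;

// Hypothetical comparison of two hash functions over the same 100 keys.
class DistributionDemo {
    static final int TABLE_SIZE = 11;

    // A poor hash function: it looks only at the string's length.
    static int lengthHash(String s) {
        return s.length();
    }

    // A better hash function: String.hashCode() depends on every character.
    static int polyHash(String s) {
        return s.hashCode();
    }

    // Counts how many buckets receive at least one of the keys "key0" .. "key99".
    static int bucketsUsed(ToIntFunction<String> h) {
        boolean[] used = new boolean[TABLE_SIZE];
        for (int i = 0; i < 100; i++) {
            used[Math.floorMod(h.applyAsInt("key" + i), TABLE_SIZE)] = true;
        }
        int count = 0;
        for (boolean b : used) {
            if (b) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(bucketsUsed(DistributionDemo::lengthHash)); // 2
        System.out.println(bucketsUsed(DistributionDemo::polyHash));
    }
}
```

Because only two string lengths (4 and 5) occur, the length-based function piles all 100 keys into 2 of the 11 buckets, while `String.hashCode()` spreads them much more widely.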

    Computing Hash Codes

    We will override the hashCode method of Object

    Guidelines:

    If a class overrides the method equals, it should override hashCode

    If the method equals considers two objects equal, hashCode must return the same value for both objects

    If hashCode is invoked on the same object more than once during an execution of a program, it must return the same hash code, provided the object's data has not changed

    An object's hash code during one execution of a program can differ from its hash code during another execution of the same program
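A sketch of the equals/hashCode guidelines, using a hypothetical `StudentId` class. Its `hashCode` is built with `java.util.Objects.hash` from the same fields that `equals` compares, so equal objects always produce the same hash code, and a `java.util.HashMap` lookup succeeds even with a different but equal key object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class StudentIdDemo {
    public static void main(String[] args) {
        Map<StudentId, String> names = new HashMap<>();
        names.put(new StudentId("STIA", 1234), "Aminah");
        // A distinct but equal key finds the entry because hashCode agrees with equals.
        System.out.println(names.get(new StudentId("STIA", 1234))); // Aminah
    }
}

// Hypothetical key class: two StudentIds are equal when both fields match.
final class StudentId {
    private final String program;
    private final int number;

    StudentId(String program, int number) {
        this.program = program;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentId)) return false;
        StudentId that = (StudentId) other;
        return number == that.number && program.equals(that.program);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals compares, so equal objects get equal hash codes.
        return Objects.hash(program, number);
    }
}
```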

    Computing Hash Codes

    The hash code for a string, s:

        int hash = 0;
        int n = s.length();
        for (int i = 0; i < n; i++)
            hash = g * hash + s.charAt(i); // g is a positive constant

    Hash code for a primitive type:

    Use the primitive-typed key itself

    Manipulate its internal binary representation

    Use folding
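The string loop on this slide is a polynomial hash evaluated with Horner's method. The sketch below (method names are hypothetical) sets g = 31, the multiplier that `java.lang.String.hashCode()` uses, so the result can be checked against the library. It also shows one form of folding for a 64-bit `long` key: XOR-ing its high and low 32-bit halves, which is the formula documented for `Long.hashCode`.

```java
class HashCodeDemo {
    // Polynomial hash: hash = g * hash + s.charAt(i) for each character.
    // int arithmetic wraps on overflow, exactly as String.hashCode() does.
    static int stringHash(String s, int g) {
        int hash = 0;
        int n = s.length();
        for (int i = 0; i < n; i++) {
            hash = g * hash + s.charAt(i);
        }
        return hash;
    }

    // Folding: combine the high and low 32-bit halves of a 64-bit key with XOR.
    static int foldLong(long key) {
        return (int) (key ^ (key >>> 32));
    }

    public static void main(String[] args) {
        String s = "hashing";
        System.out.println(stringHash(s, 31) == s.hashCode());   // true
        long key = 123456789012L;
        System.out.println(foldLong(key) == Long.hashCode(key)); // true
    }
}
```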

    Compressing a Hash Code

    Must compress the hash code so it fits into the index range

    A typical method for a code c is to compute c modulo n, where n, the size of the table, is a prime number

    The index will then be between 0 and n - 1

        private int getHashIndex(Object key)
        {
            int hashIndex = key.hashCode() % hashTable.length;
            if (hashIndex < 0) // % gives a negative result for a negative hash code
                hashIndex = hashIndex + hashTable.length;
            return hashIndex;
        } // end getHashIndex
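A standalone version of getHashIndex, assuming a hypothetical class `CompressDemo` whose `hashTable` has 7 locations. The adjustment for negative values is needed because Java's `%` operator gives a result with the sign of the dividend; `Math.floorMod` performs the same correction in one call.

```java
class CompressDemo {
    // Hypothetical stand-in for the slide's hashTable field; 7 is a prime table size.
    private final Object[] hashTable = new Object[7];

    int getHashIndex(Object key) {
        int hashIndex = key.hashCode() % hashTable.length;
        if (hashIndex < 0) // a negative hash code yields a negative remainder
            hashIndex = hashIndex + hashTable.length;
        return hashIndex;
    }

    public static void main(String[] args) {
        CompressDemo d = new CompressDemo();
        System.out.println(d.getHashIndex(-5));   // -5 % 7 is -5, so the index is 2
        System.out.println(d.getHashIndex(19));   // 19 % 7 is 5
        System.out.println(Math.floorMod(-5, 7)); // 2: same adjustment in one library call
    }
}
```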

    Resolving Collisions

    Options when the hash function returns a location already used in the table:

    Use another location in the table (open addressing)

    Change the structure of the hash table so that each array location can represent multiple values (separate chaining)

    Open Addressing with Linear Probing

    An open addressing scheme locates an alternate location

    The new location must be open (available)

    Linear probing:

    If a collision occurs at hashTable[k], look successively at locations k + 1, k + 2, ..., wrapping around to the beginning of the table if necessary
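A minimal linear probing sketch, assuming String keys and values, a table of size 7, and no removal or resizing (the class and method names are hypothetical). The keys "Aa", "BB" and "C#" all have hash code 2112, so they share home index 5; the probe sequence places them at 5, at 6 and, after wrapping around, at 0.

```java
class LinearProbingTable {
    private final String[] keys;
    private final String[] values;

    LinearProbingTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    private int hashIndex(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Probes k, k + 1, k + 2, ... (wrapping around) until it finds the key or a null slot.
    private int probe(String key) {
        int index = hashIndex(key);
        for (int i = 0; i < keys.length; i++) {
            if (keys[index] == null || keys[index].equals(key)) {
                return index;
            }
            index = (index + 1) % keys.length;
        }
        return -1; // table is full and the key is absent
    }

    // Adds a new entry, or replaces the value if the key is already present.
    void add(String key, String value) {
        int index = probe(key);
        if (index < 0) throw new IllegalStateException("table full");
        keys[index] = key;
        values[index] = value;
    }

    String getValue(String key) {
        int index = probe(key);
        return (index < 0 || keys[index] == null) ? null : values[index];
    }

    // Location of the key, or -1 if it is not in the table.
    int indexOf(String key) {
        int index = probe(key);
        return (index < 0 || keys[index] == null) ? -1 : index;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(7);
        t.add("Aa", "first");  // home index 2112 % 7 == 5
        t.add("BB", "second"); // collides at 5, placed at 6
        t.add("C#", "third");  // collides at 5 and 6, wraps to 0
        System.out.println(t.indexOf("Aa") + " " + t.indexOf("BB") + " " + t.indexOf("C#")); // 5 6 0
        System.out.println(t.getValue("C#")); // third
    }
}
```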

    Open Addressing with Linear Probing

    Fig. 3: The effect of linear probing after adding four entries whose search keys hash to the same index.

    Open Addressing with Linear Probing

    Fig. 4: A revision of the hash table shown in Figure 19-3 when linear probing resolves collisions; each entry contains a search key and its associated value.

    Removals

    Fig. 5: A hash table if remove used null to remove entries.

    Open Addressing with Linear Probing

    Fig. 6: A linear probe sequence (a) after adding an entry; (b) after removing two entries;

    Open Addressing with Linear Probing

    Fig. 6: A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a formerly occupied location.

    Searches that Dictionary Operations Require

    To retrieve an entry:

    Search the probe sequence for the key

    Examine entries that are present; ignore locations in the available state

    Stop the search when the key is found or null is reached

    To remove an entry:

    Search the probe sequence as for retrieval

    If the key is found, mark its location as available

    To add an entry:

    Search the probe sequence as for retrieval

    Note the first available slot

    If the key is not found, use that available slot (or, if none was seen, the null location that ended the search)
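The three searches above can be sketched in Java as follows. This is an illustration, not the course's implementation: the class name, the `AVAILABLE` sentinel and the table size of 7 are assumptions. Removing "BB" marks its slot available instead of null, so a later search for "C#" (which collided with it) keeps probing past that slot and still finds it.

```java
class OpenAddressingTable {
    // Sentinel for the "available" state; compared by reference (==), never by equals.
    private static final String AVAILABLE = new String("<available>");
    private final String[] keys;
    private final String[] values;

    OpenAddressingTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    private int home(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Retrieval search: skip available locations; stop at the key or at null.
    int indexOf(String key) {
        int index = home(key);
        for (int i = 0; i < keys.length && keys[index] != null; i++) {
            if (keys[index] != AVAILABLE && keys[index].equals(key)) return index;
            index = (index + 1) % keys.length;
        }
        return -1;
    }

    String getValue(String key) {
        int index = indexOf(key);
        return index < 0 ? null : values[index];
    }

    // Removal: same search; if found, mark the location available rather than null.
    String remove(String key) {
        int index = indexOf(key);
        if (index < 0) return null;
        String old = values[index];
        keys[index] = AVAILABLE;
        values[index] = null;
        return old;
    }

    // Addition: same search, remembering the first available slot along the way.
    void add(String key, String value) {
        int index = home(key);
        int firstAvailable = -1;
        for (int i = 0; i < keys.length && keys[index] != null; i++) {
            if (keys[index] == AVAILABLE) {
                if (firstAvailable < 0) firstAvailable = index;
            } else if (keys[index].equals(key)) {
                values[index] = value; // key present: replace its value
                return;
            }
            index = (index + 1) % keys.length;
        }
        // Key not found: prefer the first available slot, else the null slot that ended the search.
        int target = firstAvailable >= 0 ? firstAvailable : (keys[index] == null ? index : -1);
        if (target < 0) throw new IllegalStateException("table full");
        keys[target] = key;
        values[target] = value;
    }

    public static void main(String[] args) {
        OpenAddressingTable t = new OpenAddressingTable(7);
        t.add("Aa", "first");  // "Aa", "BB", "C#" share hash code 2112, home index 5
        t.add("BB", "second"); // probes to 6
        t.add("C#", "third");  // wraps around to 0
        t.remove("BB");        // index 6 becomes available, not null
        System.out.println(t.getValue("C#")); // third: the search passes over index 6
        t.add("BB", "again");  // reuses the first available slot
        System.out.println(t.indexOf("BB"));  // 6
    }
}
```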

    Open Addressing, Quadratic Probing

    Change the probe sequence

    Given the hash index k of a search key, probe to

    $k + 1,\ k + 2^2,\ k + 3^2,\ \ldots,\ k + j^2$ (each taken modulo the table size)

    If the table size is a prime number, the probe sequence reaches at least half of the locations in the hash table

    Avoids primary clustering, but can lead to secondary clustering
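A small sketch of the quadratic probe sequence, with hypothetical method names. For a prime table size of 7 and home index 3, it lists the probes k + j² (mod 7) for j = 0 to 6 and counts how many distinct locations they reach: 4 of the 7, which is at least half, as expected for a prime table size.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class QuadraticProbingDemo {
    // Returns the first `count` probe locations k + j^2 (mod m), for j = 0, 1, 2, ...
    static int[] probeSequence(int k, int m, int count) {
        int[] seq = new int[count];
        for (int j = 0; j < count; j++) {
            seq[j] = (int) ((k + (long) j * j) % m); // long avoids overflow of j * j
        }
        return seq;
    }

    // j^2 mod m repeats with period m, so m probes show every location the sequence can reach.
    static int reachable(int k, int m) {
        Set<Integer> seen = new HashSet<>();
        for (int loc : probeSequence(k, m, m)) {
            seen.add(loc);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(3, 7, 7))); // [3, 4, 0, 5, 5, 0, 4]
        System.out.println(reachable(3, 7));                          // 4
    }
}
```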

    Separate Chaining

    Alter the structure of the hash table

    Each location can represent multiple values

    Each location is called a bucket

    A bucket can be a(n):

    List

    Sorted list

    Chain of linked nodes

    Array

    Vector
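A minimal separate chaining sketch in which each bucket is a chain of linked nodes (the names `ChainedTable` and `Node` are hypothetical). It assumes distinct keys: `add` replaces the value of an existing key and otherwise appends a new node to the end of the chain. The colliding keys "Aa", "BB" and "C#" all stay in bucket 5 instead of spilling into other locations.

```java
class ChainedTable {
    private static final class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Node[] buckets;

    ChainedTable(int capacity) {
        buckets = new Node[capacity];
    }

    private int hashIndex(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Searches the chain; replaces the value if the key exists, else appends a node.
    void add(String key, String value) {
        int index = hashIndex(key);
        Node current = buckets[index];
        Node last = null;
        while (current != null) {
            if (current.key.equals(key)) {
                current.value = value;
                return;
            }
            last = current;
            current = current.next;
        }
        Node node = new Node(key, value);
        if (last == null) buckets[index] = node; else last.next = node;
    }

    String getValue(String key) {
        for (Node n = buckets[hashIndex(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    int chainLength(int index) {
        int len = 0;
        for (Node n = buckets[index]; n != null; n = n.next) len++;
        return len;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(7);
        t.add("Aa", "first");
        t.add("BB", "second");
        t.add("C#", "third");
        System.out.println(t.chainLength(5)); // 3: all three keys share bucket 5
        System.out.println(t.getValue("C#")); // third
    }
}
```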

    Separate Chaining

    Fig. 9: A hash table for use with separate chaining; each bucket is a chain of linked nodes.

    Separate Chaining

    Fig. 10: Where a new entry is inserted into a linked bucket when the integer search keys are (a) duplicate and unsorted;

    Separate Chaining

    Fig. 10: Where a new entry is inserted into a linked bucket when the integer search keys are (b) distinct and unsorted;

    Separate Chaining

    Fig. 10: Where a new entry is inserted into a linked bucket when the integer search keys are (c) distinct and sorted
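For case (c), distinct and sorted keys, the new entry goes where it keeps the chain in order. A hypothetical sketch of that insertion for a single bucket holding int keys: walk the chain until the next key is not smaller, reject a duplicate, and link the new node in.

```java
import java.util.ArrayList;
import java.util.List;

class SortedChain {
    private static final class Node {
        final int key;
        Node next;

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    private Node head;

    // Inserts the key in ascending order; returns false (no change) for a duplicate.
    boolean add(int key) {
        Node previous = null;
        Node current = head;
        while (current != null && current.key < key) {
            previous = current;
            current = current.next;
        }
        if (current != null && current.key == key) return false; // keys must be distinct
        Node node = new Node(key, current);
        if (previous == null) head = node; else previous.next = node;
        return true;
    }

    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) out.add(n.key);
        return out;
    }

    public static void main(String[] args) {
        SortedChain chain = new SortedChain();
        for (int k : new int[] {42, 7, 19, 7, 88}) {
            chain.add(k);
        }
        System.out.println(chain.keys()); // [7, 19, 42, 88]
    }
}
```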

    Efficiency Observations

    Successful retrieval or removal: same efficiency as a successful search

    Unsuccessful retrieval or removal: same efficiency as an unsuccessful search

    Successful addition (a new entry is added): same efficiency as an unsuccessful search

    Unsuccessful addition (the value of an existing entry is replaced): same efficiency as a successful search

    Load Factor

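    A perfect hash function is not always possible or practical

    Thus, collisions are likely to occur

    As the hash table fills, collisions occur more often

    The load factor measures how full the table is:

    $\lambda = \dfrac{\text{number of entries in the table}}{\text{number of locations in the table}}$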
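A sketch of the load factor computation, with hypothetical method names. The thresholds checked here are the ones noted later in this lesson: below 0.5 for quadratic probing or double hashing, and below 1 for separate chaining.

```java
class LoadFactorDemo {
    // Load factor = number of entries / number of locations in the table.
    static double loadFactor(int entries, int tableSize) {
        return (double) entries / tableSize;
    }

    static boolean okForQuadraticProbing(int entries, int tableSize) {
        return loadFactor(entries, tableSize) < 0.5;
    }

    static boolean okForSeparateChaining(int entries, int tableSize) {
        return loadFactor(entries, tableSize) < 1.0;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(7, 11));            // about 0.64
        System.out.println(okForQuadraticProbing(7, 11)); // false
        System.out.println(okForSeparateChaining(7, 11)); // true
    }
}
```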

    Cost of Open Addressing

    Fig. 11: The average number of comparisons required by a search of the hash table for given values of the load factor when using linear probing.
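Curves like the one in Fig. 11 are commonly based on Knuth's classic approximations for linear probing with load factor λ. They are stated here as standard results, not read off the figure:

```latex
\text{successful search} \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right),
\qquad
\text{unsuccessful search} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right)
```

At λ = 0.5, for example, these give 1.5 comparisons for a successful search and 2.5 for an unsuccessful one.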

    Cost of Open Addressing

    Fig. 12: The average number of comparisons required by a search of the hash table for given values of the load factor when using either quadratic probing or double hashing.

    Note: for quadratic probing or double hashing, the load factor should satisfy $\lambda < 0.5$
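For quadratic probing and double hashing, the usual approximations (derived under an idealized uniform-hashing assumption, so best treated as estimates rather than as the figure's exact data) are:

```latex
\text{successful search} \approx \frac{1}{\lambda}\,\ln\frac{1}{1-\lambda},
\qquad
\text{unsuccessful search} \approx \frac{1}{1-\lambda}
```

At λ = 0.5 these give about 1.39 comparisons for a successful search and 2 for an unsuccessful one.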

    Cost of Separate Chaining

    Fig. 13: The average number of comparisons required by a search of the hash table for given values of the load factor when using separate chaining.

    Note: reasonable efficiency requires only $\lambda < 1$
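The corresponding standard estimates for separate chaining, counting key comparisons only (the hashing step itself is not counted), are:

```latex
\text{successful search} \approx 1 + \frac{\lambda}{2},
\qquad
\text{unsuccessful search} \approx \lambda
```

At λ = 1, for example, these give 1.5 comparisons for a successful search and 1 for an unsuccessful one.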

    Conclusion

    Q & A Session

    Question 1

    Which of the following is NOT expected of a good hashing technique? It should ______________.

    A) produce keys uniformly distributed over the range

    B) be easy to program

    C) produce no collisions

    D) minimize collisions

    Question 2

    Assume that a hash function has the following characteristics:

    Keys 77, 355, and 276 hash to 3.

    Keys 945 and 579 hash to 5.

    Key 517 hashes to 0.

    Key 155 hashes to 2.

    Perform the insertions in the order 945, 77, 276, 355, 517, 155, 579.

    A) Using linear probing, indicate the position of each item.

    B) Which element requires the largest number of probes to be found in the table (if more than one, give the element with the smallest index in the hash table)?

    C) Which element(s) can we access with a single probe?

    D) What is the load factor?