14-04-2011, 11:08 AM
Presented By…
Surya Mukherjee
[attachment=12166]
History
• In January 1953, H. P. Luhn wrote an internal IBM memorandum that used hashing with chaining.
• G. N. Amdahl, E. M. Boehme, N. Rochester, and A. L. Samuel implemented a program using hashing at about the same time.
• Open addressing with linear probing (relatively prime stepping) is credited to Amdahl, but Ershov (in Russia) had the same idea
What is hashing?
The process of building or finding information in the hash table.
Hashing is the antithesis of sorting.
The process of chopping up the key and mixing it up in various ways in order to obtain an index which will be uniformly distributed over the range of indices.
Implementation
Use hash function to map keys into positions in a hash table.
If element “e” has key “k” and “h” is hash function, then “e” is stored in position “h(k)” of table.
To search for “e”, compute “h(k)” to locate position. If no element, dictionary does not contain “e”.
Analysis ( Ideal Case )
O(b) time to initialize hash table (b number of positions or buckets in hash table)
O(1) time to perform insert, remove, search.
Collision
If key range too large, use hash table with fewer buckets and a hash function which maps multiple keys to same bucket:
h(k1) = b = h(k2): k1 and k2 have collision at slot b
Example: hash table with 11 buckets
h(k) = k%11
80 ® 3 (80%11= 3), 40 ® 7, 65 ® 10
58 ® 3 collision
Collision Resolution Policies
Two classes:
Open hashing
Closed hashing
Difference has to do with whether collisions are stored outside the table (open hashing) or whether collisions result in storing one of the records at another slot in the table (closed hashing)
Open Hashing
Open hashing is a rehash strategy:
“If we try to place x in bucket h(x) and find it occupied, find alternative location h1(x), h2(x), etc. Try each in order, if none empty table is full,”
h(x) is called home bucket
Simplest rehash strategy is called linear hashing
hi(x) = (h(x) + i) % D
In general, our collision resolution strategy is to generate a sequence of hash table slots (probe sequence) that can hold the record; test each slot until find empty one (probing)
Example : Linear (OpeN) Hashing
D=8, keys a, b, c, d have hash values h(a)=3, h(b)=0, h©=4, h(d)=3.
Closed Hashing
Each bucket in the hash table is the head of a linked list
All elements that hash to a particular bucket are placed on that bucket’s linked list
Records within a bucket can be ordered in several ways
by order of insertion, by key value order, or by frequency of access order