Post on 21-Dec-2015
Optimal Fast Hashing
Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel) and David Hay (Politecnico di Torino, Italy)
Hash Tables for Networking Devices
Hash tables and hash-based structures are often used in high-speed devices: heavy-hitter flow identification, flow state keeping, flow counter management, virus signature scanning, and IP address lookup algorithms.
For hash tables, the ideal is 1 memory access per element insertion, to maximize throughput and minimize power.
Hash Tables for Networking Devices
Collisions are unavoidable → wasted memory accesses
For load ≤ 1, let a and d be the average and worst-case time (number of memory accesses) per element insertion.
Assumptions: initially empty buckets; only insertions (no deletions).
Objective: Minimize a and d
[Figure: elements 1 to 9 and the memory buckets]
Why We Care
On-chip memory: memory accesses → power consumption
Off-chip memory: memory accesses → lost on/off-chip pin capacity
Datacenters: memory accesses → network & server load
Parallelism does not help reduce these costs: d serial or d parallel memory accesses have the same cost
Traditional Hash Table Schemes
Example 1: linked lists (chaining)
[Figure: elements 1 to 9 stored in linked lists attached to the memory buckets]
Traditional Hash Table Schemes
Example 1: linked lists (chaining)
Example 2: linear probing (open addressing)
Problem: the worst-case time cannot be bounded by a constant d
[Figure: elements 1 to 9 placed in memory by linear probing]
High-Speed Hardware
Enable overflows: if time exceeds d → overflow list
The overflow list can be stored in an expensive CAM; otherwise, overflow elements = lost elements
Bucket contains h elements. E.g.: 128-bit memory word → h=4 elements of 32 bits
Assumption: access cost (read & write word) = 1 cycle
[Figure: elements 1 to 9 in memory buckets of height h, with element 9 in the CAM]
Problem Formulation
[Figure: elements 1 to 9 in memory buckets of height h, with element 9 in the CAM]
Given average time a and worst-case time d, minimize the overflow rate
Example: Power of d Random Choices
d hash functions: pick the least loaded bucket. Break ties u.a.r. [Azar et al.] or to the left [Vöcking]
Intuition: can reach a low overflow rate
… but average time a = worst-case time d → wasted memory accesses
[Figure: elements 1 to 9 in memory buckets of height h, with element 9 in the CAM]
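The d-choice insertion rule can be sketched as a small simulation. This is an illustrative sketch only: the function name `insert_d_choices`, the seeded pseudo-random "hashes", and the tie-breaking toward the earliest candidate (a stand-in for the "to the left" rule, without d-left's separate subtables) are assumptions, not taken from the slides.

```python
import random

def insert_d_choices(n, m, h, d, seed=0):
    """Insert n elements into m buckets of height h, using d hashes each.

    Every insertion reads all d candidate buckets, so the average and the
    worst-case number of memory accesses are both d.
    Returns (overflow_rate, average_accesses).
    """
    rng = random.Random(seed)          # stand-in for d uniform hash functions
    load = [0] * m
    overflow = 0
    for _ in range(n):
        candidates = [rng.randrange(m) for _ in range(d)]
        # min() returns the first minimum, so ties go to the earliest candidate.
        best = min(candidates, key=lambda b: load[b])
        if load[best] < h:
            load[best] += 1
        else:
            overflow += 1              # all d candidates are full
    return overflow / n, float(d)
```

Calling `insert_d_choices(19000, 5000, 4, 4)` corresponds to h=4, load=0.95, m=5,000; the second returned value is always d, which is the cost the slide points out.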
Main Results
Lower bound on overflow for any scheme
Optimality of three schemes on successively larger ranges: SIMPLE, GREEDY, and MHT (MHT is optimal when subtable sizes fall geometrically)
Overflow Lower Bound
Objective: given any online scheme with average time a and worst-case time d, find a lower bound on the overflow rate.
[Plot: h=4, load=n/(mh)=0.95, fixed d. Annotated region: no scheme can achieve (capacity region)]
Overflow Lower Bound
Problem: the number of hashes of each element depends on the instantaneous memory state. How can we bound the overflow?
[Figure: two snapshots of the memory buckets of height h and the CAM as further elements are inserted]
Overflow Lower Bound: Proof Intuition
Assume hashes are uniform. Then relax the constraints: offline, no worst-case d, and uncolor the hashes
(n elements) x (a hashes per element) = a∙n uncolored hashes → lower bound on the expected number of unhashed memory bins
[Figure: illustration of uncoloring the hashes]
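The relaxation can be turned into a number with a short calculation. The sketch below is an assumption-laden illustration, not the closed form from the talk: it throws a∙n uncolored hashes uniformly into m buckets, approximates the number of hashes per bucket as Poisson with mean a∙n/m, and counts the slots that no hash can fill. The function name `overflow_lower_bound` and the Poisson approximation are choices made here.

```python
import math

def overflow_lower_bound(a, load, h):
    """Sketch of the relaxed bound for load = n/(m*h) and bucket height h.

    A bucket that receives X hashes can store at most min(h, X) elements,
    so at least 1 - E[min(h, X)] / (n/m) of the elements must overflow.
    X is approximated as Poisson with mean a*n/m (an assumption).
    """
    per_bucket = load * h              # n/m, elements per bucket
    lam = a * per_bucket               # a*n/m, expected hashes per bucket
    p = math.exp(-lam)                 # P(X = 0)
    unused = 0.0                       # expected number of unfillable slots
    for k in range(h):
        unused += (h - k) * p
        p *= lam / (k + 1)             # P(X = k+1) from P(X = k)
    stored = h - unused                # E[min(h, X)]
    return max(0.0, 1.0 - stored / per_bucket)
```

For instance, `overflow_lower_bound(1.0, 0.95, 4)` evaluates the sketch at a=1, h=4, load=0.95; the value shrinks as a grows, since more hashes leave fewer slots unreachable.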
Overflow Lower Bound
Result: closed-form lower-bound formula, given n elements in m buckets of height h [formula not preserved in the transcript]
Valid also for non-uniform hashes
Defines a capacity region for high-throughput hashing
Lower-Bound Example
[Plot: h=4, load=n/(mh)=0.95]
For a 3% overflow rate, throughput can be at most 1/a = 2/3 of the memory rate
Overflow Lower Bound
Example: d-left scheme: low overflow rate, but high average memory access rate a
[Plot: h=4, load=n/(mh)=0.95, m=5,000]
Main Results
Lower bound on overflow for any scheme
Optimality of three schemes on successively larger ranges: SIMPLE, GREEDY, and MHT (MHT is optimal when subtable sizes fall geometrically)
The SIMPLE Scheme
SIMPLE scheme: single hash function. Looks like a truncated linked list
Intuition: the final state only depends on the hashes, not on the successive states → can uncolor elements
[Figure: elements 1 to 9 in memory buckets of height h, with element 9 in the CAM]
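A minimal simulation of a single-hash scheme with bounded buckets, in the spirit of SIMPLE, might look as follows. The function name `simple_scheme` and the seeded pseudo-random generator standing in for the hash function are assumptions for illustration.

```python
import random

def simple_scheme(n, m, h, seed=0):
    """Insert n elements into m buckets of height h with one hash each.

    Each element costs exactly one memory access (a = d = 1).
    An element whose bucket is already full goes to the overflow list.
    Returns the overflow rate.
    """
    rng = random.Random(seed)          # stand-in for a uniform hash function
    load = [0] * m
    overflow = 0
    for _ in range(n):
        b = rng.randrange(m)           # the single hashed bucket
        if load[b] < h:
            load[b] += 1
        else:
            overflow += 1              # bucket full: truncated "linked list"
    return overflow / n
```

Because each bucket's final occupancy is just min(h, number of elements hashed to it), the result does not depend on the insertion order, which is the point of the uncoloring intuition.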
The SIMPLE Scheme: Proof Intuition
Same reasoning as the offline lower bound
Result: for a = 1, SIMPLE is optimal (i.e. achieves the minimum overflow rate)
Formal proof relies on mean-field analysis (differential equations with continuous-time fluid limit)
[Figure: memory buckets of height h and the CAM when all elements have been hashed]
Performance of SIMPLE Scheme
[Plot: h=4, load=0.95, m=5,000]
The lower bound can actually be achieved for a=1
The GREEDY Scheme
Using uniform hashes, try to insert each element greedily until it is either inserted or has used d hashes
[Figure: elements 1 to 9 in memory buckets of height h, with element 9 in the CAM; d=2]
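The greedy rule can be sketched as a simulation that also measures the average number of memory accesses a. As before, the function name `greedy_scheme` and the seeded pseudo-random generator used in place of d hash functions are illustrative assumptions.

```python
import random

def greedy_scheme(n, m, h, d, seed=0):
    """Insert n elements into m buckets of height h, trying up to d hashes.

    An element stops at the first non-full bucket it probes, so it uses
    between 1 and d memory accesses. After d failed probes it overflows.
    Returns (overflow_rate, average_accesses).
    """
    rng = random.Random(seed)          # stand-in for d uniform hash functions
    load = [0] * m
    overflow = 0
    accesses = 0
    for _ in range(n):
        placed = False
        for _ in range(d):
            b = rng.randrange(m)
            accesses += 1              # one memory access per probe
            if load[b] < h:
                load[b] += 1
                placed = True
                break
        if not placed:
            overflow += 1
    return overflow / n, accesses / n
```

With d=1 this reduces to a single-hash scheme; with larger d the average a stays below d because most elements are placed on an early probe.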
The GREEDY Scheme: Proof Intuition
Un-coloring argument: 2nd try of a collided element ≡ new element with 1 hash
(GREEDY with x elements, i.e. x∙a(x) hashes) ≡ (SIMPLE with x∙a(x) elements)
Optimal: for any x ≤ n elements
Optimality holds until no more elements can be added: cut-off point a_co ≡ a(n)
[Figure: memory buckets of height h and the CAM, with the retried hashes of collided elements shown as new elements]
Performance of GREEDY Scheme
[Plot: d=4, h=4, load=0.95, m=5,000]
The GREEDY scheme is always optimal until a_co
Performance of GREEDY Scheme
[Plot: d=4, h=4, load=0.95, m=5,000]
Overflow rate worse than 4-left, but better throughput (1/a)
The MHT Scheme
MHT (Multi-Level Hash Table) [Broder & Karlin]: d successive subtables, each with its own hash function
[Figure: memory split into 1st, 2nd, and 3rd subtables with buckets of height h, plus the CAM]
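A multi-level table can be sketched along the same lines. This is a hypothetical illustration: the function name `mht_scheme`, the seeded pseudo-random "hashes", and in particular the geometric ratio of 0.5 between successive subtable sizes are assumptions made here; the slides do not give the ratio.

```python
import random

def mht_scheme(n, m, h, d, ratio=0.5, seed=0):
    """Insert n elements into d subtables whose sizes fall geometrically.

    An element probes subtable 1, then 2, and so on, one bucket per
    subtable, and stops at the first non-full bucket. After d failed
    probes it overflows.
    Returns (overflow_rate, average_accesses, subtable_sizes).
    """
    rng = random.Random(seed)          # stand-in for d hash functions
    weights = [ratio ** i for i in range(d)]
    total = sum(weights)
    # Rounding down means the sizes may sum to slightly less than m.
    sizes = [max(1, int(m * w / total)) for w in weights]
    tables = [[0] * s for s in sizes]
    overflow = 0
    accesses = 0
    for _ in range(n):
        for table in tables:
            b = rng.randrange(len(table))
            accesses += 1              # one memory access per subtable probed
            if table[b] < h:
                table[b] += 1
                break
        else:
            overflow += 1              # every subtable's bucket was full
    return overflow / n, accesses / n, sizes
```

Varying `ratio` in this sketch is one way to explore how the subtable sizing trades overflow rate against the average number of accesses.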
Performance of MHT Scheme
Optimality of MHT until cut-off point a_co(MHT)
Proof that subtable sizes fall geometrically, confirmed in simulations
[Plot: d=4, h=4, load=0.95, m=5,000]
Overflow rate close to 4-left, with much better throughput (1/a)
Conclusion
Established “capacity region” of high-speed hashing
Showed that three schemes are optimal on different ranges
MHT is optimal when subtable sizes fall geometrically (a long-known rule of thumb)
The MHT cut-off point is larger than the GREEDY one
Thank you.