perfect_hash.py generates perfect minimal hash
functions for a set of keys using the algorithm described in
"Optimal algorithms for minimal perfect hashing", G. Havas, B.S.
Majewski. The paper is available as a technical report from the CS
department, University of Queensland (ftp://ftp.cs.uq.oz.au/).
The script reads an input file containing pairs of lines. The
first line of each pair is a string containing a key, and the
second line is an integer which is the desired hash code. It'll
then churn through a bunch of computations, and generate Python
code that implements a perfecthash() function that
will map all the keys to the specified hash codes.
The algorithm used works like this:
- You have K keys that you want to perfectly hash to a bunch of hash values.
- Choose a number N larger than K. This is the number of vertices in a graph G, and also the size of the resulting table.
- Pick two random hash functions f1, f2, that output values from 0...N-1.
- Complete the graph G by drawing various edges:
for key in keys: h1 = f1(key) ; h2 = f2(key) Draw an edge between vertices h1 and h2 of the graph. Associate the desired hash value with that edge. - Check if G is acyclic; if no, go back to step 2.
- Assign values to each vertex such that, for each edge, you can add the values for the two vertices and get the desired value for that edge -- which is the desired hash key. This task is dead easy, because the graph is acyclic. This is done by picking a vertex V, and assigning it a value of 0. You then do a depth-first search, assigning values to new vertices so that they sum up properly.
- f1, f2, and G now make up your perfect hash function.