perfect_hash.py generates perfect minimal hash functions for a set of keys using the algorithm described in "Optimal algorithms for minimal perfect hashing", G. Havas, B.S. Majewski. The paper is available as a technical report from the CS department, University of Queensland (ftp://ftp.cs.uq.oz.au/).

The script reads an input file containing pairs of lines. The first line of each pair is a string containing a key, and the second line is an integer which is the desired hash code. It'll then churn through a bunch of computations, and generate Python code that implements a perfecthash() function that will map all the keys to the specified hash codes.

The algorithm used works like this:

  1. You have K keys that you want to perfectly hash to a bunch of hash values.
  2. Choose a number N larger than K. This is the number of vertices in a graph G, and also the size of the resulting table.
  3. Pick two random hash functions f1, f2, that output values from 0...N-1.
  4. Complete the graph G by drawing various edges:
    for key in keys:
        h1 = f1(key) ; h2 = f2(key)
        Draw an edge between vertices h1 and h2 of the graph.
        Associate the desired hash value with that edge.
    
  5. Check if G is acyclic; if no, go back to step 2.
  6. Assign values to each vertex such that, for each edge, you can add the values for the two vertices and get the desired value for that edge -- which is the desired hash key. This task is dead easy, because the graph is acyclic. This is done by picking a vertex V, and assigning it a value of 0. You then do a depth-first search, assigning values to new vertices so that they sum up properly.
  7. f1, f2, and G now make up your perfect hash function.

Download [12K] Signature


[Contact me]