Haskell Hierarchical Libraries (base package)Source codeContentsIndex
Data.HashTable
Portabilityportable
Stabilityprovisional
Maintainerlibraries@haskell.org
Contents
Basic hash table operations
Converting to and from lists
Hash functions
Diagnostics
Description
An implementation of extensible hash tables, as described in Per-Ake Larson, Dynamic Hash Tables, CACM 31(4), April 1988, pp. 446--457. The implementation is also derived from the one in GHC's runtime system (ghc/rts/Hash.{c,h}).
Synopsis
data HashTable key val
new :: (key -> key -> Bool) -> (key -> Int32) -> IO (HashTable key val)
insert :: HashTable key val -> key -> val -> IO ()
delete :: HashTable key val -> key -> IO ()
lookup :: HashTable key val -> key -> IO (Maybe val)
update :: HashTable key val -> key -> val -> IO Bool
fromList :: Eq key => (key -> Int32) -> [(key, val)] -> IO (HashTable key val)
toList :: HashTable key val -> IO [(key, val)]
hashInt :: Int -> Int32
hashString :: String -> Int32
prime :: Int32
longestChain :: HashTable key val -> IO [(key, val)]
Basic hash table operations
data HashTable key val
new
:: (key -> key -> Bool)eq: An equality comparison on keys
-> (key -> Int32)hash: A hash function on keys
-> IO (HashTable key val)Returns: an empty hash table

Creates a new hash table. The following property should hold for the eq and hash functions passed to new:

   eq A B  =>  hash A == hash B
insert :: HashTable key val -> key -> val -> IO ()

Inserts a key/value mapping into the hash table.

Note that insert doesn't remove the old entry from the table - the behaviour is like an association list, where lookup returns the most-recently-inserted mapping for a key in the table. The reason for this is to keep insert as efficient as possible. If you need to update a mapping, then we provide update.

delete :: HashTable key val -> key -> IO ()
Remove an entry from the hash table.
lookup :: HashTable key val -> key -> IO (Maybe val)
Looks up the value of a key in the hash table.
update :: HashTable key val -> key -> val -> IO Bool

Updates an entry in the hash table, returning True if there was already an entry for this key, or False otherwise. After update there will always be exactly one entry for the given key in the table.

insert is more efficient than update if you don't care about multiple entries, or you know for sure that multiple entries can't occur. However, update is more efficient than delete followed by insert.

Converting to and from lists
fromList :: Eq key => (key -> Int32) -> [(key, val)] -> IO (HashTable key val)
Convert a list of key/value pairs into a hash table. Equality on keys is taken from the Eq instance for the key type.
toList :: HashTable key val -> IO [(key, val)]
Converts a hash table to a list of key/value pairs.
Hash functions

This implementation of hash tables uses the low-order n bits of the hash value for a key, where n varies as the hash table grows. A good hash function therefore will give an even distribution regardless of n.

If your keyspace is integrals such that the low-order bits between keys are highly variable, then you could get away with using id as the hash function.

We provide some sample hash functions for Int and String below.

hashInt :: Int -> Int32

A sample (and useful) hash function for Int and Int32, implemented by extracting the uppermost 32 bits of the 64-bit result of multiplying by a 32-bit constant. The constant is from Knuth, derived from the golden ratio:

 golden = round ((sqrt 5 - 1) * 2^31) :: Int
hashString :: String -> Int32

A sample hash function for Strings. We keep multiplying by the golden ratio and adding. The implementation is:

 hashString = foldl' f 0
   where f m c = fromIntegral (fromEnum c + 1) * golden + mulHi m golden

Note that this has not been extensively tested for reasonability, but Knuth argues that repeated multiplication by the golden ratio will minimize gaps in the hash space.

prime :: Int32
A prime larger than the maximum hash table size
Diagnostics
longestChain :: HashTable key val -> IO [(key, val)]
This function is useful for determining whether your hash function is working well for your data set. It returns the longest chain of key/value pairs in the hash table for which all the keys hash to the same bucket. If this chain is particularly long (say, longer than 14 elements or so), then it might be a good idea to try a different hash function.
Produced by Haddock version 0.8