The Hugs-GHC Extension Libraries: Weak

16. Weak

The Weak library provides a "weak pointer" abstraction, giving the user some control over the garbage collection of specified objects, and allowing objects to be "finalised" with an arbitrary Haskell IO computation when they die.

Weak pointers partially replace the old foreign object interface, as we will explain later.

16.1 Module Signature


module Weak (
        Weak,                   -- abstract
        -- instance Eq (Weak v)  

        mkWeak,                 -- :: k -> v -> IO () -> IO (Weak v)
        deRefWeak,              -- :: Weak v -> IO (Maybe v)
        finalise,               -- :: Weak v -> IO ()

        -- Not yet implemented
        -- replaceFinaliser     -- :: Weak v -> IO () -> IO ()

        mkWeakPtr,              -- :: k -> IO () -> IO (Weak k)
        mkWeakPair,             -- :: k -> v -> IO () -> IO (Weak (k,v))
        mkWeakNoFinaliser,      -- :: k -> v -> IO (Weak v)
        addFinaliser,           -- :: k -> IO () -> IO ()
        addForeignFinaliser     -- :: ForeignObj -> IO () -> IO ()
   ) where

16.2 Weak pointers

In general terms, a weak pointer is a reference to an object that is not followed by the garbage collector --- that is, the existence of a weak pointer to an object has no effect on the lifetime of that object. A weak pointer can be de-referenced to find out whether the object it refers to is still alive or not, and if so to return the object itself.

Weak pointers are particularly useful for caches and memo tables. To build a memo table, you build a data structure mapping from the function argument (the key) to its result (the value). When you apply the function to a new argument you first check whether the key/value pair is already in the memo table. The key point is that the memo table itself should not keep the key and value alive. So the table should contain a weak pointer to the key, not an ordinary pointer. The pointer to the value must not be weak, because the only reference to the value might indeed be from the memo table.

So it looks as if the memo table will keep all its values alive for ever. One way to solve this is to purge the table occasionally, by deleting entries whose keys have died.

The weak pointers in this library support another approach, called finalisation. When the key referred to by a weak pointer dies, the storage manager arranges to run a programmer-specified finaliser. In the case of memo tables, for example, the finaliser could remove the key/value pair from the memo table.

Another difficulty with the memo table is that the value of a key/value pair might itself contain a pointer to the key. So the memo table keeps the value alive, which keeps the key alive, even though there may be no other references to the key so both should die. The weak pointers in this library provide a slight generalisation of the basic weak-pointer idea, in which each weak pointer actually contains both a key and a value. We describe this in more detail below.

16.3 The simple interface


        mkWeakPtr    :: a -> IO () -> IO (Weak a)
        deRefWeak    :: Weak a -> IO (Maybe a)
        addFinaliser :: a -> IO () -> IO ()

mkWeakPtr takes a value of any type a, and a finaliser of type IO (), and returns a weak pointer object referring to the value, of type Weak a. It is in the IO monad because it has the side effect of arranging that the finaliser will be run when the object dies. In what follows, a ``weak pointer object'', or ``weak pointer'' for short, means precisely ``a Haskell value of type Weak t'' for some type t. A weak pointer (object) is a first-class Haskell value; it can be passed to functions, stored in data structures, and so on.

deRefWeak dereferences a weak pointer, returning Just v if the value is still alive. If the key has already died, then deRefWeak returns Nothing; that's why it's in the IO monad - the return value of deRefWeak depends on when the garbage collector runs.

addFinaliser is just another name for mkWeakPtr except that it throws the weak pointer itself away. (The runtime system will remember that the weak pointer and hence the finaliser exists even if the program has forgotten it.)


  addFinaliser :: a -> IO () -> IO ()
  addFinaliser v f = do { mkWeakPtr v f; return () }

The effect of addFinaliser is simply that the finaliser runs when the referenced object dies.

The following properties hold:

deRefWeak returns the original object until that object is considered dead; it returns Nothing subsequently.
Every finaliser will eventually be run, exactly once, either soon after the object dies, or at the end of the program. There is no requirement for the programmer to hold onto the weak pointer itself; finalisation is completely unaffected by whether the weak pointer itself is alive.
There may be multiple weak pointers to a single object. In this case, the finalisers for each of these weak pointers will all be run in some arbitrary order, or perhaps concurrently, when the object dies. If the programmer specifies a finaliser that assumes it has the only reference to an object (for example, a file that it wishes to close), then the programmer must ensure that there is only one such finaliser.
The storage manager attempts to run the finaliser(s) for an object soon after the object dies, but promptness is not guaranteed. (What is guaranteed is that the finaliser will eventually run, exactly once.)
At the moment when a finaliser is run, a call to deRefWeak will return Nothing.
A finaliser may contain a pointer to the object, but that pointer will not keep the object alive. For example:
f :: Show a => a -> IO a f x = addFinaliser x (print (show x))
Here the finaliser print (show x) contains a reference to x itself, but that does not keep x alive. When that is the only reference to x, the finaliser is run; and the message appears on the screen.
A finaliser may even resurrect the object, by (say) storing it in some global data structure.

16.4 The general interface

The Weak library offers a slight generalisation of the simple weak pointers described so far:


        mkWeak :: k -> v -> IO () -> IO (Weak v)

mkWeak takes a key of any type k and a value of any type v, as well as a finaliser, and returns a weak pointer of type Weak v.

deRefWeak returns the value only, not the key, as its type (given above) implies:


        deRefWeak :: Weak a -> IO (Maybe a)

However, deRefWeak returns Nothing if the key, not the value, has died. Furthermore, references from the value to the key do not keep the key alive, in the same way that the finaliser does not keep the key alive.

Simple weak pointers are readily defined in terms of these more general weak pointers:


  mkWeakPtr :: a -> IO () -> IO (Weak a)
  mkWeakPtr v f = mkWeak v v f

These more general weak pointers are enough to implement memo tables properly.

A weak pointer can be finalised early, using the finalise operation:


finalise :: Weak v -> IO ()

When you don't need a finaliser, we provide the following operation:


mkWeakNoFinaliser :: k -> v -> IO (Weak v)
mkWeakNoFinaliser k v = mkWeak k v (return ())

Which creates a weak pointer with a null finaliser. Lots of null finalisers can be expensive, because each one runs in a separate thread, so the intention is that mkWeakNoFinaliser> avoids all the extra costs by generating a special kind of weak pointer without a finaliser. So although the semantics of mkWeakNoFinaliser is as given above, its actual implementation is somewhat different.

`16.5 A precise semantics`



The above informal specification is fine for simple situations,
but matters can get complicated.  In particular, it needs to
be clear exactly when a key dies, so that any weak pointers 
that refer to it can be finalised.
Suppose, for example, the value of one weak pointer refers
to the key of another...does that keep the key alive?
The behaviour is simply this:


 If a weak pointer (object) refers to an unreachable
key, it may be finalised.
 Finalisation means (a) arrange that subsequent calls
to deRefWeak return Nothing; and (b) run the finaliser.

This behaviour depends on what it means for a key to be reachable.
Informally,
something is reachable if it can be reached by following ordinary
pointers from the root set, but not following weak pointers.
We define reachability more precisely as 
follows
A heap object is reachable if:


 It is a member of the root set.
 It is directly pointed to by a reachable object, other than
a weak pointer object.
 It is a weak pointer object whose key is reachable.
 It is the value or finaliser of an object whose key is
reachable.

The root set consists of all runnable threads, and all stable pointers
(see Section 
Stable Pointers).
NOTE: currently all top-level objects are considered to be reachable,
although we hope to remove this restriction in the future.  A
Char or small Int will also be constantly reachable, since
the garbage collector replaces heap-resident Chars and small
Ints with pointers to static copies.
Notice that a pointer to the key from its associated 
value or finaliser does not make the key reachable.
However, if the key is reachable some other way, then the value
and the finaliser are reachable, and so, therefore, are any other
keys they refer to directly or indirectly.


 16.6 Finalisation for foreign objects



A foreign object is some data that lives outside the Haskell heap, for
example some malloced data in C land.  It's useful to be able to
know when the Haskell program no longer needs the malloced data,
so it can be freed.  We can use weak pointers and finalisers for
this, but we have to be careful: the foreign data is usually
referenced by an address, ie. an Addr (see Section 
Addr), and we must retain the invariant that
if the Haskell program still needs the foreign object, then it
retains the Addr object in the heap.  This invariant isn't
guaranteed to hold if we use Addr, because an Addr consists
of a box around a raw address Addr#.  If the Haskell program can
manipulate the Addr# object independently of the heap-resident
Addr, then the foreign object could be inadvertently finalised
early, because a weak pointer to the Addr would find no more
references to its key and trigger the finaliser despite the fact that
the program still holds the Addr# and intends to use it.

To avoid this somewhat subtle race condition, we use another type of foreign address, called ForeignObj (see Section Foreign). Historical note: ForeignObj is identical to the old ForeignObj except that it no longer supports finalisation - that's provided by the weak pointerfinalisation mechanism above.

A ForeignObj is basically an address, but the ForeignObj itself is a heap-resident object and can therefore be watched by weak pointers. A ForeignObj can be passed to C functions (in which case the C function gets a straightforward pointer), but it cannot be decomposed into an Addr#.

Next Previous Contents