The Weak
library provides a "weak pointer" abstraction, giving
the user some control over the garbage collection of specified
objects, and allowing objects to be "finalized" with an arbitrary
Haskell IO computation when they die.
Weak pointers partially replace the old foreign object interface, as we will explain later.
module Weak (
Weak, -- abstract
-- instance Eq (Weak v)
mkWeak, -- :: k -> v -> Maybe (IO ()) -> IO (Weak v)
deRefWeak, -- :: Weak v -> IO (Maybe v)
finalize, -- :: Weak v -> IO ()
-- Not yet implemented
-- replaceFinalizer -- :: Weak v -> IO () -> IO ()
mkWeakPtr, -- :: k -> Maybe (IO ()) -> IO (Weak k)
mkWeakPair, -- :: k -> v -> Maybe (IO ()) -> IO (Weak (k,v))
addFinalizer, -- :: k -> IO () -> IO ()
addForeignFinalizer -- :: ForeignObj -> IO () -> IO ()
) where
In general terms, a weak pointer is a reference to an object that is not followed by the garbage collector --- that is, the existence of a weak pointer to an object has no effect on the lifetime of that object. A weak pointer can be de-referenced to find out whether the object it refers to is still alive or not, and if so to return the object itself.
Weak pointers are particularly useful for caches and memo tables. To build a memo table, you build a data structure mapping from the function argument (the key) to its result (the value). When you apply the function to a new argument you first check whether the key/value pair is already in the memo table. The key point is that the memo table itself should not keep the key and value alive. So the table should contain a weak pointer to the key, not an ordinary pointer. The pointer to the value must not be weak, because the only reference to the value might indeed be from the memo table.
So it looks as if the memo table will keep all its values alive for ever. One way to solve this is to purge the table occasionally, by deleting entries whose keys have died.
The weak pointers in this library support another approach, called finalization. When the key referred to by a weak pointer dies, the storage manager arranges to run a programmer-specified finalizer. In the case of memo tables, for example, the finalizer could remove the key/value pair from the memo table.
Another difficulty with the memo table is that the value of a key/value pair might itself contain a pointer to the key. So the memo table keeps the value alive, which keeps the key alive, even though there may be no other references to the key so both should die. The weak pointers in this library provide a slight generalisation of the basic weak-pointer idea, in which each weak pointer actually contains both a key and a value. We describe this in more detail below.
mkWeakPtr :: a -> Maybe (IO ()) -> IO (Weak a)
deRefWeak :: Weak a -> IO (Maybe a)
addFinalizer :: a -> IO () -> IO ()
mkWeakPtr
takes a value of any type a
, and maybe a finalizer
of type IO ()
, and returns a weak pointer object referring to the
value, of type Weak a
. It is in the IO
monad because it has
the side effect of arranging that the finalizer (if there is one) will
be run when the object dies. In what follows, a ``weak pointer
object'', or ``weak pointer'' for short, means precisely ``a Haskell
value of type Weak t
'' for some type t
. A weak pointer
(object) is a first-class Haskell value; it can be passed to
functions, stored in data structures, and so on.
deRefWeak
dereferences a weak pointer, returning Just v
if
the value is still alive. If the key has already died, then
deRefWeak
returns Nothing
; that's why it's in the IO
monad - the return value of deRefWeak
depends on when the garbage
collector runs.
addFinalizer
is just another name for mkWeakPtr
except that
it throws the weak pointer itself away. (The runtime system will
remember that the weak pointer and hence the finalizer exists even if
the program has forgotten it.)
addFinalizer :: a -> IO () -> IO ()
addFinalizer v f = do { mkWeakPtr v f; return () }
The effect of addFinalizer
is simply that the finalizer runs when
the referenced object dies.
The following properties hold:
deRefWeak
returns the original object until
that object is considered dead; it returns Nothing
subsequently.deRefWeak
will return Nothing
.
f :: Show a => a -> IO a
f x = addFinalizer x (print (show x))
Here the finalizer print (show x)
contains a reference to x
itself, but that does not keep x
alive. When that is the only
reference to x
, the finalizer is run; and the message appears
on the screen.
The Weak
library offers a slight generalisation of
the simple weak pointers described so far:
mkWeak :: k -> v -> Maybe (IO ()) -> IO (Weak v)
mkWeak
takes a key of any type k
and a value of any type
v
, as well as a finalizer, and returns a weak pointer of type
Weak v
.
deRefWeak
returns the value only, not the key, as its
type (given above) implies:
deRefWeak :: Weak a -> IO (Maybe a)
However, deRefWeak
returns Nothing
if the key, not the
value, has died. Furthermore, references from the value to the key
do not keep the key alive, in the same way that the finalizer does
not keep the key alive.
Simple weak pointers are readily defined in terms of these more general weak pointers:
mkWeakPtr :: a -> Maybe (IO ()) -> IO (Weak a)
mkWeakPtr v f = mkWeak v v f
These more general weak pointers are enough to implement memo tables properly.
A weak pointer can be finalized early, using the finalize
operation:
finalize :: Weak v -> IO ()
The above informal specification is fine for simple situations, but matters can get complicated. In particular, it needs to be clear exactly when a key dies, so that any weak pointers that refer to it can be finalized. Suppose, for example, the value of one weak pointer refers to the key of another...does that keep the key alive?
The behaviour is simply this:
deRefWeak
return Nothing
; and (b) run the finalizer.This behaviour depends on what it means for a key to be reachable. Informally, something is reachable if it can be reached by following ordinary pointers from the root set, but not following weak pointers. We define reachability more precisely as follows A heap object is reachable if:
The root set consists of all runnable threads, and all stable pointers
(see Section
Stable Pointers).
NOTE: currently all top-level objects are considered to be reachable,
although we hope to remove this restriction in the future. A
Char
or small Int
will also be constantly reachable, since
the garbage collector replaces heap-resident Char
s and small
Int
s with pointers to static copies.
Notice that a pointer to the key from its associated value or finalizer does not make the key reachable. However, if the key is reachable some other way, then the value and the finalizer are reachable, and so, therefore, are any other keys they refer to directly or indirectly.
A foreign object is some data that lives outside the Haskell heap, for
example some malloc
ed data in C land. It's useful to be able to
know when the Haskell program no longer needs the malloc
ed data,
so it can be free
d. We can use weak pointers and finalizers for
this, but we have to be careful: the foreign data is usually
referenced by an address, ie. an Addr
(see Section
Addr), and we must retain the invariant that
if the Haskell program still needs the foreign object, then it
retains the Addr
object in the heap. This invariant isn't
guaranteed to hold if we use Addr
, because an Addr
consists
of a box around a raw address Addr#
. If the Haskell program can
manipulate the Addr#
object independently of the heap-resident
Addr
, then the foreign object could be inadvertently finalized
early, because a weak pointer to the Addr
would find no more
references to its key and trigger the finalizer despite the fact that
the program still holds the Addr#
and intends to use it.
To avoid this somewhat subtle race condition, we use another type of
foreign address, called ForeignObj
(see Section
Foreign). Historical note: ForeignObj
is identical to the old ForeignObj
except that it no longer
supports finalization - that's provided by the weak
pointer/finalization mechanism above.
A ForeignObj
is basically an address, but the ForeignObj
itself is a heap-resident object and can therefore be watched by weak
pointers. A ForeignObj
can be passed to C functions (in which
case the C function gets a straightforward pointer), but it cannot be
decomposed into an Addr#
.