Copyright | (c) The University of Glasgow 2001-2009 (c) Giovanni Campagna <gcampagn@cs.stanford.edu> 2014 |
---|---|
License | BSD-style (see the file LICENSE) |
Maintainer | libraries@haskell.org |
Stability | unstable |
Portability | non-portable (GHC Extensions) |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
This module provides a data structure, called a Compact
, for
holding immutable, fully evaluated data in a consecutive block of memory.
Compact regions are good for two things:
- Data in a compact region is not traversed during GC; any incoming pointer to a compact region keeps the entire region live. Thus, if you put a long-lived data structure in a compact region, you may save a lot of cycles during major collections, since you will no longer be (uselessly) retraversing this data structure.
- Because the data is stored contiguously, you can easily dump the memory to disk and/or send it over the network. For applications that are not bandwidth bound (GHC's heap representation can be as much of a x4 expansion over a binary serialization), this can lead to substantial speedups.
For example, suppose you have a function loadBigStruct :: IO BigStruct
,
which loads a large data structure from the file system. You can "compact"
the structure with the following code:
do r <-compact
=<< loadBigStruct let x =getCompact
r :: BigStruct -- Do things with x
Note that compact
will not preserve internal sharing; use
compactWithSharing
(which is 10x slower) if you have cycles and/or
must preserve sharing. The Compact
pointer r
can be used
to add more data to a compact region; see compactAdd
or
compactAddWithSharing
.
The implementation of compact regions is described by:
- Edward Z. Yang, Giovanni Campagna, Ömer Ağacan, Ahmed El-Hassany, Abhishek Kulkarni, Ryan Newton. "/Efficient communication and Collection with Compact Normal Forms/". In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. September 2015. http://ezyang.com/compact.html
This library is supported by GHC 8.2 and later.
Synopsis
- data Compact a = Compact Compact# a !(MVar ())
- compact :: a -> IO (Compact a)
- compactWithSharing :: a -> IO (Compact a)
- compactAdd :: Compact b -> a -> IO (Compact a)
- compactAddWithSharing :: Compact b -> a -> IO (Compact a)
- getCompact :: Compact a -> a
- inCompact :: Compact b -> a -> IO Bool
- isCompact :: a -> IO Bool
- compactSize :: Compact a -> IO Word
- compactResize :: Compact a -> Word -> IO ()
- mkCompact :: Compact# -> a -> State# RealWorld -> (# State# RealWorld, Compact a #)
- compactSized :: Int -> Bool -> a -> IO (Compact a)
The Compact type
A Compact
contains fully evaluated, pure, immutable data.
Compact
serves two purposes:
- Data stored in a
Compact
has no garbage collection overhead. The garbage collector considers the wholeCompact
to be alive if there is a reference to any object within it. - A
Compact
can be serialized, stored, and deserialized again. The serialized data can only be deserialized by the exact binary that created it, but it can be stored indefinitely before deserialization.
Compacts are self-contained, so compacting data involves copying
it; if you have data that lives in two Compact
s, each will have a
separate copy of the data.
The cost of compaction is fully evaluating the data + copying it. However,
because compact
does not stop-the-world, retaining internal sharing during
the compaction process is very costly. The user can choose whether to
compact
or compactWithSharing
.
When you have a
, you can get a pointer to the actual object
in the region using Compact
agetCompact
. The Compact
type
serves as handle on the region itself; you can use this handle
to add data to a specific Compact
with compactAdd
or
compactAddWithSharing
(giving you a new handle which corresponds
to the same compact region, but points to the newly added object
in the region). At the moment, due to technical reasons,
it's not possible to get the
if you only have an Compact
aa
,
so make sure you hold on to the handle as necessary.
Data in a compact doesn't ever move, so compacting data is also a way to pin arbitrary data structures in memory.
There are some limitations on what can be compacted:
- Functions. Compaction only applies to data.
- Pinned
ByteArray#
objects cannot be compacted. This is for a good reason: the memory is pinned so that it can be referenced by address (the address might be stored in a C data structure, for example), so we can't make a copy of it to store in theCompact
. - Objects with mutable pointer fields (e.g.
IORef
,MutableArray
) also cannot be compacted, because subsequent mutation would destroy the property that a compact is self-contained.
If compaction encounters any of the above, a CompactionFailed
exception will be thrown by the compaction operation.
Compacting data
compact :: a -> IO (Compact a) Source #
Compact a value. O(size of unshared data)
If the structure contains any internal sharing, the shared data
will be duplicated during the compaction process. This will
not terminate if the structure contains cycles (use compactWithSharing
instead).
The object in question must not contain any functions or data with mutable
pointers; if it does, compact
will raise an exception. In the future, we
may add a type class which will help statically check if this is the case or
not.
compactWithSharing :: a -> IO (Compact a) Source #
Compact a value, retaining any internal sharing and cycles. O(size of data)
This is typically about 10x slower than compact
, because it works
by maintaining a hash table mapping uncompacted objects to
compacted objects.
The object in question must not contain any functions or data with mutable
pointers; if it does, compact
will raise an exception. In the future, we
may add a type class which will help statically check if this is the case or
not.
compactAdd :: Compact b -> a -> IO (Compact a) Source #
Add a value to an existing Compact
. This will help you avoid
copying when the value contains pointers into the compact region,
but remember that after compaction this value will only be deallocated
with the entire compact region.
Behaves exactly like compact
with respect to sharing and what data
it accepts.
compactAddWithSharing :: Compact b -> a -> IO (Compact a) Source #
Add a value to an existing Compact
, like compactAdd
,
but behaving exactly like compactWithSharing
with respect to sharing and
what data it accepts.
Inspecting a Compact
getCompact :: Compact a -> a Source #
Retrieve a direct pointer to the value pointed at by a Compact
reference.
If you have used compactAdd
, there may be multiple Compact
references
into the same compact region. Upholds the property:
inCompact c (getCompact c) == True
inCompact :: Compact b -> a -> IO Bool Source #
Check if the second argument is inside the passed Compact
.
isCompact :: a -> IO Bool Source #
Check if the argument is in any Compact
. If true, the value in question
is also fully evaluated, since any value in a compact region must
be fully evaluated.
Other utilities
compactResize :: Compact a -> Word -> IO () Source #
Experimental This function doesn't actually resize a compact region; rather, it changes the default block size which we allocate when the current block runs out of space, and also appends a block to the compact region.
Internal operations
mkCompact :: Compact# -> a -> State# RealWorld -> (# State# RealWorld, Compact a #) Source #
Make a new Compact
object, given a pointer to the true
underlying region. You must uphold the invariant that a
lives
in the compact region.
:: Int | Size of the compact region, in bytes |
-> Bool | Whether to retain internal sharing |
-> a | |
-> IO (Compact a) |
Transfer a
into a new compact region, with a preallocated size (in
bytes), possibly preserving sharing or not. If you know how big the data
structure in question is, you can save time by picking an appropriate block
size for the compact region.