ghc-compact-0.1.0.0: In memory storage of deeply evaluated data structure
Copyright: (c) The University of Glasgow 2001-2009
(c) Giovanni Campagna <gcampagn@cs.stanford.edu> 2014
License: BSD-style (see the file LICENSE)
Maintainer: libraries@haskell.org
Stability: unstable
Portability: non-portable (GHC Extensions)
Safe Haskell: None
Language: Haskell2010

GHC.Compact

Description

This module provides a data structure, called a Compact, for holding immutable, fully evaluated data in a consecutive block of memory. Compact regions are good for two things:

  1. Data in a compact region is not traversed during GC; any incoming pointer to a compact region keeps the entire region live. Thus, if you put a long-lived data structure in a compact region, you may save a lot of cycles during major collections, since you will no longer be (uselessly) retraversing this data structure.
  2. Because the data is stored contiguously, you can easily dump the memory to disk and/or send it over the network. For applications that are not bandwidth bound (GHC's heap representation can be as much as a 4x expansion over a binary serialization), this can lead to substantial speedups.

For example, suppose you have a function loadBigStruct :: IO BigStruct, which loads a large data structure from the file system. You can "compact" the structure with the following code:

     do r <- compact =<< loadBigStruct
        let x = getCompact r :: BigStruct
        -- Do things with x

Note that compact will not preserve internal sharing; use compactWithSharing (which is typically about 10x slower) if you have cycles and/or must preserve sharing. The Compact pointer r can be used to add more data to a compact region; see compactAdd or compactAddWithSharing.
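A self-contained variant of the example above, as a minimal sketch. Here loadBigStruct is a hypothetical stand-in that generates a list instead of reading from the file system, and BigStruct is simply [Int]:

```haskell
import GHC.Compact (compact, compactSize, getCompact)

-- Hypothetical stand-in for loadBigStruct; a real program would read
-- the structure from disk instead of generating it.
loadBigStruct :: IO [Int]
loadBigStruct = pure [1 .. 100000]

main :: IO ()
main = do
  r <- compact =<< loadBigStruct
  -- getCompact is pure: it returns a pointer to the copy in the region.
  let xs = getCompact r
  print (sum xs)
  -- compactSize reports the size of the whole region in bytes.
  size <- compactSize r
  putStrLn ("compact region size in bytes: " ++ show size)
```

The handle r is kept in scope so that more data could later be added to the same region.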

The implementation of compact regions is described by:

  • Edward Z. Yang, Giovanni Campagna, Ömer Ağacan, Ahmed El-Hassany, Abhishek Kulkarni, Ryan Newton. "Efficient Communication and Collection with Compact Normal Forms". In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. September 2015. http://ezyang.com/compact.html

This library is supported by GHC 8.2 and later.

The Compact type

data Compact a

A Compact contains fully evaluated, pure, immutable data.

Compact serves two purposes:

  • Data stored in a Compact has no garbage collection overhead. The garbage collector considers the whole Compact to be alive if there is a reference to any object within it.
  • A Compact can be serialized, stored, and deserialized again. The serialized data can only be deserialized by the exact binary that created it, but it can be stored indefinitely before deserialization.

Compacts are self-contained, so compacting data involves copying it; if you have data that lives in two Compacts, each will have a separate copy of the data.
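The copying behaviour can be observed with inCompact. The following is a minimal sketch: twoRegions is a hypothetical helper, and the list is generated at run time so that it is ordinary heap data when it is compacted.

```haskell
import GHC.Compact (compact, getCompact, inCompact)

-- Compact the same list into two separate regions. Returns whether the
-- value held by the first handle lies in the first region, and whether it
-- lies in the second region.
twoRegions :: IO (Bool, Bool)
twoRegions = do
  let xs = [1 .. 1000] :: [Int]
  c1 <- compact xs
  c2 <- compact xs
  own   <- inCompact c1 (getCompact c1)
  other <- inCompact c2 (getCompact c1)
  pure (own, other)

main :: IO ()
main = twoRegions >>= print
```

Since each region holds its own copy, the first check is expected to succeed and the second to fail.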

The cost of compaction is that of fully evaluating the data plus copying it. However, because compact does not stop the world, retaining internal sharing during the compaction process is very costly. The user can choose between compact and compactWithSharing.

When you have a Compact a, you can get a pointer to the actual object in the region using getCompact. The Compact type serves as a handle on the region itself; you can use this handle to add data to a specific Compact with compactAdd or compactAddWithSharing (giving you a new handle which corresponds to the same compact region, but points to the newly added object in the region). At the moment, due to technical reasons, it's not possible to get the Compact a if you only have an a, so make sure you hold on to the handle as necessary.

Data in a compact doesn't ever move, so compacting data is also a way to pin arbitrary data structures in memory.

There are some limitations on what can be compacted:

  • Functions cannot be compacted; compaction only applies to data.
  • Pinned ByteArray# objects cannot be compacted. This is for a good reason: the memory is pinned so that it can be referenced by address (the address might be stored in a C data structure, for example), so we can't make a copy of it to store in the Compact.
  • Objects with mutable pointer fields (e.g. IORef, MutableArray) also cannot be compacted, because subsequent mutation would destroy the property that a compact is self-contained.

If compaction encounters any of the above, a CompactionFailed exception will be thrown by the compaction operation.
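A minimal sketch of handling this failure, assuming CompactionFailed is imported from Control.Exception. The helper compactable is hypothetical and only reports whether compaction succeeded:

```haskell
import Control.Exception (CompactionFailed, try)
import Data.IORef (newIORef)
import GHC.Compact (compact)

-- Hypothetical helper: attempt to compact a value and report whether
-- the attempt succeeded or was rejected with CompactionFailed.
compactable :: a -> IO Bool
compactable x = do
  r <- try (compact x)
  pure (either rejected (const True) r)
  where
    rejected :: CompactionFailed -> Bool
    rejected _ = False

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  compactable ref >>= print                     -- object with a mutable pointer field
  compactable ((+ 1) :: Int -> Int) >>= print   -- function
  compactable "plain data" >>= print            -- ordinary immutable data
```

Catching the exception at the call site keeps a rejected value from terminating the program; the region itself is simply not created.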

Constructors

Compact Compact# a (MVar ()) 

Compacting data

compact :: a -> IO (Compact a)

Compact a value. O(size of unshared data)

If the structure contains any internal sharing, the shared data will be duplicated during the compaction process. This will not terminate if the structure contains cycles (use compactWithSharing instead).

The object in question must not contain any functions or data with mutable pointers; if it does, compact will raise an exception. In the future, we may add a type class which will help statically check if this is the case or not.

compactWithSharing :: a -> IO (Compact a)

Compact a value, retaining any internal sharing and cycles. O(size of data)

This is typically about 10x slower than compact, because it works by maintaining a hash table mapping uncompacted objects to compacted objects.

The object in question must not contain any functions or data with mutable pointers; if it does, compactWithSharing will raise an exception. In the future, we may add a type class which will help statically check if this is the case or not.
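The difference between compact and compactWithSharing can be made visible by comparing region sizes. This is a sketch under stated assumptions: sharingSizes and cyclicPrefix are hypothetical helpers, and the cyclic input relies on cycle from the Prelude producing a structure that refers back to itself.

```haskell
import GHC.Compact (compact, compactSize, compactWithSharing, getCompact)

-- Compact a pair whose two components are the same list, once with each
-- function, and return the two region sizes in bytes.
sharingSizes :: IO (Word, Word)
sharingSizes = do
  let xs   = [1 .. 100000] :: [Int]
      pair = (xs, xs)
  plain  <- compact pair             -- the shared list is copied twice
  shared <- compactWithSharing pair  -- the shared list is copied once
  (,) <$> compactSize plain <*> compactSize shared

-- A cyclic list can be compacted only with the sharing-aware variant.
cyclicPrefix :: IO [Int]
cyclicPrefix = do
  c <- compactWithSharing (cycle [1, 2, 3 :: Int])
  pure (take 5 (getCompact c))

main :: IO ()
main = do
  sharingSizes >>= print
  cyclicPrefix >>= print
```

The first size should come out noticeably larger than the second, which is the space cost paid for the faster, sharing-oblivious copy.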

compactAdd :: Compact b -> a -> IO (Compact a)

Add a value to an existing Compact. This will help you avoid copying when the value contains pointers into the compact region, but remember that after compaction this value will only be deallocated with the entire compact region.

Behaves exactly like compact with respect to sharing and what data it accepts.

compactAddWithSharing :: Compact b -> a -> IO (Compact a)

Add a value to an existing Compact, like compactAdd, but behaving exactly like compactWithSharing with respect to sharing and what data it accepts.
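As an illustration of adding to an existing region, the sketch below adds a pair that points at a list already stored in the region. The helper addToRegion is hypothetical.

```haskell
import GHC.Compact (compact, compactAdd, getCompact, inCompact)

-- Add a second value to the region behind c1 and check that the new
-- handle refers to the same region.
addToRegion :: IO (Int, Bool)
addToRegion = do
  c1 <- compact ([1 .. 1000] :: [Int])
  let xs = getCompact c1
  -- The new value refers to xs, which already lives in the region.
  c2 <- compactAdd c1 (length xs, xs)
  sameRegion <- inCompact c1 (getCompact c2)
  pure (fst (getCompact c2), sameRegion)

main :: IO ()
main = addToRegion >>= print
```

Both c1 and c2 remain valid handles on the same region; they differ only in the object they point at.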

Inspecting a Compact

getCompact :: Compact a -> a

Retrieve a direct pointer to the value pointed at by a Compact reference. If you have used compactAdd, there may be multiple Compact references into the same compact region. Upholds the property:

inCompact c (getCompact c) == True

inCompact :: Compact b -> a -> IO Bool

Check if the second argument is inside the passed Compact.

isCompact :: a -> IO Bool

Check if the argument is in any Compact. If true, the value in question is also fully evaluated, since any value in a compact region must be fully evaluated.

compactSize :: Compact a -> IO Word

Returns the size in bytes of the compact region.
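A short sketch of the inspection functions used together. The helper inspect is hypothetical, and the list is generated at run time so that it starts out as ordinary heap data.

```haskell
import GHC.Compact (compact, compactSize, getCompact, isCompact)

-- Report whether a list is in a compact region before and after
-- compaction, together with the size of the resulting region in bytes.
inspect :: IO (Bool, Bool, Word)
inspect = do
  let xs = [1 .. 1000] :: [Int]
  before <- isCompact xs
  c <- compact xs
  after <- isCompact (getCompact c)
  size <- compactSize c
  pure (before, after, size)

main :: IO ()
main = inspect >>= print
```

Note that isCompact is asked about getCompact c rather than xs: the original list is left where it was, and only the copy lives in the region.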

Other utilities

compactResize :: Compact a -> Word -> IO ()

Experimental. This function doesn't actually resize a compact region; rather, it changes the default block size which we allocate when the current block runs out of space, and also appends a block to the compact region.
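A minimal sketch of calling compactResize and observing the region size around it. The helper resizeDemo and the 64 KiB block size are arbitrary choices for illustration.

```haskell
import GHC.Compact (compact, compactResize, compactSize)

-- Measure the region size before and after requesting a new default
-- block size of 64 KiB.
resizeDemo :: IO (Word, Word)
resizeDemo = do
  c <- compact ([1 .. 1000] :: [Int])
  before <- compactSize c
  compactResize c (64 * 1024)
  after <- compactSize c
  pure (before, after)

main :: IO ()
main = resizeDemo >>= print
```

Because a block is appended, the second size is expected to be at least as large as the first, even though no data was added.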

Internal operations

mkCompact :: Compact# -> a -> State# RealWorld -> (# State# RealWorld, Compact a #)

Make a new Compact object, given a pointer to the true underlying region. You must uphold the invariant that a lives in the compact region.

compactSized

Arguments

  :: Int             Size of the compact region, in bytes
  -> Bool            Whether to retain internal sharing
  -> a
  -> IO (Compact a)

Transfer a into a new compact region with a preallocated size (in bytes), optionally preserving sharing. If you know how big the data structure in question is, you can save time by picking an appropriate block size for the compact region.
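A minimal usage sketch. The helper presized is hypothetical, and the 512 KiB initial size is an arbitrary illustrative value, not a measured size for this list.

```haskell
import GHC.Compact (compactSized, getCompact)

-- Compact a list into a region whose first block is requested at
-- 512 KiB, without preserving internal sharing.
presized :: IO Int
presized = do
  let xs = [1 .. 100000] :: [Int]
  c <- compactSized (512 * 1024) False xs
  pure (length (getCompact c))

main :: IO ()
main = presized >>= print
```

Passing True as the second argument would request the sharing-preserving behaviour instead.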