7.2. Unboxed types and primitive operations

7.2. Unboxed types and primitive operations
Prev	Chapter 7. GHC Language Features	Next

GHC is built on a raft of primitive data types and operations; "primitive" in the sense that they cannot be defined in Haskell itself. While you really can use this stuff to write fast code, we generally find it a lot less painful, and more satisfying in the long run, to use higher-level language features and libraries. With any luck, the code you write will be optimised to the efficient unboxed version in any case. And if it isn't, we'd like to know about it.

All these primitive data types and operations are exported by the library GHC.Prim, for which there is detailed online documentation. (This documentation is generated from the file compiler/prelude/primops.txt.pp.)

If you want to mention any of the primitive data types or operations in your program, you must first import GHC.Prim to bring them into scope. Many of them have names ending in "#", and to mention such names you need the -XMagicHash extension (Section 7.3.2, “The magic hash”).

The primops make extensive use of unboxed types and unboxed tuples, which we briefly summarise here.

7.2.1. Unboxed types

Most types in GHC are boxed, which means that values of that type are represented by a pointer to a heap object. The representation of a Haskell Int, for example, is a two-word heap object. An unboxed type, however, is represented by the value itself, no pointers or heap allocation are involved.

Unboxed types correspond to the “raw machine” types you would use in C: Int# (long int), Double# (double), Addr# (void *), etc. The primitive operations (PrimOps) on these types are what you might expect; e.g., (+#) is addition on Int#s, and is the machine-addition that we all know and love—usually one instruction.

Primitive (unboxed) types cannot be defined in Haskell, and are therefore built into the language and compiler. Primitive types are always unlifted; that is, a value of a primitive type cannot be bottom. We use the convention (but it is only a convention) that primitive types, values, and operations have a # suffix (see Section 7.3.2, “The magic hash”). For some primitive types we have special syntax for literals, also described in the same section.

Primitive values are often represented by a simple bit-pattern, such as Int#, Float#, Double#. But this is not necessarily the case: a primitive value might be represented by a pointer to a heap-allocated object. Examples include Array#, the type of primitive arrays. A primitive array is heap-allocated because it is too big a value to fit in a register, and would be too expensive to copy around; in a sense, it is accidental that it is represented by a pointer. If a pointer represents a primitive value, then it really does point to that value: no unevaluated thunks, no indirections…nothing can be at the other end of the pointer than the primitive value. A numerically-intensive program using unboxed types can go a lot faster than its “standard” counterpart—we saw a threefold speedup on one example.

There are some restrictions on the use of primitive types:

The main restriction is that you can't pass a primitive value to a polymorphic function or store one in a polymorphic data type. This rules out things like [Int#] (i.e. lists of primitive integers). The reason for this restriction is that polymorphic arguments and constructor fields are assumed to be pointers: if an unboxed integer is stored in one of these, the garbage collector would attempt to follow it, leading to unpredictable space leaks. Or a seq operation on the polymorphic component may attempt to dereference the pointer, with disastrous results. Even worse, the unboxed value might be larger than a pointer (Double# for instance).
You cannot define a newtype whose representation type (the argument type of the data constructor) is an unboxed type. Thus, this is illegal:
```
  newtype A = MkA Int#
```
You cannot bind a variable with an unboxed type in a top-level binding.
You cannot bind a variable with an unboxed type in a recursive binding.
You may bind unboxed variables in a (non-recursive, non-top-level) pattern binding, but you must make any such pattern-match strict. For example, rather than:
```
  data Foo = Foo Int Int#

  f x = let (Foo a b, w) = ..rhs.. in ..body..
```
you must write:
```
  data Foo = Foo Int Int#

  f x = let !(Foo a b, w) = ..rhs.. in ..body..
```
since b has type Int#.

7.2.2. Unboxed Tuples

Unboxed tuples aren't really exported by GHC.Exts, they're available by default with -fglasgow-exts. An unboxed tuple looks like this:

(# e_1, ..., e_n #)

where e_1..e_n are expressions of any type (primitive or non-primitive). The type of an unboxed tuple looks the same.

Unboxed tuples are used for functions that need to return multiple values, but they avoid the heap allocation normally associated with using fully-fledged tuples. When an unboxed tuple is returned, the components are put directly into registers or on the stack; the unboxed tuple itself does not have a composite representation. Many of the primitive operations listed in primops.txt.pp return unboxed tuples. In particular, the IO and ST monads use unboxed tuples to avoid unnecessary allocation during sequences of operations.

There are some pretty stringent restrictions on the use of unboxed tuples:

Values of unboxed tuple types are subject to the same restrictions as other unboxed types; i.e. they may not be stored in polymorphic data structures or passed to polymorphic functions.

No variable can have an unboxed tuple type, nor may a constructor or function argument have an unboxed tuple type. The following are all illegal:

  data Foo = Foo (# Int, Int #)

  f :: (# Int, Int #) -> (# Int, Int #)
  f x = x

  g :: (# Int, Int #) -> Int
  g (# a,b #) = a

  h x = let y = (# x,x #) in ...

The typical use of unboxed tuples is simply to return multiple values, binding those multiple results with a case expression, thus:

  f x y = (# x+1, y-1 #)
  g x = case f x x of { (# a, b #) -> a + b }

You can have an unboxed tuple in a pattern binding, thus

  f x = let (# p,q #) = h x in ..body..

If the types of p and q are not unboxed, the resulting binding is lazy like any other Haskell pattern binding. The above example desugars like this:

  f x = let t = case h x o f{ (# p,q #) -> (p,q)
            p = fst t
            q = snd t
        in ..body..

Indeed, the bindings can even be recursive.