GHC is built on a raft of primitive data types and operations; "primitive" in the sense that they cannot be defined in Haskell itself. While you really can use this stuff to write fast code, we generally find it a lot less painful, and more satisfying in the long run, to use higher-level language features and libraries. With any luck, the code you write will be optimised to the efficient unboxed version in any case. And if it isn't, we'd like to know about it.
All these primitive data types and operations are exported by the
library GHC.Prim
, for which there is
detailed online documentation.
(This documentation is generated from the file compiler/prelude/primops.txt.pp
.)
If you want to mention any of the primitive data types or operations in your
program, you must first import GHC.Prim
to bring them
into scope. Many of them have names ending in "#", and to mention such
names you need the -XMagicHash
extension (Section 7.3.2, “The magic hash”).
The primops make extensive use of unboxed types and unboxed tuples, which we briefly summarise here.
Most types in GHC are boxed, which means
that values of that type are represented by a pointer to a heap
object. The representation of a Haskell Int
, for
example, is a two-word heap object. An unboxed
type, however, is represented by the value itself, no pointers or heap
allocation are involved.
Unboxed types correspond to the “raw machine” types you
would use in C: Int#
(long int),
Double#
(double), Addr#
(void *), etc. The primitive operations
(PrimOps) on these types are what you might expect; e.g.,
(+#)
is addition on
Int#
s, and is the machine-addition that we all
know and love—usually one instruction.
Primitive (unboxed) types cannot be defined in Haskell, and are
therefore built into the language and compiler. Primitive types are
always unlifted; that is, a value of a primitive type cannot be
bottom. We use the convention (but it is only a convention)
that primitive types, values, and
operations have a #
suffix (see Section 7.3.2, “The magic hash”).
For some primitive types we have special syntax for literals, also
described in the same section.
Primitive values are often represented by a simple bit-pattern, such
as Int#
, Float#
,
Double#
. But this is not necessarily the case:
a primitive value might be represented by a pointer to a
heap-allocated object. Examples include
Array#
, the type of primitive arrays. A
primitive array is heap-allocated because it is too big a value to fit
in a register, and would be too expensive to copy around; in a sense,
it is accidental that it is represented by a pointer. If a pointer
represents a primitive value, then it really does point to that value:
no unevaluated thunks, no indirections…nothing can be at the
other end of the pointer than the primitive value.
A numerically-intensive program using unboxed types can
go a lot faster than its “standard”
counterpart—we saw a threefold speedup on one example.
There are some restrictions on the use of primitive types:
The main restriction
is that you can't pass a primitive value to a polymorphic
function or store one in a polymorphic data type. This rules out
things like [Int#]
(i.e. lists of primitive
integers). The reason for this restriction is that polymorphic
arguments and constructor fields are assumed to be pointers: if an
unboxed integer is stored in one of these, the garbage collector would
attempt to follow it, leading to unpredictable space leaks. Or a
seq
operation on the polymorphic component may
attempt to dereference the pointer, with disastrous results. Even
worse, the unboxed value might be larger than a pointer
(Double#
for instance).
You cannot define a newtype whose representation type (the argument type of the data constructor) is an unboxed type. Thus, this is illegal:
newtype A = MkA Int#
You cannot bind a variable with an unboxed type in a top-level binding.
You cannot bind a variable with an unboxed type in a recursive binding.
You may bind unboxed variables in a (non-recursive, non-top-level) pattern binding, but you must make any such pattern-match strict. For example, rather than:
data Foo = Foo Int Int# f x = let (Foo a b, w) = ..rhs.. in ..body..
you must write:
data Foo = Foo Int Int# f x = let !(Foo a b, w) = ..rhs.. in ..body..
since b
has type Int#
.
Unboxed tuples aren't really exported by GHC.Exts
,
they're available by default with -fglasgow-exts
. An
unboxed tuple looks like this:
(# e_1, ..., e_n #)
where e_1..e_n
are expressions of any
type (primitive or non-primitive). The type of an unboxed tuple looks
the same.
Unboxed tuples are used for functions that need to return multiple
values, but they avoid the heap allocation normally associated with
using fully-fledged tuples. When an unboxed tuple is returned, the
components are put directly into registers or on the stack; the
unboxed tuple itself does not have a composite representation. Many
of the primitive operations listed in primops.txt.pp
return unboxed
tuples.
In particular, the IO
and ST
monads use unboxed
tuples to avoid unnecessary allocation during sequences of operations.
There are some pretty stringent restrictions on the use of unboxed tuples:
Values of unboxed tuple types are subject to the same restrictions as other unboxed types; i.e. they may not be stored in polymorphic data structures or passed to polymorphic functions.
No variable can have an unboxed tuple type, nor may a constructor or function argument have an unboxed tuple type. The following are all illegal:
data Foo = Foo (# Int, Int #) f :: (# Int, Int #) -> (# Int, Int #) f x = x g :: (# Int, Int #) -> Int g (# a,b #) = a h x = let y = (# x,x #) in ...
The typical use of unboxed tuples is simply to return multiple values,
binding those multiple results with a case
expression, thus:
f x y = (# x+1, y-1 #) g x = case f x x of { (# a, b #) -> a + b }
You can have an unboxed tuple in a pattern binding, thus
f x = let (# p,q #) = h x in ..body..
If the types of p
and q
are not unboxed,
the resulting binding is lazy like any other Haskell pattern binding. The
above example desugars like this:
f x = let t = case h x o f{ (# p,q #) -> (p,q) p = fst t q = snd t in ..body..
Indeed, the bindings can even be recursive.