Prev		Next

Chapter 6. GHC Language Features

Table of Contents
6.1. Unboxed types and primitive operations
6.2. Primitive state-transformer monad
6.3. Primitive arrays, mutable and otherwise
6.4. Pattern guards
6.5. The foreign interface
6.6. Multi-parameter type classes
6.7. Explicit universal quantification
6.8. Existentially quantified data constructors
6.9. Assertions
6.10. Scoped Type Variables
6.11. Pragmas
6.12. Rewrite rules
6.13. Concurrent and Parallel Haskell
6.14. Haskell 98 vs. Glasgow Haskell: language non-compliance

As with all known Haskell systems, GHC implements some extensions to the language. To use them, you'll need to give a -fglasgow-exts option.

Virtually all of the Glasgow extensions serve to give you access to the underlying facilities with which we implement Haskell. Thus, you can get at the Raw Iron, if you are willing to write some non-standard code at a more primitive level. You need not be “stuck” on performance because of the implementation costs of Haskell's “high-level” features—you can always code “under” them. In an extreme case, you can write all your time-critical code in C, and then just glue it together with Haskell!

Executive summary of our extensions:

Unboxed types and primitive operations:: You can get right down to the raw machine types and operations; included in this are “primitive arrays” (direct access to Big Wads of Bytes). Please see Section 6.1.1 and following.
Multi-parameter type classes:: GHC's type system supports extended type classes with multiple parameters. Please see Section 6.6.
Local universal quantification:: GHC's type system supports explicit universal quantification in constructor fields and function arguments. This is useful for things like defining runST from the state-thread world. See Section 6.7.
Extistentially quantification in data types:: Some or all of the type variables in a datatype declaration may be existentially quantified. More details in Section 6.8.
Scoped type variables:: Scoped type variables enable the programmer to supply type signatures for some nested declarations, where this would not be legal in Haskell 98. Details in Section 6.10.
Pattern guards: Instead of being a boolean expression, a guard is a list of qualifiers, exactly as in a list comprehension. See Section 6.4.
Foreign calling:: Just what it sounds like. We provide lots of rope that you can dangle around your neck. Please see Chapter 7.
Pragmas: Pragmas are special instructions to the compiler placed in the source file. The pragmas GHC supports are described in Section 6.11.
Rewrite rules:: The programmer can specify rewrite rules as part of the source program (in a pragma). GHC applies these rewrite rules wherever it can. Details in Section 6.12.

Before you get too carried away working at the lowest level (e.g., sloshing MutableByteArray#s around your program), you may wish to check if there are libraries that provide a “Haskellised veneer” over the features you want. See Haskell Libraries.

6.1. Unboxed types and primitive operations

This module defines all the types which are primitive in Glasgow Haskell, and the operations provided for them.

6.1.1. Unboxed types

Most types in GHC are boxed, which means that values of that type are represented by a pointer to a heap object. The representation of a Haskell Int, for example, is a two-word heap object. An unboxed type, however, is represented by the value itself, no pointers or heap allocation are involved.

Unboxed types correspond to the “raw machine” types you would use in C: Int# (long int), Double# (double), Addr# (void *), etc. The primitive operations (PrimOps) on these types are what you might expect; e.g., (+#) is addition on Int#s, and is the machine-addition that we all know and love—usually one instruction.

Primitive (unboxed) types cannot be defined in Haskell, and are therefore built into the language and compiler. Primitive types are always unlifted; that is, a value of a primitive type cannot be bottom. We use the convention that primitive types, values, and operations have a # suffix.

Primitive values are often represented by a simple bit-pattern, such as Int#, Float#, Double#. But this is not necessarily the case: a primitive value might be represented by a pointer to a heap-allocated object. Examples include Array#, the type of primitive arrays. A primitive array is heap-allocated because it is too big a value to fit in a register, and would be too expensive to copy around; in a sense, it is accidental that it is represented by a pointer. If a pointer represents a primitive value, then it really does point to that value: no unevaluated thunks, no indirections…nothing can be at the other end of the pointer than the primitive value.

There are some restrictions on the use of primitive types, the main one being that you can't pass a primitive value to a polymorphic function or store one in a polymorphic data type. This rules out things like [Int#] (i.e. lists of primitive integers). The reason for this restriction is that polymorphic arguments and constructor fields are assumed to be pointers: if an unboxed integer is stored in one of these, the garbage collector would attempt to follow it, leading to unpredictable space leaks. Or a seq operation on the polymorphic component may attempt to dereference the pointer, with disastrous results. Even worse, the unboxed value might be larger than a pointer (Double# for instance).

Nevertheless, A numerically-intensive program using unboxed types can go a lot faster than its “standard” counterpart—we saw a threefold speedup on one example.

6.1.2. Unboxed Tuples

Unboxed tuples aren't really exported by PrelGHC, they're available by default with -fglasgow-exts. An unboxed tuple looks like this:

(# e_1, ..., e_n #)

where e_1..e_n are expressions of any type (primitive or non-primitive). The type of an unboxed tuple looks the same.

Unboxed tuples are used for functions that need to return multiple values, but they avoid the heap allocation normally associated with using fully-fledged tuples. When an unboxed tuple is returned, the components are put directly into registers or on the stack; the unboxed tuple itself does not have a composite representation. Many of the primitive operations listed in this section return unboxed tuples.

There are some pretty stringent restrictions on the use of unboxed tuples:

Unboxed tuple types are subject to the same restrictions as other unboxed types; i.e. they may not be stored in polymorphic data structures or passed to polymorphic functions.
Unboxed tuples may only be constructed as the direct result of a function, and may only be deconstructed with a case expression. eg. the following are valid:
f x y = (# x+1, y-1 #) g x = case f x x of { (# a, b #) -> a + b }
but the following are invalid:
f x y = g (# x, y #) g (# x, y #) = x + y
No variable can have an unboxed tuple type. This is illegal:
f :: (# Int, Int #) -> (# Int, Int #) f x = x
because x has an unboxed tuple type.

Note: we may relax some of these restrictions in the future.

The IO and ST monads use unboxed tuples to avoid unnecessary allocation during sequences of operations.

6.1.3. Character and numeric types

There are the following obvious primitive types:

type Char# type Int# type Word# type Addr# type Float# type Double# type Int64# type Word64#

If you really want to know their exact equivalents in C, see ghc/includes/StgTypes.h in the GHC source tree.

Literals for these types may be written as follows:

1# an Int# 1.2# a Float# 1.34## a Double# 'a'# a Char#; for weird characters, use e.g. '\o<octal>'# "a"# an Addr# (a `char *'); only characters '\0'..'\255' allowed

6.1.4. Comparison operations

{>,>=,==,/=,<,<=}# :: Int# -> Int# -> Bool {gt,ge,eq,ne,lt,le}Char# :: Char# -> Char# -> Bool -- ditto for Word# and Addr#

6.1.5. Primitive-character operations

ord# :: Char# -> Int# chr# :: Int# -> Char#

6.1.6. Primitive-`Int` operations

{+,-,*,quotInt,remInt,gcdInt}# :: Int# -> Int# -> Int# negateInt# :: Int# -> Int# iShiftL#, iShiftRA#, iShiftRL# :: Int# -> Int# -> Int# -- shift left, right arithmetic, right logical addIntC#, subIntC#, mulIntC# :: Int# -> Int# -> (# Int#, Int# #) -- add, subtract, multiply with carry

Note: No error/overflow checking!

6.1.7. Primitive-`Double` and `Float` operations

{+,-,*,/}## :: Double# -> Double# -> Double# {<,<=,==,/=,>=,>}## :: Double# -> Double# -> Bool negateDouble# :: Double# -> Double# double2Int# :: Double# -> Int# int2Double# :: Int# -> Double# {plus,minux,times,divide}Float# :: Float# -> Float# -> Float# {gt,ge,eq,ne,lt,le}Float# :: Float# -> Float# -> Bool negateFloat# :: Float# -> Float# float2Int# :: Float# -> Int# int2Float# :: Int# -> Float#

And a full complement of trigonometric functions:

expDouble# :: Double# -> Double# logDouble# :: Double# -> Double# sqrtDouble# :: Double# -> Double# sinDouble# :: Double# -> Double# cosDouble# :: Double# -> Double# tanDouble# :: Double# -> Double# asinDouble# :: Double# -> Double# acosDouble# :: Double# -> Double# atanDouble# :: Double# -> Double# sinhDouble# :: Double# -> Double# coshDouble# :: Double# -> Double# tanhDouble# :: Double# -> Double# powerDouble# :: Double# -> Double# -> Double#

similarly for Float#.

There are two coercion functions for Float#/Double#:

float2Double# :: Float# -> Double# double2Float# :: Double# -> Float#

The primitive version of decodeDouble (encodeDouble is implemented as an external C function):

decodeDouble# :: Double# -> PrelNum.ReturnIntAndGMP

(And the same for Float#s.)

6.1.8. Operations on/for `Integers` (interface to GMP)

We implement Integers (arbitrary-precision integers) using the GNU multiple-precision (GMP) package (version 2.0.2).

The data type for Integer is either a small integer, represented by an Int, or a large integer represented using the pieces required by GMP's MP_INT in gmp.h (see gmp.info in ghc/includes/runtime/gmp). It comes out as:

data Integer = S# Int# -- small integers | J# Int# ByteArray# -- large integers
The primitive ops to support large Integers use the “pieces” of the representation, and are as follows:

negateInteger# :: Int# -> ByteArray# -> Integer {plus,minus,times}Integer#, gcdInteger#, quotInteger#, remInteger#, divExactInteger# :: Int# -> ByteArray# -> Int# -> ByteArray# -> (# Int#, ByteArray# #) cmpInteger# :: Int# -> ByteArray# -> Int# -> ByteArray# -> Int# -- -1 for <; 0 for ==; +1 for > cmpIntegerInt# :: Int# -> ByteArray# -> Int# -> Int# -- -1 for <; 0 for ==; +1 for > gcdIntegerInt# :: :: Int# -> ByteArray# -> Int# -> Int# divModInteger#, quotRemInteger# :: Int# -> ByteArray# -> Int# -> ByteArray# -> (# Int#, ByteArray#, Int#, ByteArray# #) integer2Int# :: Int# -> ByteArray# -> Int# int2Integer# :: Int# -> Integer -- NB: no error-checking on these two! word2Integer# :: Word# -> Integer addr2Integer# :: Addr# -> Integer -- the Addr# is taken to be a `char *' string -- to be converted into an Integer.

6.1.9. Words and addresses

A Word# is used for bit-twiddling operations. It is the same size as an Int#, but has no sign nor any arithmetic operations.
type Word# -- Same size/etc as Int# but *unsigned* type Addr# -- A pointer from outside the "Haskell world" (from C, probably); -- described under "arrays"

Word#s and Addr#s have the usual comparison operations. Other unboxed-Word ops (bit-twiddling and coercions):

{gt,ge,eq,ne,lt,le}Word# :: Word# -> Word# -> Bool and#, or#, xor# :: Word# -> Word# -> Word# -- standard bit ops. quotWord#, remWord# :: Word# -> Word# -> Word# -- word (i.e. unsigned) versions are different from int -- versions, so we have to provide these explicitly. not# :: Word# -> Word# shiftL#, shiftRL# :: Word# -> Int# -> Word# -- shift left, right logical int2Word# :: Int# -> Word# -- just a cast, really word2Int# :: Word# -> Int#

Unboxed-Addr ops (C casts, really):
{gt,ge,eq,ne,lt,le}Addr# :: Addr# -> Addr# -> Bool int2Addr# :: Int# -> Addr# addr2Int# :: Addr# -> Int# addr2Integer# :: Addr# -> (# Int#, ByteArray# #)

The casts between Int#, Word# and Addr# correspond to null operations at the machine level, but are required to keep the Haskell type checker happy.

Operations for indexing off of C pointers (Addr#s) to snatch values are listed under “arrays”.

6.1.10. Arrays

The type Array# elt is the type of primitive, unpointed arrays of values of type elt.

type Array# elt

Array# is more primitive than a Haskell array—indeed, the Haskell Array interface is implemented using Array#—in that an Array# is indexed only by Int#s, starting at zero. It is also more primitive by virtue of being unboxed. That doesn't mean that it isn't a heap-allocated object—of course, it is. Rather, being unboxed means that it is represented by a pointer to the array itself, and not to a thunk which will evaluate to the array (or to bottom). The components of an Array# are themselves boxed.

The type ByteArray# is similar to Array#, except that it contains just a string of (non-pointer) bytes.

type ByteArray#

Arrays of these types are useful when a Haskell program wishes to construct a value to pass to a C procedure. It is also possible to use them to build (say) arrays of unboxed characters for internal use in a Haskell program. Given these uses, ByteArray# is deliberately a bit vague about the type of its components. Operations are provided to extract values of type Char#, Int#, Float#, Double#, and Addr# from arbitrary offsets within a ByteArray#. (For type Foo#, the $i$th offset gets you the $i$th Foo#, not the Foo# at byte-position $i$. Mumble.) (If you want a Word#, grab an Int#, then coerce it.)

Lastly, we have static byte-arrays, of type Addr# [mentioned previously]. (Remember the duality between arrays and pointers in C.) Arrays of this types are represented by a pointer to an array in the world outside Haskell, so this pointer is not followed by the garbage collector. In other respects they are just like ByteArray#. They are only needed in order to pass values from C to Haskell.

6.1.11. Reading and writing

Primitive arrays are linear, and indexed starting at zero.

The size and indices of a ByteArray#, Addr#, and MutableByteArray# are all in bytes. It's up to the program to calculate the correct byte offset from the start of the array. This allows a ByteArray# to contain a mixture of values of different type, which is often needed when preparing data for and unpicking results from C. (Umm…not true of indices…WDP 95/09)

Should we provide some sizeOfDouble# constants?

Out-of-range errors on indexing should be caught by the code which uses the primitive operation; the primitive operations themselves do not check for out-of-range indexes. The intention is that the primitive ops compile to one machine instruction or thereabouts.

We use the terms “reading” and “writing” to refer to accessing mutable arrays (see Section 6.1.14), and “indexing” to refer to reading a value from an immutable array.

Immutable byte arrays are straightforward to index (all indices in bytes):
indexCharArray# :: ByteArray# -> Int# -> Char# indexIntArray# :: ByteArray# -> Int# -> Int# indexAddrArray# :: ByteArray# -> Int# -> Addr# indexFloatArray# :: ByteArray# -> Int# -> Float# indexDoubleArray# :: ByteArray# -> Int# -> Double# indexCharOffAddr# :: Addr# -> Int# -> Char# indexIntOffAddr# :: Addr# -> Int# -> Int# indexFloatOffAddr# :: Addr# -> Int# -> Float# indexDoubleOffAddr# :: Addr# -> Int# -> Double# indexAddrOffAddr# :: Addr# -> Int# -> Addr# -- Get an Addr# from an Addr# offset

The last of these, indexAddrOffAddr#, extracts an Addr# using an offset from another Addr#, thereby providing the ability to follow a chain of C pointers.

Something a bit more interesting goes on when indexing arrays of boxed objects, because the result is simply the boxed object. So presumably it should be entered—we never usually return an unevaluated object! This is a pain: primitive ops aren't supposed to do complicated things like enter objects. The current solution is to return a single element unboxed tuple (see Section 6.1.2).

indexArray# :: Array# elt -> Int# -> (# elt #)

6.1.12. The state type

The primitive type State# represents the state of a state transformer. It is parameterised on the desired type of state, which serves to keep states from distinct threads distinct from one another. But the only effect of this parameterisation is in the type system: all values of type State# are represented in the same way. Indeed, they are all represented by nothing at all! The code generator “knows” to generate no code, and allocate no registers etc, for primitive states.

type State# s

The type GHC.RealWorld is truly opaque: there are no values defined of this type, and no operations over it. It is “primitive” in that sense - but it is not unlifted! Its only role in life is to be the type which distinguishes the IO state transformer.

data RealWorld

6.1.13. State of the world

A single, primitive, value of type State# RealWorld is provided.

realWorld# :: State# RealWorld

(Note: in the compiler, not a PrimOp; just a mucho magic Id. Exported from GHC, though).

6.1.14. Mutable arrays

Corresponding to Array# and ByteArray#, we have the types of mutable versions of each. In each case, the representation is a pointer to a suitable block of (mutable) heap-allocated storage.

type MutableArray# s elt type MutableByteArray# s

6.1.14.1. Allocation

Mutable arrays can be allocated. Only pointer-arrays are initialised; arrays of non-pointers are filled in by “user code” rather than by the array-allocation primitive. Reason: only the pointer case has to worry about GC striking with a partly-initialised array.

newArray# :: Int# -> elt -> State# s -> (# State# s, MutableArray# s elt #) newCharArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #) newIntArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #) newAddrArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #) newFloatArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #) newDoubleArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)

The size of a ByteArray# is given in bytes.

6.1.14.2. Reading and writing

readArray# :: MutableArray# s elt -> Int# -> State# s -> (# State# s, elt #) readCharArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Char# #) readIntArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Int# #) readAddrArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Addr# #) readFloatArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Float# #) readDoubleArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Double# #) writeArray# :: MutableArray# s elt -> Int# -> elt -> State# s -> State# s writeCharArray# :: MutableByteArray# s -> Int# -> Char# -> State# s -> State# s writeIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s writeAddrArray# :: MutableByteArray# s -> Int# -> Addr# -> State# s -> State# s writeFloatArray# :: MutableByteArray# s -> Int# -> Float# -> State# s -> State# s writeDoubleArray# :: MutableByteArray# s -> Int# -> Double# -> State# s -> State# s