GHC implements some major extensions to Haskell to support concurrent and parallel programming. Let us first establish terminology:
Parallelism means running a Haskell program on multiple processors, with the goal of improving performance. Ideally, this should be done invisibly, and with no semantic changes.
Concurrency means implementing a program by using multiple I/O-performing threads. While a concurrent Haskell program can run on a parallel machine, the primary goal of using concurrency is not to gain performance, but rather because that is the simplest and most direct way to write the program. Since the threads perform I/O, the semantics of the program is necessarily non-deterministic.
GHC supports both concurrency and parallelism.
Concurrent Haskell is the name given to GHC's concurrency extension. It is enabled by default, so no special flags are required. The Concurrent Haskell paper is still an excellent resource, as is Tackling the awkward squad.
To the programmer, Concurrent Haskell introduces no new language constructs; rather, it appears simply as a library, Control.Concurrent. The functions exported by this library include:
Forking and killing threads.
Synchronised mutable variables, called MVars.
Support for bound threads; see the paper Extending the FFI with concurrency.
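As a rough illustration of the first two items, the following sketch forks a thread, hands a result back through an MVar, and kills a second thread. The names worker and ticker are hypothetical and chosen only for this example; the library calls are from Control.Concurrent in the base package.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever)

-- Hypothetical worker: sums a list and passes the result back via an MVar.
-- ($!) forces the sum in the worker thread rather than in the reader.
worker :: MVar Int -> [Int] -> IO ()
worker result xs = putMVar result $! sum xs

main :: IO ()
main = do
  result <- newEmptyMVar
  -- Fork a thread that does the work.
  _ <- forkIO (worker result [1 .. 100])
  -- Fork a second thread that just sleeps in a loop (0.1s per iteration).
  ticker <- forkIO (forever (threadDelay 100000))
  -- takeMVar blocks until the worker has filled the MVar.
  total <- takeMVar result
  print total
  -- Stop the looping thread explicitly.
  killThread ticker
```

takeMVar blocks while the MVar is empty, so the MVar serves here both as a result slot and as the synchronisation point between the two threads.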
GHC now supports a new way to coordinate the activities of Concurrent Haskell threads, called Software Transactional Memory (STM). The STM papers are an excellent introduction to what STM is, and how to use it.
The main library you need to use STM is Control.Concurrent.STM. The main features supported are these:
Operations for composing transactions: retry and orElse.
All these features are described in the papers mentioned earlier.
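A minimal sketch of how these pieces fit together, assuming the stm package is installed. The Account type and the withdraw functions are hypothetical names invented for this example; atomically, retry, orElse and the TVar operations are exported by Control.Concurrent.STM.

```haskell
import Control.Concurrent.STM
  ( STM, TVar, atomically, newTVarIO, orElse
  , readTVar, readTVarIO, retry, writeTVar )

-- Hypothetical account: a transactional variable holding a balance.
type Account = TVar Int

-- Withdraw an amount, blocking (via retry) while funds are insufficient.
withdraw :: Account -> Int -> STM ()
withdraw acc amount = do
  balance <- readTVar acc
  if balance < amount
    then retry
    else writeTVar acc (balance - amount)

-- Try the first account; if that transaction would retry, try the second.
withdrawEither :: Account -> Account -> Int -> STM ()
withdrawEither a b amount = withdraw a amount `orElse` withdraw b amount

main :: IO ()
main = do
  a <- newTVarIO 10
  b <- newTVarIO 100
  -- The whole composed transaction runs as one atomic block.
  atomically (withdrawEither a b 50)
  balanceA <- readTVarIO a
  balanceB <- readTVarIO b
  print (balanceA, balanceB)  -- prints (10,50)
```

Because account a holds too little, the first alternative retries and orElse falls through to account b, leaving a untouched.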
GHC includes support for running Haskell programs in parallel
on symmetric, shared-memory multi-processor (SMP) machines.
By default GHC runs your program on one processor; if you
want it to run in parallel you must link your program with
-threaded, and run it with the RTS
-N option (see Section 5.12, “Using SMP parallelism”).
The runtime will
schedule the running Haskell threads among the available OS
threads, running as many in parallel as you specified with the
-N RTS option.
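One way to check that both steps took effect is to ask the runtime how it was configured. The sketch below is only an illustration: the file name Caps.hs is hypothetical, and the -rtsopts flag in the comment is an assumption about recent GHC versions, which may need it before they accept +RTS options on the command line.

```haskell
-- Build and run (file name is hypothetical; -rtsopts is assumed to be
-- needed on recent GHC versions to allow +RTS options at run time):
--
--   ghc -threaded -rtsopts Caps.hs
--   ./Caps +RTS -N2
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

main :: IO ()
main = do
  -- Number of Haskell threads that can run simultaneously (set by -N).
  caps <- getNumCapabilities
  -- True only when the program was linked with -threaded.
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  putStrLn ("capabilities: " ++ show caps)
```

Without -threaded the first line reports False, and without -N the capability count stays at one.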
GHC only supports parallelism on a shared-memory multiprocessor. Glasgow Parallel Haskell (GPH) supports running Parallel Haskell programs on both clusters of machines, and single multiprocessors. GPH is developed and distributed separately from GHC (see The GPH Page). However, the current version of GPH is based on a much older version of GHC (4.06).
Ordinary single-threaded Haskell programs will not benefit from
enabling SMP parallelism alone: you must expose parallelism to the
compiler.
One way to do so is forking threads using Concurrent Haskell (Section 8.17.1, “Concurrent Haskell”), but the simplest mechanism for extracting parallelism from pure code is
to use the
par combinator, which is closely related to (and often used
with) seq. Both of these are available from
Control.Parallel:
    infixr 0 `par`
    infixr 1 `seq`

    par :: a -> b -> b
    seq :: a -> b -> b
The expression
(x `par` y)
sparks the evaluation of
x
(to weak head normal form) and returns
y. Sparks are
queued for execution in FIFO order, but are not executed immediately. If
the runtime detects that there is an idle CPU, then it may convert a
spark into a real thread, and run the new thread on the idle CPU. In
this way the available parallelism is spread amongst the real
CPUs.
For example, consider the following parallel version of our old
nemesis, nfib:
    import Control.Parallel

    nfib :: Int -> Int
    nfib n | n <= 1 = 1
           | otherwise = par n1 (seq n2 (n1 + n2 + 1))
                         where n1 = nfib (n-1)
                               n2 = nfib (n-2)
For values of
n greater than 1, we use
par to spark a thread to evaluate
nfib (n-1),
and then we use
seq to force the
parent thread to evaluate
nfib (n-2) before going on
to add together these two subexpressions. In this divide-and-conquer
approach, we only spark a new thread for one branch of the computation
(leaving the parent to evaluate the other branch). Also, we must use
seq to ensure that the parent will evaluate
n2 before n1
in the expression
(n1 + n2 + 1). It is not sufficient
to reorder the expression as
(n2 + n1 + 1), because
the compiler may not generate code to evaluate the addends from left to
right.
When using
par, the general rule of thumb is that
the sparked computation should be required at a later time, but not too
soon. Also, the sparked computation should not be too small, otherwise
the cost of forking it in parallel will be too large relative to the
amount of parallelism gained. Getting these factors right is tricky in
practice.
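One common way to keep sparks from becoming too small is to stop sparking below a size cutoff and fall back to sequential code. The following is a sketch of that idea rather than a tuned implementation: the names nfibSeq and nfibPar and the threshold value are assumptions made for illustration. It also uses pseq, which recent versions of the parallel package export from Control.Parallel for ordering evaluation, in the place where the example above uses seq.

```haskell
import Control.Parallel (par, pseq)

-- Plain sequential version, used for small subproblems.
nfibSeq :: Int -> Int
nfibSeq n
  | n <= 1    = 1
  | otherwise = nfibSeq (n - 1) + nfibSeq (n - 2) + 1

-- Hypothetical cutoff: below this, sparking is assumed not to pay off.
-- A suitable value depends on the machine and should be measured.
threshold :: Int
threshold = 20

-- Parallel version: spark one branch, evaluate the other in the parent,
-- but only while the subproblem is larger than the threshold.
nfibPar :: Int -> Int
nfibPar n
  | n <= threshold = nfibSeq n
  | otherwise      = n1 `par` (n2 `pseq` (n1 + n2 + 1))
  where
    n1 = nfibPar (n - 1)
    n2 = nfibPar (n - 2)

main :: IO ()
main = print (nfibPar 30)
```

The result is the same as the sequential version for every input; only the amount of work per spark changes.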
More sophisticated combinators for expressing parallelism are
available from the
Control.Parallel.Strategies module.
This module builds functionality around
par,
expressing more elaborate patterns of parallel computation, such as
parallel map.
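As a small sketch of what such a combinator looks like in use, the following applies a function to each element of a list in parallel. It assumes a recent version of the parallel package, where parMap takes a strategy such as rseq (evaluate each result to weak head normal form); older versions of the module used different strategy names. The nfib definition is repeated only to keep the example self-contained.

```haskell
import Control.Parallel.Strategies (parMap, rseq)

-- Sequential nfib, used here as the work done for each list element.
nfib :: Int -> Int
nfib n
  | n <= 1    = 1
  | otherwise = nfib (n - 1) + nfib (n - 2) + 1

main :: IO ()
main = do
  -- Spark the evaluation of nfib for every element of the list;
  -- rseq evaluates each result to weak head normal form.
  let results = parMap rseq nfib [20 .. 27]
  print (sum results)
```

For results with more structure than an Int, a strategy that evaluates more deeply than weak head normal form would usually be needed for the sparks to do useful work.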