7.15. Parallel Haskell

There are two implementations of Parallel Haskell: SMP paralellism which is built-in to GHC (see Section 4.12, “Using SMP parallelism”) and supports running Parallel Haskell programs on a single multiprocessor machine, and Glasgow Parallel Haskell (GPH) which supports running Parallel Haskell programs on both clusters of machines or single multiprocessors. GPH is developed and distributed separately from GHC (see The GPH Page).

Ordinary single-threaded Haskell programs will not benefit from enabling SMP parallelism alone. You must expose parallelism to the compiler in one of the following two ways.

7.15.1. Running Concurrent Haskell programs in parallel

The first possibility is to use concurrent threads to structure your program, and make sure that you spread computation amongst the threads. The runtime will schedule the running Haskell threads among the available OS threads, running as many in parallel as you specified with the -N RTS option.

7.15.2. Annotating pure code for parallelism

The simplest mechanism for extracting parallelism from pure code is to use the par combinator, which is closely related to (and often used with) seq. Both of these are available from Control.Parallel:

infixr 0 `par`
infixr 1 `seq`

par :: a -> b -> b
seq :: a -> b -> b

The expression (x `par` y) sparks the evaluation of x (to weak head normal form) and returns y. Sparks are queued for execution in FIFO order, but are not executed immediately. If the runtime detects that there is an idle CPU, then it may convert a spark into a real thread, and run the new thread on the idle CPU. In this way the available parallelism is spread amongst the real CPUs.

For example, consider the following parallel version of our old nemesis, nfib:

import Control.Parallel

nfib :: Int -> Int
nfib n | n <= 1 = 1
       | otherwise = par n1 (seq n2 (n1 + n2 + 1))
                     where n1 = nfib (n-1)
                           n2 = nfib (n-2)

For values of n greater than 1, we use par to spark a thread to evaluate nfib (n-1), and then we use seq to force the parent thread to evaluate nfib (n-2) before going on to add together these two subexpressions. In this divide-and-conquer approach, we only spark a new thread for one branch of the computation (leaving the parent to evaluate the other branch). Also, we must use seq to ensure that the parent will evaluate n2 before n1 in the expression (n1 + n2 + 1). It is not sufficient to reorder the expression as (n2 + n1 + 1), because the compiler may not generate code to evaluate the addends from left to right.

When using par, the general rule of thumb is that the sparked computation should be required at a later time, but not too soon. Also, the sparked computation should not be too small, otherwise the cost of forking it in parallel will be too large relative to the amount of parallelism gained. Getting these factors right is tricky in practice.

More sophisticated combinators for expressing parallelism are available from the Control.Parallel.Strategies module. This module builds functionality around par, expressing more elaborate patterns of parallel computation, such as parallel map.