3.11. Using Parallel Haskell

[You won't be able to execute parallel Haskell programs unless PVM3 (Parallel Virtual Machine, version 3) is installed at your site.]

To compile a Haskell program for parallel execution under PVM, use the -parallel option, both when compiling and linking. You will probably want to import Parallel into your Haskell modules.

To run your parallel program, once PVM is going, just invoke it “as normal”. The main extra RTS option is -N<n>, to say how many PVM “processors” your program to run on. (For more details of all relevant RTS options, please see Section 3.11.4.)

In truth, running Parallel Haskell programs and getting information out of them (e.g., parallelism profiles) is a battle with the vagaries of PVM, detailed in the following sections.

3.11.1. Dummy's guide to using PVM

Before you can run a parallel program under PVM, you must set the required environment variables (PVM's idea, not ours); something like, probably in your .cshrc or equivalent:
setenv PVM_ROOT /wherever/you/put/it
setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
setenv PVM_DPATH $PVM_ROOT/lib/pvmd

Creating and/or controlling your “parallel machine” is a purely-PVM business; nothing specific to Parallel Haskell.

You use the pvm command to start PVM on your machine. You can then do various things to control/monitor your “parallel machine;” the most useful being:

Control-Dexit pvm, leaving it running
haltkill off this “parallel machine” & exit
add <host>add <host> as a processor
delete <host>delete <host>
resetkill what's going, but leave PVM up
conflist the current configuration
psreport processes' status
pstat <pid>status of a particular process

The PVM documentation can tell you much, much more about pvm!

3.11.2. Parallelism profiles

With Parallel Haskell programs, we usually don't care about the results—only with “how parallel” it was! We want pretty pictures.

Parallelism profiles (à la hbcpp) can be generated with the -q RTS option. The per-processor profiling info is dumped into files named <full-path><program>.gr. These are then munged into a PostScript picture, which you can then display. For example, to run your program a.out on 8 processors, then view the parallelism profile, do:

% ./a.out +RTS -N8 -q
% grs2gr *.???.gr > temp.gr     # combine the 8 .gr files into one
% gr2ps -O temp.gr              # cvt to .ps; output in temp.ps
% ghostview -seascape temp.ps   # look at it!

The scripts for processing the parallelism profiles are distributed in ghc/utils/parallel/.

3.11.3. Other useful info about running parallel programs

The “garbage-collection statistics” RTS options can be useful for seeing what parallel programs are doing. If you do either +RTS -Sstderr or +RTS -sstderr, then you'll get mutator, garbage-collection, etc., times on standard error. The standard error of all PE's other than the `main thread' appears in /tmp/pvml.nnn, courtesy of PVM.

Whether doing +RTS -Sstderr or not, a handy way to watch what's happening overall is: tail -f /tmp/pvml.nnn.

3.11.4. RTS options for Concurrent/Parallel Haskell

Besides the usual runtime system (RTS) options (Section 3.12), there are a few options particularly for concurrent/parallel execution.

-N<N>:

(PARALLEL ONLY) Use <N> PVM processors to run this program; the default is 2.

-C[<us>]:

Sets the context switch interval to <us> microseconds. A context switch will occur at the next heap allocation after the timer expires. With -C0 or -C, context switches will occur as often as possible (at every heap allocation). By default, context switches occur every 10 milliseconds. Note that many interval timers are only capable of 10 millisecond granularity, so the default setting may be the finest granularity possible, short of a context switch at every heap allocation.

[NOTE: this option currently has no effect (version 4.00). Context switches happen when the current heap block is full, i.e. every 4k of allocation].

-q[v]:

(PARALLEL ONLY) Produce a quasi-parallel profile of thread activity, in the file <program>.qp. In the style of hbcpp, this profile records the movement of threads between the green (runnable) and red (blocked) queues. If you specify the verbose suboption (-qv), the green queue is split into green (for the currently running thread only) and amber (for other runnable threads). We do not recommend that you use the verbose suboption if you are planning to use the hbcpp profiling tools or if you are context switching at every heap check (with -C).

-t<num>:

(PARALLEL ONLY) Limit the number of concurrent threads per processor to <num>. The default is 32. Each thread requires slightly over 1K words in the heap for thread state and stack objects. (For 32-bit machines, this translates to 4K bytes, and for 64-bit machines, 8K bytes.)

-d:

(PARALLEL ONLY) Turn on debugging. It pops up one xterm (or GDB, or something…) per PVM processor. We use the standard debugger script that comes with PVM3, but we sometimes meddle with the debugger2 script. We include ours in the GHC distribution, in ghc/utils/pvm/.

-e<num>:

(PARALLEL ONLY) Limit the number of pending sparks per processor to <num>. The default is 100. A larger number may be appropriate if your program generates large amounts of parallelism initially.

-Q<num>:

(PARALLEL ONLY) Set the size of packets transmitted between processors to <num>. The default is 1024 words. A larger number may be appropriate if your machine has a high communication cost relative to computation speed.