GHC is a command-line compiler: in order to compile a Haskell program,
GHC must be invoked on the source file(s) by typing a command to the
shell. The steps involved in compiling a program can be automated
using the make tool (this is especially useful if the program
consists of multiple source files which depend on each other). This
section describes how to use GHC from the command line.
An invocation of GHC takes the following form:
ghc [argument...]
Command-line arguments are either options or file names.
Command-line options begin with -. They may not be grouped: -vO is different from -v -O. Options need not precede filenames: e.g., ghc *.o -o foo. All options are processed and then applied to all files; you cannot, for example, invoke ghc -c -O1 Foo.hs -O2 Bar.hs to apply different optimisation levels to the files Foo.hs and Bar.hs. For conflicting options, e.g., -c -S, we reserve the right to do anything we want. (Usually, the last one applies.)
File names with ``meaningful'' suffixes (e.g., .lhs or .o) cause the ``right thing'' to happen to those files.
.lhs: A ``literate Haskell'' module.
.hs: A not-so-literate Haskell module.
.hi: A Haskell interface file, probably compiler-generated.
.hc: Intermediate C file produced by the Haskell compiler.
.c: A C file not produced by the Haskell compiler.
.s: An assembly-language source file, usually produced by the compiler.
.o: An object file, produced by an assembler.
Files with other suffixes (or without suffixes) are passed straight to the linker.
A good option to start with is the -help (or -?) option. GHC spews a long message to standard output and then exits.
The -v option makes GHC verbose: it reports its version number and shows (on stderr) exactly how it invokes each phase of the compilation system. Moreover, it passes the -v flag to most phases; each reports its version number (and possibly some other information).
Please, oh please, use the -v option when reporting bugs! Knowing that you ran the right bits in the right order is always the first thing we want to verify.
If you're just interested in the compiler version number, the --version option prints out a one-line string containing the requested info.
The basic task of the ghc driver is to run each input file through the right phases (parsing, linking, etc.).
The first phase to run is determined by the input-file suffix, and the last phase is determined by a flag. If no relevant flag is present, then go all the way through linking. This table summarises:
Phase of the
Thus, a common invocation would be: ghc -c Foo.hs
Note: What the Haskell compiler proper produces depends on whether a native-code generator is used (producing assembly language) or not (producing C).
The option -cpp must be given for the C pre-processor phase to be run; that is, the pre-processor will be run over your Haskell source file before continuing.
The option -E runs just the pre-processing passes of the compiler, outputting the result on stdout before stopping. If used in conjunction with -cpp, the output is the code blocks of the original (literate) source after having put it through the grinder that is the C pre-processor. Sans -cpp, the output is the de-litted version of the original source.
The option -optcpp-E runs just the pre-processing stage of the C-compiling phase, sending the result to stdout. (For debugging or obfuscation contests, usually.)
GHC's compiled output normally goes into a .hc, .o, etc., file, depending on the last-run compilation phase. The option -o foo re-directs the output of that last-run phase to file foo.
Note: this ``feature'' can be counterintuitive: ghc -C -o foo.o foo.hs will put the intermediate C code in the file foo.o, name notwithstanding!
EXOTICA: But the -o option isn't of much use if you have several input files... Non-interface output files are normally put in the same directory as their corresponding input file. You may specify that they be put in another directory using the -odir <dir> option (the ``Oh, dear'' option). For example:
% ghc -c parse/Foo.hs parse/Bar.hs gurgle/Bumble.hs -odir `arch`
The output files, Foo.o, Bar.o, and Bumble.o, would be put into a subdirectory named after the architecture of the executing machine (sun4, mips, etc). The directory must already exist; it won't be created.
Note that the -odir option does not affect where the interface files are put. In the above example, they would still be put in parse/Foo.hi, parse/Bar.hi, and gurgle/Bumble.hi.
MORE EXOTICA: The -osuf <suffix> option will change the .o file suffix for object files to whatever you specify. (We use this in compiling the prelude.)
Similarly, the -hisuf <suffix> option will change the .hi file suffix for non-system interface files (see Section Other options related to interface files).
The -hisuf/-osuf game is useful if you want to compile a program with both GHC and HBC (say) in the same directory. Let HBC use the standard .hi/.o suffixes; add -hisuf g_hi -osuf g_o to your make rule for GHC compiling...
FURTHER EXOTICA: If you are doing a normal .hs-to-.o compilation but would like to hang onto the intermediate .hc C file, just throw in a -keep-hc-file-too option. If you would like to look at the assembler output, toss in a -keep-s-file-too, too.
Sometimes, you may cause GHC to be rather chatty on standard error; with -dshow-rn-trace, for example. You can instruct GHC to append this output to a particular log file with a -odump <blah> option.
If you have trouble because of running out of space in /tmp (or wherever your installation thinks temporary files should go), you may use the -tmpdir <dir> option to specify an alternate directory. For example, -tmpdir . says to put temporary files in the current working directory.
Alternatively, use your TMPDIR environment variable. Set it to the name of the directory where temporary files should be put. GCC and other programs will honour the TMPDIR variable as well.
Even better idea: Set the TMPDIR variable when building GHC, and never worry about TMPDIR again. (See the build documentation.)
GHC has a number of options that select which types of non-fatal error messages, otherwise known as warnings, can be generated during compilation. By default, you get a standard set of warnings which are generally likely to indicate bugs in your program. These are: -fwarn-overlapping-patterns, -fwarn-duplicate-exports, and -fwarn-missing-methods. The following flags are simple ways to select standard ``packages'' of warnings:
-Wnot: Turns off all warnings, including the standard ones.
-w: Synonym for -Wnot.
-W: Provides the standard warnings plus -fwarn-incomplete-patterns, -fwarn-unused-imports and -fwarn-unused-binds.
-Wall: Turns on all warning options.
The full set of warning options is described below. To turn off any warning, simply give the corresponding -fno-warn-... option on the command line.
-fwarn-name-shadowing: This option causes a warning to be emitted whenever an inner-scope value has the same name as an outer-scope value, i.e. the inner value shadows the outer one. This can catch typographical errors that turn into hard-to-find bugs, e.g., in the inadvertent cyclic definition let x = ... x ... in. Consequently, with this option on, deliberately cyclic (recursive) definitions that reuse an outer-scope name are flagged as well.
-fwarn-hi-shadowing: Warns you about shadowing of interface files along the supplied import path. For instance, assuming you invoke ghc with the import path -iutils:src and Utils.hi exists in both the utils and src directories, -fwarn-hi-shadowing will warn you that utils/Utils.hi shadows src/Utils.hi.
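-fwarn-overlapping-patterns: A set of patterns can go wrong in two ways: it can be incomplete (i.e., you're only matching on a subset of an algebraic data type's constructors), or it can be overlapping. By default, the compiler warns you about overlapping patterns; incomplete patterns are reported only when -fwarn-incomplete-patterns is enabled. Both problems appear in the following example: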
f :: String -> Int
f [] = 0
f (_:xs) = 1
f "2" = 2
g [] = 2
where the last pattern match in f won't ever be reached, as the second pattern overlaps it. More often than not, redundant patterns are a programmer mistake, so this option is enabled by default.
-fwarn-incomplete-patterns: Similarly for incomplete patterns, the function g will fail when applied to non-empty lists, so the compiler will emit a warning about this when this option is enabled.
-fwarn-missing-methods: This option is on by default, and warns you whenever an instance declaration is missing one or more methods, and the corresponding class declaration has no default declaration for them.
-fwarn-unused-imports: Report any objects that are explicitly imported but never used.
-fwarn-unused-binds: Report any function definitions (and local bindings) which are unused. For top-level functions, the warning is only given if the binding is not exported.
-fwarn-unused-matches: Report all unused variables which arise from pattern matches, including patterns consisting of a single variable. For instance f x y = [] would report x and y as unused. To eliminate the warning, all unused variables can be replaced with wildcards.
-fwarn-duplicate-exports: Have the compiler warn about duplicate entries in export lists. This is useful information if you maintain large export lists, and want to avoid the continued export of a definition after you've deleted (one) mention of it in the export list. This option is on by default.
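As a rough illustration of the pattern-related warnings above, the following sketch shows one way the f and g examples might be rewritten so that neither -fwarn-overlapping-patterns nor -fwarn-incomplete-patterns has anything to report. The fallback value in g and the function h are assumptions invented for this example; they are not part of the original text.

```haskell
-- Overlap removed: the specific literal is tested before the general cons case.
f :: String -> Int
f []    = 0
f "2"   = 2
f (_:_) = 1

-- Incompleteness removed: a catch-all clause covers non-empty lists.
-- The result 0 is an arbitrary, hypothetical choice.
g :: [a] -> Int
g [] = 2
g _  = 0

-- Unused arguments replaced by wildcards, which is the remedy suggested
-- for -fwarn-unused-matches on a definition such as `h x y = []`.
h :: a -> b -> [c]
h _ _ = []
```

With the clauses in this order, f "2" now reaches the literal case instead of being swallowed by the cons pattern.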
If you would like GHC to check that every top-level value has a type signature, use the -fsignatures-required option.
If you're feeling really paranoid, the -dcore-lint option is a good choice. It turns on heavyweight intra-pass sanity-checking within GHC. (It checks GHC's sanity, not yours.)
This section describes how GHC supports separate compilation.
When GHC compiles a source file F which contains a module A, say, it generates an object F.o, and a companion interface file A.hi.
NOTE: Having the name of the interface file follow the module name and not the file name means that working with tools such as make(1) becomes harder. make implicitly assumes that any output files produced by processing a translation unit will have file names that can be derived from the file name of the translation unit. For instance, pattern rules become unusable. For this reason, we recommend you stick to using the same file name as the module name.
The interface file for A contains information needed by the compiler when it compiles any module B that imports A, whether directly or indirectly. When compiling B, GHC will read A.hi to find the details that it needs to know about things defined in A.
Furthermore, when compiling module C which imports B, GHC may decide that it needs to know something about A; for example, B might export a function that involves a type defined in A. In this case, GHC will go and read A.hi even though C does not explicitly import A at all.
The interface file may contain all sorts of things that aren't explicitly exported from A by the programmer. For example, even though a data type is exported abstractly, A.hi will contain the full data type definition. For small function definitions, A.hi will contain the complete definition of the function. For bigger functions, A.hi will contain strictness information about the function. And so on. GHC puts much more information into .hi files when optimisation is turned on with the -O flag. Without -O it puts in just the minimum; with -O it lobs in a whole pile of stuff.
A.hi should really be thought of as a compiler-readable version of A.o. If you use a .hi file that wasn't generated by the same compilation run that generates the .o file, the compiler may assume all sorts of incorrect things about A, resulting in core dumps and other unpleasant happenings.
In your program, you import a module Foo by saying import Foo. GHC goes looking for an interface file, Foo.hi. It has a builtin list of directories (notably including .) where it looks.
-i<dirs>: This flag prepends a colon-separated list of dirs to the ``import directories'' list.
-i: Resets the ``import directories'' list back to nothing.
-fno-implicit-prelude: GHC normally imports Prelude.hi files for you. If you'd rather it didn't, then give it a -fno-implicit-prelude option. You are unlikely to get very far without a Prelude, but, hey, it's a free country.
-syslib <lib>: If you are using a system-supplied non-Prelude library (e.g., the POSIX library), just use a -syslib posix option (for example). The right interface files should then be available. Section The GHC Prelude and Libraries lists the libraries available by this mechanism.
-I<dir>: Once a Haskell module has been compiled to C (.hc file), you may wish to specify where GHC tells the C compiler to look for .h files. (Or, if you are using the -cpp option, where it tells the C pre-processor to look...) For this purpose, use a -I option in the usual C-ish way.
The interface output may be directed to another file bar2/Wurble.iface with the option -ohi bar2/Wurble.iface (not recommended).
To avoid generating an interface file at all, use a -nohi option.
The compiler does not overwrite an existing .hi interface file if the new one is byte-for-byte the same as the old one; this is friendly to make. When an interface does change, it is often enlightening to be informed. The -hi-diffs option will make ghc run diff on the old and new .hi files. You can also record the difference in the interface file itself; the -keep-hi-diffs option takes care of that.
The .hi files from GHC contain ``usage'' information which changes often and uninterestingly. If you really want to see these changes reported, you need to use the -hi-diffs-with-usages option.
Interface files are normally jammed full of compiler-produced pragmas, which record arities, strictness info, etc. If you think these pragmas are messing you up (or you are doing some kind of weird experiment), you can tell GHC to ignore them with the -fignore-interface-pragmas option.
When compiling without optimisations on, the compiler is extra-careful about not slurping in data constructors and instance declarations that it will not need. If you believe it is getting it wrong and not importing stuff which you think it should, this optimisation can be turned off with -fno-prune-tydecls and -fno-prune-instdecls.
See also Section Linking and consistency-checking, which describes how the linker finds standard Haskell libraries.
In the olden days, GHC compared the newly-generated .hi file with the previous version; if they were identical, it left the old one alone and didn't change its modification date. In consequence, importers of a module with an unchanged output .hi file were not recompiled.
This doesn't work any more. In our earlier example, module C does not import module A directly, yet changes to A.hi should force a recompilation of C. And some changes to A (changing the definition of a function that appears in an inlining of a function exported by B, say) may conceivably not change B.hi one jot. So now...
GHC keeps a version number on each interface file, and on each type signature within the interface file. It also keeps in every interface file a list of the version numbers of everything it used when it last compiled the file. If the source file's modification date is earlier than the .o file's date (i.e. the source hasn't changed since the file was last compiled), and you give GHC the -recomp flag, then GHC will be clever. It compares the version numbers on the things it needs this time with the version numbers on the things it needed last time (gleaned from the interface file of the module being compiled); if they are all the same it stops compiling rather early in the process saying ``Compilation IS NOT required''. What a beautiful sight!
It's still an experimental feature (that's why -recomp is off by default), so tell us if you think it doesn't work.
Patrick Sansom has a workshop paper about how all this is done. Ask him (email: sansom@dcs.gla.ac.uk) if you want a copy.
make
It is reasonably straightforward to set up a Makefile to use with GHC, assuming you name your source files the same as your modules. Thus:
HC      = ghc
HC_OPTS = -cpp $(EXTRA_HC_OPTS)
SRCS = Main.lhs Foo.lhs Bar.lhs
OBJS = Main.o Foo.o Bar.o
# Every suffix used in a suffix rule below must be listed here
.SUFFIXES : .o .hi .lhs .hs .hc .s
cool_pgm : $(OBJS)
	rm -f $@
	$(HC) -o $@ $(HC_OPTS) $(OBJS)
# Standard suffix rules
.o.hi:
	@:
.lhs.o:
	$(HC) -c $< $(HC_OPTS)
.hs.o:
	$(HC) -c $< $(HC_OPTS)
# Inter-module dependencies
Foo.o Foo.hc Foo.s : Baz.hi # Foo imports Baz
Main.o Main.hc Main.s : Foo.hi Baz.hi # Main imports Foo and Baz
(Sophisticated make variants may achieve some of the above more elegantly. Notably, gmake's pattern rules let you write the more comprehensible:
%.o : %.lhs
	$(HC) -c $< $(HC_OPTS)
What we've shown should work with any make.)
Note the cheesy .o.hi rule: It records the dependency of the interface (.hi) file on the source. The rule says a .hi file can be made from a .o file by doing... nothing. Which is true.
Note the inter-module dependencies at the end of the Makefile, which take the form
Foo.o Foo.hc Foo.s : Baz.hi # Foo imports Baz
They tell make that if any of Foo.o, Foo.hc or Foo.s have an earlier modification date than Baz.hi, then the out-of-date file must be brought up to date. To bring it up to date, make looks for a rule to do so; one of the preceding suffix rules does the job nicely.
Putting inter-dependencies of the form Foo.o : Bar.hi into your Makefile by hand is rather error-prone. ghc offers you a helping hand with its -M option. To automatically generate inter-dependencies, add the following to your Makefile:
depend :
	$(HC) -M $(HC_OPTS) $(SRCS)
Now, before you start compiling, and any time you change the imports in your program, do make depend before you do make cool_pgm. ghc -M will then append the needed dependencies to your Makefile.
The dependencies are actually generated by another utility, mkdependHS, which ghc -M just calls upon. mkdependHS is distributed with GHC and is documented in Section Makefile dependencies in Haskell: using mkdependHS.
A few caveats about this simple scheme:
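You may need to compile some modules explicitly to create their interface files in the first place (e.g., make Bar.o to create Bar.hi).
You may have to run make more than once for the dependencies to have full effect. However, a make run that does nothing does mean ``everything's up-to-date.''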
Currently, the compiler does not have proper support for dealing with mutually recursive modules:
module A where
import B
newtype A = A Int
f :: B -> A
f (B x) = A x
--------
module B where
import A
data B = B !Int
g :: A -> B
g (A x) = B x
When compiling either module A or B, the compiler will try (in vain) to look for the interface file of the other. So, to get mutually recursive modules off the ground, you need to hand write an interface file for A or B, so as to break the loop. These hand-written interface files are called hi-boot files, and are placed in a file called <module>.hi-boot. To import from an hi-boot file instead of the standard .hi file, use the following syntax in the importing module:
import {-# SOURCE #-} A
The hand-written interface need only contain the bare minimum of information needed to get the bootstrapping process started. For example, it doesn't need to contain declarations for everything that module A exports, only the things required by the module that imports A recursively.
For the example at hand, the boot interface file for A would look like the following:
_interface_ A 1
_exports_
A(A);
_declarations_
1 newtype A = A PrelBase.Int ;
The syntax is essentially the same as a normal .hi file (unfortunately), but you can usually tailor an existing .hi file to make a .hi-boot file.
Notice that we only put the declaration for the newtype A in the hi-boot file, not the signature for f, since f isn't used by B.
The number ``1'' at the beginning of a declaration is the version number of that declaration: for the purposes of .hi-boot files these can all be set to 1. All names must be fully qualified with the original module that an object comes from: for example, the reference to Int in the interface for A comes from PrelBase, which is a module internal to GHC's prelude. It's a pain, but that's the way it is.
Note: This is all a temporary solution; a version of the compiler that handles mutually recursive modules properly, without the manual construction of interface files, is in the works.
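When the two modules are small, one way to sidestep hi-boot files altogether is to put the mutually dependent declarations into a single module. The following is only a sketch of that alternative, not the mechanism described above; the roundTrip helper is hypothetical and exists purely to exercise f and g.

```haskell
-- Both types live in one module, so no interface file has to exist
-- before the other half is compiled.
newtype A = A Int
data B = B !Int

f :: B -> A
f (B x) = A x

g :: A -> B
g (A x) = B x

-- Hypothetical helper: convert a B to an A and back, returning the payload.
roundTrip :: Int -> Int
roundTrip n = case g (f (B n)) of
  B m -> m
```

The trade-off is that A and B can no longer be compiled or reused separately, which is exactly what the hi-boot approach preserves.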
The -O* options specify convenient ``packages'' of optimisation flags; the -f* options described later on specify individual optimisations to be turned on/off; the -m* options specify machine-specific optimisations to be turned on/off.
-O*: convenient ``packages'' of optimisation flags.
There are many options that affect the quality of code produced by GHC. Most people only have a general goal, something like ``Compile quickly'' or ``Make my program run like greased lightning.'' The following ``packages'' of optimisations (or lack thereof) should suffice.
Once you choose a -O* ``package,'' stick with it; don't chop and change. Modules' interfaces will change with a shift to a new -O* option, and you may have to recompile a large chunk of all importing modules before your program can again be run safely (see Section The recompilation checker).
No -O*-type option specified: This is taken to mean: ``Please compile quickly; I'm not over-bothered about compiled-code quality.'' So, for example: ghc -c Foo.hs
-O or -O1: Means: ``Generate good-quality code without taking too long about it.'' Thus, for example: ghc -c -O Main.lhs
-O2: Means: ``Apply every non-dangerous optimisation, even if it means significantly longer compile times.''
The avoided ``dangerous'' optimisations are those that can make runtime or space worse if you're unlucky. They are normally turned on or off individually.
At the moment, -O2 is unlikely to produce better code than -O.
-O2-for-C: Says to run GCC with -O2, which may be worth a few percent in execution speed. Don't forget -fvia-C, lest you use the native-code generator and bypass GCC altogether!
-Onot: This option will make GHC ``forget'' any -Oish options it has seen so far. Sometimes useful; for example: make all EXTRA_HC_OPTS=-Onot.
-Ofile <file>: For those who need absolute control over exactly what options are used (e.g., compiler writers, sometimes :-), a list of options can be put in a file and then slurped in with -Ofile. In that file, comments are of the #-to-end-of-line variety; blank lines and most whitespace are ignored.
Please ask if you are baffled and would like an example of -Ofile!
At Glasgow, we don't use a -O* flag for day-to-day work. We use -O to get respectable speed; e.g., when we want to measure something. When we want to go for broke, we tend to use -O -fvia-C -O2-for-C (and we go for lots of coffee breaks).
The easiest way to see what -O (etc) ``really mean'' is to run with -v, then stand back in amazement. Alternatively, just look at the HsC_minus<blah> lists in the ghc driver script.
-f*: platform-independent flags
Flags can be turned off individually. (NB: I hope you have a good reason for doing this....) To turn off the -ffoo flag, just use the -fno-foo flag. So, for example, you can say -O2 -fno-strictness, which will then drop out any running of the strictness analyser.
The options you are most likely to want to turn off are: -fno-strictness (strictness analyser [because it is sometimes slow]), -fno-specialise (automatic specialisation of overloaded functions [because it makes your code bigger]) [US spelling also accepted], and -fno-update-analyser (update analyser, because it sometimes takes a long time).
Should you wish to turn individual flags on, you are advised to use the -Ofile option, described above. Because the order in which optimisation passes are run is sometimes crucial, it's quite hard to do with command-line options.
Here are some ``dangerous'' optimisations you might want to try:
-fvia-C: Compile via C, and don't use the native-code generator. (There are many cases when GHC does this on its own.) You might pick up a little bit of speed by compiling via C. If you use _ccall_gc_s or _casm_s, you probably have to use -fvia-C.
The lower-case incantation, -fvia-c, is synonymous.
-funfolding-creation-threshold<n>: (Default: 30) By raising or lowering this number, you can raise or lower the amount of pragmatic junk that gets spewed into interface files. (An unfolding has a ``size'' that reflects the cost in terms of ``code bloat'' of expanding that unfolding in another module. A bigger Core expression would be assigned a bigger cost.)
-funfolding-use-threshold<n>: (Default: 3) By raising or lowering this number, you can make the compiler more or less keen to expand unfoldings.
OK, folks, these magic numbers `30' and `3' are mildly arbitrary; they are of the ``seem to be OK'' variety. The `3' is the more critical one; it's what determines how eager GHC is about expanding unfoldings.
When deciding what unfoldings from a module should be made available
-fsemi-tagging: This option (which does not work with the native-code generator) tells the compiler to add extra code to test for already-evaluated values. You win if you have lots of such values during a run of your program, you lose otherwise. (And you pay in extra code space.)
We have not played with -fsemi-tagging enough to recommend it. (For all we know, it doesn't even work anymore... Sigh.)
-m*: platform-specific flags
Some flags only make sense for particular target platforms.
-mv8: (SPARC machines) Means to pass the like-named option to GCC; it says to use the Version 8 SPARC instructions, notably integer multiply and divide. The similar -m* GCC options for SPARC also work, actually.
-mlong-calls: (HPPA machines) Means to pass the like-named option to GCC. Required for Very Big modules, maybe. (Probably means you're in trouble...)
-monly-[32]-regs: (iX86 machines) GHC tries to ``steal'' four registers from GCC, for performance reasons; it almost always works. However, when GCC is compiling some modules with four stolen registers, it will crash, probably saying:
Foo.hc:533: fixed or forbidden register was spilled.
This may be due to a compiler bug or to impossible asm
statements or clauses.
Just give some registers back with -monly-N-regs. Try `3' first, then `2'. If `2' doesn't work, please report the bug to us.
The C compiler (GCC) is run with -O turned on. (It has to be, actually.)
If you want to run GCC with -O2 (which may be worth a few percent in execution speed), you can give a -O2-for-C option.
The C pre-processor cpp is run over your Haskell code only if the -cpp option is given. Unless you are building a large system with significant doses of conditional compilation, you really shouldn't need it.
-D<foo>: Define macro <foo> in the usual way. NB: does not affect -D macros passed to the C compiler when compiling via C! For those, use the -optc-Dfoo hack...
-U<foo>: Undefine macro <foo> in the usual way.
-I<dir>: Specify a directory in which to look for #include files, in the usual C way.
The ghc driver pre-defines several macros:
__HASKELL1__: If defined to $n$, that means GHC supports the Haskell language defined in the Haskell report version $1.n$. Currently 4.
NB: This macro is set both when pre-processing Haskell source and when pre-processing generated C (.hc) files.
__GLASGOW_HASKELL__: For version $n$ of the GHC system, this will be #defined to $100 \times n$. So, for version 3.00, it is 300.
This macro is only set when pre-processing Haskell source. (Not when pre-processing generated C.)
With any luck, __GLASGOW_HASKELL__ will be undefined in all other implementations that support C-style pre-processing.
(For reference: the comparable symbols for other systems are: __HUGS__ for Hugs and __HBC__ for Chalmers.)
__CONCURRENT_HASKELL__: Only defined when -concurrent is in use! This symbol is defined when pre-processing Haskell (input) and pre-processing C (GHC output).
__PARALLEL_HASKELL__: Only defined when -parallel is in use! This symbol is defined when pre-processing Haskell (input) and pre-processing C (GHC output).
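A minimal sketch of conditional compilation on the __GLASGOW_HASKELL__ macro might look like the following. The names compilerInfo and atLeast300 are hypothetical. The LANGUAGE CPP pragma is an assumption that holds for later GHC releases; with the compiler described in this section, the equivalent is to pass -cpp on the command line.

```haskell
{-# LANGUAGE CPP #-}
-- Assumption: the pragma above is how newer GHCs enable the C pre-processor;
-- for the GHC documented here, drop it and compile with -cpp instead.

-- Describe the compiler at hand using the pre-defined macro.
compilerInfo :: String
#ifdef __GLASGOW_HASKELL__
compilerInfo = "GHC, version macro = " ++ show (__GLASGOW_HASKELL__ :: Int)
#else
compilerInfo = "not GHC (macro undefined)"
#endif

-- True when the compiler identifies itself as GHC version 3.00 or later.
atLeast300 :: Bool
#if defined(__GLASGOW_HASKELL__) && __GLASGOW_HASKELL__ >= 300
atLeast300 = True
#else
atLeast300 = False
#endif
```

Testing with #ifdef first keeps the source acceptable to other implementations, where the macro is expected to be undefined.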
Options other than the above can be forced through to the C pre-processor with the -opt flags (see Section Forcing options to a particular phase).
A small word of warning: -cpp is not friendly to ``string gaps''. In other words, strings such as the following:
strmod = "\
\ p \
\ "
don't work with -cpp; /usr/bin/cpp elides the backslash-newline pairs.
However, it appears that if you add a space at the end of the line, then cpp (at least GNU cpp and possibly other cpps) leaves the backslash-space pairs alone and the string gap works as expected.
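One way to avoid the problem entirely is to build the string by concatenation rather than with a gap. The sketch below compares the two spellings; it is meant to be compiled without -cpp, so that the gap version stays intact, and the names strmodGap and strmodConcat are hypothetical.

```haskell
-- The string gap from the text: fine without -cpp, fragile with it.
strmodGap :: String
strmodGap = "\
            \ p \
            \ "

-- The same string built by concatenation, leaving no backslash-newline
-- pairs for a C pre-processor to elide.
strmodConcat :: String
strmodConcat = " p " ++ " "
```

Both definitions denote the same four-character string, so the concatenated form can replace the gap wherever -cpp is in use.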
At the moment, quite a few common C-compiler options are passed on quietly to the C compilation of Haskell-compiler-generated C files. THIS MAY CHANGE. Meanwhile, options so sent are:
-ansi
-pedantic
-dgcc-lint
If you are compiling with lots of ccalls, etc., you may need to tell the C compiler about some #include files. There is no really pretty way to do this, but you can use this hack from the command line:
% ghc -c '-#include <X/Xlib.h>' Xstuff.lhs
GHC has to link your code with various libraries, possibly including: user-supplied, GHC-supplied, and system-supplied (-lm math library, for example).
-l<FOO>: Link in a library named lib<FOO>.a which resides somewhere on the library directories path.
Because of the sad state of most UNIX linkers, the order of such options does matter. Thus: ghc -lbar *.o is almost certainly wrong, because it will search libbar.a before it has collected unresolved symbols from the *.o files. ghc *.o -lbar is probably better.
The linker will of course be informed about some GHC-supplied libraries automatically; these are:
-l equivalent
-lHSrts,-lHSclib
-lHS
-lHS_cbits
-lgmp
-syslib <name>: If you are using a Haskell ``system library'' (e.g., the POSIX library), just use the -syslib posix option, and the correct code should be linked in.
-L<dir>: Where to find user-supplied libraries... Prepend the directory <dir> to the library directories path.
-static: Tell the linker to avoid shared libraries.
-no-link-chk and -link-chk: By default, immediately after linking an executable, GHC verifies that the pieces that went into it were compiled with compatible flags; a ``consistency check''. (This is to avoid mysterious failures caused by non-meshing of incompatibly-compiled programs; e.g., if one .o file was compiled for a parallel machine and the others weren't.) You may turn off this check with -no-link-chk. You can turn it (back) on with -link-chk (the default).
To compile a program as Concurrent Haskell, use the -concurrent option, both when compiling and linking. You will probably need the -fglasgow-exts option, too.
Three RTS options are provided for modifying the behaviour of the threaded runtime system. See the descriptions of -C[<us>], -q, and -t<num> in Section RTS options for Concurrent/Parallel Haskell.
The main thread in a Concurrent Haskell program is given its own private stack space, but all other threads are given stack space from the heap. Stack space for the main thread can be adjusted as usual with the -K RTS option, but if this private stack space is exhausted, the main thread will switch to stack segments in the heap, just like any other thread. Thus, problems which would normally result in stack overflow in ``sequential Haskell'' can be expected to result in heap overflow when using threads.
The concurrent runtime system uses black holes as synchronisation points for subexpressions which are shared among multiple threads. In ``sequential Haskell'', a black hole indicates a cyclic data dependency, which is a fatal error. However, in concurrent execution, a black hole may simply indicate that the desired expression is being evaluated by another thread. Therefore, when a thread encounters a black hole, it simply blocks and waits for the black hole to be updated. Cyclic data dependencies will result in deadlock, and the program will fail to terminate.
Because the concurrent runtime system uses black holes as synchronisation points, it is not possible to disable black-holing with the -N RTS option. Therefore, the use of signal handlers (including timeouts) with the concurrent runtime system can lead to problems if a thread attempts to enter a black hole that was created by an abandoned computation. The use of signal handlers in conjunction with threads is strongly discouraged.
[You won't be able to execute parallel Haskell programs unless PVM3 (Parallel Virtual Machine, version 3) is installed at your site.]
To compile a Haskell program for parallel execution under PVM, use the
-parallel
option,
both when compiling
and linking. You will probably want to import Parallel
into your Haskell modules.
To run your parallel program, once PVM is going, just invoke it ``as
normal''. The main extra RTS option is -N<n>
, to say how many
PVM ``processors'' your program is to run on. (For more details of
all relevant RTS options, please see Section
RTS options for Concurrent/Parallel Haskell.)
In truth, running Parallel Haskell programs and getting information out of them (e.g., parallelism profiles) is a battle with the vagaries of PVM, detailed in the following sections.
Before you can run a parallel program under PVM, you must set the
required environment variables (PVM's idea, not ours); something like,
probably in your .cshrc
or equivalent:
setenv PVM_ROOT /wherever/you/put/it
setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
setenv PVM_DPATH $PVM_ROOT/lib/pvmd
Creating and/or controlling your ``parallel machine'' is a purely-PVM business; nothing specific to Parallel Haskell.
You use the pvm
command to start PVM on your
machine. You can then do various things to control/monitor your
``parallel machine;'' the most useful being:
Control-D
:exit pvm, leaving it running
halt
:kill off this ``parallel machine'' and exit
add <host>
:add <host> as a processor
delete <host>
:delete <host>
reset
:kill what's going, but leave PVM up
conf
:list the current configuration
ps
:report processes' status
pstat <pid>
:status of a particular process
The PVM documentation can tell you much, much more about pvm
!
With Parallel Haskell programs, we usually don't care about the results---only about ``how parallel'' it was! We want pretty pictures.
Parallelism profiles (à la hbcpp
) can be generated with the
-q
RTS option. The
per-processor profiling info is dumped into files named
<full-path><program>.gr
. These are then munged into a PostScript picture,
which you can then display. For example, to run your program
a.out
on 8 processors, then view the parallelism profile, do:
% ./a.out +RTS -N8 -q
% grs2gr *.???.gr > temp.gr # combine the 8 .gr files into one
% gr2ps -O temp.gr # cvt to .ps; output in temp.ps
% ghostview -seascape temp.ps # look at it!
The scripts for processing the parallelism profiles are distributed
in ghc/utils/parallel/
.
The ``garbage-collection statistics'' RTS options can be useful for
seeing what parallel programs are doing. If you do either
+RTS -Sstderr
or +RTS -sstderr
, then
you'll get mutator, garbage-collection, etc., times on standard
error. The standard error of all PE's other than the `main thread'
appears in /tmp/pvml.nnn
, courtesy of PVM.
Whether doing +RTS -Sstderr
or not, a handy way to watch
what's happening overall is: tail -f /tmp/pvml.nnn
.
Besides the usual runtime system (RTS) options (Section Running a compiled program), there are a few options particularly for concurrent/parallel execution.
-N<N>
:
(PARALLEL ONLY) Use <N>
PVM processors to run this program;
the default is 2.
-C[<us>]
:
Sets the context switch interval to <us>
microseconds. A context
switch will occur at the next heap allocation after the timer expires.
With -C0
or -C
, context switches will occur as often as
possible (at every heap allocation). By default, context switches
occur every 10 milliseconds. Note that many interval timers are only
capable of 10 millisecond granularity, so the default setting may be
the finest granularity possible, short of a context switch at every
heap allocation.
-q[v]
:
Produce a quasi-parallel profile of thread activity, in the file
<program>.qp
. In the style of hbcpp
, this profile records
the movement of threads between the green (runnable) and red (blocked)
queues. If you specify the verbose suboption (-qv
), the green
queue is split into green (for the currently running thread only) and
amber (for other runnable threads). We do not recommend that you use
the verbose suboption if you are planning to use the hbcpp
profiling tools or if you are context switching at every heap check
(with -C
).
-t<num>
:
Limit the number of concurrent threads per processor to <num>
.
The default is 32. Each thread requires slightly over 1K words
in the heap for thread state and stack objects. (For 32-bit machines,
this translates to 4K bytes, and for 64-bit machines, 8K bytes.)
-d
:
(PARALLEL ONLY) Turn on debugging. It pops up one xterm (or GDB, or
something...) per PVM processor. We use the standard debugger
script that comes with PVM3, but we sometimes meddle with the
debugger2
script. We include ours in the GHC distribution,
in ghc/utils/pvm/
.
-e<num>
:
(PARALLEL ONLY) Limit the number of pending sparks per processor to
<num>
. The default is 100. A larger number may be appropriate if
your program generates large amounts of parallelism initially.
-Q<num>
:
(PARALLEL ONLY) Set the size of packets transmitted between processors
to <num>
. The default is 1024 words. A larger number may be
appropriate if your machine has a high communication cost relative to
computation speed.
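The per-thread figure quoted under -t<num> gives a quick estimate of how much heap the thread state can claim. The C fragment below is only an illustration and is not part of the RTS: the name thread_heap_bytes is made up here, and it assumes that ``1K words'' means exactly 1024 words (the text says ``slightly over''), so treat the result as a lower bound.

```c
/* Assumption: "1K words" is taken as exactly 1024 words per thread. */
#define WORDS_PER_THREAD 1024UL

/* Approximate heap bytes needed for `threads` threads on a machine whose
   word is `word_bytes` bytes wide (4 on 32-bit, 8 on 64-bit machines). */
unsigned long thread_heap_bytes(unsigned long threads, unsigned long word_bytes)
{
    return threads * WORDS_PER_THREAD * word_bytes;
}
```

With the default limit of 32 threads this comes to roughly 128K bytes on a 32-bit machine and 256K bytes on a 64-bit one.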
The ``Potential problems'' for Concurrent Haskell also apply for Parallel Haskell. Please see Section Potential problems with Concurrent Haskell.
To make an executable program, the GHC system compiles your code and then links it with a non-trivial runtime system (RTS), which handles storage management, profiling, etc.
You have some control over the behaviour of the RTS, by giving special command-line arguments to your program.
When your Haskell program starts up, its RTS extracts command-line
arguments bracketed between +RTS
and
-RTS
as its own. For example:
% ./a.out -f +RTS -pT -S -RTS -h foo bar
The RTS will snaffle -pT -S
for itself, and the remaining arguments
-f -h foo bar
will be handed to your program if/when it calls
System.getArgs
.
No -RTS
option is required if the runtime-system options extend to
the end of the command line, as in this example:
% hls -ltr /usr/etc +RTS -H5m
If you absolutely positively want all the rest of the options in a
command line to go to the program (and not the RTS), use a
--RTS
.
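The splitting rules above can be modelled in a few lines of C. This is a sketch of the behaviour as described in the text, not the RTS's own argument-handling code; the function name split_rts_args and its interface are hypothetical.

```c
#include <string.h>

/* Split argv as the text describes: arguments between +RTS and -RTS go to
   the runtime system, everything else goes to the program, and after
   --RTS everything goes to the program.
   `prog` and `rts` must each have room for argc pointers.
   Returns the number of program arguments; *n_rts receives the RTS count. */
int split_rts_args(int argc, char **argv, char **prog, char **rts, int *n_rts)
{
    int n_prog = 0, in_rts = 0, i;

    *n_rts = 0;
    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "--RTS") == 0) {
            /* the rest of the command line belongs to the program */
            for (i++; i < argc; i++)
                prog[n_prog++] = argv[i];
            break;
        } else if (strcmp(argv[i], "+RTS") == 0) {
            in_rts = 1;                 /* start of an RTS section */
        } else if (strcmp(argv[i], "-RTS") == 0) {
            in_rts = 0;                 /* end of an RTS section */
        } else if (in_rts) {
            rts[(*n_rts)++] = argv[i];
        } else {
            prog[n_prog++] = argv[i];
        }
    }
    return n_prog;
}
```

Applied to the first example above, this leaves ./a.out -f -h foo bar in prog and -pT -S in rts.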
As always, for RTS options that take <size>
s: If the last
character of size
is a K or k, multiply by 1000; if an M or m, by
1,000,000; if a G or g, by 1,000,000,000. (And any wraparound in the
counters is your fault!)
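To make the suffix rule concrete, here is a small C sketch of it. It is not the RTS's own parser; the name parse_rts_size is invented for illustration, and, in keeping with the warning above, it makes no attempt to detect wraparound.

```c
#include <stdlib.h>

/* Parse a <size> argument: decimal digits followed by an optional
   K/k, M/m or G/g suffix, using the decimal multipliers given in the text.
   Returns 0 and stores the value in *out on success, -1 on malformed
   input. No overflow check is made. */
int parse_rts_size(const char *s, unsigned long *out)
{
    char *end;
    unsigned long n;

    if (*s < '0' || *s > '9')
        return -1;                      /* must start with a digit */
    n = strtoul(s, &end, 10);
    switch (*end) {
    case 'k': case 'K': n *= 1000UL;       end++; break;
    case 'm': case 'M': n *= 1000000UL;    end++; break;
    case 'g': case 'G': n *= 1000000000UL; end++; break;
    default: break;
    }
    if (*end != '\0')
        return -1;                      /* trailing junk after the suffix */
    *out = n;
    return 0;
}
```

Under these rules -H5m asks for 5,000,000 bytes, not 5 times 2^20.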
Giving a +RTS -f
option will print out the
RTS options actually available in your program (which vary, depending
on how you compiled).
The most important RTS options are:
-H<size>
:
Set the heap size to <size>
bytes
[default: 4M].
-K<size>
:
Set the stack size to <size>
bytes [default: 64K].
For concurrent/parallel programs, it is the stack size of the main
thread; generally speaking, c/p stacks are in heap.
Note: if your program seems to be consuming infinite stack space, it is probably in a loop :-) Of course, if stacks are in the heap, make that infinite heap space...
-s<file>
or -S<file>
:
Write modest (-s
) or verbose (-S
) garbage-collector
statistics into file <file>
. The default <file>
is
<program>.stat
. The <file>
stderr
is treated
specially, with the output really being sent to stderr
.
The amount of heap allocation will typically increase as the total heap size is reduced. The reason for this odd behaviour is that updates of promoted-to-old-generation objects may require the extra allocation of a new-generation object to ensure that there are never any pointers from the old generation to the new generation.
For some garbage collectors (not including the default one, sadly),
you can convert the -S
output into a residency graph (in
PostScript), using the stat2resid
utility in
the GHC distribution (ghc/utils/stat2resid
).
-N
:Normally, the garbage collector black-holes closures which are being evaluated, as a space-saving measure. That's exactly what you want for ordinary Haskell programs.
When signal handlers are present, however, a computation may be
abandoned prematurely, leaving black holes behind. If the signal
handler shares one of these black-holed closures, disaster can result.
Use the -N
option to prevent black-holing by the garbage collector
if you suspect that your signal handlers may share any
subexpressions with the top-level computation. Expect your heap usage
to increase, since the lifetimes of some closures may be extended.
Besides the -H
(set heap size) and -S
/-s
(GC stats) RTS
options, there are several options to give you precise control over
garbage collection.
-M<n>
:Minimum percentage <n> of the heap which must be available for allocation. The default is 3%.
-A<size>
:
Sets a limit on the size of the allocation area for generational
garbage collection to <size>
bytes (-A
gives default of 64k). If
a negative size is given the size of the allocation is fixed to
-<size>
. For non-generational collectors, it fixes the minimum
heap which must be available after a collection, overriding the
-M<n>
RTS option.
-G<size>
:
Sets the percentage of free space to be promoted before a major
collection is invoked to <size>
%. If a
negative size is given, it fixes the size of the major-generation threshold
to -<size>
bytes.
-F2s
:Forces a program compiled for generational GC to use two-space copying collection. The two-space collector may outperform the generational collector for programs which have a very low heap residency. It can also be used to generate a statistics file from which a basic heap residency profile can be produced (see Section stat2resid - residency info from GC stats).
There will still be a small execution overhead imposed by the generational compilation as the test for old generation updates will still be executed (of course none will actually happen). This overhead is typically less than 1%.
-j<size>
:
Force a major garbage collection every <size>
bytes. (Normally
used because you're keen on getting major-GC stats, notably heap residency
info.)
The RTS options related to profiling are described in Section How to control your profiled program at runtime; and those for concurrent/parallel stuff, in Section RTS options for Concurrent/Parallel Haskell.
These RTS options might be used (a) to avoid a GHC bug, (b) to see ``what's really happening'', or (c) because you feel like it. Not recommended for everyday use!
-B
:Sound the bell at the start of each (major) garbage collection.
Oddly enough, people really do use this option! Our pal in Durham (England), Paul Callaghan, writes: ``Some people here use it for a variety of purposes---honestly!---e.g., confirmation that the code/machine is doing something, infinite loop detection, gauging cost of recently added code. Certain people can even tell what stage [the program] is in by the beep pattern. But the major use is for annoying others in the same office...''
-r<file>
:
Produce ``ticky-ticky'' statistics at the end of the program run.
The <file>
business works just like on the -S
RTS option (above).
``Ticky-ticky'' statistics are counts of various program actions
(updates, enters, etc.)
The program must have been compiled using
-fstg-reduction-counts
(a.k.a. ``ticky-ticky profiling''), and, for it to be really useful,
linked with suitable system libraries. Not a trivial undertaking:
consult the installation guide on how to set things up for
easy ``ticky-ticky'' profiling.
-T<num>
:
An RTS debugging flag; varying quantities of output depending on which bits
are set in <num>
.
-Z
:Turn off ``update-frame squeezing'' at garbage-collection time. (There's no particularly good reason to turn it off.)
GHC lets you exercise rudimentary control over the RTS settings for any given program, by compiling in a ``hook'' that is called by the run-time system. The RTS contains stub definitions for all these hooks, but by writing your own version and linking it on the GHC command line, you can override the defaults.
The function defaultsHook
lets you change various
RTS options. The commonest use for this is to give your program a
default heap and/or stack size that is greater than the default. For
example, to set -H8m -K1m
:
#include "rtsdefs.h"
void defaultsHook (void) {
RTSflags.GcFlags.stksSize = 1000002 / sizeof(W_);
RTSflags.GcFlags.heapSize = 8000002 / sizeof(W_);
}
Don't use powers of two for heap/stack sizes: these are more likely to
interact badly with direct-mapped caches. The full set of flags is
defined in ghc/includes/RtsFlags.lh
in the GHC source tree.
You can also change the messages printed when the runtime system ``blows up,'' e.g., on stack overflow. The hooks for these are as follows:
void ErrorHdrHook (FILE *)
:
What's printed out before the message from error
.
void OutOfHeapHook (unsigned long, unsigned long)
:The heap-overflow message.
void StackOverflowHook (long int)
:The stack-overflow message.
void MallocFailHook (long int)
:
The message printed if malloc
fails.
void PatErrorHdrHook (FILE *)
:The message printed if a pattern-match fails (the failures that were not handled by the Haskell programmer).
void PreTraceHook (FILE *)
:
What's printed out before a trace
message.
void PostTraceHook (FILE *)
:
What's printed out after a trace
message.
For example, here is the ``hooks'' code used by GHC itself:
#include <stdio.h>
#define W_ unsigned long int
#define I_ long int
void
ErrorHdrHook (FILE *where)
{
    fprintf(where, "\n"); /* no "Fail: " */
}
void
OutOfHeapHook (W_ request_size, W_ heap_size) /* both sizes in bytes */
{
    /* adjacent string literals are joined by the C compiler */
    fprintf(stderr, "GHC's heap exhausted;\nwhile trying to "
        "allocate %lu bytes in a %lu-byte heap;\nuse the `-H<size>' "
        "option to increase the total heap size.\n",
        request_size,
        heap_size);
}
void
StackOverflowHook (I_ stack_size) /* in bytes */
{
    fprintf(stderr, "GHC stack-space overflow: current size "
        "%ld bytes.\nUse the `-K<size>' option to increase it.\n",
        stack_size);
}
void
PatErrorHdrHook (FILE *where)
{
    fprintf(where, "\n*** Pattern-matching error within GHC!\n\n"
        "This is a compiler bug; please report it to "
        "glasgow-haskell-bugs@dcs.gla.ac.uk.\n\nFail: ");
}
void
PreTraceHook (FILE *where)
{
    fprintf(where, "\n"); /* not "Trace On" */
}
void
PostTraceHook (FILE *where)
{
    fprintf(where, "\n"); /* not "Trace Off" */
}
Glasgow Haskell comes with a time and space profiling system. Its purpose is to help you improve your understanding of your program's execution behaviour, so you can improve it.
Any comments, suggestions and/or improvements you have are welcome. Recommended ``profiling tricks'' would be especially cool!
The GHC approach to profiling is very simple: annotate the expressions you consider ``interesting'' with cost centre labels (strings); so, for example, you might have:
f x y
= let
output1 = _scc_ "Pass1" ( pass1 x )
output2 = _scc_ "Pass2" ( pass2 output1 y )
output3 = _scc_ "Pass3" ( pass3 (output2 `zip` [1 .. ]) )
in concat output3
The costs of evaluating the expressions bound to output1
,
output2
and output3
will be attributed to the ``cost
centres'' Pass1
, Pass2
and Pass3
, respectively.
The costs of evaluating other expressions, e.g., concat output3
,
will be inherited by the scope which referenced the function f
.
You can put in cost-centres via _scc_
constructs by hand, as in the
example above. Perfectly cool. That's probably what you
would do if your program divided into obvious ``passes'' or
``phases'', or whatever.
If your program is large or you have no clue what might be gobbling
all the time, you can get GHC to mark all functions with _scc_
constructs, automagically. Add an -auto
compilation flag to the
usual -prof
option.
Once you start homing in on the Guilty Suspects, you may well switch from automagically-inserted cost-centres to a few well-chosen ones of your own.
To use profiling, you must compile and run with special options. (We usually forget the ``run'' magic!---Do as we say, not as we do...) Details follow.
If you're serious about this profiling game, you should probably read one or more of the Sansom/Peyton Jones papers about the GHC profiling system. Just visit the Glasgow FP group web page...
To make use of the cost centre profiling system all modules must
be compiled and linked with the -prof
option.
Any _scc_
constructs you've put in your source will spring to life.
Without a -prof
option, your _scc_
s are ignored; so you can
compile _scc_
-laden code without changing it.
There are a few other profiling-related compilation options. Use them
in addition to -prof
. These do not have to be used
consistently for all modules in a program.
-auto
:
GHC will automatically add _scc_
constructs for
all top-level, exported functions.
-auto-all
:
All top-level functions, exported or not, will be automatically
_scc_
'd.
-caf-all
:The costs of all CAFs in a module are usually attributed to one ``big'' CAF cost-centre. With this option, all CAFs get their own cost-centre. An ``if all else fails'' option...
-ignore-scc
:
Ignore any _scc_
constructs,
so a module which already has _scc_
s can be
compiled for profiling with the annotations ignored.
-G<group>
:
Specifies the <group>
to be attached to all the cost-centres
declared in the module. If no group is specified it defaults to the
module name.
In addition to the -prof
option your system might be set up to enable
you to compile and link with the -prof-details
option instead. This enables additional detailed counts
to be reported with the -P
RTS option.
It isn't enough to compile your program for profiling with -prof
!
When you run your profiled program, you must tell the runtime system (RTS) what you want to profile (e.g., time and/or space), and how you wish the collected data to be reported. You also may wish to set the sampling interval used in time profiling.
Executive summary: ./a.out +RTS -pT
produces a time profile in
a.out.prof
; ./a.out +RTS -hC
produces space-profiling
info which can be mangled by hp2ps
and viewed with ghostview
(or equivalent).
Profiling runtime flags are passed to your program between the usual
+RTS
and -RTS
options.
-p<sort>
or -P<sort>
:
The -p?
option produces a standard time profile report.
It is written into the file <program>.prof
.
The -P?
option produces a more detailed report containing the
actual time and allocation data as well. (Not used much.)
The <sort>
indicates how the cost centres are to be sorted in the
report. Valid <sort>
options are:
T
:by time, largest first (the default);
A
:by bytes allocated, largest first;
C
:alphabetically by group, module and cost centre.
-i<secs>
:
Set the profiling (sampling) interval to <secs>
seconds (the default is 1 second). Fractions are allowed: for example
-i0.2
will get 5 samples per second.
-h<break-down>
:
Produce a detailed space profile of the heap occupied by live
closures. The profile is written to the file <program>.hp
from
which a PostScript graph can be produced using hp2ps
(see Section
hp2ps - heap profile to PostScript).
The heap space profile may be broken down by different criteria:
-hC
:cost centre which produced the closure (the default).
-hM
:cost centre module which produced the closure.
-hG
:cost centre group which produced the closure.
-hD
:closure description --- a string describing the closure.
-hY
:closure type --- a string describing the closure's type.
Heap (space) profiling uses hash tables. If these tables
should fill, the run will abort. The
-z<tbl><size>
option is used to
increase the size of the relevant hash table (C
, M
,
G
, D
or Y
, defined as for <break-down>
above). The
actual size used is the next largest power of 2.
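The rounding applied to a -z<tbl><size> request can be pictured with the following C sketch. It is not taken from the RTS; the name round_up_pow2 is hypothetical, and it assumes that a request which is already a power of 2 is kept unchanged, which the text does not actually say.

```c
#include <limits.h>

/* Smallest power of two that is >= n.
   Assumption: an exact power of two is returned unchanged.
   For requests above the largest representable power of two, the
   result saturates at that power of two. */
unsigned long round_up_pow2(unsigned long n)
{
    unsigned long p = 1;

    /* stop doubling before the shift would overflow */
    while (p < n && p <= ULONG_MAX / 2)
        p <<= 1;
    return p;
}
```

So a request for 1000 entries would give a table of 1024 under this reading.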
The heap profile can be restricted to particular closures of interest. The closures of interest can be selected by the attached cost centre (module:label, module and group) and by closure category (description, type, and kind), using the following options:
-c{<mod>:<lab>,<mod>:<lab>...}
:Selects individual cost centre(s).
-m{<mod>,<mod>...}
:Selects all cost centres from the module(s) specified.
-g{<grp>,<grp>...}
:Selects all cost centres from the group(s) specified.
-d{<des>,<des>...}
:Selects closures which have one of the specified descriptions.
-y{<typ>,<typ>...}
:Selects closures which have one of the specified type descriptions.
-k{<knd>,<knd>...}
:
Selects closures which are of one of the specified closure kinds.
Valid closure kinds are CON
(constructor), FN
(manifest
function), PAP
(partial application), BH
(black hole) and
THK
(thunk).
The space occupied by a closure will be reported in the heap profile if the closure satisfies the following logical expression:
([-c] or [-m] or [-g]) and ([-d] or [-y] or [-k])
where a particular option is true if the closure (or its attached cost centre) is selected by the option (or the option is not specified).
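The selection rule can be written down directly as a C predicate. This is a literal transcription of the expression above, not the RTS's implementation; the struct sel type and the function names are invented for the illustration.

```c
/* One selection option. `given` is 0 when the option was not specified on
   the command line; `matches` says whether the closure (or its attached
   cost centre) is selected by the option. */
struct sel { int given; int matches; };

/* An option is true if it selects the closure or was not specified. */
static int sel_true(struct sel s)
{
    return !s.given || s.matches;
}

/* ([-c] or [-m] or [-g]) and ([-d] or [-y] or [-k]) */
int closure_reported(struct sel c, struct sel m, struct sel g,
                     struct sel d, struct sel y, struct sel k)
{
    return (sel_true(c) || sel_true(m) || sel_true(g))
        && (sel_true(d) || sel_true(y) || sel_true(k));
}
```

With no selection options at all, every closure is reported. Note that, read literally, an unspecified option makes its whole group true, so this sketch may be more permissive than what the RTS really does.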
When you run your profiled program with the -p
RTS option
, you get the following information about your ``cost
centres'':
COST CENTRE
:The cost-centre's name.
MODULE
:The module associated with the cost-centre; important mostly if you have identically-named cost-centres in different modules.
scc
:How many times this cost-centre was entered; think
of it as ``I got to the _scc_
construct this many times...''
%time
:What part of the time was spent in this cost-centre (see also ``ticks,'' below).
%alloc
:What part of the memory allocation was done in this cost-centre (see also ``bytes,'' below).
inner
:How many times this cost-centre ``passed control'' to an inner
cost-centre; for example, scc=4
plus subscc=8
means
``This _scc_
was entered four times, but went out to
other _scc_s
eight times.''
cafs
:How many CAFs this cost centre evaluated.
dicts
:How many dictionaries this cost centre evaluated.
In addition you can use the -P
RTS option
to get the following additional information:
ticks
:The raw number of time ``ticks'' which were
attributed to this cost-centre; from this, we get the %time
figure mentioned above.
bytes
:Number of bytes allocated in the heap while in
this cost-centre; again, this is the raw number from which we
get the %alloc
figure mentioned above.
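The relation between the raw counts and the percentage columns is presumably a plain share of the total. The C sketch below assumes that %time and %alloc are computed against whole-program totals of ticks and bytes; that is an assumption made for illustration, and percent_of is a made-up name rather than anything in the profiling code.

```c
/* Share of `part` in `total`, as a percentage. Used here to model how a
   cost centre's ticks (or bytes) might become its %time (or %alloc).
   Returns 0.0 when the total is zero. */
double percent_of(unsigned long part, unsigned long total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)part / (double)total;
}
```

For instance, a cost centre credited with 25 of a run's 100 ticks would show 25.0 under %time.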
Finally if you built your program with -prof-details
the -P
RTS option will also
produce the following information:
closures
:How many heap objects were allocated; these objects may be of varying size. If you divide the number of bytes (mentioned above) by this number of ``closures'', then you will get the average object size. (Not too interesting, but still...)
thunks
:How many times we entered (evaluated) a thunk---an unevaluated object in the heap---while we were in this cost-centre.
funcs
:How many times we entered (evaluated) a function while we were in this cost-centre. (In Haskell, functions are first-class values and may be passed as arguments, returned as results, evaluated, and generally manipulated just like data values.)
PAPs
:
How many times we entered (evaluated) a partial application (PAP), i.e.,
a function applied to fewer arguments than it needs. For example, Int
addition applied to one argument would be a PAP. A PAP is really
just a particular form for a function.
Utility programs which produce graphical profiles.
hp2ps
--heap profile to PostScript
Usage:
hp2ps [flags] [<file>[.hp]]
The program hp2ps
converts a heap profile
as produced by the -h<break-down>
runtime option into a PostScript graph of the heap
profile. By convention, the file to be processed by hp2ps
has a
.hp
extension. The PostScript output is written to <file>.ps
. If
<file>
is omitted entirely, then the program behaves as a filter.
hp2ps
is distributed in ghc/utils/hp2ps
in a GHC source
distribution. It was originally developed by Dave Wakeling as part of
the HBC/LML heap profiler.
The flags are:
-d
In order to make graphs more readable, hp2ps
sorts the shaded
bands for each identifier. The default sort ordering is for the bands
with the largest area to be stacked on top of the smaller ones. The
-d
option causes rougher bands (those representing series of
values with the largest standard deviations) to be stacked on top of
smoother ones.
-b
Normally, hp2ps
puts the title of the graph in a small box at the
top of the page. However, if the JOB string is too long to fit in a
small box (more than 35 characters), then
hp2ps
will choose to use a big box instead. The -b
option forces hp2ps
to use a big box.
-e<float>[in|mm|pt]
Generate encapsulated PostScript suitable for inclusion in LaTeX
documents. Usually, the PostScript graph is drawn in landscape mode
in an area 9 inches wide by 6 inches high, and hp2ps
arranges
for this area to be approximately centred on a sheet of a4 paper.
This format is convenient for studying the graph in detail, but it is
unsuitable for inclusion in LaTeX documents. The -e
option
causes the graph to be drawn in portrait mode, with float specifying
the width in inches, millimetres or points (the default). The
resulting PostScript file conforms to the Encapsulated PostScript
(EPS) convention, and it can be included in a LaTeX document using
Rokicki's dvi-to-PostScript converter dvips
.
-g
Create output suitable for the gs
PostScript previewer (or
similar). In this case the graph is printed in portrait mode without
scaling. The output is unsuitable for a laser printer.
-l
Normally a profile is limited to 20 bands with additional identifiers
being grouped into an OTHER
band. The -l
flag removes this
20-band limit, producing as many bands as necessary. No key is
produced, as it won't fit! It is useful for creation time profiles
with many bands.
-m<int>
Normally a profile is limited to 20 bands with additional identifiers
being grouped into an OTHER
band. The -m
flag specifies an
alternative band limit (the maximum is 20).
-m0
requests the band limit to be removed. As many bands as
necessary are produced. However no key is produced as it won't fit! It
is useful for displaying creation time profiles with many bands.
-p
Use previous parameters. By default, the PostScript graph is
automatically scaled both horizontally and vertically so that it fills
the page. However, when preparing a series of graphs for use in a
presentation, it is often useful to draw a new graph using the same
scale, shading and ordering as a previous one. The -p
flag causes
the graph to be drawn using the parameters determined by a previous
run of hp2ps
on file
. These are extracted from
file.aux
.
-s
Use a small box for the title.
-t<float>
Normally trace elements which sum to a total of less than 1% of the
profile are removed from the profile. The -t
option allows this
percentage to be modified (maximum 5%).
-t0
requests no trace elements to be removed from the profile,
ensuring that all the data will be displayed.
-?
Print out usage information.
-c
Fill in the bands with colours rather than shades of grey. Some people find colour plots easier to read (especially when viewed on a non-monochrome medium ;-)
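The unit suffixes accepted by the -e flag above amount to a simple conversion to points. The following C sketch shows one way to do it; it is not hp2ps's own code, the name eps_width_points is hypothetical, and it assumes PostScript points (72 to the inch).

```c
#include <stdlib.h>
#include <string.h>

/* Convert the argument of -e (e.g. "5in", "120mm", "300pt" or "300")
   to a width in points. Points are the default unit.
   Assumption: 72 points per inch, 25.4 millimetres per inch.
   Returns a negative value on malformed input. */
double eps_width_points(const char *arg)
{
    char *end;
    double v = strtod(arg, &end);

    if (end == arg || !(v > 0.0))
        return -1.0;                    /* no number, or not positive */
    if (*end == '\0' || strcmp(end, "pt") == 0)
        return v;                       /* already in points */
    if (strcmp(end, "in") == 0)
        return v * 72.0;
    if (strcmp(end, "mm") == 0)
        return v * 72.0 / 25.4;
    return -1.0;                        /* unknown unit */
}
```

So -e5in and -e360pt would describe the same width under these assumptions.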
stat2resid
---residency info from GC stats
Usage:
stat2resid [<file>[.stat] [<outfile>]]
The program stat2resid
converts a detailed
garbage collection statistics file produced by the
-S
runtime option into a PostScript heap
residency graph. The garbage collection statistics file can be
produced without compiling your program for profiling.
By convention, the file to be processed by stat2resid
has a
.stat
extension. If the <outfile>
is not specified the
PostScript will be written to <file>.resid.ps
. If
<file>
is omitted entirely, then the program behaves as a filter.
The plot cannot be produced from the statistics file for a
generational collector, though a suitable stats file can be produced
using the -F2s
runtime option when the
program has been compiled for generational garbage collection (the
default).
stat2resid
is distributed in ghc/utils/stat2resid
in a GHC source
distribution.
(ToDo: document properly.)
It is possible to compile Glasgow Haskell programs so that they will count lots and lots of interesting things, e.g., number of updates, number of data constructors entered, etc., etc. We call this ``ticky-ticky'' profiling, because that's the sound a Sun4 makes when it is running up all those counters (slowly).
Ticky-ticky profiling is mainly intended for implementors; it is quite separate from the main ``cost-centre'' profiling system, intended for all users everywhere.
To be able to use ticky-ticky profiling, you will need to have built appropriate libraries and things when you made the system. See ``Customising what libraries to build,'' in the installation guide.
To get your compiled program to spit out the ticky-ticky numbers, use
a -r
RTS option
.
HACKER TERRITORY. HACKER TERRITORY. (You were warned.)
You may specify that a different program
be used for one of the phases of the compilation system, in place of
whatever the driver ghc
has wired into it. For example, you
might want to try a different assembler. The
-pgm<phase-code><program-name>
option to
ghc
will cause it to use <program-name>
for phase
<phase-code>
, where the codes to indicate the phases are:
code
The preceding sections describe driver options that are mostly
applicable to one particular phase. You may also force a
specific option <option>
to be passed to a particular phase
<phase-code>
by feeding the driver the option
-opt<phase-code><option>
.
The
codes to indicate the phases are the same as in the previous section.
So, for example, to force an -Ewurble
option to the assembler, you
would tell the driver -opta-Ewurble
(the dash before the E is
required).
Besides getting options to the Haskell compiler with -optC<blah>
,
you can get options through to its runtime system with
-optCrts<blah>
.
So, for example: when I want to use my normal driver but with my profiled compiler binary, I use this script:
#! /bin/sh
exec /local/grasp_tmp3/simonpj/ghc-BUILDS/working-alpha/ghc/driver/ghc \
-pgmC/local/grasp_tmp3/simonpj/ghc-BUILDS/working-hsc-prof/hsc \
-optCrts-i0.5 \
-optCrts-PT \
"$@"
-noC
:
Don't bother generating C output or an interface file. Usually
used in conjunction with one or more of the -ddump-*
options; for
example: ghc -noC -ddump-simpl Foo.hs
-hi
:
Do generate an interface file (on stdout
.) This would
normally be used in conjunction with -noC
, which turns off interface
generation; thus: -noC -hi
.
-hi-with-<section>
:
Generate just the specified section of an interface file. In case you're
only interested in a subset of what -hi
outputs, -hi-with-<section>
is just the ticket. For instance
-noC -hi-with-declarations -hi-with-exports
will output the sections containing the exports and the
declarations. Legal sections are: declarations
, exports
,
instances
, instance_modules
, usages
, fixities
, and
interface
.
-dshow-passes
:Prints a message to stderr as each pass starts. Gives a warm but undoubtedly misleading feeling that GHC is telling you what's happening.
-ddump-<pass>
:
Make a debugging dump after pass <pass>
(may be common enough to
need a short form...). Some of the most useful ones are:
-ddump-rdr
-ddump-rn
-ddump-tc
-ddump-deriv
-ddump-ds
-ddump-simpl
-ddump-stranal
-ddump-occur-anal
-ddump-spec
-ddump-stg
-ddump-absC
-ddump-flatC
-ddump-realC
-ddump-asm
-dverbose-simpl
and -dverbose-stg
:Show the output of the intermediate Core-to-Core and STG-to-STG passes, respectively. (Lots of output!) So: when we're really desperate:
% ghc -noC -O -ddump-simpl -dverbose-simpl -dcore-lint Foo.hs
-dppr-{user,debug,all
}:Debugging output is in one of several ``styles.'' Take the printing of types, for example. In the ``user'' style, the compiler's internal ideas about types are presented in Haskell source-level syntax, insofar as possible. In the ``debug'' style (which is the default for debugging output), the types are printed in the most-often-desired form, with explicit foralls, etc. In the ``show all'' style, very verbose information about the types (e.g., the Uniques on the individual type variables) is displayed.
-ddump-raw-asm
:Dump out the assembly-language stuff, before the ``mangler'' gets it.
-ddump-rn-trace
:Make the renamer be *real* chatty about what it is up to.
-dshow-rn-stats
:Print out a summary of what kind of information the renamer had to bring in.
-dshow-unused-imports
:Have the renamer report which imports do not contribute.
-ddump-*
flags)
Let's do this by commenting an example. It's from doing
-ddump-ds
on this code:
skip2 m = m : skip2 (m+2)
Before we jump in, a word about names of things. Within GHC,
variables, type constructors, etc., are identified by their
``Uniques.'' These are of the form `letter' plus `number' (both
loosely interpreted). The `letter' gives some idea of where the
Unique came from; e.g., _
means ``built-in type variable'';
t
means ``from the typechecker''; s
means ``from the
simplifier''; and so on. The `number' is printed fairly compactly in
a `base-62' format, which everyone hates except me (WDP).
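To make the `letter' plus `number' shape concrete, here is a minimal sketch of how such a name could be rendered. The digit alphabet and the names digits62, toBase62 and showUnique are assumptions made for illustration; the text above does not say which digit order GHC actually uses.

```haskell
-- Hypothetical digit alphabet (62 characters). The ordering is an
-- assumption for illustration, not taken from the compiler.
digits62 :: String
digits62 = ['0'..'9'] ++ ['a'..'z'] ++ ['A'..'Z']

-- Render a non-negative number in base 62, most significant digit first.
toBase62 :: Int -> String
toBase62 n
  | n < 0     = error "toBase62: negative input"
  | n < 62    = [digits62 !! n]
  | otherwise = toBase62 (n `div` 62) ++ [digits62 !! (n `mod` 62)]

-- A Unique shown as its tag `letter' followed by its base-62 `number'.
showUnique :: Char -> Int -> String
showUnique tag n = tag : toBase62 n
```

Under these assumptions, showUnique 't' 62 gives "t10": the tag t followed by the two base-62 digits of 62.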
Remember, everything has a ``Unique'' and it is usually printed out when debugging, in some form or another. So here we go...
Desugared:
Main.skip2{-r1L6-} :: _forall_ a$_4 =>{{Num a$_4}} -> a$_4 -> [a$_4]
--# `r1L6' is the Unique for Main.skip2;
--# `_4' is the Unique for the type-variable (template) `a'
--# `{{Num a$_4}}' is a dictionary argument
_NI_
--# `_NI_' means "no (pragmatic) information" yet; it will later
--# evolve into the GHC_PRAGMA info that goes into interface files.
Main.skip2{-r1L6-} =
/\ _4 -> \ d.Num.t4Gt ->
let {
{- CoRec -}
+.t4Hg :: _4 -> _4 -> _4
_NI_
+.t4Hg = (+{-r3JH-} _4) d.Num.t4Gt
fromInt.t4GS :: Int{-2i-} -> _4
_NI_
fromInt.t4GS = (fromInt{-r3JX-} _4) d.Num.t4Gt
--# The `+' class method (Unique: r3JH) selects the addition code
--# from a `Num' dictionary (now an explicit lambda'd argument).
--# Because Core is 2nd-order lambda-calculus, type applications
--# and lambdas (/\) are explicit. So `+' is first applied to a
--# type (`_4'), then to a dictionary, yielding the actual addition
--# function that we will use subsequently...
--# We play the exact same game with the (non-standard) class method
--# `fromInt'. Unsurprisingly, the type `Int' is wired into the
--# compiler.
lit.t4Hb :: _4
_NI_
lit.t4Hb =
let {
ds.d4Qz :: Int{-2i-}
_NI_
ds.d4Qz = I#! 2#
} in fromInt.t4GS ds.d4Qz
--# `I# 2#' is just the literal Int `2'; it reflects the fact that
--# GHC defines `data Int = I# Int#', where Int# is the primitive
--# unboxed type. (see relevant info about unboxed types elsewhere...)
--# The `!' after `I#' indicates that this is a *saturated*
--# application of the `I#' data constructor (i.e., not partially
--# applied).
skip2.t3Ja :: _4 -> [_4]
_NI_
skip2.t3Ja =
\ m.r1H4 ->
let { ds.d4QQ :: [_4]
_NI_
ds.d4QQ =
let {
ds.d4QY :: _4
_NI_
ds.d4QY = +.t4Hg m.r1H4 lit.t4Hb
} in skip2.t3Ja ds.d4QY
} in
:! _4 m.r1H4 ds.d4QQ
{- end CoRec -}
} in skip2.t3Ja
(``It's just a simple functional language'' is an unregisterised trademark of Peyton Jones Enterprises, plc.)
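The dictionary-passing structure of the dump can be imitated in ordinary source Haskell. The following is only a sketch: NumDict, plusD, fromIntD, intDict, doubleDict and skip2D are hypothetical names invented here, and the record holds just the two methods that the dump happens to use. The type abstraction and type application (/\ and the `_4' arguments) have no counterpart in the source code and stay implicit.

```haskell
-- Hypothetical stand-in for a `Num' dictionary: only the two methods
-- that appear in the dump above.
data NumDict a = NumDict
  { plusD    :: a -> a -> a   -- plays the role of the `+' method
  , fromIntD :: Int -> a      -- plays the role of the `fromInt' method
  }

-- Two example dictionaries, one per instance type.
intDict :: NumDict Int
intDict = NumDict { plusD = (+), fromIntD = id }

doubleDict :: NumDict Double
doubleDict = NumDict { plusD = (+), fromIntD = fromIntegral }

-- skip2 with the dictionary as an explicit argument, following the
-- shape of the desugared code.
skip2D :: NumDict a -> a -> [a]
skip2D d = go
  where
    add  = plusD d              -- compare +.t4Hg
    lit  = fromIntD d 2         -- compare lit.t4Hb
    go m = m : go (add m lit)   -- compare skip2.t3Ja
```

For example, take 4 (skip2D intDict 1) evaluates to [1,3,5,7]. Selecting the methods once, outside the recursive function, mirrors the way the dump binds +.t4Hg and lit.t4Hb in the enclosing let.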
Sometimes it is useful to make the connection between a source file
and the command-line options it requires quite tight. For instance,
if a (Glasgow) Haskell source file uses casm
s, the C back-end
often needs to be told about which header files to include. Rather than
maintaining the list of files the source depends on in a
Makefile
(using the -#include
command-line option), it is
possible to do this directly in the source file using the OPTIONS
pragma
:
{-# OPTIONS -#include "foo.h" #-}
module X where
...
OPTIONS
pragmas are only looked for at the top of your source
files, up to the first (non-literate, non-empty) line not containing
OPTIONS
. Multiple OPTIONS
pragmas are recognised. Note
that your command shell does not get to process the source file options; they
are just included literally in the array of command-line arguments
the compiler driver maintains internally, so you'll be desperately
disappointed if you try to glob etc. inside OPTIONS
.
NOTE: the contents of OPTIONS are prepended to the command-line options, so you *do* have the ability to override OPTIONS settings via the command line.
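The lookup and prepending rules can be sketched as follows. This is an illustration of the rules as described here, not the driver's actual code: the function names are hypothetical, literate files are ignored, and splitting on white space with words is a simplification that does not handle quoting.

```haskell
-- Options carried by a single OPTIONS pragma line, or Nothing if the
-- line is not such a pragma. (Simplified: tokens are split on spaces.)
optionsFromLine :: String -> Maybe [String]
optionsFromLine line =
  case words line of
    ("{-#" : "OPTIONS" : rest)
      | not (null rest), last rest == "#-}" -> Just (init rest)
    _ -> Nothing

-- Collect options from the top of a file: skip empty lines, and stop at
-- the first non-empty line that is not an OPTIONS pragma.
sourceOptions :: [String] -> [String]
sourceOptions [] = []
sourceOptions (l : ls)
  | null (words l) = sourceOptions ls
  | otherwise =
      case optionsFromLine l of
        Just opts -> opts ++ sourceOptions ls
        Nothing   -> []

-- Source-file options are placed first, so options given on the command
-- line come later in the argument list and can override them.
effectiveArgs :: [String] -> [String] -> [String]
effectiveArgs srcLines cmdLine = sourceOptions srcLines ++ cmdLine
```

With a file whose first line is an OPTIONS pragma containing -O, calling effectiveArgs on its lines and the command line ["-v"] gives ["-O", "-v"]. A pragma that appears after the module header is never reached, because the scan stops at the first non-empty line that is not an OPTIONS pragma.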
It is not recommended to move all the contents of your Makefiles into
your source files, but in some circumstances, the OPTIONS
pragma
is the Right Thing. (If you use -keep-hc-file-too
and have OPTIONS
flags in your module, the OPTIONS will get put into the generated .hc
file).