3.13. Debugging the compiler

HACKER TERRITORY. HACKER TERRITORY. (You were warned.)

3.13.1. Replacing the program for one or more phases

You may specify that a different program be used for one of the phases of the compilation system, in place of whatever the driver ghc has wired into it. For example, you might want to try a different assembler. The -pgm<phase-code><program-name> option to ghc will cause it to use <program-name> for phase <phase-code>, where the codes to indicate the phases are:

Code  Phase
L     literate pre-processor
P     C pre-processor (if -cpp only)
C     Haskell compiler
c     C compiler
a     assembler
l     linker
dep   Makefile dependency generator

3.13.2. Forcing options to a particular phase

The preceding sections describe driver options that are mostly applicable to one particular phase. You may also force a specific option <option> to be passed to a particular phase <phase-code> by feeding the driver the option -opt<phase-code><option>. The codes to indicate the phases are the same as in the previous section.

So, for example, to force an -Ewurble option to the assembler, you would tell the driver -opta-Ewurble (the dash before the E is required).

Besides getting options to the Haskell compiler with -optC<blah>, you can get options through to its runtime system with -optCrts<blah>.

So, for example: when I want to use my normal driver but with my profiled compiler binary, I use this script:
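#! /bin/sh
# Run the usual driver, but substitute a profiled Haskell compiler binary
# (-pgmC), pass runtime-system options to it (-optCrts...), and forward
# this script's own arguments ("$@").
exec /local/grasp_tmp3/simonpj/ghc-BUILDS/working-alpha/ghc/driver/ghc \
     -pgmC/local/grasp_tmp3/simonpj/ghc-BUILDS/working-hsc-prof/hsc \
     -optCrts-i0.5 \
     -optCrts-PT \
     "$@"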
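To make the naming scheme concrete, here is a small Haskell sketch of how a driver could decode these options. It is not GHC's driver code: the Phase and DriverOpt types and the parseDriverOpt function are hypothetical names introduced for illustration. The one subtlety it captures is prefix order: -optCrts has to be recognised before -optC, otherwise -optCrts-PT would be read as passing rts-PT to the Haskell compiler.

```haskell
import Data.List (stripPrefix)

-- Phases, keyed by the phase codes from the table in 3.13.1.
data Phase
  = LitPreprocessor | CPreprocessor | HsCompiler
  | CCompiler | Assembler | Linker | DepGenerator
  deriving (Eq, Show)

phaseCodes :: [(String, Phase)]
phaseCodes =
  [ ("dep", DepGenerator), ("L", LitPreprocessor), ("P", CPreprocessor)
  , ("C", HsCompiler), ("c", CCompiler), ("a", Assembler), ("l", Linker) ]

-- Hypothetical decoded forms of the three option families.
data DriverOpt
  = UseProgram Phase String   -- -pgm<phase-code><program-name>
  | PassOption Phase String   -- -opt<phase-code><option>
  | PassRtsOption String      -- -optCrts<blah>
  deriving (Eq, Show)

-- Split "<phase-code><rest>" into the phase and the remaining text.
splitPhase :: String -> Maybe (Phase, String)
splitPhase s =
  case [ (p, rest) | (code, p) <- phaseCodes, Just rest <- [stripPrefix code s] ] of
    (hit : _) -> Just hit
    []        -> Nothing

-- The longer prefix -optCrts is tested before the general -opt form.
parseDriverOpt :: String -> Maybe DriverOpt
parseDriverOpt arg
  | Just rest <- stripPrefix "-optCrts" arg = Just (PassRtsOption rest)
  | Just rest <- stripPrefix "-opt" arg     = uncurry PassOption <$> splitPhase rest
  | Just rest <- stripPrefix "-pgm" arg     = uncurry UseProgram <$> splitPhase rest
  | otherwise                               = Nothing

main :: IO ()
main = mapM_ (print . parseDriverOpt)
  ["-opta-Ewurble", "-optCrts-i0.5", "-pgmC/local/hsc", "-O"]
```

In this sketch, any -optC option whose text happens to start with rts is taken as an RTS option; that is a direct consequence of checking the longer prefix first.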

3.13.3. Dumping out compiler intermediate structures

-noC:

Don't bother generating C output or an interface file. Usually used in conjunction with one or more of the -ddump-* options; for example: ghc -noC -ddump-simpl Foo.hs

-hi:

Do generate an interface file. This would normally be used in conjunction with -noC, which turns off interface generation; thus: -noC -hi.

-dshow-passes:

Prints a message to stderr as each pass starts. Gives a warm but undoubtedly misleading feeling that GHC is telling you what's happening.

-ddump-<pass>:

Make a debugging dump after pass <pass> (may be common enough to need a short form…). You can get all of these at once (lots of output) by using -ddump-all, or most of them with -ddump-most. Some of the most useful ones are:

-ddump-parsed:

parser output

-ddump-rn:

renamer output

-ddump-tc:

typechecker output

-ddump-types:

Dump a type signature for each value defined at the top level of the module. The list is sorted alphabetically. Using -dppr-debug dumps a type signature for all the imported and system-defined things as well; useful for debugging the compiler.

-ddump-deriv:

derived instances

-ddump-ds:

desugarer output

-ddump-spec:

output of specialisation pass

-ddump-rules:

dumps all rewrite rules (including those generated by the specialisation pass)

-ddump-simpl:

simplifier output (Core-to-Core passes)

-ddump-usagesp:

UsageSP inference: the program before inference, and the inference output

-ddump-cpranal:

CPR analyser output

-ddump-stranal:

strictness analyser output

-ddump-workwrap:

worker/wrapper split output

-ddump-occur-anal:

`occurrence analysis' output

-ddump-stg:

output of STG-to-STG passes

-ddump-absC:

unflattened Abstract C

-ddump-flatC:

flattened Abstract C

-ddump-realC:

same as what goes to the C compiler

-ddump-asm:

assembly language from the native-code generator

-dverbose-simpl and -dverbose-stg:

Show the output of the intermediate Core-to-Core and STG-to-STG passes, respectively. (Lots of output!) So: when we're really desperate:
% ghc -noC -O -ddump-simpl -dverbose-simpl -dcore-lint Foo.hs

-ddump-simpl-iterations:

Show the output of each iteration of the simplifier (each run of the simplifier has a maximum number of iterations, normally 4). Used when even -dverbose-simpl doesn't cut it.
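The iteration limit is easiest to picture with a toy. The sketch below is not GHC's simplifier: it is a hypothetical constant folder over a made-up Expr type, where one pass only folds additions whose operands are already literals, so nested expressions need several passes. The function simplifierIterations keeps the result of every pass that changed something, roughly the kind of per-iteration trace -ddump-simpl-iterations gives you, and stops at a fixpoint or after the cap (4 here, following the text).

```haskell
-- A made-up expression type for the toy (not GHC's Core).
data Expr = Lit Int | Add Expr Expr
  deriving (Eq, Show)

-- One simplifier pass: fold an addition only when both operands are
-- already literals; otherwise descend into the operands.
simplifyOnce :: Expr -> Expr
simplifyOnce (Add (Lit a) (Lit b)) = Lit (a + b)
simplifyOnce (Add x y)             = Add (simplifyOnce x) (simplifyOnce y)
simplifyOnce e                     = e

-- Run passes until nothing changes or maxIters passes have been made,
-- returning the output of each pass that changed the expression.
simplifierIterations :: Int -> Expr -> [Expr]
simplifierIterations maxIters = go maxIters
  where
    go 0 _ = []
    go n e
      | e' == e   = []
      | otherwise = e' : go (n - 1) e'
      where
        e' = simplifyOnce e

main :: IO ()
main = mapM_ print (simplifierIterations 4 (Add (Add (Lit 1) (Lit 2)) (Lit 3)))
```

A deeply nested sum such as foldl Add (Lit 1) (replicate 5 (Lit 1)) needs five passes, so with a cap of 4 the run stops early with an addition still unfolded; that situation is exactly when seeing each iteration is useful.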

-dppr-{user,debug}:

Debugging output is in one of several “styles.” Take the printing of types, for example. In the “user” style, the compiler's internal ideas about types are presented in Haskell source-level syntax, insofar as possible. In the “debug” style (which is the default for debugging output), the types are printed with explicit foralls, and variables have their unique-id attached (so you can check for things that look the same but aren't).
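A toy version of the two styles, assuming a made-up Type representation in which every type variable carries a name and an integer unique. The name_unique rendering in debug style is a choice made for this sketch, not GHC's exact format (the dump in 3.13.5 shows what GHC actually prints), and the user style here simply drops every forall, which is only faithful for top-level ones. The second example shows why the debug style matters: two distinct variables that both print as a in user style come out differently in debug style.

```haskell
-- Made-up type representation: variables carry a source name and a unique.
data Type
  = TyVar String Int            -- e.g. TyVar "a" 4
  | TyList Type                 -- [t]
  | TyFun Type Type             -- t1 -> t2
  | ForAll (String, Int) Type   -- forall a. t

data Style = UserStyle | DebugStyle

ppr :: Style -> Type -> String
ppr UserStyle  (ForAll _ t)      = ppr UserStyle t   -- foralls left implicit
ppr DebugStyle (ForAll (n, u) t) = "forall " ++ n ++ "_" ++ show u ++ ". " ++ ppr DebugStyle t
ppr UserStyle  (TyVar n _)       = n
ppr DebugStyle (TyVar n u)       = n ++ "_" ++ show u
ppr s (TyList t)                 = "[" ++ ppr s t ++ "]"
ppr s (TyFun a b)                = pprArg s a ++ " -> " ++ ppr s b

-- Parenthesise function and forall types on the left of an arrow.
pprArg :: Style -> Type -> String
pprArg s t@(TyFun _ _)  = "(" ++ ppr s t ++ ")"
pprArg s t@(ForAll _ _) = "(" ++ ppr s t ++ ")"
pprArg s t              = ppr s t

main :: IO ()
main = do
  let a4        = TyVar "a" 4
      skip2Ty   = ForAll ("a", 4) (TyFun a4 (TyList a4))
      confusing = TyFun a4 (TyVar "a" 7)   -- two different variables named a
  mapM_ (putStrLn . (`ppr` skip2Ty)) [UserStyle, DebugStyle]
  mapM_ (putStrLn . (`ppr` confusing)) [UserStyle, DebugStyle]
```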

-ddump-simpl-stats:

Dump statistics about how many of each kind of transformation took place. If you add -dppr-debug you get more detailed information.

-ddump-raw-asm:

Dump out the assembly-language stuff, before the “mangler” gets it.

-ddump-rn-trace:

Make the renamer be *real* chatty about what it is up to.

-dshow-rn-stats:

Print out a summary of what kind of information the renamer had to bring in.

-dshow-unused-imports:

Have the renamer report which imports do not contribute anything.

3.13.4. Checking for consistency

-dcore-lint:

Turn on heavyweight intra-pass sanity-checking within GHC, at Core level. (It checks GHC's sanity, not yours.)

-dstg-lint:

Ditto for STG level.

-dusagesp-lint:

Turn on checks around UsageSP inference (-fusagesp). This verifies various simple properties of the results of the inference, and also warns if any identifier with a used-once annotation before the inference has a used-many annotation afterwards; this could indicate a non-worksafe transformation is being applied.
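The before/after comparison described above can be sketched in a few lines. This is a hypothetical model, not GHC's implementation: annotations are kept in association lists from identifier names to a made-up Usage type, and the check reports every identifier that was marked used-once before inference but used-many afterwards.

```haskell
-- Made-up usage annotation, standing in for UsageSP's used-once/used-many.
data Usage = UsedOnce | UsedMany
  deriving (Eq, Show)

-- Identifiers annotated UsedOnce before inference but UsedMany after it;
-- per the text, these may point to a non-worksafe transformation.
onceBecameMany :: [(String, Usage)] -> [(String, Usage)] -> [String]
onceBecameMany before after =
  [ v | (v, UsedOnce) <- before, lookup v after == Just UsedMany ]

main :: IO ()
main = print (onceBecameMany
  [("x", UsedOnce), ("y", UsedMany), ("z", UsedOnce)]
  [("x", UsedMany), ("y", UsedMany), ("z", UsedOnce)])
```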

3.13.5. How to read Core syntax (from some -ddump-* flags)

Let's do this by commenting an example. It's from doing -ddump-ds on this code:
skip2 m = m : skip2 (m+2)
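If you want to reproduce the dump, a complete module around that one-liner might look like the following. The explicit signature is an addition; it is the type GHC would infer anyway, and its Num a context is why the desugared code takes a dictionary argument. Compiling it with the flags from 3.13.3 (for example ghc -noC -ddump-ds Main.hs) should give output of the same general shape, though the Uniques and details will differ between compiler versions.

```haskell
-- skip2 from the text: the infinite list m, m+2, m+4, ...
skip2 :: Num a => a -> [a]
skip2 m = m : skip2 (m + 2)

main :: IO ()
main = print (take 5 (skip2 (1 :: Int)))  -- prints [1,3,5,7,9]
```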

Before we jump in, a word about names of things. Within GHC, variables, type constructors, etc., are identified by their “Uniques.” These are of the form `letter' plus `number' (both loosely interpreted). The `letter' gives some idea of where the Unique came from; e.g., _ means “built-in type variable”; t means “from the typechecker”; s means “from the simplifier”; and so on. The `number' is printed fairly compactly in a `base-62' format, which everyone hates except me (WDP).
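As a rough illustration of the base-62 numbering, the sketch below encodes and decodes non-negative integers with 62 digit symbols. The alphabet (0-9, then a-z, then A-Z) is an assumption made for the example; check GHC's own Unique-printing code before relying on it. Under that assumption, the number part of r1L6 decodes to 6764, and the type variable _4 splits into the letter _ and the number 4.

```haskell
import Data.List (elemIndex)

-- Assumed digit alphabet: '0'..'9', 'a'..'z', 'A'..'Z' (62 symbols).
digits62 :: String
digits62 = ['0' .. '9'] ++ ['a' .. 'z'] ++ ['A' .. 'Z']

-- Encode a non-negative Int in base 62, most significant digit first.
toBase62 :: Int -> String
toBase62 n
  | n < 0     = error "toBase62: negative argument"
  | n < 62    = [digits62 !! n]
  | otherwise = toBase62 (n `div` 62) ++ [digits62 !! (n `mod` 62)]

-- Decode; Nothing if any character is outside the alphabet.
fromBase62 :: String -> Maybe Int
fromBase62 = foldl step (Just 0)
  where
    step acc c = do
      total <- acc
      d <- elemIndex c digits62
      pure (total * 62 + d)

-- Split a printed Unique such as "r1L6" into its letter and number.
splitUnique :: String -> Maybe (Char, Int)
splitUnique (c : ds@(_ : _)) = (,) c <$> fromBase62 ds
splitUnique _                = Nothing

main :: IO ()
main = do
  print (splitUnique "r1L6")   -- Just ('r',6764) under the assumed alphabet
  putStrLn (toBase62 6764)     -- 1L6
```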

Remember, everything has a “Unique” and it is usually printed out when debugging, in some form or another. So here we go…

Desugared:
Main.skip2{-r1L6-} :: _forall_ a$_4 =>{{Num a$_4}} -> a$_4 -> [a$_4]

--# `r1L6' is the Unique for Main.skip2;
--# `_4' is the Unique for the type-variable (template) `a'
--# `{{Num a$_4}}' is a dictionary argument

_NI_

--# `_NI_' means "no (pragmatic) information" yet; it will later
--# evolve into the GHC_PRAGMA info that goes into interface files.

Main.skip2{-r1L6-} =
    /\ _4 -> \ d.Num.t4Gt ->
        let {
          {- CoRec -}
          +.t4Hg :: _4 -> _4 -> _4
          _NI_
          +.t4Hg = (+{-r3JH-} _4) d.Num.t4Gt

          fromInt.t4GS :: Int{-2i-} -> _4
          _NI_
          fromInt.t4GS = (fromInt{-r3JX-} _4) d.Num.t4Gt

--# The `+' class method (Unique: r3JH) selects the addition code
--# from a `Num' dictionary (now an explicit lambda'd argument).
--# Because Core is 2nd-order lambda-calculus, type applications
--# and lambdas (/\) are explicit.  So `+' is first applied to a
--# type (`_4'), then to a dictionary, yielding the actual addition
--# function that we will use subsequently...

--# We play the exact same game with the (non-standard) class method
--# `fromInt'.  Unsurprisingly, the type `Int' is wired into the
--# compiler.

          lit.t4Hb :: _4
          _NI_
          lit.t4Hb =
              let {
                ds.d4Qz :: Int{-2i-}
                _NI_
                ds.d4Qz = I#! 2#
              } in  fromInt.t4GS ds.d4Qz

--# `I# 2#' is just the literal Int `2'; it reflects the fact that
--# GHC defines `data Int = I# Int#', where Int# is the primitive
--# unboxed type.  (see relevant info about unboxed types elsewhere...)

--# The `!' after `I#' indicates that this is a *saturated*
--# application of the `I#' data constructor (i.e., not partially
--# applied).

          skip2.t3Ja :: _4 -> [_4]
          _NI_
          skip2.t3Ja =
              \ m.r1H4 ->
                  let { ds.d4QQ :: [_4]
                        _NI_
                        ds.d4QQ =
                    let {
                      ds.d4QY :: _4
                      _NI_
                      ds.d4QY = +.t4Hg m.r1H4 lit.t4Hb
                    } in  skip2.t3Ja ds.d4QY
                  } in
                  :! _4 m.r1H4 ds.d4QQ

          {- end CoRec -}
        } in  skip2.t3Ja

(“It's just a simple functional language” is an unregisterised trademark of Peyton Jones Enterprises, plc.)
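It can help to write the desugared program back out as ordinary Haskell. The sketch below does the dictionary passing by hand: NumDict, skip2D, intDict and doubleDict are hypothetical names, and a two-field record stands in for GHC's real Num dictionary. The literal 2 is built as in the dump, as the boxed Int I# 2# handed to the dictionary's fromInt, and the recursive go plays the role of the CoRec binding. It assumes a GHC that supports the MagicHash extension and exports Int(I#) from GHC.Exts.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#))

-- Hypothetical stand-in for a Num dictionary: just the two methods
-- the desugared skip2 selects (+.t4Hg and fromInt.t4GS in the dump).
data NumDict a = NumDict
  { plusD    :: a -> a -> a
  , fromIntD :: Int -> a
  }

-- skip2 with the dictionary as an explicit first argument.
skip2D :: NumDict a -> a -> [a]
skip2D d = go
  where
    lit  = fromIntD d (I# 2#)        -- the literal 2, boxed as I# 2#
    go m = m : go (plusD d m lit)    -- the recursive (CoRec) binding

intDict :: NumDict Int
intDict = NumDict { plusD = (+), fromIntD = id }

doubleDict :: NumDict Double
doubleDict = NumDict { plusD = (+), fromIntD = fromIntegral }

main :: IO ()
main = do
  print (take 5 (skip2D intDict 1))
  print (take 3 (skip2D doubleDict 0.5))
```

Choosing the dictionary is what the type application and dictionary argument in the Core do; here the caller passes intDict or doubleDict explicitly instead of letting the typechecker supply it.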

3.13.6. Command line options in source files

Sometimes it is useful to make the connection between a source file and the command-line options it requires quite tight. For instance, if a (Glasgow) Haskell source file uses casms, the C back-end often needs to be told which header files to include. Rather than maintaining the list of files the source depends on in a Makefile (using the -#include command-line option), it is possible to do this directly in the source file using the OPTIONS pragma:

{-# OPTIONS -#include "foo.h" #-}
module X where

...

OPTIONS pragmas are only looked for at the top of your source files, up to the first (non-literate, non-empty) line not containing OPTIONS. Multiple OPTIONS pragmas are recognised. Note that your command shell never sees the source-file options; they are just included literally in the array of command-line arguments the compiler driver maintains internally, so you'll be desperately disappointed if you try to glob etc. inside OPTIONS.

NOTE: the contents of OPTIONS are prepended to the command-line options, so you *do* have the ability to override OPTIONS settings via the command line.
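The two rules above (where pragmas are looked for, and that their contents are prepended) can be modelled in a few lines. This is a hypothetical sketch, not the driver's code: optionsFromSource, pragmaFlags and driverArgs are made-up names, literate scripts are not handled, and splitting the pragma body on whitespace is an assumption of the sketch. Because the pragma flags come first in the result, a driver in which later flags win lets the command line override them.

```haskell
import Data.Char (isSpace)
import Data.List (isInfixOf)

-- Keep leading lines that are blank or mention OPTIONS; scanning stops at
-- the first other line (literate-script commentary is not handled here).
optionsFromSource :: String -> [String]
optionsFromSource src = concatMap pragmaFlags header
  where
    header = takeWhile (\l -> all isSpace l || "OPTIONS" `isInfixOf` l) (lines src)

-- Flags of a one-line "{-# OPTIONS ... #-}" pragma, split on whitespace;
-- no shell-style globbing or quoting is applied.
pragmaFlags :: String -> [String]
pragmaFlags l = case words l of
  "{-#" : "OPTIONS" : rest@(_ : _) | last rest == "#-}" -> init rest
  _ -> []

-- Pragma flags are prepended, so command-line flags come after them.
driverArgs :: String -> [String] -> [String]
driverArgs src cmdLine = optionsFromSource src ++ cmdLine

main :: IO ()
main = print (driverArgs "{-# OPTIONS -#include \"foo.h\" #-}\nmodule X where\n" ["-O"])
```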

It is not recommended to move all the contents of your Makefiles into your source files, but in some circumstances the OPTIONS pragma is the Right Thing. (If you use -keep-hc-file-too and have OPTIONS flags in your module, the OPTIONS will be put into the generated .hc file.)