6. The Makefile architecture

make is great if everything works—you type gmake install and lo! the right things get compiled and installed in the right places. Our goal is to make this happen often, but somehow it often doesn't; instead some weird error message eventually emerges from the bowels of a directory you didn't know existed.

The purpose of this section is to give you a road-map to help you figure out what is going right and what is going wrong.

6.1. A small project

To get started, let us look at the Makefile for an imaginary small fptools project, small. Each project in fptools has its own directory in FPTOOLS_TOP, so the small project will have its own directory FPTOOLS_TOP/small/. Inside the small/ directory there will be a Makefile, looking something like this:

#     Makefile for fptools project "small"

TOP = ..
include $(TOP)/mk/boilerplate.mk

SRCS = $(wildcard *.lhs) $(wildcard *.c)
HS_PROG = small

include $(TOP)/mk/target.mk

This Makefile has three sections:

  1. The first section includes [1] a file of ``boilerplate'' code from the level above (which in this case will be FPTOOLS_TOP/mk/boilerplate.mk). As its name suggests, boilerplate.mk consists of a large quantity of standard Makefile code. We discuss this boilerplate in more detail in Section 6.4. Before the include statement, you must define the make variable TOP to be the directory containing the mk directory in which the boilerplate.mk file is. It is not OK to simply say
    include ../mk/boilerplate.mk  # NO NO NO
    Why? Because the boilerplate.mk file needs to know where it is, so that it can, in turn, include other files. (Unfortunately, when an included file does an include, the filename is treated relative to the directory in which gmake is being run, not the directory in which the included file sits.) In general, every file foo.mk assumes that $(TOP)/mk/foo.mk refers to itself. It is up to the Makefile doing the include to ensure this is the case. Files intended for inclusion in other Makefiles are written to have the following property: after foo.mk is included, it leaves TOP containing the same value as it had just before the include statement. In our example, this invariant guarantees that the include for target.mk will look in the same directory as that for boilerplate.mk.

  2. The second section defines the following standard make variables: SRCS (the source files from which the program is to be built), and HS_PROG (the executable binary to be built). We will discuss in more detail what the ``standard variables'' are, and how they affect what happens, in Section 6.6. The definition for SRCS uses the useful GNU make construct $(wildcard pat), which expands to a list of all the files matching the pattern pat in the current directory. In this example, SRCS is set to the list of all the .lhs and .c files in the directory. (Let's suppose there is one of each, Foo.lhs and Baz.c.)

  3. The last section includes a second file of standard code, called target.mk. It contains the rules that tell gmake how to make the standard targets (Section 5.6). Why, you ask, can't this standard code be part of boilerplate.mk? Good question. We discuss the reason later, in Section 6.3. You do not have to include the target.mk file. Instead, you can write rules of your own for all the standard targets. Usually, though, you will find quite a big payoff from using the canned rules in target.mk; the price tag is that you have to understand what canned rules get enabled, and what they do (Section 6.6).

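In our example Makefile, most of the work is done by the two included files. When you say gmake all, the following things happen:

  1. gmake works out that the object files are Foo.o and Baz.o, because OBJS is derived from SRCS (Section 6.4).

  2. It uses the standard pattern rules (Section 6.5) to compile Foo.lhs to Foo.o with the Haskell compiler, and Baz.c to Baz.o with the C compiler.

  3. It links the resulting .o files with the Haskell runtime system to make small, because we set HS_PROG rather than C_PROG (Section 6.6).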

All Makefiles should follow the above three-section format.

6.2. A larger project

Larger projects are usually structured into a number of sub-directories, each of which has its own Makefile. (In very large projects, this sub-structure might be iterated recursively, though that is rare.) To give you the idea, here's part of the directory structure for the (rather large) GHC project:

$(FPTOOLS_TOP)/ghc/
  Makefile
  mk/
    boilerplate.mk
    rules.mk
  docs/
    Makefile
    ...source files for documentation...
  driver/
    Makefile
    ...source files for driver...
  compiler/
    Makefile
    parser/...source files for parser...
    renamer/...source files for renamer...
    ...etc...

The sub-directories docs, driver, compiler, and so on, each contain a sub-component of GHC, and each has its own Makefile. There must also be a Makefile in $(FPTOOLS_TOP)/ghc. It does most of its work by recursively invoking gmake on the Makefiles in the sub-directories. We say that ghc/Makefile is a non-leaf Makefile, because it does little except organise its children, while the Makefiles in the sub-directories are all leaf Makefiles. (In principle the sub-directories might themselves contain a non-leaf Makefile and several sub-sub-directories, but that does not happen in GHC.)

The Makefile in ghc/compiler is considered a leaf Makefile even though ghc/compiler has sub-directories, because these sub-directories do not themselves have Makefiles in them. They are just used to structure the collection of modules that make up GHC, but all are managed by the single Makefile in ghc/compiler.

You will notice that ghc/ also contains a directory ghc/mk/. It contains GHC-specific Makefile boilerplate code, held in the two files shown in the listing above: ghc/mk/boilerplate.mk and ghc/mk/rules.mk. So these two files are the place to look for GHC-wide customisation of the standard boilerplate.
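
As an illustration of the TOP invariant from Section 6.1, here is a minimal sketch of how a project-level boilerplate file could wrap the top-level one. It shows the general shape only and is not a copy of the real ghc/mk/boilerplate.mk; the variable name GHC_TOP is hypothetical.

```make
# Sketch of a project-level mk/boilerplate.mk (for example ghc/mk/boilerplate.mk).
# On entry, TOP names the project directory, as set by the including Makefile.

# Remember the caller's TOP (GHC_TOP is a hypothetical name).
GHC_TOP := $(TOP)

# Point TOP at the level above, so that the top-level boilerplate.mk
# finds itself and its siblings under $(TOP)/mk/.
TOP := $(TOP)/..
include $(TOP)/mk/boilerplate.mk

# Restore TOP, so the caller sees the value it had before the include.
TOP := $(GHC_TOP)

# Project-wide customisation of the standard boilerplate would follow here.
```

Because TOP is restored before the file ends, a later include of $(TOP)/mk/target.mk in the calling Makefile still resolves against the project's own mk directory.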

6.3. Boilerplate architecture

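Every Makefile includes a boilerplate.mk file at the top, and a target.mk file at the bottom. In this section we discuss what is in these files, and why there have to be two of them. In general:

  1. boilerplate.mk consists of definitions of make variables and pattern rules (Sections 6.4 and 6.5). It comes first so that any definition in your Makefile that follows it overrides the default.

  2. target.mk contains the canned rules for the standard targets (Section 6.6). Which rules it includes depends on the variables your Makefile defines, so it has to come after those definitions.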

6.4. The main mk/boilerplate.mk file

If you look at $(FPTOOLS_TOP)/mk/boilerplate.mk you will find that it consists of the following sections, each held in a separate file:

config.mk

is the build configuration file we discussed at length in Section 5.3.

paths.mk

defines make variables for pathnames and file lists. In particular, it gives definitions for:

SRCS:

all source files in the current directory.

HS_SRCS:

all Haskell source files in the current directory. It is derived from $(SRCS), so if you override SRCS with a new value HS_SRCS will follow suit.

C_SRCS:

similarly for C source files.

HS_OBJS:

the .o files derived from $(HS_SRCS).

C_OBJS:

similarly for $(C_SRCS).

OBJS:

the concatenation of $(HS_OBJS) and $(C_OBJS).

Any or all of these definitions can easily be overridden by giving new definitions in your Makefile. For example, if there are things in the current directory that look like source files but aren't, then you'll need to set SRCS manually in your Makefile. The other definitions will then work from this new definition.

What, exactly, does paths.mk consider a ``source file'' to be? It's based on the file's suffix (e.g. .hs, .lhs, .c, .lc, etc), but this is the kind of detail that changes, so rather than enumerate the source suffixes here the best thing to do is to look in paths.mk.

opts.mk

defines make variables for option strings to pass to each program. For example, it defines HC_OPTS, the option strings to pass to the Haskell compiler. See Section 6.5.

suffix.mk

defines standard pattern rules—see Section 6.5.

Any of the variables and pattern rules defined by the boilerplate file can easily be overridden in any particular Makefile, because the boilerplate include comes first. Definitions after this include directive simply override the default ones in boilerplate.mk.
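
To make the override behaviour concrete, here is a hedged sketch of how definitions like those in paths.mk could be written. The real paths.mk may well differ (look there for the authoritative suffix list); the point is only that recursively-expanded (=) variables are re-evaluated on use, so a later definition of SRCS flows through to the derived lists.

```make
# --- stand-in for the boilerplate defaults (not the real paths.mk) ---
SRCS    = $(wildcard *.lhs *.hs *.c)
HS_SRCS = $(filter %.lhs %.hs,$(SRCS))
C_SRCS  = $(filter %.c,$(SRCS))
HS_OBJS = $(addsuffix .o,$(basename $(HS_SRCS)))
C_OBJS  = $(addsuffix .o,$(basename $(C_SRCS)))
OBJS    = $(HS_OBJS) $(C_OBJS)

# --- your Makefile, after the boilerplate include ---
# Override the default: only these two files are real sources.
SRCS = Foo.lhs Baz.c

# Print the derived lists; they are computed from the overriding SRCS.
.PHONY: show
show :
	@echo "HS_SRCS = $(HS_SRCS)"
	@echo "C_SRCS  = $(C_SRCS)"
	@echo "OBJS    = $(OBJS)"
```

Running gmake show on this file should report Foo.lhs, Baz.c, and Foo.o Baz.o respectively, whatever other files happen to be in the directory.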

6.5. Pattern rules and options

The file suffix.mk defines standard pattern rules that say how to build one kind of file from another, for example, how to build a .o file from a .c file. (GNU make's pattern rules are more powerful and easier to use than Unix make's suffix rules.)

Almost all the rules look something like this:

%.o : %.c
	$(RM) $@
	$(CC) $(CC_OPTS) -c $< -o $@

Here's how to understand the rule. It says that something.o (say Foo.o) can be built from something.c (Foo.c), by first removing any stale copy of the target and then invoking the C compiler (path name held in $(CC)), passing to it the options $(CC_OPTS) and the rule's dependent file $< (Foo.c in this case), and putting the result in the rule's target $@ (Foo.o in this case).

Every program is held in a make variable defined in mk/config.mk—look in mk/config.mk for the complete list. One important one is the Haskell compiler, which is called $(HC).
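
By analogy with the C rule above, a rule for literate Haskell sources might look like the following sketch. It assumes the Haskell rules in suffix.mk follow the same shape as the C rule, which the text suggests but does not show; the values given to HC and HC_OPTS are placeholders for what mk/config.mk and mk/opts.mk would supply.

```make
# Assumed for illustration; in fptools these come from mk/config.mk and mk/opts.mk.
HC      = ghc
HC_OPTS = -O

# Foo.o can be built from Foo.lhs: remove any stale target, then compile.
%.o : %.lhs
	$(RM) $@
	$(HC) $(HC_OPTS) -c $< -o $@
```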

Every program's options are held in a make variable called <prog>_OPTS. The <prog>_OPTS variables are defined in mk/opts.mk. Almost all of them are defined like this:

CC_OPTS = $(SRC_CC_OPTS) $(WAY$(_way)_CC_OPTS) $($*_CC_OPTS) $(EXTRA_CC_OPTS)

The four variables from which CC_OPTS is built have the following meaning:

SRC_CC_OPTS:

options passed to all C compilations.

WAY_<way>_CC_OPTS:

options passed to C compilations for way <way>. For example, WAY_mp_CC_OPTS gives options to pass to the C compiler when compiling way mp. The variable WAY_CC_OPTS holds options to pass to the C compiler when compiling the standard way. (Section 6.8 discusses multi-way compilation.)

<module>_CC_OPTS:

options to pass to the C compiler that are specific to module <module>. For example, SMap_CC_OPTS gives the specific options to pass to the C compiler when compiling SMap.c.

EXTRA_CC_OPTS:

extra options to pass to all C compilations. This is intended for command line use, thus:

gmake libHS.a EXTRA_CC_OPTS="-v"
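
The way the four pieces combine can be tried out with a small stand-alone sketch. The option values below are made up, and the rule prints the options instead of running a compiler so that no source files are needed.

```make
# Directory-wide and per-module settings, as a Makefile might define them.
SRC_CC_OPTS  = -O
SMap_CC_OPTS = -DDEBUG

# The definition quoted above from mk/opts.mk.
CC_OPTS = $(SRC_CC_OPTS) $(WAY$(_way)_CC_OPTS) $($*_CC_OPTS) $(EXTRA_CC_OPTS)

# Stand-in rule: echoes the options rather than compiling.
# Inside a pattern rule's recipe $* is the stem, so when building SMap.o
# the reference $($*_CC_OPTS) becomes $(SMap_CC_OPTS).
%.o :
	@echo "options for $*.c: $(strip $(CC_OPTS))"
```

Saying gmake SMap.o EXTRA_CC_OPTS="-v" should echo the directory-wide, per-module and command-line options together, while gmake Other.o picks up only the directory-wide ones, since no Other_CC_OPTS is defined.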

6.6. The main mk/target.mk file

target.mk contains canned rules for all the standard targets described in Section 5.6. It is complicated by the fact that you don't want all of these rules to be active in every Makefile. Rather than have a plethora of tiny files which you can include selectively, there is a single file, target.mk, which selectively includes rules based on whether you have defined certain variables in your Makefile. This section explains what rules you get, what variables control them, and what the rules do. Hopefully, you will also get enough of an idea of what is supposed to happen that you can read and understand any weird special cases yourself.

HS_PROG.

If HS_PROG is defined, you get rules with the following targets:

HS_PROG

itself. This rule links $(OBJS) with the Haskell runtime system to get an executable called $(HS_PROG).

install

installs $(HS_PROG) in $(bindir).

C_PROG

is similar to HS_PROG, except that the link step links $(C_OBJS) with the C runtime system.

LIBRARY

is similar to HS_PROG, except that it links $(LIB_OBJS) to make the library archive $(LIBRARY), and install installs it in $(libdir).

LIB_DATA

LIB_EXEC

HS_SRCS, C_SRCS.

If HS_SRCS is defined and non-empty, a rule for the target depend is included, which generates dependency information for Haskell programs. Similarly for C_SRCS.

All of these rules are ``double-colon'' rules, thus

install :: $(HS_PROG)
      ...how to install it...

GNU make treats double-colon rules as separate entities. If there are several double-colon rules for the same target it takes each in turn and fires it if its dependencies say to do so. This means that you can, for example, define both HS_PROG and LIBRARY, which will generate two rules for install. When you type gmake install both rules will be fired, and both the program and the library will be installed, just as you wanted.
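
The following stand-alone sketch combines the two ideas in this section: rules that are included only when a controlling variable is defined, and double-colon rules that accumulate for one target. It is an illustration under assumed values, not an excerpt from target.mk, and it echoes what it would do instead of building or installing anything.

```make
# Normally set in the project Makefile before target.mk is included.
HS_PROG = small
LIBRARY = libsmall.a
bindir  = /usr/local/bin
libdir  = /usr/local/lib

.PHONY: install $(HS_PROG) $(LIBRARY)

# Conditionals are evaluated as make reads the file, so the controlling
# variables must already be defined at this point.
ifneq "$(HS_PROG)" ""
install :: $(HS_PROG)
	@echo "would install $(HS_PROG) in $(bindir)"
endif

ifneq "$(LIBRARY)" ""
install :: $(LIBRARY)
	@echo "would install $(LIBRARY) in $(libdir)"
endif

# Stand-in build rules so the sketch runs without any sources.
$(HS_PROG) $(LIBRARY) :
	@echo "would build $@"
```

With both variables set, gmake install runs both install recipes; comment out either definition and the corresponding rule disappears.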

6.7. Recursion

In leaf Makefiles the variable SUBDIRS is undefined. In non-leaf Makefiles, SUBDIRS is set to the list of sub-directories that contain subordinate Makefiles. It is up to you to set SUBDIRS in the Makefile. There is no automation here—SUBDIRS is too important to automate.

When SUBDIRS is defined, target.mk includes a rather neat rule for the standard targets (Section 5.6) that simply invokes make recursively in each of the sub-directories.

These recursive invocations are guaranteed to occur in the order in which the list of directories is specified in SUBDIRS. This guarantee can be important. For example, when you say gmake boot it can be important that the recursive invocation of make boot is done in one sub-directory (the include files, say) before another (the source files). Generally, put the most independent sub-directory first, and the most dependent last.
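
A recursive rule with this ordering guarantee can be sketched as a shell loop over SUBDIRS. This is one plausible way to write it, not necessarily how target.mk does; the directory names and the list of targets are hypothetical, and each directory is assumed to contain its own Makefile.

```make
# Hypothetical sub-directories; most independent first.
SUBDIRS = includes src

.PHONY: all boot install clean

# One rule serves several standard targets; $@ is whichever was requested.
# The shell loop visits the directories in the order listed in SUBDIRS
# and stops at the first failure.
all boot install clean ::
	@for dir in $(SUBDIRS); do \
	  $(MAKE) -C $$dir $@ || exit 1; \
	done
```

Using a sequential loop, instead of making each sub-directory a separate prerequisite, is what keeps the order fixed.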

6.8. Way management

We sometimes want to build essentially the same system in several different ``ways''. For example, we want to build GHC's Prelude libraries with and without profiling, with and without concurrency, and so on, so that there is an appropriately-built library archive to link with when the user compiles his program. It would be possible to have a completely separate build tree for each such ``way'', but it would be horribly bureaucratic, especially since often only parts of the build tree need to be constructed in multiple ways.

Instead, target.mk contains some clever magic to allow you to build several versions of a system, and to control locally how many versions are built and how they differ. This section explains the magic.

The files for a particular way are distinguished by munging the suffix. The ``normal way'' is always built, and its files have the standard suffixes .o, .hi, and so on. In addition, you can build one or more extra ways, each distinguished by a way tag. The object files and interface files for one of these extra ways are distinguished by their suffix. For example, way mp has files .mp_o and .mp_hi. Library archives have their way tag the other side of the dot, for boring reasons; thus, libHS_mp.a.

A make variable called way holds the current way tag. way is only ever set on the command line of a recursive invocation of gmake. It is never set inside a Makefile. So it is a global constant for any one invocation of gmake. Two other make variables, way_ and _way are immediately derived from $(way) and never altered. If way is not set, then neither are way_ and _way, and the invocation of make will build the ``normal way''. If way is set, then the other two variables are set in sympathy. For example, if $(way) is ``mp'', then way_ is set to ``mp_'' and _way is set to ``_mp''. These three variables are then used when constructing file names.
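
The derivation of way_ and _way, and their use in file names, can be sketched as follows. This is a stand-alone illustration of the scheme just described, not the boilerplate's actual text; Foo is a made-up module name.

```make
# way is expected to come from the command line, e.g. gmake show way=mp.
# If it is not set, way_ and _way stay undefined and expand to nothing.
ifneq "$(way)" ""
way_ := $(way)_
_way := _$(way)
endif

# Show the file names that result for the current way.
.PHONY: show
show :
	@echo "object file:    Foo.$(way_)o"
	@echo "interface file: Foo.$(way_)hi"
	@echo "library:        libHS$(_way).a"
```

With way=mp this gives Foo.mp_o, Foo.mp_hi and libHS_mp.a; with no way set it gives the normal-way names Foo.o, Foo.hi and libHS.a.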

So how does make ever get recursively invoked with way set? There are two ways in which this happens:

6.9. When the canned rule isn't right

Sometimes the canned rule just doesn't do the right thing. For example, in the nofib suite we want the link step to print out timing information. The thing to do here is not to define HS_PROG or C_PROG, and instead define a special purpose rule in your own Makefile. By using different variable names you will avoid the canned rules being included, and conflicting with yours.
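
A special-purpose rule of this kind might look like the sketch below. The variable name NOFIB_PROG is hypothetical, chosen only because target.mk does not react to it; the sketch also assumes that the shell running the recipe provides a time command and that all is a double-colon target, as Section 6.6 says of the canned rules.

```make
TOP = ..
include $(TOP)/mk/boilerplate.mk

SRCS = $(wildcard *.lhs)

# Deliberately not HS_PROG, so the canned link rule is not included.
NOFIB_PROG = small

# Hook the program into the standard target by hand.
all :: $(NOFIB_PROG)

# Special-purpose link step that also reports timing information.
$(NOFIB_PROG) : $(OBJS)
	time $(HC) $(HC_OPTS) -o $@ $(OBJS)

include $(TOP)/mk/target.mk
```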

Notes

[1]

One of the most important features of GNU make that we use is the ability for a Makefile to include another named file, very like cpp's #include directive.