Table of Contents
Andrew Tolmach, Tim Chevalier ({apt,tjc}@cs.pdx.edu) and The GHC Team
Abstract
This document provides a precise definition for the GHC Core language, so that it can be used to communicate between GHC and new stand-alone compilation tools such as back-ends or optimizers.[15] The definition includes a formal grammar and an informal semantics. An executable typechecker and interpreter (in Haskell), which formally embody the static and dynamic semantics, are available separately.
The Glasgow Haskell Compiler (GHC) uses an intermediate language, called “Core,” as its internal program representation within the compiler’s simplification phase. Core resembles a subset of Haskell, but with explicit type annotations in the style of the polymorphic lambda calculus (Fω).
GHC’s front-end translates full Haskell 98 (plus some extensions) into Core. The GHC optimizer then repeatedly transforms Core programs while preserving their meaning. A “Core Lint” pass in GHC typechecks Core in between transformation passes (at least when the user enables linting by setting a compiler flag), verifying that transformations preserve type-correctness. Finally, GHC’s back-end translates Core into STG-machine code [stg-machine] and then into C or native code.
Two existing papers discuss the original rationale for the design and use of Core [ghc-inliner,comp-by-trans-scp], although the (two different) idealized versions of Core described therein differ in significant ways from the actual Core language in current GHC. In particular, with the advent of GHC support for generalized algebraic datatypes (GADTs) [gadts] Core was extended beyond its previous Fω-style incarnation to support type equality constraints and safe coercions, and is now based on a system known as FC [system-fc].
Researchers interested in writing just part of a Haskell compiler, such as a new back-end or a new optimizer pass, might like to use GHC to provide the other parts of the compiler. For example, they might like to use GHC’s front-end to parse, desugar, and type-check source Haskell, then feeding the resulting code to their own back-end tool. As another example, they might like to use Core as the target language for a front-end compiler of their own design, feeding externally synthesized Core into GHC in order to take advantage of GHC’s optimizer, code generator, and run-time system. Without external Core, there are two ways for compiler writers to do this: they can link their code into the GHC executable, which is an arduous process, or they can use the GHC API [ghc-api] to do the same task more cleanly. Both ways require new code to be written in Haskell.
We present a precisely specified external format for Core files. The external format is text-based and human-readable, to promote interoperability and ease of use. We hope this format will make it easier for external developers to use GHC in a modular way.
It has long been true that GHC prints an ad-hoc textual representation of Core if you set certain compiler flags. But this representation is intended to be read by people who are debugging the compiler, not by other programs. Making Core into a machine-readable, bi-directional communication format requires:
The first two facilities will let developers couple GHC’s front-end (parser, type-checker, desugarer), and optionally its optimizer, with new back-end tools. The last facility will let developers write new Core-to-Core transformations as an external tool and integrate them into GHC. It will also allow new front-ends to generate Core that can be fed into GHC’s optimizer or back-end.
However, because there are many (undocumented) idiosyncracies in the way GHC produces Core from source Haskell, it will be hard for an external tool to produce Core that can be integrated with GHC-produced Core (e.g., for the Prelude), and we don’t aim to support this. Indeed, for the time being, we aim to support only the first two facilities and not the third: we define and implement Core as an external format that GHC can use to communicate with external back-end tools, and defer the larger task of extending GHC to support reading this external format back in.
This document addresses the first requirement, a formal Core definition, by proposing a formal grammar for an external representation of Core, and an informal semantics.
GHC supports many type system extensions; the External Core printer built into GHC only supports some of them. However, External Core should be capable of representing any Haskell 98 program, and may be able to represent programs that require certain type system extensions as well. If a program uses unsupported features, GHC may fail to compile it to Core when the -fext-core flag is set, or GHC may successfully compile it to Core, but the external tools will not be able to typecheck or interpret it.
Formal static and dynamic semantics in the form of an executable
typechecker and interpreter are available separately in the GHC
source tree
[16]
under utils/ext-core
.
References
[ghc-user-guide] “The Glorious Glasgow Haskell Compilation System User's Guide, Version 6.8.2”. 2008. http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html.
[ghc-fc-commentary] “System FC: equality constraints and coercions”. 2006. http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC.
[ghc-api] “Using GHC as a library”. 2007. http://haskell.org/haskellwiki/GHC/As_a_library.
[haskell98] “Haskell 98 Language and Libraries: The Revised Report”. Cambridge University Press. Cambridge> UK . 2003.
[system-fc] “System F with type equality coercions”. ACM. New York NY USA . 53-66. 2007. http://portal.acm.org/citation.cfm?id=1190324.
[gadts] “Simple unification-based type inference for GADTs”. ACM. New York NY USA . 50-61. 2006. http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm.
[Launchbury94] “Lazy Functional State Threads”. 24-35. 1994. http://citeseer.ist.psu.edu/article/launchbury93lazy.html.
[pj:unboxed] “Unboxed Values as First Class Citizens in a Non-strict Functional Language”. Springer-Verlag LNCS523. Cambridge Massachussetts USA . 636-666. 1991, August 26-28. http://citeseer.ist.psu.edu/jones91unboxed.html.
[ghc-inliner] “Secrets of the Glasgow Haskell Compiler inliner”. 1999. Paris France . http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf.
[comp-by-trans-scp] “A transformation-based optimiser for Haskell”. Science of Computer Programming. 1-3. 3-47. 1998. http://citeseer.ist.psu.edu/peytonjones98transformationbased.html.
[stg-machine] “Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-Machine”. Journal of Functional Programming. 2. 127-202. 1992. http://citeseer.ist.psu.edu/peytonjones92implementing.html.
[launchbury93natural] “A Natural Semantics for Lazy Evaluation”. 144-154. Charleston South Carolina . 1993. citeseer.ist.psu.edu/launchbury93natural.html.
[ghcprim] “Library documentation: GHC.Prim”. 2008. http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html.
[15] This is a draft document, which attempts to describe GHC’s current behavior as precisely as possible. Working notes scattered throughout indicate areas where further work is needed. Constructive comments are very welcome, both on the presentation, and on ways in which GHC could be improved in order to simplify the Core story.
Support for generating external Core (post-optimization) was
originally introduced in GHC 5.02. The definition of external Core in
this document reflects the version of external Core generated by the
HEAD (unstable) branch of GHC as of May 3, 2008 (version 6.9), using
the compiler flag -fext-core
. We expect that GHC 6.10 will be
consistent with this definition.