GHC supports a small extension to the syntax of module
names: a module name is allowed to contain a dot
‘.’
. This is also known as the
“hierarchical module namespace” extension, because
it extends the normally flat Haskell module namespace into a
more flexible hierarchy of modules.
This extension has very little impact on the language
itself; modules names are always fully
qualified, so you can just think of the fully qualified module
name as “the module name”. In particular, this
means that the full module name must be given after the
module
keyword at the beginning of the
module; for example, the module A.B.C
must
begin
module A.B.C
It is a common strategy to use the as
keyword to save some typing when using qualified names with
hierarchical modules. For example:
import qualified Control.Monad.ST.Strict as ST
For details on how GHC searches for source and interface files in the presence of hierarchical modules, see Section 5.6.3, “The search path”.
GHC comes with a large collection of libraries arranged hierarchically; see the accompanying library documentation. More libraries to install are available from HackageDB.
The discussion that follows is an abbreviated version of Simon Peyton Jones's original proposal. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
Suppose we have an abstract data type of finite maps, with a lookup operation:
lookup :: FiniteMap -> Int -> Maybe Int
The lookup returns Nothing
if the supplied key is not in the domain of the mapping, and (Just v)
otherwise,
where v
is the value that the key maps to. Now consider the following definition:
clunky env var1 var2 | ok1 && ok2 = val1 + val2 | otherwise = var1 + var2 where m1 = lookup env var1 m2 = lookup env var2 ok1 = maybeToBool m1 ok2 = maybeToBool m2 val1 = expectJust m1 val2 = expectJust m2
The auxiliary functions are
maybeToBool :: Maybe a -> Bool maybeToBool (Just x) = True maybeToBool Nothing = False expectJust :: Maybe a -> a expectJust (Just x) = x expectJust Nothing = error "Unexpected Nothing"
What is clunky
doing? The guard ok1 &&
ok2
checks that both lookups succeed, using
maybeToBool
to convert the Maybe
types to booleans. The (lazily evaluated) expectJust
calls extract the values from the results of the lookups, and binds the
returned values to val1
and val2
respectively. If either lookup fails, then clunky takes the
otherwise
case and returns the sum of its arguments.
This is certainly legal Haskell, but it is a tremendously verbose and un-obvious way to achieve the desired effect. Arguably, a more direct way to write clunky would be to use case expressions:
clunky env var1 var2 = case lookup env var1 of Nothing -> fail Just val1 -> case lookup env var2 of Nothing -> fail Just val2 -> val1 + val2 where fail = var1 + var2
This is a bit shorter, but hardly better. Of course, we can rewrite any set
of pattern-matching, guarded equations as case expressions; that is
precisely what the compiler does when compiling equations! The reason that
Haskell provides guarded equations is because they allow us to write down
the cases we want to consider, one at a time, independently of each other.
This structure is hidden in the case version. Two of the right-hand sides
are really the same (fail
), and the whole expression
tends to become more and more indented.
Here is how I would write clunky:
clunky env var1 var2 | Just val1 <- lookup env var1 , Just val2 <- lookup env var2 = val1 + val2 ...other equations for clunky...
The semantics should be clear enough. The qualifiers are matched in order.
For a <-
qualifier, which I call a pattern guard, the
right hand side is evaluated and matched against the pattern on the left.
If the match fails then the whole guard fails and the next equation is
tried. If it succeeds, then the appropriate binding takes place, and the
next qualifier is matched, in the augmented environment. Unlike list
comprehensions, however, the type of the expression to the right of the
<-
is the same as the type of the pattern to its
left. The bindings introduced by pattern guards scope over all the
remaining guard qualifiers, and over the right hand side of the equation.
Just as with list comprehensions, boolean expressions can be freely mixed with among the pattern guards. For example:
f x | [y] <- x , y > 3 , Just z <- h y = ...
Haskell's current guards therefore emerge as a special case, in which the qualifier list has just one element, a boolean expression.
The recursive do-notation (also known as mdo-notation) is implemented as described in A recursive do for Haskell, by Levent Erkok, John Launchbury, Haskell Workshop 2002, pages: 29-37. Pittsburgh, Pennsylvania. This paper is essential reading for anyone making non-trivial use of mdo-notation, and we do not repeat it here.
The do-notation of Haskell does not allow recursive bindings, that is, the variables bound in a do-expression are visible only in the textually following code block. Compare this to a let-expression, where bound variables are visible in the entire binding group. It turns out that several applications can benefit from recursive bindings in the do-notation, and this extension provides the necessary syntactic support.
Here is a simple (yet contrived) example:
import Control.Monad.Fix justOnes = mdo xs <- Just (1:xs) return xs
As you can guess justOnes
will evaluate to Just [1,1,1,...
.
The Control.Monad.Fix library introduces the MonadFix
class. It's definition is:
class Monad m => MonadFix m where mfix :: (a -> m a) -> m a
The function mfix
dictates how the required recursion operation should be performed. For example,
justOnes
desugars as follows:
justOnes = mfix (\xs' -> do { xs <- Just (1:xs'); return xs }
For full details of the way in which mdo is typechecked and desugared, see the paper A recursive do for Haskell. In particular, GHC implements the segmentation technique described in Section 3.2 of the paper.
If recursive bindings are required for a monad,
then that monad must be declared an instance of the MonadFix
class.
The following instances of MonadFix
are automatically provided: List, Maybe, IO.
Furthermore, the Control.Monad.ST and Control.Monad.ST.Lazy modules provide the instances of the MonadFix class
for Haskell's internal state monad (strict and lazy, respectively).
Here are some important points in using the recursive-do notation:
The recursive version of the do-notation uses the keyword mdo
(rather
than do
).
It is enabled with the flag -XRecursiveDo
, which is in turn implied by
-fglasgow-exts
.
Unlike ordinary do-notation, but like let
and where
bindings,
name shadowing is not allowed; that is, all the names bound in a single mdo
must
be distinct (Section 3.3 of the paper).
Variables bound by a let
statement in an mdo
are monomorphic in the mdo
(Section 3.1 of the paper). However
GHC breaks the mdo
into segments to enhance polymorphism,
and improve termination (Section 3.2 of the paper).
The web page: http://www.cse.ogi.edu/PacSoft/projects/rmb/ contains up to date information on recursive monadic bindings.
Historical note: The old implementation of the mdo-notation (and most
of the existing documents) used the name
MonadRec
for the class and the corresponding library.
This name is not supported by GHC.
Parallel list comprehensions are a natural extension to list comprehensions. List comprehensions can be thought of as a nice syntax for writing maps and filters. Parallel comprehensions extend this to include the zipWith family.
A parallel list comprehension has multiple independent branches of qualifier lists, each separated by a `|' symbol. For example, the following zips together two lists:
[ (x, y) | x <- xs | y <- ys ]
The behavior of parallel list comprehensions follows that of zip, in that the resulting list will have the same length as the shortest branch.
We can define parallel list comprehensions by translation to regular comprehensions. Here's the basic idea:
Given a parallel comprehension of the form:
[ e | p1 <- e11, p2 <- e12, ... | q1 <- e21, q2 <- e22, ... ... ]
This will be translated to:
[ e | ((p1,p2), (q1,q2), ...) <- zipN [(p1,p2) | p1 <- e11, p2 <- e12, ...] [(q1,q2) | q1 <- e21, q2 <- e22, ...] ... ]
where `zipN' is the appropriate zip for the given number of branches.
GHC allows most kinds of built-in syntax to be rebound by
the user, to facilitate replacing the Prelude
with a home-grown version, for example.
You may want to define your own numeric class
hierarchy. It completely defeats that purpose if the
literal "1" means "Prelude.fromInteger
1
", which is what the Haskell Report specifies.
So the -XNoImplicitPrelude
flag causes
the following pieces of built-in syntax to refer to
whatever is in scope, not the Prelude
versions:
An integer literal 368
means
"fromInteger (368::Integer)
", rather than
"Prelude.fromInteger (368::Integer)
".
Fractional literals are handed in just the same way,
except that the translation is
fromRational (3.68::Rational)
.
The equality test in an overloaded numeric pattern
uses whatever (==)
is in scope.
The subtraction operation, and the
greater-than-or-equal test, in n+k
patterns
use whatever (-)
and (>=)
are in scope.
Negation (e.g. "- (f x)
")
means "negate (f x)
", both in numeric
patterns, and expressions.
"Do" notation is translated using whatever
functions (>>=)
,
(>>)
, and fail
,
are in scope (not the Prelude
versions). List comprehensions, mdo (Section 8.3.3, “The recursive do-notation
”), and parallel array
comprehensions, are unaffected.
Arrow
notation (see Section 8.9, “Arrow notation
”)
uses whatever arr
,
(>>>)
, first
,
app
, (|||)
and
loop
functions are in scope. But unlike the
other constructs, the types of these functions must match the
Prelude types very closely. Details are in flux; if you want
to use this, ask!
In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
even if that is a little unexpected. For example, the
static semantics of the literal 368
is exactly that of fromInteger (368::Integer)
; it's fine for
fromInteger
to have any of the types:
fromInteger :: Integer -> Integer fromInteger :: forall a. Foo a => Integer -> a fromInteger :: Num a => a -> Integer fromInteger :: Integer -> Bool -> Bool
Be warned: this is an experimental facility, with
fewer checks than usual. Use -dcore-lint
to typecheck the desugared program. If Core Lint is happy
you should be all right.
GHC allows a small extension to the syntax of left operator sections, which allows you to define postfix operators. The extension is this: the left section
(e !)
is equivalent (from the point of view of both type checking and execution) to the expression
((!) e)
(for any expression e
and operator (!)
.
The strict Haskell 98 interpretation is that the section is equivalent to
(\y -> (!) e y)
That is, the operator must be a function of two arguments. GHC allows it to take only one argument, and that in turn allows you to write the function postfix.
Since this extension goes beyond Haskell 98, it should really be enabled by a flag; but in fact it is enabled all the time. (No Haskell 98 programs change their behaviour, of course.)
The extension does not extend to the left-hand side of function definitions; you must define such a function in prefix form.
In record construction and record pattern matching it is entirely unambiguous which field is referred to, even if there are two different data types in scope with a common field name. For example:
module M where data S = MkS { x :: Int, y :: Bool } module Foo where import M data T = MkT { x :: Int } ok1 (MkS { x = n }) = n+1 -- Unambiguous ok2 n = MkT { x = n+1 } -- Unambiguous bad1 k = k { x = 3 } -- Ambiguous bad2 k = x k -- Ambiguous
Even though there are two x
's in scope,
it is clear that the x
in the pattern in the
definition of ok1
can only mean the field
x
from type S
. Similarly for
the function ok2
. However, in the record update
in bad1
and the record selection in bad2
it is not clear which of the two types is intended.
Haskell 98 regards all four as ambiguous, but with the
-fdisambiguate-record-fields
flag, GHC will accept
the former two. The rules are precisely the same as those for instance
declarations in Haskell 98, where the method names on the left-hand side
of the method bindings in an instance declaration refer unambiguously
to the method of that class (provided they are in scope at all), even
if there are other variables in scope with the same name.
This reduces the clutter of qualified names when you import two
records from different modules that use the same field name.