The language
extension -XUnicodeSyntax
enables Unicode characters to be used to stand for certain ASCII
character sequences. The following alternatives are provided:
ASCII | Unicode alternative | Code point | Name |
---|---|---|---|
:: | :: | 0x2237 | PROPORTION |
=> | ⇒ | 0x21D2 | RIGHTWARDS DOUBLE ARROW |
forall | ∀ | 0x2200 | FOR ALL |
-> | → | 0x2192 | RIGHTWARDS ARROW |
<- | ← | 0x2190 | LEFTWARDS ARROW |
-< | ↢ | 0x2919 | LEFTWARDS ARROW-TAIL |
>- | ↣ | 0x291A | RIGHTWARDS ARROW-TAIL |
-<< | 0x291B | LEFTWARDS DOUBLE ARROW-TAIL | |
>>- | 0x291C | RIGHTWARDS DOUBLE ARROW-TAIL | |
* | ★ | 0x2605 | BLACK STAR |
The language extension -XMagicHash
allows "#" as a
postfix modifier to identifiers. Thus, "x#" is a valid variable, and "T#" is
a valid type constructor or data constructor.
The hash sign does not change semantics at all. We tend to use variable
names ending in "#" for unboxed values or types (e.g. Int#
),
but there is no requirement to do so; they are just plain ordinary variables.
Nor does the -XMagicHash
extension bring anything into scope.
For example, to bring Int#
into scope you must
import GHC.Prim
(see Section 7.2, “Unboxed types and primitive operations”);
the -XMagicHash
extension
then allows you to refer to the Int#
that is now in scope. Note that with this option, the meaning of x#y = 0
is changed: it defines a function x#
taking a single argument y
;
to define the operator #
, put a space: x # y = 0
.
The -XMagicHash
also enables some new forms of literals (see Section 7.2.1, “Unboxed types”):
'x'#
has type Char#
"foo"#
has type Addr#
3#
has type Int#
. In general,
any Haskell integer lexeme followed by a #
is an Int#
literal, e.g.
-0x3A#
as well as 32#
.
3##
has type Word#
. In general,
any non-negative Haskell integer lexeme followed by ##
is a Word#
.
3.2#
has type Float#
.
3.2##
has type Double#
The literal -123
is, according to
Haskell98 and Haskell 2010, desugared as
negate (fromInteger 123)
.
The language extension -XNegativeLiterals
means that it is instead desugared as
fromInteger (-123)
.
This can make a difference when the positive and negative range of
a numeric data type don't match up. For example,
in 8-bit arithmetic -128 is representable, but +128 is not.
So negate (fromInteger 128)
will elicit an
unexpected integer-literal-overflow message.
Haskell 2010 and Haskell 98 define floating literals with
the syntax 1.2e6
. These literals have the
type Fractional a => a
.
The language extension -XNumDecimals
allows
you to also use the floating literal syntax for instances of
Integral
, and have values like
(1.2e6 :: Num a => a)
GHC supports a small extension to the syntax of module
names: a module name is allowed to contain a dot
‘.’
. This is also known as the
“hierarchical module namespace” extension, because
it extends the normally flat Haskell module namespace into a
more flexible hierarchy of modules.
This extension has very little impact on the language
itself; modules names are always fully
qualified, so you can just think of the fully qualified module
name as “the module name”. In particular, this
means that the full module name must be given after the
module
keyword at the beginning of the
module; for example, the module A.B.C
must
begin
module A.B.C
It is a common strategy to use the as
keyword to save some typing when using qualified names with
hierarchical modules. For example:
import qualified Control.Monad.ST.Strict as ST
For details on how GHC searches for source and interface files in the presence of hierarchical modules, see Section 4.7.3, “The search path”.
GHC comes with a large collection of libraries arranged hierarchically; see the accompanying library documentation. More libraries to install are available from HackageDB.
The discussion that follows is an abbreviated version of Simon Peyton Jones's original proposal. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
Suppose we have an abstract data type of finite maps, with a lookup operation:
lookup :: FiniteMap -> Int -> Maybe Int
The lookup returns Nothing
if the supplied key is not in the domain of the mapping, and (Just v)
otherwise,
where v
is the value that the key maps to. Now consider the following definition:
clunky env var1 var2 | ok1 && ok2 = val1 + val2 | otherwise = var1 + var2 where m1 = lookup env var1 m2 = lookup env var2 ok1 = maybeToBool m1 ok2 = maybeToBool m2 val1 = expectJust m1 val2 = expectJust m2
The auxiliary functions are
maybeToBool :: Maybe a -> Bool maybeToBool (Just x) = True maybeToBool Nothing = False expectJust :: Maybe a -> a expectJust (Just x) = x expectJust Nothing = error "Unexpected Nothing"
What is clunky
doing? The guard ok1 &&
ok2
checks that both lookups succeed, using
maybeToBool
to convert the Maybe
types to booleans. The (lazily evaluated) expectJust
calls extract the values from the results of the lookups, and binds the
returned values to val1
and val2
respectively. If either lookup fails, then clunky takes the
otherwise
case and returns the sum of its arguments.
This is certainly legal Haskell, but it is a tremendously verbose and un-obvious way to achieve the desired effect. Arguably, a more direct way to write clunky would be to use case expressions:
clunky env var1 var2 = case lookup env var1 of Nothing -> fail Just val1 -> case lookup env var2 of Nothing -> fail Just val2 -> val1 + val2 where fail = var1 + var2
This is a bit shorter, but hardly better. Of course, we can rewrite any set
of pattern-matching, guarded equations as case expressions; that is
precisely what the compiler does when compiling equations! The reason that
Haskell provides guarded equations is because they allow us to write down
the cases we want to consider, one at a time, independently of each other.
This structure is hidden in the case version. Two of the right-hand sides
are really the same (fail
), and the whole expression
tends to become more and more indented.
Here is how I would write clunky:
clunky env var1 var2 | Just val1 <- lookup env var1 , Just val2 <- lookup env var2 = val1 + val2 ...other equations for clunky...
The semantics should be clear enough. The qualifiers are matched in order.
For a <-
qualifier, which I call a pattern guard, the
right hand side is evaluated and matched against the pattern on the left.
If the match fails then the whole guard fails and the next equation is
tried. If it succeeds, then the appropriate binding takes place, and the
next qualifier is matched, in the augmented environment. Unlike list
comprehensions, however, the type of the expression to the right of the
<-
is the same as the type of the pattern to its
left. The bindings introduced by pattern guards scope over all the
remaining guard qualifiers, and over the right hand side of the equation.
Just as with list comprehensions, boolean expressions can be freely mixed with among the pattern guards. For example:
f x | [y] <- x , y > 3 , Just z <- h y = ...
Haskell's current guards therefore emerge as a special case, in which the qualifier list has just one element, a boolean expression.
View patterns are enabled by the flag -XViewPatterns
.
More information and examples of view patterns can be found on the
Wiki
page.
View patterns are somewhat like pattern guards that can be nested inside of other patterns. They are a convenient way of pattern-matching against values of abstract types. For example, in a programming language implementation, we might represent the syntax of the types of the language as follows:
type Typ data TypView = Unit | Arrow Typ Typ view :: Typ -> TypView -- additional operations for constructing Typ's ...
The representation of Typ is held abstract, permitting implementations to use a fancy representation (e.g., hash-consing to manage sharing). Without view patterns, using this signature a little inconvenient:
size :: Typ -> Integer size t = case view t of Unit -> 1 Arrow t1 t2 -> size t1 + size t2
It is necessary to iterate the case, rather than using an equational
function definition. And the situation is even worse when the matching
against t
is buried deep inside another pattern.
View patterns permit calling the view function inside the pattern and matching against the result:
size (view -> Unit) = 1 size (view -> Arrow t1 t2) = size t1 + size t2
That is, we add a new form of pattern, written
expression
->
pattern
that means "apply the expression to
whatever we're trying to match against, and then match the result of
that application against the pattern". The expression can be any Haskell
expression of function type, and view patterns can be used wherever
patterns are used.
The semantics of a pattern (
exp
->
pat
)
are as follows:
The variables bound by the view pattern are the variables bound by
pat
.
Any variables in exp
are bound occurrences,
but variables bound "to the left" in a pattern are in scope. This
feature permits, for example, one argument to a function to be used in
the view of another argument. For example, the function
clunky
from Section 7.3.6, “Pattern guards” can be
written using view patterns as follows:
clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2 ...other equations for clunky...
More precisely, the scoping rules are:
In a single pattern, variables bound by patterns to the left of a view pattern expression are in scope. For example:
example :: Maybe ((String -> Integer,Integer), String) -> Bool example Just ((f,_), f -> 4) = True
Additionally, in function definitions, variables bound by matching earlier curried arguments may be used in view pattern expressions in later arguments:
example :: (String -> Integer) -> String -> Bool example f (f -> 4) = True
That is, the scoping is the same as it would be if the curried arguments were collected into a tuple.
In mutually recursive bindings, such as let
,
where
, or the top level, view patterns in one
declaration may not mention variables bound by other declarations. That
is, each declaration must be self-contained. For example, the following
program is not allowed:
let {(x -> y) = e1 ; (y -> x) = e2 } in x
(For some amplification on this design choice see Trac #4061.)
Typing: If exp
has type
T1
->
T2
and pat
matches
a T2
, then the whole view pattern matches a
T1
.
Matching: To the equations in Section 3.17.3 of the Haskell 98 Report, add the following:
case v of { (e -> p) -> e1 ; _ -> e2 } = case (e v) of { p -> e1 ; _ -> e2 }
That is, to match a variable v
against a pattern
(
exp
->
pat
)
, evaluate (
exp
v
)
and match the result against
pat
.
Efficiency: When the same view function is applied in
multiple branches of a function definition or a case expression (e.g.,
in size
above), GHC makes an attempt to collect these
applications into a single nested case expression, so that the view
function is only applied once. Pattern compilation in GHC follows the
matrix algorithm described in Chapter 4 of The
Implementation of Functional Programming Languages. When the
top rows of the first column of a matrix are all view patterns with the
"same" expression, these patterns are transformed into a single nested
case. This includes, for example, adjacent view patterns that line up
in a tuple, as in
f ((view -> A, p1), p2) = e1 f ((view -> B, p3), p4) = e2
The current notion of when two view pattern expressions are "the
same" is very restricted: it is not even full syntactic equality.
However, it does include variables, literals, applications, and tuples;
e.g., two instances of view ("hi", "there")
will be
collected. However, the current implementation does not compare up to
alpha-equivalence, so two instances of (x, view x ->
y)
will not be coalesced.
Pattern synonyms are enabled by the flag
-XPatternSynonyms
, which is required for both
defining them and using them. More information
and examples of view patterns can be found on the Wiki
page.
Pattern synonyms enable giving names to parametrized pattern schemes. They can also be thought of as abstract constructors that don't have a bearing on data representation. For example, in a programming language implementation, we might represent types of the language as follows:
data Type = App String [Type]
Here are some examples of using said representation.
Consider a few types of the Type
universe encoded
like this:
App "->" [t1, t2] -- t1 -> t2 App "Int" [] -- Int App "Maybe" [App "Int" []] -- Maybe Int
This representation is very generic in that no types are given special
treatment. However, some functions might need to handle some known
types specially, for example the following two functions collect all
argument types of (nested) arrow types, and recognize the
Int
type, respectively:
collectArgs :: Type -> [Type] collectArgs (App "->" [t1, t2]) = t1 : collectArgs t2 collectArgs _ = [] isInt :: Type -> Bool isInt (App "Int" []) = True isInt _ = False
Matching on App
directly is both hard to read and
error prone to write. And the situation is even worse when the
matching is nested:
isIntEndo :: Type -> Bool isIntEndo (App "->" [App "Int" [], App "Int" []]) = True isIntEndo _ = False
Pattern synonyms permit abstracting from the representation to expose
matchers that behave in a constructor-like manner with respect to
pattern matching. We can create pattern synonyms for the known types
we care about, without committing the representation to them (note
that these don't have to be defined in the same module as the
Type
type):
pattern Arrow t1 t2 = App "->" [t1, t2] pattern Int = App "Int" [] pattern Maybe t = App "Maybe" [t]
Which enables us to rewrite our functions in a much cleaner style:
collectArgs :: Type -> [Type] collectArgs (Arrow t1 t2) = t1 : collectArgs t2 collectArgs _ = [] isInt :: Type -> Bool isInt Int = True isInt _ = False isIntEndo :: Type -> Bool isIntEndo (Arrow Int Int) = True isIntEndo _ = False
Note that in this example, the pattern synonyms
Int
and Arrow
can also be used
as expressions (they are bidirectional). This
is not necessarily the case: unidirectional
pattern synonyms can also be declared with the following syntax:
pattern Head x <- x:xs
In this case, Head
x
cannot be used in expressions, only patterns, since it wouldn't
specify a value for the xs
on the
right-hand side.
The semantics of a unidirectional pattern synonym declaration and usage are as follows:
A pattern synonym declaration can be either unidirectional or bidirectional. The syntax for unidirectional pattern synonyms is:
pattern Name args <- pat
and the syntax for bidirectional pattern synonyms is:
pattern Name args = pat
Pattern synonym declarations can only occur in the top level of a module. In particular, they are not allowed as local definitions. Currently, they also don't work in GHCi, but that is a technical restriction that will be lifted in later versions.
The name of the pattern synonym itself is in the same namespace as
proper data constructors. Either prefix or infix syntax can be
used. In export/import specifications, you have to prefix pattern
names with the pattern
keyword, e.g.:
module Example (pattern Single) where pattern Single x = [x]
The variables in the left-hand side of the definition are bound by the pattern on the right-hand side. For bidirectional pattern synonyms, all the variables of the right-hand side must also occur on the left-hand side; also, wildcard patterns and view patterns are not allowed. For unidirectional pattern synonyms, there is no restriction on the right-hand side pattern.
Pattern synonyms cannot be defined recursively.
Given a pattern synonym definition of the form
pattern P var1 var2 ... varN <- pat
it is assigned a pattern type of the form
pattern CProv => P t1 t2 ... tN :: CReq => t
where CProv
and
CReq
are type contexts, and
t1
, t2
, ...,
tN
and t
are
types.
A pattern synonym of this type can be used in a pattern if the
instatiated (monomorphic) type satisfies the constraints of
CReq
. In this case, it extends the context
available in the right-hand side of the match with
CProv
, just like how an existentially-typed
data constructor can extend the context.
For example, in the following program:
{-# LANGUAGE PatternSynonyms, GADTs #-} module ShouldCompile where data T a where MkT :: (Show b) => a -> b -> T a pattern ExNumPat x = MkT 42 x
the pattern type of ExNumPat
is
pattern (Show b) => ExNumPat b :: (Num a, Eq a) => T a
and so can be used in a function definition like the following:
f :: (Num t, Eq t) => T t -> String f (ExNumPat x) = show x
For bidirectional pattern synonyms, uses as expressions have the type
(CProv, CReq) => t1 -> t2 -> ... -> tN -> t
So in the previous example, ExNumPat
,
when used in an expression, has type
ExNumPat :: (Show b, Num a, Eq a) => b -> T t
A pattern synonym occurrence in a pattern is evaluated by first
matching against the pattern synonym itself, and then on the argument
patterns. For example, in the following program, f
and f'
are equivalent:
pattern Pair x y <- [x, y] f (Pair True True) = True f _ = False f' [x, y] | True <- x, True <- y = True f' _ = False
Note that the strictness of f
differs from that
of g
defined below:
g [True, True] = True g _ = False *Main> f (False:undefined) *** Exception: Prelude.undefined *Main> g (False:undefined) False
Traditional record syntax, such as C {f = x}
, is enabled by default.
To disable it, you can use the -XNoTraditionalRecordSyntax
flag.
The do-notation of Haskell 98 does not allow recursive bindings, that is, the variables bound in a do-expression are visible only in the textually following code block. Compare this to a let-expression, where bound variables are visible in the entire binding group.
It turns out that such recursive bindings do indeed make sense for a variety of monads, but
not all. In particular, recursion in this sense requires a fixed-point operator for the underlying
monad, captured by the mfix
method of the MonadFix
class, defined in Control.Monad.Fix
as follows:
class Monad m => MonadFix m where mfix :: (a -> m a) -> m a
Haskell's
Maybe
, []
(list), ST
(both strict and lazy versions),
IO
, and many other monads have MonadFix
instances. On the negative
side, the continuation monad, with the signature (a -> r) -> r
, does not.
For monads that do belong to the MonadFix
class, GHC provides
an extended version of the do-notation that allows recursive bindings.
The -XRecursiveDo
(language pragma: RecursiveDo
)
provides the necessary syntactic support, introducing the keywords mdo
and
rec
for higher and lower levels of the notation respectively. Unlike
bindings in a do
expression, those introduced by mdo
and rec
are recursively defined, much like in an ordinary let-expression. Due to the new
keyword mdo
, we also call this notation the mdo-notation.
Here is a simple (albeit contrived) example:
{-# LANGUAGE RecursiveDo #-} justOnes = mdo { xs <- Just (1:xs) ; return (map negate xs) }
or equivalently
{-# LANGUAGE RecursiveDo #-} justOnes = do { rec { xs <- Just (1:xs) } ; return (map negate xs) }
As you can guess justOnes
will evaluate to Just [-1,-1,-1,...
.
GHC's implementation the mdo-notation closely follows the original translation as described in the paper
A recursive do for Haskell, which
in turn is based on the work Value Recursion
in Monadic Computations. Furthermore, GHC extends the syntax described in the former paper
with a lower level syntax flagged by the rec
keyword, as we describe next.
The flag -XRecursiveDo
also introduces a new keyword rec
, which wraps a
mutually-recursive group of monadic statements inside a do
expression, producing a single statement.
Similar to a let
statement inside a do
, variables bound in
the rec
are visible throughout the rec
group, and below it. For example, compare
do { a <- getChar do { a <- getChar ; let { r1 = f a r2 ; rec { r1 <- f a r2 ; ; r2 = g r1 } ; ; r2 <- g r1 } ; return (r1 ++ r2) } ; return (r1 ++ r2) }
In both cases, r1
and r2
are available both throughout
the let
or rec
block, and in the statements that follow it.
The difference is that let
is non-monadic, while rec
is monadic.
(In Haskell let
is really letrec
, of course.)
The semantics of rec
is fairly straightforward. Whenever GHC finds a rec
group, it will compute its set of bound variables, and will introduce an appropriate call
to the underlying monadic value-recursion operator mfix
, belonging to the
MonadFix
class. Here is an example:
rec { b <- f a c ===> (b,c) <- mfix (\ ~(b,c) -> do { b <- f a c ; c <- f b a } ; c <- f b a ; return (b,c) })
As usual, the meta-variables b
, c
etc., can be arbitrary patterns.
In general, the statement rec
is desugared to the statement
ss
vs
<- mfix (\ ~vs
-> do {ss
; returnvs
})
where vs
is a tuple of the variables bound by ss
.
Note in particular that the translation for a rec
block only involves wrapping a call
to mfix
: it performs no other analysis on the bindings. The latter is the task
for the mdo
notation, which is described next.
A rec
-block tells the compiler where precisely the recursive knot should be tied. It turns out that
the placement of the recursive knots can be rather delicate: in particular, we would like the knots to be wrapped
around as minimal groups as possible. This process is known as segmentation, and is described
in detail in Secton 3.2 of A recursive do for
Haskell. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids
unnecessary interference caused by a fundamental issue with the so-called right-shrinking
axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do not
have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary
interference, changing the termination behavior of the resulting translation.
(Details can be found in Sections 3.1 and 7.2.2 of
Value Recursion in Monadic Computations.)
The mdo
notation removes the burden of placing
explicit rec
blocks in the code. Unlike an
ordinary do
expression, in which variables bound by
statements are only in scope for later statements, variables bound in
an mdo
expression are in scope for all statements
of the expression. The compiler then automatically identifies minimal
mutually recursively dependent segments of statements, treating them as
if the user had wrapped a rec
qualifier around them.
The definition is syntactic:
A generator g
depends on a textually following generator
g'
, if
g'
defines a variable that
is used by g
, or
g'
textually appears between
g
and
g''
, where g
depends on g''
.
A segment of a given
mdo
-expression is a minimal sequence of generators
such that no generator of the sequence depends on an outside
generator. As a special case, although it is not a generator,
the final expression in an mdo
-expression is
considered to form a segment by itself.
Segments in this sense are related to strongly-connected components analysis, with the exception that bindings in a segment cannot be reordered and must be contiguous.
Here is an example mdo
-expression, and its translation to rec
blocks:
mdo { a <- getChar ===> do { a <- getChar ; b <- f a c ; rec { b <- f a c ; c <- f b a ; ; c <- f b a } ; z <- h a b ; z <- h a b ; d <- g d e ; rec { d <- g d e ; e <- g a z ; ; e <- g a z } ; putChar c } ; putChar c }
Note that a given mdo
expression can cause the creation of multiple rec
blocks.
If there are no recursive dependencies, mdo
will introduce no rec
blocks. In this
latter case an mdo
expression is precisely the same as a do
expression, as one
would expect.
In summary, given an mdo
expression, GHC first performs segmentation, introducing
rec
blocks to wrap over minimal recursive groups. Then, each resulting
rec
is desugared, using a call to Control.Monad.Fix.mfix
as described
in the previous section. The original mdo
-expression typechecks exactly when the desugared
version would do so.
Here are some other important points in using the recursive-do notation:
It is enabled with the flag -XRecursiveDo
, or the LANGUAGE RecursiveDo
pragma. (The same flag enables both mdo
-notation, and the use of rec
blocks inside do
expressions.)
rec
blocks can also be used inside mdo
-expressions, which will be
treated as a single statement. However, it is good style to either use mdo
or
rec
blocks in a single expression.
If recursive bindings are required for a monad, then that monad must be declared an instance of
the MonadFix
class.
The following instances of MonadFix
are automatically provided: List, Maybe, IO.
Furthermore, the Control.Monad.ST
and Control.Monad.ST.Lazy
modules provide the instances of the MonadFix
class for Haskell's internal
state monad (strict and lazy, respectively).
Like let
and where
bindings, name shadowing is not allowed within
an mdo
-expression or a rec
-block; that is, all the names bound in
a single rec
must be distinct. (GHC will complain if this is not the case.)
Parallel list comprehensions are a natural extension to list comprehensions. List comprehensions can be thought of as a nice syntax for writing maps and filters. Parallel comprehensions extend this to include the zipWith family.
A parallel list comprehension has multiple independent branches of qualifier lists, each separated by a `|' symbol. For example, the following zips together two lists:
[ (x, y) | x <- xs | y <- ys ]
The behaviour of parallel list comprehensions follows that of zip, in that the resulting list will have the same length as the shortest branch.
We can define parallel list comprehensions by translation to regular comprehensions. Here's the basic idea:
Given a parallel comprehension of the form:
[ e | p1 <- e11, p2 <- e12, ... | q1 <- e21, q2 <- e22, ... ... ]
This will be translated to:
[ e | ((p1,p2), (q1,q2), ...) <- zipN [(p1,p2) | p1 <- e11, p2 <- e12, ...] [(q1,q2) | q1 <- e21, q2 <- e22, ...] ... ]
where `zipN' is the appropriate zip for the given number of branches.
Generalised list comprehensions are a further enhancement to the list comprehension syntactic sugar to allow operations such as sorting and grouping which are familiar from SQL. They are fully described in the paper Comprehensive comprehensions: comprehensions with "order by" and "group by", except that the syntax we use differs slightly from the paper.
The extension is enabled with the flag -XTransformListComp
.
Here is an example:
employees = [ ("Simon", "MS", 80) , ("Erik", "MS", 100) , ("Phil", "Ed", 40) , ("Gordon", "Ed", 45) , ("Paul", "Yale", 60)] output = [ (the dept, sum salary) | (name, dept, salary) <- employees , then group by dept using groupWith , then sortWith by (sum salary) , then take 5 ]
In this example, the list output
would take on
the value:
[("Yale", 60), ("Ed", 85), ("MS", 180)]
There are three new keywords: group
, by
, and using
.
(The functions sortWith
and groupWith
are not keywords; they are ordinary
functions that are exported by GHC.Exts
.)
There are five new forms of comprehension qualifier,
all introduced by the (existing) keyword then
:
then fThis statement requires that
f
have the type
forall a. [a] -> [a]
. You can see an example of its use in the
motivating example, as this form is used to apply take 5
.
then f by e
This form is similar to the previous one, but allows you to create a function
which will be passed as the first argument to f. As a consequence f must have
the type forall a. (a -> t) -> [a] -> [a]
. As you can see
from the type, this function lets f "project out" some information
from the elements of the list it is transforming.
An example is shown in the opening example, where sortWith
is supplied with a function that lets it find out the sum salary
for any item in the list comprehension it transforms.
then group by e using f
This is the most general of the grouping-type statements. In this form,
f is required to have type forall a. (a -> t) -> [a] -> [[a]]
.
As with the then f by e
case above, the first argument
is a function supplied to f by the compiler which lets it compute e on every
element of the list being transformed. However, unlike the non-grouping case,
f additionally partitions the list into a number of sublists: this means that
at every point after this statement, binders occurring before it in the comprehension
refer to lists of possible values, not single values. To help understand
this, let's look at an example:
-- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first groupRuns :: Eq b => (a -> b) -> [a] -> [[a]] groupRuns f = groupBy (\x y -> f x == f y) output = [ (the x, y) | x <- ([1..3] ++ [1..2]) , y <- [4..6] , then group by x using groupRuns ]
This results in the variable output
taking on the value below:
[(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
Note that we have used the the
function to change the type
of x from a list to its original numeric type. The variable y, in contrast, is left
unchanged from the list form introduced by the grouping.
then group using f
With this form of the group statement, f is required to simply have the type
forall a. [a] -> [[a]]
, which will be used to group up the
comprehension so far directly. An example of this form is as follows:
output = [ x | y <- [1..5] , x <- "hello" , then group using inits]
This will yield a list containing every prefix of the word "hello" written out 5 times:
["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
Monad comprehensions generalise the list comprehension notation, including parallel comprehensions (Section 7.3.12, “Parallel List Comprehensions”) and transform comprehensions (Section 7.3.13, “Generalised (SQL-Like) List Comprehensions”) to work for any monad.
Monad comprehensions support:
Bindings:
[ x + y | x <- Just 1, y <- Just 2 ]
Bindings are translated with the (>>=)
and
return
functions to the usual do-notation:
do x <- Just 1 y <- Just 2 return (x+y)
Guards:
[ x | x <- [1..10], x <= 5 ]
Guards are translated with the guard
function,
which requires a MonadPlus
instance:
do x <- [1..10] guard (x <= 5) return x
Transform statements (as with -XTransformListComp
):
[ x+y | x <- [1..10], y <- [1..x], then take 2 ]
This translates to:
do (x,y) <- take 2 (do x <- [1..10] y <- [1..x] return (x,y)) return (x+y)
Group statements (as with -XTransformListComp
):
[ x | x <- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ] [ x | x <- [1,1,2,2,3], then group using myGroup ]
Parallel statements (as with -XParallelListComp
):
[ (x+y) | x <- [1..10] | y <- [11..20] ]
Parallel statements are translated using the
mzip
function, which requires a
MonadZip
instance defined in
Control.Monad.Zip
:
do (x,y) <- mzip (do x <- [1..10] return x) (do y <- [11..20] return y) return (x+y)
All these features are enabled by default if the
MonadComprehensions
extension is enabled. The types
and more detailed examples on how to use comprehensions are explained
in the previous chapters Section 7.3.13, “Generalised (SQL-Like) List Comprehensions” and Section 7.3.12, “Parallel List Comprehensions”. In general you just have
to replace the type [a]
with the type
Monad m => m a
for monad comprehensions.
Note: Even though most of these examples are using the list monad,
monad comprehensions work for any monad.
The base
package offers all necessary instances for
lists, which make MonadComprehensions
backward
compatible to built-in, transform and parallel list comprehensions.
More formally, the desugaring is as follows. We write D[ e | Q]
to mean the desugaring of the monad comprehension [ e | Q]
:
Expressions: e Declarations: d Lists of qualifiers: Q,R,S -- Basic forms D[ e | ] = return e D[ e | p <- e, Q ] = e >>= \p -> D[ e | Q ] D[ e | e, Q ] = guard e >> \p -> D[ e | Q ] D[ e | let d, Q ] = let d in D[ e | Q ] -- Parallel comprehensions (iterate for multiple parallel branches) D[ e | (Q | R), S ] = mzip D[ Qv | Q ] D[ Rv | R ] >>= \(Qv,Rv) -> D[ e | S ] -- Transform comprehensions D[ e | Q then f, R ] = f D[ Qv | Q ] >>= \Qv -> D[ e | R ] D[ e | Q then f by b, R ] = f (\Qv -> b) D[ Qv | Q ] >>= \Qv -> D[ e | R ] D[ e | Q then group using f, R ] = f D[ Qv | Q ] >>= \ys -> case (fmap selQv1 ys, ..., fmap selQvn ys) of Qv -> D[ e | R ] D[ e | Q then group by b using f, R ] = f (\Qv -> b) D[ Qv | Q ] >>= \ys -> case (fmap selQv1 ys, ..., fmap selQvn ys) of Qv -> D[ e | R ] where Qv is the tuple of variables bound by Q (and used subsequently) selQvi is a selector mapping Qv to the ith component of Qv Operator Standard binding Expected type -------------------------------------------------------------------- return GHC.Base t1 -> m t2 (>>=) GHC.Base m1 t1 -> (t2 -> m2 t3) -> m3 t3 (>>) GHC.Base m1 t1 -> m2 t2 -> m3 t3 guard Control.Monad t1 -> m t2 fmap GHC.Base forall a b. (a->b) -> n a -> n b mzip Control.Monad.Zip forall a b. m a -> m b -> m (a,b)
The comprehension should typecheck when its desugaring would typecheck.
Monad comprehensions support rebindable syntax (Section 7.3.15, “Rebindable syntax and the implicit Prelude import”).
Without rebindable
syntax, the operators from the "standard binding" module are used; with
rebindable syntax, the operators are looked up in the current lexical scope.
For example, parallel comprehensions will be typechecked and desugared
using whatever "mzip
" is in scope.
The rebindable operators must have the "Expected type" given in the table above. These types are surprisingly general. For example, you can use a bind operator with the type
(>>=) :: T x y a -> (a -> T y z b) -> T x z b
In the case of transform comprehensions, notice that the groups are
parameterised over some arbitrary type n
(provided it
has an fmap
, as well as
the comprehension being over an arbitrary monad.
GHC normally imports
Prelude.hi
files for you. If you'd
rather it didn't, then give it a
-XNoImplicitPrelude
option. The idea is
that you can then import a Prelude of your own. (But don't
call it Prelude
; the Haskell module
namespace is flat, and you must not conflict with any
Prelude module.)
Suppose you are importing a Prelude of your own
in order to define your own numeric class
hierarchy. It completely defeats that purpose if the
literal "1" means "Prelude.fromInteger
1
", which is what the Haskell Report specifies.
So the -XRebindableSyntax
flag causes
the following pieces of built-in syntax to refer to
whatever is in scope, not the Prelude
versions:
An integer literal 368
means
"fromInteger (368::Integer)
", rather than
"Prelude.fromInteger (368::Integer)
".
Fractional literals are handed in just the same way,
except that the translation is
fromRational (3.68::Rational)
.
The equality test in an overloaded numeric pattern
uses whatever (==)
is in scope.
The subtraction operation, and the
greater-than-or-equal test, in n+k
patterns
use whatever (-)
and (>=)
are in scope.
Negation (e.g. "- (f x)
")
means "negate (f x)
", both in numeric
patterns, and expressions.
Conditionals (e.g. "if
e1 then
e2 else
e3")
means "ifThenElse
e1 e2 e3". However case
expressions are unaffected.
"Do" notation is translated using whatever
functions (>>=)
,
(>>)
, and fail
,
are in scope (not the Prelude
versions). List comprehensions, mdo (Section 7.3.11, “The recursive do-notation
”), and parallel array
comprehensions, are unaffected.
Arrow
notation (see Section 7.17, “Arrow notation
”)
uses whatever arr
,
(>>>)
, first
,
app
, (|||)
and
loop
functions are in scope. But unlike the
other constructs, the types of these functions must match the
Prelude types very closely. Details are in flux; if you want
to use this, ask!
-XRebindableSyntax
implies -XNoImplicitPrelude
.
In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
even if that is a little unexpected. For example, the
static semantics of the literal 368
is exactly that of fromInteger (368::Integer)
; it's fine for
fromInteger
to have any of the types:
fromInteger :: Integer -> Integer fromInteger :: forall a. Foo a => Integer -> a fromInteger :: Num a => a -> Integer fromInteger :: Integer -> Bool -> Bool
Be warned: this is an experimental facility, with
fewer checks than usual. Use -dcore-lint
to typecheck the desugared program. If Core Lint is happy
you should be all right.
The -XPostfixOperators
flag enables a small
extension to the syntax of left operator sections, which allows you to
define postfix operators. The extension is this: the left section
(e !)
is equivalent (from the point of view of both type checking and execution) to the expression
((!) e)
(for any expression e
and operator (!)
.
The strict Haskell 98 interpretation is that the section is equivalent to
(\y -> (!) e y)
That is, the operator must be a function of two arguments. GHC allows it to take only one argument, and that in turn allows you to write the function postfix.
The extension does not extend to the left-hand side of function definitions; you must define such a function in prefix form.
The -XTupleSections
flag enables Python-style partially applied
tuple constructors. For example, the following program
(, True)
is considered to be an alternative notation for the more unwieldy alternative
\x -> (x, True)
You can omit any combination of arguments to the tuple, as in the following
(, "I", , , "Love", , 1337)
which translates to
\a b c d -> (a, "I", b, c, "Love", d, 1337)
If you have unboxed tuples enabled, tuple sections will also be available for them, like so
(# , True #)
Because there is no unboxed unit tuple, the following expression
(# #)
continues to stand for the unboxed singleton tuple data constructor.
The -XLambdaCase
flag enables expressions of the form
\case { p1 -> e1; ...; pN -> eN }
which is equivalent to
\freshName -> case freshName of { p1 -> e1; ...; pN -> eN }
Note that \case
starts a layout, so you can write
\case p1 -> e1 ... pN -> eN
The -XEmptyCase
flag enables
case expressions, or lambda-case expressions, that have no alternatives,
thus:
case e of { } -- No alternatives or \case { } -- -XLambdaCase is also required
This can be useful when you know that the expression being scrutinised has no non-bottom values. For example:
data Void f :: Void -> Int f x = case x of { }
With dependently-typed features it is more useful
(see Trac).
For example, consider these two candidate definitions of absurd
:
data a :==: b where Refl :: a :==: a absurd :: True :~: False -> a absurd x = error "absurd" -- (A) absurd x = case x of {} -- (B)
We much prefer (B). Why? Because GHC can figure out that (True :~: False)
is an empty type. So (B) has no partiality and GHC should be able to compile with
-fwarn-incomplete-patterns
. (Though the pattern match checking is not
yet clever enough to do that.)
On the other hand (A) looks dangerous, and GHC doesn't check to make
sure that, in fact, the function can never get called.
With -XMultiWayIf
flag GHC accepts conditional expressions
with multiple branches:
if | guard1 -> expr1 | ... | guardN -> exprN
which is roughly equivalent to
case () of _ | guard1 -> expr1 ... _ | guardN -> exprN
Multi-way if expressions introduce a new layout context. So the example above is equivalent to:
if { | guard1 -> expr1 ; | ... ; | guardN -> exprN }
The following behaves as expected:
if | guard1 -> if | guard2 -> expr2 | guard3 -> expr3 | guard4 -> expr4
because layout translates it as
if { | guard1 -> if { | guard2 -> expr2 ; | guard3 -> expr3 } ; | guard4 -> expr4 }
Layout with multi-way if works in the same way as other layout contexts, except that the semi-colons between guards in a multi-way if are optional. So it is not necessary to line up all the guards at the same column; this is consistent with the way guards work in function definitions and case expressions.
In record construction and record pattern matching it is entirely unambiguous which field is referred to, even if there are two different data types in scope with a common field name. For example:
module M where data S = MkS { x :: Int, y :: Bool } module Foo where import M data T = MkT { x :: Int } ok1 (MkS { x = n }) = n+1 -- Unambiguous ok2 n = MkT { x = n+1 } -- Unambiguous bad1 k = k { x = 3 } -- Ambiguous bad2 k = x k -- Ambiguous
Even though there are two x
's in scope,
it is clear that the x
in the pattern in the
definition of ok1
can only mean the field
x
from type S
. Similarly for
the function ok2
. However, in the record update
in bad1
and the record selection in bad2
it is not clear which of the two types is intended.
Haskell 98 regards all four as ambiguous, but with the
-XDisambiguateRecordFields
flag, GHC will accept
the former two. The rules are precisely the same as those for instance
declarations in Haskell 98, where the method names on the left-hand side
of the method bindings in an instance declaration refer unambiguously
to the method of that class (provided they are in scope at all), even
if there are other variables in scope with the same name.
This reduces the clutter of qualified names when you import two
records from different modules that use the same field name.
Some details:
Field disambiguation can be combined with punning (see Section 7.3.22, “Record puns ”). For example:
module Foo where import M x=True ok3 (MkS { x }) = x+1 -- Uses both disambiguation and punning
With -XDisambiguateRecordFields
you can use unqualified
field names even if the corresponding selector is only in scope qualified
For example, assuming the same module M
as in our earlier example, this is legal:
module Foo where import qualified M -- Note qualified ok4 (M.MkS { x = n }) = n+1 -- Unambiguous
Since the constructor MkS
is only in scope qualified, you must
name it M.MkS
, but the field x
does not need
to be qualified even though M.x
is in scope but x
is not. (In effect, it is qualified by the constructor.)
Record puns are enabled by the flag -XNamedFieldPuns
.
When using records, it is common to write a pattern that binds a variable with the same name as a record field, such as:
data C = C {a :: Int} f (C {a = a}) = a
Record punning permits the variable name to be elided, so one can simply write
f (C {a}) = a
to mean the same pattern as above. That is, in a record pattern, the
pattern a
expands into the pattern a =
a
for the same name a
.
Note that:
Record punning can also be used in an expression, writing, for example,
let a = 1 in C {a}
instead of
let a = 1 in C {a = a}
The expansion is purely syntactic, so the expanded right-hand side expression refers to the nearest enclosing variable that is spelled the same as the field name.
Puns and other patterns can be mixed in the same record:
data C = C {a :: Int, b :: Int} f (C {a, b = 4}) = a
Puns can be used wherever record patterns occur (e.g. in
let
bindings or at the top-level).
A pun on a qualified field name is expanded by stripping off the module qualifier. For example:
f (C {M.a}) = a
means
f (M.C {M.a = a}) = a
(This is useful if the field selector a
for constructor M.C
is only in scope in qualified form.)
Record wildcards are enabled by the flag -XRecordWildCards
.
This flag implies -XDisambiguateRecordFields
.
For records with many fields, it can be tiresome to write out each field individually in a record pattern, as in
data C = C {a :: Int, b :: Int, c :: Int, d :: Int} f (C {a = 1, b = b, c = c, d = d}) = b + c + d
Record wildcard syntax permits a "..
" in a record
pattern, where each elided field f
is replaced by the
pattern f = f
. For example, the above pattern can be
written as
f (C {a = 1, ..}) = b + c + d
More details:
Wildcards can be mixed with other patterns, including puns
(Section 7.3.22, “Record puns
”); for example, in a pattern C {a
= 1, b, ..})
. Additionally, record wildcards can be used
wherever record patterns occur, including in let
bindings and at the top-level. For example, the top-level binding
C {a = 1, ..} = e
defines b
, c
, and
d
.
Record wildcards can also be used in expressions, writing, for example,
let {a = 1; b = 2; c = 3; d = 4} in C {..}
in place of
let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
The expansion is purely syntactic, so the record wildcard expression refers to the nearest enclosing variables that are spelled the same as the omitted field names.
The "..
" expands to the missing
in-scope record fields.
Specifically the expansion of "C {..}
" includes
f
if and only if:
f
is a record field of constructor C
.
The record field f
is in scope somehow (either qualified or unqualified).
In the case of expressions (but not patterns),
the variable f
is in scope unqualified,
apart from the binding of the record selector itself.
For example
module M where data R = R { a,b,c :: Int } module X where import M( R(a,c) ) f b = R { .. }
The R{..}
expands to R{M.a=a}
,
omitting b
since the record field is not in scope,
and omitting c
since the variable c
is not in scope (apart from the binding of the
record selector c
, of course).
A careful reading of the Haskell 98 Report reveals that fixity
declarations (infix
, infixl
, and
infixr
) are permitted to appear inside local bindings
such those introduced by let
and
where
. However, the Haskell Report does not specify
the semantics of such bindings very precisely.
In GHC, a fixity declaration may accompany a local binding:
let f = ... infixr 3 `f` in ...
and the fixity declaration applies wherever the binding is in scope.
For example, in a let
, it applies in the right-hand
sides of other let
-bindings and the body of the
let
C. Or, in recursive do
expressions (Section 7.3.11, “The recursive do-notation
”), the local fixity
declarations of a let
statement scope over other
statements in the group, just as the bound name does.
Moreover, a local fixity declaration *must* accompany a local binding of that name: it is not possible to revise the fixity of name bound elsewhere, as in
let infixr 9 $ in ...
Because local fixity declarations are technically Haskell 98, no flag is necessary to enable them.
With the -XPackageImports
flag, GHC allows
import declarations to be qualified by the package name that the
module is intended to be imported from. For example:
import "network" Network.Socket
would import the module Network.Socket
from
the package network
(any version). This may
be used to disambiguate an import when the same module is
available from multiple packages, or is present in both the
current package being built and an external package.
The special package name this
can be used to
refer to the current package being built.
Note: you probably don't need to use this feature, it was added mainly so that we can build backwards-compatible versions of packages when APIs change. It can lead to fragile dependencies in the common case: modules occasionally move from one package to another, rendering any package-qualified imports broken.
With the -XSafe
, -XTrustworthy
and -XUnsafe
language flags, GHC extends
the import declaration syntax to take an optional safe
keyword after the import
keyword. This feature
is part of the Safe Haskell GHC extension. For example:
import safe qualified Network.Socket as NS
would import the module Network.Socket
with compilation only succeeding if Network.Socket can be
safely imported. For a description of when a import is
considered safe see Section 7.27, “Safe Haskell”
In an import or export list, such as
module M( f, (++) ) where ... import N( f, (++) ) ...
the entities f
and (++)
are values.
However, with type operators (Section 7.4.4, “Type operators”) it becomes possible
to declare (++)
as a type constructor. In that
case, how would you export or import it?
The -XExplicitNamespaces
extension allows you to prefix the name of
a type constructor in an import or export list with "type
" to
disambiguate this case, thus:
module M( f, type (++) ) where ... import N( f, type (++) ) ... module N( f, type (++) ) where data family a ++ b = L a | R b
The extension -XExplicitNamespaces
is implied by -XTypeOperators
and (for some reason) by -XTypeFamilies
.
Turning on an option that enables special syntax might cause working Haskell 98 code to fail to compile, perhaps because it uses a variable name which has become a reserved word. This section lists the syntax that is "stolen" by language extensions. We use notation and nonterminal names from the Haskell 98 lexical syntax (see the Haskell 98 Report). We only list syntax changes here that might affect existing working programs (i.e. "stolen" syntax). Many of these extensions will also enable new context-free syntax, but in all cases programs written to use the new syntax would not be compilable without the option enabled.
There are two classes of special syntax:
New reserved words and symbols: character sequences which are no longer available for use as identifiers in the program.
Other special syntax: sequences of characters that have a different meaning when this particular option is turned on.
The following syntax is stolen:
forall
Stolen (in types) by: -XExplicitForAll
, and hence by
-XScopedTypeVariables
,
-XLiberalTypeSynonyms
,
-XRankNTypes
,
-XExistentialQuantification
mdo
Stolen by: -XRecursiveDo
foreign
Stolen by: -XForeignFunctionInterface
rec
,
proc
, -<
,
>-
, -<<
,
>>-
, and (|
,
|)
brackets
Stolen by: -XArrows
?varid
Stolen by: -XImplicitParams
[|
,
[e|
, [p|
,
[d|
, [t|
,
$(
,
$$(
,
[||
,
[e||
,
$varid
,
$$varid
Stolen by: -XTemplateHaskell
[varid
|
Stolen by: -XQuasiQuotes
varid
{#
},
char
#
,
string
#
,
integer
#
,
float
#
,
float
##
Stolen by: -XMagicHash
(#
, #)
Stolen by: -XUnboxedTuples
varid
!
varid
Stolen by: -XBangPatterns
pattern
Stolen by: -XPatternSynonyms