With the -XEmptyDataDecls
flag (or equivalent LANGUAGE pragma),
GHC lets you declare a data type with no constructors. For example:
data S -- S :: * data T a -- T :: * -> *
Syntactically, the declaration lacks the "= constrs" part. The
type can be parameterised over types of any kind, but if the kind is
not *
then an explicit kind annotation must be used
(see Section 7.12.4, “Explicitly-kinded quantification”).
Such data types have only one value, namely bottom. Nevertheless, they can be useful when defining "phantom types".
Haskell allows datatypes to be given contexts, e.g.
data Eq a => Set a = NilSet | ConsSet a (Set a)
give constructors with types:
NilSet :: Set a ConsSet :: Eq a => a -> Set a -> Set a
This is widely considered a misfeature, and is going to be removed from
the language. In GHC, it is controlled by the deprecated extension
DatatypeContexts
.
GHC allows type constructors, classes, and type variables to be operators, and to be written infix, very much like expressions. More specifically:
A type constructor or class can be an operator, beginning with a colon; e.g. :*:
.
The lexical syntax is the same as that for data constructors.
Data type and type-synonym declarations can be written infix, parenthesised if you want further arguments. E.g.
data a :*: b = Foo a b type a :+: b = Either a b class a :=: b where ... data (a :**: b) x = Baz a b x type (a :++: b) y = Either (a,b) y
Types, and class constraints, can be written infix. For example
x :: Int :*: Bool f :: (a :=: b) => a -> b
A type variable can be an (unqualified) operator e.g. +
.
The lexical syntax is the same as that for variable operators, excluding "(.)",
"(!)", and "(*)". In a binding position, the operator must be
parenthesised. For example:
type T (+) = Int + Int f :: T Either f = Left 3 liftA2 :: Arrow (~>) => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c) liftA2 = ...
Back-quotes work
as for expressions, both for type constructors and type variables; e.g. Int `Either` Bool
, or
Int `a` Bool
. Similarly, parentheses work the same; e.g. (:*:) Int Bool
.
Fixities may be declared for type constructors, or classes, just as for data constructors. However, one cannot distinguish between the two in a fixity declaration; a fixity declaration sets the fixity for a data constructor and the corresponding type constructor. For example:
infixl 7 T, :*:
sets the fixity for both type constructor T
and data constructor T
,
and similarly for :*:
.
Int `a` Bool
.
Function arrow is infixr
with fixity 0. (This might change; I'm not sure what it should be.)
Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
on individual synonym declarations.
With the -XLiberalTypeSynonyms
extension,
GHC does validity checking on types only after expanding type synonyms.
That means that GHC can be very much more liberal about type synonyms than Haskell 98.
You can write a forall
(including overloading)
in a type synonym, thus:
type Discard a = forall b. Show b => a -> b -> (a, String) f :: Discard a f x y = (x, show y) g :: Discard Int -> (Int,String) -- A rank-2 type g f = f 3 True
If you also use -XUnboxedTuples
,
you can write an unboxed tuple in a type synonym:
type Pr = (# Int, Int #) h :: Int -> Pr h x = (# x, x #)
You can apply a type synonym to a forall type:
type Foo a = a -> a -> Bool f :: Foo (forall b. b->b)
After expanding the synonym, f
has the legal (in GHC) type:
f :: (forall b. b->b) -> (forall b. b->b) -> Bool
You can apply a type synonym to a partially applied type synonym:
type Generic i o = forall x. i x -> o x type Id x = x foo :: Generic Id []
After expanding the synonym, foo
has the legal (in GHC) type:
foo :: forall x. x -> [x]
GHC currently does kind checking before expanding synonyms (though even that could be changed.)
After expanding type synonyms, GHC does validity checking on types, looking for the following mal-formedness which isn't detected simply by kind checking:
Type constructor applied to a type involving for-alls.
Unboxed tuple on left of an arrow.
Partially-applied type synonym.
So, for example, this will be rejected:
type Pr = (# Int, Int #) h :: Pr -> Int h x = ...
because GHC does not allow unboxed tuples on the left of a function arrow.
The idea of using existential quantification in data type declarations was suggested by Perry, and implemented in Hope+ (Nigel Perry, The Implementation of Practical Functional Programming Languages, PhD Thesis, University of London, 1991). It was later formalised by Laufer and Odersky (Polymorphic type inference and abstract data types, TOPLAS, 16(5), pp1411-1430, 1994). It's been in Lennart Augustsson's hbc Haskell compiler for several years, and proved very useful. Here's the idea. Consider the declaration:
data Foo = forall a. MkFoo a (a -> Bool) | Nil
The data type Foo
has two constructors with types:
MkFoo :: forall a. a -> (a -> Bool) -> Foo Nil :: Foo
Notice that the type variable a
in the type of MkFoo
does not appear in the data type itself, which is plain Foo
.
For example, the following expression is fine:
[MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
Here, (MkFoo 3 even)
packages an integer with a function
even
that maps an integer to Bool
; and MkFoo 'c'
isUpper
packages a character with a compatible function. These
two things are each of type Foo
and can be put in a list.
What can we do with a value of type Foo
?. In particular,
what happens when we pattern-match on MkFoo
?
f (MkFoo val fn) = ???
Since all we know about val
and fn
is that they
are compatible, the only (useful) thing we can do with them is to
apply fn
to val
to get a boolean. For example:
f :: Foo -> Bool f (MkFoo val fn) = fn val
What this allows us to do is to package heterogeneous values together with a bunch of functions that manipulate them, and then treat that collection of packages in a uniform manner. You can express quite a bit of object-oriented-like programming this way.
What has this to do with existential quantification?
Simply that MkFoo
has the (nearly) isomorphic type
MkFoo :: (exists a . (a, a -> Bool)) -> Foo
But Haskell programmers can safely think of the ordinary universally quantified type given above, thereby avoiding adding a new existential quantification construct.
An easy extension is to allow arbitrary contexts before the constructor. For example:
data Baz = forall a. Eq a => Baz1 a a | forall b. Show b => Baz2 b (b -> b)
The two constructors have the types you'd expect:
Baz1 :: forall a. Eq a => a -> a -> Baz Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
But when pattern matching on Baz1
the matched values can be compared
for equality, and when pattern matching on Baz2
the first matched
value can be converted to a string (as well as applying the function to it).
So this program is legal:
f :: Baz -> String f (Baz1 p q) | p == q = "Yes" | otherwise = "No" f (Baz2 v fn) = show (fn v)
Operationally, in a dictionary-passing implementation, the
constructors Baz1
and Baz2
must store the
dictionaries for Eq
and Show
respectively, and
extract it on pattern matching.
GHC allows existentials to be used with records syntax as well. For example:
data Counter a = forall self. NewCounter { _this :: self , _inc :: self -> self , _display :: self -> IO () , tag :: a }
Here tag
is a public field, with a well-typed selector
function tag :: Counter a -> a
. The self
type is hidden from the outside; any attempt to apply _this
,
_inc
or _display
as functions will raise a
compile-time error. In other words, GHC defines a record selector function
only for fields whose type does not mention the existentially-quantified variables.
(This example used an underscore in the fields for which record selectors
will not be defined, but that is only programming style; GHC ignores them.)
To make use of these hidden fields, we need to create some helper functions:
inc :: Counter a -> Counter a inc (NewCounter x i d t) = NewCounter { _this = i x, _inc = i, _display = d, tag = t } display :: Counter a -> IO () display NewCounter{ _this = x, _display = d } = d x
Now we can define counters with different underlying implementations:
counterA :: Counter String counterA = NewCounter { _this = 0, _inc = (1+), _display = print, tag = "A" } counterB :: Counter String counterB = NewCounter { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" } main = do display (inc counterA) -- prints "1" display (inc (inc counterB)) -- prints "##"
Record update syntax is supported for existentials (and GADTs):
setTag :: Counter a -> a -> Counter a setTag obj t = obj{ tag = t }
The rule for record update is this: the types of the updated fields may mention only the universally-quantified type variables of the data constructor. For GADTs, the field may mention only types that appear as a simple type-variable argument in the constructor's result type. For example:
data T a b where { T1 { f1::a, f2::b, f3::(b,c) } :: T a b } -- c is existential upd1 t x = t { f1=x } -- OK: upd1 :: T a b -> a' -> T a' b upd2 t x = t { f3=x } -- BAD (f3's type mentions c, which is -- existentially quantified) data G a b where { G1 { g1::a, g2::c } :: G a [c] } upd3 g x = g { g1=x } -- OK: upd3 :: G a b -> c -> G c b upd4 g x = g { g2=x } -- BAD (f2's type mentions c, which is not a simple -- type-variable argument in G1's result type)
There are several restrictions on the ways in which existentially-quantified constructors can be use.
When pattern matching, each pattern match introduces a new, distinct, type for each existential type variable. These types cannot be unified with any other type, nor can they escape from the scope of the pattern match. For example, these fragments are incorrect:
f1 (MkFoo a f) = a
Here, the type bound by MkFoo
"escapes", because a
is the result of f1
. One way to see why this is wrong is to
ask what type f1
has:
f1 :: Foo -> a -- Weird!
What is this "a
" in the result type? Clearly we don't mean
this:
f1 :: forall a. Foo -> a -- Wrong!
The original program is just plain wrong. Here's another sort of error
f2 (Baz1 a b) (Baz1 p q) = a==q
It's ok to say a==b
or p==q
, but
a==q
is wrong because it equates the two distinct types arising
from the two Baz1
constructors.
You can't pattern-match on an existentially quantified
constructor in a let
or where
group of
bindings. So this is illegal:
f3 x = a==b where { Baz1 a b = x }
Instead, use a case
expression:
f3 x = case x of Baz1 a b -> a==b
In general, you can only pattern-match
on an existentially-quantified constructor in a case
expression or
in the patterns of a function definition.
The reason for this restriction is really an implementation one.
Type-checking binding groups is already a nightmare without
existentials complicating the picture. Also an existential pattern
binding at the top level of a module doesn't make sense, because it's
not clear how to prevent the existentially-quantified type "escaping".
So for now, there's a simple-to-state restriction. We'll see how
annoying it is.
You can't use existential quantification for newtype
declarations. So this is illegal:
newtype T = forall a. Ord a => MkT a
Reason: a value of type T
must be represented as a
pair of a dictionary for Ord t
and a value of type
t
. That contradicts the idea that
newtype
should have no concrete representation.
You can get just the same efficiency and effect by using
data
instead of newtype
. If
there is no overloading involved, then there is more of a case for
allowing an existentially-quantified newtype
,
because the data
version does carry an
implementation cost, but single-field existentially quantified
constructors aren't much use. So the simple restriction (no
existential stuff on newtype
) stands, unless there
are convincing reasons to change it.
You can't use deriving
to define instances of a
data type with existentially quantified data constructors.
Reason: in most cases it would not make sense. For example:;
data T = forall a. MkT [a] deriving( Eq )
To derive Eq
in the standard way we would need to have equality
between the single component of two MkT
constructors:
instance Eq T where (MkT a) == (MkT b) = ???
But a
and b
have distinct types, and so can't be compared.
It's just about possible to imagine examples in which the derived instance
would make sense, but it seems altogether simpler simply to prohibit such
declarations. Define your own instances!
When the GADTSyntax
extension is enabled,
GHC allows you to declare an algebraic data type by
giving the type signatures of constructors explicitly. For example:
data Maybe a where Nothing :: Maybe a Just :: a -> Maybe a
The form is called a "GADT-style declaration" because Generalised Algebraic Data Types, described in Section 7.4.7, “Generalised Algebraic Data Types (GADTs)”, can only be declared using this form.
Notice that GADT-style syntax generalises existential types (Section 7.4.5, “Existentially quantified data constructors ”). For example, these two declarations are equivalent:
data Foo = forall a. MkFoo a (a -> Bool) data Foo' where { MKFoo :: a -> (a->Bool) -> Foo' }
Any data type that can be declared in standard Haskell-98 syntax can also be declared using GADT-style syntax. The choice is largely stylistic, but GADT-style declarations differ in one important respect: they treat class constraints on the data constructors differently. Specifically, if the constructor is given a type-class context, that context is made available by pattern matching. For example:
data Set a where MkSet :: Eq a => [a] -> Set a makeSet :: Eq a => [a] -> Set a makeSet xs = MkSet (nub xs) insert :: a -> Set a -> Set a insert a (MkSet as) | a `elem` as = MkSet as | otherwise = MkSet (a:as)
A use of MkSet
as a constructor (e.g. in the definition of makeSet
)
gives rise to a (Eq a)
constraint, as you would expect. The new feature is that pattern-matching on MkSet
(as in the definition of insert
) makes available an (Eq a)
context. In implementation terms, the MkSet
constructor has a hidden field that stores
the (Eq a)
dictionary that is passed to MkSet
; so
when pattern-matching that dictionary becomes available for the right-hand side of the match.
In the example, the equality dictionary is used to satisfy the equality constraint
generated by the call to elem
, so that the type of
insert
itself has no Eq
constraint.
For example, one possible application is to reify dictionaries:
data NumInst a where MkNumInst :: Num a => NumInst a intInst :: NumInst Int intInst = MkNumInst plus :: NumInst a -> a -> a -> a plus MkNumInst p q = p + q
Here, a value of type NumInst a
is equivalent
to an explicit (Num a)
dictionary.
All this applies to constructors declared using the syntax of Section 7.4.5.2, “Existentials and type classes”.
For example, the NumInst
data type above could equivalently be declared
like this:
data NumInst a = Num a => MkNumInst (NumInst a)
Notice that, unlike the situation when declaring an existential, there is
no forall
, because the Num
constrains the
data type's universally quantified type variable a
.
A constructor may have both universal and existential type variables: for example,
the following two declarations are equivalent:
data T1 a = forall b. (Num a, Eq b) => MkT1 a b data T2 a where MkT2 :: (Num a, Eq b) => a -> b -> T2 a
All this behaviour contrasts with Haskell 98's peculiar treatment of contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report). In Haskell 98 the definition
data Eq a => Set' a = MkSet' [a]
gives MkSet'
the same type as MkSet
above. But instead of
making available an (Eq a)
constraint, pattern-matching
on MkSet'
requires an (Eq a)
constraint!
GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations,
GHC's behaviour is much more useful, as well as much more intuitive.
The rest of this section gives further details about GADT-style data type declarations.
The result type of each data constructor must begin with the type constructor being defined.
If the result type of all constructors
has the form T a1 ... an
, where a1 ... an
are distinct type variables, then the data type is ordinary;
otherwise is a generalised data type (Section 7.4.7, “Generalised Algebraic Data Types (GADTs)”).
As with other type signatures, you can give a single signature for several data constructors.
In this example we give a single signature for T1
and T2
:
data T a where T1,T2 :: a -> T a T3 :: T a
The type signature of
each constructor is independent, and is implicitly universally quantified as usual.
In particular, the type variable(s) in the "data T a where
" header
have no scope, and different constructors may have different universally-quantified type variables:
data T a where -- The 'a' has no scope T1,T2 :: b -> T b -- Means forall b. b -> T b T3 :: T a -- Means forall a. T a
A constructor signature may mention type class constraints, which can differ for different constructors. For example, this is fine:
data T a where T1 :: Eq b => b -> b -> T b T2 :: (Show c, Ix c) => c -> [c] -> T c
When pattern matching, these constraints are made available to discharge constraints in the body of the match. For example:
f :: T a -> String f (T1 x y) | x==y = "yes" | otherwise = "no" f (T2 a b) = show a
Note that f
is not overloaded; the Eq
constraint arising
from the use of ==
is discharged by the pattern match on T1
and similarly the Show
constraint arising from the use of show
.
Unlike a Haskell-98-style
data type declaration, the type variable(s) in the "data Set a where
" header
have no scope. Indeed, one can write a kind signature instead:
data Set :: * -> * where ...
or even a mixture of the two:
data Bar a :: (* -> *) -> * where ...
The type variables (if given) may be explicitly kinded, so we could also write the header for Foo
like this:
data Bar a (b :: * -> *) where ...
You can use strictness annotations, in the obvious places in the constructor type:
data Term a where Lit :: !Int -> Term Int If :: Term Bool -> !(Term a) -> !(Term a) -> Term a Pair :: Term a -> Term b -> Term (a,b)
You can use a deriving
clause on a GADT-style data type
declaration. For example, these two declarations are equivalent
data Maybe1 a where { Nothing1 :: Maybe1 a ; Just1 :: a -> Maybe1 a } deriving( Eq, Ord ) data Maybe2 a = Nothing2 | Just2 a deriving( Eq, Ord )
The type signature may have quantified type variables that do not appear in the result type:
data Foo where MkFoo :: a -> (a->Bool) -> Foo Nil :: Foo
Here the type variable a
does not appear in the result type
of either constructor.
Although it is universally quantified in the type of the constructor, such
a type variable is often called "existential".
Indeed, the above declaration declares precisely the same type as
the data Foo
in Section 7.4.5, “Existentially quantified data constructors
”.
The type may contain a class context too, of course:
data Showable where MkShowable :: Show a => a -> Showable
You can use record syntax on a GADT-style data type declaration:
data Person where Adult :: { name :: String, children :: [Person] } -> Person Child :: Show a => { name :: !String, funny :: a } -> Person
As usual, for every constructor that has a field f
, the type of
field f
must be the same (modulo alpha conversion).
The Child
constructor above shows that the signature
may have a context, existentially-quantified variables, and strictness annotations,
just as in the non-record case. (NB: the "type" that follows the double-colon
is not really a type, because of the record syntax and strictness annotations.
A "type" of this form can appear only in a constructor signature.)
Record updates are allowed with GADT-style declarations, only fields that have the following property: the type of the field mentions no existential type variables.
As in the case of existentials declared using the Haskell-98-like record syntax (Section 7.4.5.3, “Record Constructors”), record-selector functions are generated only for those fields that have well-typed selectors. Here is the example of that section, in GADT-style syntax:
data Counter a where NewCounter { _this :: self , _inc :: self -> self , _display :: self -> IO () , tag :: a } :: Counter a
As before, only one selector function is generated here, that for tag
.
Nevertheless, you can still use all the field names in pattern matching and record construction.
In a GADT-style data type declaration there is no obvious way to specify that a data constructor
should be infix, which makes a difference if you derive Show
for the type.
(Data constructors declared infix are displayed infix by the derived show
.)
So GHC implements the following design: a data constructor declared in a GADT-style data type
declaration is displayed infix by Show
iff (a) it is an operator symbol,
(b) it has two arguments, (c) it has a programmer-supplied fixity declaration. For example
infix 6 (:--:) data T a where (:--:) :: Int -> Bool -> T Int
Generalised Algebraic Data Types generalise ordinary algebraic data types by allowing constructors to have richer return types. Here is an example:
data Term a where Lit :: Int -> Term Int Succ :: Term Int -> Term Int IsZero :: Term Int -> Term Bool If :: Term Bool -> Term a -> Term a -> Term a Pair :: Term a -> Term b -> Term (a,b)
Notice that the return type of the constructors is not always Term a
, as is the
case with ordinary data types. This generality allows us to
write a well-typed eval
function
for these Terms
:
eval :: Term a -> a eval (Lit i) = i eval (Succ t) = 1 + eval t eval (IsZero t) = eval t == 0 eval (If b e1 e2) = if eval b then eval e1 else eval e2 eval (Pair e1 e2) = (eval e1, eval e2)
The key point about GADTs is that pattern matching causes type refinement. For example, in the right hand side of the equation
eval :: Term a -> a eval (Lit i) = ...
the type a
is refined to Int
. That's the whole point!
A precise specification of the type rules is beyond what this user manual aspires to,
but the design closely follows that described in
the paper Simple
unification-based type inference for GADTs,
(ICFP 2006).
The general principle is this: type refinement is only carried out
based on user-supplied type annotations.
So if no type signature is supplied for eval
, no type refinement happens,
and lots of obscure error messages will
occur. However, the refinement is quite general. For example, if we had:
eval :: Term a -> a -> a eval (Lit i) j = i+j
the pattern match causes the type a
to be refined to Int
(because of the type
of the constructor Lit
), and that refinement also applies to the type of j
, and
the result type of the case
expression. Hence the addition i+j
is legal.
These and many other examples are given in papers by Hongwei Xi, and Tim Sheard. There is a longer introduction on the wiki, and Ralf Hinze's Fun with phantom types also has a number of examples. Note that papers may use different notation to that implemented in GHC.
The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with
-XGADTs
. The -XGADTs
flag also sets -XRelaxedPolyRec
.
A GADT can only be declared using GADT-style syntax (Section 7.4.6, “Declaring data types with explicit constructor signatures”);
the old Haskell-98 syntax for data declarations always declares an ordinary data type.
The result type of each constructor must begin with the type constructor being defined,
but for a GADT the arguments to the type constructor can be arbitrary monotypes.
For example, in the Term
data
type above, the type of each constructor must end with Term ty
, but
the ty
need not be a type variable (e.g. the Lit
constructor).
It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
whose result type is not just T a b
.
You cannot use a deriving
clause for a GADT; only for
an ordinary data type.
As mentioned in Section 7.4.6, “Declaring data types with explicit constructor signatures”, record syntax is supported. For example:
data Term a where Lit { val :: Int } :: Term Int Succ { num :: Term Int } :: Term Int Pred { num :: Term Int } :: Term Int IsZero { arg :: Term Int } :: Term Bool Pair { arg1 :: Term a , arg2 :: Term b } :: Term (a,b) If { cnd :: Term Bool , tru :: Term a , fls :: Term a } :: Term a
However, for GADTs there is the following additional constraint:
every constructor that has a field f
must have
the same result type (modulo alpha conversion)
Hence, in the above example, we cannot merge the num
and arg
fields above into a
single name. Although their field types are both Term Int
,
their selector functions actually have different types:
num :: Term Int -> Term Int arg :: Term Bool -> Term Int
When pattern-matching against data constructors drawn from a GADT,
for example in a case
expression, the following rules apply:
The type of the scrutinee must be rigid.
The type of the entire case
expression must be rigid.
The type of any free variable mentioned in any of
the case
alternatives must be rigid.
A type is "rigid" if it is completely known to the compiler at its binding site. The easiest way to ensure that a variable a rigid type is to give it a type signature. For more precise details see Simple unification-based type inference for GADTs . The criteria implemented by GHC are given in the Appendix.