7.4. Extensions to data types and type synonyms

7.4.1. Data types with no constructors

With the -XEmptyDataDecls flag (or equivalent LANGUAGE pragma), GHC lets you declare a data type with no constructors. For example:

  data S      -- S :: *
  data T a    -- T :: * -> *

Syntactically, the declaration lacks the "= constrs" part. The type can be parameterised over types of any kind, but if the kind is not * then an explicit kind annotation must be used (see Section 7.11.4, “Explicitly-kinded quantification”).

Such data types have only one value, namely bottom. Nevertheless, they can be useful when defining "phantom types".

7.4.2. Data type contexts

Haskell allows datatypes to be given contexts, e.g.

data Eq a => Set a = NilSet | ConsSet a (Set a)

give constructors with types:

NilSet :: Set a
ConsSet :: Eq a => a -> Set a -> Set a

This is widely considered a misfeature, and is going to be removed from the language. In GHC, it is controlled by the deprecated extension DatatypeContexts.

7.4.3. Infix type constructors, classes, and type variables

GHC allows type constructors, classes, and type variables to be operators, and to be written infix, very much like expressions. More specifically:

  • A type constructor or class can be an operator, beginning with a colon; e.g. :*:. The lexical syntax is the same as that for data constructors.

  • Data type and type-synonym declarations can be written infix, parenthesised if you want further arguments. E.g.

      data a :*: b = Foo a b
      type a :+: b = Either a b
      class a :=: b where ...
    
      data (a :**: b) x = Baz a b x
      type (a :++: b) y = Either (a,b) y
    

  • Types, and class constraints, can be written infix. For example

    	x :: Int :*: Bool
            f :: (a :=: b) => a -> b
      

  • A type variable can be an (unqualified) operator e.g. +. The lexical syntax is the same as that for variable operators, excluding "(.)", "(!)", and "(*)". In a binding position, the operator must be parenthesised. For example:

       type T (+) = Int + Int
       f :: T Either
       f = Left 3
    
       liftA2 :: Arrow (~>)
    	  => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c)
       liftA2 = ...
    

  • Back-quotes work as for expressions, both for type constructors and type variables; e.g. Int `Either` Bool, or Int `a` Bool. Similarly, parentheses work the same; e.g. (:*:) Int Bool.

  • Fixities may be declared for type constructors, or classes, just as for data constructors. However, one cannot distinguish between the two in a fixity declaration; a fixity declaration sets the fixity for a data constructor and the corresponding type constructor. For example:

      infixl 7 T, :*:
    

    sets the fixity for both type constructor T and data constructor T, and similarly for :*:. Int `a` Bool.

  • Function arrow is infixr with fixity 0. (This might change; I'm not sure what it should be.)

7.4.4. Liberalised type synonyms

Type synonyms are like macros at the type level, but Haskell 98 imposes many rules on individual synonym declarations. With the -XLiberalTypeSynonyms extension, GHC does validity checking on types only after expanding type synonyms. That means that GHC can be very much more liberal about type synonyms than Haskell 98.

  • You can write a forall (including overloading) in a type synonym, thus:

      type Discard a = forall b. Show b => a -> b -> (a, String)
    
      f :: Discard a
      f x y = (x, show y)
    
      g :: Discard Int -> (Int,String)    -- A rank-2 type
      g f = f 3 True
    

  • If you also use -XUnboxedTuples, you can write an unboxed tuple in a type synonym:

      type Pr = (# Int, Int #)
    
      h :: Int -> Pr
      h x = (# x, x #)
    

  • You can apply a type synonym to a forall type:

      type Foo a = a -> a -> Bool
    
      f :: Foo (forall b. b->b)
    

    After expanding the synonym, f has the legal (in GHC) type:

      f :: (forall b. b->b) -> (forall b. b->b) -> Bool
    

  • You can apply a type synonym to a partially applied type synonym:

      type Generic i o = forall x. i x -> o x
      type Id x = x
    
      foo :: Generic Id []
    

    After expanding the synonym, foo has the legal (in GHC) type:

      foo :: forall x. x -> [x]
    

GHC currently does kind checking before expanding synonyms (though even that could be changed.)

After expanding type synonyms, GHC does validity checking on types, looking for the following mal-formedness which isn't detected simply by kind checking:

  • Type constructor applied to a type involving for-alls.

  • Unboxed tuple on left of an arrow.

  • Partially-applied type synonym.

So, for example, this will be rejected:

  type Pr = (# Int, Int #)

  h :: Pr -> Int
  h x = ...

because GHC does not allow unboxed tuples on the left of a function arrow.

7.4.5. Existentially quantified data constructors

The idea of using existential quantification in data type declarations was suggested by Perry, and implemented in Hope+ (Nigel Perry, The Implementation of Practical Functional Programming Languages, PhD Thesis, University of London, 1991). It was later formalised by Laufer and Odersky (Polymorphic type inference and abstract data types, TOPLAS, 16(5), pp1411-1430, 1994). It's been in Lennart Augustsson's hbc Haskell compiler for several years, and proved very useful. Here's the idea. Consider the declaration:

  data Foo = forall a. MkFoo a (a -> Bool)
           | Nil

The data type Foo has two constructors with types:

  MkFoo :: forall a. a -> (a -> Bool) -> Foo
  Nil   :: Foo

Notice that the type variable a in the type of MkFoo does not appear in the data type itself, which is plain Foo. For example, the following expression is fine:

  [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]

Here, (MkFoo 3 even) packages an integer with a function even that maps an integer to Bool; and MkFoo 'c' isUpper packages a character with a compatible function. These two things are each of type Foo and can be put in a list.

What can we do with a value of type Foo?. In particular, what happens when we pattern-match on MkFoo?

  f (MkFoo val fn) = ???

Since all we know about val and fn is that they are compatible, the only (useful) thing we can do with them is to apply fn to val to get a boolean. For example:

  f :: Foo -> Bool
  f (MkFoo val fn) = fn val

What this allows us to do is to package heterogeneous values together with a bunch of functions that manipulate them, and then treat that collection of packages in a uniform manner. You can express quite a bit of object-oriented-like programming this way.

7.4.5.1. Why existential?

What has this to do with existential quantification? Simply that MkFoo has the (nearly) isomorphic type

  MkFoo :: (exists a . (a, a -> Bool)) -> Foo

But Haskell programmers can safely think of the ordinary universally quantified type given above, thereby avoiding adding a new existential quantification construct.

7.4.5.2. Existentials and type classes

An easy extension is to allow arbitrary contexts before the constructor. For example:

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

The two constructors have the types you'd expect:

Baz1 :: forall a. Eq a => a -> a -> Baz
Baz2 :: forall b. Show b => b -> (b -> b) -> Baz

But when pattern matching on Baz1 the matched values can be compared for equality, and when pattern matching on Baz2 the first matched value can be converted to a string (as well as applying the function to it). So this program is legal:

  f :: Baz -> String
  f (Baz1 p q) | p == q    = "Yes"
               | otherwise = "No"
  f (Baz2 v fn)            = show (fn v)

Operationally, in a dictionary-passing implementation, the constructors Baz1 and Baz2 must store the dictionaries for Eq and Show respectively, and extract it on pattern matching.

7.4.5.3. Record Constructors

GHC allows existentials to be used with records syntax as well. For example:

data Counter a = forall self. NewCounter
    { _this    :: self
    , _inc     :: self -> self
    , _display :: self -> IO ()
    , tag      :: a
    }

Here tag is a public field, with a well-typed selector function tag :: Counter a -> a. The self type is hidden from the outside; any attempt to apply _this, _inc or _display as functions will raise a compile-time error. In other words, GHC defines a record selector function only for fields whose type does not mention the existentially-quantified variables. (This example used an underscore in the fields for which record selectors will not be defined, but that is only programming style; GHC ignores them.)

To make use of these hidden fields, we need to create some helper functions:

inc :: Counter a -> Counter a
inc (NewCounter x i d t) = NewCounter
    { _this = i x, _inc = i, _display = d, tag = t }

display :: Counter a -> IO ()
display NewCounter{ _this = x, _display = d } = d x

Now we can define counters with different underlying implementations:

counterA :: Counter String
counterA = NewCounter
    { _this = 0, _inc = (1+), _display = print, tag = "A" }

counterB :: Counter String
counterB = NewCounter
    { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }

main = do
    display (inc counterA)         -- prints "1"
    display (inc (inc counterB))   -- prints "##"

Record update syntax is supported for existentials (and GADTs):

setTag :: Counter a -> a -> Counter a
setTag obj t = obj{ tag = t }

The rule for record update is this: the types of the updated fields may mention only the universally-quantified type variables of the data constructor. For GADTs, the field may mention only types that appear as a simple type-variable argument in the constructor's result type. For example:

data T a b where { T1 { f1::a, f2::b, f3::(b,c) } :: T a b } -- c is existential
upd1 t x = t { f1=x }   -- OK:   upd1 :: T a b -> a' -> T a' b
upd2 t x = t { f3=x }   -- BAD   (f3's type mentions c, which is
                        --        existentially quantified)

data G a b where { G1 { g1::a, g2::c } :: G a [c] }
upd3 g x = g { g1=x }   -- OK:   upd3 :: G a b -> c -> G c b
upd4 g x = g { g2=x }   -- BAD (f2's type mentions c, which is not a simple
                        --      type-variable argument in G1's result type)

7.4.5.4. Restrictions

There are several restrictions on the ways in which existentially-quantified constructors can be use.

  • When pattern matching, each pattern match introduces a new, distinct, type for each existential type variable. These types cannot be unified with any other type, nor can they escape from the scope of the pattern match. For example, these fragments are incorrect:

    f1 (MkFoo a f) = a
    

    Here, the type bound by MkFoo "escapes", because a is the result of f1. One way to see why this is wrong is to ask what type f1 has:

      f1 :: Foo -> a             -- Weird!
    

    What is this "a" in the result type? Clearly we don't mean this:

      f1 :: forall a. Foo -> a   -- Wrong!
    

    The original program is just plain wrong. Here's another sort of error

      f2 (Baz1 a b) (Baz1 p q) = a==q
    

    It's ok to say a==b or p==q, but a==q is wrong because it equates the two distinct types arising from the two Baz1 constructors.

  • You can't pattern-match on an existentially quantified constructor in a let or where group of bindings. So this is illegal:

      f3 x = a==b where { Baz1 a b = x }
    

    Instead, use a case expression:

      f3 x = case x of Baz1 a b -> a==b
    

    In general, you can only pattern-match on an existentially-quantified constructor in a case expression or in the patterns of a function definition. The reason for this restriction is really an implementation one. Type-checking binding groups is already a nightmare without existentials complicating the picture. Also an existential pattern binding at the top level of a module doesn't make sense, because it's not clear how to prevent the existentially-quantified type "escaping". So for now, there's a simple-to-state restriction. We'll see how annoying it is.

  • You can't use existential quantification for newtype declarations. So this is illegal:

      newtype T = forall a. Ord a => MkT a
    

    Reason: a value of type T must be represented as a pair of a dictionary for Ord t and a value of type t. That contradicts the idea that newtype should have no concrete representation. You can get just the same efficiency and effect by using data instead of newtype. If there is no overloading involved, then there is more of a case for allowing an existentially-quantified newtype, because the data version does carry an implementation cost, but single-field existentially quantified constructors aren't much use. So the simple restriction (no existential stuff on newtype) stands, unless there are convincing reasons to change it.

  • You can't use deriving to define instances of a data type with existentially quantified data constructors. Reason: in most cases it would not make sense. For example:;

    data T = forall a. MkT [a] deriving( Eq )
    

    To derive Eq in the standard way we would need to have equality between the single component of two MkT constructors:

    instance Eq T where
      (MkT a) == (MkT b) = ???
    

    But a and b have distinct types, and so can't be compared. It's just about possible to imagine examples in which the derived instance would make sense, but it seems altogether simpler simply to prohibit such declarations. Define your own instances!

7.4.6. Declaring data types with explicit constructor signatures

When the GADTSyntax extension is enabled, GHC allows you to declare an algebraic data type by giving the type signatures of constructors explicitly. For example:

  data Maybe a where
      Nothing :: Maybe a
      Just    :: a -> Maybe a

The form is called a "GADT-style declaration" because Generalised Algebraic Data Types, described in Section 7.4.7, “Generalised Algebraic Data Types (GADTs)”, can only be declared using this form.

Notice that GADT-style syntax generalises existential types (Section 7.4.5, “Existentially quantified data constructors ”). For example, these two declarations are equivalent:

  data Foo = forall a. MkFoo a (a -> Bool)
  data Foo' where { MKFoo :: a -> (a->Bool) -> Foo' }

Any data type that can be declared in standard Haskell-98 syntax can also be declared using GADT-style syntax. The choice is largely stylistic, but GADT-style declarations differ in one important respect: they treat class constraints on the data constructors differently. Specifically, if the constructor is given a type-class context, that context is made available by pattern matching. For example:

  data Set a where
    MkSet :: Eq a => [a] -> Set a

  makeSet :: Eq a => [a] -> Set a
  makeSet xs = MkSet (nub xs)

  insert :: a -> Set a -> Set a
  insert a (MkSet as) | a `elem` as = MkSet as
                      | otherwise   = MkSet (a:as)

A use of MkSet as a constructor (e.g. in the definition of makeSet) gives rise to a (Eq a) constraint, as you would expect. The new feature is that pattern-matching on MkSet (as in the definition of insert) makes available an (Eq a) context. In implementation terms, the MkSet constructor has a hidden field that stores the (Eq a) dictionary that is passed to MkSet; so when pattern-matching that dictionary becomes available for the right-hand side of the match. In the example, the equality dictionary is used to satisfy the equality constraint generated by the call to elem, so that the type of insert itself has no Eq constraint.

For example, one possible application is to reify dictionaries:

   data NumInst a where
     MkNumInst :: Num a => NumInst a

   intInst :: NumInst Int
   intInst = MkNumInst

   plus :: NumInst a -> a -> a -> a
   plus MkNumInst p q = p + q

Here, a value of type NumInst a is equivalent to an explicit (Num a) dictionary.

All this applies to constructors declared using the syntax of Section 7.4.5.2, “Existentials and type classes”. For example, the NumInst data type above could equivalently be declared like this:

   data NumInst a
      = Num a => MkNumInst (NumInst a)

Notice that, unlike the situation when declaring an existential, there is no forall, because the Num constrains the data type's universally quantified type variable a. A constructor may have both universal and existential type variables: for example, the following two declarations are equivalent:

   data T1 a
	= forall b. (Num a, Eq b) => MkT1 a b
   data T2 a where
	MkT2 :: (Num a, Eq b) => a -> b -> T2 a

All this behaviour contrasts with Haskell 98's peculiar treatment of contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report). In Haskell 98 the definition

  data Eq a => Set' a = MkSet' [a]

gives MkSet' the same type as MkSet above. But instead of making available an (Eq a) constraint, pattern-matching on MkSet' requires an (Eq a) constraint! GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations, GHC's behaviour is much more useful, as well as much more intuitive.

The rest of this section gives further details about GADT-style data type declarations.

  • The result type of each data constructor must begin with the type constructor being defined. If the result type of all constructors has the form T a1 ... an, where a1 ... an are distinct type variables, then the data type is ordinary; otherwise is a generalised data type (Section 7.4.7, “Generalised Algebraic Data Types (GADTs)”).

  • As with other type signatures, you can give a single signature for several data constructors. In this example we give a single signature for T1 and T2:

      data T a where
        T1,T2 :: a -> T a
        T3 :: T a
    

  • The type signature of each constructor is independent, and is implicitly universally quantified as usual. In particular, the type variable(s) in the "data T a where" header have no scope, and different constructors may have different universally-quantified type variables:

      data T a where        -- The 'a' has no scope
        T1,T2 :: b -> T b   -- Means forall b. b -> T b
        T3 :: T a           -- Means forall a. T a
    

  • A constructor signature may mention type class constraints, which can differ for different constructors. For example, this is fine:

      data T a where
        T1 :: Eq b => b -> b -> T b
        T2 :: (Show c, Ix c) => c -> [c] -> T c
    

    When pattern matching, these constraints are made available to discharge constraints in the body of the match. For example:

      f :: T a -> String
      f (T1 x y) | x==y      = "yes"
                 | otherwise = "no"
      f (T2 a b)             = show a
    

    Note that f is not overloaded; the Eq constraint arising from the use of == is discharged by the pattern match on T1 and similarly the Show constraint arising from the use of show.

  • Unlike a Haskell-98-style data type declaration, the type variable(s) in the "data Set a where" header have no scope. Indeed, one can write a kind signature instead:

      data Set :: * -> * where ...
    

    or even a mixture of the two:

      data Bar a :: (* -> *) -> * where ...
    

    The type variables (if given) may be explicitly kinded, so we could also write the header for Foo like this:

      data Bar a (b :: * -> *) where ...
    

  • You can use strictness annotations, in the obvious places in the constructor type:

      data Term a where
          Lit    :: !Int -> Term Int
          If     :: Term Bool -> !(Term a) -> !(Term a) -> Term a
          Pair   :: Term a -> Term b -> Term (a,b)
    

  • You can use a deriving clause on a GADT-style data type declaration. For example, these two declarations are equivalent

      data Maybe1 a where {
          Nothing1 :: Maybe1 a ;
          Just1    :: a -> Maybe1 a
        } deriving( Eq, Ord )
    
      data Maybe2 a = Nothing2 | Just2 a
           deriving( Eq, Ord )
    

  • The type signature may have quantified type variables that do not appear in the result type:

      data Foo where
         MkFoo :: a -> (a->Bool) -> Foo
         Nil   :: Foo
    

    Here the type variable a does not appear in the result type of either constructor. Although it is universally quantified in the type of the constructor, such a type variable is often called "existential". Indeed, the above declaration declares precisely the same type as the data Foo in Section 7.4.5, “Existentially quantified data constructors ”.

    The type may contain a class context too, of course:

      data Showable where
        MkShowable :: Show a => a -> Showable
    

  • You can use record syntax on a GADT-style data type declaration:

      data Person where
          Adult :: { name :: String, children :: [Person] } -> Person
          Child :: Show a => { name :: !String, funny :: a } -> Person
    

    As usual, for every constructor that has a field f, the type of field f must be the same (modulo alpha conversion). The Child constructor above shows that the signature may have a context, existentially-quantified variables, and strictness annotations, just as in the non-record case. (NB: the "type" that follows the double-colon is not really a type, because of the record syntax and strictness annotations. A "type" of this form can appear only in a constructor signature.)

  • Record updates are allowed with GADT-style declarations, only fields that have the following property: the type of the field mentions no existential type variables.

  • As in the case of existentials declared using the Haskell-98-like record syntax (Section 7.4.5.3, “Record Constructors”), record-selector functions are generated only for those fields that have well-typed selectors. Here is the example of that section, in GADT-style syntax:

    data Counter a where
        NewCounter { _this    :: self
                   , _inc     :: self -> self
                   , _display :: self -> IO ()
                   , tag      :: a
                   }
            :: Counter a
    

    As before, only one selector function is generated here, that for tag. Nevertheless, you can still use all the field names in pattern matching and record construction.

7.4.7. Generalised Algebraic Data Types (GADTs)

Generalised Algebraic Data Types generalise ordinary algebraic data types by allowing constructors to have richer return types. Here is an example:

  data Term a where
      Lit    :: Int -> Term Int
      Succ   :: Term Int -> Term Int
      IsZero :: Term Int -> Term Bool
      If     :: Term Bool -> Term a -> Term a -> Term a
      Pair   :: Term a -> Term b -> Term (a,b)

Notice that the return type of the constructors is not always Term a, as is the case with ordinary data types. This generality allows us to write a well-typed eval function for these Terms:

  eval :: Term a -> a
  eval (Lit i) 	    = i
  eval (Succ t)     = 1 + eval t
  eval (IsZero t)   = eval t == 0
  eval (If b e1 e2) = if eval b then eval e1 else eval e2
  eval (Pair e1 e2) = (eval e1, eval e2)

The key point about GADTs is that pattern matching causes type refinement. For example, in the right hand side of the equation

  eval :: Term a -> a
  eval (Lit i) =  ...

the type a is refined to Int. That's the whole point! A precise specification of the type rules is beyond what this user manual aspires to, but the design closely follows that described in the paper Simple unification-based type inference for GADTs, (ICFP 2006). The general principle is this: type refinement is only carried out based on user-supplied type annotations. So if no type signature is supplied for eval, no type refinement happens, and lots of obscure error messages will occur. However, the refinement is quite general. For example, if we had:

  eval :: Term a -> a -> a
  eval (Lit i) j =  i+j

the pattern match causes the type a to be refined to Int (because of the type of the constructor Lit), and that refinement also applies to the type of j, and the result type of the case expression. Hence the addition i+j is legal.

These and many other examples are given in papers by Hongwei Xi, and Tim Sheard. There is a longer introduction on the wiki, and Ralf Hinze's Fun with phantom types also has a number of examples. Note that papers may use different notation to that implemented in GHC.

The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with -XGADTs. The -XGADTs flag also sets -XRelaxedPolyRec.

  • A GADT can only be declared using GADT-style syntax (Section 7.4.6, “Declaring data types with explicit constructor signatures”); the old Haskell-98 syntax for data declarations always declares an ordinary data type. The result type of each constructor must begin with the type constructor being defined, but for a GADT the arguments to the type constructor can be arbitrary monotypes. For example, in the Term data type above, the type of each constructor must end with Term ty, but the ty need not be a type variable (e.g. the Lit constructor).

  • It is permitted to declare an ordinary algebraic data type using GADT-style syntax. What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors whose result type is not just T a b.

  • You cannot use a deriving clause for a GADT; only for an ordinary data type.

  • As mentioned in Section 7.4.6, “Declaring data types with explicit constructor signatures”, record syntax is supported. For example:

      data Term a where
          Lit    { val  :: Int }      :: Term Int
          Succ   { num  :: Term Int } :: Term Int
          Pred   { num  :: Term Int } :: Term Int
          IsZero { arg  :: Term Int } :: Term Bool
          Pair   { arg1 :: Term a
                 , arg2 :: Term b
                 }                    :: Term (a,b)
          If     { cnd  :: Term Bool
                 , tru  :: Term a
                 , fls  :: Term a
                 }                    :: Term a
    

    However, for GADTs there is the following additional constraint: every constructor that has a field f must have the same result type (modulo alpha conversion) Hence, in the above example, we cannot merge the num and arg fields above into a single name. Although their field types are both Term Int, their selector functions actually have different types:

      num :: Term Int -> Term Int
      arg :: Term Bool -> Term Int
    

  • When pattern-matching against data constructors drawn from a GADT, for example in a case expression, the following rules apply:

    • The type of the scrutinee must be rigid.

    • The type of the entire case expression must be rigid.

    • The type of any free variable mentioned in any of the case alternatives must be rigid.

    A type is "rigid" if it is completely known to the compiler at its binding site. The easiest way to ensure that a variable a rigid type is to give it a type signature. For more precise details see Simple unification-based type inference for GADTs . The criteria implemented by GHC are given in the Appendix.