*Indexed type families* are a new GHC extension to
facilitate type-level
programming. Type families are a generalisation of *associated
data types*
(“Associated
Types with Class”, M. Chakravarty, G. Keller, S. Peyton Jones,
and S. Marlow. In Proceedings of “The 32nd Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages (POPL'05)”, pages
1-13, ACM Press, 2005) and *associated type synonyms*
(“Type
Associated Type Synonyms”. M. Chakravarty, G. Keller, and
S. Peyton Jones.
In Proceedings of “The Tenth ACM SIGPLAN International Conference on
Functional Programming”, ACM Press, pages 241-253, 2005). Type families
themselves are described in the paper “Type
Checking with Open Type Functions”, T. Schrijvers,
S. Peyton-Jones,
M. Chakravarty, and M. Sulzmann, in Proceedings of “ICFP 2008: The
13th ACM SIGPLAN International Conference on Functional
Programming”, ACM Press, pages 51-62, 2008. Type families
essentially provide type-indexed data types and named functions on types,
which are useful for generic programming and highly parameterised library
interfaces as well as interfaces with enhanced static information, much like
dependent types. They might also be regarded as an alternative to functional
dependencies, but provide a more functional style of type-level programming
than the relational style of functional dependencies.

Indexed type families, or type families for short, are type constructors that
represent sets of types. Set members are denoted by supplying the type family
constructor with type parameters, which are called *type
indices*. The
difference between vanilla parametrised type constructors and family
constructors is much like between parametrically polymorphic functions and
(ad-hoc polymorphic) methods of type classes. Parametric polymorphic functions
behave the same at all type instances, whereas class methods can change their
behaviour in dependence on the class type parameters. Similarly, vanilla type
constructors imply the same data representation for all type instances, but
family constructors can have varying representation types for varying type
indices.

Indexed type families come in two flavours: *data
families* and *type synonym
families*. They are the indexed family variants of algebraic
data types and type synonyms, respectively. The instances of data families
can be data types and newtypes.

Type families are enabled by the flag `-XTypeFamilies`

.
Additional information on the use of type families in GHC is available on
the
Haskell wiki page on type families.

Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indexes to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

Indexed data families are introduced by a signature, such as

data family GMap k :: * -> *

The special `family`

distinguishes family from standard
data declarations. The result kind annotation is optional and, as
usual, defaults to `*`

if omitted. An example is

data family Array e

Named arguments can also be given explicit kind signatures if needed.
Just as with
[http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT
declarations] named arguments are entirely optional, so that we can
declare `Array`

alternatively with

data family Array :: * -> *

When a data family is declared as part of a type class, we drop
the `family`

special. The `GMap`

declaration takes the following form

class GMapKey k where data GMap k :: * -> * ...

In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indexes. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. Hence, the following contrived example is admissible:

class C a b c where data T c a :: *

Instance declarations of data and newtype families are very similar to
standard data and newtype declarations. The only two differences are
that the keyword `data`

or `newtype`

is followed by `instance`

and that some or all of the
type arguments can be non-variable types, but may not contain forall
types or type synonym families. However, data families are generally
allowed in type parameters, and type synonyms are allowed as long as
they are fully applied and expand to a type that is itself admissible -
exactly as this is required for occurrences of type synonyms in class
instance parameters. For example, the `Either`

instance for `GMap`

is

data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)

In this example, the declaration has only one variant. In general, it can be any number.

Data and newtype instance declarations are only legit when an
appropriate family declaration is in scope - just like class instances
require the class declaration to be visible. Moreover, each instance
declaration has to conform to the kind determined by its family
declaration. This implies that the number of parameters of an instance
declaration matches the arity determined by the kind of the family.
Although, all data families are declared with
the `data`

keyword, instances can be
either `data`

or `newtype`

s, or a mix
of both.

Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:

data family T a data instance T Int = A data instance T Char = B nonsence :: T a -> Int nonsence A = 1 -- WRONG: These two equations together... nonsence B = 2 -- ...will produce a type error.

Given the functionality provided by GADTs (Generalised Algebraic Data
Types), it might seem as if a definition, such as the above, should be
feasible. However, type families are - in contrast to GADTs - are
*open;* i.e., new instances can always be added,
possibly in other
modules. Supporting pattern matching across different data instances
would require a form of extensible case construct.

When an associated data family instance is declared within a type
class instance, we drop the `instance`

keyword in the
family instance. So, the `Either`

instance
for `GMap`

becomes:

instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) ...

The most important point about associated family instances is that the
type indexes corresponding to class parameters must be identical to
the type given in the instance head; here this is the first argument
of `GMap`

, namely `Either a b`

,
which coincides with the only class parameter. Any parameters to the
family constructor that do not correspond to class parameters, need to
be variables in every instance; here this is the
variable `v`

.

Instances for an associated family can only appear as part of
instances declarations of the class in which the family was declared -
just as with the equations of the methods of a class. Also in
correspondence to how methods are handled, declarations of associated
types can be omitted in class instances. If an associated family
instance is omitted, the corresponding instance type is not inhabited;
i.e., only diverging expressions, such
as `undefined`

, can assume the type.

In the case of multi-parameter type classes, the visibility of class
parameters in the right-hand side of associated family instances
depends *solely* on the parameters of the data
family. As an example, consider the simple class declaration

class C a b where data T a

Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:

instance C [c] d where data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope

Here, the right-hand side of the data instance mentions the type
variable `d`

that does not occur in its left-hand
side. We cannot admit such data instances as they would compromise
type safety.

Type class instances of instances of data families can be defined as
usual, and in particular data instance declarations can
have `deriving`

clauses. For example, we can write

data GMap () v = GMapUnit (Maybe v) deriving Show

which implicitly defines an instance of the form

instance Show v => Show (GMap () v) where ...

Note that class instances are always for
particular *instances* of a data family and never
for an entire family as a whole. This is for essentially the same
reasons that we cannot define a toplevel function that performs
pattern matching on the data constructors
of *different* instances of a single type family.
It would require a form of extensible case construct.

The association of data constructors with type families is more dynamic
than that is the case with standard data and newtype declarations. In
the standard case, the notation `T(..)`

in an import or
export list denotes the type constructor and all the data constructors
introduced in its declaration. However, a family declaration never
introduces any data constructors; instead, data constructors are
introduced by family instances. As a result, which data constructors
are associated with a type family depends on the currently visible
instance declarations for that family. Consequently, an import or
export item of the form `T(..)`

denotes the family
constructor and all currently visible data constructors - in the case of
an export item, these may be either imported or defined in the current
module. The treatment of import and export items that explicitly list
data constructors, such as `GMap(GMapEither)`

, is
analogous.

As expected, an import or export item of the
form `C(..)`

denotes all of the class' methods and
associated types. However, when associated types are explicitly
listed as subitems of a class, we need some new syntax, as uppercase
identifiers as subitems are usually data constructors, not type
constructors. To clarify that we denote types here, each associated
type name needs to be prefixed by the keyword `type`

.
So for example, when explicitly listing the components of
the `GMapKey`

class, we write ```
GMapKey(type
GMap, empty, lookup, insert)
```

.

Assuming our running `GMapKey`

class example, let us
look at some export lists and their meaning:

`module GMap (GMapKey) where...`

: Exports just the class name.`module GMap (GMapKey(..)) where...`

: Exports the class, the associated type`GMap`

and the member functions`empty`

,`lookup`

, and`insert`

. None of the data constructors is exported.`module GMap (GMapKey(..), GMap(..)) where...`

: As before, but also exports all the data constructors`GMapInt`

,`GMapChar`

,`GMapUnit`

,`GMapPair`

, and`GMapUnit`

.`module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...`

: As before.`module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...`

: As before.

Finally, you can write `GMapKey(type GMap)`

to denote
both the class `GMapKey`

as well as its associated
type `GMap`

. However, you cannot
write `GMapKey(type GMap(..))`

— i.e.,
sub-component specifications cannot be nested. To
specify `GMap`

's data constructors, you have to list
it separately.

Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indexes to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

Indexed type families are introduced by a signature, such as

type family Elem c :: *

The special `family`

distinguishes family from standard
type declarations. The result kind annotation is optional and, as
usual, defaults to `*`

if omitted. An example is

type family Elem c

Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:

type family F a b :: * -> * -- F's arity is 2, -- although it's overall kind is * -> * -> * -> *

Given this declaration the following are examples of well-formed and malformed types:

F Char [Int] -- OK! Kind: * -> * F Char [Int] Bool -- OK! Kind: * F IO Bool -- WRONG: kind mismatch in the first argument F Bool -- WRONG: unsaturated application

When a type family is declared as part of a type class, we drop
the `family`

special. The `Elem`

declaration takes the following form

class Collects ce where type Elem ce :: * ...

The argument names of the type family must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. Hence, the following contrived example is admissible:

class C a b c where type T c a :: *

These rules are exactly as for associated data families.

Instance declarations of type families are very similar to standard type
synonym declarations. The only two differences are that the
keyword `type`

is followed
by `instance`

and that some or all of the type
arguments can be non-variable types, but may not contain forall types or
type synonym families. However, data families are generally allowed, and
type synonyms are allowed as long as they are fully applied and expand
to a type that is admissible - these are the exact same requirements as
for data instances. For example, the `[e]`

instance
for `Elem`

is

type instance Elem [e] = e

Type family instance declarations are only legitimate when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration, and the number of type parameters in an instance declaration must match the number of type parameters in the family declaration. Finally, the right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain. Here are some examples of admissible and illegal type instances:

type family F a :: * type instance F [Int] = Int -- OK! type instance F String = Char -- OK! type instance F (F a) = a -- WRONG: type parameter mentions a type family type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type type family G a b :: * -> * type instance G Int = (,) -- WRONG: must be two type parameters type instance G Int Char Float = Double -- WRONG: must be two type parameters

When an associated family instance is declared within a type class
instance, we drop the `instance`

keyword in the family
instance. So, the `[e]`

instance
for `Elem`

becomes:

instance (Eq (Elem [e])) => Collects ([e]) where type Elem [e] = e ...

The most important point about associated family instances is that the
type indexes corresponding to class parameters must be identical to the
type given in the instance head; here this is `[e]`

,
which coincides with the only class parameter.

Instances for an associated family can only appear as part of instances
declarations of the class in which the family was declared - just as
with the equations of the methods of a class. Also in correspondence to
how methods are handled, declarations of associated types can be omitted
in class instances. If an associated family instance is omitted, the
corresponding instance type is not inhabited; i.e., only diverging
expressions, such as `undefined`

, can assume the type.

The instance declarations of a type family used in a single program may only overlap if the right-hand sides of the overlapping instances coincide for the overlapping types. More formally, two instance declarations overlap if there is a substitution that makes the left-hand sides of the instances syntactically the same. Whenever that is the case, the right-hand sides of the instances must also be syntactically equal under the same substitution. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety.

Here are two example to illustrate the condition under which overlap is permitted.

type instance F (a, Int) = [a] type instance F (Int, b) = [b] -- overlap permitted type instance G (a, Int) = [a] type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]

In order to guarantee that type inference in the presence of type families decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of “Type Checking with Open Type Functions”). Instance declarations have the general form

type instance F t1 .. tn = t

where we require that for every type family application ```
(G s1
.. sm)
```

in `t`

,

`s1 .. sm`

do not contain any type family constructors,the total number of symbols (data type constructors and type variables) in

`s1 .. sm`

is strictly smaller than in`t1 .. tn`

, andfor every type variable

`a`

,`a`

occurs in`s1 .. sm`

at most as often as in`t1 .. tn`

.

These restrictions are easily verified and ensure termination of type
inference. However, they are not sufficient to guarantee completeness
of type inference in the presence of, so called, ''loopy equalities'',
such as `a ~ [F a]`

, where a recursive occurrence of
a type variable is underneath a family application and data
constructor application - see the above mentioned paper for details.

If the option `-XUndecidableInstances`

is passed to the
compiler, the above restrictions are not enforced and it is on the
programmer to ensure termination of the normalisation of type families
during type inference.

Type context can include equality constraints of the form ```
t1 ~
t2
```

, which denote that the types `t1`

and `t2`

need to be the same. In the presence of type
families, whether two types are equal cannot generally be decided
locally. Hence, the contexts of function signatures may include
equality constraints, as in the following example:

sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2

where we require that the element type of `c1`

and `c2`

are the same. In general, the
types `t1`

and `t2`

of an equality
constraint may be arbitrary monotypes; i.e., they may not contain any
quantifiers, independent of whether higher-rank types are otherwise
enabled.

Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form

class C a b | a -> b

to

class (F a ~ b) => C a b where type F a

That is, we represent every functional dependency (FD) ```
a1 .. an
-> b
```

by an FD type family `F a1 .. an`

and a
superclass context equality `F a1 .. an ~ b`

,
essentially giving a name to the functional dependency. In class
instances, we define the type instances of FD families in accordance
with the class head. Method signatures are not affected by that
process.

NB: Equalities in superclass contexts are not fully implemented in GHC 6.10.