6.19.3. Generic programming

There are a few ways to do datatype-generic programming using the GHC.Generics framework. One is making use of the Generically and Generically1 wrappers from GHC.Generics, instances can be derived via them using DerivingVia:

{-# LANGUAGE DeriveGeneric      #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia        #-}

import GHC.Generics

data V4 a = V4 a a a a
  deriving
  stock (Generic, Generic1)

  deriving (Semigroup, Monoid)
  via Generically (V4 a)

  deriving (Functor, Applicative)
  via Generically1 V4

The older approach uses DeriveGeneric, DefaultSignatures, and DeriveAnyClass. It derives instances by providing a distinguished generic implementation as part of the type class declaration. This section gives a very brief overview of how to do it.

Generic programming support in GHC allows defining classes with methods that do not need a user specification when instantiating: the method body is automatically derived by GHC. This is similar to what happens for standard classes such as Read and Show, for instance, but now for user-defined classes.

Note

GHC used to have an implementation of generic classes as defined in the paper “Derivable type classes”, Ralf Hinze and Simon Peyton Jones, Haskell Workshop, Montreal Sept 2000, pp. 94-105. These have been removed and replaced by the more general support for generic programming.

6.19.3.1. Deriving representations

The first thing we need is generic representations. The GHC.Generics module defines a couple of primitive types that are used to represent Haskell datatypes:

-- | Unit: used for constructors without arguments
data U1 p = U1

-- | Constants, additional parameters and recursion of kind Type
newtype K1 i c p = K1 { unK1 :: c }

-- | Meta-information (constructor names, etc.)
newtype M1 i c f p = M1 { unM1 :: f p }

-- | Sums: encode choice between constructors
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

-- | Products: encode multiple arguments to constructors
infixr 6 :*:
data (:*:) f g p = f p :*: g p

The Generic and Generic1 classes mediate between user-defined datatypes and their internal representation as a sum-of-products:

class Generic a where
  -- Encode the representation of a user datatype
  type Rep a :: Type -> Type
  -- Convert from the datatype to its representation
  from  :: a -> (Rep a) x
  -- Convert from the representation to the datatype
  to    :: (Rep a) x -> a

class Generic1 (f :: k -> Type) where
  type Rep1 f :: k -> Type

  from1  :: f a -> Rep1 f a
  to1    :: Rep1 f a -> f a

Generic1 is used for functions that can only be defined over type containers, such as map. Note that Generic1 ranges over types of kind Type -> Type by default, but if the PolyKinds extension is enabled, then it can range of types of kind k -> Type, for any kind k.

DeriveGeneric
Since:7.2.1
Status:Included in GHC2024, GHC2021

Allow automatic deriving of instances for the Generic typeclass.

Instances of these classes can be derived by GHC with the DeriveGeneric extension, and are necessary to be able to define generic instances automatically.

For example, a user-defined datatype of trees

data UserTree a = Node a (UserTree a) (UserTree a) | Leaf

in a Main module in a package named foo will get the following representation:

instance Generic (UserTree a) where
  -- Representation type
  type Rep (UserTree a) =
    M1 D ('MetaData "UserTree" "Main" "package-name" 'False) (
          M1 C ('MetaCons "Node" 'PrefixI 'False) (
                M1 S ('MetaSel 'Nothing
                               'NoSourceUnpackedness
                               'NoSourceStrictness
                               'DecidedLazy)
                     (K1 R a)
            :*: M1 S ('MetaSel 'Nothing
                               'NoSourceUnpackedness
                               'NoSourceStrictness
                               'DecidedLazy)
                     (K1 R (UserTree a))
            :*: M1 S ('MetaSel 'Nothing
                               'NoSourceUnpackedness
                               'NoSourceStrictness
                               'DecidedLazy)
                     (K1 R (UserTree a)))
      :+: M1 C ('MetaCons "Leaf" 'PrefixI 'False) U1)

  -- Conversion functions
  from (Node x l r) = M1 (L1 (M1 (M1 (K1 x) :*: M1 (K1 l) :*: M1 (K1 r))))
  from Leaf         = M1 (R1 (M1 U1))
  to (M1 (L1 (M1 (M1 (K1 x) :*: M1 (K1 l) :*: M1 (K1 r))))) = Node x l r
  to (M1 (R1 (M1 U1)))                                      = Leaf

This representation is generated automatically if a deriving Generic clause is attached to the datatype. Standalone deriving can also be used.

6.19.3.2. Writing generic functions

A generic function is defined by creating a class and giving instances for each of the representation types of GHC.Generics. As an example we show generic serialization:

data Bin = O | I

class GSerialize f where
  gput :: f a -> [Bin]

instance GSerialize U1 where
  gput U1 = []

instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
  gput (x :*: y) = gput x ++ gput y

instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
  gput (L1 x) = O : gput x
  gput (R1 x) = I : gput x

instance (GSerialize a) => GSerialize (M1 i c a) where
  gput (M1 x) = gput x

instance (Serialize a) => GSerialize (K1 i a) where
  gput (K1 x) = put x

A caveat: this encoding strategy may not be reliable across different versions of GHC. When deriving a Generic instance is free to choose any nesting of :+: and :*: it chooses, so if GHC chooses (a :+: b) :+: c, then the encoding for a would be [O, O], b would be [O, I], and c would be [I]. However, if GHC chooses a :+: (b :+: c), then the encoding for a would be [O], b would be [I, O], and c would be [I, I]. (In practice, the current implementation tries to produce a more-or-less balanced nesting of :+: and :*: so that the traversal of the structure of the datatype from the root to a particular component can be performed in logarithmic rather than linear time.)

Typically this GSerialize class will not be exported, as it only makes sense to have instances for the representation types.

6.19.3.3. Unlifted representation types

The data family URec is provided to enable generic programming over datatypes with certain unlifted arguments. There are six instances corresponding to common unlifted types:

data family URec a p

data instance URec (Ptr ()) p = UAddr   { uAddr#   :: Addr#   }
data instance URec Char     p = UChar   { uChar#   :: Char#   }
data instance URec Double   p = UDouble { uDouble# :: Double# }
data instance URec Int      p = UInt    { uInt#    :: Int#    }
data instance URec Float    p = UFloat  { uFloat#  :: Float#  }
data instance URec Word     p = UWord   { uWord#   :: Word#   }

Six type synonyms are provided for convenience:

type UAddr   = URec (Ptr ())
type UChar   = URec Char
type UDouble = URec Double
type UFloat  = URec Float
type UInt    = URec Int
type UWord   = URec Word

As an example, this data declaration:

data IntHash = IntHash Int#
  deriving Generic

results in the following Generic instance:

instance 'Generic' IntHash where
  type 'Rep' IntHash =
    'D1' ('MetaData "IntHash" "Main" "package-name" 'False)
      ('C1' ('MetaCons "IntHash" 'PrefixI 'False)
        ('S1' ('MetaSel 'Nothing
                        'NoSourceUnpackedness
                        'NoSourceStrictness
                        'DecidedLazy)
              'UInt'))

A user could provide, for example, a GSerialize UInt instance so that a Serialize IntHash instance could be easily defined in terms of GSerialize.

6.19.3.4. Generic defaults

The only thing left to do now is to define a “front-end” class, which is exposed to the user:

class Serialize a where
  put :: a -> [Bin]

  default put :: (Generic a, GSerialize (Rep a)) => a -> [Bin]
  put = gput . from

Here we use a default signature to specify that the user does not have to provide an implementation for put, as long as there is a Generic instance for the type to instantiate. For the UserTree type, for instance, the user can just write:

instance (Serialize a) => Serialize (UserTree a)

The default method for put is then used, corresponding to the generic implementation of serialization. If you are using DeriveAnyClass, the same instance is generated by simply attaching a deriving Serialize clause to the UserTree datatype declaration. For more examples of generic functions please refer to the generic-deriving package on Hackage.

6.19.3.5. More information

For more details please refer to the Haskell Wiki page or the original paper [Generics2010].

[Generics2010]Jose Pedro Magalhaes, Atze Dijkstra, Johan Jeuring, and Andres Loeh. A generic deriving mechanism for Haskell. Proceedings of the third ACM Haskell symposium on Haskell (Haskell‘2010), pp. 37-48, ACM, 2010.