Chapter 3 Conventions

In this section, I describe the general conventions I have followed in designing Edison. I include only those conventions that affect users of the library. Conventions that affect developers are described in the separate Edison Developer's Guide.

3.1 Use of modules

Each data structure is implemented as a Haskell module. These modules should always be imported qualified to prevent a flood of name clashes (see Section 3.2). I recommend renaming each imported module (using the as keyword) both to reduce the overhead of qualified names and to make substituting one module for another as painless as possible. The one Edison module that is typically imported unqualified is the EdisonPrelude.

Example use of an Edison data structure:

  module Foo where

  import EdisonPrelude
  import qualified SimpleQueue as Q

  data Tree a = Empty | Node a (Tree a) (Tree a)

  breadthFirst :: Tree a -> [a]
  breadthFirst t = bfs (Q.single t)
    where bfs q = 
            case Q.lview q of
              Just2 (Node x l r) q' -> x : bfs (Q.snoc (Q.snoc q' l) r)
              Just2 Empty q' -> bfs q'
              Nothing2 -> []

3.2 Reuse of names

I have attempted to choose names that are as standard as possible. This means that operations for different abstractions frequently share the same name (empty, null, size, etc.). It also means that in many cases I have reused names from the Prelude. However, these name clashes should not be a problem because I expect Edison modules to be imported qualified (see Section 3.1). If, for some reason, you choose to import Edison modules unqualified, you will usually need to import the Prelude hiding the relevant names.

Edison modules also frequently share type names. For example, every sequence type constructor is named Seq unless there is a good reason otherwise. This makes substituting one module for another fairly painless, especially when imported modules are renamed as in the previous section.

An example of ``a good reason otherwise'' for not using the standard type name is when the type constructor has the wrong kind. This usually happens when one data structure is parameterized by another. For example, an implementation of sequences that is parameterized on another implementation of sequences might be given as

  data HigherOrderSeq seq a = ...
  instance Sequence seq => Sequence (HigherOrderSeq seq) where ...

However, even modules such as these will typically define Seq as a type synonym for some good default choice, e.g.,

  type Seq = HigherOrderSeq BankersQueue.Seq

3.3 Use and non-use of classes

Each family of abstractions is defined as a set of classes---a main class that every implementation of that abstraction should support and several auxiliary subclasses that an implementation may or may not support.

However, not all applications need the power of classes, so each method is also accessible directly from the implementation module. For example, an implementation of sequences will typically define and export a type Seq a together with all the relevant functions on that type (using the same names as the corresponding methods!), and, in addition, declare Seq to be an instance of the Sequence class.

Thus, you can choose to use overloading or not, as appropriate for your particular application. For example, in the following module,

  module Foo where
  import qualified Sequence as S
  import qualified HoodMelvilleQueue as Q
  ...

I could refer to the empty queue of type Q.Seq a as either S.empty or Q.empty. The former refers to the empty method of the Sequence class (which might be resolved using a type signature to be of type Q.Seq a) and the latter refers directly to the empty value from Q, without going through the class mechanism.

Note that this example is somewhat unrealistic. In practice, you would very rarely need to import both modules in their entirety. Usually, you would either import only the implementation module

  import qualified HoodMelvilleQueue as Q

or import the class module together with the type from the implementation module

  import qualified Sequence as S
  import HoodMelvilleQueue (Seq)

Note that in the last line, I imported HoodMelvilleQueue unqualified. Normally, this would produce a flood of name clashes, but it is acceptable when you are selectively importing only a type name or two.

3.4 Bonus Operations

Some implementations export a few extra operations beyond those included in the relevant classes. These are typically operations that are particularly efficient for that implementation, but which are not general enough to warrant inclusion in a class. An example of this is the function unsafeMapMonotonic that is supported by many priority queues.

3.5 Fixity

Since qualified infix symbols are fairly ugly, I avoid infix symbols as much as possible. For example, I call the sequence catenation function append instead of ++.

3.6 Error handling

Since Haskell has no good way to recover from errors, I avoid signalling errors if there is any reasonable alternative. For many functions, it is easy to avoid this by returning the Maybe type (or something similar), but sometimes, as with the head function on lists and the corresponding lhead function on sequences, this approach is just too painful. For lhead of an empty sequence, there really is no choice but to signal an error, but other times there is a reasonable alternative. For example, I define both ltail of the empty sequence and take of a negative argument to return the empty sequence even though the corresponding Prelude functions would signal errors in both of these cases.

3.7 (Lack of) Portability

Edison does not conform to either Haskell 1.4 or Haskell98. It will hopefully conform to Haskell 2. For now, it is guaranteed to run only under GHC.

There are three non-standard language features that I use. Of these, only the first is critical. The other two affect some fraction of the individual modules, and could be eliminated fairly easily.

Multi-parameter type classes. Fortunately, this is also the least controversial. Some form of multi-parameter type classes seems certain to make it into Haskell 2, and I expect it to be easy to adapt Edison's signatures accordingly.
Unboxed integers. I frequently use unboxed integers when, for example, every node in a tree needs to maintain some sort of size field. I could simply declare these fields to be strict, but using unboxed integers in these situations is significantly faster.
Pattern guards. I occasionally use pattern guards. Getting by without them would not be difficult, but, since pattern guards also seem likely to make it into Haskell 2, I have not bothered.

If your compiler does not support unboxed integers or pattern guards, you will still be able to use most of Edison---just not those individual data structures that use these features. If your compiler does not support multi-parameter type classes, then you are in much bigger trouble---you will only be able to use the sequence data structures, not collections or associative collections.

3.8 Unsafe operations

Consider converting a list of elements into a binary search tree. This can be implemented particularly efficiently if we know that the list is already sorted. And in fact, it often is already sorted in practice. This special case, and a handful of similar cases, are common enough and important enough to deserve their own functions. These are functions with non-trivial preconditions that are too expensive to check at runtime, so we simply make their behavior undefined if the preconditions are not satisfied. Violating the preconditions may break the implementations in horrible ways, so we adopt a special naming convention to emphasize that these operations are unsafe---almost all such functions are given names beginning with the word unsafe (unsafeFromOrdList, unsafeInsertMin, etc.).

The one place where I have violated this convention is in the Set class, where there is a whole family of operations with names like insertWith and unionWith. These functions take a combining function that is used to resolve collisions. For example, when inserting an element into a set that already contains that element, the combining function is called on the new and old elements to determine which element will remain in the new set.¹ The combining functions typically return one element or the other, but they can also combine the elements in non-trivial ways. These combining functions are required to satisfy the precondition that, given two equal elements, they return a third element that is equal to the other two.

3.9 Currying

All types in Edison are fully curried.

3.10 Order of arguments

Whenever a function takes multiple arguments, there is a choice as to the order of the arguments. I have tried to make these choices according to the following rules, in decreasing order of importance:

Favor an order that is more useful for partial applications. For example, the member function
```
  member :: CollX c a => c a -> a -> Bool
```
takes the collection first and the element second, rather than the other way around, because it is much more commonly partially applied to a collection than to an element.
Favor an order with significant mnemonic value. For example, the cons and snoc functions
```
  cons :: Sequence s => a -> s a -> s a
  snoc :: Sequence s => s a -> a -> s a
```
take their arguments in opposite orders because cons adds an element to the left of a sequence and snoc adds an element to the right of a sequence.
Functions that modify a collection should take the collection last. For example, the insert function has type
```
  insert :: CollX c a => a -> c a -> c a
```
Taking the collection last supports a convenient style of stringing several update operations in a row using the $ combinator, as in
```
  insert 1 $ insert 2 $ insert 3 ns
```
rather than
```
  insert (insert (insert ns 3) 2) 1
```
Consistency with similar operations.
Personal taste.

3.11 Completeness

In designing a library, there is always a delicate question of how much to put in, and how much to leave out. Including too much can lead to code bloat, and make the library somewhat harder to learn. Including too little can make the library significantly less useful. I have tried to err on the side of including too much, rather than too little.

Note that this can have an affect on efficiency. Because the classes in Edison have many methods, the dictionaries for the classes will be large, and so building these dictionaries dynamically will be expensive. Fortunately, most dictionaries can be built statically. The exceptions often involve things like non-regular datatypes and polymorphic recursion. This is not to say don't use these features; just don't mix them with overloading and expect the result to be efficient.

3.12 (In)Efficiency

In text books on data structures and algorithms, data structures are often organized into hierarchies according to efficiency. For example, a hierarchy of sequences might include such entries as

queues support efficicent insertions at the rear and deletions from the front
steques (stack-ended queues, also known as output-restricted deques) are queues that additionally support efficient insertions at the front
deques (double-ended queues) are steques that additionally support efficient deletions from the back

Unfortunately, a class hierarchy structured along these lines is so fine-grained that it is nearly impossible to use.

In Edison, the class hierarchies are determined by functionality, not by efficiency. For example, the hierarchy for sequences contains only a single class Sequence, because all the sequence operations are possible on all the sequence implementations, even if some of the operations are inefficient for some of the implementations. Similarly, the root class for collections--which include sets, bags, and priority queues--contains a member method, even though this function is rather inefficient for most priority queues.

At some later date, we may support a special mode that prints out a warning whenever one of the particularly inefficient operations is called. This would be implemented by replacing the relevant default definitions (the inefficient operations almost always just use one of the defaults) with calls to a warning function, such as

  foo = warn moduleName "foo" fooDefault

instead of

  foo = fooDefault

The warn function would either print out a message and then return its third argument, or simply return the third argument without printing a message, depending perhaps on a compiler flag, or on which library you linked in. Let me emphasize that we do nothing like this yet.

3.13 Strictness

Most of the operations on most of the data structures are strict. This is inevitable for data structures with non-trivial invariants. Even given that, however, many of the operations are stricter than necessary. In fact, I never deliberately make any computations lazy, unless the laziness is required by the algorithm (as often happens with amortized data structures, for instance). In particular, I never use irrefutable patterns to make something lazier.

Note, however, that the various sequence implementations are always lazy in their elements. Similarly, associative collections are always lazy in their elements (but usually strict in their keys).

1: Such a combining function is useful only when nominally equal elements are distinguishable in other ways---that is, when the ``equality'' relation is really an equivalence relation. However, this is extremely common.