Chapter 3 Conventions
In this section, I describe the general conventions I have followed
in designing Edison. I include only those conventions that affect
users of the library. Conventions that affect developers are described
in the separate Edison Developer's Guide.
3.1 Use of modules
Each data structure is implemented as a Haskell module. These
modules should always be imported qualified to prevent a flood of
name clashes (see Section 3.2).
I recommend renaming each imported module (using the
as keyword) both to reduce the overhead of qualified names and to
make substituting one module for another as painless as possible.
The one Edison module that is typically imported unqualified is the
EdisonPrelude.
Example use of an Edison data structure:
module Foo where
import EdisonPrelude
import qualified SimpleQueue as Q
data Tree a = Empty | Node a (Tree a) (Tree a)
breadthFirst :: Tree a -> [a]
breadthFirst t = bfs (Q.single t)
where bfs q =
case Q.lview q of
Just2 (Node x l r) q' -> x : bfs (Q.snoc (Q.snoc q' l) r)
Just2 Empty q' -> bfs q'
Nothing2 -> []
3.2 Reuse of names
I have attempted to choose names that are as standard as possible.
This means that operations for different abstractions frequently share
the same name (empty, null, size, etc.). It also means
that in many cases I have reused names from the Prelude. However,
these name clashes should not be a problem because I expect Edison
modules to be imported qualified (see Section 3.1).
If, for some reason, you choose to import Edison modules unqualified,
you will usually need to import the Prelude hiding the relevant
names.
Edison modules also frequently share type names. For example, every
sequence type constructor is named Seq unless there is a good reason
otherwise. This makes substituting one module for another fairly
painless, especially when imported modules are renamed as in the previous
section.
An example of ``a good reason otherwise'' for not using the standard
type name is when the type constructor has the wrong kind. This usually
happens when one data structure is parameterized by another. For example,
an implementation of sequences that is parameterized on
another implementation of sequences might be given as
data HigherOrderSeq seq a = ...
instance Sequence seq => Sequence (HigherOrderSeq seq) where ...
However, even modules such as these will typically define Seq
as a type synonym for some good default choice, e.g.,
type Seq = HigherOrderSeq BankersQueue.Seq
3.3 Use and non-use of classes
Each family of abstractions is defined as a set of classes---a main
class that every implementation of that abstraction should support and
several auxiliary subclasses that an implementation may or may not support.
However, not all applications need the power of classes, so each method is
also accessible directly from the implementation module. For example, an
implementation of sequences will typically define and export a
type Seq a together with all the relevant functions on that type
(using the same names as the corresponding methods!), and, in addition,
declare Seq to be an instance of the Sequence class.
Thus, you can choose to use overloading or not, as appropriate for your
particular application. For example, in the following module,
module Foo where
import qualified Sequence as S
import qualified HoodMelvilleQueue as Q
...
I could refer to the empty queue of type Q.Seq a as either
S.empty or Q.empty. The former refers to the empty method
of the Sequence class (which might be resolved using a type
signature to be of type Q.Seq a) and the latter refers
directly to the empty value from Q, without going through the
class mechanism.
Note that this example is somewhat unrealistic.
In practice, you would very rarely need to import both modules
in their entirety. Usually, you would either import only the implementation
module
import qualified HoodMelvilleQueue as Q
or import the class module together with the type from the implementation
module
import qualified Sequence as S
import HoodMelvilleQueue (Seq)
Note that in the last line, I imported HoodMelvilleQueue
unqualified. Normally, this would produce a flood of name clashes,
but it is acceptable when you are selectively importing only a type
name or two.
3.4 Bonus Operations
Some implementations export a few extra operations beyond those included
in the relevant classes. These are typically operations that are
particularly efficient for that implementation, but which are not general
enough to warrant inclusion in a class. An example of this is the
function unsafeMapMonotonic that is supported by many priority
queues.
3.5 Fixity
Since qualified infix symbols are fairly ugly, I avoid infix symbols
as much as possible. For example, I call the sequence catenation function
append instead of ++.
3.6 Error handling
Since Haskell has no good way to recover from errors, I avoid
signalling errors if there is any reasonable alternative. For many
functions, it is easy to avoid this by returning the Maybe type (or
something similar), but sometimes, as with the head function on lists
and the corresponding lhead function on sequences, this approach is
just too painful. For lhead of an empty sequence, there really is no
choice but to signal an error, but other times there is a reasonable
alternative. For example, I define both ltail of the empty sequence
and take of a negative argument to return the empty sequence even
though the corresponding Prelude functions would signal errors in both
of these cases.
3.7 (Lack of) Portability
Edison does not conform to either Haskell 1.4 or Haskell98.
It will hopefully conform to Haskell 2. For now, it is guaranteed to
run only under GHC.
There are three non-standard language features that I use. Of these,
only the first is critical. The other two affect some fraction of the
individual modules, and could be eliminated fairly easily.
-
Multi-parameter type classes. Fortunately, this is also
the least controversial. Some form of multi-parameter type classes
seems certain to make it into Haskell 2, and I expect it to be easy
to adapt Edison's signatures accordingly.
- Unboxed integers. I frequently use unboxed integers when,
for example, every node in a tree needs to maintain some sort of size
field. I could simply declare these fields to be strict, but
using unboxed integers in these situations is significantly faster.
- Pattern guards. I occasionally use pattern guards.
Getting by without them would not be difficult, but, since pattern guards
also seem likely to make it into Haskell 2, I have not bothered.
If your compiler does not support unboxed integers or pattern guards, you
will still be able to use most of Edison---just not those
individual data structures that use these features.
If your compiler does not support multi-parameter type classes, then
you are in much bigger trouble---you will only be able to use the
sequence data structures, not collections or associative collections.
3.8 Unsafe operations
Consider converting a list of elements into a binary search tree.
This can be implemented particularly efficiently if we know that the list is
already sorted. And in fact, it often is already sorted in practice.
This special case, and a handful of similar cases, are common enough and
important enough to deserve their own functions. These are functions
with non-trivial preconditions that are too expensive to check at runtime,
so we simply make their behavior undefined if the preconditions are not
satisfied. Violating the preconditions may break the implementations in
horrible ways, so we adopt a special naming convention to emphasize that
these operations are unsafe---almost all such functions are given names
beginning with the word unsafe (unsafeFromOrdList,
unsafeInsertMin, etc.).
The one place where I have violated this convention is in the Set
class, where there is a whole family of operations with names like
insertWith and unionWith. These functions take a combining
function that is used to resolve collisions. For example, when
inserting an element into a set that already contains that element,
the combining function is called on the new and old elements to
determine which element will remain in the new set.1
The combining functions typically return one element or the other, but
they can also combine the elements in non-trivial ways. These
combining functions are required to satisfy the precondition that,
given two equal elements, they return a third element that is equal to
the other two.
3.9 Currying
All types in Edison are fully curried.
3.10 Order of arguments
Whenever a function takes multiple arguments, there is a choice as to
the order of the arguments. I have tried to make these choices according
to the following rules, in decreasing order of importance:
-
Favor an order that is more useful for partial applications.
For example, the member function
member :: CollX c a => c a -> a -> Bool
takes the collection first and the element second, rather than the other
way around, because it is much more commonly partially applied to a
collection than to an element.
- Favor an order with significant mnemonic value.
For example, the cons and snoc functions
cons :: Sequence s => a -> s a -> s a
snoc :: Sequence s => s a -> a -> s a
take their arguments in opposite orders because cons adds an
element to the left of a sequence and snoc adds an element to the
right of a sequence.
- Functions that modify a collection should take the collection last.
For example, the insert function has type
insert :: CollX c a => a -> c a -> c a
Taking the collection last supports a convenient style of stringing several
update operations in a row using the $ combinator, as in
insert 1 $ insert 2 $ insert 3 ns
rather than
insert (insert (insert ns 3) 2) 1
- Consistency with similar operations.
- Personal taste.
3.11 Completeness
In designing a library, there is always a delicate question of how much
to put in, and how much to leave out. Including too much can lead to code
bloat, and make the library somewhat harder to learn. Including too little
can make the library significantly less useful. I have tried to err on the
side of including too much, rather than too little.
Note that this can have an affect on efficiency. Because the classes
in Edison have many methods, the dictionaries for the classes will be
large, and so building these dictionaries dynamically will be
expensive. Fortunately, most dictionaries can be built statically.
The exceptions often involve things like non-regular datatypes and
polymorphic recursion. This is not to say don't use these features;
just don't mix them with overloading and expect the result to be
efficient.
3.12 (In)Efficiency
In text books on data structures and algorithms, data structures are
often organized into hierarchies according to efficiency. For
example, a hierarchy of sequences might include such entries as
-
queues support efficicent insertions at the rear and
deletions from the front
- steques (stack-ended queues, also known as
output-restricted deques) are queues that additionally support
efficient insertions at the front
- deques (double-ended queues) are steques that
additionally support efficient deletions from the back
Unfortunately, a class hierarchy structured along these lines
is so fine-grained that it is nearly impossible to use.
In Edison, the class hierarchies are determined by functionality, not
by efficiency. For example, the hierarchy for sequences contains only
a single class Sequence, because all the sequence operations are
possible on all the sequence implementations, even if some of the
operations are inefficient for some of the implementations.
Similarly, the root class for collections--which include sets, bags,
and priority queues--contains a member method, even though this
function is rather inefficient for most priority queues.
At some later date, we may support a special mode that prints out a
warning whenever one of the particularly inefficient operations is called.
This would be implemented by replacing the relevant default definitions
(the inefficient operations almost always just use one of the defaults)
with calls to a warning function, such as
foo = warn moduleName "foo" fooDefault
instead of
foo = fooDefault
The warn function would either print out a message and then return
its third argument, or simply return the third argument without printing
a message, depending perhaps on a compiler flag, or on which library you
linked in. Let me emphasize that we do nothing like this yet.
3.13 Strictness
Most of the operations on most of the data structures are strict. This is
inevitable for data structures with non-trivial invariants. Even given that,
however, many of the operations are stricter than necessary. In fact, I never
deliberately make any computations lazy, unless the laziness is
required by the algorithm (as often happens with amortized data structures,
for instance). In particular, I never use irrefutable patterns to make
something lazier.
Note, however, that the various sequence implementations are always
lazy in their elements. Similarly, associative collections are
always lazy in their elements (but usually strict in their keys).
- 1
-
Such a combining function is useful only when nominally
equal elements are distinguishable in other ways---that is, when
the ``equality'' relation is really an equivalence relation.
However, this is extremely common.