Chapter 6 Collections
The collection abstraction includes sets, bags, and priority
queues (heaps). Collections are defined in Edison as a set of eight
classes, organized in the hierarchy shown in Figure 6.1.
These classes make essential use of multi-parameter type classes.
All collections assume at least an equality relation on elements, and
many also assume an ordering relation. The use of multi-parameter type
classes allows any particular instance to assume further properties
as necessary (such as hashability).
CollX |
OrdCollX |
SetX |
OrdSetX |
|
empty,insert |
deleteMin |
intersect |
no methods |
union,delete |
unsafeInsertMin |
difference |
|
null,size |
filterLT |
subset |
|
member,count |
··· |
subsetEq |
|
··· |
|
|
|
Coll |
OrdColl |
Set |
OrdSet |
|
toSeq |
minElem |
insertWith |
no methods |
lookup |
foldr,foldl |
unionWith |
|
fold |
toOrdSeq |
intersectWith |
|
filter |
··· |
··· |
|
··· |
|
|
|
Figure 6.1: The collection class hierarchy, with typical methods for each class.
The hierarchy contains a root class, CollX, together with seven
subclasses satisfying one or more of three common sub-properties:
-
Uniqueness. Each element in the collection is unique (i.e.,
no two elements in the collection are equal). These subclasses,
indicated by the name Set, represent sets rather than bags.
- Ordering. The elements have a total ordering and it is possible
to process the elements in non-decreasing order. These subclasses,
indicated by the Ord prefix, typically represent either priority
queues (heaps) or sets/bags implemented as binary search trees.
- Observability. An observable collection is one in which it is
possible to view the elements in a collection. The X suffix
indicates lack of observability. This property is discussed in
greater detail below in Section 6.1.
Because collections encompass a wide range of abstractions, there is
no single name that is suitable for all collection type constructors.
However, most modules implementing collections will define a type constructor
named either Bag, Set, or Heap.
Figure 6.2 summarizes all the methods on collections.
These methods will be described in more detail in the sections on
each subclass in the hierarchy.
Collection Methods
-
Constructors:
CollX:
empty, single, insert, insertSeq, union, unionSeq, fromSeq
OrdCollX: unsafeInsertMin, unsafeInsertMax, unsafeFromOrdSeq, unsafeAppend
Set: insertWith, insertSeqWith, unionl, unionr, unionWith, unionSeqWith, fromSeqWith
- Destructors:
OrdColl: minView, minElem, maxView, maxElem
- Deletions:
CollX: delete, deleteAll, deleteSeq
OrdCollX: deleteMin, deleteMax
- Observers:
CollX: null, size, member, count
Coll: lookup, lookupM, lookupAll, lookupWithDefault, toSeq
OrdColl: toOrdSeq
- Filters and partitions:
OrdCollX:
filterLT, filterLE, filterGT, filterGE,
partitionLT_GE, partitionLE_GT, partitionLT_GT
Coll: filter, partition
- Set operations:
SetX: intersect, difference, subset, subsetEq
Set: intersectWith
- Folds:
Coll: fold, fold1
OrdColl: foldr, foldl, foldr1, foldl1
Figure 6.2: Summary of methods for the collections classes.
6.1 Observability
Note that the equality relation defined by the Eq class
is not necessarily true equality. Very often it is merely
an equivalence relation, where equivalent values may be distinguishable
by other means. For example, we might consider two binary search trees
to be equal if they contain the same elements, even if their shapes
are different.
Because of this phenomenon, implementations of observable collections
(i.e., collections where it is possible to inspect the elements) are
rather constrained. Such an implementation must retain the actual
elements that were inserted. For example, it is not possible
in general to represent an observable bag as a finite map from elements
to counts, because even if we know that a given bag contains, say, three
elements from some equivalence class, we do not necessarily know which
three.
On the other hand, implementations of non-observable collections
have much greater freedom to choose abstract representations of
each equivalence class. For example, representing a bag as a finite
map from elements to counts works fine if we never need to know
which representatives from an equivalence class are actually
present. As another example, consider the UniqueHash class
defined in the Edison Prelude. If we know that the hash function
yields a unique integer for each equivalence class, then we can
represent a collection of hashable elements simply as a collection
of integers. With such a representation, we can still do many
useful things like testing for membership---we just can't support functions
like fold or filter that require the elements themselves,
rather than the hashed values.1
6.2 CollX
class Eq a => CollX c a
This is the root class of the hierarchy. However, it is perfectly
adequate for many applications that use sets or bags.
6.2.1 Constructors
-
empty :: coll a
The empty collection.
- single :: a ® coll a
Create a singleton collection.
- insert :: a ® coll a ® coll a
insertSeq :: Sequence seq Þ seq a ® coll a ® coll a
Add an element or a sequence of elements to a collection. For sets,
insert keeps the new element in the case of duplicates, but
insertSeq keeps an unspecified element.
See also insertWith and insertSeqWith in Set.
- union :: coll a ® coll a ® coll a
unionSeq :: Sequence seq Þ seq a ® seq (coll a) ® coll a
Merge two collections or a sequence of collections. For sets, it
is unspecified which element is kept in the case of duplicates.
See also unionl, unionr, unionWith, and unionSeqWith in
Set.
- fromSeq :: Sequence seq Þ seq a ® coll a
Convert a sequence of elements into a collection. For sets, it
is unspecified which element is kept in the case of duplicates.
See also fromSeqWith in Set.
6.2.2 Deletions
-
delete :: a ® coll a ® coll a
deleteAll :: a ® coll a ® coll a
Delete a single occurrence or all occurences of the given element
from a collection. If the element does not appear in the collection,
then leave the collection unchanged. For sets, these functions will be the
same, but for bags
they may be different. For delete on bags, it is unspecified
which of several duplicate elements is deleted.
- deleteSeq :: Sequence seq ® seq a ® coll a ® coll a
Delete a single occurrence of each of the given elements from
a collection, ignoring those elements in the sequence that do not
appear in the collection. For bags, there may be multiple occurrences of a
given element in the collection, in which case it is unspecified
which is deleted.
6.2.3 Observers
-
null :: coll a ® Bool
Test whether the collection is empty.
Axioms:
- size :: coll a ® Int
Return the number of elements in the collection.
- member :: coll a ® a ® Bool
Test whether the given element is in the collection.
Axioms:
member xs x º (count xs x > 0) |
- count :: coll a ® a ® Int
Count how many copies of the given element are in the collection.
6.3 OrdCollX
class (CollX c a, Ord a) => OrdCollX c a
6.3.1 Constructors
-
unsafeInsertMin :: a ® coll a ® coll a
unsafeInsertMax :: coll a ® a ® coll a
Insert an element into a collection with the precondition that the
new element is £ or ³ any existing elements. For sets, this
precondition is strengthened to < or >.
- unsafeFromOrdSeq :: Sequence seq Þ seq a ® coll a
Convert a sequence of elements into a collection with the precondition
that the sequence is already sorted into non-decreasing order. For
sets, this precondition is strengthened to increasing order.
- unsafeAppend :: coll a ® coll a ® coll a
Merge two collections with the precondition that every element
in the first collection is £ every element in the second collection.
For sets, this precondition is strengthened to <.
6.3.2 Deletions
-
deleteMin :: coll a ® coll a
deleteMax :: coll a ® coll a
Delete the minimum or maximum element from the collection, or return
empty if the collection is empty.
If there is more than one minimum or maximum, it is unspecified which
is deleted.
See also minView, minElem, maxView, and maxElem
in OrdColl.
6.3.3 Filters and partitions
-
filterLT :: a ® coll a ® coll a
filterLE :: a ® coll a ® coll a
filterGT :: a ® coll a ® coll a
filterGE :: a ® coll a ® coll a
Extract the subcollection of elements <, £, >, or ³
the given element.
Equivalent to the corresponding calls to filter (in Coll),
but may be much more efficient.
Axioms:
filterLT x xs º filter (< x) xs |
filterLE x xs º filter (<= x) xs |
filterGT x xs º filter (> x) xs |
filterGE x xs º filter (>= x) xs |
- partitionLT_GE :: a ® coll a ® (coll a, coll a)
partitionLE_GT :: a ® coll a ® (coll a, coll a)
partitionLT_GT :: a ® coll a ® (coll a, coll a)
Split a collection into those elements <, £, or < the
given element, and those elements ³, >, or > the given element.
partitionLT_GE and partitionLE_GT are equivalent to the
corresponding calls to partition (in Coll), but may be
much more efficient. partitionLT_GT cannot be expressed
as a single call to partition because it discards elements
equal to the given element.
Axioms:
partitionLT_GE x xs º partition (< x) xs |
partitionLE_GT x xs º partition (<= x) xs |
partitionLT_GT x xs º (filterLT x xs, filterGT x xs) |
6.4 SetX
class CollX c a => SetX c a
6.4.1 Set operations
-
intersect :: coll a ® coll a ® coll a
Computes the intersection of two sets. It is unspecified which of the
two elements is kept.
- difference :: coll a ® coll a ® coll a
Computes the difference of two sets (i.e., the set of all elements in the
first set that are not in the second set).
- subset :: coll a ® coll a ® Bool
subsetEq :: coll a ® coll a ® Bool
Test whether every element in the first set is also in the second
set. subset additionally tests whether the second set contains
at least one element that is not in the first set.
6.5 OrdSetX
class (OrdCollX c a, SetX c a) => OrdSetX c a
This class contains no methods. It exists only as an abbreviation
for the context
(OrdCollX c a, SetX c a)
6.6 Coll
class CollX c a => Coll c a
6.6.1 Observers
-
lookup :: coll a ® a ® a
lookupM :: coll a ® a ® Maybe a
lookupAll :: Sequence seq Þ coll a ® a ® seq a
lookupWithDefault :: a ® coll a ® a ® a
Search for an element in the set that is equal to the given element.
lookup signals an error if no such element exists, while
lookupWithDefault returns a default value (provided as
its first argument). For bags, it is unspecified which of several
duplicates is chosen by lookup, lookupM, or lookupWithDefault.
lookupAll returns all the duplicates, but in an unspecified order.
- toSeq :: Sequence seq Þ coll a ® seq a
Return a sequence of all the elements in a collection, in an
unspecified order.
6.6.2 Filters and partitions
-
filter :: (a ® Bool) ® coll a ® coll a
Extract all the elements satisfying the given predicate.
- partition :: (a ® Bool) ® coll a ® (coll a, coll a)
Split a collection into those elements satisfying the given predicate,
and those elements not satisfying the predicate.
6.6.3 Folds
-
fold :: (a ® b ® b) ® b ® coll a ® b
Combine all the elements in a collection into a single value, given
a combining function and an initial value. Processes the elements
in an unspecified order.
- fold1 :: (a ® a ® a) ® coll a ® a
Combine all the elements in a non-empty collection into a single
value using the given combining function. Signals an error if the
collection is empty. Processes the elements in an unspecified order.
An implementation may choose to process the elements linearly or
in a balanced fashion (like reduce1 on sequences).
6.7 OrdColl
class (Coll c a, OrdCollX c a) => OrdColl c a
6.7.1 Destructors
-
minView :: coll a ® Maybe2 a (coll a)
maxView :: coll a ® Maybe2 (coll a) a
Separate a collection into its minimum/maximum element and
the remaining collection. Return Nothing2 if the
collection is empty. If there is more than one minimum/maximum,
choose an unspecified one.
- minElem :: coll a ® a
maxElem :: coll a ® a
Return the minimum/maximum element in the collection, or
signal an error if the collection is empty. If there is more than
one minimum/maximum, choose an unspecified one.
6.7.2 Observers
-
toOrdSeq :: Sequence seq ® coll a ® seq a
Convert a collection into a non-decreasing sequence of elements.
The order in which clusters of equal elements are listed
is unspecified. (For sets, the order will always be increasing.)
6.7.3 Folds
-
foldr :: (a ® b ® b) ® b ® coll a ® b
foldl :: (b ® a ® b) ® b ® coll a ® b
Fold across the elements in non-decreasing order (increasing order
in the case of sets). The order in which clusters of equal elements
are processed is unspecified.
- foldr1 :: (a ® a ® a) ® coll a ® a
foldl1 :: (a ® a ® a) ® coll a ® a
Fold across the elements in non-decreasing order (increasing order
in the case of sets), or signal an error if the collection is empty.
The order in which clusters of equal elements
are processed is unspecified.
6.8 Set
class (Coll c a, SetX c a) => Set c a
Warning: each of the following ``With'' functions is unsafe.
Each takes a combining function that is used to choose which element
is kept in the case of duplicates. This combining function must satisfy the
precondition that, given two equal elements, it returns a third element
that is equal to both. Usually, the combining function just returns
its first or second argument, but it can combine the elements in non-trivial
ways.
The combining function should usually be associative. If not,
the elements will be combined left-to-right, but with an unspecified
associativity. For example, it x == y == z, then
fromList (Å) [x,y,z] equals either
single (x Å (y Å z))
or
single ((x Å y) Å z).
6.8.1 Constructors
-
insertWith :: (a ® a ® a) ® a ® coll a ® coll a
insertSeqWith :: Sequence seq Þ (a ® a ® a) ® seq a ® coll a ® coll a
Same as insert and insertSeq, but with a combining function
to resolve duplicates. See the comments about associativity for
insertSeqWith.
- unionl :: coll a ® coll a ® coll a
unionr :: coll a ® coll a ® coll a
Same as union but keep the element from the left/right collection
in the case of duplicates.
Axioms:
unionl xs ys º unionWith (l x y ® x) xs ys |
unionr xs ys º unionWith (l x y ® y) xs ys |
- unionWith :: (a ® a ® a) ® coll a ® coll a ® coll a
unionSeqWith :: Sequence seq Þ (a ® a ® a) ® seq (coll) a ® coll a
Same as union and unionSeq, but with a combining function
to resolve duplicates. See the comments about associativity for
unionSeqWith.
- fromSeqWith :: Sequence seq Þ (a ® a ® a) ® seq a ® coll a
Same as fromSeq, but with a combining function to resolve duplicates.
See the comments about associativity.
6.8.2 Set operations
-
intersectWith :: (a ® a ® a) ® coll a ® coll a
Same as intersect, but with a combining function to resolve duplicates.
6.9 OrdSet
class (OrdColl c a, Set c a) => OrdSet c a
This class contains no methods. It exists only as an abbreviation
for the context
(OrdColl c a, Set c a)
6.10 Specialized operations on lists
For each of the collection methods that involve sequences (e.g.,
insertSeq, toOrdSeq, lookupAll), there is a specialized
version that operates on lists. The specialized versions are obtained by
replacing the name Seq with List (e.g., insertSeq becomes
insertList). The sole exception to this naming scheme is the
specialized version of lookupAll, which is named lookupList.
These functions are defined in the Collections module and rely
on overloading (as opposed to being accessible directly from each
implementation module). The types of these functions are
fromList |
:: |
CollX c a Þ [a] ® c a |
insertList |
:: |
CollX c a Þ [a] ® c a ® c a |
unionList |
:: |
CollX c a Þ [c a] ® c a |
deleteList |
:: |
CollX c a Þ [a] ® c a ® c a |
unsafeFromOrdList |
:: |
OrdCollX c a Þ [a] ® c a |
toList |
:: |
Coll c a Þ c a ® [a] |
lookupList |
:: |
Coll c a Þ c a ® a ® [a] |
toOrdList |
:: |
OrdColl c a Þ c a ® [a] |
fromListWith |
:: |
Set c a Þ (a ® a ® a) ® [a] ® c a |
insertListWith |
:: |
Set c a Þ (a ® a ® a) ® [a] ® c a ® c a |
unionListWith |
:: |
Set c a Þ (a ® a ® a) ® [c a] ® c a |
6.11 Utility functions
The module CollectionUtils contains several utility functions.
This module will likely expand in the future.
-
map :: (Coll cin a, CollX cout b) Þ (a ® b) ® (cin a ® cout b)
Map a function across every element in a collection. Note that both
the element type and the collection type may change.
- mapPartial :: (Coll cin a, CollX cout b) Þ (a ® Maybe b) ® (cin a ® cout b)
Map a partial function across every element in a collection, discarding
Nothing results. Note that both the element type and the collection
type may change.
- unsafeMapMonotonic :: (OrdColl cin a, OrdCollX cout b) Þ (a ® b) ® (cin a ® cout b)
Map a monotonic function across every element in a collection. Note
that both the element type and the collection type may change.
The function f must satisfy the precondition that
x <= y -- f x <= f y
For sets this precondition is strengthened to
x < y -- f x < f y
- unionMap :: (Coll cin a, CollX cout b) Þ (a ® cout b) ® (cin a ® cout b)
Apply a collection-producing function to every element in a collection,
and merge the results. Note that both the element type and the collection
type may change. unionMap is essentially equivalent to the bind
operation on monads, except that collections cannot be monads because of the
use of multi-parameter type classes.
- 1
- In fact, we can even
support fold and filter if the hashing function is reversible,
but this is relatively uncommon.