Previous Contents Next

Chapter 6   Collections

The collection abstraction includes sets, bags, and priority queues (heaps). Collections are defined in Edison as a set of eight classes, organized in the hierarchy shown in Figure 6.1. These classes make essential use of multi-parameter type classes. All collections assume at least an equality relation on elements, and many also assume an ordering relation. The use of multi-parameter type classes allows any particular instance to assume further properties as necessary (such as hashability).




CollX OrdCollX SetX OrdSetX
empty,insert deleteMin intersect no methods
union,delete unsafeInsertMin difference  
null,size filterLT subset  
member,count ··· subsetEq  
···      
Coll OrdColl Set OrdSet
toSeq minElem insertWith no methods
lookup foldr,foldl unionWith  
fold toOrdSeq intersectWith  
filter ··· ···  
···      

Figure 6.1: The collection class hierarchy, with typical methods for each class.


The hierarchy contains a root class, CollX, together with seven subclasses satisfying one or more of three common sub-properties: Because collections encompass a wide range of abstractions, there is no single name that is suitable for all collection type constructors. However, most modules implementing collections will define a type constructor named either Bag, Set, or Heap.

Figure 6.2 summarizes all the methods on collections. These methods will be described in more detail in the sections on each subclass in the hierarchy.


Collection Methods

Constructors:

CollX: empty, single, insert, insertSeq, union, unionSeq, fromSeq
OrdCollX: unsafeInsertMin, unsafeInsertMax, unsafeFromOrdSeq, unsafeAppend
Set: insertWith, insertSeqWith, unionl, unionr, unionWith, unionSeqWith, fromSeqWith
Destructors:

OrdColl: minView, minElem, maxView, maxElem
Deletions:

CollX: delete, deleteAll, deleteSeq
OrdCollX: deleteMin, deleteMax
Observers:

CollX: null, size, member, count
Coll: lookup, lookupM, lookupAll, lookupWithDefault, toSeq
OrdColl: toOrdSeq
Filters and partitions:

OrdCollX: filterLT, filterLE, filterGT, filterGE,
   partitionLT_GE, partitionLE_GT, partitionLT_GT
Coll: filter, partition
Set operations:

SetX: intersect, difference, subset, subsetEq
Set: intersectWith
Folds:

Coll: fold, fold1
OrdColl: foldr, foldl, foldr1, foldl1



Figure 6.2: Summary of methods for the collections classes.


6.1   Observability

Note that the equality relation defined by the Eq class is not necessarily true equality. Very often it is merely an equivalence relation, where equivalent values may be distinguishable by other means. For example, we might consider two binary search trees to be equal if they contain the same elements, even if their shapes are different.

Because of this phenomenon, implementations of observable collections (i.e., collections where it is possible to inspect the elements) are rather constrained. Such an implementation must retain the actual elements that were inserted. For example, it is not possible in general to represent an observable bag as a finite map from elements to counts, because even if we know that a given bag contains, say, three elements from some equivalence class, we do not necessarily know which three.

On the other hand, implementations of non-observable collections have much greater freedom to choose abstract representations of each equivalence class. For example, representing a bag as a finite map from elements to counts works fine if we never need to know which representatives from an equivalence class are actually present. As another example, consider the UniqueHash class defined in the Edison Prelude. If we know that the hash function yields a unique integer for each equivalence class, then we can represent a collection of hashable elements simply as a collection of integers. With such a representation, we can still do many useful things like testing for membership---we just can't support functions like fold or filter that require the elements themselves, rather than the hashed values.1

6.2   CollX

class Eq a => CollX c a
This is the root class of the hierarchy. However, it is perfectly adequate for many applications that use sets or bags.

6.2.1   Constructors

empty :: coll a

The empty collection.
single :: a ® coll a

Create a singleton collection.
insert :: a ® coll a ® coll a
insertSeq :: Sequence seq
Þ seq a ® coll a ® coll a

Add an element or a sequence of elements to a collection. For sets, insert keeps the new element in the case of duplicates, but insertSeq keeps an unspecified element.

See also insertWith and insertSeqWith in Set.
union :: coll a ® coll a ® coll a
unionSeq :: Sequence seq
Þ seq a ® seq (coll a) ® coll a

Merge two collections or a sequence of collections. For sets, it is unspecified which element is kept in the case of duplicates.

See also unionl, unionr, unionWith, and unionSeqWith in Set.
fromSeq :: Sequence seq Þ seq a ® coll a

Convert a sequence of elements into a collection. For sets, it is unspecified which element is kept in the case of duplicates.

See also fromSeqWith in Set.

6.2.2   Deletions

delete :: a ® coll a ® coll a
deleteAll :: a
® coll a ® coll a

Delete a single occurrence or all occurences of the given element from a collection. If the element does not appear in the collection, then leave the collection unchanged. For sets, these functions will be the same, but for bags they may be different. For delete on bags, it is unspecified which of several duplicate elements is deleted.
deleteSeq :: Sequence seq ® seq a ® coll a ® coll a

Delete a single occurrence of each of the given elements from a collection, ignoring those elements in the sequence that do not appear in the collection. For bags, there may be multiple occurrences of a given element in the collection, in which case it is unspecified which is deleted.

6.2.3   Observers

null :: coll a ® Bool

Test whether the collection is empty.

Axioms:
  
null xs º (size xs == 0)
size :: coll a ® Int

Return the number of elements in the collection.
member :: coll a ® a ® Bool

Test whether the given element is in the collection.

Axioms:
  
member xs x º (count xs x > 0)
count :: coll a ® a ® Int

Count how many copies of the given element are in the collection.

6.3   OrdCollX

class (CollX c a, Ord a) => OrdCollX c a

6.3.1   Constructors

unsafeInsertMin :: a ® coll a ® coll a
unsafeInsertMax :: coll a
® a ® coll a

Insert an element into a collection with the precondition that the new element is £ or ³ any existing elements. For sets, this precondition is strengthened to < or >.
unsafeFromOrdSeq :: Sequence seq Þ seq a ® coll a

Convert a sequence of elements into a collection with the precondition that the sequence is already sorted into non-decreasing order. For sets, this precondition is strengthened to increasing order.
unsafeAppend :: coll a ® coll a ® coll a

Merge two collections with the precondition that every element in the first collection is £ every element in the second collection. For sets, this precondition is strengthened to <.

6.3.2   Deletions

deleteMin :: coll a ® coll a
deleteMax :: coll a
® coll a

Delete the minimum or maximum element from the collection, or return empty if the collection is empty. If there is more than one minimum or maximum, it is unspecified which is deleted.

See also minView, minElem, maxView, and maxElem in OrdColl.

6.3.3   Filters and partitions

filterLT :: a ® coll a ® coll a
filterLE :: a
® coll a ® coll a
filterGT :: a
® coll a ® coll a
filterGE :: a
® coll a ® coll a

Extract the subcollection of elements <, £, >, or ³ the given element. Equivalent to the corresponding calls to filter (in Coll), but may be much more efficient.

Axioms:
  
filterLT x xs º filter (< x) xs
filterLE x xs º filter (<= x) xs
filterGT x xs º filter (> x) xs
filterGE x xs º filter (>= x) xs
partitionLT_GE :: a ® coll a ® (coll a, coll a)
partitionLE_GT :: a
® coll a ® (coll a, coll a)
partitionLT_GT :: a
® coll a ® (coll a, coll a)

Split a collection into those elements <, £, or < the given element, and those elements ³, >, or > the given element. partitionLT_GE and partitionLE_GT are equivalent to the corresponding calls to partition (in Coll), but may be much more efficient. partitionLT_GT cannot be expressed as a single call to partition because it discards elements equal to the given element.

Axioms:
  
partitionLT_GE x xs º partition (< x) xs
partitionLE_GT x xs º partition (<= x) xs
partitionLT_GT x xs º (filterLT x xs, filterGT x xs)

6.4   SetX

class CollX c a => SetX c a

6.4.1   Set operations

intersect :: coll a ® coll a ® coll a

Computes the intersection of two sets. It is unspecified which of the two elements is kept.
difference :: coll a ® coll a ® coll a

Computes the difference of two sets (i.e., the set of all elements in the first set that are not in the second set).
subset :: coll a ® coll a ® Bool
subsetEq :: coll a
® coll a ® Bool

Test whether every element in the first set is also in the second set. subset additionally tests whether the second set contains at least one element that is not in the first set.

6.5   OrdSetX

class (OrdCollX c a, SetX c a) => OrdSetX c a
This class contains no methods. It exists only as an abbreviation for the context
(OrdCollX c a, SetX c a)

6.6   Coll

class CollX c a => Coll c a

6.6.1   Observers

lookup :: coll a ® a ® a
lookupM :: coll a
® a ® Maybe a
lookupAll :: Sequence seq
Þ coll a ® a ® seq a
lookupWithDefault :: a
® coll a ® a ® a

Search for an element in the set that is equal to the given element. lookup signals an error if no such element exists, while lookupWithDefault returns a default value (provided as its first argument). For bags, it is unspecified which of several duplicates is chosen by lookup, lookupM, or lookupWithDefault. lookupAll returns all the duplicates, but in an unspecified order.

toSeq :: Sequence seq Þ coll a ® seq a

Return a sequence of all the elements in a collection, in an unspecified order.

6.6.2   Filters and partitions

filter :: (a ® Bool) ® coll a ® coll a

Extract all the elements satisfying the given predicate.
partition :: (a ® Bool) ® coll a ® (coll a, coll a)

Split a collection into those elements satisfying the given predicate, and those elements not satisfying the predicate.

6.6.3   Folds

fold :: (a ® b ® b) ® b ® coll a ® b

Combine all the elements in a collection into a single value, given a combining function and an initial value. Processes the elements in an unspecified order.
fold1 :: (a ® a ® a) ® coll a ® a

Combine all the elements in a non-empty collection into a single value using the given combining function. Signals an error if the collection is empty. Processes the elements in an unspecified order. An implementation may choose to process the elements linearly or in a balanced fashion (like reduce1 on sequences).

6.7   OrdColl

class (Coll c a, OrdCollX c a) => OrdColl c a

6.7.1   Destructors

minView :: coll a ® Maybe2 a (coll a)
maxView :: coll a
® Maybe2 (coll a) a

Separate a collection into its minimum/maximum element and the remaining collection. Return Nothing2 if the collection is empty. If there is more than one minimum/maximum, choose an unspecified one.
minElem :: coll a ® a
maxElem :: coll a
® a

Return the minimum/maximum element in the collection, or signal an error if the collection is empty. If there is more than one minimum/maximum, choose an unspecified one.

6.7.2   Observers

toOrdSeq :: Sequence seq ® coll a ® seq a

Convert a collection into a non-decreasing sequence of elements. The order in which clusters of equal elements are listed is unspecified. (For sets, the order will always be increasing.)

6.7.3   Folds

foldr :: (a ® b ® b) ® b ® coll a ® b
foldl :: (b
® a ® b) ® b ® coll a ® b

Fold across the elements in non-decreasing order (increasing order in the case of sets). The order in which clusters of equal elements are processed is unspecified.
foldr1 :: (a ® a ® a) ® coll a ® a
foldl1 :: (a
® a ® a) ® coll a ® a

Fold across the elements in non-decreasing order (increasing order in the case of sets), or signal an error if the collection is empty. The order in which clusters of equal elements are processed is unspecified.

6.8   Set

class (Coll c a, SetX c a) => Set c a
Warning: each of the following ``With'' functions is unsafe. Each takes a combining function that is used to choose which element is kept in the case of duplicates. This combining function must satisfy the precondition that, given two equal elements, it returns a third element that is equal to both. Usually, the combining function just returns its first or second argument, but it can combine the elements in non-trivial ways.

The combining function should usually be associative. If not, the elements will be combined left-to-right, but with an unspecified associativity. For example, it x == y == z, then fromList (Å) [x,y,z] equals either
single (x Å (y Å z))   or   single ((x Å y) Å z).

6.8.1   Constructors

insertWith :: (a ® a ® a) ® a ® coll a ® coll a
insertSeqWith :: Sequence seq
Þ (a ® a ® a) ® seq a ® coll a ® coll a

Same as insert and insertSeq, but with a combining function to resolve duplicates. See the comments about associativity for insertSeqWith.
unionl :: coll a ® coll a ® coll a
unionr :: coll a
® coll a ® coll a

Same as union but keep the element from the left/right collection in the case of duplicates.

Axioms:
  
unionl xs ys º unionWith (l x y ® x) xs ys
unionr xs ys º unionWith (l x y ® y) xs ys
unionWith :: (a ® a ® a) ® coll a ® coll a ® coll a
unionSeqWith :: Sequence seq
Þ (a ® a ® a) ® seq (coll) a ® coll a

Same as union and unionSeq, but with a combining function to resolve duplicates. See the comments about associativity for unionSeqWith.
fromSeqWith :: Sequence seq Þ (a ® a ® a) ® seq a ® coll a

Same as fromSeq, but with a combining function to resolve duplicates. See the comments about associativity.

6.8.2   Set operations

intersectWith :: (a ® a ® a) ® coll a ® coll a

Same as intersect, but with a combining function to resolve duplicates.

6.9   OrdSet

class (OrdColl c a, Set c a) => OrdSet c a
This class contains no methods. It exists only as an abbreviation for the context
(OrdColl c a, Set c a)

6.10   Specialized operations on lists

For each of the collection methods that involve sequences (e.g., insertSeq, toOrdSeq, lookupAll), there is a specialized version that operates on lists. The specialized versions are obtained by replacing the name Seq with List (e.g., insertSeq becomes insertList). The sole exception to this naming scheme is the specialized version of lookupAll, which is named lookupList.

These functions are defined in the Collections module and rely on overloading (as opposed to being accessible directly from each implementation module). The types of these functions are
fromList  :: CollX c a Þ [a] ® c a
insertList  :: CollX c a Þ [a] ® c a ® c a
unionList  :: CollX c a Þ [c a] ® c a
deleteList  :: CollX c a Þ [a] ® c a ® c a
unsafeFromOrdList  :: OrdCollX c a Þ [a] ® c a
toList  :: Coll c a Þ c a ® [a]
lookupList  :: Coll c a Þ c a ® a ® [a]
toOrdList  :: OrdColl c a Þ c a ® [a]
fromListWith  :: Set c a Þ (a ® a ® a) ® [a] ® c a
insertListWith  :: Set c a Þ (a ® a ® a) ® [a] ® c a ® c a
unionListWith  :: Set c a Þ (a ® a ® a) ® [c a] ® c a

6.11   Utility functions

The module CollectionUtils contains several utility functions. This module will likely expand in the future.

map :: (Coll cin a, CollX cout b) Þ (a ® b) ® (cin a ® cout b)

Map a function across every element in a collection. Note that both the element type and the collection type may change.
mapPartial :: (Coll cin a, CollX cout b) Þ (a ® Maybe b) ® (cin a ® cout b)

Map a partial function across every element in a collection, discarding Nothing results. Note that both the element type and the collection type may change.
unsafeMapMonotonic :: (OrdColl cin a, OrdCollX cout b) Þ (a ® b) ® (cin a ® cout b)

Map a monotonic function across every element in a collection. Note that both the element type and the collection type may change. The function f must satisfy the precondition that
x <= y -- f x <= f y
For sets this precondition is strengthened to
x < y -- f x < f y

unionMap :: (Coll cin a, CollX cout b) Þ (a ® cout b) ® (cin a ® cout b)

Apply a collection-producing function to every element in a collection, and merge the results. Note that both the element type and the collection type may change. unionMap is essentially equivalent to the bind operation on monads, except that collections cannot be monads because of the use of multi-parameter type classes.

1
In fact, we can even support fold and filter if the hashing function is reversible, but this is relatively uncommon.

Previous Contents Next