28 May 2012

Haskell: Finding the minimum & maximum values of a Foldable in one pass

I recently came across Dan Piponi’s blog post 'Haskell Monoids & their Uses' and towards the end of the post he suggests creating monoids to work out the maximum and minimum values of a Foldable value in one pass.

The http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Data-Foldable.html type class provides a generic approach to walking through a datastructure, accumulating values as we go. The foldMap function applies a function to each element of our structure and then accumulates the return values of each of these applications.

A list is one example of a type which implements the Foldable type class like so:

instance Foldable [] where
    foldr = Prelude.foldr
    foldl = Prelude.foldl
    foldl' = List.foldl'
    foldr1 = Prelude.foldr1
    foldl1 = Prelude.foldl1

In this case it delegates all of the Foldable functions to previously defined functions from the Prelude or List modules.

The 'foldMap' function mentioned above has the following default implementation:

foldMap :: Monoid m => (a -> m) -> t a -> m
foldMap f = foldr (mappend . f) mempty

Dan shows an example where we use 'foldMap' to check whether any values in a Foldable have the value 1 by using the 'Any' monoid:

> foldMap (Any . (== 1)) [1..20]
Any {getAny = True}

I couldn’t quite understand how it worked so I expanded the 'foldMap' definition which results in the following:

> Data.Foldable.foldr (\x acc -> acc `mappend` (Any . (== 1)) x) mempty [1..20]
Any {getAny = True}

So we’re folding over the list with an initial seed value of 'Any False' as that’s what mempty will evaluate to. For each value we’re then calling 'mappend' with two 'Any' values and passing on the result.

So in this example it would go like this:

Any False `mappend` (Any . (==1)) 20 => Any False `mappend` Any False => Any False
Any False `mappend` (Any . (==1)) 19 => Any False `mappend` Any False => Any False
...
Any False `mappend` (Any . (==1)) 1 => Any False `mappend` Any True => Any True

In our case we need to define our own monoids to keep track of the maximum value in a Foldable:

import Data.Monoid
import Data.Foldable

data MaxSoFar a = MaxSoFar { getMaxSoFar :: a } deriving (Eq, Ord, Read, Show, Bounded)
instance (Num a,  Ord a, Bounded a) => Monoid (MaxSoFar a) where
  mempty = MaxSoFar (minBound)
  MaxSoFar x `mappend` MaxSoFar y = MaxSoFar (max x y)

We set 'mempty' to the minimum possible Int value so that anything Int value compared to it will automatically become the max value when it gets compared to that.

For 'mappend' we take the values stored in the two 'MaxSoFar' instances and then select the maximum and return a new MaxSoFar.

We call foldMap like so:

> foldMap (MaxSoFar) ([1..20] :: [Int])
MaxSoFar {getMaxSoFar = 20}

MinSoFar is pretty similar in design:

data MinSoFar a = MinSoFar { getMinSoFar :: a } deriving (Eq, Ord, Read, Show, Bounded)
instance (Num a,  Ord a, Bounded a) => Monoid (MinSoFar a) where
  mempty = MinSoFar maxBound
  MinSoFar x `mappend` MinSoFar y = MinSoFar (min x y)

And we can then combine them together like this:

> foldMap (\x -> (MaxSoFar x, MinSoFar x)) ([1..20] :: [Int])
(MaxSoFar {getMaxSoFar = 20},MinSoFar {getMinSoFar = 1})

This works because we can make use of what Dan calls the 'Product Monoid', which is a Monoid instance for a tuple:

instance (Monoid a, Monoid b) => Monoid (a,b) where
  mempty = (mempty, mempty)
  (a1,b1) `mappend` (a2,b2) = (a1 `mappend` a2, b1 `mappend` b2)

So in our example the fold would go like this:

(MaxSoFar -9223372036854775808, MinSoFar 9223372036854775807) `mappend` (MaxSoFar 20, MinSoFar 20) => (MaxSoFar 20, MinSoFar 20)
(MaxSoFar 20, MinSoFar 20) `mappend` (MaxSoFar 19, MinSoFar 19) => (MaxSoFar 20, MinSoFar 19)
...
(MaxSoFar 20, MinSoFar 2) `mappend` (MaxSoFar 1, MinSoFar 1) => (MaxSoFar 20, MinSoFar 1)

We could probably achieve the same thing without using monoids but I think this approach is pretty neat.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.