Haskell: Strictness and the monadic bind
As I mentioned towards the end of my post about implementing the union find data structure in Haskell I wrote another version using a mutable array and having not seen much of a performance improvement started commenting out code to try and find the problem.
I eventually narrowed it down to the union function which was defined like so:
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
actualArray <- arrayContainer
ls <- getAssocs actualArray
leader1 <- readArray actualArray x
leader2 <- readArray actualArray y
let newValues = (map (\(index, value) -> (index, leader1)) . filter (\(index, value) -> value == leader2)) ls
sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
return actualArray
I was using Unix’s http://en.wikipedia.org/wiki/Time_(Unix) function to get the execution time since this meant I didn’t need to make any changes to the program and this level of granularity was ok.
The first time I ran the program it executed in 36.379 seconds and my first hunch was that a lot of time was being taken up writing to the array so I commented out that line:
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
actualArray <- arrayContainer
ls <- getAssocs actualArray
leader1 <- readArray actualArray x
leader2 <- readArray actualArray y
let newValues = (map (\(index, value) -> (index, leader1)) . filter (\(index, value) -> value == leader2)) ls
-- sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
return actualArray
The execution time decreased to 33.381 seconds so the writing of the array was actually only a small part of the total execution time.
I thought it was quite strange that it was taking so long to execute since things are generally lazily evaluated in Haskell and my assumption was that newValues wasn’t being evaluated since I hadn’t used it anywhere. I decided to comment that out to see what difference it would make:
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
actualArray <- arrayContainer
ls <- getAssocs actualArray
leader1 <- readArray actualArray x
leader2 <- readArray actualArray y
-- let newValues = (map (\(index, value) -> (index, leader1)) . filter (\(index, value) -> value == leader2)) ls
-- sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
return actualArray
The execution time was now 33.579 seconds so commenting out that line hadn’t actually made any difference. I assumed ls wasn’t being evaluated since it isn’t being used but I thought I’d better check:
union :: IO (IOArray Int Int) -> Int -> Int -> IO (IOArray Int Int)
union arrayContainer x y = do
actualArray <- arrayContainer
-- ls <- getAssocs actualArray
leader1 <- readArray actualArray x
leader2 <- readArray actualArray y
-- let newValues = (map (\(index, value) -> (index, leader1)) . filter (\(index, value) -> value == leader2)) ls
-- sequence $ map (\(idx, val) -> writeArray actualArray idx val) newValues
return actualArray
The execution time now reduced to 3.882 seconds thereby suggesting that http://hackage.haskell.org/packages/archive/array/0.2.0.0/doc/html/Data-Array-MArray.html#v%3AgetAssocs was being strictly evaluated.
We are doing what’s called a monadic bind which (at least) within GHCI is strictly evaluated but isn’t necessarily evaluated like this everywhere else:
Monad operations (bind and return) have to be non-strict in fact, always! However other operations can be specific to each monad. For instance some are strict (like IO), and some are non-strict (like []).
From my observations it would seem that the IOArray is one of those monads which evaluates bind strictly.
I tried looking at the Haskell source code to see if I could find any code to prove what I’d observed but I’m not entirely sure what I should be looking for!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.