Clojure: First steps with reducers
I’ve been playing around with Clojure a bit today in preparation for a talk I’m giving next week and found myself writing the following code to apply the same function to three different scores:
(defn log2 [n]
(/ (Math/log n) (Math/log 2)))
(defn score-item [n]
(if (= n 0) 0 (log2 n)))
(+ (score-item 12) (score-item 13) (score-item 5)) 9.60733031374961
I’d forgotten about folding over a collection but quickly remembered that I could achieve the same result with the following code:
(reduce #(+ %1 (score-item %2)) 0 [12 13 5]) 9.60733031374961
The added advantage here is that if I want to add a 4th score to the mix all I need to do is append it to the end of the vector:
(reduce #(+ %1 (score-item %2)) 0 [12 13 5 6]) 12.192292814470767
However, while Googling to remind myself of the order of the arguments to reduce I kept coming across articles and documentation about reducers which I’d heard about but never used.
As I understand they’re used to achieve performance gains and easier composition of functions over collections so I’m not sure how useful they’ll be to me but I thought I’d give them a try.
Our first step is to bring the namespace into scope:
(require '[clojure.core.reducers :as r])
Now we can compute the same result using the reduce function:
(r/reduce #(+ %1 (score-item %2)) 0 [12 13 5 6]) 12.192292814470767
So far, so identical. If we wanted to calculate individual scores and then filter out those below a certain threshold the code would behave a little differently:
(->>[12 13 5 6]
(map score-item)
(filter #(> % 3))) (3.5849625007211565 3.700439718141092)
(->> [12 13 5 6]
(r/map score-item)
(r/filter #(> % 3))) #object[clojure.core.reducers$folder$reify__19192 0x5d0edf21 "clojure.core.reducers$folder$reify__19192@5d0edf21"]
Instead of giving us a vector of scores the reducers version returns a reducer which can pass into reduce or fold if we want an accumulated result or into if we want to output a collection. In this case we want the latter:
(->> [12 13 5 6]
(r/map score-item)
(r/filter #(> % 3))
(into [])) (3.5849625007211565 3.700439718141092)
With a measly 4 item collection I don’t think the reducers are going to provide much speed improvement here but we’d need to use the http://clojure.org/reference/reducers#_using_reducers function if we want processing of the collection to be done in parallel.
One for next time!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.