12 Jun 2012

Functional Thinking: Separating concerns

Over the weekend I was trying to port some of the neo4j import code for the ThoughtWorks graph I’ve been working on to use the REST Batch API, and I came across an interesting example of imperative vs functional thinking.

I’m using the neography gem to populate the graph and to start with I was just creating a person node and then creating an index entry for it:

RUBY require 'set'
people_to_load = Set.new
people_to_load << { :name => "Mark Needham", :id => 1 }
people_to_load << { :name => "Jenn Smith", :id => 2 }
people_to_load << { :name => "Chris Ford", :id => 3 }

command_index = 0
people_commands = people_to_load.inject([]) do |acc, person|
  acc << [:create_node, {:id => person[:id], :name => person[:name]}]
  acc << [:add_node_to_index, "people", "name", person[:name], "{#{command_index}}"]
  command_index += 2
  acc
end

# Splat the array so that each command is passed to batch as its own argument
Neography::Rest.new.batch(*people_commands)

people_commands ends up containing the following arrays in the above example:

 [
  [:create_node, {:id=>1, :name=>"Mark Needham"}],
  [:add_node_to_index, "people", "name", "Mark Needham", "{0}"],
  [:create_node, {:id=>2, :name=>"Jenn Smith"}],
  [:add_node_to_index, "people", "name", "Jenn Smith", "{2}"],
  [:create_node, {:id=>3, :name=>"Chris Ford"}],
  [:add_node_to_index, "people", "name", "Chris Ford", "{4}"]
 ]

We can refer to previously executed batch commands by referencing their 'job id', which in this case is their zero-indexed position in the list of commands. For example, the second command, which indexes me, refers to the node created in 'job id' '0', i.e. the first command in this batch.
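To make the 'job id' idea concrete, here is a minimal sketch that resolves a "{n}" placeholder back to the command it points at. The helper referenced_command and the constant EXAMPLE_COMMANDS are hypothetical names made up for illustration. They are not part of neography, and the real resolution happens on the server side.

```ruby
# A cut-down version of the batch from the example above.
EXAMPLE_COMMANDS = [
  [:create_node, {:id => 1, :name => "Mark Needham"}],
  [:add_node_to_index, "people", "name", "Mark Needham", "{0}"],
  [:create_node, {:id => 2, :name => "Jenn Smith"}],
  [:add_node_to_index, "people", "name", "Jenn Smith", "{2}"]
]

# Hypothetical helper (not a neography method): given a "{n}" placeholder,
# return the command at zero-indexed position n, or nil if the string is
# not a placeholder.
def referenced_command(commands, placeholder)
  job_id = placeholder[/\A\{(\d+)\}\z/, 1]
  job_id && commands[job_id.to_i]
end

# The second command's placeholder points back at the first command,
# i.e. the create_node for Mark Needham.
first_reference = referenced_command(EXAMPLE_COMMANDS, EXAMPLE_COMMANDS[1].last)
```

Because the reference is purely positional, inserting or reordering commands changes what each placeholder means, which is why the counter had to move in steps of two.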

In the neo4j REST API we’d be able to define an arbitrary id for a command and then reference that later on but it’s not implemented that way in neography.

I thought having the 'command_index += 2' was a bit rubbish because it has nothing to do with the problem I’m trying to solve, so I posted to twitter to see if there was a more functional way to do this.

My colleague Chris Ford came up with a neat approach which involved using 'each_with_index' to work out the index positions rather than having a counter. His final version looked like this:

RUBY # All the create_node commands come first, so person N's node is 'job id' N
insert_commands = people_to_load.map do |person|
  [:create_node, {:id => person[:id], :name => person[:name]}]
end

index_commands = people_to_load.each_with_index.map do |person, index|
  [:add_node_to_index, "people", "name", person[:name], "{#{index}}"]
end

people_commands = insert_commands + index_commands

The neat thing about this solution is that Chris has separated the two concerns - creating the node and indexing it.

When I was thinking about this problem imperatively they seemed to belong together but there’s actually no reason for that to be the case and we can write simpler code by separating them.
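One way to see the separation is to pull each concern into its own method and then check that the references still line up. The sketch below is only an illustration. The method names insert_commands_for, index_commands_for and references_consistent? are made up for this example, and nothing here talks to neo4j. Note that the index positions only work as 'job ids' because all of the create_node commands are placed at the front of the batch.

```ruby
require 'set'

# Concern 1: creating the nodes.
def insert_commands_for(people)
  people.map do |person|
    [:create_node, {:id => person[:id], :name => person[:name]}]
  end
end

# Concern 2: indexing the nodes. Assumes the insert commands occupy
# positions 0..n-1 of the final batch, in the same order as people.
def index_commands_for(people)
  people.each_with_index.map do |person, index|
    [:add_node_to_index, "people", "name", person[:name], "{#{index}}"]
  end
end

# Hypothetical sanity check: every index command should point at the
# create_node command for the same person.
def references_consistent?(commands)
  commands.select { |command| command.first == :add_node_to_index }.all? do |command|
    name, placeholder = command[3], command[4]
    job_id = placeholder[/\d+/].to_i
    commands[job_id][1][:name] == name
  end
end

PEOPLE = Set.new([
  {:name => "Mark Needham", :id => 1},
  {:name => "Jenn Smith", :id => 2},
  {:name => "Chris Ford", :id => 3}
])

PEOPLE_COMMANDS = insert_commands_for(PEOPLE) + index_commands_for(PEOPLE)
```

With the two concerns split like this, either method can change (for example, indexing on a different key) without touching the other, as long as the inserts stay at the front of the batch.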

We do iterate through the set twice, but since it’s not really that big it doesn’t make much difference to the performance.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.