neo4j/cypher/Lucene: Dealing with special characters
neo4j uses Lucene to handle indexing of nodes and relationships in the graph but something that can be a bit confusing at first is how to handle special characters in Lucene queries.
For example let’s say we set up a database with the following data:
CREATE ({name: "-one"})
CREATE ({name: "-two"})
CREATE ({name: "-three"})
CREATE ({name: "four"})
And for whatever reason we only wanted to return the nodes that begin with a hyphen.
A hyphen is a special character in Lucene so if we forget to escape it we’ll end up with an impressive stack trace:
START p = node:node_auto_index("name:-*") RETURN p;
==> RuntimeException: org.apache.lucene.queryParser.ParseException: Cannot parse 'name:-*': Encountered " "-" "- "" at line 1, column 5.
==> Was expecting one of:
==> <BAREOPER> ...
==> "(" ...
==> "*" ...
==> <QUOTED> ...
==> <TERM> ...
==> <PREFIXTERM> ...
==> <WILDTERM> ...
==> "[" ...
==> "{" ...
==> <NUMBER> ...
==>
So we change our query to escape the hyphen:
START p = node:node_auto_index("name:\-*") RETURN p;
which results in the following exception:
==> SyntaxException: invalid escape sequence
==>
==> Think we should have better error message here? Help us by sending this query to cypher@neo4j.org.
==>
==> Thank you, the Neo4j Team.
==>
==> "START p = node:node_auto_index("name:\-*") RETURN p"
==>
The problem is that the cypher parser also treats '\' as an escape character so we need to use two of them to make our query do what we want:
START p = node:node_auto_index("name:\\-*") RETURN p;
==> +------------------------+
==> | p |
==> +------------------------+
==> | Node[4]{name:"-one"} |
==> | Node[5]{name:"-two"} |
==> | Node[6]{name:"-three"} |
==> +------------------------+
==> 3 rows
Alternatively, as Chris pointed out, we could make use of parameters in which case we don’t need to worry about how the cypher parser handles escaping:
require 'neography'
neo = Neography::Rest.new
query = "START p = node:node_auto_index({query}) RETURN p"
result = neo.execute_query(query, { :query => 'name:\-*'})
p result["data"].map { |x| x[0]["data"] }
$ bundle exec ruby params.rb
[{"name"=>"-one"}, {"name"=>"-two"}, {"name"=>"-three"}]
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.