jq: error - Cannot iterate over null (null)
I’ve been playing around with the jq library again over the past couple of days to convert the JSON from the Stack Overflow API into CSV and found myself needing to deal with an optional field.
I’ve downloaded 100 or so questions and stored them as an array in a JSON array like so:
$ head -n 100 so.json
[
{
"has_more": true,
"items": [
{
"is_answered": false,
"delete_vote_count": 0,
"body_markdown": "...",
"tags": [
"jdbc",
"neo4j",
"cypher",
"spring-data-neo4j"
],
"question_id": 33023306,
"title": "How to delete multiple nodes by specific ID using Cypher",
"down_vote_count": 0,
"view_count": 8,
"answers": [
{
...
]
I wrote the following command to try and extract the answer meta data and the corresponding question_id:
$ jq -r \
'.[] | .items[] |
{ question_id: .question_id, answer: .answers[] } |
[.question_id, .answer.answer_id, .answer.title] |
@csv' so.json
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher"
33020796,33021958,"How do a general search across string properties in my nodes?"
33018818,33020068,"Neo4j match nodes related to all nodes in collection"
33018818,33024273,"Neo4j match nodes related to all nodes in collection"
jq: error (at so.json:134903): Cannot iterate over null (null)
Unfortunately this results in an error since some questions haven’t been answered yet and therefore don’t have the 'answers' property.
While reading the docs I came across the alternative operation '//' which can be used to provide defaults - in this case I thought I could plugin an empty array of answers if a question hadn’t been answered yet:
$ jq -r \
'.[] | .items[] |
{ question_id: .question_id, answer: (.answers[] // []) } |
[.question_id, .answer.answer_id, .answer.title] |
@csv' so.json
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher"
33020796,33021958,"How do a general search across string properties in my nodes?"
33018818,33020068,"Neo4j match nodes related to all nodes in collection"
33018818,33024273,"Neo4j match nodes related to all nodes in collection"
jq: error (at so.json:134903): Cannot iterate over null (null)
Still the same error! Reading down the page I noticed the ? operator which provides syntactic sugar for handling/catching errors. I gave it a try:
$ jq -r '.[] | .items[] |
{ question_id: .question_id, answer: .answers[]? } |
[.question_id, .answer.answer_id, .answer.title] |
@csv' so.json | head -n10
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher"
33020796,33021958,"How do a general search across string properties in my nodes?"
33018818,33020068,"Neo4j match nodes related to all nodes in collection"
33018818,33024273,"Neo4j match nodes related to all nodes in collection"
33015714,33021482,"Upgrade of spring data neo4j 3.x to 4.x Relationship Operations"
33011477,33011721,"Why does Neo4j OGM delete method return void?"
33011102,33011565,"Neo4j and algorithms"
33011102,33013260,"Neo4j and algorithms"
33010859,33011505,"Importing data into an existing database in neo4j"
33009673,33010942,"How do I use Spring Data Neo4j to persist a Map (java.util.Map) object inside an NodeEntity?"
As far as I can tell we are just skipping any records that don’t contain 'answers' which is exactly the behaviour I’m after so that’s great - just what we need!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.