ClickHouse: Nested type Array(String) cannot be inside Nullable type (ILLEGAL_TYPE_OF_ARGUMENT)
I’ve been working with some data that’s in CSV format but has tab-separated values in some columns. In this blog post, we’re going to learn how to process that data in ClickHouse.
The CSV file that we’re working with looks like this:
value |
---|
foo bar |
We’ll launch ClickHouse Local (clickhouse local
) and then run the following:
FROM file('data.csv', CSVWithNames)
SELECT *;
┌─value─────┐
│ foo bar │
└───────────┘
Let’s try to split the value
field on tab using the splitByString
function:
FROM file('data.csv', CSVWithNames)
SELECT splitByString('\t', value);
Received exception:
Code: 43. DB::Exception: Nested type Array(String) cannot be inside Nullable type: While processing splitByString('\t', value). (ILLEGAL_TYPE_OF_ARGUMENT)
The problem we’ve run into here is that ClickHouse’s CSV reader assumes that fields are nullable since it’s possible that there could be nulls.
We can disable the schema_inference_make_columns_nullable
setting to tell it to not treat fields as nullable.
FROM file('data.csv', CSVWithNames)
SELECT splitByString('\t', value)
SETTINGS schema_inference_make_columns_nullable = 0;
┌─splitByString('\t', value)─┐
│ ['foo','bar'] │
└────────────────────────────┘
Alternatively, we can use the assumeNotNull
function if we want it to keep all other fields nullable:
FROM file('data.csv', CSVWithNames)
SELECT splitByString('\t', assumeNotNull(value));
┌─splitByString('\t', assumeNotNull(value))─┐
│ ['foo','bar'] │
└───────────────────────────────────────────┘
But this function will return an arbitrary value if a null is found. So let’s say we update our CSV file to look like this:
value |
---|
foo bar |
mark bar |
If we run that last query, we’ll see the following output:
┌─splitByString('\t', assumeNotNull(value))─┐
│ ['foo','bar'] │
│ [''] │
│ ['mark','bar'] │
└───────────────────────────────────────────┘
So we might prefer to use ifNull
, which lets us choose our own default value to use if a null value is encountered:
FROM file('data.csv', CSVWithNames)
SELECT splitByString('\t', ifNull(value, 'N/A'));
┌─splitByString('\t', ifNull(value, 'N/A'))─┐
│ ['foo','bar'] │
│ ['N/A'] │
│ ['mark','bar'] │
└───────────────────────────────────────────┘
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.