Apache Pinot: Inserts from SQL - Unable to get tasks states map - No task is generated for table
I recently wrote a post on the StarTree blog describing the inserts from SQL feature that was added in Apache Pinot 0.11, and while writing it I came across some interesting exceptions caused by configuration mistakes I’d made. In this post I’m going to describe one of those exceptions.
To recap, I was trying to ingest a bunch of JSON files from an S3 bucket using the following SQL query:
INSERT INTO "events"
FROM FILE 's3://marks-st-cloud-bucket/events/*.json'
OPTION(
taskName=myTask-s3,
input.fs.className=org.apache.pinot.plugin.filesystem.S3PinotFS,
input.fs.prop.accessKey=AKIARCOCT6DWLUB7F77Z,
input.fs.prop.secretKey=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,
input.fs.prop.region=eu-west-2
);
Note: Don’t worry, those credentials were deactivated and deleted several days ago.
When I ran this query against a Pinot cluster that contained a controller, broker, and server, I got the following exception:
[
{
"message": "QueryExecutionError:\norg.apache.commons.httpclient.HttpException: Unable to get tasks states map. Error code 400, Error message: {\"code\":400,\"error\":\"No task is generated for table: events, with task type: SegmentGenerationAndPushTask\"}\n\tat org.apache.pinot.common.minion.MinionClient.executeTask(MinionClient.java:123)\n\tat org.apache.pinot.core.query.executor.sql.SqlQueryExecutor.executeDMLStatement(SqlQueryExecutor.java:102)\n\tat org.apache.pinot.controller.api.resources.PinotQueryResource.executeSqlQuery(PinotQueryResource.java:145)\n\tat org.apache.pinot.controller.api.resources.PinotQueryResource.handlePostSql(PinotQueryResource.java:103)",
"errorCode": 200
}
]
My mistake here was that I didn’t have a minion in the cluster. The ingestion job is run by the minion component, so without one of those this feature doesn’t work!
An update (30th June 2023)
Today I learned that you can get this error even if you do have a minion configured. It also happens when no files are found for ingestion.
This might happen if the glob expression in includeFileNamePattern doesn’t match any files. For example, the following throws the exception:
SET taskName = 'events-task7';
SET input.fs.className = 'org.apache.pinot.spi.filesystem.LocalPinotFS';
SET includeFileNamePattern='glob:customers.csv';
INSERT INTO customers
FROM FILE 'file:///input/';
We can fix the query by adding **/ at the beginning of the pattern:
SET taskName = 'events-task7';
SET input.fs.className = 'org.apache.pinot.spi.filesystem.LocalPinotFS';
SET includeFileNamePattern='glob:**/customers.csv';
INSERT INTO customers
FROM FILE 'file:///input/';
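To see why the leading **/ matters, here is a minimal Java sketch. It assumes that Pinot evaluates includeFileNamePattern with Java's java.nio.file.PathMatcher against each file's full path, which is what the glob: prefix suggests but which this post doesn't confirm. The GlobCheck class and its matches helper are made up for illustration and are not part of Pinot.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Pinot.
class GlobCheck {
    // Returns true if the whole path matches the pattern,
    // e.g. pattern "glob:**/customers.csv" and path "/input/customers.csv".
    static boolean matches(String pattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // Assumption: the file is matched by its full path, not just its name.
        String file = "/input/customers.csv";

        // A bare file name has to match the entire path, so it fails here.
        System.out.println(matches("glob:customers.csv", file));    // prints false

        // **/ matches any leading directories, so the full path matches.
        System.out.println(matches("glob:**/customers.csv", file)); // prints true
    }
}
```

If that assumption holds, the first pattern selects no files, no task is generated, and we end up with the same error as when the minion is missing.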
About the author
I'm currently working on short-form content at ClickHouse. I publish short 5-minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.