Apache Pinot: Deleting instances in a bad state
Sometimes when I start up a local Pinot cluster after a hard shutdown (i.e. restarting my computer), I notice that the Pinot Data Explorer shows controllers, brokers, or servers in a bad state. In this blog post we’ll see how to get rid of those bad instances.
The screenshot below shows several instances in a bad state.
I’m not entirely sure why this happens, but I assume it has something to do with my local IP address changing as far as Pinot is concerned. Having these instances in a bad state doesn’t actually cause me any problems; they’re more of an irritation.
Luckily we can get rid of that irritation with help from the REST API’s drop instance endpoint, shown in the screenshot below:
So, what does calling this endpoint actually do? I had a quick look at the code and learnt that it deletes the following Zookeeper entries:
- /INSTANCES/<instanceName>
- /CONFIGS/PARTICIPANT/<instanceName>
We can see an example of what that part of our Zookeeper metadata looks like below:
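If you don’t have the Zookeeper browser to hand, the controller’s /zk/ls endpoint can list the children of a node from the command line. A quick sketch, assuming the default cluster name PinotCluster (the same one used by the /zk/get calls later in this post):

# List the instance entries under /PinotCluster/INSTANCES
curl -X GET "http://localhost:9000/zk/ls?path=%2FPinotCluster%2FINSTANCES" \
  -H "accept: text/plain" 2>/dev/null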
We want to remove the following instances:
- Controller_172.21.0.4_9000
- Controller_172.21.0.2_9000
- Controller_172.21.0.5_9000
- Server_172.21.0.3_8098
- Server_172.21.0.4_8098
- Broker_172.21.0.3_8099
Let’s try to remove those instances by running the following command:
for instance in "Controller_172.21.0.4_9000" "Controller_172.21.0.2_9000" "Controller_172.21.0.5_9000" "Server_172.21.0.3_8098" "Server_172.21.0.4_8098" "Broker_172.21.0.3_8099"; do
  curl -X DELETE "http://localhost:9000/instances/${instance}" \
    -H "accept: application/json" 2>/dev/null;
  printf "\n"
done
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}
{"_code":409,"_error":"Failed to drop instance Server_172.21.0.4_8098 - Instance Server_172.21.0.4_8098 exists in ideal state for races_REALTIME"}
{"_code":409,"_error":"Failed to drop instance Broker_172.21.0.3_8099 - Instance Broker_172.21.0.3_8099 exists in ideal state for brokerResource"}
We were able to remove four of the instances, but it looks like two of them are still in use. Let’s see if we can figure out what’s going on.
We can return the ideal state for the brokerResource by running the following command:
curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2FbrokerResource" \
-H "accept: text/plain" 2>/dev/null
{
"id" : "brokerResource",
"simpleFields" : {
"BATCH_MESSAGE_MODE" : "false",
"IDEAL_STATE_MODE" : "CUSTOMIZED",
"NUM_PARTITIONS" : "3",
"REBALANCE_MODE" : "CUSTOMIZED",
"REPLICAS" : "0",
"STATE_MODEL_DEF_REF" : "BrokerResourceOnlineOfflineStateModel",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
},
"mapFields" : {
"courses_OFFLINE" : {
"Broker_172.21.0.3_8099" : "ONLINE",
"Broker_172.21.0.5_8099" : "ONLINE"
},
"parkrun_REALTIME" : {
"Broker_172.21.0.3_8099" : "ONLINE",
"Broker_172.21.0.5_8099" : "ONLINE"
},
"races_REALTIME" : {
"Broker_172.21.0.3_8099" : "ONLINE",
"Broker_172.21.0.5_8099" : "ONLINE"
}
},
"listFields" : { }
}
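The error messages already told us which resources still reference these instances, but we can also work it out from the metadata itself. The following sketch (assuming jq is installed) filters mapFields down to the tables that still reference the dead broker, which for the state above would be all three:

curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2FbrokerResource" \
  -H "accept: text/plain" 2>/dev/null |
  jq '.mapFields | to_entries | map(select(.value | has("Broker_172.21.0.3_8099")) | .key)'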
Let’s create a file, newBrokerState.json, that removes the Broker_172.21.0.3_8099 entry. The file should look like this:
{
"id" : "brokerResource",
"simpleFields" : {
"BATCH_MESSAGE_MODE" : "false",
"IDEAL_STATE_MODE" : "CUSTOMIZED",
"NUM_PARTITIONS" : "3",
"REBALANCE_MODE" : "CUSTOMIZED",
"REPLICAS" : "0",
"STATE_MODEL_DEF_REF" : "BrokerResourceOnlineOfflineStateModel",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
},
"mapFields" : {
"courses_OFFLINE" : {
"Broker_172.21.0.5_8099" : "ONLINE"
},
"parkrun_REALTIME" : {
"Broker_172.21.0.5_8099" : "ONLINE"
},
"races_REALTIME" : {
"Broker_172.21.0.5_8099" : "ONLINE"
}
},
"listFields" : { }
}
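If you’d rather not edit the JSON by hand, jq’s del function can strip the broker from every table in one go. A sketch, again assuming jq is installed (brokerState.json is just a scratch file name):

curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2FbrokerResource" \
  -H "accept: text/plain" 2>/dev/null > brokerState.json
jq 'del(.mapFields[]."Broker_172.21.0.3_8099")' brokerState.json > newBrokerState.json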
Now open the Zookeeper browser at http://localhost:9000/#/zookeeper, drill down to PinotCluster → IDEALSTATES → brokerResource, click on the edit button, paste in this JSON document, and click Update.
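Alternatively, the controller exposes a /zk/put endpoint that updates a node without leaving the terminal. Its exact signature has changed between Pinot versions (older releases take the JSON as a data query parameter, more recent ones accept it in the request body), so treat this as a sketch rather than a drop-in command:

curl -X PUT "http://localhost:9000/zk/put?path=%2FPinotCluster%2FIDEALSTATES%2FbrokerResource" \
  -H "Content-Type: application/json" \
  -d @newBrokerState.json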
And now let’s do the same thing for races_REALTIME. The ideal state at the moment looks like this:
curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2Fraces_REALTIME" \
-H "accept: text/plain" 2>/dev/null
{
"id" : "races_REALTIME",
"simpleFields" : {
"BATCH_MESSAGE_MODE" : "false",
"IDEAL_STATE_MODE" : "CUSTOMIZED",
"INSTANCE_GROUP_TAG" : "races_REALTIME",
"MAX_PARTITIONS_PER_INSTANCE" : "1",
"NUM_PARTITIONS" : "11",
"REBALANCE_MODE" : "CUSTOMIZED",
"REPLICAS" : "1",
"STATE_MODEL_DEF_REF" : "SegmentOnlineOfflineStateModel",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
},
"mapFields" : {
"races__0__0__20220127T1647Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__10__20220218T1304Z" : {
"Server_172.21.0.4_8098" : "CONSUMING"
},
"races__0__1__20220202T1635Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__2__20220203T1636Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__3__20220209T1442Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__4__20220210T1442Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__5__20220212T1807Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__6__20220214T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__7__20220215T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__8__20220216T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__9__20220217T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
}
},
"listFields" : { }
}
And now let’s replace Server_172.21.0.4_8098 with Server_172.21.0.6_8098 for the consuming segment:
{
"id" : "races_REALTIME",
"simpleFields" : {
"BATCH_MESSAGE_MODE" : "false",
"IDEAL_STATE_MODE" : "CUSTOMIZED",
"INSTANCE_GROUP_TAG" : "races_REALTIME",
"MAX_PARTITIONS_PER_INSTANCE" : "1",
"NUM_PARTITIONS" : "11",
"REBALANCE_MODE" : "CUSTOMIZED",
"REPLICAS" : "1",
"STATE_MODEL_DEF_REF" : "SegmentOnlineOfflineStateModel",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
},
"mapFields" : {
"races__0__0__20220127T1647Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__10__20220218T1304Z" : {
"Server_172.21.0.6_8098" : "CONSUMING"
},
"races__0__1__20220202T1635Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__2__20220203T1636Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__3__20220209T1442Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__4__20220210T1442Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__5__20220212T1807Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__6__20220214T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__7__20220215T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__8__20220216T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
},
"races__0__9__20220217T1304Z" : {
"Server_172.21.0.6_8098" : "ONLINE"
}
},
"listFields" : { }
}
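Since the server name only appears in the entries we want to change, a plain text substitution is enough to generate this file. A sketch (racesState.json is just a scratch file name):

curl -X GET "http://localhost:9000/zk/get?path=%2FPinotCluster%2FIDEALSTATES%2Fraces_REALTIME" \
  -H "accept: text/plain" 2>/dev/null > racesState.json
# Swap the dead server for the live one in every segment entry
sed 's/Server_172.21.0.4_8098/Server_172.21.0.6_8098/g' racesState.json > newRacesState.json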
We’ll update that node in Zookeeper the same way that we did with the brokerResource, and now we can re-run our command to delete these two instances:
for instance in "Controller_172.21.0.4_9000" "Controller_172.21.0.2_9000" "Controller_172.21.0.5_9000" "Server_172.21.0.3_8098" "Server_172.21.0.4_8098" "Broker_172.21.0.3_8099"; do
  curl -X DELETE "http://localhost:9000/instances/${instance}" \
    -H "accept: application/json" 2>/dev/null;
  printf "\n"
done
{"_code":404,"_error":"Instance Controller_172.21.0.4_9000 not found"}
{"_code":404,"_error":"Instance Controller_172.21.0.2_9000 not found"}
{"_code":404,"_error":"Instance Controller_172.21.0.5_9000 not found"}
{"_code":404,"_error":"Instance Server_172.21.0.3_8098 not found"}
{"status":"Successfully dropped instance"}
{"status":"Successfully dropped instance"}
The first four instances return a 404 status since we’ve already deleted them, but the last two have now been dropped!
Now if we navigate back to the home page of the Pinot Data Explorer, we’ll see that there are no bad instances anymore:
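We can double-check from the command line too: the GET /instances endpoint lists every instance the cluster still knows about, and none of the dropped ones should appear:

curl -X GET "http://localhost:9000/instances" \
  -H "accept: application/json" 2>/dev/null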