Hello all, I am facing an issue where one of the Workers in one of our Worker Groups seems to be stuck deploying a version, while the other Worker updated to the new version correctly, as shown in the attached screenshot. Has anyone faced the same issue, and what troubleshooting steps or workarounds can be tried? Worth mentioning: I have already rebooted the stuck Worker twice and rebooted the Leader as well.
On the Worker's command line, run 'ps -ef | grep cribl'. There is a good chance a runaway process is still refusing to shut down. Kill that process, then stop and start Cribl; that normally does the trick.
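Before killing anything, it can help to see exactly which Cribl processes are still alive. Here is a minimal Python sketch (not an official Cribl tool) that runs `ps -ef`, keeps the lines whose command mentions "cribl", and skips the `grep` line itself. The `--kill` flag is a hypothetical option of this script only; without it, the script just prints the PIDs.

```python
import os
import signal
import subprocess
import sys


def find_cribl_pids(ps_output: str) -> list[int]:
    """Return PIDs from `ps -ef` output whose command line mentions cribl."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or malformed line
        cmd = fields[7]
        # Ignore a leftover `grep cribl` process if the output was piped through grep.
        if "cribl" in cmd.lower() and not cmd.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


def terminate(pids: list[int]) -> None:
    """Send SIGTERM to each PID, like running `kill <pid>`."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process already exited


if __name__ == "__main__":
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    # Exclude this script's own PID in case its file name contains "cribl".
    pids = [p for p in find_cribl_pids(out) if p != os.getpid()]
    print("cribl PIDs:", pids)
    if "--kill" in sys.argv:  # hypothetical flag of this sketch
        terminate(pids)
```

If a process survives SIGTERM, 'kill -9 <PID>' forces it down. After that, assuming a standard install, stop and start Cribl with '$CRIBL_HOME/bin/cribl stop' and '$CRIBL_HOME/bin/cribl start'.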
Thanks, Raanan, I will give it a try. Can you help me understand what the runaway process is?
Check whether any persistent queue (PQ) data or destination staging directory in CRIBL_HOME is many GB in size. If so, relocate those directories outside of CRIBL_HOME. It's possible the backup is taking too long, or failing outright, which keeps the config from being loaded in a timely manner.
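To find which directories under CRIBL_HOME are that large, a minimal Python sketch like the one below can help. It only reports sizes and moves nothing. It reads CRIBL_HOME from the environment and falls back to /opt/cribl, which is an assumption about where Cribl is installed; the 1 GiB threshold is also an arbitrary choice.

```python
import os
import sys


def dir_sizes(root: str) -> dict[str, int]:
    """Return the total size in bytes of every directory under root (including root)."""
    sizes: dict[str, int] = {}
    # Bottom-up walk so each child's total is known before its parent is summed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            try:
                # lstat does not follow symlinks, so linked files are not counted twice.
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
        for name in dirnames:
            total += sizes.get(os.path.join(dirpath, name), 0)
        sizes[dirpath] = total
    return sizes


def oversized_dirs(root: str, threshold: int) -> list[tuple[str, int]]:
    """Return (path, bytes) for subdirectories of root at or above threshold, largest first."""
    found = [(p, s) for p, s in dir_sizes(root).items() if p != root and s >= threshold]
    return sorted(found, key=lambda item: -item[1])


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("CRIBL_HOME", "/opt/cribl")
    for path, size in oversized_dirs(root, 1 << 30):  # 1 GiB, an arbitrary threshold
        print(f"{size / (1 << 30):8.2f} GiB  {path}")
```

Parent directories are listed along with their large children, so the deepest large entry is usually the one worth looking at.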
Thanks <@U012ZP93EER> and <@U01J549PR6Y>. Just for the record, we found that inputs.yaml was owned by the root user, while the rest of the application runs as a functional (service) user.
The runaway process would be a Cribl process that is still running even though it was told to stop.
This was the main reason the Worker got stuck in the version allocation process after restarting.
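A file owned by the wrong user, like inputs.yaml here, is easy to miss by eye. Here is a minimal Python sketch that walks a directory and lists every file or directory not owned by a given user. The default user name "cribl" and the /opt/cribl fallback are hypothetical placeholders; pass the actual functional user as the first argument.

```python
import os
import pwd
import sys


def not_owned_by(root: str, uid: int) -> list[str]:
    """Return sorted paths under root (root included) whose owner uid differs from uid."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then each file inside it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                if os.lstat(path).st_uid != uid:
                    mismatched.append(path)
            except OSError:
                pass  # path vanished or is unreadable; skip it
    return sorted(mismatched)


def owner_name(path: str) -> str:
    """Return the owner's user name, or the raw uid if it has no passwd entry."""
    uid = os.lstat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


if __name__ == "__main__":
    user = sys.argv[1] if len(sys.argv) > 1 else "cribl"  # hypothetical service user
    root = os.environ.get("CRIBL_HOME", "/opt/cribl")  # assumed install location
    for path in not_owned_by(root, pwd.getpwnam(user).pw_uid):
        print(f"{owner_name(path):>10}  {path}")
```

Any paths it reports can then be handed back to the service user, for example with 'chown -R <user>:<group> "$CRIBL_HOME"' run as root, where <user> and <group> are placeholders for your functional account.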
Good find, and happy to hear the issue has been fixed.