The sizing pages talk about scaling with regard to data volumes in and out. However, is there any guidance on when to add processors because of a large number of concurrent tasks (i.e., Collection Jobs)? Does each Collection Job run in its own worker process? On a 2-node worker group with 8 vCPUs each, could we run into queuing issues if 50 or 100 collectors attempt to run at the same time?
Jobs are broken into tasks, which are put into a job queue on the Leader Node and taken off that queue as the worker processes in the group request tasks to complete. In this manner, all the data that was discovered is distributed as evenly as possible across the worker group.
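To illustrate the pull model described above, here is a minimal, hypothetical simulation (not Cribl's actual implementation): a single shared queue of tasks, and a fixed number of worker processes that each take the next task as soon as they become idle. The task durations and process count are made-up inputs; the point is only that work spreads across whichever processes free up first.

```python
import heapq
from collections import deque


def simulate_pull_queue(task_durations, num_processes):
    """Simulate worker processes pulling tasks from one shared queue.

    Hypothetical model: every process grabs the next queued task the moment
    it is idle. Returns (assignments per process id, time all tasks finish).
    """
    if num_processes < 1:
        raise ValueError("need at least one worker process")
    queue = deque(enumerate(task_durations))
    # Min-heap of (time the process becomes idle, process id).
    idle_at = [(0.0, pid) for pid in range(num_processes)]
    heapq.heapify(idle_at)
    assignments = {pid: [] for pid in range(num_processes)}
    finish = 0.0
    while queue:
        now, pid = heapq.heappop(idle_at)  # earliest-idle process asks for work
        task_id, duration = queue.popleft()
        assignments[pid].append(task_id)
        done = now + duration
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, pid))
    return assignments, finish


if __name__ == "__main__":
    # 10 one-second tasks shared by 4 processes: three "waves", done at t=3.0.
    plan, total = simulate_pull_queue([1.0] * 10, 4)
    print(plan, total)
```

With uneven task durations, faster-finishing processes simply pull more tasks, which is why a pull-based queue tends to balance load without the Leader needing to know each task's cost in advance.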
Also consider the job limits page, which covers the number of jobs and tasks that can run concurrently: https://docs.cribl.io/stream/collectors-job-limits/
For larger collection use cases, I'd encourage a separate worker group dedicated to collection.
Thank you all
It's best to avoid scheduling jobs so that they run simultaneously. Some overlap may be unavoidable, but the more worker processes that are available, the more tasks can execute in parallel to finish a job.
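As a rough back-of-the-envelope check for the 2-node scenario in the question, the sketch below estimates how many "waves" of tasks are needed when every task is queued at once. It assumes, hypothetically, one task per collector, equal task durations, and a configurable number of worker processes per node; the optional `max_concurrent_tasks` cap stands in for whatever concurrency limits are configured (see the job limits page for the real settings). None of these parameter names are Cribl settings.

```python
import math


def estimate_waves(total_tasks, nodes, processes_per_node, max_concurrent_tasks=None):
    """Estimate how many rounds of tasks run if all are queued at once.

    Hypothetical model: each worker process runs one task at a time and all
    tasks take the same time. An optional cap limits concurrent tasks.
    """
    slots = nodes * processes_per_node
    if max_concurrent_tasks is not None:
        slots = min(slots, max_concurrent_tasks)
    if slots < 1:
        raise ValueError("need at least one available task slot")
    return math.ceil(total_tasks / slots)


if __name__ == "__main__":
    # Assumed 8 processes per node on 2 nodes gives 16 concurrent task slots.
    print(estimate_waves(50, 2, 8))    # 4 waves
    print(estimate_waves(100, 2, 8))   # 7 waves
    print(estimate_waves(100, 2, 8, max_concurrent_tasks=10))  # 10 waves
```

Multiplying the wave count by a typical task duration gives a crude lower bound on how long the last collector waits, which is one way to judge whether staggering schedules or adding processes is worthwhile.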