Solved

FDR SQS Ingestion Throughput Issue With 6 Receivers Not Handling 1.5TB Daily Volume

  • April 8, 2026
  • 9 replies
  • 0 views

This message originated from Cribl Community Slack.

Has anyone figured out the optimal number of receivers for FDR/SQS-based ingestion? We have around 1.5 TB of daily volume from CrowdStrike FDR, but 6 receivers and a 10-message limit are not keeping up.

Best answer by Brandon McCombs

Typically you should leave the defaults alone. FDR data is massive, so a single file series (one SQS message) is enough FDR data to keep a process busy, and that isn't including data from other inputs. We set the defaults to 1 and 1 for that reason. If you go higher, you are simply oversubscribing the processes and making them beg for mercy. Just like with other inputs, the proper way to scale to higher data volume is with more processes. Increasing the pollers and messages shouldn't ever be needed for FDR data. It can be necessary for general S3/SQS collection, but only when there are millions of small files.

9 replies


I’ll add that those settings are applied per process and are multiplied. So you currently have 6 x 10 messages PER process, and each message can contain file parts for a file that is potentially a GB or larger in size. The processes end up completely saturated and can't finish some parts before the visibility timeout is reached, which causes re-downloads and duplication.
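To make the multiplication concrete, here is a small sketch of the worst-case data a single worker process may hold in flight. The 1 GB average file-series size is an illustrative assumption based on the "potentially a GB or larger" comment above, not a measured value:

```python
# Worst-case FDR data in flight per worker process: receivers x max
# messages x average file-series size. The 1.0 GB series size is a
# hypothetical planning number for illustration.

def in_flight_gb(receivers: int, max_messages: int, avg_series_gb: float) -> float:
    """Upper bound on GB of file data one worker process may hold in flight."""
    return receivers * max_messages * avg_series_gb

defaults = in_flight_gb(1, 1, 1.0)        # Cribl's recommended defaults
oversubscribed = in_flight_gb(6, 10, 1.0)  # the 6 x 10 config from the question

print(f"defaults:      {defaults:.0f} GB in flight per process")
print(f"6 x 10 config: {oversubscribed:.0f} GB in flight per process")
```

Even with a conservative 1 GB per series, the 6 x 10 configuration asks each process to juggle roughly 60 GB at once, which is why parts miss the visibility timeout.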

> Just like with other inputs, the proper way to scale to higher data volume is with more processes.

Okay, thanks. If there's a large backlog in SQS, do we just need to scale the workers up to catch up?

The SQS backlog is currently sitting at 3,500 messages.

Yes. If it were just lots of tiny files, you could increase the messages and receivers, because that wouldn't necessarily be a massive change in data volume. But with FDR it is, so you have to scale properly to accommodate it.
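A rough way to size the catch-up is to divide the backlog by the fleet's net drain rate above steady-state inflow. The per-process throughput and average message size below are hypothetical planning numbers, not figures from this thread; measure your own fleet before relying on anything like this:

```python
# Rough catch-up estimate for an SQS backlog. All rate figures here are
# assumptions for illustration: ~1 GB per message, 5 GB/h per process,
# and 1.5 TB/day steady inflow (~62.5 GB/h).

def catchup_hours(backlog_msgs: int, avg_msg_gb: float,
                  processes: int, gb_per_hour_per_process: float,
                  inflow_gb_per_hour: float) -> float:
    """Hours to drain the backlog, given net drain rate above steady inflow."""
    drain = processes * gb_per_hour_per_process - inflow_gb_per_hour
    if drain <= 0:
        raise ValueError("fleet cannot keep up with steady-state inflow")
    return backlog_msgs * avg_msg_gb / drain

# 3,500 backlogged messages, 24 worker processes
print(f"{catchup_hours(3500, 1.0, 24, 5.0, 62.5):.1f} h to catch up")
```

The useful takeaway is the shape of the formula: adding processes only helps once the fleet's aggregate rate exceeds steady-state inflow, and everything above that line goes toward the backlog.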

I think I have a better understanding now. Thanks for clarifying!

Do I need to look at the throughput in or out when I am optimizing the worker node size? Say I am pulling in 30 MB/s on average but sending out 92 MB/s. Which one should I be optimizing around?

Sorry, I am dumb. It says ingest rate, so there's my answer.

Both. Our sizing guidelines state you must factor in both in and out, because sending data out isn't computationally free.