Question

Full Read of S3 Bucket for only new items

  • March 11, 2025
  • 4 replies
  • 51 views

I am trying to pull in DNS logs from Cisco Umbrella. I have my S3 Collector configured to pull down the logs every 5 minutes. The problem is that each run pulls down all of the logs, and I only want the new ones. I have tried adding the following to the conf section of the JSON configuration, but it doesn't appear to work.

"trackProgress": true,
"trackMode": "file",
"trackField": "s3.object.etag",
"useLastModifiedTime": true

4 replies

Jon Rust
  • Employee
  • March 11, 2025

Normally the best way to ingest new objects from S3 is to use the S3 Source (not the Collector), which requires an SQS subscription. Any time a new object is created in the bucket, an SQS message is created, Cribl consumes that message, and the object is downloaded.
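To make the event-driven flow concrete, here is a minimal sketch of what a consumer does with one of those SQS messages. It assumes the queue receives standard S3 event notifications, where each record carries the bucket name and a URL-encoded object key. The function name `object_keys_from_event` is hypothetical and is not part of Cribl.

```python
import json
from urllib.parse import unquote_plus


def object_keys_from_event(body):
    """Return (bucket, key) pairs for newly created objects in one SQS message body.

    Assumes the body is a standard S3 event notification document.
    """
    event = json.loads(body)
    found = []
    for record in event.get("Records", []):
        # Only react to creation events such as "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in notifications, so decode them.
        found.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return found
```

Because each message names exactly one new object, nothing has to be listed or re-read, which is why this approach avoids the "pulls everything" problem.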


  • Author
  • Participating Frequently
  • March 11, 2025

It's a Cisco-managed bucket, so it looks like I don't have access to the queue to use SQS or to set up the Source. What I do have, though, is some on-premise S3-compatible storage. What if I configured the S3 Collector to initially pull down everything, but then scheduled the task to grab only the last 6 minutes of data and ran it every 5 minutes? There would be some overlap, but I could set up another source, the on-premise S3-compatible storage, and then use the S3 Source to stream that data in while using the checkpoint option. Thoughts?


Jon Rust
  • Employee
  • March 11, 2025

If you have a way to identify times without opening the objects themselves, yeah, that's going to be your only option. The S3 Collector does not track the files it has seen. It can only use time filters to limit each pull to the newest objects, based on timestamp hints in the path.
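As an illustration of filtering on timestamp hints in the path, the sketch below picks out keys whose names fall inside a lookback window without opening any object. It assumes a key layout of the form prefix/YYYY-MM-DD/YYYY-MM-DD-hh-mm-suffix.csv.gz with times in UTC; that layout is an assumption and should be checked against the real bucket. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed file name shape: YYYY-MM-DD-hh-mm-<suffix>, at the end of the key.
KEY_TIME = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-[^/]*$")


def key_timestamp(key):
    """Parse the minute-resolution timestamp embedded in a key, or return None."""
    match = KEY_TIME.search(key)
    if match is None:
        return None
    year, month, day, hour, minute = map(int, match.groups())
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


def keys_in_window(keys, now, lookback=timedelta(minutes=6)):
    """Keep only keys whose embedded timestamp lies within [now - lookback, now]."""
    earliest = now - lookback
    selected = []
    for key in keys:
        stamp = key_timestamp(key)
        if stamp is not None and earliest <= stamp <= now:
            selected.append(key)
    return selected
```

With a 6-minute lookback on a 5-minute schedule, consecutive runs overlap by about a minute, so some keys can be selected twice and any deduplication has to happen downstream.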


  • Author
  • Participating Frequently
  • March 11, 2025

I could use the AWS CLI to do a folder sync, but I would need a way to routinely delete those on-premise files to manage storage space.
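One way to approach the cleanup is to decide which copied objects have aged out and delete only those. The sketch below covers just the selection step, assuming each listing entry is a dict with "Key" and a timezone-aware "LastModified", which is the shape boto3's list_objects_v2 returns. The function name `expired_keys` and the 7-day retention are hypothetical, and the actual delete call depends on the on-premise storage.

```python
from datetime import datetime, timedelta, timezone


def expired_keys(objects, now, retention=timedelta(days=7)):
    """Return the keys of objects last modified before now - retention.

    Each entry in `objects` is assumed to be a dict with "Key" and a
    timezone-aware "LastModified" datetime.
    """
    cutoff = now - retention
    return [obj["Key"] for obj in objects if obj["LastModified"] < cutoff]
```

One caveat: a plain sync copies anything that is missing from the destination, so pruned objects would be copied again on the next run unless the sync is restricted to recent paths.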