
I am trying to pull in DNS logs from Cisco Umbrella. I have my S3 Collector configured to pull down the logs every 5 minutes. The problem is that it pulls down all the logs on every run, and I only want the new ones. I have tried adding the following to the Conf section of the JSON configuration, but it doesn't appear to work:

```json
"trackProgress": true,
"trackMode": "file",
"trackField": "s3.object.etag",
"useLastModifiedTime": true
```
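What these settings are reaching for is per-object progress tracking: remember which objects were already collected and skip them on the next run. As a minimal sketch of that idea outside Cribl (not a Cribl feature, and not a claim about what these keys do), assuming the bucket listing is available as (key, ETag) pairs and using a local JSON file as a hypothetical checkpoint store:

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def new_objects(listing: List[Tuple[str, str]], checkpoint: Path) -> List[str]:
    """Return keys that are new or whose ETag changed since the last call,
    then record the current listing in the checkpoint file."""
    seen: Dict[str, str] = {}
    if checkpoint.exists():
        seen = json.loads(checkpoint.read_text())
    fresh = [key for key, etag in listing if seen.get(key) != etag]
    for key, etag in listing:
        seen[key] = etag
    checkpoint.write_text(json.dumps(seen))
    return fresh


# Usage: two consecutive "runs" against a growing (hypothetical) listing.
with tempfile.TemporaryDirectory() as d:
    cp = Path(d) / "seen.json"
    first = new_objects([("a.csv.gz", "e1"), ("b.csv.gz", "e2")], cp)
    second = new_objects(
        [("a.csv.gz", "e1"), ("b.csv.gz", "e2"), ("c.csv.gz", "e3")], cp
    )
```

The checkpoint grows with every object seen, so a longer-lived version would also need to expire old entries.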

Normally, the best way to ingest new objects from S3 is to use the S3 Source (not the S3 Collector), which requires an SQS subscription. Any time a new object is created in the bucket, an event notification is placed on the SQS queue; Cribl consumes that message and downloads the object.
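To make the SQS step concrete, here is a minimal sketch of what a consumer pulls out of each message. It assumes the bucket sends S3 event notifications directly to SQS (not wrapped in SNS) and is only an illustration of the message format, not Cribl's implementation; the bucket and key in the sample are made up.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_s3_event(message_body: str) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs for ObjectCreated events from an S3
    event notification delivered as an SQS message body."""
    event = json.loads(message_body)
    pairs = []
    # Test events sent when notifications are configured have no "Records".
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces as '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Hypothetical message body for illustration.
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "dnslogs/2024-01-01/file+1.csv.gz"},
        },
    }]
})
```

Because each message names exactly the object that was just written, nothing has to be re-listed or re-downloaded, which is why this route avoids the "pull everything" problem.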


It's a Cisco-managed bucket, so it looks like I don't have access to the queue needed to use SQS or to set up the Source. What I do have, though, is some on-premises S3-compatible storage. What if I configured the S3 Collector to initially pull down everything, but the scheduled task only grabbed the last 6 minutes of data and ran every 5 minutes? There would be some overlap, but I could then set up the on-premises S3-compatible storage as another source and use the S3 Source to stream that data in, with the checkpoint option enabled. Thoughts?
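The arithmetic behind the 6-minute lookback on a 5-minute schedule can be sketched as follows, assuming each run fires exactly on schedule (real scheduler jitter and late-arriving objects are not modeled) and that a run collects the half-open window ending at its start time:

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

Window = Tuple[datetime, datetime]


def collection_window(run_time: datetime, lookback: timedelta) -> Window:
    """Half-open window [run_time - lookback, run_time) for one scheduled run."""
    return run_time - lookback, run_time


def overlap(a: Window, b: Window) -> timedelta:
    """Length of time covered by both windows (zero if they do not meet)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))


interval = timedelta(minutes=5)  # schedule: every 5 minutes
lookback = timedelta(minutes=6)  # each run looks back 6 minutes
run1 = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
w1 = collection_window(run1, lookback)             # 11:59 to 12:05
w2 = collection_window(run1 + interval, lookback)  # 12:04 to 12:10
```

Any lookback at least as long as the interval leaves no gaps; the extra minute is the overlap, so objects stamped in it are fetched twice and need deduplicating downstream. A lookback shorter than the interval would instead leave unread gaps between runs.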


If you have a way to identify times without opening the objects themselves, then yes, that's going to be your only option. The S3 Collector does not track the files it has seen; it can only use time filters to limit pulls to the newest objects, based on timestamp hints in the object path.
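As an illustration of filtering by timestamp hints in the path (a standalone sketch, not how the Collector is implemented), the following assumes object names embed a UTC timestamp of the form `YYYY-MM-DD-HH-MM`. That layout and the sample keys are assumptions; check the real bucket's naming and adjust the pattern.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, List, Optional

# Assumed layout: a UTC "YYYY-MM-DD-HH-MM" timestamp inside the file name.
KEY_TIME = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})")


def key_time(key: str) -> Optional[datetime]:
    """Parse the timestamp hint from the last path segment, or None."""
    m = KEY_TIME.search(key.rsplit("/", 1)[-1])
    if m is None:
        return None
    year, month, day, hour, minute = map(int, m.groups())
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


def keys_in_window(keys: Iterable[str], earliest: datetime, latest: datetime) -> List[str]:
    """Keep keys whose path timestamp falls in [earliest, latest); keys with
    no recognizable timestamp are skipped rather than guessed at."""
    selected = []
    for key in keys:
        t = key_time(key)
        if t is not None and earliest <= t < latest:
            selected.append(key)
    return selected


# Hypothetical keys for illustration.
keys = [
    "dnslogs/2024-01-01/2024-01-01-11-58-ab12.csv.gz",
    "dnslogs/2024-01-01/2024-01-01-12-00-cd34.csv.gz",
    "dnslogs/2024-01-01/2024-01-01-12-05-ef56.csv.gz",
    "dnslogs/readme.txt",
]
```

Only the key listing is needed, so no object has to be opened to decide whether it belongs to the current window.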


I could use the AWS CLI to do a folder sync (aws s3 sync), but then I'd need a way to routinely delete those on-premises files to manage storage space.
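For the cleanup side, a minimal age-based pruning sketch is shown below, assuming the synced copy lives in a local directory tree and that file modification time is a good enough proxy for age; the directory layout and the retention value are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def prune_older_than(root: Path, max_age_seconds: float, now: Optional[float] = None) -> List[Path]:
    """Delete regular files under root whose mtime is older than
    max_age_seconds and return the deleted paths (directories are kept)."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions don't disturb the walk.
    for path in list(root.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


# Usage: one file backdated by 3 days, one fresh file, 1-day retention.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    old = root / "2024-01-01" / "old.csv.gz"
    old.parent.mkdir()
    old.write_bytes(b"")
    fresh = root / "fresh.csv.gz"
    fresh.write_bytes(b"")
    now = time.time()
    os.utime(old, (now - 3 * 86400, now - 3 * 86400))
    removed_names = [p.name for p in prune_older_than(root, 86400, now=now)]
    fresh_exists = fresh.exists()
```

One caveat: aws s3 sync copies any source object that is missing from the destination, so files pruned locally would be downloaded again on the next sync as long as they still exist in the Cisco bucket, unless the sync is restricted (for example, to recent date prefixes).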

