I’m curious how the GCS source keeps track of files it has already consumed, and what prevents Stream from consuming the same file more than once. Our use case involves a BigQuery export of Gmail activity logs to a GCS bucket. We’d like to transfer those events to an S3 data lake (I know, moving from one cloud storage to another; we have reasons, but I digress). Ideally, we’d like to consume the events from each file in the GCS bucket and, upon success, delete the file from GCS. At the very least, we want to make sure we don’t ingest the same file(s) over and over again.
Thanks!
