Symptom
An Amazon S3 Source (SQS-based) or S3 Collector emits garbled binary _raw instead of the expected decompressed JSON log records. The output looks like random bytes, for example:
��g��n��#hS,…
The affected objects are .gz files written to S3 by Amazon Data (Kinesis) Firehose. Cribl Stream logs no Source-level decompression errors, and the Sources report healthy.
Environment
- Cribl Stream 4.12.1 (behavior applies broadly to the Amazon S3 Source and the S3 Collector across versions).
- Self-managed and Cloud deployments.
- Amazon S3 bucket populated by Amazon Data Firehose, fed by a CloudWatch Logs subscription filter: CloudWatch Logs → subscription filter → Firehose → S3.
Resolution
The fix is on the AWS side — no Cribl Stream change is required. Remove one of the two gzip compression layers at the Firehose S3 destination. Choose one of the following options.
- Set the Firehose S3 destination compression to uncompressed (Terraform
compression_format = "UNCOMPRESSED"). This is the AWS-recommended setting when the source is already-gzipped CloudWatch Logs, and it leaves the single CloudWatch gzip layer that Cribl Stream reads correctly. - Alternatively, keep
compression_format = "GZIP"and enable the Firehose Decompress source records option for the CloudWatch Logs source. Firehose then decompresses the inbound CloudWatch gzip before re-compressing once, again leaving a single gzip layer.
Either option leaves exactly one gzip layer (or plain records), which Cribl Stream decompresses correctly.
Note: Validate the change in a DEV environment first and involve the AWS architects who own the delivery stream — this is an AWS pipeline change, not a Cribl change.
⚠️ The fix applies only to objects written after the change. Existing double-gzipped objects remain unreadable and require a one-time reprocessing pass to remove the extra layer.
Quick confirmation
Confirm the double compression on an object downloaded with the AWS CLI (not the browser Console). Two gunzip passes are needed to reach the JSON payload:
aws s3 cp s3://<bucket>/<key> - | gunzip -c | gunzip -c | head
Readable JSON only after the second pass confirms two gzip layers. The Console-vs-CLI download distinction matters here and is covered in the companion article How to Detect a Double Gzip-Compressed S3 Object (and Why AWS Console vs CLI Downloads Differ).
Cause
The S3 objects are double gzip-compressed: object = gzip(gzip(JSON)). Two stages in the delivery chain each apply gzip.
- CloudWatch Logs subscription delivery to Firehose is already gzip-compressed. Per AWS: "Data sent from CloudWatch Logs to Amazon Data Firehose is already compressed with gzip level 6 compression, so you do not need to use compression within your Firehose delivery stream." (CloudWatch Logs subscription filters — Firehose example)
- The Firehose delivery stream then gzips the data a second time (
extended_s3_configurationcompression_format = "GZIP", with no decompression transform).
Cribl Stream's S3 reader decompresses exactly one gzip layer. Decompression is performed by SmartGunzipTransform, which is gated on the gzip magic bytes (0x1f 0x8b) and strips a single layer. Because the object carries two layers, stripping the outer layer leaves the inner gzip stream, which surfaces as binary _raw — the garbled output. This single-layer behavior is correct and standard; it is the double compression on the AWS side that produces the unreadable result.
This is not a Content-Encoding problem and not a Cribl Stream defect. The Content-Encoding: gzip object metadata set by Firehose plays no part in how Cribl Stream reads the object, and the AWS SDK does not auto-decompress the object body based on that header.
Last Validated
Cribl Stream 4.12.1
Additional Information
- Companion diagnostic article: How to Detect a Double Gzip-Compressed S3 Object (and Why AWS Console vs CLI Downloads Differ) — covers the two-pass
gunzipconfirmation in detail and explains why a browser/Console download of a failing object can look correct while the CLI download does not. - AWS reference: Using CloudWatch Logs subscription filters — Amazon Data Firehose example.
- Diagnostic heuristic: any garbled S3 Source or S3 Collector
_rawfrom a Firehose-fed bucket should trigger a double-gzip check first. The behavior is SDK-agnostic, so the S3 Source and S3 Collector fail identically on the same object.
