Objective
Confirm whether an Amazon S3 object is gzip-compressed more than once, and obtain the exact bytes Cribl Stream reads when diagnosing garbled output from an S3 Source or S3 Collector. This is a reusable diagnostic for any "garbled output from S3" case, independent of which application or service writes the objects.
A Cribl S3 Source or Collector decompresses exactly one gzip layer. It detects gzip by inspecting the object's leading magic bytes (1f 8b 08); it does not honor the Content-Encoding header, does not use the .gz extension, and does not recursively decompress. An object compressed twice (gzip( gzip( payload ) )) therefore reaches the pipeline with the inner gzip layer still intact, which surfaces as garbled binary _raw. This procedure confirms that condition before any AWS-side fix is attempted.
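The single-layer behavior can be reproduced locally. The sketch below is illustrative only and is not Cribl's code: the function names magic_of and strip_one_layer are hypothetical, and od stands in for xxd as an assumption about what is installed. It builds a single-gzipped and a double-gzipped sample, removes one layer from each when the leading bytes are 1f 8b 08, and prints the leading bytes of each result.

```shell
#!/usr/bin/env bash
# Illustrative emulation of a single, magic-byte-gated gunzip step.
# NOT Cribl's implementation; all names here are hypothetical.
set -eu

# Print the first three bytes of a file as hex (gzip/deflate starts with 1f8b08).
magic_of() {
  head -c 3 "$1" | od -An -tx1 | tr -d ' \n'
}

# Strip exactly one gzip layer if the magic bytes match; otherwise copy as-is.
strip_one_layer() {
  local in="$1" out="$2"
  if [ "$(magic_of "$in")" = "1f8b08" ]; then
    gunzip -c < "$in" > "$out"
  else
    cp "$in" "$out"
  fi
}

workdir="$(mktemp -d)"
printf '{"msg":"hello"}\n' > "$workdir/payload.json"
gzip -c < "$workdir/payload.json" > "$workdir/single.gz"   # gzip(payload)
gzip -c < "$workdir/single.gz" > "$workdir/double.gz"      # gzip(gzip(payload))

strip_one_layer "$workdir/single.gz" "$workdir/single.out"
strip_one_layer "$workdir/double.gz" "$workdir/double.out"

magic_of "$workdir/single.out"; echo   # leading bytes of the JSON payload
magic_of "$workdir/double.out"; echo   # leading bytes of the inner gzip layer
```

For the double-gzipped sample, the output of the single step still begins with the gzip magic bytes, which is the condition the procedure below tests for on a real object.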
Environment
- Cribl Stream 4.12.1
- Cribl Stream (Source or Collector) — Cloud or on-prem
- Amazon S3 Source (SQS-based) or S3 Collector
- A host with the AWS CLI installed and credentials that can read the bucket
- A shell with file, gunzip, and xxd available
Procedure
Download the object with the AWS CLI, never the Console
Always pull the object with the AWS CLI. The CLI returns the raw stored bytes exactly as Cribl reads them.
- Download one failing object with the CLI.
    aws s3 cp s3://<bucket>/<key> ./obj.gz
⚠️ Do not use an AWS Console or browser download for this test. A browser honors the object's Content-Encoding: gzip system metadata and transparently decompresses one layer in transit. A Console-downloaded copy of a double-gzipped object therefore needs only one manual gunzip and appears correct — masking the second layer and leading to a misdiagnosis. The CLI performs no such decompression. Content-Encoding is S3 system metadata; see Working with S3 object metadata.
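The masking effect described in the warning can be seen without touching AWS. The following is a local simulation under stated assumptions: the browser's transparent decompression is imitated with a plain gunzip, the CLI download with a byte-for-byte copy, and every file name is hypothetical.

```shell
#!/usr/bin/env bash
# Local simulation only; no AWS calls are made. File names are hypothetical.
set -eu

work="$(mktemp -d)"
printf '{"msg":"hello"}\n' > "$work/payload.json"

# What the bucket holds in the failing case: gzip(gzip(payload)).
gzip -c < "$work/payload.json" | gzip -c > "$work/stored.gz"

# CLI-style download: the stored bytes arrive unchanged.
cp "$work/stored.gz" "$work/cli_copy.gz"

# Browser-style download: one layer is removed in transit (imitated with gunzip).
gunzip -c < "$work/stored.gz" > "$work/browser_copy.gz"

# Apply one manual gunzip pass to each copy.
gunzip -c < "$work/cli_copy.gz" > "$work/cli_after_one_pass"
gunzip -c < "$work/browser_copy.gz" > "$work/browser_after_one_pass"

# The browser copy already equals the payload; the CLI copy is still valid gzip.
if cmp -s "$work/browser_after_one_pass" "$work/payload.json"; then
  echo "browser copy: payload reached after 1 manual pass"
fi
if gzip -t < "$work/cli_after_one_pass" 2>/dev/null; then
  echo "CLI copy: still gzip after 1 manual pass"
fi
```

Only the CLI-style copy shows the second layer, which is why the test must be run against a CLI download.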
Run the two-pass gunzip test
- Confirm the file is gzip.
    file obj.gz
    # -> gzip compressed data
- Decompress one layer and inspect the result. If it is still gzip, the object carries a second layer.
    gunzip -c obj.gz | file -
    # -> STILL "gzip compressed data" => double-compressed
    gunzip -c obj.gz | xxd | head -1
    # -> begins again with 1f 8b 08 (gzip magic)
- Decompress the second layer to reach the real payload.
    gunzip -c obj.gz | gunzip -c | head
    # -> the actual payload (e.g. JSON) appears only after TWO passes
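When several objects need checking, the manual passes can be wrapped in a small helper. This is a minimal sketch, not part of any Cribl or AWS tooling: count_gzip_layers is a hypothetical function name, the default cap of five passes is an arbitrary assumption, and gzip -t is used instead of file to decide whether another layer is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many gzip layers wrap a local file.
set -eu

count_gzip_layers() {
  local src="$1" max="${2:-5}" layers=0
  local tmp
  tmp="$(mktemp -d)"
  cp "$src" "$tmp/cur"
  # gzip -t checks that the input is a valid gzip stream without writing output.
  while [ "$layers" -lt "$max" ] && gzip -t < "$tmp/cur" 2>/dev/null; do
    gunzip -c < "$tmp/cur" > "$tmp/next"
    mv "$tmp/next" "$tmp/cur"
    layers=$((layers + 1))
  done
  rm -rf "$tmp"
  echo "$layers"
}

# Usage, assuming obj.gz was downloaded with the AWS CLI as described above:
#   count_gzip_layers ./obj.gz
```

A printed value of 2 or more corresponds to the double-gzip or multi-gzip condition; a value equal to the cap means the loop stopped early and the object may carry more layers.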
Interpret the result
- Count how many gunzip passes are needed to reach readable text or JSON.
| Passes to reach payload | Conclusion |
|---|---|
| 1 pass → text/JSON | Single-gzip. Cribl reads this correctly. Not a double-gzip problem — look elsewhere (event breaker, parser, character encoding). |
| 2 passes → text/JSON | Double-gzip confirmed. The inner layer is what Cribl emits as garbled binary _raw. |
| 3+ passes → text/JSON | Multi-gzip — same class of problem. Remove the extra layers upstream. |
Needing two gunzip passes to reach the payload means two gzip layers are present. A single-gzip object reaches the payload in one pass and is read correctly by Cribl, regardless of its Content-Encoding value.
Last Validated
Cribl Stream 4.12.1
Additional Information
- Why Cribl produces the garbled output: Cribl's S3 read path decompresses object bodies via a single magic-byte-gated step that strips exactly one gzip layer. It does not honor Content-Encoding, does not key off the .gz extension, and does not loop until the payload is no longer compressed. The AWS SDK does not auto-decompress GetObject bodies either. So a double-gzip object yields the inner gzip layer as binary _raw. This behavior is correct and standard; the fix belongs on the AWS side.
- Content-Encoding is not the cause of the garble. Its only relevance here is the diagnostic trap above: it causes a Console/browser download to silently strip one layer, which misleads testing. It plays no part in how Cribl reads the object.
- Companion solution article: For the most common real-world source of double-gzip and the AWS-side fix, see Cribl Stream S3 Source or Collector Emits Garbled Output for CloudWatch Logs to Firehose to S3 Objects (Double-Gzip).
- AWS reference: S3 object metadata, system-defined metadata (Content-Encoding).
