Skip to main content

How to Detect a Double Gzip-Compressed S3 Object and Why AWS Console vs CLI Downloads Differ

  • June 24, 2026
  • 0 replies
  • 2 views

Jessica Bracken

Objective

Confirm whether an Amazon S3 object is gzip-compressed more than once, and obtain the exact bytes Cribl Stream reads when diagnosing garbled output from an S3 Source or S3 Collector. This is a reusable diagnostic for any "garbled output from S3" case, independent of which application or service writes the objects.

A Cribl S3 Source or Collector decompresses exactly one gzip layer. It detects gzip by inspecting the object's leading magic bytes (1f 8b 08); it does not honor the Content-Encoding header, does not use the .gz extension, and does not recursively decompress. An object compressed twice (gzip( gzip( payload ) )) therefore reaches the pipeline with the inner gzip layer still intact, which surfaces as garbled binary _raw. This procedure confirms that condition before any AWS-side fix is attempted.

Environment

  • Cribl Stream 4.12.1
  • Cribl Stream (Source or Collector) — Cloud or on-prem
  • Amazon S3 Source (SQS-based) or S3 Collector
  • A host with the AWS CLI installed and credentials that can read the bucket
  • A shell with file, gunzip, and xxd available

Procedure

Download the object with the AWS CLI, never the Console

Always pull the object with the AWS CLI. The CLI returns the raw stored bytes exactly as Cribl reads them.

  1. Download one failing object with the CLI.

    aws s3 cp s3://<bucket>/<key> ./obj.gz

⚠️ Do not use an AWS Console or browser download for this test. A browser honors the object's Content-Encoding: gzip system metadata and transparently decompresses one layer in transit. A Console-downloaded copy of a double-gzipped object therefore needs only one manual gunzip and appears correct — masking the second layer and leading to a misdiagnosis. The CLI performs no such decompression. Content-Encoding is S3 system metadata; see Working with S3 object metadata.

Run the two-pass gunzip test

  1. Confirm the file is gzip.

    file obj.gz # -> gzip compressed data
  2. Decompress one layer and inspect the result. If it is still gzip, the object carries a second layer.

    gunzip -c obj.gz | file - # -> STILL "gzip compressed data" => double-compressed gunzip -c obj.gz | xxd | head -1 # -> begins again with 1f 8b 08 (gzip magic)
  3. Decompress the second layer to reach the real payload.

    gunzip -c obj.gz | gunzip -c | head # -> the actual payload (e.g. JSON) appears only after TWO passes

Interpret the result

  1. Count how many gunzip passes are needed to reach readable text or JSON.

Passes to reach payload

Conclusion

1 pass → text/JSON

Single-gzip. Cribl reads this correctly. Not a double-gzip problem — look elsewhere (event breaker, parser, character encoding).

2 passes → text/JSON

Double-gzip confirmed. The inner layer is what Cribl emits as garbled _raw. Remove one compression layer at the writer (AWS side).

3+ passes → text/JSON

Multi-gzip — same class of problem. Remove the extra layers upstream.

Needing two gunzip passes to reach the payload means two gzip layers are present. A single-gzip object reaches the payload in one pass and is read correctly by Cribl, regardless of its Content-Encoding value.

Last Validated

Cribl Stream 4.12.1

Additional Information

  • Why Cribl produces the garbled output: Cribl's S3 read path decompresses object bodies via a single magic-byte-gated step that strips exactly one gzip layer. It does not honor Content-Encoding, does not key off the .gz extension, and does not loop until the payload is no longer compressed. The AWS SDK does not auto-decompress GetObject bodies either. So a double-gzip object yields the inner gzip layer as binary _raw. This behavior is correct and standard; the fix belongs on the AWS side.
  • Content-Encoding is not the cause of the garble. Its only relevance here is the diagnostic trap above: it causes a Console/browser download to silently strip one layer, which misleads testing. It plays no part in how Cribl reads the object.
  • Companion solution article: For the most common real-world source of double-gzip and the AWS-side fix, see Cribl Stream S3 Source or Collector Emits Garbled Output for CloudWatch Logs to Firehose to S3 Objects (Double-Gzip) 
  • AWS reference: S3 object metadata — system-defined metadata (Content-Encoding).