Overview: Extract and Batch for OpenTelemetry in Cribl

  • May 13, 2026

Emily Ashley


When you’re working with OpenTelemetry and Cribl, two features tend to show up together: the Extract toggles on the OpenTelemetry (OTel) Source, and the Batch options in the OTLP Logs, OTLP Metrics, and OTLP Traces Functions.

We’ll focus mainly on Logs and Spans here and dive deeper into Metrics in a separate post.

In this article, we’ll walk through how OpenTelemetry natively emits batches, what Extract does for Logs, how our OTLP Functions transform and re-batch events, and when you might intentionally not re-batch.


OpenTelemetry Emits Batches of Data

First, it helps to review standard OpenTelemetry behavior. OTLP sends signals in batches, not as individual events. (See: batch processors.) A single OTLP payload for logs, metrics, or traces usually looks something like the examples in this JSON proto example repo:

Across all three signals, the structure is the same pattern:

  • A Resource that describes where the data came from (service.name, host info, cloud region, and so on)
  • A Scope / Instrumentation Scope that describes which library produced the data
  • A collection of records:
    • Logs: logRecords[]
    • Metrics: a set of dataPoints[] per metric
    • Traces: spans[]
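To make that structure concrete, here's a minimal OTLP Logs payload sketched as a Python dict. Field names follow the OTLP JSON encoding (resourceLogs → scopeLogs → logRecords); the service name, scope name, and log bodies are made up for illustration:

```python
# A minimal OTLP Logs export payload, sketched as a Python dict.
# One resource block, one scope block, and a list of log records.
otlp_logs_payload = {
    "resourceLogs": [
        {
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": "checkout"}},
                    {"key": "host.name", "value": {"stringValue": "web-01"}},
                ]
            },
            "scopeLogs": [
                {
                    "scope": {"name": "my.company.logger", "version": "1.2.0"},
                    "logRecords": [
                        {"timeUnixNano": "1715600000000000000",
                         "severityText": "INFO",
                         "body": {"stringValue": "order placed"}},
                        {"timeUnixNano": "1715600001000000000",
                         "severityText": "ERROR",
                         "body": {"stringValue": "payment failed"}},
                    ],
                }
            ],
        }
    ]
}

# One export request carries two log records that share the same
# resource and scope context, written once at the top.
records = otlp_logs_payload["resourceLogs"][0]["scopeLogs"][0]["logRecords"]
```

Note how the resource and scope context appears only once, no matter how many records the batch carries — that's the duplication trade-off Extract will introduce below.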

So you might see one OTLP Log batch that contains three log records, and the next batch might contain a hundred. By default, Cribl doesn’t see those as 100 separate events. Cribl’s OTel Source receives each OTLP export request as one event that contains a collection of logs, metrics, or spans.


The OTel Source: Extract for Logs, Metrics, and Spans

Let’s talk about what Cribl does with those OTLP payloads when they arrive at the OpenTelemetry (OTel) Source in Cribl Stream or Edge.

On the OTel Source, you’ll see three toggles:

  • Extract logs
  • Extract metrics
  • Extract spans

If you leave these disabled, Cribl behaves like a pass‑through from the perspective of the OTLP structure. For each incoming OTLP payload (per signal), Stream creates one Cribl event that looks a lot like the original:

  • There’s a resource object at the top
  • A scope / instrumentation scope object
  • And then a grouping of logs, spans, or metric data points

That’s perfect if you want Cribl to be the control plane and act as a smart broker for routing, TLS, and backpressure. But once you turn Extract on for a signal, things get a lot more interesting!

When extract is enabled:

  • A batch of N records turns into N individual events in Cribl.
    • One OTLP log batch with 100 logRecords becomes 100 events.
    • One OTLP trace batch with 50 spans becomes 50 events.
  • For each of those new events, the OTel Source:
    • Copies the Resource attributes down onto the event.
    • Copies the relevant Scope attributes down as well.
    • Transforms attributes into short-form key:value pairs for ease of use in Cribl pipelines

Now each log event in the pipeline is self‑contained:

  • It has its own body, timestamps, IDs, severity, attributes, and so on.
  • It also carries the full resource and scope context that used to live at the top of the OTLP batch.
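In rough pseudocode terms, Extract does something like the sketch below. This is an illustrative sketch, not Cribl's actual implementation — the function name, the sample `service.name`, and the short-form field names are invented for the example:

```python
def extract_log_events(payload):
    """Illustrative sketch of what 'Extract logs' does: one OTLP batch
    becomes N self-contained events, with resource and scope context
    copied down onto each one. Not Cribl's actual code."""
    def flatten(attrs):
        # Collapse OTLP's [{"key": ..., "value": {"stringValue": ...}}]
        # attribute lists into short-form {key: value} pairs.
        return {a["key"]: list(a["value"].values())[0] for a in attrs}

    events = []
    for rl in payload.get("resourceLogs", []):
        resource = flatten(rl.get("resource", {}).get("attributes", []))
        for sl in rl.get("scopeLogs", []):
            scope_name = sl.get("scope", {}).get("name")
            for rec in sl.get("logRecords", []):
                events.append({
                    "body": rec.get("body", {}).get("stringValue"),
                    "severityText": rec.get("severityText"),
                    "resource": dict(resource),  # duplicated onto every event
                    "scope": scope_name,
                })
    return events

# One batch of three logRecords -> three self-contained events,
# each carrying its own copy of the resource and scope context.
batch = {"resourceLogs": [{
    "resource": {"attributes": [
        {"key": "service.name", "value": {"stringValue": "checkout"}}]},
    "scopeLogs": [{"scope": {"name": "my.logger"},
                   "logRecords": [{"body": {"stringValue": "a"}},
                                  {"body": {"stringValue": "b"}},
                                  {"body": {"stringValue": "c"}}]}]}]}
events = extract_log_events(batch)
```

The duplication is easy to see here: every one of the three events now carries the same `service.name` attribute that the original batch stored once.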

That self‑contained shape is what makes individual-event processing so powerful. You can:

  • Mask or redact sensitive data at the per‑event level
  • Drop “just the noisy bits” instead of whole batches
  • Enrich specific logs or spans based on fields in the body or attributes

All without having to dig deep into a nested structure.

The price you pay for that flexibility is duplication: every event from that batch now has a copy of the same resource attributes. If you did nothing else and sent those individual events out as‑is, you’d see your payload size jump.

When it’s OTLP in -> OTLP out, the recommended flow is to Extract first, do your work in Pipelines, and then re-batch with the OTLP Functions before sending data downstream.


OTLP Functions

Now that we’ve talked about extraction at the Source, let’s look at how the OTLP Logs Function puts things back together.

At a high level, this function’s job is to:

  • Take events mapped to our short-form, self-contained OTLP structure
  • Transform them into the more long-form OTLP spec (more like the proto.json examples)
  • Re-batch those logs by shared resource context before they’re handed to an OTel Destination for serialization and transmission.

You’ll find the full reference here:
OTLP Logs Function.


What OTLP Batching Looks Like

When you turn on Batch OTLP logs in the Function, you’re telling it to regroup individual log events back into OTLP ResourceLogs payloads.

Under the hood, the function:

  • Groups log events that share the same Resource attributes.
  • Within a time window, builds batches where:
    • The shared resource is written once at the top of the batch.
    • Under that, you’ll see a scope block with its list of log_records[].

It undoes the per‑event duplication that Extract introduced. 

You start with “N events each carrying the same resource attributes,” and you end up with “one resource block and N log records that reference it.”
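A rough sketch of that grouping logic follows. This is illustrative only — not the actual OTLP Logs Function — and it implements just the grouping and the batch-size cap; the real Function also applies the timeout, payload-size limit, metadata keys, and cardinality safeguards described below:

```python
def rebatch_by_resource(events, batch_size=100):
    """Illustrative sketch of re-batching (not the actual OTLP Logs
    Function): group self-contained events by their shared resource
    attributes so the resource context is written once per batch."""
    groups = {}
    for ev in events:
        # Sorted resource attributes make a stable grouping key.
        key = tuple(sorted(ev["resource"].items()))
        groups.setdefault(key, []).append(ev)

    batches = []
    for key, group in groups.items():
        # Honor the batch-size cap; the real Function also applies a
        # timeout (ms) and a payload-size (KB) limit.
        for i in range(0, len(group), batch_size):
            chunk = group[i:i + batch_size]
            batches.append({
                "resource": dict(key),  # shared context, written once
                "logRecords": [{"body": e["body"]} for e in chunk],
            })
    return batches

# Three events with identical resource attributes collapse back into
# one batch: one resource block, three log records.
events = [{"resource": {"service.name": "checkout"}, "body": b}
          for b in ("a", "b", "c")]
batches = rebatch_by_resource(events)
```

Events with different resource attributes would land in separate batches, which is why high-cardinality metadata keys can multiply the number of batchers — the safety valve mentioned below exists for exactly that reason.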

You can tune this behavior with settings like:

  • Batch OTLP logs (on/off)
  • Batch size (how many log records to accumulate before sending a batch)
  • Batch timeout (ms) (how long to hold a partial batch before sending it anyway)
  • Batch size limit (KB) (a cap on batch payload size)
  • Batch log metadata keys (optional extra keys to partition batches, beyond resource)
  • Metadata cardinality limit (a safety valve to prevent runaway explosion of batchers when metadata has very high cardinality)

Once you’ve got this wired up, you’re back to sending OTLP Logs in a way that looks a lot like what comes out of an OTel SDK or Collector’s batch processor — but after you’ve had a chance to do all that pipeline magic in between. ✨


To Batch or Not to Batch: Choosing Based on the Use Case

Think about it like this: batch when you are optimizing for OTLP transport and downstream OTLP backend parsing, and skip rebatching when you are optimizing for non-OTLP destinations, row-oriented storage, or per-record analytics inside Cribl Search.

  • Use case: OTLP in → transform in Cribl → OTLP out
    • Recommended pattern: Extract at the OTel Source, do your pipeline work, then batch again with the OTLP Logs, Metrics, or Traces Function before sending to the OTel Destination.
    • Why: This gives you per-event control in the pipeline, but still sends spec-friendly OTLP batches downstream. Extraction duplicates shared resource attributes onto every event; rebatching writes that shared context once at the top of the OTLP batch instead of repeating it N times.

  • Use case: OTLP in → transform in Cribl → another format out
    • Recommended pattern: Extract first, then convert and send the data in the shape your non-OTLP destination expects. In this pattern, you do not rebatch into OTLP.
    • Why: If the destination is row-based, schema-driven, or not expecting OTLP resource/scope batch structures, keeping events flat is usually simpler for downstream parsing, storage, and analytics. This keeps each event self-contained.

  • Use case: Converting non-OTLP data into OTLP
    • Recommended pattern: Parse and map the source data into Cribl’s internal OTLP shape, then use the OTLP Functions to transform and batch before OTLP egress.
    • Why: If the destination is OTLP-aware, batching helps produce the resource/scope/record grouping that OTLP backends expect to deserialize cleanly. Batching also minimizes data sent over the wire by writing resource attributes once per batch instead of on each event.

  • Use case: Sending metrics to an OTLP-compliant backend
    • Recommended pattern: Convert to Cribl metric events, do metric-aware processing, then use the OTLP Metrics Function with batching enabled.
    • Why: For metrics, the batching stage is also where you explicitly choose which fields should be hoisted back into resource attributes as shared context versus which stay as individual metric dimensions. This can affect downstream views, labels, storage cost, and cardinality.


Use cases where you may not want to rebatch

A classic example is sending data into Cribl Search or any other search or analytics system that expects one event per row. Storing the data in batches may make sense for long-term compliance, but keeping the data in extracted, per-event form is often the better choice for datasets you intend to use.

That shape works well because:

  • Each event already carries its own body, IDs, and full resource/scope context.
  • Indexing is simpler because the fields you care about live directly on the event instead of inside OTLP batch arrays.
  • Querying is easier because you do not have to reason about nested resourceLogs, resourceSpans, or metric batch structures just to filter on service, host, namespace, or scope fields.

This is usually the right choice when Cribl Search, a data lake, or another row-based analytics platform is the final home for the telemetry. In that world, you are optimizing for storage and query usability, not wire efficiency.

When you need both storage and OTLP egress

You do not have to choose one or the other. If you want searchable flat records for storage and also need spec-compliant OTLP output, branch the pipeline:

  • One branch keeps the extracted events and writes them to Search or lake storage.
  • Another branch rebatches those events and sends them to an OTLP Destination.

That way, the same incoming telemetry can support both a clean search experience and a proper OTLP egress path.

This pattern also applies when you are querying data back out of your lake and then sending results onward as OTLP: keep the flat shape where it helps storage and query, then rebatch only at the point where you need OTLP again.