Solved

Cribl Search Performance Impact When Sending Only _raw Field Versus Extracted Fields

Forum|Forum|2 months ago
December 24, 2025
7 replies
25 views

CarlosM
New Participant

This message originated from Cribl Community Slack.

Hello team! I’d like to understand the difference between sending only the _raw field to Cribl Lake and sending _raw along with some extracted/parsed fields. I assume it will affect the size of the event, but more importantly, how does it affect performance in Cribl Search? Additionally, are there any recommendations for choosing between the JSON and Parquet data formats?


jinsea
Inspiring
Answer
December 24, 2025

If you send _raw along with fields parsed from that _raw, you are going to be duplicating data. Inside Cribl Lake you can do things like dataset="staging" AND source_name="syslog_source" | extend raw = parse_json(_raw) in your KQL to parse _raw at search time and see its contents more easily :smile:
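To make the duplication point concrete, here is a minimal Python sketch (not Cribl code) that compares the serialized size of an event carrying only _raw with one carrying _raw plus the same values copied out as top-level fields. The sample payload and its field names are made up for illustration.

```python
import json

# Hypothetical syslog-style payload; the field names are illustrative only.
payload = {"ip": "1.1.1.1", "host": "web-01", "msg": "connection accepted"}

# Event that carries only _raw (the payload serialized as a JSON string).
raw_only = {"_raw": json.dumps(payload)}

# Event that carries _raw plus the same values extracted as top-level fields.
raw_plus_fields = {"_raw": json.dumps(payload), **payload}

# Compare the serialized sizes in bytes.
size_raw_only = len(json.dumps(raw_only).encode("utf-8"))
size_raw_plus_fields = len(json.dumps(raw_plus_fields).encode("utf-8"))

print(size_raw_only, size_raw_plus_fields)
```

Every extracted value is stored twice in the second event, so its serialized form is larger. How much that matters after compression depends on the data.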


jinsea
Inspiring
December 24, 2025

Also, I think someone mentioned that Parquet might perform better, but we have only used JSON so far. :smile:


CarlosM
Author
New Participant
December 24, 2025

Awesome, thanks for the info @user! :slightly_smiling_face: This may be more of a Cribl Search question, but do you know which is better for search performance? For example:

dataset="staging" AND source_name="syslog_source" 
| extend raw = parse_json(_raw)
| where raw.ip=="1.1.1.1"

vs

dataset="staging" AND source_name="syslog_source"  ip=="1.1.1.1"


Rick Salsa
Employee
December 24, 2025

Honestly, it depends on the size of the dataset being searched, but the two should be fairly close in performance. Ideally, you've got your most frequently searched fields as top-level fields on the event.
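The difference between the two queries can be sketched in plain Python. This is only an illustration of the two filtering styles, not how Cribl Search is implemented; the events and helper function names are hypothetical.

```python
import json

# Hypothetical events: each has _raw (a JSON string) and a promoted top-level "ip".
events = [
    {"_raw": json.dumps({"ip": ip, "msg": "hello"}), "ip": ip}
    for ip in ["1.1.1.1", "8.8.8.8", "1.1.1.1"]
]


def filter_by_parsing_raw(events, ip):
    """Parse _raw for every event, then filter (similar to extend + where)."""
    return [e for e in events if json.loads(e["_raw"]).get("ip") == ip]


def filter_by_top_level_field(events, ip):
    """Filter directly on the top-level field, with no per-event parsing."""
    return [e for e in events if e.get("ip") == ip]


parsed_matches = filter_by_parsing_raw(events, "1.1.1.1")
direct_matches = filter_by_top_level_field(events, "1.1.1.1")

# Both approaches return the same events; only the work done per event differs.
print(len(parsed_matches), len(direct_matches))
```

Both functions return the same matches. The first pays a parse for every event before it can compare, which is the extra work that promoting a frequently searched field to the top level avoids.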


Rick Salsa
Employee
December 24, 2025

In our current implementation, JSON has better compression, but if your events contain a lot of repeated values in fields (e.g. Kubernetes logs), you'll see smaller file sizes by leveraging Parquet.
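One reason repeated values tend to store compactly in a columnar format such as Parquet is dictionary encoding: each distinct value is stored once and the column holds small integer codes. The Python sketch below only illustrates that idea; Parquet's actual encodings are more involved, and the namespace values are made up.

```python
def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []   # distinct values, in first-seen order
    index_of = {}     # value -> position in dictionary
    codes = []        # one small integer per original value
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


# Hypothetical column of Kubernetes namespaces with many repeats.
namespaces = ["kube-system", "default", "kube-system", "kube-system", "default"]

dictionary, codes = dictionary_encode(namespaces)

# Decoding restores the original column exactly.
decoded = [dictionary[code] for code in codes]

print(dictionary, codes)
```

The more often a value repeats within a column, the more the dictionary pays off, which matches the Kubernetes logs example above.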


Rick Salsa
Employee
December 24, 2025

If you end up using Parquet, make those commonly searched fields top-level fields; Search will then be able to pull out only the requested fields rather than all fields. See here for more detail: https://cribl.io/blog/cribl-search-parquet-pushdowns-smooth-like-butter/
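The benefit of top-level fields in a columnar file can be sketched in Python as follows. This is a toy model of column projection, not Parquet or Cribl Search internals; the rows and helper functions are hypothetical, and the sketch assumes every row has the same keys.

```python
# Hypothetical events with three top-level fields each.
rows = [
    {"ip": "1.1.1.1", "host": "web-01", "msg": "connection accepted"},
    {"ip": "8.8.8.8", "host": "web-02", "msg": "connection refused"},
    {"ip": "1.1.1.1", "host": "web-01", "msg": "connection closed"},
]


def to_columns(rows):
    """Pivot row-oriented events into one list of values per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def read_columns(columns, wanted):
    """Return only the requested columns, leaving the others untouched."""
    return {name: columns[name] for name in wanted}


columns = to_columns(rows)

# A query that filters on ip only needs the "ip" column.
projected = read_columns(columns, ["ip"])
matching_rows = [i for i, ip in enumerate(projected["ip"]) if ip == "1.1.1.1"]

print(matching_rows)
```

In this model the filter touches one column of values instead of every field of every event. A field buried inside a _raw string cannot be read on its own this way, because the whole string has to be loaded and parsed first.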


CarlosM
Author
New Participant
December 24, 2025

Great info, thanks for your help @user!
