Solved

Cribl Search Performance Impact When Sending Only _raw Field Versus Extracted Fields

  • December 24, 2025
  • 7 replies
  • 10 views

CarlosM
This message originated from Cribl Community Slack.

Hello team! I’d like to understand the difference between sending only the _raw field to Cribl Lake and sending _raw along with some extracted/parsed fields. I assume it will affect the size of the event, but more importantly, how does it affect performance in Cribl Search? Additionally, are there any recommendations for choosing between the JSON and Parquet data formats?

Best answer by jinsea



  • Inspiring
  • Answer
  • December 24, 2025
If you send in _raw and then some parsed fields from that _raw, you are going to be duplicating data. Inside Cribl Lake, you can do things like dataset="staging" AND source_name="syslog_source" | extend raw = parse_json(_raw) in your KQL to more easily see the _raw data :smile:
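To make the duplication concrete, here is a minimal Python sketch (an illustration, not anything Cribl ships) that serializes one hypothetical event two ways and compares the sizes. The event and its fields (ip, host, severity, msg) are made up, and the count is for uncompressed JSON; Lake stores data compressed, so the real on-disk overhead will likely differ.

```python
import json

# Hypothetical syslog-style event whose _raw is a JSON string (illustrative only).
raw = json.dumps({"ip": "1.1.1.1", "host": "web-01", "severity": "info",
                  "msg": "connection accepted"})

# Option A: ship only _raw.
raw_only = {"_raw": raw}

# Option B: ship _raw plus fields parsed out of it (same values stored twice).
parsed = json.loads(raw)
raw_plus_fields = {"_raw": raw, "ip": parsed["ip"], "host": parsed["host"]}


def encoded_size(event: dict) -> int:
    """Size in bytes of the event serialized as one JSON line (UTF-8)."""
    return len(json.dumps(event).encode("utf-8"))


size_a = encoded_size(raw_only)
size_b = encoded_size(raw_plus_fields)
print(f"_raw only:        {size_a} bytes")
print(f"_raw + ip, host:  {size_b} bytes")
print(f"extra per event:  {size_b - size_a} bytes")
```

Every extracted field adds its name and value on top of the copy already inside _raw, so the overhead grows with the number of extracted fields and with event volume.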

  • Inspiring
  • December 24, 2025
Also, I think someone mentioned Parquet might perform better, but we have only used JSON so far. :smile:

CarlosM
  • Author
  • New Participant
  • December 24, 2025
Awesome, thanks for the info @user! :slightly_smiling_face: This may be more of a Cribl Search question, but do you know which is better for search performance? For example:
dataset="staging" AND source_name="syslog_source" 
| extend raw = parse_json(_raw)
| where raw.ip=="1.1.1.1"
vs
dataset="staging" AND source_name="syslog_source"  ip=="1.1.1.1"

  • Employee
  • December 24, 2025
Honestly, it depends on the size of the dataset being searched, but they should be fairly close in performance. Ideally, you've got your most frequently searched fields as top-level fields.
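As a rough local analogy (plain Python, not Cribl Search), the two query shapes in the question differ in whether every event's _raw must be parsed before filtering. The events and ip values below are hypothetical. Timings vary by machine and are not a prediction of Search performance, where reading the data can dominate and the two queries may well end up close.

```python
import json
import timeit

# Hypothetical events stored two ways: "raw_events" keep everything inside the
# _raw JSON string; "flat_events" also carry ip as a top-level field, which is
# what extracting it at ingest time would give you.
N = 10_000
raw_events = [{"_raw": json.dumps({"ip": f"10.0.{i % 256}.{i % 7}", "msg": "ok"})}
              for i in range(N)]
raw_events.append({"_raw": json.dumps({"ip": "1.1.1.1", "msg": "hit"})})
flat_events = [dict(e, ip=json.loads(e["_raw"])["ip"]) for e in raw_events]


def filter_by_parsing(events, ip):
    """Analogous to `| extend raw = parse_json(_raw) | where raw.ip == ...`: parse every event."""
    return [e for e in events if json.loads(e["_raw"]).get("ip") == ip]


def filter_top_level(events, ip):
    """Analogous to `ip == ...` on a top-level field: no per-event JSON parsing."""
    return [e for e in events if e.get("ip") == ip]


hits_a = filter_by_parsing(raw_events, "1.1.1.1")
hits_b = filter_top_level(flat_events, "1.1.1.1")
print(len(hits_a), len(hits_b))
print("parse at search time:", timeit.timeit(lambda: filter_by_parsing(raw_events, "1.1.1.1"), number=3))
print("top-level field:     ", timeit.timeit(lambda: filter_top_level(flat_events, "1.1.1.1"), number=3))
```

Both filters return the same match; the top-level version simply skips the per-event json.loads, which is the work that extracting ip at ingest time saves.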

  • Employee
  • December 24, 2025
In our current implementation, JSON has better compression, but if your events contain a lot of repeated values in fields (e.g. Kubernetes logs), you'll see smaller file sizes by leveraging Parquet.
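The repeated-values point can be illustrated with a toy comparison: the same hypothetical Kubernetes-style events compressed with zlib in a row layout (one JSON object per line) versus a column layout (all values of a field stored together). This is not real Parquet, which adds its own encodings such as dictionary and run-length encoding, and it says nothing about how Lake's JSON or Parquet writers actually behave; it only shows why grouping repeated values together helps compression.

```python
import json
import zlib

# Hypothetical Kubernetes-style events: a few fields with highly repeated values.
events = [
    {
        "namespace": "payments",
        "pod": f"api-{i % 3}",
        "container": "app",
        "level": "info" if i % 10 else "error",
        "latency_ms": (i * 37) % 500,
    }
    for i in range(5_000)
]

# Row-oriented layout (like JSON lines): each event serialized whole, one after another.
row_bytes = "\n".join(json.dumps(e) for e in events).encode("utf-8")

# Column-oriented layout (the idea behind Parquet): all values of one field stored together.
columns = {key: [e[key] for e in events] for key in events[0]}
col_bytes = json.dumps(columns).encode("utf-8")

row_compressed = len(zlib.compress(row_bytes, 6))
col_compressed = len(zlib.compress(col_bytes, 6))
print("row layout, compressed bytes:   ", row_compressed)
print("column layout, compressed bytes:", col_compressed)
```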

  • Employee
  • December 24, 2025
If you end up using Parquet, keep those commonly searched fields as top-level fields; Search can then read only the requested fields rather than all of them. See here for more detail: https://cribl.io/blog/cribl-search-parquet-pushdowns-smooth-like-butter/
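Here is a toy Python sketch of the pushdown idea: fields are stored as separate columns, and a query touches only the column it filters on plus the columns it returns. The store, field names, and query helper are hypothetical and do not reflect how Cribl Search is implemented.

```python
# Toy column store: each field is a separate list, so a query can skip fields it does not need.
# Field names and data are hypothetical.
store = {
    "_time": [1700000000 + i for i in range(6)],
    "ip": ["1.1.1.1", "10.0.0.2", "1.1.1.1", "10.0.0.3", "10.0.0.4", "1.1.1.1"],
    "_raw": ["<long original event text>" * 10] * 6,
}


def query(store, where_field, value, select):
    """Read only `where_field` plus the `select` columns; other columns are never touched."""
    columns_read = {where_field, *select}
    rows = [i for i, v in enumerate(store[where_field]) if v == value]
    result = [{f: store[f][i] for f in select} for i in rows]
    return result, columns_read


result, read = query(store, "ip", "1.1.1.1", select=["_time"])
skipped = sorted(set(store) - read)
print(result)
print("columns read:", sorted(read))
print("columns skipped:", skipped)
```

It also shows why top-level fields matter: a value that exists only inside _raw can't be filtered without reading _raw itself, which is the largest column here.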

CarlosM
  • Author
  • New Participant
  • December 24, 2025
Great info, thanks for your help @user!