Question

Duplicate field values in Splunk events from Cribl

  • September 3, 2025
  • 3 replies
  • 143 views

Raffaele

Hello,

I’m using Cribl Cloud to pull JSON events from an Azure Event Hub and forward them to Splunk via HEC. Each incoming event contains a nested array field called records, for example:

{
  "records": [
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    },
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    },
    {
      "FileName": "xx",
      "FileType": "xx",
      "NetworkMessageId": "xx",
      "RecipientEmailAddress": "xx",
      "RecipientObjectId": "xx",
      "ReportId": "xx",
      "SHA256": "xx",
      "SenderDisplayName": "xx",
      "SenderObjectId": "x",
      "SenderFromAddress": "x",
      "FileSize": x,
      "Timestamp": "xx",
      "TimeGenerated": "xx",
      "_ItemId": "xx",
      "TenantId": "xx",
      "_TimeReceived": "xx",
      "_Internal_WorkspaceResourceId": "xx",
      "Type": "xx"
    }
  ],
  "_time": 1756902850.057,
  "cribl": "yes",
  "security_event_hub": "yes"
}

My goal is to split each element of the records array into a separate, flat event (a short JavaScript sketch of the target shape follows the list below). Here’s what I’ve tried:

  • Unroll on records to produce individual events

  • Flatten to promote the nested fields, then delete the records array
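
For clarity, here is the target shape as a plain JavaScript sketch (illustrative Node.js, not a Cribl function; rawEvent is a hypothetical variable standing in for one incoming Event Hub message):

// Illustrative only: what "split and flatten" should produce.
const payload = JSON.parse(rawEvent); // rawEvent: one incoming message (hypothetical)

const flatEvents = payload.records.map(rec => ({
  ...rec,                                   // promote every nested field to the top level
  _time: payload._time,                     // carry over the top-level metadata
  cribl: payload.cribl,
  security_event_hub: payload.security_event_hub
}));
// flatEvents now holds one flat object per element of records; no nested array remains.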

In Splunk, each field’s values are duplicated (and sometimes triplicated), as shown in the screenshot below (the censored values are identical to one another):

[screenshot: duplicated field values in Splunk search results]

I’ve identified that extracting nested values is causing this anomaly in Splunk.
I’ve tried numerous approaches to resolve it:

  • Replaced the Flatten function with an Eval expression like this one:
    Object.assign(__e, Object.assign({}, __e, __e.rec || {})); delete __e.rec; delete __e.records;

  • Tested various JavaScript snippets in Code functions

  • Used JSON Unroll and JSON Decode functions

  • Toggled KV_MODE, AUTO_KV_JSON, and INDEXED_EXTRACTIONS on the Heavy Forwarders and Search Heads (example stanzas below)
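
For reference, the props.conf stanzas I experimented with looked roughly like this (the sourcetype name is just a placeholder for ours):

# props.conf on the Search Heads: disable search-time JSON extraction
[my_eventhub_sourcetype]
KV_MODE = none
AUTO_KV_JSON = false

# props.conf on the Heavy Forwarders: index-time JSON extraction
[my_eventhub_sourcetype]
INDEXED_EXTRACTIONS = json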

None of these approaches worked consistently; in some cases the values were even triplicated.
Do you have any suggestions for resolving this issue?

Thank you in advance for any insights or working examples.

3 replies

kprior
  • Employee
  • September 17, 2025

Hello! When you are building this within Cribl, are you seeing the duplicate values in preview mode, or only once the events are parsed in Splunk? I have some ideas, but I wanted to check what behavior you’re seeing in Cribl before testing.


Prince Osei Bonsu

@Raffaele this simple pipeline will break the events apart, producing one event per element of records:

{
  "id": "pl_eventHub",
  "conf": {
    "output": "default",
    "streamtags": [],
    "groups": {},
    "asyncFuncTimeout": 1000,
    "functions": [
      {
        "filter": "true",
        "conf": {
          "add": [
            {
              "disabled": false,
              "name": "_raw",
              "value": "JSON.stringify(records)"
            }
          ],
          "remove": [
            "records"
          ]
        },
        "id": "eval"
      },
      {
        "filter": "true",
        "conf": {
          "existingOrNew": "new",
          "shouldMarkCriblBreaker": true,
          "ruleType": "json_array",
          "maxEventBytes": 51200,
          "timestampAnchorRegex": "/^/",
          "timestamp": {
            "type": "auto",
            "length": 150
          },
          "timestampTimezone": "local",
          "timestampEarliest": "-420weeks",
          "timestampLatest": "+1week",
          "jsonExtractAll": false,
          "eventBreakerRegex": "/[\\n\\r]+(?!\\s)/",
          "existingRule": ""
        },
        "id": "event_breaker"
      }
    ],
    "description": ""
  }
}
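
In short: the Eval function overwrites _raw with JSON.stringify(records) and removes the original records field, and the Event Breaker function then applies a json_array rule to _raw, so each element of the array leaves the pipeline as its own flat event.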

 


Angelo Michele Pizzi

You are probably unable to do what you’re trying to do because that isn’t a valid JSON event: some fields are missing quotes (e.g., FileSize).