I'm sending JSON data (Cloudflare logs) to Splunk from Cribl, and it results in a horrendous .30:1 index-size-to-raw-size ratio. I'm sending _raw as text. There are no props for that sourcetype on the indexer side, so there are no indexed extractions. Any help appreciated.
May I see a screenshot of the event in the OUT of Cribl?
As an object it is more, but Splunk will restringify it anyway.
Update: with the help of <@U01C35EMQ01> and <@U020E25MRU1>, we found a bug in Splunk: if `_raw` contains `::`, Splunk creates indexed fields from it.
And did Splunk support validate it too?
No response yet from Splunk support, but I've validated with a few other people that it's happening in their environments too.
Did you try the escaped colons?
To summarize: if there are ten pairs of `sometext::randomtext` in a single event in `_raw`, then Splunk creates ten additional indexed fields. Does that sound about right, <@U020VPXGT34>?
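For anyone who wants to check how many such pairs an event carries before it reaches Splunk, here is a minimal sketch. The function name `find_double_colon_pairs` and the regex are assumptions for illustration only; Splunk's actual handling of `::` may not match `\w+::\w+` exactly, so treat the count as a rough indicator.

```python
import re

# Assumption: a "pair" is word characters on both sides of "::".
# Splunk's real tokenization may differ; this is only an approximation.
PAIR_RE = re.compile(r"(\w+)::(\w+)")


def find_double_colon_pairs(raw: str) -> list[tuple[str, str]]:
    """Return every (left, right) pair separated by '::' in the raw event."""
    return PAIR_RE.findall(raw)


# Hypothetical sample event, not taken from real Cloudflare logs.
sample = "GET /ad?x=foo::bar&y=baz::qux"
pairs = find_double_colon_pairs(sample)
print(len(pairs), pairs)
```

Running this over a sample of events gives a quick estimate of how many extra indexed fields each one might produce.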
And does the same behavior happen when events are not sent with Cribl?
From my testing so far: I didn't try escaping it, but I did try replacing it with a single colon, and the issue didn't appear. The issue was observed when sending via HF, UF, and Cribl.
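The single-colon replacement described above can be sketched roughly as follows. This is an illustration, not the exact change that was tested: the function name `collapse_double_colons` is hypothetical, and the regex collapses any run of two or more colons so that `:::` does not leave a `::` behind. Note that it alters the data, and it would also mangle values that legitimately contain `::`, such as compressed IPv6 addresses.

```python
import re

# Matches any run of two or more consecutive colons.
COLON_RUN_RE = re.compile(r":{2,}")


def collapse_double_colons(raw: str) -> str:
    """Replace every run of 2+ colons with a single colon."""
    return COLON_RUN_RE.sub(":", raw)


# Hypothetical sample event for illustration.
print(collapse_double_colons("x=foo::bar y=baz:::qux"))
```

In a pipeline, the same substitution would be applied to `_raw` before the event is forwarded to the indexer.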
Very late here, but I wanted to say this is seen in many proxy logs, as several advertisers use double colons. Splunk treats "::" as a field/value separator and indexes the data as such. This is a situation where it would be awesome if tools like Cribl would flag and fix the data automatically for us.