
Hi, I have gone through the Cribl University user and admin courses, and setting up the product is quite easy; however, I am constantly stuck trying to parse or extract strings of data into fields so that we can search or use the information.

For systems which send JSON or CSV it is very simple, but most of ours are embedded Linux or network devices which emit something like the following:

<183> 02/06/2025:00:24:20 GMT NYDC1VPX01-DMZ 0-PPE-0 : default SSLLOG SSL_HANDSHAKE_SUCCESS 9838002 0 : SPCBId 33168208 - ClientIP 10.1.1.130 - ClientPort 57486 - VserverServiceIP 10.200.80.80 - VserverServicePort 443 - ClientVersion TLSv1.2 - CipherSuite "TLS1-AES-256-CBC-SHA" - Session New - HandshakeTime 47 ms

What combination of functions can split this out into fields such as hostname, ClientIP, ClientPort, etc.?

I have tried using ChatGPT to help, and I can get regex and Grok expressions from the AI, but these don't work when applied in Cribl. I don't see many questions on this topic, which makes me think it is very easy for everyone else and I am doing something wrong!

I'd maybe use Regex Extract for the first bit, up to SSL_HANDSHAKE_SUCCESS (captured as eventtype):

^\<(?<pri>\d+)\>\s*(?<time>\S+\s[A-Z]+)\s(?<host>\S+)\s\S+[\s:]+default\s(?<logtype>\S+)\s(?<eventtype>\S+).*?-\s(?<therest>ClientIP.*)

So you'll get pri, time, host, logtype and eventtype, plus the rest of the log in therest.
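If it helps to sanity-check that pattern outside of Stream: Cribl's Regex Extract uses JavaScript-style regex (Stream runs on Node), so the same expression should behave the same in plain Node. A quick sketch against the sample event from the question:

```javascript
// Sanity-check the "header" regex against the sample log line.
const raw = '<183> 02/06/2025:00:24:20 GMT NYDC1VPX01-DMZ 0-PPE-0 : default SSLLOG SSL_HANDSHAKE_SUCCESS 9838002 0 : SPCBId 33168208 - ClientIP 10.1.1.130 - ClientPort 57486 - VserverServiceIP 10.200.80.80 - VserverServicePort 443 - ClientVersion TLSv1.2 - CipherSuite "TLS1-AES-256-CBC-SHA" - Session New - HandshakeTime 47 ms';

const headerRe = /^\<(?<pri>\d+)\>\s*(?<time>\S+\s[A-Z]+)\s(?<host>\S+)\s\S+[\s:]+default\s(?<logtype>\S+)\s(?<eventtype>\S+).*?-\s(?<therest>ClientIP.*)/;

const m = raw.match(headerRe);
console.log(m.groups.time);      // 02/06/2025:00:24:20 GMT
console.log(m.groups.host);      // NYDC1VPX01-DMZ
console.log(m.groups.eventtype); // SSL_HANDSHAKE_SUCCESS
```

If the match works here but not in Stream, the problem is usually in where the regex is applied (source field, escaping in the UI) rather than the pattern itself.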

Now add a new Regex Extract with the Source field set to therest:

(?<_KEY_0>\w+)\s(?<_VALUE_0>[^\-]+)

Set the g flag on this regex. You should get all the fields extracted from the remainder of the log.
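To see what the repeated match produces, here's roughly the same extraction in plain JavaScript (matchAll needs the g flag, much like the repeated Regex Extract). One caveat worth knowing: because the value pattern is [^\-]+, a value that itself contains hyphens, like the quoted CipherSuite, gets cut short at its first hyphen:

```javascript
// Approximation of the second Regex Extract: Cribl pairs each _KEY_N
// capture with the matching _VALUE_N capture to create event fields.
const therest = 'ClientIP 10.1.1.130 - ClientPort 57486 - VserverServiceIP 10.200.80.80 - VserverServicePort 443 - ClientVersion TLSv1.2 - CipherSuite "TLS1-AES-256-CBC-SHA" - Session New - HandshakeTime 47 ms';

const kvRe = /(?<key>\w+)\s(?<value>[^\-]+)/g; // g flag = keep matching
const fields = {};
for (const m of therest.matchAll(kvRe)) {
  // each value keeps a trailing space before the " - " separator, so trim
  fields[m.groups.key] = m.groups.value.trim();
}

console.log(fields.ClientIP);      // 10.1.1.130
console.log(fields.HandshakeTime); // 47 ms
console.log(fields.CipherSuite);   // "TLS1   <- truncated at the hyphen
```

In Stream you could tidy this up afterwards, e.g. trim trailing spaces with an Eval, or tighten the value pattern if the truncated CipherSuite matters to you.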


Try this pipeline:

{
  "id": "curious-syslog",
  "conf": {
    "output": "default",
    "streamtags": [],
    "groups": {},
    "asyncFuncTimeout": 1000,
    "functions": [
      {
        "filter": "true",
        "conf": {
          "comment": "Extract the \"header\" of the log first"
        },
        "id": "comment"
      },
      {
        "filter": "true",
        "conf": {
          "source": "_raw",
          "iterations": 100,
          "overwrite": false,
          "regex": "/^\\<(?<pri>\\d+)\\>\\s*(?<time>\\S+\\s[A-Z]+)\\s(?<host>\\S+)\\s\\S+[\\s:]+default\\s(?<logtype>\\S+)\\s(?<eventtype>\\S+).*?-\\s(?<therest>ClientIP.*)/"
        },
        "id": "regex_extract"
      },
      {
        "filter": "true",
        "conf": {
          "comment": "Now extract the key value pairs from the rest"
        },
        "id": "comment"
      },
      {
        "filter": "true",
        "conf": {
          "source": "therest",
          "iterations": 100,
          "overwrite": false,
          "regex": "/(?<_KEY_0>\\w+)\\s(?<_VALUE_0>[^\\-]+)/"
        },
        "id": "regex_extract"
      },
      {
        "filter": "true",
        "conf": {
          "comment": "Rebuild into raw json payload (optional)"
        },
        "id": "comment"
      },
      {
        "filter": "true",
        "conf": {
          "type": "json",
          "dstField": "_raw",
          "fields": [
            "!cribl",
            "!_*",
            "!source",
            "!therest",
            "*"
          ]
        },
        "id": "serialize"
      },
      {
        "filter": "true",
        "conf": {
          "keep": [
            "_*",
            "cribl*"
          ],
          "remove": [
            "*"
          ]
        },
        "id": "eval"
      }
    ]
  }
}
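For anyone puzzling over the last two functions: here's a rough JavaScript sketch (not Cribl code, and the real functions have more options) of what the Serialize and Eval steps do to an event once the fields are extracted:

```javascript
// Sketch of the tail of the pipeline, using a cut-down sample event.
// Serialize: rebuild _raw as a JSON payload from the extracted fields,
// skipping _*, cribl*, source and therest per the "fields" list.
// Eval: then keep only _* and cribl* fields, removing the rest.
const event = {
  _time: 1738801460,
  host: 'NYDC1VPX01-DMZ',
  eventtype: 'SSL_HANDSHAKE_SUCCESS',
  ClientIP: '10.1.1.130',
  therest: 'ClientIP 10.1.1.130 - ClientPort 57486 ...',
};

const skip = (k) =>
  k.startsWith('_') || k.startsWith('cribl') || k === 'source' || k === 'therest';

// Serialize step: _raw becomes a JSON object of the business fields
event._raw = JSON.stringify(
  Object.fromEntries(Object.entries(event).filter(([k]) => !skip(k)))
);

// Eval step: remove "*", keep "_*" and "cribl*"
for (const k of Object.keys(event)) {
  if (!k.startsWith('_') && !k.startsWith('cribl')) delete event[k];
}

console.log(event._raw);
// {"host":"NYDC1VPX01-DMZ","eventtype":"SSL_HANDSHAKE_SUCCESS","ClientIP":"10.1.1.130"}
```

The net effect is that the downstream destination receives a clean JSON _raw instead of the original unstructured line, with the scratch fields dropped.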

Wow, did you do that off the top of your head, or are there some tools which can help create this?


The pipeline was created in Cribl Stream, and then just exported.

The regex… I've had a 30+ year relationship with regex. 🙂 I dream in regex.

