Deep Dive: Medium-Scale Self-Service with Worker Groups

November 19, 2025

kprior
Employee

While we generally recommend the workspace approach for self-service setup, the same results can be achieved by leveraging a development worker group. Think about it: a specific worker group that you provide access to for your data owners to develop their pipelines. As long as you set up RBAC appropriately, leverage packs and git, and have some guidelines for your developers, this will also work as a contained environment to provide self-service options to your log owners.

Keep Control

Same as the large-scale approach: Cribl administrators get final say on what gets put into production. Leverage change control systems (Jira, ServiceNow, etc.) to track what onboarding is underway, what items within the pack must be updated for production readiness, who is responsible for the pack, and why changes are occurring in the environment. You can automate a lot of these tasks by leveraging forms in ServiceNow, Jira, or Azure; creating a templated change control approach will cut down on mistakes or missed information during the onboarding process.
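To make the templated approach a bit more concrete, here is a minimal sketch of what a ticket template could capture. The class name, field names, and sample values are all hypothetical; they are not tied to any Jira or ServiceNow schema, so adapt them to whatever your change control system actually uses.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class OnboardingTicket:
    """Hypothetical change-ticket template; every field name is illustrative."""
    pack_id: str = ""             # which onboarding is underway
    pack_owner: str = ""          # who is responsible for the pack
    reason_for_change: str = ""   # why the change is occurring
    git_repository: str = ""      # where the pack is tracked
    production_readiness_items: List[str] = field(default_factory=list)


def missing_fields(ticket: OnboardingTicket) -> List[str]:
    """Return the names of template fields that are still empty."""
    return [f.name for f in fields(ticket) if not getattr(ticket, f.name)]


# Example: a ticket that was opened with only the basics filled in.
ticket = OnboardingTicket(pack_id="shine_stream_ingest", pack_owner="app-dev-team")
print(missing_fields(ticket))
```

A check like this could run when the ticket is routed to the Cribl administrators, so nothing reaches their queue with missing information.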

Make Use of Git

Again the same as the large-scale approach: the best way to operationalize this effort is to leverage Git to move packs throughout your environment while having easy tracking of changes that occur within each pack. For simplicity’s sake, have a single Git repository for each pack that is created. Have teams write to Git as packs are created and managed (coming soon within the product!). So, first thing to do: get you a Git repository for your pack. I would recommend both a dev and prod (or main) branch here so that any changes made during production readiness are recorded appropriately.

Create a Development Worker Group

The first step here is to create a worker group for your development environment. If you have a Cribl.Cloud environment, this is exceptionally easy. You can spin up workers as you need them and make sure that you’re only paying for the resources that you’re using. If you are on-premises, you can do the same using a cloud provider.

Creating the Worker Group

This is pretty easy. Follow the instructions available in the documentation to create a worker group. I recommend making this a very small size and using virtual machines that you can destroy when not in use, as mentioned above. Name the worker group something that indicates the development environment.

Set up RBAC for the Worker Group

To properly segregate the development worker group from the rest of your environment, you need to pay close attention to the permissions you assign here. Determine if you want your log owners to be able to commit and deploy changes for the WG or just commit them. If you’re nervous about making sure that the permissions are assigned properly, start with basic Editor permissions for the WG and move up. It’s always easier to add permissions than to take them away.

The easiest way to apply your permissions properly is to leverage Teams. So, as an administrator, create a Team for your developers by doing the following:

Navigate to Organization -> Members & Teams
Select Teams
Select Create Team.
Name your team something that signifies its permission level (e.g., Development WG Admins).
Allocate the proper Workspace access. This should be the workspace in which the Development WG is contained. Member is the level of access required.
For Product Permissions, choose User level access for Stream and leave the rest at No Access.
Now, Navigate to Stream -> Worker Groups.
Select the 3 dots at the end of the WG you created for Development. Select Members and Teams.
Select the Team you created above, and assign it the appropriate permissions level (Editor or Admin).

Operationalize Access to the Worker Group

Awesome, so now you have your Worker Group set up as well as your permissions properly assigned. Now you just need an easy way to get your future developers access to the environment. If you already have SSO set up, I recommend creating a group in your identity provider just for this access. Then, you can map that group to the Team you created above. This will minimize the number of changes you will have to make within the Cribl platform. Instead, users will be able to request to be added to or removed from that identity group through your identity team, and the permissions will apply automatically through SSO.

Leverage Git and Packs for Version Control

This part is very similar to the large-scale deployment as well. The only real difference in this part of the process is that you’re doing it in an isolated worker group rather than an isolated workspace.

Create Pack

So, now we have an empty Git repository and a Cribl worker group to develop in that is relatively isolated from the production environment. The next step is to create the pack. To do this, simply go to Stream -> choose your Development Worker Group -> Processing -> Packs. Then pick Create Pack. Fill in as much of this information as you can. Make sure you adhere to the standards outlined in the data governance for your organization. For example, make sure you are following a naming scheme, tag sources and destinations, add versioning, etc.

Ingest Data

Once the barebones pack is created, we need to click into it and create our source. Where are you ingesting data from? Let’s say it’s from a custom application that has a REST API. Well, thankfully, in Cribl 4.14.0, we can now add REST collectors in packs, so let’s do that. Navigate to the Worker Group -> Data -> Sources -> REST API, and configure it in accordance with your data governance rules. Be sure you’re actually getting data in before moving on.

If, for some reason, you do not want to or cannot set up sources in this pack and you still need data for testing, here are some other ways to get sample data into the disconnected environment:

If the data is already flowing to production, copy a Capture from the production worker group over to the development worker group
- In Production: Edit Sample, Select All, Copy
- In Dev: Import Sample, Paste events
If the data is already flowing to production, replay the data to the development worker group.

Build Pipelines

Awesome, data is connected to Cribl, and we can see that data coming in. Now we need to determine what to do with that data. Making application owners responsible for this step has a few benefits: the people building the pipeline actually understand the data that is coming in, they are the first to know about changes to the platform (we’ve ALL been hit with an upgrade that changed the logging format), and the load on your Cribl administration team drops immensely.

Continuing the custom application example from the Ingest Data step: as the application owner, you know that you require the fields timestamp, user, and message. Nothing else from these logs really matters, so you build a pipeline that parses the data, keeps those fields, and reserializes them into KV pairs. Now, when you send it to your SIEM, it will take up far less space and reduce your license usage, which is what security cares about. Be sure to test the data as you’re working on the pipeline to make sure it is parsing as expected.
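If it helps to picture what that pipeline does to a single event, here is a rough sketch in plain Python. This is not Cribl pipeline configuration, just an illustration of the parse, keep, and reserialize steps. It assumes the REST API returns JSON events; the sample event and its extra fields (session_id, debug) are made up for the example.

```python
import json

# Fields the application owner actually needs downstream.
KEEP_FIELDS = ("timestamp", "user", "message")


def to_kv(raw_event: str) -> str:
    """Parse a JSON event, keep the required fields, and emit key="value" pairs."""
    parsed = json.loads(raw_event)
    pairs = []
    for name in KEEP_FIELDS:
        if name in parsed:
            # Escape embedded double quotes so the KV output stays parseable.
            value = str(parsed[name]).replace('"', '\\"')
            pairs.append(f'{name}="{value}"')
    return " ".join(pairs)


# Hypothetical event as it might arrive from the application's API.
sample = json.dumps({
    "timestamp": "2025-11-19T12:00:00Z",
    "user": "alice",
    "message": "login ok",
    "session_id": "abc123",
    "debug": {"trace": [1, 2, 3]},
})

print(to_kv(sample))
print(len(sample), "->", len(to_kv(sample)), "characters")
```

The second print compares the event size before and after, which is the same sanity check you would do by eye when previewing the pipeline against a sample capture.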

Set Up Destination

Okay, so we have our source and pipeline set up in our pack. Next, we need to set up the destination. This part is optional. If needed, the Cribl Administration team could take care of this section of the pack. If the responsibility matrix established during the data governance phase indicates that application/log owners are responsible for designing the data from source to destination, then they should continue setting it up here. Data governance may have also decided that the application/log owners are only responsible for getting data into Cribl and in the right format for the destination. Either of these approaches is acceptable - it just depends on what is best for your organization.

If the destination is meant to be set up within the pack, however, now is the time to do that and test it. Be sure there is an indicator somewhere in the data that demonstrates that it is a development environment, and this is not production data. Document that indication in the change ticket so that whoever is promoting the pack into production knows what needs to be updated for production readiness.
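One way to make that documentation step less error-prone is to search the exported pack for the development indicator and paste the results into the ticket. Below is a minimal sketch, assuming the indicator is a literal string that appears in the pack's configuration files. The marker value "env=dev" and the function name are hypothetical; use whatever indicator your team agreed on.

```python
from pathlib import Path


def find_dev_markers(pack_dir: str, marker: str = "env=dev") -> list:
    """Return (relative_path, line_number) for each line containing the marker.

    The default marker is a made-up example, not a Cribl convention.
    """
    hits = []
    for path in sorted(Path(pack_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files such as images
        for line_no, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((str(path.relative_to(pack_dir)), line_no))
    return hits
```

Running this against the unpacked pack folder gives whoever promotes the pack a list of exactly which files and lines need attention.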

Create Route

Just like any other time that we’re ingesting data, we need to put all the pieces together. Make sure to set up the route in your pack to get data from your configured source, through your new pipeline, into your destination.

Export Pack to Git

Follow these steps to get your newly made pack into Git. You should already have a Git repository created and ready to be used for this pack. We recommend creating a dev branch as well.

Clone down your Git repository to your local machine and switch to the dev branch.
Double-check that all of the configurations created for this integration (source, route, pipeline, destination, etc.) are properly included within the Pack that you’re working on. You can do this by navigating to Processing -> Packs -> your new pack and reviewing the configurations.
Once you have confirmed the above, go back to the Packs page and export the pack by clicking on the 3 dots on the right-hand side of the screen next to the pack information and choosing Export.
Unzip the .crbl file that you have downloaded, and copy the contents of the pack folder to your Git repository folder. You want the base folder structure in Git to be the contents of the pack itself; it should contain things like package.json, README.md, and the data and default folders.
Add, Commit, and Push this configuration to Git.
Update the change control ticket to indicate development is complete and route it to the Cribl administrator team.
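If you end up doing the unzip-and-copy step often, it can be scripted. The sketch below assumes the exported .crbl file is a tar archive (typically gzip-compressed) with the pack contents at the archive root; confirm that against your own export before relying on it. The function name and paths are hypothetical.

```python
import tarfile
from pathlib import Path


def unpack_pack(crbl_path: str, repo_dir: str) -> list:
    """Extract an exported pack into a cloned repository folder.

    Assumes the .crbl file is a tar archive; "r:*" lets tarfile detect
    the compression on its own.
    """
    repo = Path(repo_dir).resolve()
    with tarfile.open(crbl_path, "r:*") as archive:
        for member in archive.getmembers():
            target = (repo / member.name).resolve()
            # Refuse entries that would land outside the repository folder.
            if target != repo and repo not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(repo)
    # The pack contents should sit at the base of the repository.
    if not (repo / "package.json").is_file():
        raise FileNotFoundError("package.json is not at the repository root")
    return sorted(p.name for p in repo.iterdir())
```

Calling unpack_pack on your export and your cloned dev branch folder returns the top-level entries now in the repository, which is a quick way to confirm the base folder structure before you add, commit, and push. If your export wraps everything in an extra folder, the package.json check will fail and you will need to move the contents up a level.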

Promote Code to Production

At this point, your new pack is in Git on the dev branch. The change control ticket is in the queue for Cribl Admins. What’s next? The Cribl Administrators should now review the configuration either via Git or via the Development Worker Group.

If the Cribl Administrator team has any feedback about the configuration or changes that need to be made, then the pack should be returned to the dev team for further updates, and the change ticket should be routed back to them.

If the Cribl Administrator team is happy with the current version, then they should promote the Git configuration to the prod/main branch and import it into the production environment.
- Ensure that tags are updated to reflect the production environment
- Commit and deploy the changes
- Test the route and pipeline
- Close the change ticket as complete

Example Scenario

The application development team needs to onboard their Shine logs into Cribl. They’ve developed an API in their application just to do this. The Cribl administration team has a Worker Group set up just for such a scenario, and they have an SSO group set up to allow Admin access to this Worker Group as needed (Cribl administrators didn’t want to have to deploy the changes). They have also created a dataset in Cribl Lake for the development team to send data to. The application development team has been through the Cribl User training and has a decent understanding of the environment.

Change control process is kicked off; a Jira ticket is created and assigned to the application development team.
Application development team requests to be added to the identity group that is tied to the Development Worker Group.
Application development team creates a Git repository called shine_stream_ingest; they update the ticket with this information.
Application development team navigates to Cribl, accesses the Development Worker Group, and creates a pack with the pack ID of shine_stream_ingest and the name Shine Stream Ingest. They also fill out the details of the pack, like the README and tags, in accordance with the organization’s data governance conditions.
Application development team creates a REST API collector Source, a pipeline to parse their data efficiently, a Cribl Lake Destination, and a route that pieces these things together.
Application development team commits and deploys their changes.
Application development team tests their data ingest from end to end within the Development Worker Group.
Application development team ensures that everything is added to the pack, saves it, and exports it. Then they import it to a dev branch in Git and notify the Cribl Administration team by moving the change control ticket into their queue.
Cribl Administration team reviews the pack. They import it into the appropriate production worker group and test it. They deem it safe and efficient. They update the tags to reflect production, save the changes, commit and deploy the changes, and export it to the main branch in Git.
Cribl Administration team updates the change ticket with the new Git information and closes the change as complete.