
Cribl–Cortex XSIAM Data Source Onboarding Guide

  • April 10, 2026

Jim Apger

 

This guide outlines a practical, end-to-end process for onboarding third‑party data sources into Cortex XSIAM using Cribl Stream, and clarifies how datasets, parsers, and data models (XDM) work together.

 


 

High-Level Onboarding Flow

  1. Start from the XSIAM perspective

  • Get certified on XSIAM, and get certified on Cribl Stream (it’s free!).  
  • Conduct a data source assessment to build a solid understanding of which data sources XSIAM supports out of the box.  Everything regarding ingest revolves around XSIAM, regardless of whether XSIAM connectors or Cribl Stream is used to route data into XSIAM.  
  • Identify the expected event format and whether an XSIAM data model (XDM) exists for the target data source.
  • Everything else flows from this: if the data model expects a specific format or field set, that is what Cribl Stream must emit.  Generally speaking, XSIAM expects events to look exactly as they did when produced by the source.  If no Content Pack in the Palo Alto XSIAM Marketplace contains a data model for the source, a data model must be created in XSIAM.
     
  2. Format and tag events correctly in Cribl Stream

  • Ensure events are properly event-broken and formatted to match the XSIAM expectation.
  • Set the following fields in Cribl Stream for all events that will go to XSIAM:
    • __sourceIdentifier
    • __vendor
    • __product
  • These are mapped by the XSIAM destination tile into the HTTP headers XSIAM uses for parsing and data model mapping.
     
  3. Send data to Cortex XSIAM

  • Configure the XSIAM destination in Cribl Stream and route the relevant sources to it.
  • Validate that events are arriving in XSIAM using the Investigator (XQL search).
     
  4. Confirm dataset and data model behavior

  • Data is first written to Cortex Data Lake (the underlying data store), where it is searchable in the XSIAM Investigator.
  • For XSIAM analytics to apply, the dataset must be mapped to an XSIAM data model (XDM). This mapping is what exposes fields like XDM.* for analytics, correlation, and stitching.
  • Generally, it does not matter whether you use:
    • A standard XSIAM connector / Broker VM, or
    • The Cribl XSIAM connector
      as long as the events arrive in the correct format with the required identifiers.
       
  5. Verify XDM mapping (when a data model exists)

  • In the XSIAM data model rules, look for a match on:
    • dataset = <vendor>_<product>_raw
  • If the dataset isn’t present:
    • Install the appropriate content pack from the XSIAM Marketplace, or
    • Build your own data model rules for mapping into XDM.
  • If no data model is available and you do not build one, data remains fully searchable in the XSIAM Investigator; it just won’t be normalized into XDM for analytics.
  • Use a query such as:

datamodel dataset = <vendor>_<product>_raw

to verify that fields beginning with XDM. are populated when a data model exists.
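As a quick sanity check before running that query, you can derive the dataset name you should be looking for from the vendor/product values set in Cribl Stream. The sketch below assumes XSIAM lowercases those values and collapses non-alphanumeric characters into underscores when forming `<vendor>_<product>_raw` — verify the exact names against the datasets actually present in your tenant:

```python
import re

def expected_dataset(vendor: str, product: str) -> str:
    """Derive the dataset name XSIAM is expected to assign on a parser match.

    Assumption: vendor/product are lowercased and runs of non-alphanumeric
    characters become underscores -- confirm against your tenant's datasets.
    """
    def norm(s: str) -> str:
        return re.sub(r"[^a-z0-9]+", "_", s.lower()).strip("_")
    return f"{norm(vendor)}_{norm(product)}_raw"

print(expected_dataset("Palo Alto Networks", "NGFW"))
# palo_alto_networks_ngfw_raw
```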

 


 

Current / In-Progress Enhancements (FY26)

These items are being worked on jointly to improve the Cribl + XSIAM experience:

  • UUID-aligned sample events
    • Representative sample events for key data sources are being added into XSIAM, each using a unique data source UUID, to simplify testing and validation for partners.
  • Windows logs via Cribl Edge
    • Support for ingesting Windows event logs sourced from Cribl Edge is planned to begin rolling out in late March 2026.
  • Regional enablement
    • Palo Alto Networks is running regional enablement sessions focused on XSIAM + Cribl onboarding patterns and best practices.
  • Focused XSIAM Support routing
    • Palo Alto is putting in place a Technical Support routing model that directs inbound questions related to XSIAM data onboarding to a specialized team, to reduce inconsistent guidance and customer frustration.

 


 

How XSIAM Ingests and Normalizes Events

 

3.1 XSIAM ingest sequence

  1. Raw log ingested

  • Events arrive via a collector, Broker VM, or the Cribl integration endpoint.
  2. Parsing and dataset assignment

  • XSIAM’s parsing pipeline attempts to match an existing parser.
  • On a successful parser match, XSIAM sets:

dataset = "<vendor>_<product>_raw"

  • If no parser matches but the payload is in a structured or semi-structured format (JSON, CEF, LEEF, Syslog with key/value fields), XSIAM still parses KV pairs into fields when writing to Cortex Data Lake.
  • If no parser matches and the payload is unstructured text, XSIAM primarily writes:
    • _time
    • _raw or _raw_log (depending on source type; see Section 5)
  3. Write to Cortex Data Lake

  • Parsed and/or raw fields are written to the Cortex Data Lake.
  • At this stage, data models are not yet involved; you are just seeing raw + parsed fields.
  4. Schema-at-read normalization

  • XSIAM uses a schema-at-read approach (similar in spirit to Splunk ES):
    • XDR/XSIAM normalize events into the Cortex Data Model (XDM) at read time.
    • Analytics, stitching, and high-level use cases rely on this XDM layer.
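The schema-at-read idea in the sequence above can be made concrete with a toy sketch: events are stored as-is, and field mappings are evaluated only when a query reads them. The `xdm.*` field names and the rule table below are illustrative stand-ins, not the real XDM schema:

```python
# Toy illustration of schema-at-read: storage holds raw/parsed fields
# untouched; XDM-style mappings are applied only at query time.
stored_events = [
    {"dataset": "acme_fw_raw", "src": "10.0.0.5", "act": "deny"},
]

# Hypothetical mapping rules from raw fields to XDM-style fields.
xdm_rules = {
    "acme_fw_raw": {
        "xdm.source.ipv4": "src",
        "xdm.event.outcome": "act",
    },
}

def read_with_xdm(event: dict) -> dict:
    """Apply the dataset's mapping at read time; stored data is unchanged."""
    rules = xdm_rules.get(event["dataset"], {})
    view = dict(event)  # raw fields remain queryable alongside XDM fields
    for xdm_field, raw_field in rules.items():
        view[xdm_field] = event[raw_field]
    return view

print(read_with_xdm(stored_events[0])["xdm.source.ipv4"])  # 10.0.0.5
```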

 

 


 

Configuring the Cribl XSIAM Destination

4.1 XSIAM destination behavior

  • Cribl provides a dedicated Cortex XSIAM destination and a supporting XSIAM Pack.
  • The destination:
    • Validates and converts required fields into HTTP headers expected by XSIAM.
    • Handles batching, rate limiting, and JSON serialization for ingestion.
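As a rough mental model of the batching and serialization behavior (not Cribl's actual implementation), events are accumulated, JSON-serialized, and flushed in bounded batches:

```python
import json

def batch_events(events, max_batch=3):
    """Yield newline-delimited JSON payloads of at most max_batch events.

    Illustrative only: the real XSIAM destination also handles rate
    limiting, retries, compression, and header construction.
    """
    for i in range(0, len(events), max_batch):
        chunk = events[i:i + max_batch]
        yield "\n".join(json.dumps(e) for e in chunk)

payloads = list(batch_events([{"_raw": f"event {n}"} for n in range(5)]))
print(len(payloads))  # 2 payloads: one of 3 events, one of 2
```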

For full details, see:

4.2 Required identifiers and headers

Data is sent from Cribl Stream to XSIAM with key identifiers in HTTP headers. In Cribl:

  • __sourceIdentifier
    • Required for all events.
    • The XSIAM destination maps this to the Source-Identifier header.
  • __inputId
    • Required for all events and automatically set by Cribl Stream.  Do not accidentally remove this field.
    • The XSIAM destination maps this to the Integration-Identifier header.
  • __vendor
    • Mapped to the vendor HTTP header.
  • __product
    • Mapped to the product HTTP header.
  • format
    • Automatically set by the XSIAM destination tile, usually to "raw" or "JSON".

These headers tell XSIAM:

  • Which integration / connector the events belong to.
  • Which vendor/product combo they represent.
  • How to route the events to the proper parsers, datasets, and (where available) data models.
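Conceptually, the destination lifts these internal fields into HTTP headers along the following lines. This is a sketch of the mapping described above, not Cribl's source code; the real destination also validates values and adds authentication headers:

```python
def build_xsiam_headers(event: dict) -> dict:
    """Map Cribl internal fields to the HTTP headers described above.

    Sketch only -- validation and auth headers are omitted.
    """
    required = ("__sourceIdentifier", "__inputId", "__vendor", "__product")
    missing = [f for f in required if f not in event]
    if missing:
        raise ValueError(f"event missing required fields: {missing}")
    return {
        "Source-Identifier": event["__sourceIdentifier"],
        "Integration-Identifier": event["__inputId"],
        "vendor": event["__vendor"],
        "product": event["__product"],
    }

headers = build_xsiam_headers({
    "__sourceIdentifier": "my-source",
    "__inputId": "in_syslog",      # set automatically by Cribl Stream
    "__vendor": "acme",
    "__product": "fw",
})
print(headers["vendor"])  # acme
```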

4.3 UUIDs and black-box analytics

  • Some data sources in XSIAM are associated with data source UUIDs that enable advanced, black-box analytics (e.g., baselining, ML models, and specialized stitching) that are applied in parallel to what is written into Cortex Data Lake.
  • When you send events from Cribl Stream:
    • If a UUID is known and documented for the data source, set that UUID in the Cribl XSIAM destination.
    • XSIAM can then:
      • Apply existing parsers,
      • Leverage existing data models, and
      • Trigger any analytics models tied to that UUID.

Reference:

4.4 Configuration patterns

  1. Data sources with an assigned UUID

  • Look up the UUID in the official XSIAM documentation (link above).
  • In the Cribl XSIAM destination:
    • Set the UUID so XSIAM recognizes the source.
    • Ensure __vendor and __product match what XSIAM expects for that UUID.
  • This enables direct mapping into known parsers and data models.
  2. Data sources without an assigned UUID

  • Locate the relevant XSIAM content pack in the Marketplace.
  • In the pack’s parser and/or data model rules, identify:
    • The expected __vendor
    • The expected __product
  • Configure Cribl Stream to:
    • Set __vendor and __product to those values.
    • Set a generic but consistent __sourceIdentifier value (examples are provided in the Cribl XSIAM Pack); for example: af01292940d7426594d3d3e55ae17ee0.
  • XSIAM will use these headers to:
    • Route events into the correct dataset and
    • Apply any available parsing/data modeling for that pack.
  3. First-party Palo Alto Networks data

  • Recommendation: Do not route Palo Alto first-party data (e.g., NGFW, Cortex XDR native telemetry) through Cribl if you intend to rely on all out-of-the-box advanced analytics.
  • When first-party data is sent directly into Cortex Data Lake / Strata Logging Service:
    • It is enriched and processed through data-source-specific analytics (baselining, ML) and
    • It participates fully in stitching, where XSIAM correlates related observations into high-value incidents and attack stories.
  • Sending this data through an intermediate processor may bypass or alter that enrichment, which can affect out-of-the-box analytics and stitching.
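For pattern 2 above, the Cribl-side work amounts to stamping each event with the identifiers the content pack expects — in Cribl Stream you would do this with an Eval function. A Python sketch of the equivalent transformation; the vendor/product values below are placeholders, and the real ones must come from the pack's parser and data model rules:

```python
# Sketch of pattern 2: stamp events with the identifiers the content
# pack expects. The vendor/product values are placeholders -- take the
# real ones from the pack's parsing rules in the XSIAM Marketplace.
PACK_IDENTIFIERS = {
    "__vendor": "example_vendor",     # placeholder
    "__product": "example_product",   # placeholder
    "__sourceIdentifier": "af01292940d7426594d3d3e55ae17ee0",
}

def stamp_event(event: dict) -> dict:
    """Return a copy of the event with the required XSIAM fields set."""
    return {**event, **PACK_IDENTIFIERS}

stamped = stamp_event({"_raw": "some syslog line"})
print(stamped["__vendor"])  # example_vendor
```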

 


 

Understanding _raw_log, _raw_json, and _raw in XSIAM

When troubleshooting or verifying ingest behavior, it is important to understand how XSIAM stores the original payload for different data paths.

5.1 Field breakdown

  • _raw_log
    • Most common for text-based / unstructured sources.
    • Contains the original log message as a plain text string, exactly as received.
    • Populated when data arrives via:
      • Syslog
      • CEF
      • LEEF
      • CSV
      • Other raw-text formats
    • When you write parsing logic that uses functions like regextract(), split(), or arrayindex() against the raw text, you’re typically operating on _raw_log.
  • _raw_json
    • Used when the ingested data arrives as structured JSON.
    • Typical for:
      • API-based integrations
      • HTTP collectors receiving JSON payloads
      • Cloud integrations (e.g., Azure Event Hub, AWS services)
      • Strata Logging Service forwarding JSON-formatted logs
    • XSIAM stores these events as JSON objects, so you can use:
      • json_extract_scalar()
      • Arrow notation (field->'child')
      • Other JSON-native operations
        without first converting from string.
  • _raw
    • More of a generic / internal field.
    • Common in datasets like:
      • xdr_data from XDR agent telemetry
      • Some native Palo Alto datasets
    • Represents XSIAM’s internal view of the original event.
    • In some contexts, it behaves like an alias or superset concept, but exact usage depends on the dataset schema.

5.2 Why one vs. another for a given event?

The deciding factors are:

  • Ingestion method
    • Broker VM, HTTP Collector, XDR Collector, etc.
  • Declared / detected log format
    • What you specify (or XSIAM auto-detects) during data source onboarding.

In general:

  • Syslog / CEF / LEEF / raw text
    _raw_log (string)
  • JSON via API / HTTP / cloud integrations
    _raw_json (JSON object)
  • Native XDR agent data
    _raw (internal representation)
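A toy classifier makes the distinction concrete: a payload that parses as a JSON object or array would land in _raw_json, anything else in _raw_log. This is a simplification — as noted above, XSIAM's real behavior also depends on the ingestion method and the declared log format:

```python
import json

def storage_field(payload: str) -> str:
    """Guess which raw field a payload would populate.

    Simplified sketch: real XSIAM format detection also considers the
    ingestion method and the declared format, not just payload shape.
    """
    try:
        parsed = json.loads(payload)
        # Only objects/arrays count as structured JSON here.
        if isinstance(parsed, (dict, list)):
            return "_raw_json"
    except ValueError:
        pass
    return "_raw_log"

print(storage_field('{"user": "alice"}'))          # _raw_json
print(storage_field("<14>Oct 11 host app: msg"))   # _raw_log
```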

 


 

Cribl–XSIAM Integration References

These resources provide deeper technical detail and end-to-end configuration examples:

 


 

By aligning the Cribl Stream output format, headers, and UUIDs with XSIAM’s documented expectations for each data source, joint customers and partners can:

  • Onboard third‑party data quickly and predictably.
  • Maintain searchability in XSIAM even when no data model exists.
  • Fully leverage XDM-based analytics and stitching where data models and UUID‑tied analytics are available.