Introduction
Exporting data from Splunk is a common necessity for various reasons, including migrating to a new platform, cleaning data, long-term archiving, or analytical processing outside of Splunk. While traditional methods have been effective, new approaches leveraging Splunk's Dynamic Data Self-Storage (DDSS) offer streamlined solutions, especially for migrating Splunk Cloud data.
Exporttool Use Cases
In practical applications, we've observed distinct use cases for each export approach. The traditional exporttool remains invaluable for compliance and data remediation scenarios in self-hosted environments. For instance, we recently saw a customer use exporttool after personally identifiable information (PII) was inadvertently indexed. Rather than incurring the massive overhead of dropping and re-indexing the entire bucket, they used exporttool to extract the raw data, processed it through an external stream processor such as Cribl Stream to sanitize or remove the PII fields, and then re-ingested the cleaned data. Conversely, for Splunk Cloud customers, the DDSS-centric method offers flexibility: once data is funneled into the DDSS export bucket via the re-indexing pipeline, it sits in accessible object storage where it can be cost-effectively queried, analyzed, and integrated into other downstream systems or compliance archives without further taxing the Splunk indexers.
Scaled Approaches to Data Export
Historically, users relied on several methods to get their data out of Splunk.
- Export via the UI/Search: For smaller, ad-hoc exports, running a search and using the "Export" button is straightforward. However, this is not scalable for large datasets.
- The exporttool utility: This command-line utility, used on the Splunk indexer, is the most robust native way to export raw data and is crucial for large-scale operations. It can be found in the Splunk bin directory.
- Splunk REST API: Programmatic access via the REST API allows for scheduled and automated exports, often used in conjunction with scripts or applications.
- Splunk collect command: The `collect` search command can send data from an existing index into an export index that has DDSS enabled.
For those interested in the details of the exporttool, the latest community repository can be found here: https://github.com/Exporttool/exporttool
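As a rough sketch of how exporttool is typically invoked (the bucket directory and output path below are illustrative; run the command on the indexer that owns the bucket):

```
# Export one bucket's raw events to CSV.
# exporttool operates on a single bucket directory at a time,
# so a full-index export loops over every db_* bucket directory.
$SPLUNK_HOME/bin/splunk cmd exporttool \
  $SPLUNK_HOME/var/lib/splunk/defaultdb/db/db_1389230491_1389230488_5 \
  /tmp/defaultdb_export.csv -csv
```

The community repository linked above wraps this per-bucket invocation with scripting for large-scale, multi-bucket exports.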
Migrating Splunk Cloud Data using DDSS
For Splunk Cloud customers, DDSS provides a powerful new mechanism for efficiently managing and exporting aged data. DDSS stores aged data in a cost-effective, external object storage (typically AWS S3 or equivalent) and can be used to dramatically simplify the process of migrating or exporting large volumes of cold data.
The key to this new approach is combining two elements:
- DDSS Setup: An existing index has DDSS configured, meaning data meeting the age criteria is automatically moved to the DDSS export bucket.
- Dynamic Data Active Archive (DDAA): This Splunk Cloud capability can be used to re-ingest or restore archived data into Splunk, with the ability to target a specific index.
Step-by-Step Migration Process
This process outlines how to migrate data from an old Splunk index (or instance) to a new Splunk instance, leveraging DDSS for the actual export mechanism.
1. Setup DDSS on the Target Index
On your target Splunk instance (the one you are migrating to), ensure you have an index setup with DDSS enabled. This index will be the initial landing zone for the restored data.
| Configuration Item | Value/Description |
|---|---|
| Index Name | migrated_data_hot |
| DDSS Enabled | Yes |
| DDSS Export Bucket | s3://your-ddss-export-bucket/<index_name> |
| DDSS Age Policy | A very short policy (e.g., 5 minutes) |
By setting a very short DDSS age policy (e.g., 5 minutes), any data written to the migrated_data_hot index will be almost instantly aged out into your DDSS export bucket. This means the data effectively passes through the Splunk indexer and immediately becomes a flat file export in object storage.
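The retention side of this setup can be sketched as an indexes.conf fragment. Note this is illustrative only: in Splunk Cloud, the index, its retention period, and the self-storage location are all configured through the Indexes UI rather than by editing configuration files, and actual freezing also depends on when buckets roll.

```
# indexes.conf sketch (illustrative; Splunk Cloud configures this via the UI)
[migrated_data_hot]
# Age data out almost immediately: 300 seconds = 5 minutes
frozenTimePeriodInSecs = 300
# The DDSS self-storage location selected in the UI corresponds to:
#   s3://your-ddss-export-bucket/migrated_data_hot
```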
2. Configure DDAA on the Source Instance (or an Interim Instance)
Dynamic Data Active Archive (DDAA) is used to restore archived data. In a Splunk Cloud migration scenario, you would typically use DDAA to restore the data you want to migrate.
- Source Data Identification: Use DDAA to initiate a restore of the data from your source index/instance.
- Target Index Mapping: When configuring the DDAA restore job, instruct it to restore the data to the newly created index on your target environment: migrated_data_hot.
3. Execution and Instant Export
When the DDAA restore job executes:
- DDAA restores the cold data from the source environment.
- The data is re-indexed into the migrated_data_hot index on the target system.
- Because the DDSS age policy is set to be very short, the newly re-indexed data almost immediately meets the aging criteria.
- Splunk's DDSS mechanism ages the data out of the migrated_data_hot index and writes it as raw data files to the configured DDSS Export Bucket.
This process effectively uses the Splunk indexer and DDSS configuration as a highly efficient pipeline to transform cold Splunk data into an object storage export, bypassing the need for manual, high-overhead export scripts.
4. Alternative Method: Using the `collect` Command for Targeted Export
While the DDAA method is ideal for bulk migration of cold data, an alternative approach using the `collect` search command allows for selective, real-time data export from any active Splunk index. This is particularly useful for exporting subsets of data defined by a specific search query.
Step-by-Step `collect` Export Process
This method uses the `collect` command to take data from a source search and write it to the DDSS-enabled index (e.g., `migrated_data_hot`), which then immediately ages the data out to object storage.
| Step | Action | Description |
|---|---|---|
| 1 | Source Data Search | Run a standard Splunk search to identify the specific data you wish to export. This could be based on time range, sourcetype, host, or any other Splunk field. |
| 2 | Apply collect Command | Pipe the results of your search into the collect command, specifying the DDSS-enabled index as the target. |
| 3 | Execution and Export | The data is indexed into the DDSS index, instantly meets the short age policy, and is written to the DDSS Export Bucket. |
Example Search Query
The following example demonstrates a search that targets all web server access logs from the last 24 hours and pipes them to the `migrated_data_hot` index for immediate export:
```
index="web_access" sourcetype="access_combined" earliest=-24h
| collect index=migrated_data_hot
```
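Before committing to a full export, it can be worth a dry run. The sketch below uses the `collect` command's `testmode` option, which returns the events as search results instead of writing them to the index, so you can verify the selection before anything is aged out to object storage:

```
index="web_access" sourcetype="access_combined" earliest=-24h
| collect index=migrated_data_hot testmode=true
```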
Advantages of the collect Method
- Granularity: Allows for precise, query-based selection of data for export.
- Active Data Export: Can be used to export data that is still hot or warm, unlike the DDAA method which primarily targets cold/archived data.
- Simplicity: Leverages standard Splunk Search Processing Language (SPL) and existing DDSS configuration.
Summary of Benefits
This DDSS-centric approach offers significant advantages for large migrations:
- Efficiency: Leveraging DDSS's native file handling for aged data is generally faster and more efficient than running custom export scripts.
- Scalability: It works seamlessly with Splunk Cloud's architecture and is built to handle massive data volumes.
- Clean Export: The final output is raw data stored in your own object storage, ready for other consumption or long-term archive.
- Flexibility: The export is controlled by the DDSS policy, allowing fine-tuning of what data is exported and when.
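Once the DDSS policy has aged data out, the export can be verified directly in object storage. A hedged sketch using the AWS CLI (the bucket name mirrors the hypothetical DDSS setup above; substitute your configured export bucket):

```
# List the raw data files DDSS has written for the export index
aws s3 ls s3://your-ddss-export-bucket/migrated_data_hot/ \
  --recursive --human-readable
```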
Important Note: For large data exports, it is advised to consult Splunk Professional Services; see the Splunk Cloud Platform Service Details, specifically the "Subscription expansions, renewals, and terminations" section.
Additional reference guides: