# Data pipelines

Data pipelines automate large-scale data replication by extracting, transforming, and loading data from source applications or file systems into destination data warehouses. Unlike standard recipes that process records individually or in small batches, pipelines sync multiple objects in parallel and operate at scale. This improves performance, reduces maintenance, and ensures consistent schema mapping across systems.

## Why use a data pipeline?

Standard recipes require separate workflows for each object and process records in small batches. This approach increases setup time, extends sync durations, and complicates failure recovery.

Data pipelines streamline replication by consolidating multiple object syncs into a single workflow. Pipelines begin with a full historical sync, then switch to incremental sync using Change Data Capture (CDC). This captures inserts, updates, and deletes automatically. Pipelines also detect schema changes and apply updates to keep destinations aligned.
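
To make the CDC idea concrete, the following is a minimal, illustrative sketch of incremental sync after the initial full historical sync. The `source`/`destination` objects, their methods, and the cursor handling are hypothetical placeholders, not the product's actual API.

```python
from datetime import datetime, timezone

# Hypothetical sketch of CDC-style incremental sync: after the full
# historical sync, only rows changed since the last saved cursor are
# extracted and applied to the destination.

def incremental_sync(source, destination, table, last_cursor):
    """Pull changes newer than last_cursor and apply them to the destination."""
    # Fetch inserts, updates, and deletes recorded after the previous run.
    changes = source.fetch_changes(table, since=last_cursor)

    for change in changes:
        if change["operation"] == "DELETE":
            destination.delete(table, change["primary_key"])
        else:  # INSERT or UPDATE
            destination.upsert(table, change["row"])

    # Advance the cursor so the next run starts where this one stopped.
    return datetime.now(timezone.utc)
```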

## Key benefits

Data pipelines provide the following capabilities:

  • Automated schema management: Detects schema changes and applies them to the destination (see the sketch after this list).
  • Optimized change tracking: Uses CDC to capture new, modified, and deleted records.
  • Reduced maintenance effort: Replaces multiple recipes with a single pipeline for simplified setup, monitoring, and error handling.
  • Improved observability: View schema changes, data volume, and errors through the Data Orchestration dashboard and pipeline run history.
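
As a rough illustration of automated schema management, the sketch below compares a source object's columns against the destination table and adds any that are missing. The helper names and the simple `ALTER TABLE` approach are assumptions for illustration only and do not reflect the product's internals.

```python
# Hypothetical sketch of schema-drift handling: detect columns that exist
# in the source but not in the destination, and add them so subsequent
# bulk loads keep succeeding after the source schema changes.

def apply_schema_changes(source_columns, destination, table):
    """source_columns maps column name to SQL type, e.g. {"email": "VARCHAR"}."""
    existing = set(destination.list_columns(table))

    for name, sql_type in source_columns.items():
        if name not in existing:
            # Add the new column; real systems also handle type changes and drops.
            destination.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
```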

## How data pipelines work

A data pipeline follows an extract, replicate, and load process to automate data movement:

```mermaid
graph LR
  A[Extract] --> B[Replicate]
  B --> C[Load]
  classDef default fill:#67eadd,stroke:#67eadd,stroke-width:2px,color:#000;
```

  • Extract: The trigger retrieves data from the source application, such as Salesforce.
  • Replicate: The pipeline replicates the schema and ensures compatibility with the destination.
  • Load: The load action transfers records in bulk to the destination, such as Snowflake.

The pipeline syncs data on a scheduled interval, executing the extract, replicate, and load process for every selected object: the trigger extracts data from the source, and the load action replicates the schema and transfers records to the destination in bulk.
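
The sketch below shows what one scheduled run of this three-stage flow could look like conceptually. The connector objects and method names are hypothetical placeholders, not the actual recipe or connector API.

```python
# Hypothetical sketch of one scheduled pipeline run across all selected objects.

def run_pipeline(source, destination, objects):
    for obj in objects:
        # Extract: the trigger pulls records from the source application.
        records = source.extract(obj)

        # Replicate: align the destination table with the source schema.
        destination.replicate_schema(obj, source.describe(obj))

        # Load: transfer the extracted records to the destination in bulk.
        destination.bulk_load(obj, records)

# Example: sync three Salesforce objects into Snowflake on each scheduled run.
# run_pipeline(salesforce, snowflake, ["Account", "Contact", "Opportunity"])
```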

## Get started with data pipelines

Refer to the data pipeline setup guides to configure a data pipeline recipe that syncs data between applications.

