# Data pipeline key concepts
Workato data pipelines extract, replicate, and sync data to keep datasets accurate and up to date. Pipelines connect source applications to destination data warehouses, move data in bulk, and preserve schema integrity. The following sections define the key concepts behind how data pipelines process and manage data.
## Source applications and destinations
Data pipelines extract data from a source application, such as Salesforce, and sync it to a destination, such as Snowflake. A single pipeline retrieves data from multiple objects or fields within the source application and replicates that data in the specified destination.
## Object syncs
A sync refers to the overall process in which the pipeline extracts data from the source and loads it into the destination. Each sync processes multiple objects in parallel and uses one of the following types:
- **Full sync**: Extracts all available records from the source and loads them into the destination. This ensures the destination table contains a complete snapshot of the source data at the time of sync.
- **Incremental sync**: Extracts only records that are new, updated, or deleted since the last successful sync. Scheduled syncs use this type after the initial full sync.
- **Re-sync**: A manual, one-time sync for a specific object. Re-syncs extract and load the current data for that object immediately. This type is useful when a single object must be synced again due to errors, skipped syncs, or changes in data.
A data pipeline starts with a full historical sync, which transfers all data from the source or from a specified date. After the initial sync completes, the pipeline switches to incremental syncs to capture new, updated, or deleted records. Re-syncs allow users to reprocess specific objects outside the normal schedule.
Refer to the Sync types and execution guide for more information.
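The difference between a full sync and an incremental sync comes down to whether the pipeline filters on a change cursor. The following is a minimal Python sketch of that distinction; the in-memory `SOURCE` list, the function names, and the `updated_at` cursor field are illustrative assumptions, not Workato's actual implementation.

```python
# Hypothetical in-memory source; a real pipeline would call the
# source application's API and load results into the warehouse.
SOURCE = [
    {"id": 1, "name": "Acme", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "name": "Globex", "updated_at": "2024-03-05T12:00:00Z"},
]

def full_sync(source):
    """Initial historical sync: extract every available record."""
    return list(source)

def incremental_sync(source, last_sync_at):
    """Scheduled sync: extract only records changed since the last
    successful sync, using a change cursor (here, an ISO timestamp)."""
    return [r for r in source if r["updated_at"] > last_sync_at]
```

With this sketch, a pipeline would call `full_sync` once, record the sync time, then pass that time to `incremental_sync` on each subsequent scheduled run.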
## Sync frequency
Data pipelines run on a scheduled cadence to keep source and destination systems in sync. The default frequency is 15 minutes. A 5-minute interval is available upon request and can be enabled on a per-customer basis.
Each scheduled run performs an incremental sync for all selected objects, capturing only the changes since the last successful sync. Choose a schedule that aligns with your operational requirements and system capacity.
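To make the cadence concrete, the sketch below computes upcoming run times for a fixed interval. The function name and signature are hypothetical; Workato manages scheduling internally.

```python
from datetime import datetime, timedelta

def next_runs(start, interval_minutes, count):
    """Return the next `count` scheduled sync times after `start`,
    spaced at a fixed cadence (e.g. the default 15 minutes)."""
    return [start + timedelta(minutes=interval_minutes * i)
            for i in range(1, count + 1)]
```

For example, a pipeline scheduled at the default 15-minute frequency starting at midnight would next run at 00:15, 00:30, and 00:45.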
## Pipeline runs
Each data pipeline sync consists of multiple runs, with one run per selected object. A sync represents the entire activity for all objects, while a run tracks the execution for a single object within that sync.
The pipeline executes runs in parallel to improve performance. Each run extracts data from the source and loads it into the destination. Run-level data appears in the Runs tab, which helps you monitor pipeline execution. Refer to the Object runs section for more information.
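The sync/run relationship can be sketched as one worker per object executing in parallel. This is an illustrative Python sketch using a thread pool; the function names and the returned status fields are assumptions, not Workato's internals.

```python
from concurrent.futures import ThreadPoolExecutor

def run_object(obj_name):
    """Hypothetical per-object run: in a real pipeline this would
    extract the object's changed records and load them into the
    destination, then report the run's outcome."""
    return {"object": obj_name, "status": "succeeded"}

def execute_sync(objects):
    """One sync produces one run per selected object; runs execute
    in parallel to improve throughput."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_object, objects))
```

Calling `execute_sync(["Account", "Contact", "Lead"])` would produce three run records, one per object, mirroring what the Runs tab displays for a single sync.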
## Schema replication and schema drift management
Schema drift refers to inconsistencies between the source and destination schemas that occur when the structure of the source data changes. These changes may include added or deleted fields, modified field types, or other structural updates. Unmanaged schema drift can cause transformation errors, data loss, and inaccurate analysis.
Workato pipelines detect schema drift during syncs and apply schema changes based on your pipeline configuration. Use the Auto-sync new fields option to apply schema updates automatically, or use Block new fields to review and manage changes manually. You can configure this behavior during pipeline setup.
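Drift detection is conceptually a diff between the two schemas. The sketch below compares field-to-type mappings and reports added, removed, and changed fields; the function name and dictionary-based schema representation are illustrative assumptions.

```python
def detect_schema_drift(source_schema, dest_schema):
    """Compare source and destination schemas (field -> type maps)
    and report the three kinds of drift described above."""
    added = {f: t for f, t in source_schema.items() if f not in dest_schema}
    removed = [f for f in dest_schema if f not in source_schema]
    changed = {f: (dest_schema[f], t) for f, t in source_schema.items()
               if f in dest_schema and dest_schema[f] != t}
    return {"added": added, "removed": removed, "changed": changed}
```

Under an Auto-sync-style policy the pipeline would apply the `added` entries automatically, whereas a Block-new-fields-style policy would hold them for manual review.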
## Data masking
Data masking helps protect sensitive information by transforming values during sync. Pipelines offer two masking options at the field level:
- Replicate as is: Retains the original field values from the source and syncs them to the destination.
- Hash: Hashes field values to obscure sensitive data before writing to the destination. This option enables teams to replicate structure while protecting content.
Field-level masking applies per object column. You can configure masking behavior when you select and map fields during pipeline setup.
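The two masking options can be sketched as a per-value transform. This example uses SHA-256 as the hash for illustration; the function name, mode strings, and choice of hash algorithm are assumptions, as the document does not specify which algorithm Workato uses.

```python
import hashlib

def mask_field(value, mode):
    """Apply the configured field-level masking before loading
    the value into the destination."""
    if mode == "replicate":
        # Replicate as is: keep the original source value.
        return value
    if mode == "hash":
        # Hash: obscure the content while preserving structure,
        # so identical source values still match after masking.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise ValueError(f"unknown masking mode: {mode}")
```

Because the same input always hashes to the same digest, hashed columns remain usable as join keys even though the raw values are hidden.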
Last updated: 2/6/2026, 5:48:07 PM