# Configure Amazon S3 as your data pipeline source
Set up Amazon S3 as a data pipeline source to extract and sync records into your destination. This guide covers connection setup, pipeline configuration, and key behaviors for working with .csv files in S3 buckets.
# Features supported
The following features are supported when using Amazon S3 as a data pipeline source:
- Extract and sync data from .csv files stored in S3
- Support for full and incremental sync via file detection (see the sketch after this list)
- Field-level selection for object extraction
- Schema drift detection and handling
- Field-level data masking
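For context, the following is a minimal sketch of how incremental file detection typically works, using the AWS boto3 SDK with a hypothetical bucket, prefix, and checkpoint. It is an illustration of the concept, not Workato's implementation:

```python
from datetime import datetime, timezone

import boto3

# Hypothetical bucket, prefix, and last-run checkpoint; substitute your own.
BUCKET = "example-pipeline-bucket"
PREFIX = "exports/orders/"
last_sync = datetime(2025, 1, 1, tzinfo=timezone.utc)

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

# Incremental sync re-extracts only the .csv files added or changed since the
# previous run, based on each object's LastModified timestamp.
changed = [
    obj["Key"]
    for obj in resp.get("Contents", [])
    if obj["Key"].endswith(".csv") and obj["LastModified"] > last_sync
]
print(changed)
```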
# Prerequisites
You must have the following configuration and access:
- An AWS account with an S3 bucket that stores .csv files
- IAM role authentication configured for Workato
- Required S3 permissions for the pipeline to list and read files (see the policy sketch after this list)
- Folder paths and file patterns for the files to sync
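The required S3 permissions usually come down to listing the bucket and reading its objects. The following is a minimal policy sketch, assuming a hypothetical bucket named example-pipeline-bucket; your environment may require more:

```python
import json

# s3:ListBucket applies to the bucket ARN and lets the pipeline discover
# files; s3:GetObject applies to the objects and lets it read them.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::example-pipeline-bucket",
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-pipeline-bucket/*",
        },
    ],
}
print(json.dumps(policy, indent=2))
```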
# How to connect
Complete the following steps to connect to Amazon S3 as a data pipeline source. This connection allows the pipeline to read and extract .csv files from S3.
**Access key authentication deprecated:** Workato recommends using IAM role authentication. Access key authentication remains functional for existing connections, but Workato doesn't recommend it for new environments.
1. Select Create > Connection.
2. Search for and select Amazon S3 on the New connection page.
3. Enter a name in the Connection name field.
4. Use the Location drop-down menu to select the project where you plan to store the connection.
5. Select Cloud in the Connection type field, unless you need to connect through an on-prem group.
6. Select IAM role as the Authorization type.
7. Enter the ARN of the role configured in your AWS account in the IAM role ARN field. The IAM role must include permissions that grant Workato access to the buckets and folders required for pipeline extraction (see the verification sketch after these steps).
8. Optional. Enter a value in the Restrict to bucket field to limit the connection to a specific bucket. Use this setting when the IAM role has limited list permissions.
9. Enter the Region for the S3 bucket. For example, if your S3 console URL is https://us-west-1.console.aws.amazon.com/, enter us-west-1.
10. Optional. Enter a value in the Download threads field to increase file download concurrency. The default value is 1, and the maximum is 20.
11. Select Connect to verify and store the connection.
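Before you rely on the connection, you can verify the role yourself. The following is a minimal sketch using the AWS boto3 SDK with a hypothetical role ARN and bucket; it assumes the role (your own principal must be permitted to assume it for testing) and confirms the temporary credentials can list the bucket:

```python
import boto3

# Hypothetical values; substitute your own role ARN, bucket, and region.
ROLE_ARN = "arn:aws:iam::123456789012:role/workato-pipeline-reader"
BUCKET = "example-pipeline-bucket"

# Assume the role to obtain temporary credentials, as the pipeline would.
sts = boto3.client("sts")
creds = sts.assume_role(RoleArn=ROLE_ARN, RoleSessionName="connection-check")["Credentials"]

# Confirm the role's permissions actually allow listing the bucket.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
    region_name="us-west-1",
)
s3.list_objects_v2(Bucket=BUCKET, MaxKeys=1)
print("Role can list the bucket")
```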
# Configure the pipeline
Complete the following steps to configure Amazon S3 as your data pipeline source:
1. Select Create > Data pipeline.
2. Provide a Name for the data pipeline.
3. Use the Location drop-down menu to select the project where you plan to store the data pipeline.
4. Select Start building.
5. Click the Extract new/updated records from source app trigger. This trigger defines how the pipeline retrieves data from the source application.
6. Select Amazon S3 from Your Connected Source Apps.
7. Choose the Amazon S3 connection you plan to use for this pipeline. Alternatively, click + New connection to create a new connection.
8. Select the S3 bucket you plan to monitor in the Bucket field.
9. Click Add object to configure the files you want the pipeline to monitor and sync.
**Note:** Workato doesn't browse your S3 bucket. You must manually enter the folder path and file pattern. Ensure the path and filenames match your S3 structure.
10. Enter the folder within the bucket to monitor in the Source Folder path field. The pipeline supports .csv files only.
11. Define which files to fetch using a pattern in the Filename pattern field. Use wildcards such as orders_*.csv to include multiple files (see the matching sketch after these steps).
12. Click Fetch matching files to preview the files that match the defined pattern.
13. Select a Reference file to define the schema for the destination table.
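As a rough illustration of how wildcard patterns select files (Workato's exact matching semantics may differ), here is a sketch using Python's fnmatch module with hypothetical object keys:

```python
import fnmatch

# Hypothetical object keys in the monitored folder.
keys = [
    "exports/orders/orders_2024-01.csv",
    "exports/orders/orders_2024-02.csv",
    "exports/orders/returns_2024-01.csv",
]

# Shell-style matching: * matches any run of characters, so orders_*.csv
# picks up both monthly orders files and skips the returns file.
matches = [k for k in keys if fnmatch.fnmatch(k.rsplit("/", 1)[-1], "orders_*.csv")]
print(matches)
```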
14. Configure the CSV settings (see the parsing sketch after these steps):
    - Set whether the CSV includes a header in the Does CSV file include a header line? field.
    - Choose a delimiter in the Column delimiter field.
    - Set the Force quotes field to True to quote all values.
15. Click Fetch schema to load and preview columns from the reference file.
16. Review the schema to ensure it matches your expected table structure.
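To see how the header option and delimiter determine the destination columns, here is a small sketch using Python's csv module with a toy reference file:

```python
import csv
import io

# A toy reference file; in the pipeline, this is the selected S3 object.
sample = "order_id|customer_id|total\n1001|42|19.99\n"

# The delimiter here must match the Column delimiter setting, and the first
# row is treated as the header when the header option is enabled.
reader = csv.reader(io.StringIO(sample), delimiter="|")
header = next(reader)
print(header)  # ['order_id', 'customer_id', 'total'] become the destination columns
```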
17. Configure how rows are merged in the destination table in the Choose a merge strategy field (a toy sketch follows this step). Workato supports the following merge strategies:
    - Upsert: Inserts new rows and updates existing rows. When you choose Upsert, the Merge method field appears, and you must select a column that uniquely identifies each row. This key determines whether a row already exists in the destination and whether it's updated or inserted.
    - Append only: Inserts all rows without attempting to match or update existing records. When you choose Append only, the pipeline doesn't match on a key and doesn't update existing rows.
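For intuition, here is a toy sketch of the two strategies, assuming a hypothetical id column as the unique key:

```python
# Destination table keyed by the unique "id" column (hypothetical data).
destination = {1: {"id": 1, "status": "pending"}}
incoming = [{"id": 1, "status": "shipped"}, {"id": 2, "status": "pending"}]

# Upsert: match on the key, so row 1 is updated and row 2 is inserted.
for row in incoming:
    destination[row["id"]] = row
print(destination)  # id 1 now 'shipped'; id 2 added

# Append only: every incoming row is added; nothing is matched or updated,
# so the same id can appear more than once.
append_table = [{"id": 1, "status": "pending"}]
append_table.extend(incoming)
print(len(append_table))  # 3 rows
```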
18. Click Review object to confirm your setup. This screen displays your file settings, CSV options, and merge details.
19. Enter an Object name. This name defines the destination table name.
20. Click Finish to save the object configuration.
21. Review and customize the schema for each selected object. When you select an object, the pipeline automatically fetches its schema to ensure the destination matches the source.
22. Expand any object to view its fields. Keep all fields selected to extract all available data, or deselect specific fields to exclude them from data extraction and schema replication.
23. Optional. Configure field-level data protection. After you expand an object, choose how to handle each field (see the hashing sketch after these steps):
    - Replicate as is (default): Data values at the source are replicated identically to the destination.
    - Hash: Hash sensitive data values in the column before syncing to your destination.
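The Hash option replaces values with a one-way digest. As an illustration only (this guide doesn't specify Workato's algorithm, so SHA-256 here is an assumption), a sketch in Python:

```python
import hashlib

# One-way hash: the destination only ever sees the digest, never the value.
# SHA-256 is assumed for illustration; the actual algorithm may differ.
def mask(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

row = {"order_id": "1001", "email": "jane@example.com"}
row["email"] = mask(row["email"])
print(row)
```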
24. Click Add object again to add more objects using the same flow. You can repeat this step to include multiple Amazon S3 objects in your pipeline.
25. Choose how to handle schema changes (a drift-detection sketch follows the note below):
    - Select Auto-sync new fields to detect and apply schema changes automatically.
    - Select Block new fields to manage schema changes manually. This option may cause the destination to fall out of sync if the source schema updates.

**Note:** Unsynchronized schema changes, also known as schema drift, can cause issues if not managed. Refer to the Schema drift section for more information.
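Conceptually, drift detection compares the known column set with the columns in the newest file. A minimal illustration with hypothetical column names (not Workato's implementation):

```python
# Known destination schema versus the columns found in the newest file.
known = {"order_id", "customer_id", "total"}
latest = {"order_id", "customer_id", "total", "coupon_code"}

# Auto-sync new fields would add the extra column to the destination;
# Block new fields would leave the destination unchanged and out of sync.
new_fields = latest - known
if new_fields:
    print(f"Schema drift detected: {sorted(new_fields)}")
```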
26. Configure how often the pipeline syncs data from the source to the destination in the Frequency field. Choose either a standard time-based schedule or define a custom cron expression (see the cron sketch below).
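Workato defines the exact cron dialect the Frequency field accepts; as a rough preview of standard five-field cron behavior, here is a sketch using the third-party croniter package:

```python
from datetime import datetime

# croniter is a third-party package (pip install croniter) used here only to
# preview when a schedule fires; "0 */6 * * *" means minute 0 of every sixth hour.
from croniter import croniter

schedule = croniter("0 */6 * * *", datetime(2025, 1, 1))
for _ in range(3):
    print(schedule.get_next(datetime))  # 06:00, 12:00, then 18:00 on Jan 1
```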