SnowflakeBulkWriter

CloverDX Designer > Component Reference > Writers > SnowflakeBulkWriter

Short Description

Ports

Metadata

SnowflakeBulkWriter Attributes

Details

Short Description

SnowflakeBulkWriter loads large volumes of data into Snowflake database. It is faster than DatabaseWriter, but the target table must have the same structure as the input metadata. If the target table does not exist, the component can create it based on input metadata.

Component	Data output	Input ports	Output ports	Transformation	Transf. required	Java	CTL	Auto-propagated metadata
SnowflakeBulkWriter	database	1	1	⨯	⨯	⨯	⨯	⨯

Ports

Port type	Number	Required	Description	Metadata
Input	0	✓	For data records to be inserted	Any
Output	0	⨯	Load statistics	See below

Metadata

SnowflakeBulkWriter does not propagate metadata.

SnowflakeBulkWriter has a metadata template on the output port.

Table 56.14. Output Metadata for SnowflakeBulkWriter

Field number	Field name	Data type	Description
0	file	string	Name and relative path of the staged chunk file. See Chunk Files for more details.
1	status	string	Status: LOADED, LOAD_FAILED or PARTIALLY_LOADED.
2	parsedRows	long	Number of rows in the CSV file.
3	loadedRows	long	Number of rows that were successfully loaded.
4	errorLimit	long	If the number of errors reaches this limit, then abort. Either 1 or the same as `parsedRows`, if Continue on error.
5	errorCount	long	Number of rejected rows. Each of these rows could include multiple errors.
6	firstError	string	First error message in the file.
7	firstErrorLine	long	Line number of the first error in the file.
8	firstErrorCharacter	long	Position of the first error character.
9	firstErrorColumnName	string	Column name and index of the first error, e.g. "TABLE_NAME"["COLUMN_NAME":1]

SnowflakeBulkWriter Attributes

Attribute	Req	Description	Possible values
Basic
DB connection	✓	A Snowflake connection.	e.g. ${CONN_DIR}/snowflake.cfg (id:JDBC0)
DB table	✓	Name of the target table.	e.g. Account
Create table if not exists		Creates the target table if it does not exist. Generates DDL from input port metadata. CloverDX must have CREATE TABLE privilege for the target schema.	true (default) \| false
Continue on error		Prevents the component from failing if an error occurs.	false (default) \| true
Advanced
Upload threads		Max number of concurrent uploads, 5 by default.	e.g. 5
Remove files after import		Set to `false` to prevent staged files from being deleted from Snowflake after the import.	true (default) \| false

Details

SnowflakeBulkWriter is a fast writer that can insert records into a Snowflake table at the speed of 25 MB/s or more (about 100 GB per hour). You can expect about 8 times speedup when compared with DatabaseWriter, depending on settings.

Input fields are mapped to the target columns by position, so the number of input fields and their types must match the target table columns.

The component has full Unicode support. It also allows writing byte fields into BINARY columns and JSON string data into VARIANT columns.

For information about writing date/time data types and handling time zones, see Date/Time Data Types in Snowflake.

If the target table does not exist, it is created based on the input metadata by default. See Create table if not exists attribute.

Loading process:

Split the data into chunks.
Upload the chunks into Snowflake table stage in parallel.
Load data from the stage to the table using the COPY INTO statement.

If the output port is connected, the component produces one output record for every chunk. The output records contain load statistics (e. g. the number of loaded rows, error count and more). If Continue on error is enabled, there may also be error messages.

Staged chunks are deleted when the process finishes, see Remove files after import.

Chunk Files

Incoming data is internally split into gzipped CSV files. These chunks are uploaded and inserted into Snowflake in parallel to improve performance. But due to the parallel loading, there is no guarantee input records will be written in the order they were received by the component.

The staged files are named according to the following pattern:

CloverDX_[RUN_ID]_[COMPONENT_ID]_[RANDOM_NUMBER]/chunk_[CHUNK NUMBER].csv.gz

When a chunk is successfully loaded, it is deleted automatically.

Error Handling

By default, any error causes the component to fail and the transaction is rolled back.

If Continue on error is enabled, the component continues loading data even if an error occurs. The data is internally split into chunks and the component produces one output record for every chunk, regardless of errors. If a chunk contains an error, Snowflake continues loading the file and rejects only the invalid rows. The output record contains the number of errors in the chunk and the first error message and its details. At most one error is reported for each chunk.

The total number of errors is logged to the execution log as a warning.

Resource Utilization

The component chunks incoming data into gzipped CSVs and uploads them in parallel to increase throughput. The maximum number of concurrent uploads can be set using the Upload threads attribute.

Data is buffered in memory before being uploaded to Snowflake. Memory consumption of the component is approximately 15 MB x [Upload threads].

Notes and Limitations

SnowflakeBulkWriter cannot write maps and lists. If you need to write the content of the map or list field, convert the field into string using Reformat first.

The component does not preserve the order of incoming records. The COPY INTO statement loads data in parallel, so the records may appear out of order in Snowflake.

SnowflakeBulkWriter

Short Description

Ports

Metadata

SnowflakeBulkWriter Attributes

Details

Chunk Files

Error Handling

Resource Utilization

Notes and Limitations

See also