Version

    SnowflakeBulkWriter

    Short Description
    Ports
    Metadata
    SnowflakeBulkWriter Attributes
    Details
    See also

    Short Description

    SnowflakeBulkWriter loads large volumes of data into Snowflake database. It is faster than DatabaseWriter, but the target table must have the same structure as the input metadata. If the target table does not exist, the component can create it based on input metadata.

    ComponentData outputInput portsOutput portsTransformationTransf. requiredJavaCTLAuto-propagated metadata
    SnowflakeBulkWriterdatabase11

    Ports

    Port typeNumberRequiredDescriptionMetadata
    Input0For data records to be insertedAny
    Output0Load statisticsSee below

    Metadata

    SnowflakeBulkWriter does not propagate metadata.

    SnowflakeBulkWriter has a metadata template on the output port.

    Table 56.16. Output Metadata for SnowflakeBulkWriter

    Field numberField nameData typeDescription
    0filestringName and relative path of the staged chunk file. See Chunk Files for more details.
    1statusstringStatus: LOADED, LOAD_FAILED or PARTIALLY_LOADED.
    2parsedRowslongNumber of rows in the CSV file.
    3loadedRowslongNumber of rows that were successfully loaded.
    4errorLimitlongIf the number of errors reaches this limit, then abort. Either 1 or the same as parsedRows, if Continue on error.
    5errorCountlongNumber of rejected rows. Each of these rows could include multiple errors.
    6firstErrorstringFirst error message in the file.
    7firstErrorLinelongLine number of the first error in the file.
    8firstErrorCharacterlongPosition of the first error character.
    9firstErrorColumnNamestringColumn name and index of the first error, e.g. "TABLE_NAME"["COLUMN_NAME":1]

    SnowflakeBulkWriter Attributes

    AttributeReqDescriptionPossible values
    Basic
    DB connectionA Snowflake connection.e.g. ${CONN_DIR}/snowflake.cfg (id:JDBC0)
    DB tableName of the target table.e.g. Account
    Create table if not exists Creates the target table if it does not exist. Generates DDL from input port metadata. CloverDX must have CREATE TABLE privilege for the target schema.true (default) | false
    Continue on error Prevents the component from failing if an error occurs.false (default) | true
    Advanced
    Upload threads Max number of concurrent uploads, 5 by default.e.g. 5
    Remove files after import 

    Set to false to prevent staged files from being deleted from Snowflake after the import.

    true (default) | false

    Details

    SnowflakeBulkWriter is a fast writer that can insert records into a Snowflake table at the speed of 25 MB/s or more (about 100 GB per hour). You can expect about 8 times speedup when compared with DatabaseWriter, depending on settings.

    Input fields are mapped to the target columns by position, so the number of input fields and their types must match the target table columns.

    The component has full Unicode support. It also allows writing byte fields into BINARY columns and JSON string data into VARIANT columns.

    For information about writing date/time data types and handling time zones, see Date/Time Data Types in Snowflake.

    If the target table does not exist, it is created based on the input metadata by default. See Create table if not exists attribute.

    Loading process:

    1. Split the data into chunks.
    2. Upload the chunks into Snowflake table stage in parallel.
    3. Load data from the stage to the table using the COPY INTO statement.

    If the output port is connected, the component produces one output record for every chunk. The output records contain load statistics (e. g. the number of loaded rows, error count and more). If Continue on error is enabled, there may also be error messages.

    Staged chunks are deleted when the process finishes, see Remove files after import.

    Chunk Files

    Incoming data is internally split into gzipped CSV files. These chunks are uploaded and inserted into Snowflake in parallel to improve performance. But due to the parallel loading, there is no guarantee input records will be written in the order they were received by the component.

    The staged files are named according to the following pattern:

    CloverDX_[RUN_ID]_[COMPONENT_ID]_[RANDOM_NUMBER]/chunk_[CHUNK NUMBER].csv.gz

    When a chunk is successfully loaded, it is deleted automatically.

    Error Handling

    By default, any error causes the component to fail and the transaction is rolled back.

    If Continue on error is enabled, the component continues loading data even if an error occurs. The data is internally split into chunks and the component produces one output record for every chunk, regardless of errors. If a chunk contains an error, Snowflake continues loading the file and rejects only the invalid rows. The output record contains the number of errors in the chunk and the first error message and its details. At most one error is reported for each chunk.

    The total number of errors is logged to the execution log as a warning.

    Resource Utilization

    The component chunks incoming data into gzipped CSVs and uploads them in parallel to increase throughput. The maximum number of concurrent uploads can be set using the Upload threads attribute.

    Data is buffered in memory before being uploaded to Snowflake. Memory consumption of the component is approximately 15 MB x [Upload threads].

    Notes and Limitations

    SnowflakeBulkWriter cannot write maps and lists. If you need to write the content of the map or list field, convert the field into string using Reformat first.

    The component does not preserve the order of incoming records. The COPY INTO statement loads data in parallel, so the records may appear out of order in Snowflake.