    SnowflakeBulkWriter

    Short description

    SnowflakeBulkWriter loads large volumes of data into Snowflake database. It is faster than DatabaseWriter, but the target table must have the same structure as the input metadata. If the target table does not exist, the component can create it based on input metadata.

    Data output | Input ports | Output ports | Transformation | Transf. required | Java | CTL | Auto-propagated metadata
    database    | 1           | 1            | no             | no               | no   | no  | no

    Ports

    Port type | Number | Required | Description                     | Metadata
    Input     | 0      | yes      | For data records to be inserted | Any
    Output    | 0      | no       | Load statistics                 | See below

    Metadata

    SnowflakeBulkWriter does not propagate metadata.

    SnowflakeBulkWriter has a metadata template on the output port.

    Table 68. Output Metadata for SnowflakeBulkWriter
    Field number | Field name           | Data type | Description
    0            | file                 | string    | Name and relative path of the staged chunk file. See Chunk files for more details.
    1            | status               | string    | Status: LOADED, LOAD_FAILED or PARTIALLY_LOADED.
    2            | parsedRows           | long      | Number of rows in the CSV file.
    3            | loadedRows           | long      | Number of rows that were successfully loaded.
    4            | errorLimit           | long      | If the number of errors reaches this limit, loading aborts. The value is 1, or equal to parsedRows if Continue on error is enabled.
    5            | errorCount           | long      | Number of rejected rows. Each of these rows may contain multiple errors.
    6            | firstError           | string    | First error message in the file.
    7            | firstErrorLine       | long      | Line number of the first error in the file.
    8            | firstErrorCharacter  | long      | Position of the first error character.
    9            | firstErrorColumnName | string    | Column name and index of the first error, e.g. "TABLE_NAME"["COLUMN_NAME":1]
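    The relationship between errorLimit, parsedRows and Continue on error described above can be sketched as follows (a minimal illustration in Python; the function name is hypothetical and not part of the component's API):

```python
def error_limit(parsed_rows, continue_on_error):
    """errorLimit as described in the output metadata: 1 when the
    component aborts on the first error, equal to parsedRows when
    Continue on error is enabled (so the whole chunk is attempted)."""
    return parsed_rows if continue_on_error else 1
```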

    SnowflakeBulkWriter attributes

    Attribute | Req | Description | Possible values

    Basic

    DB connection | yes | A Snowflake connection. | e.g. ${CONN_DIR}/snowflake.cfg (id:JDBC0)
    DB table | yes | Name of the target table. | e.g. Account
    Create table if not exists | no | Creates the target table if it does not exist. Generates DDL from input port metadata. CloverDX must have the CREATE TABLE privilege for the target schema. | true (default), false
    Continue on error | no | Prevents the component from failing if an error occurs. | false (default), true

    Advanced

    Upload threads | no | Maximum number of concurrent uploads; 5 by default. | e.g. 5
    Remove files after import | no | Set to false to prevent staged files from being deleted from Snowflake after the import. | true (default), false

    Details

    SnowflakeBulkWriter is a fast writer that can insert records into a Snowflake table at speeds of 25 MB/s or more (about 100 GB per hour). Depending on settings, you can expect roughly an 8x speedup compared with DatabaseWriter.

    Input fields are mapped to the target columns by position, so the number of input fields and their types must match the target table columns.

    The component has full Unicode support. It also allows writing byte fields into BINARY columns and JSON string data into VARIANT columns.

    For information about writing date/time data types and handling time zones, see Date/Time data types in Snowflake.

    If the target table does not exist, it is created based on the input metadata by default. See Create table if not exists attribute.

    Loading process:

    1. Split the data into chunks.

    2. Upload the chunks into the Snowflake table stage in parallel.

    3. Load data from the stage to the table using the COPY INTO statement.
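    The first step above can be sketched in Python as follows. This is an illustrative model only, not the component's actual implementation; the function name and the rows_per_chunk parameter are assumptions made for the example:

```python
import csv
import gzip
import io

def split_into_chunks(records, rows_per_chunk):
    """Split records into gzipped CSV chunks, similar in spirit to the
    chunk files SnowflakeBulkWriter stages before running COPY INTO.
    rows_per_chunk is illustrative; the real component sizes chunks
    internally."""
    chunks = []
    for start in range(0, len(records), rows_per_chunk):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerows(records[start:start + rows_per_chunk])
        # Compress the CSV text, since the staged chunk files are .csv.gz
        chunks.append(gzip.compress(buf.getvalue().encode("utf-8")))
    return chunks
```

    In the real component, each such chunk would then be uploaded to the table stage and loaded with COPY INTO.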

    If the output port is connected, the component produces one output record for every chunk. The output records contain load statistics (e.g. the number of loaded rows, the error count and more). If Continue on error is enabled, they may also contain error messages.

    Staged chunks are deleted when the process finishes, see Remove files after import.

    Chunk Files

    Incoming data is internally split into gzipped CSV files. These chunks are uploaded and inserted into Snowflake in parallel to improve performance. Due to the parallel loading, there is no guarantee that input records will be written in the order in which they were received by the component.

    The staged files are named according to the following pattern:

    CloverDX_[RUN_ID][COMPONENT_ID][RANDOM_NUMBER]/chunk_[CHUNK NUMBER].csv.gz

    When a chunk is successfully loaded, it is deleted automatically.
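    The naming pattern above can be illustrated with a small helper (hypothetical, for illustration only; the width of the random number is an assumption):

```python
import random

def staged_chunk_path(run_id, component_id, chunk_number, rnd=None):
    """Build a staged chunk file path following the documented pattern:
    CloverDX_[RUN_ID][COMPONENT_ID][RANDOM_NUMBER]/chunk_[CHUNK NUMBER].csv.gz
    The range of the random component is an illustrative assumption."""
    if rnd is None:
        rnd = random.randint(0, 999999)  # illustrative random component
    return f"CloverDX_{run_id}{component_id}{rnd}/chunk_{chunk_number}.csv.gz"
```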

    Error Handling

    By default, any error causes the component to fail and the transaction is rolled back.

    If Continue on error is enabled, the component continues loading data even if an error occurs. The data is internally split into chunks and the component produces one output record for every chunk, regardless of errors. If a chunk contains an error, Snowflake continues loading the file and rejects only the invalid rows. The output record contains the number of errors in the chunk and the first error message and its details. At most one error is reported for each chunk.

    The total number of errors is logged to the execution log as a warning.
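    A downstream consumer of the output port might aggregate the per-chunk statistics like this. The field names match the output metadata template; modelling the records as plain dicts is an assumption made for the example:

```python
def summarize_load(records):
    """Aggregate per-chunk output records produced on the output port.
    Returns totals for parsed/loaded rows and errors, plus the staged
    files of any chunk that did not load cleanly."""
    return {
        "parsed": sum(r["parsedRows"] for r in records),
        "loaded": sum(r["loadedRows"] for r in records),
        "errors": sum(r["errorCount"] for r in records),
        # Any status other than LOADED (LOAD_FAILED, PARTIALLY_LOADED)
        # marks a chunk with rejected rows.
        "failed_chunks": [r["file"] for r in records if r["status"] != "LOADED"],
    }
```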

    Resource Utilization

    The component chunks incoming data into gzipped CSVs and uploads them in parallel to increase throughput. The maximum number of concurrent uploads can be set using the Upload threads attribute.

    Data is buffered in memory before being uploaded to Snowflake. Memory consumption of the component is approximately 15 MB x [Upload threads].
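    The memory estimate above can be expressed as a one-line calculation (illustrative only, based on the approximate figure stated in this documentation):

```python
def estimated_memory_mb(upload_threads=5):
    """Rough memory estimate: about 15 MB per upload thread,
    with 5 threads by default."""
    return 15 * upload_threads
```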

    Using SnowflakeBulkWriter in Server Core

    If you want to use SnowflakeBulkWriter in a graph that runs in Server Core, the server needs to be configured to allow reflective access. See reflective access.

    Notes and limitations

    SnowflakeBulkWriter cannot write maps and lists. If you need to write the content of a map or list field, convert the field to a string using Map first.

    The component does not preserve the order of incoming records. The COPY INTO statement loads data in parallel, so the records may appear out of order in Snowflake.