Version

    Error handling steps

    Remove rows with errors

    The Remove rows with errors step removes all rows that have error in the specified column or columns.

    Parameters
    • Error position: required, select where to look for errors:

      • in any column: consider all columns from the whole data set and remove rows that contain errors in any column.

      • in selected column: only consider selected column and remove rows that contain error in that column only.

    Example

    Consider a data set like the one below where we have payments data and we were unable to convert $orderId column to integer. This means that we would not be able to join the data with order data and then further with customer data.

    remove rows with errors example source

    To avoid transforming rows that cannot be used, we can easily remove them all from the process by using Remove rows with errors configured to look into $orderId column. The output will then look like this:

    remove rows with errors example result
    Replace errors

    The Replace Errors step replaces all errors in the selected column with the specified constant value or the result of the specified formula. Rows that do not have any error in the selected columns will not be changed.

    When replacing with a constant, all errors in the selected column will be replaced with the same value.

    When replacing with a formula, the formula will be calculated only for rows where there is an error in the selected column.

    Parameters
    • Input Column: required, column in which you’d like to replace the errors.

    • Replace errors with: required, configures what is the replacement value for errors in the column:

      • Value: allows you to configure constant value to replace all errors with. The replacement value type must match the type of your column. The type then determines the formatting you have to use. See more in the Remarks section.

      • Formula: allows you to write a formula to calculate for each row with an error. The formula can access the original value of the cell in case the error is a data error. See more details in examples below.

    Example

    In our example invoices data set we could get data that looks like this:

    error handling generic
    Figure 63. Original data set with errors.

    Notice the Date paid column - it contains a variety of errors that we can fix with the Replace errors step. We’ll be able to recover data from the cells that contain incorrectly formatted dates and we’ll be able to remove n/a values from cells that contain it.

    Fixing wrong formatting

    To fix the cells with the wrong format, we can use a formula that attempts to parse the data in a different format:

    replace errors fix format example

    Once we apply this step, we will see our data quality improve:

    replace errors fix format example after step1
    Figure 64. Data set after the first fix is applied.

    We can fix the Invoice amount column in a similar way - some of the values in that column look like (9426.13). This is a number format sometimes used in accounting to represent negative values. To parse this format, we can use the same approach as for fixing the date, but with a slightly different step formula in the step:

    str2decimal($Invoice_amount, "0.00;(0.00)")

    This formula relies on the fact that we can specify two different patterns for numbers - the part of the pattern before the semicolon specifies how to read positive numbers while the second part specifies how to read negative numbers.

    To clean the rest of the errors we can use the Clear error cells step on the Data paid column.

    replace error fix example last step
    Remarks
    • When entering a value of the Replace errors with parameter, follow these rules:

      • Integer columns: replacement value must be an integer written without any spaces or other symbols used for grouping (i.e., 123456 is ok, 123,456 is not).

      • Decimal columns: replacement value must be a decimal written with a period as the decimal symbol and without any digit grouping (i.e., 123456.05 is ok, 123,456.05 is not).

      • String columns: replacement value must be a string and must be written without enclosing double quotes (i.e., write example instead of "example" otherwise quotes will be used in your data).

      • Date columns: replacement value must be a date or date and time using one of the following formats:

        • yyyy-MM-dd HH:mm:ss (e.g., 2023-08-01 08:09:07)

        • yyyy-MM-dd HH:mm (e.g., 2023-08-01 08:09)

        • yyyy-M-d H:m:s (e.g., 2023-8-1 8:9:7)

        • yyyy-M-d H:m (e.g., 2023-8-1 8:9)

        • yyyy-M-d (e.g., 2023-8-1) - in this case the date is considered as 2023-08-01 0:00:00

      • Boolean columns: you must use either true or false.

    Replace empty values

    Replace empty values allows you to replace No value or empty strings with a constant value. The step is useful, for example, to provide default values when the input does not provide a value.

    Parameters
    • Input column: required, a column of any type in which you’d like replacements to be made.

    • Replacement: required, value to use when an empty string or No value is encountered in specified input column. The data type and formatting of the replacement value depends on the data type of the input column:

      • boolean: use either true or false.

      • integer: use numbers without any grouping, e.g., use 123456 instead of 123 456.

      • decimal: write decimal values without any grouping and with a period as the decimal character, e.g., 123456.789 instead of 123,456.789.

      • string: write a string without enclosing quotes, e.g. replacement text.

      • date: write date or date and time using one of the following formats:

        • yyyy-MM-dd HH:mm:ss (e.g., 2023-08-01 08:09:07)

        • yyyy-MM-dd HH:mm (e.g., 2023-08-01 08:09)

        • yyyy-M-d H:m:s (e.g., 2023-8-1 8:9:7)

        • yyyy-M-d H:m (e.g., 2023-8-1 8:9)

        • yyyy-M-d (e.g., 2023-8-1) - in this case the date is considered as 2023-08-01 0:00:00

    • Target column: required, configure the column which will receive the output. The data type of the output column will remain without change.

      • Write result to the current column: overwrite the input column with the result.

      • Create new column with name: create a new column with the specified name. The name of the new column can contain spaces or special characters - the technical column name will be created automatically. The new column will be placed right after the input column.)

    Examples
    String values

    In this example, we read data from the Color column and write the result into the Color 2 column. Notice that the empty string in the second row was also replaced with n/a.

    replace empty string
    Date values

    When specifying the date replacement, the formatting of the Replacement parameter is always yyyy-MM-dd HH:mm:ss.SSS regardless of the formatting of the input column. In the following screenshot, we show two replacements:

    • The Transaction datetime column uses replacement 2023-05-01 without the time part. Midnight on the given date is used. The result is in the Transaction time 2 column.

    • The Due date uses replacement 2023-12-24 even though the column formatting is configured as d MMMM yyyy. The result is in the Due date 2 column.

    replace empty date
    Integer values

    We read from the Seats column and write the output into the Seats 2 column.

    replace empty integer
    Decimal values

    We read from the Paid amount column and write the output into the Paid amount 2 column.

    replace empty decimal
    Boolean values

    Replacement written as false is used. Note that you cannot use yes or no etc. in the replacement value, only true and false are allowed.

    replace empty boolean
    Clear error cells

    The Clear error cells step replaces all errors in the selected column with null value so they will show up as No value in the data preview. Cells that do not have errors are left without any changes.

    Parameters
    • Input Column: required, column in which to clear all errors. All data types are allowed.

    Example

    In the example below we are trying to calculate the how many days past the due date is an invoice. This is done in step #6 on the screenshot with Calculate formula step with dateDiff($Date_paid, $Due_date, day) formula.

    This formula does not take into consideration No value cells in the Date paid column - these are there for invoices which have not been paid yet. This means that for those rows, the formula results in an error:

    clear error cells example input
    Figure 65. An error in Paid after days caused by not handling No value cells in the Date paid column.

    To solve this, we can change the formula and only run it if we also have value in Paid date. This makes the formula harder to understand and especially in cases where the formula is more complex with more parameters it can be quite hard to see what it really does.

    A simpler solution is to leave the formula as is and simply clear the error values in Paid after days columns after the formula step. This can be done with Clear error cells step. Note that in this case using Replace errors step does not make sense as there is no default value to use for unpaid invoices.

    clear error cells example output
    Figure 66. Data set after clearing the error values in Paid after days column.
    Remarks
    • The step works with every data type.

    • The step does not distinguish between different kinds of errors and will replace the error for both data errors as well as step execution errors.

    Fill down

    The Fill down step replaces empty values in a column with the last preceding available non-empty value from the same column. This helps maintain consistency in datasets by ensuring that gaps in data are filled with relevant preceding values, making it useful for structured and sequential data processing.

    Parameters
    • Input column: required, select the column where the step will be applied to replace empty values.

    Example

    Imagine you have a sales report that includes the Region for each sale, but only the first sale in each region is labeled, and the following rows are left empty. To organize the data better, you want to fill down the region name for all sales in the same region.

    Region Sales Region Sales

    Before the Fill down step is applied

    After the Fill down step is applied

    North

    1000

    North

    1000

    No value

    1500

    North

    1500

    No value

    2000

    North

    2000

    South

    1200

    South

    1200

    No value

    1300

    South

    1300

    East

    1700

    East

    1700

    No value

    1800

    East

    1800