Version

    Replace errors

    The Replace Errors step replaces all errors in the selected column with the specified constant value or the result of the specified formula. Rows that do not have error in selected column will not be changed.

    When replacing with a constant, all errors in the selected column will be replaced with the same value.

    When replacing with a formula, the formula will be calculated only for rows where there is an error in selected column.

    Parameters
    • Input Column: required, column in which you’d like to replace the errors.

    • Replace errors with: required, configures what is the replacement value for errors in the column:

      • Value: allows you to configure constant value to replace all errors with. Replacement value type must match the type of your column. The type then determines the formatting you have to use. See more in Remarks section.

      • Formula: allows you to write a formula to calculate for each row with an error. The formula can access the original value of the cell in case the error is a data error. See more details in examples below.

    Example

    In our example invoices data set we could get data which looks like this:

    error handling generic
    Figure 56. Original data set with errors.

    Notice the Date paid column - it contains variety of errors that we can can fix with Replace errors step. We’ll be able to recover data from cells that contain incorrectly formatted dates and we’ll be able to remove n/a values from cells that contain it.

    Fixing the wrong formatting

    To fix the cells with wrong format, we can use a formula that attempts to parse the data in different format:

    replace errors fix format example

    Once we apply this step, we will see our data quality improve:

    replace errors fix format example after step1
    Figure 57. Data set after the first fix is applied.

    We can fix the Invoice amount column in similar way - some of the values in that column look like (9426.13). This is a number format sometimes used in accounting to represent negative values. To parse this format, we can use the same approach as for fixing date, but with slightly different step formula in the step:

    str2decimal($Invoice_amount, "0.00;(0.00)")

    This formula relies on the fact that we can specify two different patterns for numbers - the part of the pattern before semicolon specifies how to read positive number while the second part specifies how to read negative numbers.

    To clean the rest of the errors we can use Clear error cells step on Data paid column.

    replace error fix example last step
    Remarks
    • When writing value of Replace errors with parameter, follow these rules:

      • Integer columns: replacement value must be an integer written without any spaces or other symbols used for grouping (i.e., 123456 is ok, 123,456 is not).

      • Decimal columns: replacement value must be a decimal written with period as decimal symbol and without any digit grouping (i.e., 123456.05 is ok, 123,456.05 is not).

      • String columns: replacement value must be a string and must be written without enclosing double quotes (i.e., write example instead of "example" otherwise quotes will be used in your data).

      • Date columns: replacement must be a valid date or date + time. You must write dates in format yyyy-MM-dd and date-time values with format yyyy-MM-dd HH:mm:ss.SSS (you can omit seconds or millisecond if not needed). Examples: 2023-04-01, 2023-04-04 17:30, 2023-04-04 17:30:05.438.

      • Boolean columns: you must use either true or false.