Version

    Data validation (data quality) steps

    Validation steps allow you to verify that your data is in the desired format. All values that fail the validation are highlighted in the dataset and marked as errors in the data preview to easily locate them. The number of errors is also included in the data quality bar.

    When you run a job that includes data errors, all rows with invalid values are automatically excluded from the output and placed in a reject file. See Job Run Details for more information on the reject files.

    For our recommendations on how to deal with data errors, see fixing errors.

    List of validation steps:

    Validate against list

    The Validate against list step allows you to validate values in a string column against a list of specified values (i.e., against enumeration). All values that are not found in the list are marked as invalid. Invalid values will be visible in the data preview and in the reject file. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Input column: required, select a string column to validate.

    • Values: required, enter the values to validate against. Values are entered as plain text with each value specified on a separate line. Lines are trimmed (their leading and trailing spaces are removed) before they are considered. Ordering of values does not matter.

    • Ignore case: checked by default. When checked, the validation is not case-sensitive (i.e., lower-case and upper-case letter are considered to be the same). If unchecked, value comparison is case sensitive, so "red" is not the same as "RED".

    • Accept empty values: checked by default. When checked, empty values (see working with empty values) are considered valid. If not checked, empty values are considered invalid, and the row will be rejected.

    • Error message: required, configure an error message that will be associated with invalid values. The error message is displayed when hovering over an identified empty value in the data set and included as the reject reason in the reject file. Default error message is Value is not in the allowed values list.

    Examples

    In the following examples we will test whether currency code is EUR or USD.

    Input value Ignore case Accept empty values Result Description

    EUR

    Yes (checked)

    Any

    Valid

    Value is in the list of valid values.

    eur

    Yes (checked)

    Any

    Valid

    Value is in the list of valid values and the case is ignored.

    eur

    No (unchecked)

    Any

    Invalid

    Value is in the list of valid values but the Ignore case option is unchecked.

    CZK

    Yes (checked)

    Any

    Invalid

    Value is not in the list of valid values.

    No value

    Any

    Yes (checked)

    Valid

    Valid because the Accept empty values option is checked.

    No value

    Any

    No (unchecked)

    Invalid

    Invalid because the Accept empty values option is unchecked.

    "    " (4 spaces)

    Any

    Yes (checked)

    Invalid

    String consisting of all white spaces is not considered empty and also does not appear in the list of allowed values.

    Remarks
    • If you have multi-line values or values with tab characters to validate, you can use escape sequences (\n, \t, etc.). For example enter Baker Street 221b\nNW1 6XE\nLondon to validate the following multi-line value:

    Baker Street 221b
    NW1 6XE
    London
    • To use a backslash itself as a delimiter, enter two backslashes.

    Validate credit card

    The Validate credit card step allows you to validate credit card numbers. The step validates the format of the number using the Luhn algorithm. The step can, therefore, detect card numbers with typos or missing digits, but it does not try to contact any credit card provider to verify that the credit card exists.

    All values that fail the validation are marked as errors in the dataset to easily identify them in the data preview, and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Input column: required, a string or integer column containing credit card numbers with one card per row.

    • Accept empty values: configure whether an empty value is considered a valid credit card. Empty value is a value that is either null (shown as No value in data preview), zero-length string or a string composed entirely of whitespaces. By default, this is checked (i.e., empty values are considered valid).

    Examples
    Input value Accept empty values Result Description

    3579099126677753

    Any

    Valid

    Credit card written without any delimiters is accepted.

    3579099126677754

    Any

    Invalid

    In this case the checksum digit is not valid (last digit in the number).

    3576-9755-3407-6980

    Any

    Valid

    Cards with en dashes (-) are accepted.

    3576—​9755—​3407—​6980

    Any

    Invalid

    Cards with em dashes (—) are considered invalid.

    3576 9755 3407 6980

    Any

    Valid

    Cards with spaces are accepted.

    3576-- 9-- 755-3407-6980

    Any

    Valid

    Position and number of en dashes (-) and/or spaces do not impact the validity of the card number.

    3576*9755*3407*6980

    Any

    Invalid

    Star symbol is not allowed as a delimiter and this number is, therefore, considered invalid.

    No value

    Yes (checked)

    Valid

    This is a valid number since Accept empty values is checked.

    No value

    No (unchecked)

    Invalid

    Invalid since empty values are not allowed as per Accept empty setting.

    Error

    Any

    Error

    Calling the step on an Error will not run any validation and return an Error.

    Remarks
    • Only the syntax of the credit card number is verified. When checking the syntax, spaces and en dashes (-) are ignored. Any other delimiter characters will cause the card number to be considered invalid.

    • The step does not attempt to contact any credit card or payment provider to verify the existence of the credit card number.

    • Running validation on an Error value does not try to validate the value and instead produces an Error right away.

    See also

    Validate email

    The Validate email step allows you to validate syntax of an email address. The step only verifies the syntax of the email address - for example, that it has proper local and domain names, that it contains the "@" symbol etc. This step does not try to send an email to verify the existence of the email address.

    All values that fail the validation are marked as errors in the dataset to easily identify them in the data preview, and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Input column: required, a string column containing email addresses with one address per row.

    • Accept empty values: configure whether an empty value is considered a valid email. Empty value is a value that is either null (shown as No value in data preview), zero-length string or a string composed entirely of whitespaces. By default, this is checked (i.e., empty emails are considered valid).

    Examples
    Input value Accept empty values Result Description

    example@example.com

    Any

    Valid

    A valid email.

        example@example.com   

    Any

    Valid

    The step strips out leading and trailing whitespaces before validation so this is a valid email.

    example @ example.com

    Any

    Invalid

    Email address cannot contain whitespaces in the middle - these are not removed during the validation.

    example.example.com

    Any

    Invalid

    An invalid email missing the domain part.

    No value

    Yes (checked)

    Valid

    This is valid email since Accept empty values is checked.

    No value

    No (unchecked)

    Invalid

    Invalid since empty values are not allowed as per Accept empty values setting.

    Remarks
    • The step only validates syntax of the email. It does not verify that the email (or even its domain) exists.

    • Running validation on an Error value does not try to validate the value and instead produces an Error right away.

    Validate if not empty

    The Validate if not empty step allows you to identify all empty values (see working with empty values) in the specified columns and mark them as validation errors. All invalid rows are automatically rejected at the end of the job and written to the reject file. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Columns to validate: required, select one or more columns to test whether they are empty or not. If multiple columns are selected, all of them must have non-empty values for the row to be considered valid.

    • Error message: specify the error message that will be associated with the validated values of the columns that are empty. If multiple columns have empty values, all of them will be marked as invalid. This error message will be displayed when hovering over an identified empty value in the data set and included as the reject reason in the reject file. It defaults to "Value cannot be empty".

    Examples

    The step was configured to identify empty values in a column containing email addresses.

    Input value Result Description

    john.smith@testcompany.com

    Valid

    The value is not empty.

    No value

    Invalid

    An empty value is identified.

    "    " (all spaces)

    Valid

    Space characters are not considered empty values. Use isBlank function and Validate with formula step to validate value like this.

    Validate pattern match

    The Validate pattern match allows you to validate values in a string column using a pattern defined by a regular expression. You can read more about regular expressions in CloverDX here.

    All values that fail the validation are marked as errors in the data set to easily identify them in the data preview. Rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Input column: required, select a string column to validate.

    • Pattern: required, the regular expression to use in the validation.

    • Accept empty values: checked by default. When checked, empty values (see working with empty values) are considered valid. When not checked, empty values are reported as validation errors.

    • Error message: specify the error message that will be associated with the validated value. This message will be displayed when hovering over an invalid value in the dataset or included as the reject reason in the reject file. The default value is Value does not match regular expression.

    Examples

    Validation using regular expression patterns is frequently used to validate basic data patterns - for example various identification numbers, date formats and more.

    In the following text you can see few basic example patterns that should help you get an idea about how the step works. For more details about how to write complex regular expressions, we recommend looking online for regular expression tutorials. Note that there are many types of regular expression engines that use slightly different syntax. CloverDX uses Java regular expression syntax.

    • Social security numbers: social security numbers are 9-digit numbers frequently written in three groups - for example 123-45-6789. To validate that a text column matches this format, use regular expression pattern [0-9]{3}-[0-9]{2}-[0-9]{4}. Alternatively, you can use \d{3}-\d{2}-\d{4}.

    • Tax identification number in Czech Republic starts with letters CZ and then contains 8 to 10 digits. This can be validated using pattern CZ[0-9]{8,10} or CZ\d{8.10}. Note that regular expressions are case-sensitive, so this expression will not match value cz12345678 while it will match CZ12345678. If you wish to also accept lower-case form of the number, you can use (?i) modifier to enable case-insensitive matching: (?i)CZ[0-9]{8,10}.

    • National Insurance Number in UK is formatted as two prefix letters, six digits and one suffix letter. To validate this in a simple manner, you could use expression like [A-Z]{2}[0-9]{6}[A-Z]. However, this will be only very rough validation since not all letters are allowed to be used in prefix or suffix. A more complete pattern would need to exclude certain letters to get [A-Z&&[^DFIQUV]][A-Z&&[^DFIQUVO]][0-9]{6}[A-D].

    Note that when using regular expressions, you might be tempted to write regular expressions for credit cards or emails. In those cases we recommend using the Validate credit card or Validate email steps instead - they are more powerful and precise and will provide better error message than pure regular expression.

    Input value Regular expression Accept empty values Result

    "123-45-6789"

    [0-9]{3}-[0-9]{2}-[0-9]{4}

    Yes

    Valid

    "" (empty value)

    [0-9]{3}-[0-9]{2}-[0-9]{4}

    Yes

    Valid

    "" (empty value)

    [0-9]{3}-[0-9]{2}-[0-9]{4}

    No

    Invalid - empty values are not accepted.

    null

    (any)

    Yes

    Valid - empty values are accepted.

    null

    (any)

    No

    Invalid - empty values are not accepted.

    "XYZ"

    X`

    Yes

    Invalid. Validation applies to the whole value.

    "XYZ"

    .*X.*

    Yes

    Valid. We explicitly allow any characters before and after X.

    Remarks
    • Validation always applies to whole input value. To match a substring, use wildcards like .* before and after your pattern.

    • Pattern matching is case-sensitive. To perform case-insensitive matching, add (?i) at the beginning of your pattern.

    Validate phone number

    The Validate phone number step allows you to validate phone numbers. The step understands the phone number formats and regional differences between phone numbers (like the number of digits, prefixes, …​) and will verify that the number follows the rules for the selected region.

    This step does not try to validate whether the number exists - it validates the format of the number only.

    All values that fail the validation are marked as errors in the dataset to easily identify them in the data preview, and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Input column: required, a string or integer column containing phone numbers with one number per row.

    • Region: select which region to use when validating the numbers. The region defines rules that the phone numbers must follow - for example, the number of digits, prefixes, area codes, and more. Typically, regions represent countries - like the Unites States, Germany, and many others. One special region - None - is provided in case you do not have any specific regional requirements for the numbers. This region requires phone numbers to have international dialing the prefix (e.g., +1 etc.).

    • Accept empty: configure whether an empty value is considered a valid phone number. An empty value is a value that is either null (shown as No value in data preview), a zero-length string, or a string composed entirely of whitespaces. By default, this is checked (i.e., empty values are considered valid).

    Examples
    Input value Accept empty values Region Result Description

    478.629.2062

    Any

    US

    Valid

    Valid US phone number.

      478   629.2062   

    Any

    US

    Valid

    Valid US phone number - whitespaces before and in the middle do not affect the validity of the number.

    +44 (0) 203 789 2070

    Any

    US

    Valid

    Numbers with international dialing code (+44 - UK in this example) are valid even if a specific region is selected (e.g., US). These numbers can be dialed from any country and if they match the local rules as defined by their dialing code, they are considered valid.

    No value

    Yes (checked)

    Any

    Valid

    This is a valid number since Accept empty is checked.

    No value

    No (unchecked)

    Any

    Invalid

    Invalid since empty values are not allowed as per the Accept empty setting.

    Remarks
    • Only the syntax of the phone number is checked. The step does not try to verify that the phone number exists or is assigned to a user.

    • Running validation on an Error value does not try to validate the value and instead produces an Error right away.

    Validate text length

    The Validate text length step allows you to validate if values in a string column match the specified length range (i.e., number of characters). Strings that are too long or too short are marked as invalid and will be shown in the data preview and written to a reject file when running the job. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Input column: required, select a string column to validate.

    • Minimum length: optional [1], specify the minimum[2] value length.

      • When not specified, all strings with at most Maximum length characters are considered valid.

    • Maximum length: optional [1], specify the maximum[2] value length.

      • When not specified, all strings with at least Minimum length characters are considered valid.

    • Accept empty values: configure whether empty values are considered empty or not. By default, this option is checked (i.e., empty values are valid).

    • Error message: configure custom error message that is associated with invalid values and is be displayed when hovering over an invalid value in the data preview. Default error message is Value length is outside of allowed range.

    Note that all characters, including trailing and leading white spaces, are counted when calculating the total number of characters in a string.

    1

    At least one of the boundaries must be specified.

    2

    The Maximum value must be greater than the Minimum value.

    Examples
    Input value Minimum length Maximum length Accept empty Values Input value length Result Description

    INV7854784

    8

    10

    Any

    10

    Valid

    Value is within the 8-10 range.

    INV00045565

    8

    10

    Any

    11

    Invalid

    Value is not within the 8-10 range.

    INV4589

    8

    10

    Any

    7

    Invalid

    Value is not within the 8-10 range.

    "  INV4589    "

    8

    10

    Any

    11

    Invalid

    Value is not within the 8-10 range because 2 leading and 2 trailing space characters are present.

    T

    1

    1

    Any

    1

    Valid

    Value has exactly 1 character.

    True

    1

    1

    Any

    4

    Invalid

    Value has more than 1 character.

    INV35

    -

    10

    Any

    5

    Valid

    Value has less than 10 characters.

    INV0056544589

    8

    -

    Any

    13

    Valid

    Value has more than 8 characters.

    No value

    Any

    Any

    Yes (checked)

    0

    Valid

    Valid because the Accept empty values option is checked.

    No value

    Any

    Any

    Yes (checked)

    0

    Invalid

    Invalid because the Accept empty values option is unchecked.

    Remarks
    • If you want to ensure that there are no trailing or leading spaces prior to the validation, use the Trim step first.

    Validate value range

    The Validate value range step allows you to validate if values in a column match the specified value range. All values that fail the validation are marked as validation errors in the data set and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Input column: required, select a column to validate. You can select any column with types that can be sorted - string, decimal, integer as well as date columns.

    • Minimum value: optional[1], enter the minimum[2] value. If not specified, the minimum is not tested (i.e., there is no lower bound for the value).

    • Maximum value: optional[1], enter the maximum[2] value. If not specified, the maximum is not tested (i.e., there is no upper bound for the value).

    • Include minimum: configure whether the minimum is included in the range or not (i.e., whether the range is closed or open on the lower bound). When checked, the minimum value is included in the range - values larger than or equal to the minimum are valid. When unchecked, only values larger than the minimum are valid. By default, this is checked.

    • Include maximum: configure whether the maximum is included in the range or not (i.e., whether the range is closed or open on the upper bound). When checked, the maximum value is included in the range - values smaller than or equal to the maximum are valid. When unchecked, only values smaller than the maximum are valid. By default, this is checked.

    • Ignore case for string values: configure whether string comparisons are case-sensitive or not. If checked, the value comparison is not case-sensitive. By default, this option is checked.

    • Accept empty values: configure whether empty values (see working with empty values for more information) are considered valid or not. If checked, empty values are considered valid. This option is checked by default.

    • Error message: configure a custom error message that will be associated with invalid values. The error message is displayed when hovering over an invalid value in the data preview and is included as the reject reason in the reject file. The default error message is Value is out of allowed range.

    Entering value range

    When configuring Minimum value and Maximum value for the step, the following rules apply depending on the data type:

    • Integer columns: do not use any digit grouping symbol (i.e., write 1234 instead of 1,234).

    • Decimal columns: you must use period as the decimal separator and you must not use any digit grouping symbols (i.e., write 1234.56 and not 1,234.56).

    • Date columns: dates must be entered in one of the following formats:

      • yyyy-MM-dd HH:mm:ss (e.g., 2023-08-01 08:09:07)

      • yyyy-MM-dd HH:mm (e.g., 2023-08-01 08:09)

      • yyyy-M-d H:m:s (e.g., 2023-8-1 8:9:7)

      • yyyy-M-d H:m (e.g., 2023-8-1 8:9)

      • yyyy-M-d (e.g., 2023-8-1) - in this case, the date is considered as 2023-08-01 0:00:00

    1

    At least one of the boundaries must be specified.

    2

    The Maximum value must be greater than the Minimum value.

    Examples
    String columns

    The following examples show how the step can be used when validating string columns.

    Input value Min value Max value Include minimum Include maximum Ignore case Accept empty values Result Description

    INV33333

    INV30000

    INV40000

    Yes (checked)

    Yes (checked)

    Yes (checked)

    Any

    Valid

    Value is within the specified range.

    inv33333

    INV30000

    INV40000

    Yes (checked)

    Yes (checked)

    No (unchecked)

    Any

    Invalid

    Value is not valid because the Ignore case for string values option is unchecked.

    INV30000

    INV30000

    INV40000

    Yes (checked)

    Yes (checked)

    Yes (checked)

    Any

    Valid

    Value is within the specified range because the Include minimum option is checked.

    INV30000

    INV30000

    INV40000

    No (unchecked)

    Yes (checked)

    Yes (checked)

    Yes (checked)

    Invalid

    Value is not within the specified range because the Include minimum option is unchecked.

    INV40000

    INV30000

    INV40000

    Yes (checked)

    Yes (checked)

    Yes (checked)

    Any

    Valid

    Value is within the specified range because the Include maximum option is checked.

    INV40000

    INV30000

    INV40000

    Yes (checked)

    No (unchecked)

    Yes (checked)

    Any

    Invalid

    Value is not within the specified range because the Include maximum option is unchecked.

    Decimal and integer columns

    The following table provides examples of the step usage for numeric columns - decimals or integers.

    Input value Min value Max value Include minimum Include maximum Ignore case Accept empty values Result Description

    9.99

    8.50

    14.50

    Yes (checked)

    Yes (checked)

    Any

    Any

    Valid

    Value is within the specified range.

    8.50

    8.50

    14.50

    Yes (checked)

    Yes (checked)

    Any

    Any

    Valid

    Value is within the specified range because the Include minimum option is checked.

    8.50

    8.50

    14.50

    No (unchecked)

    Yes (checked)

    Any

    Any

    Invalid

    Value is not within the specified range because the Include minimum option is unchecked.

    14

    8

    14

    Yes (checked)

    Yes (checked)

    Any

    Any

    Valid

    Value is within the specified range because the Include maximum option is checked.

    14

    8

    14

    Yes (checked)

    No (unchecked)

    Any

    Any

    Invalid

    Value is not within the specified range because the Include maximum option is unchecked.

    Date columns

    The following table provides examples of how to use the step to validate date columns.

    Input value Min value Max value Include minimum Include maximum Ignore case Accept empty values Result Description

    2023-03-03

    2023-01-01 00:00:00

    2023-07-01 00:00:00

    Yes (checked)

    Yes (checked)

    Any

    Any

    Valid

    Value is within the specified range.

    2023-01-01

    2023-01-01 00:00:00

    2023-07-01 00:00:00

    Yes (checked)

    Yes (checked)

    Any

    Any

    Valid

    Value is within the specified range because the Include minimum option is checked.

    2023-01-01

    2023-01-01 00:00:00

    2023-07-01 00:00:00

    No (unchecked)

    Yes (checked)

    Any

    Any

    Invalid

    Value is not within the specified range because the Include minimum option is unchecked.

    2023-07-01

    2023-01-01 00:00:00

    2023-07-01 00:00:00

    Yes (checked)

    Yes (checked)

    Any

    Any

    Valid

    Value is within the specified range because the Include maximum option is checked.

    2023-07-01

    2023-01-01 00:00:00

    2023-07-01 00:00:00

    Yes (checked)

    No (unchecked)

    Any

    Any

    Invalid

    Value is not within the specified range because the Include maximum option is unchecked.

    Empty values
    Input value Min value Max value Include minimum Include maximum Ignore case Accept empty values Result Description

    No value

    Any

    Any

    Any

    Any

    Any

    Yes (checked)

    Valid

    Valid because the Accept empty values option is checked.

    No value

    Any

    Any

    Any

    Any

    Any

    No (unchecked)

    Invalid

    Invalid because the Accept empty values option is unchecked.

    Validate with formula

    The Validate with formula step allows you to validate values in a column based on result of validation formula. If the validation formula returns true, the value is considered valid. Otherwise, the value is considered invalid, and the step allows you to configure a custom error message that will be reported in the data preview.

    Rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.

    Parameters
    • Formula: the validation formula that is calculated for each row in your data set. The formula can use any number of columns - it does not have to be limited to just one column in your data. The formula must return true or false.

    • Error message: an error message that will be reported on each column that is contained within the formula if the formula returned false. The error message is a static text - it cannot contain references to column names or any calculations.

    Usage

    The Validate with formula step works by evaluating the validation formula for each row in your data set. Based on the result (whether the formula returns true or false) it can mark values in a row as invalid.

    The values are considered invalid if the formula returned false. In such case, values from all columns that are referenced in the formula will be marked as invalid.

    step validate with formula
    Figure 62. Simple configuration to validate that last name is at least 1 character long.

    See below for additional examples of how to use the step to validate your data in common situations.

    Note that the Validate with formula step never changes your data in any way - it can attach a validation error to a value but cannot change the value itself.

    Common validation rules

    In many cases the validation rules you’d need for your data will be quite simple. This section provides overview of the most common types of validation and corresponding validation formulas. You can use these as guidelines for writing your own formulas for similar validations.

    Empty and non-empty values

    Frequently you’ll need to validate whether a column value is not empty - i.e., the value is required. The formula depends on the type of your column:

    • numeric columns (decimals or integers): $amount != null, same way for integer as well as decimal columns.

    • string columns: multiple different formulas exist depending on what is considered to be an empty string:

      • !isBlank($name) uses isBlank function to determine if the value is a zero-length string, null or if the string is composed entirely of whitespaces. isBlank also considers strings consisting of whitespaces to be empty - for example, a single space will be considered an empty string, too.

      • $name != null tests for null values only. Zero-length strings will be considered valid.

      • $name != "" tests zero-length strings only, null values will be considered valid.

      • $name != "" && $name != null tests whether the $name is zero-length or null. Any other string (including whitespaces) is considered valid.

    • date columns: $dueDate != null

    • boolean columns: $isActive != null

    Value in range

    You can use conditional expressions to test whether value is in specific range - whether the range is bounded on the lower bound only, upper bound only or both.

    • $amount > 0 to validate that the value is greater than 0. Note that this will fail if the value is null (empty). To reject empty values as well as values that are less than zero, you’d need to use a formula like $amount != null && $amount > 0.

    • $yearOfBirth >= 1900 && $yearOfBirth <= 2023 to test whether $yearOfBirth is between 1900 and 2023 (including both endpoints).

    Values matching patterns

    To match against patterns, regular expressions can be used like this:

    • $accountId ~ "A[0-9]+" will validate that the account id consists of upper case letter A followed by at least one digit.

    • matches($accountId, "A[0-9]+") this is an alternative way - using matches function instead of ~ operator. The result is the same as in the previous example.

    • matches($zipCode, "[0-9]{5}(-[0-9]{4})?") to verify that $zipCode is either 5-digit format or 5+4 digits.

    String length

    Often times you need to ensure that values are long enough or not too long. This can be accomplished by using length function:

    • length($lastName) > 1 to ensure that $lastName is at least one character long.

    • length($lastName) < 50 to ensure that $lastName is 50 characters at maximum.

    Values in lists (enumerations)

    To test whether value is in a list or not, you can use containsValue function. This works for all data types although string values are most common.

    • containsValue(["red", "green", "blue"], $color) test whether the $color is one of "red", "green" or "blue". The test is exact - i.e. "RED" is considered invalid.

    • containsValue(["red", "green", "blue"], lowerCase($color)) is similar to above but also will allow values like "RED" as valid values. The value of $color is first converted to lower-case using lowerCase function before it is tested against the list of allowed values.

    • containsValue([1, 2, 3], $accountType) validates whether $accountType is one of 1, 2 or 3. Note the list does not contain quotes - the values are integers and must not be quoted.

    If you wish to test value against a large list (more than handful of items), it is better to use Lookup step. Lookup step allows you to store even larger number of items to test against in a file or the values can be provided by a connector available in the Data Catalog.

    Complex validation rules

    The validation expression can be of any complexity. We recommend that you split the expression into multiple steps if it becomes too long or too complicated - this will improve the readability of your transformation.

    Multiple columns in formula

    It is possible to use more than one column when validating your data. For example, consider an invoice where you have columns for total without tax, tax amount and total amount with tax. All three values must match for the invoice to be considered valid. You can validate this with a formula like $totalBeforeTax + $tax == $totalWithTax. If this formula returns false, all three values will be marked as invalid since without additional information it is not possible to say which of the values is valid or not (i.e., is the tax calculated incorrectly and total amount is ok? Or is the total amount incorrect while the tax and total before tax are ok?).

    Remarks
    • If the calculation of a formula in the validation step failed for any reason, the value is marked as "in error" with the error message describing the validation formula failure.

    • If the validation formula returns null, an error message is generated and the value is marked as invalid.

    • Running validation on an Error value does not try to validate the value and instead produces an Error right away.