Baker Street 221b NW1 6XE London
Data validation (data quality) steps
Validation steps allow you to verify that your data is in the desired format. All values that fail the validation are highlighted in the dataset and marked as errors in the data preview to easily locate them. The number of errors is also included in the data quality bar.
When you run a job that includes data errors, all rows with invalid values are automatically excluded from the output and placed in a reject file. See Job Run Details for more information on the reject files.
For our recommendations on how to deal with data errors, see fixing errors.
List of validation steps:
-
Validate against list: validate if a value is in a list of allowed values.
-
Validate credit card: validate format of credit card numbers.
-
Validate email: validate format of email addresses.
-
Validate if not empty: validate if values are not empty.
-
Validate pattern match: validate if values match a pattern.
-
Validate phone number: validate format of phone numbers with support for international number formats.
-
Validate text length: validate text length.
-
Validate value range: validate if values are within specified range.
-
Validate with formula: validate data using your custom formula.
Validate against list
The Validate against list step allows you to validate values in a string column against a list of specified values (i.e., against enumeration). All values that are not found in the list are marked as invalid. Invalid values will be visible in the data preview and in the reject file. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Input column: required, select a string column to validate.
-
Values: required, enter the values to validate against. Values are entered as plain text with each value specified on a separate line. Lines are trimmed (their leading and trailing spaces are removed) before they are considered. Ordering of values does not matter.
-
Ignore case: checked by default. When checked, the validation is not case-sensitive (i.e., lower-case and upper-case letter are considered to be the same). If unchecked, value comparison is case sensitive, so "red" is not the same as "RED".
-
Accept empty values: checked by default. When checked, empty values (see working with empty values) are considered valid. If not checked, empty values are considered invalid, and the row will be rejected.
-
Error message: required, configure an error message that will be associated with invalid values. The error message is displayed when hovering over an identified empty value in the data set and included as the reject reason in the reject file. Default error message is Value is not in the allowed values list.
Examples
In the following examples we will test whether currency code is EUR or USD.
Input value | Ignore case | Accept empty values | Result | Description |
---|---|---|---|---|
|
Yes (checked) |
Any |
Valid |
Value is in the list of valid values. |
|
Yes (checked) |
Any |
Valid |
Value is in the list of valid values and the case is ignored. |
|
No (unchecked) |
Any |
Invalid |
Value is in the list of valid values but the Ignore case option is unchecked. |
|
Yes (checked) |
Any |
Invalid |
Value is not in the list of valid values. |
No value |
Any |
Yes (checked) |
Valid |
Valid because the Accept empty values option is checked. |
No value |
Any |
No (unchecked) |
Invalid |
Invalid because the Accept empty values option is unchecked. |
" " (4 spaces) |
Any |
Yes (checked) |
Invalid |
String consisting of all white spaces is not considered empty and also does not appear in the list of allowed values. |
Remarks
-
If you have multi-line values or values with tab characters to validate, you can use escape sequences (
\n
,\t
, etc.). For example enterBaker Street 221b\nNW1 6XE\nLondon
to validate the following multi-line value:
-
To use a backslash itself as a delimiter, enter two backslashes.
Validate credit card
The Validate credit card step allows you to validate credit card numbers. The step validates the format of the number using the Luhn algorithm. The step can, therefore, detect card numbers with typos or missing digits, but it does not try to contact any credit card provider to verify that the credit card exists.
All values that fail the validation are marked as errors in the dataset to easily identify them in the data preview, and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Input column: required, a string or integer column containing credit card numbers with one card per row.
-
Accept empty values: configure whether an empty value is considered a valid credit card. Empty value is a value that is either
null
(shown as No value in data preview), zero-length string or a string composed entirely of whitespaces. By default, this is checked (i.e., empty values are considered valid).
Examples
Input value | Accept empty values | Result | Description |
---|---|---|---|
|
Any |
Valid |
Credit card written without any delimiters is accepted. |
|
Any |
Invalid |
In this case the checksum digit is not valid (last digit in the number). |
|
Any |
Valid |
Cards with en dashes (-) are accepted. |
|
Any |
Invalid |
Cards with em dashes (—) are considered invalid. |
|
Any |
Valid |
Cards with spaces are accepted. |
|
Any |
Valid |
Position and number of en dashes (-) and/or spaces do not impact the validity of the card number. |
|
Any |
Invalid |
Star symbol is not allowed as a delimiter and this number is, therefore, considered invalid. |
No value |
Yes (checked) |
Valid |
This is a valid number since Accept empty values is checked. |
No value |
No (unchecked) |
Invalid |
Invalid since empty values are not allowed as per Accept empty setting. |
Error |
Any |
Error |
Calling the step on an Error will not run any validation and return an Error. |
Remarks
-
Only the syntax of the credit card number is verified. When checking the syntax, spaces and en dashes (-) are ignored. Any other delimiter characters will cause the card number to be considered invalid.
-
The step does not attempt to contact any credit card or payment provider to verify the existence of the credit card number.
-
Running validation on an Error value does not try to validate the value and instead produces an Error right away.
Validate email
The Validate email step allows you to validate syntax of an email address. The step only verifies the syntax of the email address - for example, that it has proper local and domain names, that it contains the "@" symbol etc. This step does not try to send an email to verify the existence of the email address.
All values that fail the validation are marked as errors in the dataset to easily identify them in the data preview, and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Input column: required, a string column containing email addresses with one address per row.
-
Accept empty values: configure whether an empty value is considered a valid email. Empty value is a value that is either
null
(shown as No value in data preview), zero-length string or a string composed entirely of whitespaces. By default, this is checked (i.e., empty emails are considered valid).
Examples
Input value | Accept empty values | Result | Description |
---|---|---|---|
|
Any |
Valid |
A valid email. |
Any |
Valid |
The step strips out leading and trailing whitespaces before validation so this is a valid email. |
|
|
Any |
Invalid |
Email address cannot contain whitespaces in the middle - these are not removed during the validation. |
|
Any |
Invalid |
An invalid email missing the domain part. |
No value |
Yes (checked) |
Valid |
This is valid email since Accept empty values is checked. |
No value |
No (unchecked) |
Invalid |
Invalid since empty values are not allowed as per Accept empty values setting. |
Remarks
-
The step only validates syntax of the email. It does not verify that the email (or even its domain) exists.
-
Running validation on an Error value does not try to validate the value and instead produces an Error right away.
Validate if not empty
The Validate if not empty step allows you to identify all empty values (see working with empty values) in the specified columns and mark them as validation errors. All invalid rows are automatically rejected at the end of the job and written to the reject file. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Columns to validate: required, select one or more columns to test whether they are empty or not. If multiple columns are selected, all of them must have non-empty values for the row to be considered valid.
-
Error message: specify the error message that will be associated with the validated values of the columns that are empty. If multiple columns have empty values, all of them will be marked as invalid. This error message will be displayed when hovering over an identified empty value in the data set and included as the reject reason in the reject file. It defaults to "Value cannot be empty".
Examples
The step was configured to identify empty values in a column containing email addresses.
Input value | Result | Description |
---|---|---|
|
Valid |
The value is not empty. |
No value |
Invalid |
An empty value is identified. |
|
Valid |
Space characters are not considered empty values. Use |
Validate pattern match
The Validate pattern match allows you to validate values in a string column using a pattern defined by a regular expression. You can read more about regular expressions in CloverDX here.
All values that fail the validation are marked as errors in the data set to easily identify them in the data preview. Rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Input column: required, select a string column to validate.
-
Pattern: required, the regular expression to use in the validation.
-
Accept empty values: checked by default. When checked, empty values (see working with empty values) are considered valid. When not checked, empty values are reported as validation errors.
-
Error message: specify the error message that will be associated with the validated value. This message will be displayed when hovering over an invalid value in the dataset or included as the reject reason in the reject file. The default value is Value does not match regular expression.
Examples
Validation using regular expression patterns is frequently used to validate basic data patterns - for example various identification numbers, date formats and more.
In the following text you can see few basic example patterns that should help you get an idea about how the step works. For more details about how to write complex regular expressions, we recommend looking online for regular expression tutorials. Note that there are many types of regular expression engines that use slightly different syntax. CloverDX uses Java regular expression syntax.
-
Social security numbers: social security numbers are 9-digit numbers frequently written in three groups - for example 123-45-6789. To validate that a text column matches this format, use regular expression pattern
[0-9]{3}-[0-9]{2}-[0-9]{4}
. Alternatively, you can use\d{3}-\d{2}-\d{4}
. -
Tax identification number in Czech Republic starts with letters CZ and then contains 8 to 10 digits. This can be validated using pattern
CZ[0-9]{8,10}
orCZ\d{8.10}
. Note that regular expressions are case-sensitive, so this expression will not match valuecz12345678
while it will matchCZ12345678
. If you wish to also accept lower-case form of the number, you can use(?i)
modifier to enable case-insensitive matching:(?i)CZ[0-9]{8,10}
. -
National Insurance Number in UK is formatted as two prefix letters, six digits and one suffix letter. To validate this in a simple manner, you could use expression like
[A-Z]{2}[0-9]{6}[A-Z]
. However, this will be only very rough validation since not all letters are allowed to be used in prefix or suffix. A more complete pattern would need to exclude certain letters to get[A-Z&&[^DFIQUV]][A-Z&&[^DFIQUVO]][0-9]{6}[A-D]
.
Note that when using regular expressions, you might be tempted to write regular expressions for credit cards or emails. In those cases we recommend using the Validate credit card or Validate email steps instead - they are more powerful and precise and will provide better error message than pure regular expression.
Input value | Regular expression | Accept empty values | Result |
---|---|---|---|
|
|
Yes |
Valid |
"" (empty value) |
|
Yes |
Valid |
"" (empty value) |
|
No |
Invalid - empty values are not accepted. |
null |
(any) |
Yes |
Valid - empty values are accepted. |
null |
(any) |
No |
Invalid - empty values are not accepted. |
"XYZ" |
|
Yes |
Invalid. Validation applies to the whole value. |
"XYZ" |
|
Yes |
Valid. We explicitly allow any characters before and after X. |
Remarks
-
Validation always applies to whole input value. To match a substring, use wildcards like
.*
before and after your pattern. -
Pattern matching is case-sensitive. To perform case-insensitive matching, add
(?i)
at the beginning of your pattern.
Validate phone number
The Validate phone number step allows you to validate phone numbers. The step understands the phone number formats and regional differences between phone numbers (like the number of digits, prefixes, …) and will verify that the number follows the rules for the selected region.
This step does not try to validate whether the number exists - it validates the format of the number only.
All values that fail the validation are marked as errors in the dataset to easily identify them in the data preview, and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Input column: required, a string or integer column containing phone numbers with one number per row.
-
Region: select which region to use when validating the numbers. The region defines rules that the phone numbers must follow - for example, the number of digits, prefixes, area codes, and more. Typically, regions represent countries - like the Unites States, Germany, and many others. One special region - None - is provided in case you do not have any specific regional requirements for the numbers. This region requires phone numbers to have international dialing the prefix (e.g., +1 etc.).
-
Accept empty: configure whether an empty value is considered a valid phone number. An empty value is a value that is either
null
(shown as No value in data preview), a zero-length string, or a string composed entirely of whitespaces. By default, this is checked (i.e., empty values are considered valid).
Examples
Input value | Accept empty values | Region | Result | Description |
---|---|---|---|---|
|
Any |
US |
Valid |
Valid US phone number. |
|
Any |
US |
Valid |
Valid US phone number - whitespaces before and in the middle do not affect the validity of the number. |
|
Any |
US |
Valid |
Numbers with international dialing code (+44 - UK in this example) are valid even if a specific region is selected (e.g., US). These numbers can be dialed from any country and if they match the local rules as defined by their dialing code, they are considered valid. |
No value |
Yes (checked) |
Any |
Valid |
This is a valid number since Accept empty is checked. |
No value |
No (unchecked) |
Any |
Invalid |
Invalid since empty values are not allowed as per the Accept empty setting. |
Remarks
-
Only the syntax of the phone number is checked. The step does not try to verify that the phone number exists or is assigned to a user.
-
Running validation on an Error value does not try to validate the value and instead produces an Error right away.
Validate text length
The Validate text length step allows you to validate if values in a string column match the specified length range (i.e., number of characters). Strings that are too long or too short are marked as invalid and will be shown in the data preview and written to a reject file when running the job. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Input column: required, select a string column to validate.
-
Minimum length: optional [1], specify the minimum[2] value length.
-
When not specified, all strings with at most Maximum length characters are considered valid.
-
-
Maximum length: optional [1], specify the maximum[2] value length.
-
When not specified, all strings with at least Minimum length characters are considered valid.
-
-
Accept empty values: configure whether empty values are considered empty or not. By default, this option is checked (i.e., empty values are valid).
-
Error message: configure custom error message that is associated with invalid values and is be displayed when hovering over an invalid value in the data preview. Default error message is Value length is outside of allowed range.
Note that all characters, including trailing and leading white spaces, are counted when calculating the total number of characters in a string.
Examples
Input value | Minimum length | Maximum length | Accept empty Values | Input value length | Result | Description |
---|---|---|---|---|---|---|
|
8 |
10 |
Any |
10 |
Valid |
Value is within the 8-10 range. |
|
8 |
10 |
Any |
11 |
Invalid |
Value is not within the 8-10 range. |
|
8 |
10 |
Any |
7 |
Invalid |
Value is not within the 8-10 range. |
|
8 |
10 |
Any |
11 |
Invalid |
Value is not within the 8-10 range because 2 leading and 2 trailing space characters are present. |
|
1 |
1 |
Any |
1 |
Valid |
Value has exactly 1 character. |
|
1 |
1 |
Any |
4 |
Invalid |
Value has more than 1 character. |
|
- |
10 |
Any |
5 |
Valid |
Value has less than 10 characters. |
|
8 |
- |
Any |
13 |
Valid |
Value has more than 8 characters. |
No value |
Any |
Any |
Yes (checked) |
0 |
Valid |
Valid because the Accept empty values option is checked. |
No value |
Any |
Any |
Yes (checked) |
0 |
Invalid |
Invalid because the Accept empty values option is unchecked. |
Remarks
-
If you want to ensure that there are no trailing or leading spaces prior to the validation, use the Trim step first.
Validate value range
The Validate value range step allows you to validate if values in a column match the specified value range. All values that fail the validation are marked as validation errors in the data set and rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files and fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Input column: required, select a column to validate. You can select any column with types that can be sorted - string, decimal, integer as well as date columns.
-
Minimum value: optional[1], enter the minimum[2] value. If not specified, the minimum is not tested (i.e., there is no lower bound for the value).
-
Maximum value: optional[1], enter the maximum[2] value. If not specified, the maximum is not tested (i.e., there is no upper bound for the value).
-
Include minimum: configure whether the minimum is included in the range or not (i.e., whether the range is closed or open on the lower bound). When checked, the minimum value is included in the range - values larger than or equal to the minimum are valid. When unchecked, only values larger than the minimum are valid. By default, this is checked.
-
Include maximum: configure whether the maximum is included in the range or not (i.e., whether the range is closed or open on the upper bound). When checked, the maximum value is included in the range - values smaller than or equal to the maximum are valid. When unchecked, only values smaller than the maximum are valid. By default, this is checked.
-
Ignore case for string values: configure whether string comparisons are case-sensitive or not. If checked, the value comparison is not case-sensitive. By default, this option is checked.
-
Accept empty values: configure whether empty values (see working with empty values for more information) are considered valid or not. If checked, empty values are considered valid. This option is checked by default.
-
Error message: configure a custom error message that will be associated with invalid values. The error message is displayed when hovering over an invalid value in the data preview and is included as the reject reason in the reject file. The default error message is Value is out of allowed range.
When configuring Minimum value and Maximum value for the step, the following rules apply depending on the data type:
-
Integer columns: do not use any digit grouping symbol (i.e., write 1234 instead of 1,234).
-
Decimal columns: you must use period as the decimal separator and you must not use any digit grouping symbols (i.e., write 1234.56 and not 1,234.56).
-
Date columns: dates must be entered in one of the following formats:
-
yyyy-MM-dd HH:mm:ss
(e.g.,2023-08-01 08:09:07
) -
yyyy-MM-dd HH:mm
(e.g.,2023-08-01 08:09
) -
yyyy-M-d H:m:s
(e.g.,2023-8-1 8:9:7
) -
yyyy-M-d H:m
(e.g.,2023-8-1 8:9
) -
yyyy-M-d
(e.g.,2023-8-1
) - in this case, the date is considered as2023-08-01 0:00:00
-
Examples
The following examples show how the step can be used when validating string columns.
Input value | Min value | Max value | Include minimum | Include maximum | Ignore case | Accept empty values | Result | Description |
---|---|---|---|---|---|---|---|---|
|
INV30000 |
INV40000 |
Yes (checked) |
Yes (checked) |
Yes (checked) |
Any |
Valid |
Value is within the specified range. |
|
INV30000 |
INV40000 |
Yes (checked) |
Yes (checked) |
No (unchecked) |
Any |
Invalid |
Value is not valid because the Ignore case for string values option is unchecked. |
|
INV30000 |
INV40000 |
Yes (checked) |
Yes (checked) |
Yes (checked) |
Any |
Valid |
Value is within the specified range because the Include minimum option is checked. |
|
INV30000 |
INV40000 |
No (unchecked) |
Yes (checked) |
Yes (checked) |
Yes (checked) |
Invalid |
Value is not within the specified range because the Include minimum option is unchecked. |
|
INV30000 |
INV40000 |
Yes (checked) |
Yes (checked) |
Yes (checked) |
Any |
Valid |
Value is within the specified range because the Include maximum option is checked. |
|
INV30000 |
INV40000 |
Yes (checked) |
No (unchecked) |
Yes (checked) |
Any |
Invalid |
Value is not within the specified range because the Include maximum option is unchecked. |
The following table provides examples of the step usage for numeric columns - decimals or integers.
Input value | Min value | Max value | Include minimum | Include maximum | Ignore case | Accept empty values | Result | Description |
---|---|---|---|---|---|---|---|---|
|
8.50 |
14.50 |
Yes (checked) |
Yes (checked) |
Any |
Any |
Valid |
Value is within the specified range. |
|
8.50 |
14.50 |
Yes (checked) |
Yes (checked) |
Any |
Any |
Valid |
Value is within the specified range because the Include minimum option is checked. |
|
8.50 |
14.50 |
No (unchecked) |
Yes (checked) |
Any |
Any |
Invalid |
Value is not within the specified range because the Include minimum option is unchecked. |
|
8 |
14 |
Yes (checked) |
Yes (checked) |
Any |
Any |
Valid |
Value is within the specified range because the Include maximum option is checked. |
|
8 |
14 |
Yes (checked) |
No (unchecked) |
Any |
Any |
Invalid |
Value is not within the specified range because the Include maximum option is unchecked. |
The following table provides examples of how to use the step to validate date columns.
Input value | Min value | Max value | Include minimum | Include maximum | Ignore case | Accept empty values | Result | Description |
---|---|---|---|---|---|---|---|---|
|
2023-01-01 00:00:00 |
2023-07-01 00:00:00 |
Yes (checked) |
Yes (checked) |
Any |
Any |
Valid |
Value is within the specified range. |
|
2023-01-01 00:00:00 |
2023-07-01 00:00:00 |
Yes (checked) |
Yes (checked) |
Any |
Any |
Valid |
Value is within the specified range because the Include minimum option is checked. |
|
2023-01-01 00:00:00 |
2023-07-01 00:00:00 |
No (unchecked) |
Yes (checked) |
Any |
Any |
Invalid |
Value is not within the specified range because the Include minimum option is unchecked. |
|
2023-01-01 00:00:00 |
2023-07-01 00:00:00 |
Yes (checked) |
Yes (checked) |
Any |
Any |
Valid |
Value is within the specified range because the Include maximum option is checked. |
|
2023-01-01 00:00:00 |
2023-07-01 00:00:00 |
Yes (checked) |
No (unchecked) |
Any |
Any |
Invalid |
Value is not within the specified range because the Include maximum option is unchecked. |
Input value | Min value | Max value | Include minimum | Include maximum | Ignore case | Accept empty values | Result | Description |
---|---|---|---|---|---|---|---|---|
No value |
Any |
Any |
Any |
Any |
Any |
Yes (checked) |
Valid |
Valid because the Accept empty values option is checked. |
No value |
Any |
Any |
Any |
Any |
Any |
No (unchecked) |
Invalid |
Invalid because the Accept empty values option is unchecked. |
Validate with formula
The Validate with formula step allows you to validate values in a column based on result of validation formula. If the validation formula returns true, the value is considered valid. Otherwise, the value is considered invalid, and the step allows you to configure a custom error message that will be reported in the data preview.
Rows with these errors are automatically rejected and included in the reject file when running the job. See Job Run Details for more information on reject files, and refer to fixing errors for our recommendations on how to deal with data errors.
Parameters
-
Formula: the validation formula that is calculated for each row in your data set. The formula can use any number of columns - it does not have to be limited to just one column in your data. The formula must return
true
orfalse
. -
Error message: an error message that will be reported on each column that is contained within the formula if the formula returned
false
. The error message is a static text - it cannot contain references to column names or any calculations.
Usage
The Validate with formula step works by evaluating the validation formula for each row in your data set. Based on the result (whether the formula returns true
or false
) it can mark values in a row as invalid.
The values are considered invalid if the formula returned false
. In such case, values from all columns that are referenced in the formula will be marked as invalid.
See below for additional examples of how to use the step to validate your data in common situations.
Note that the Validate with formula step never changes your data in any way - it can attach a validation error to a value but cannot change the value itself.
In many cases the validation rules you’d need for your data will be quite simple. This section provides overview of the most common types of validation and corresponding validation formulas. You can use these as guidelines for writing your own formulas for similar validations.
Frequently you’ll need to validate whether a column value is not empty - i.e., the value is required. The formula depends on the type of your column:
-
numeric columns (decimals or integers):
$amount != null
, same way for integer as well as decimal columns. -
string columns: multiple different formulas exist depending on what is considered to be an empty string:
-
!isBlank($name)
usesisBlank
function to determine if the value is a zero-length string,null
or if the string is composed entirely of whitespaces.isBlank
also considers strings consisting of whitespaces to be empty - for example, a single space will be considered an empty string, too. -
$name != null
tests fornull
values only. Zero-length strings will be considered valid. -
$name != ""
tests zero-length strings only,null
values will be considered valid. -
$name != "" && $name != null
tests whether the$name
is zero-length ornull
. Any other string (including whitespaces) is considered valid.
-
-
date columns:
$dueDate != null
-
boolean columns:
$isActive != null
You can use conditional expressions to test whether value is in specific range - whether the range is bounded on the lower bound only, upper bound only or both.
-
$amount > 0
to validate that the value is greater than 0. Note that this will fail if the value isnull
(empty). To reject empty values as well as values that are less than zero, you’d need to use a formula like$amount != null && $amount > 0
. -
$yearOfBirth >= 1900 && $yearOfBirth <= 2023
to test whether$yearOfBirth
is between 1900 and 2023 (including both endpoints).
To match against patterns, regular expressions can be used like this:
-
$accountId ~ "A[0-9]+"
will validate that the account id consists of upper case letter A followed by at least one digit. -
matches($accountId, "A[0-9]+")
this is an alternative way - usingmatches
function instead of~
operator. The result is the same as in the previous example. -
matches($zipCode, "[0-9]{5}(-[0-9]{4})?")
to verify that$zipCode
is either 5-digit format or 5+4 digits.
Often times you need to ensure that values are long enough or not too long. This can be accomplished by using length
function:
-
length($lastName) > 1
to ensure that$lastName
is at least one character long. -
length($lastName) < 50
to ensure that$lastName
is 50 characters at maximum.
To test whether value is in a list or not, you can use containsValue
function. This works for all data types although string values are most common.
-
containsValue(["red", "green", "blue"], $color)
test whether the$color
is one of "red", "green" or "blue". The test is exact - i.e. "RED" is considered invalid. -
containsValue(["red", "green", "blue"], lowerCase($color))
is similar to above but also will allow values like "RED" as valid values. The value of$color
is first converted to lower-case usinglowerCase
function before it is tested against the list of allowed values. -
containsValue([1, 2, 3], $accountType)
validates whether$accountType
is one of 1, 2 or 3. Note the list does not contain quotes - the values are integers and must not be quoted.
If you wish to test value against a large list (more than handful of items), it is better to use Lookup step. Lookup step allows you to store even larger number of items to test against in a file or the values can be provided by a connector available in the Data Catalog. |
The validation expression can be of any complexity. We recommend that you split the expression into multiple steps if it becomes too long or too complicated - this will improve the readability of your transformation.
It is possible to use more than one column when validating your data. For example, consider an invoice where you have columns for total without tax, tax amount and total amount with tax. All three values must match for the invoice to be considered valid. You can validate this with a formula like $totalBeforeTax + $tax == $totalWithTax
. If this formula returns false
, all three values will be marked as invalid since without additional information it is not possible to say which of the values is valid or not (i.e., is the tax calculated incorrectly and total amount is ok? Or is the total amount incorrect while the tax and total before tax are ok?).
Remarks
-
If the calculation of a formula in the validation step failed for any reason, the value is marked as "in error" with the error message describing the validation formula failure.
-
If the validation formula returns
null
, an error message is generated and the value is marked as invalid. -
Running validation on an Error value does not try to validate the value and instead produces an Error right away.