Transforming data

End user’s guide > Wrangler user guide > Transforming data

Wrangler allows you to build data transformations in an interactive way without having to learn coding. As you work with your data, you will be able to see real-time preview of your transformation on a data sample. Data sample is selected from the beginning of your data set as provided by the connector. By default, first 1000 rows are included in the sample.

My Jobs screen

My Jobs screen is the "front page" of the Wrangler - it is a place which allows you to quickly see which jobs you have in your workspace, run them, view their results and more.

Figure 37. My Jobs page showing jobs and their history.

To learn more about various statuses and how to work with reject files, see the Running jobs in Wrangler section below.

Creating your job

To create a transformation, click on the Create new job button on the My Jobs page or Use in a new job button in the Data Catalog or My Sources.

When using the Create new job button, you’ll first have to pick your data source. You can either pick an existing data source that you have already added to My Sources, or you can drop a new file at the top of your screen.

Figure 38. Selecting a data source when creating a new job

Regardless of how you create your data source, Wrangler will show you preview of the data, and you’ll be able to continue to create your transformation by pressing the Select button in the top-right corner.

Duplicate job

To create a copy of an existing job use the Duplicate job action, which is available under the ⋮ menu on the right side of each job on the My Jobs page. The duplicated job is created with the same source, transformation steps, and target configuration as the original job. The new job is added right above the original job.

Job Export and Import

To share jobs between users, jobs can be exported and imported. To export a job, use the Export job action, which is available under the ⋮ menu on the right side of each job on the My Jobs page.

Figure 39. Job export dialog.

You can optionally select to include the data source and lookup files (this option applies to CSV file sources only). The export will generate a single ZIP file.

To import a job use the Import job button in the top-right corner of the My Jobs page. Drag and drop or upload a job ZIP file and optionally select if you want to overwrite the job if it already exists. The imported job will be added to the bottom of the job list.

Figure 40. Job import dialog.

Test run

The Test run option is available when your job writes data into a data target connector. Use this option to perform a trial run to see how many records would be written to the target interaface and how many would be rejected. If your target is configured to generate a reject file, you can also download the reject file to review further reject details. For more information on reject files, see Job run details. Note that only values from mapped columns are included in the number of rejected records and in the related reject file. For more information on target mapping, refer to the Target mapping section.

Working with transformations

When working with data in Wrangler you will be creating jobs that define transformations to apply to your data. Data transformations are edited in an interactive way in transformation editor screen which shows you large preview of your data and allows you to quickly add, modify or remove transformation steps. Each step defines an operation with your data set - for example "add column", "sort data", "calculate formula" and so on.

Figure 41. Transformation editor screen.

Transformation editor screen consists of several parts that allow you navigate and work with your data:

Toolbar with actions that change depending on the selected column and other context. Toolbar allows quick access to transformation steps you can apply to your data.
Data preview is the largest part of the screen. It shows you what your data looks like after each step. The preview will change dynamically depending on the steps you add to your transformation. The preview table has a header which provides useful information about your data:
Job overview diagram at the top of the screen. It provides quick overview of what your job looks like. Clicking on Source or Target icons allows you to change settings of your data source or your data target.
Steps sidebar shows you the steps you’ve added to your transformation so far. Clicking on different steps in the sidebar allows you to move to different part of your transformation and review what your data looks like after given transformation step.

Data quality bar

The Data quality bar is displayed in the data header preview right under the technical column name. To display it, hover over a column name. It shows you how many values in the sample are valid, invalid or empty. It can help you quickly estimate quality of your data.

Using Steps sidebar

Steps sidebar shows you the operations that are applied to your data starting with the first step at the top going down towards steps with higher number.

Selecting a step in the sidebar will update data preview to show the data as it looks like after the selected step is performed. This way, you can go to different time in your transformation simply by selecting different steps in the sidebar.

You can add steps to your job in few different ways - by selecting them from a column menu, from the toolbar, or by clicking the Add step button in the Steps sidebar.

Both context menu and the toolbar react to currently selected column and may change the steps that you see there. The step list displayed when you click on Add step button will show all steps and will allow you to search based on a step name or its description so it is the best option if you are not sure which step to use.

You can add steps to your job in few different ways - by selecting them for column’s context menu or by clicking Add step button in the Steps sidebar.

For example, renaming a column can be added in following ways that will all lead to the same result:

Right-click on a column (or open the menu by clicking on the three dots in column header) and select Column actions > Rename column step from the context menu.
Click on the column in data preview to select it and then on toolbar select Column actions (3^rd icon) > Rename column.
Click on the Add step button in the Steps sidebar and find the Rename column step in the list of available steps. You can use search at the top to search steps based on their name or description.

Once a step is added, you will immediately see your data change and the preview will show what your data looks like after the transformation step is applied. You can go to previous steps simply by selecting them from the step list in the sidebar.

Wrangler supports many different types of steps that allow you to manipulate your data. For a complete list, please refer to steps reference.

Besides steps you can also add notes that do not apply any transformation to your data but can be used to add explanations to the list of steps to help anyone reading the transformation to understand it better.

Reorder steps

You can reorder a step, note or group by drag and dropping it. You can also move steps within a group or drag them outside of the group for further organization. To revert the changes, use the Undo button in the left upper corner.

Reordering of steps might result in invalid step configuration. Consider the sequence of the steps and step dependencies. Some steps might need to be adjusted after moving them into a different position.

In the example below one step renames a column to "Date1", and the second step adds a new column after it. If these two steps are switched, the first step fails because the column referenced in it is no longer valid. The step needs to be updated and the reference corrected.

Organize steps with groups

You can enhance your transformation workflow by organizing steps into groups.

You can drag and drop existing steps into a group or directly add new ones within it.
Steps within a group execute sequentially, similar to ungrouped steps. For a cleaner view, collapse groups by clicking the Up arrow next to the group name.
To quickly remove steps from a group, click on the three-dot menu on the right and select Ungroup. This removes the group and all steps are moved back to the main step outline.
You can set a single condition for the entire group, affecting all the steps it contains. This streamlines your workflow by managing step execution collectively. To learn more about enabling and using group conditions, see Enabling group conditions.

When a group is deleted, all assigned steps are removed as well. If you want to keep all assigned steps, use the Ungroup feature.

Step options

Each step offers multiple options that are shown next to it when you hover your mouse over the step or when the step is selected:

Figure 42. Contracted Steps options

Figure 43. Steps options on high resolution screens or when the Steps panel width is adjusted

Edit a step (pencil icon) will open a step editor in the sidebar. Layout of the editor depends on the step you selected.
Disable a step (stop sign icon) allows you to disable the step so that it no longer applies to your data. Note that disabling a step may invalidate the rest of your transformation - for example if you disable a step that adds a column which is used later in the transformation. Disabled step shows in gray like this:
Duplicate a step (copy icon) creates a copy of the step and places it right under the original step. Disabled steps can be duplicated as well.
Delete a step (trash can icon) allows you to delete the step from the list of steps. Just like when disabling a step, deleting a step in the middle of the transformation can make the rest of the transformation invalid if it depends on the step result in any way.
Set condition (option is available in the three-dot menu) allows you to set step conditions. For more information, see Step and group conditions.

Data types in Wrangler

Each column in Wrangler has its data type that defines how the data in that column is stored and what kind of operations can be applied to it. Data types are visible in the header of data preview:

Figure 44. Transformation editor screen showing columns of different types.

Wrangler supports 5 data types: integer, decimal, string, date and boolean. Conversions between different data types can be done using Data conversion steps.

Integer data type

Integer columns can store whole numbers. Smallest integer value is -9223372036854775808 while the largest integer number is 9223372036854775807 (i.e., Wrangler’s integers use 64 bits and correspond to CTL long data type).

Decimal data type

Decimal columns store decimal numbers with fixed precision of 32.10 - i.e., they can have 32 significant digits with 10 of those digits after a decimal point. The smallest decimal value is -9999999999999999999999.9999999999 and the largest value is 9999999999999999999999.9999999999.

You can store up to 10 decimal places in decimal columns. Decimal places after 10th decimal digit will be ignored - i.e. writing 1.1234567890555 will result in 1.123456789.

Decimal in Wrangler corresponds to CTL type decimal with length set to 32 and scale set to 10.

Date data type

Date columns in Wrangler store both date and time - there are no separate types to store just the date or just the time. The dates are stored with millisecond precision and you can use date formatting options to configure what the dates look like.

Date columns can store any date between (approximately) 290 million years ago to 290 million years in the future.

When entering dates in formulas or in target mapping, dates need to be entered in a specific format. See Working with dates for more information.

String data type

String columns represent text data in Wrangler. They can store text in any language (UTF-16 is used to store the data). You can store strings up to 2.1 billion characters long (less for complex languages like Chinese or Japanese).

Data containing multi-line strings will be displayed with a preview of the first line followed by ellipses (…). To view the full content, simply hover your cursor over the value.

Figure 45. Example of a multi-line string

Boolean data type

Boolean columns store results of logical expressions and can store a "yes" or "no" value (shown as true or false).

Empty values

Besides storing values, each column in Wrangler can also store empty value (sometimes called null values). Empty values are shown as No value in the data preview:

Figure 46. Sample data preview showing display of empty values.

Note the String column which shows two different kinds of empty value - a null (shown as No value) and empty string (shown as cell with no text inside it).

Empty values must be handled separately if you’d like to perform operations on them. Steps in Wrangler will usually ignore empty values but in some cases may return errors. See documentation for each step to see how it handles empty values.

Formatting your data

When data preview is displayed in Wrangler, each column can have its own display format that defines how the data is shown. Depending on the source, initial format of the column is either provided by the source (in case the data comes from a connector) or is detected by Wrangler (in case of CSV files).

The formatting you select for a column will affect how your data is written to the output file. For CSV files, the output will match the formatting exactly. For Excel data target, the formatting will be configured on given column in the spreadsheet so the format will be the same as well even though the actual data written may have greater precision. See more details in Excel target section.

The formatting options are accessible by selecting Change display format from the column context menu. Note that this option is not available for string and boolean columns - it only applies to date, integer and decimal columns.

Integer formatting

Integer columns offer the following options for formatting:

Format defines the pattern that tells the formatter how many digits and how to write them.

Thousands separator allows you to select character to use between groups when the pattern specifies a format that groups digits (e.g., it allows you to configure whether to write larger numbers as "1 234" or "1,234").

Pattern uses a set of special characters that are interpreted by the formatter:

Character Meaning

Character	Meaning
`0`	A digit. Leading zero is written out.
`#`	A digit. Leading zero is omitted.
`,` (comma)	Grouping separator
`-` (dash, minus)	Minus sign

0

A digit. Leading zero is written out.

#

A digit. Leading zero is omitted.

, (comma)

Grouping separator

- (dash, minus)

Minus sign

Examples: consider numbers -1234 and 1234 and the following patterns with English (United states) setting for Thousands separator (i.e., a comma is used as separator character):

Pattern	Output for 1234	Output for -1234
(no specific format)	1234	-1234
`#,###`	1,234	-1,234
`000,000`	001,234	-001,234
`$###,##0`	$1,234	-$1,234
`$###,##0;($###,##0)`	$1,234	($1,234)

For more information about number formatting options see Numeric formats.

Decimal formatting

Decimal columns offer following formatting options:

Format defines the pattern that tells the formatter how many digits and how to write them. Thousands separator / Decimal mark parameter allows you to select character to use between groups and as a decimal separator.

Pattern uses a set of special characters that are interpreted by the formatter:

Character Meaning

Character	Meaning
`0`	A digit. Leading zero before decimal point or trailing zero after decimal point is written out.
`#`	A digit. Leading zero before decimal point or trailing zero after decimal point is omitted.
`.` (period)	Decimal separator.
`,` (comma)	Grouping separator
`-` (dash, minus)	Minus sign

0

A digit. Leading zero before decimal point or trailing zero after decimal point is written out.

#

A digit. Leading zero before decimal point or trailing zero after decimal point is omitted.

. (period)

Decimal separator.

, (comma)

Grouping separator

- (dash, minus)

Minus sign

Examples: let’s consider two examples: numbers 1234.05 and -1234.05 with language set to English (United states):

Pattern	Output for 1234.05	Output for -1234.05
(no specific pattern)	1234.05	1234.05
`#.#`	1234	-1234
`#.###`	1234.05	-1234.05
`#.000`	1234.050	-1234.050
`###,##0.00`	1,243.05	-1,234.05
`$####,##0.00;($####,##0.00)`	$1,234.05	($1,234.05)

For more information about number formatting options see Numeric formats.

Date formatting

Date columns provide three parameters you can configure:

Format allows you to configure pattern that can be used to format date and time elements, Language allows you to select language and Timezone provides options to configure the timezone in which the dates will be interpreted.

Many different special characters are allowed in the formatting pattern. Following table provides a summary of the most common ones:

Pattern character Meaning

Pattern character	Meaning
`y`	year
`M`	month, value from 1 to 12
`d`	day, value from 1 to 31
`H`	hour (24-hour format), value from 0 to 23
`h`	hour (12-hour format), value from 1 to 12
`m`	minute, value from 0 to 59
`s` (lower case)	second, value from 0 to 59
`S` (upper case)	millisecond, value from 0 to 999
`a`	AM/PM marker

y

year

M

month, value from 1 to 12

d

day, value from 1 to 31

H

hour (24-hour format), value from 0 to 23

h

hour (12-hour format), value from 1 to 12

m

minute, value from 0 to 59

s (lower case)

second, value from 0 to 59

S (upper case)

millisecond, value from 0 to 999

a

AM/PM marker

Using a special character multiple times in a row (like MM) will left pad the numeric values with zero. For date elements that have names (like months), using three or more characters will spell out the whole value in selected language (e.g., for February and US English locale, MMM will produce value "Feb" while MMMM will produce "February").

See more details in For more information about number formatting options see Date and time formats.

Examples: let’s imagine date/time value of 27th February 2023, 16:35:50.456:

Pattern Output

Pattern	Output
`yyyy-MM-dd`	2023-02-27
`M/d/yyyy`	2/27/2023
`MMM d, yyyy`	Feb 27, 2023 (English (United States) language)
`MMM d, yyyy`	févr. 27, 2023 (French (France) language)
`HH:mm:ss`	16:35:50
`hh:mm:ss a`	04:35:50 PM
`yyyy-MM-dd HH:mm:ss`	2023-02-27 16:35:50
`yyyy-MM-dd hh:mm:ss a`	2023-02-27 04:35:50 PM
`yyyy-MM-dd HH:mm:ss.SSS`	2023-02-27 16:35:50.456
`yyyy-MM-dd’T’HH:mm:ss.SSSXXX`	2023-02-27T16:35:50.456+01:00 (English (United States) language, +1:00 time zone)

yyyy-MM-dd

2023-02-27

M/d/yyyy

2/27/2023

MMM d, yyyy

Feb 27, 2023
(English (United States) language)

MMM d, yyyy

févr. 27, 2023
(French (France) language)

HH:mm:ss

16:35:50

hh:mm:ss a

04:35:50 PM

yyyy-MM-dd HH:mm:ss

2023-02-27 16:35:50

yyyy-MM-dd hh:mm:ss a

2023-02-27 04:35:50 PM

yyyy-MM-dd HH:mm:ss.SSS

2023-02-27 16:35:50.456

yyyy-MM-dd’T’HH:mm:ss.SSSXXX

2023-02-27T16:35:50.456+01:00
(English (United States) language, +1:00 time zone)

For more information about possible patterns and formatting options see full documentation in Date and time formats.

Using formulas

CloverDX Wrangler offers powerful formula functionality to manipulate and analyze your data. Formulas can be be used in the following places:

Transformation steps:

The following transformation steps can be used to perform a wide range of data manipulation tasks through the use of formulas.

Calculate formula: This step allows you to perform calculations involving existing columns and constants. You can leverage various built-in functions for mathematical operations, string manipulation, date calculations, and more.
Filter rows based on formula: This step enables you to filter your data set by defining a formula. Rows where the formula evaluates to true are retained, while others are excluded. This is useful for isolating specific data subsets based on conditions.
Replace errors: This step lets you replace erroneous values in a column with a user-defined value or the result of a formula. You can use conditional logic within the formula to replace errors selectively based on specific criteria.
Validate with formula: This step allows you to define a formula that validates the data in a specific column. If the formula evaluates to false for any row, an error message is generated, indicating a data quality issue.

Step and group conditions

Formulas are also used to add conditions to your data transformations. For more information, see Step and group conditions.

Referencing data set columns in formulas

You’ll often need to reference various columns from your data set in your formulas. To refer to a column, use its technical name, an identifier (unique name) starting with a dollar sign ($).

Technical column names cannot contain special characters like spaces, commas, parentheses and so on. Wrangler creates them for you from column names by removing those characters and replacing them with underscore. Technical column names are case-sensitive - a column called $AMOUNT is different from $amount.

If you type a wrong column name (a name that does not exist in your data set), you’ll get an error like this:

Figure 47. Incorrect technical name in Formula editor

You can easily find and copy the technical name of a column by hovering your mouse over the column header in the data preview and clicking on Copy. Alternatively, you can take advantage of the autocomplete feature to quickly display the list of all available technical names or formulas.

Figure 48. How to find and copy a technical name

Autocomplete feature in Formula editor

Take advatage of the autocomplete feature in Formula editor to help you quickly find the desired column names or functions:

As you type the $ sign, a list of all existing data set preview columns appears for your selection.
While writing formulas, autocomplete suggests available functions as you type. By selecting functions from the autocomplete hints, functions are inserted with their required parameters to provide a clear view of what needs to be entered.
You can also display a list of all available functions and technical column names using the CTRL + Space shortcut.

Quick example

To add 20% VAT to a price column, you first check the technical name of the column by hovering mouse over the Price (no VAT) column (technical name is $Price_no_VAT) and then use the Calculate formula step with the following formula:

$Price_no_VAT * 1.2

The result might look like this:

Mathematical operators

You can use several operators when working with your formulas to compute values or to work with text data:

Operator Description

Operator	Description
`+`	Add two numbers, concatenate string columns
`-`	Subtraction
`*`	Multiplication
`/`	Division
`%`	Modulus

+

Add two numbers, concatenate string columns

-

Subtraction

*

Multiplication

/

Division

%

Modulus

Operator priorities work as expected - multiplication, division and division remainder have higher priority than addition or subtraction. You can also use parentheses to group your operations.

Examples:

Convert from one currency to another while also applying a fixed conversion fee: $originalAmount * $exchangeRate + $fixedFee
Add VAT to price with variable VAT in percent: $price * (1.0 + $vatPercent / 100.0)

The operators all work on all numeric types - integers as well as decimals. The only exception is the + operator which also works on string and can be used to concatenate strings like this:

$firstName + " " + $lastName

Note that it is also possible to concatenate string with another type (e.g., "VAT is " + $vatPercent) but you need to be careful about data formatting. Default formatting will be applied when converting to string. For greater control of the formatting you can use CTL string conversion functions like date2str and so on (the functions you’d most likely use all end with 2str suffix).

Comparison operators

If you’d like to compare various values when writing conditions in Filter rows based on formula step, you can use the following operators:

Operator Description

Operator	Description
`<`	Less than
`>`	Greater than
`<=` or `=<`	Less than or equal to
`>=` or `=>`	Greater than or equal to
`==`	Equal to
`!=` or `<>`	Does not equal to
`~=`	Matches regular expression
`?=`	Contains regular expression

<

Less than

>

Greater than

<= or =<

Less than or equal to

>= or =>

Greater than or equal to

==

Equal to

!= or <>

Does not equal to

~=

Matches regular expression

?=

Contains regular expression

These operators can be applied to any data type but both sides of the comparison must be of similar or same data type - i.e., you can compare integer and decimal but you cannot directly compare integer and string.

The operators always return boolean value so they can be used directly as a condition.

Note that the last two operators can only be applied to string values.

Examples:

Test if payment type is "CREDIT": $paymentType == "CREDIT"
Test if a person is at least 18 years old: $age >= 18
Test if a date occurred before 1st March 2023: $dateValue < 2023-03-01
Test if string value is a number followed by a dash and a single letter (e.g. "12345-A" but not "12345-AB"): $value ~= "[0-9]+-[A-Z]"

Most common use for these operators is in the Filter rows based on formula step but they can also be used in Calculate formula step.

Logical operators

Logical operators allow you to connect multiple conditions together to create more complex conditions. They can only be applied to boolean values and will produce another boolean value.

Operator Alternative form

Operator	Alternative form
`&&`	`AND`
`\|\|`	`OR`
`!`	`NOT`

&&

AND

||

OR

!

NOT

Examples:

Test if a row represents an oversized package - a package where one of the dimensions (width, height, length) is larger than 100: $width > 100 || $height > 100 || $length > 100
Test if contact has a phone number (either work or cell phone) and an email: ($workPhone != null || $cellPhone != null) && $email != null

These operators are most commonly used in Filter rows based on formula step where they are useful when defining more complex conditions.

Error handling in Wrangler

There are several types of errors you can encounter when working with data in Wrangler - incorrectly formatted data, values rejected by validation, errors caused by incorrect step configuration, runtime errors and so on. In all cases, Wrangler tries to help you by allowing you to continue with your job as far as possible so that you can try to fix the errors when it makes sense.

Most of your errors will be displayed right in the data preview when you are designing your job. The errors are shown as red cells, hovering over them will show you a tooltip with additional information about the error.

The errors are also shown in row number column. This is useful especially if an error happens to be outside of your current view if you have data set with many columns.

Similarly, the errors are also shown as red markers in the background of the scrollbar in the data preview. This can be quickly used to see distribution of errors within your sample.

Data Quality bar in the column header also reflects the quality of your data - it will show how many records from your sample are valid (green), empty (grey) or in error (red):

Errors in data

Your data may not always match the expected format. For example, you may get text in numeric column, wrongly formatted dates etc. Wrangler can detect errors like these and will allow you to work with the data even if it does not match the expected data type or format. Additionally, you can use Validation steps to ensure that your data matches your expectations and add customized error messages to data that does not meet those expectations.

If a row which contains any kind of an error gets through the transformation all the way to the data target, it will be rejected. Rejected records are collected in a file that allows you to inspect error details even in large data sets. See below for additional details about reject file format.

Figure 49. Invalid data shows up as red cells in the data preview.

For data errors you can see that the value read from source is retained even if it does not fit into expected format - for example, the Date paid column in the above screenshot contains values like n/a or incorrectly formatted dates. Wrangler allows you to access the original (invalid) value in variety of steps so that you can attempt to fix the issue in your transformation.

To learn how to fix errors in data, read more in Fixing errors and Fixing data errors sections below.

Validation errors

Some values might be invalid even if they are of correct type and have correct format. For example, age of a person cannot be negative number, invoice amount cannot be empty and so on. Wrangler offers several Validation steps that allow you to validate your your data and flag values that do not match the requirements for given column.

All values that are flagged as invalid - validation errors - are shown just like any other error in Wrangler with red cells in data preview.

Figure 50. Validation errors reported by validation steps - Validate phone number step in this example.

Errors occurring during step execution

These errors happen when you run a step and it encounters a value it cannot handle. For example, this can happen if you try to apply mathematical step to an empty value, if you divide by zero, or use invalid length in string function etc.

Just as before, Wrangler will show the errors in the data preview and will allow you to work with all other records even if some data is in error.

Figure 51. Errors from step execution in data preview.

Notice that these errors do not show any value in the cell since the calculation of the new value could not finish. Instead, they are shown as Error. To fix this type of error, you usually need to find the step that causes it and either fix the data before that step so that the steps does not fail anymore or modify the step settings.

To learn how to fix errors raised during step execution, read more in Fixing errors and Fixing step execution errors sections below.

Step configuration errors

These errors happen during design time and are detected by Wrangler as you work on your transformation. These errors generally mean that the step cannot be executed at all and Wrangler transformation cannot progress past this step.

For example, the following screenshot shows a step configured to use a column that is no longer part of the data set:

Figure 52. Step configuration error that prevent the step from running.

To learn how to fix these errors, read more in Fixing step configuration errors section.

Errors and output

Since Wrangler allows you to continue working with your data even if there are errors in your data, it may happen that some rows that contain errors can get all the way to the end of the transformation into the data target.

Since data targets may not be able to write the malformed or erroneous data into their output, they allow you to configure what to do with rows that have errors. You can pick from two options:

Write rows with errors to reject file (default value): if selected, all rows that contain errors are written to a reject file. Reject file allows you to see the rejected records as well as information about errors that caused those records to be rejected. You can then see number of records that were rejected on My Jobs page. To quickly see that your target has been configured to produce a reject file by looking at the target in the step list:
Fail job on the first invalid cell: when an error is encountered when writing the file, the job is stopped and the error will be reported on My Jobs page. This is shown in the step list like this:

Both of the options above are available in the target configuration dialog. Read more about this in CSV data target configuration and Excel data target configuration.

Figure 53. Sample reject file in Microsoft Excel format showing error message as well as rejected records.

Fixing errors

Wrangler offers several steps that can help you fix various errors:

Clear error cells step to remove data from cells with errors.
Replace errors step to replace errors with constant value or with result of a formula.
Calculate formula step to calculate or fix values before they cause issues in the transformation.

In some cases you might want to remove rows which contain errors from your process. This can be done either at the end of the process by the data target as described above or by using filtering steps:

Fixing data errors

These errors can often be prevented by changing the column type to a more permissive type (e.g., integer to decimal etc.).

If changing type is not viable, you can attempt to parse or interpret the original value in some other way to recover from the error. This is possible since Wrangler keeps the original invalid value and allows you to access that value in error fixing steps like Replace errors step.

For example, common error is that you receive dates with incorrect format like in the Date paid column on the following screenshot:

Figure 54. Original data set with errors in Date paid column.

To fix these, you can use Replace errors step and parse the incorrectly formatted date with different format by using a formula str2date($Date_paid, "d.M.yyyy"). This will result in the following output where the incorrectly formatted dates are fixed:

replace errors fix format example after step1

Figure 55. Data set after incorrectly formatted dates have been fixed with Replace errors step.

Fixing step execution errors

One of the most common ways of getting step execution errors is to use Calculate formula step and forgetting to deal with an edge case in the data. Typically this means not handling No value cells in your formulas or values that are outside of the range of valid inputs for your formula.

For example, math formulas typically do not handle null (No value). Consider a simple formula like this:

$amount * (1 + $vatPercent / 100)

This formula applies VAT specified in $vatPercent column to amount specified in $amount column. If either of those values is null, the formula will fail.

To fix this, you can either decide not to run the step at all and return a default value. To do that, you can use if function. The following formula tests for null values of its inputs and only runs if the inputs are not null:

if($amount != null AND $vatPercent != null, $amount * (1 + $vatPercent / 100))

Alternatively, we may decide that if there is no VAT, we use a default value such as 21% (we are not handling null value of $amount column in this formula):

$amount * (1 + nvl($vatPercent, 21) / 100)

Another option for dealing with such errors is to fix the data before it gets to the step that fails.

For example, to get the same effect as the the last formula above, you can use Replace empty values step configured to set $vatPercent to 21 if it is empty. The computation of the amount with VAT can then remain in its original form since that step will guarantee that there are no null values in the $vatPercent column.

If the above options are not possible, rows with errors can be removed from the data set by using filtering steps like Remove rows with errors or Filter rows based on formula.

Fixing step configuration errors

These errors must be fixed in the step that is failing. Wrangler will run your job as far as it can and will stop on the first step that failed. For example, consider the following job:

In this job we are trying to sort data based on column that is no longer available in the data set. This can happen for example if your data source is a CSV file and it no longer contains the column you used or if the column has been renamed.

You can see that the Wrangler will try to run the job as far as it can and stops on the invalid step. The only way to fix this is to either disable the step (or even remove it) or update the step settings to use a different column (if the column was renamed in the source data set).

Running jobs in Wrangler

There are two ways to run your job from Wrangler: either from the My Jobs page via the Run button on each job, or directly from the transformation editor via the Run job button in the top-right corner.

When you run a job, it will read the complete data source (CSV file or a connector-provided data source), so it may take a while if you are processing a large volume of data.

Viewing job status

The My Jobs page will show you the status of your jobs - it shows each job on a single row and displays status of the last run of the job.

Figure 56. My Jobs page showing job statuses.

Jobs can have one of the following statuses:

Status	Status icon	Description
Not run yet		Job did not run yet. To run it, you can press the Run button.
Running		Job is currently running.
Completed		Job finished successfully and produced an output you can download on My Jobs page.
Completed with rejected records		Job finished successfully but there were rejected rows which are not included in the output.
Failed		Job failed during execution. Clicking on the job row on My Jobs table will show you more details about the job failure. No output is available for failed jobs.
Stopped		Job was stopped by user (e.g., by clicking a Stop button) and did not finish. No output is available for aborted jobs.
Test run		Job was run with the Test run option. Test run is available when writing data into data target connectors.

Status

Status icon

Description

Not run yet

Job did not run yet. To run it, you can press the Run button.

Running

Job is currently running.

Completed

Job finished successfully and produced an output you can download on My Jobs page.

Completed with rejected records

Job finished successfully but there were rejected rows which are not included in the output.

Failed

Job failed during execution. Clicking on the job row on My Jobs table will show you more details about the job failure. No output is available for failed jobs.

Stopped

Job was stopped by user (e.g., by clicking a Stop button) and did not finish. No output is available for aborted jobs.

Test run

Job was run with the Test run option. Test run is available when writing data into data target connectors.

After the job finished successfully, you will be able to download its results directly from My Jobs page by clicking on the Download result button. This will download the output file - Excel or CSV - to your computer.

Job run details

For jobs that produced reject files or ended in an error, you can view additional details in the Last run status dialog. To open the dialog, click on the rejected rows status message (orange text) or on the failed status (red text) on the My Jobs page.

Figure 57. My Jobs table showing two jobs - a failed one and a job that produced a reject file.

The dialog allows you to download result as well as reject files via Download result and Download reject file buttons in the left part of the dialog.

In the right half of the dialog, you can see additional details about the errors that occurred during the job run. The overview table supports two different ways of showing th error summary:

Per step groups the errors based on the step which raised those errors. I.e., for each step that produced an error, there will be a single row in this table and it will show number of errors in the reject file that were produced by given step. This view is useful to see which parts of your job may need to be updated to handle invalid data or edge cases in your data.

Figure 58. Last run detail of a job with reject file showing Per step error summary.
Per column shows number of errors in each column of the data. This view is useful to see whether given column frequently contains invalid data or not which can help to decide whether to update the job to handle the data in a better way or whether to go to the source system and request data to be fixed there.

Figure 59. Last run detail of a job with reject file showing Per column error summary.

If the job ended in an error, the dialog will allow you to see additional details about the error:

Figure 60. Last run detail of a failed job.

How does Wrangler expression language differ from CTL?

Wrangler expression language is heavily inspired by CTL but it is not the same. There are several key differences that we’ll describe below.

Data types

Selection of data types available to use in Wrangler is more limited than types in CTL and in metadata.

The following table summarizes the types supported in Wrangler columns compared to CTL and metadata column types.

CTL / metadata type Corresponding Wrangler column type Note

CTL / metadata type	Corresponding Wrangler column type	Note
`boolean`	`boolean`
`byte`	not supported
`cbyte`	not supported
`date`	`date`	Same type with same range.
`integer`	`integer`	Wrangler `integer` is larger than CTL integer - Wrangler uses 64-bit numbers while CTL uses 32-bit numbers. CTL integer stores values from -2147483648 to 2147483647.
`long`	`integer`	CTL `long` type corresponds to Wrangler `integer` type - both are 64-bit types.
`number`	`decimal`	CTL `number` is a floating-point type that does not store exact values. Its precision and range is much bigger than decimal so not every `number` value can be stored in decimal and the conversion may fail. The largest value number can store is 1.7976931348623157 x 10³⁰⁸, the smallest (in terms of absolute value) value it can store is 4.9406564584124654 x 10^-324. Both boundaries exceed the precision and scale available in Wrangler `decimal` type.
`decimal`	`decimal`	The types are the same except for precision and scale. Wrangler always uses `decimal(32, 10)` (32 significant digits, 10 are after decimal point). In CTL and metadata it is possible to declare decimal type with different scale and precision.
`string`	`string`
`list` of any type	not supported
`map` of any type	not supported
`variant`	not supported

boolean

byte

not supported

cbyte

not supported

date

Same type with same range.

integer

Wrangler integer is larger than CTL integer - Wrangler uses 64-bit numbers while CTL uses 32-bit numbers. CTL integer stores values from -2147483648 to 2147483647.

long

integer

CTL long type corresponds to Wrangler integer type - both are 64-bit types.

number

decimal

CTL number is a floating-point type that does not store exact values. Its precision and range is much bigger than decimal so not every number value can be stored in decimal and the conversion may fail. The largest value number can store is 1.7976931348623157 x 10³⁰⁸, the smallest (in terms of absolute value) value it can store is 4.9406564584124654 x 10^-324. Both boundaries exceed the precision and scale available in Wrangler decimal type.

decimal

The types are the same except for precision and scale. Wrangler always uses decimal(32, 10) (32 significant digits, 10 are after decimal point). In CTL and metadata it is possible to declare decimal type with different scale and precision.

string

list of any type

not supported

map of any type

not supported

variant

not supported

When writing formulas in Wrangler, you can also use CTL types like map, list or variant. However, you cannot declare them (so you cannot create temporary variables). Type conversion rules apply only when interacting with columns. If you use a column reference in a formula, its type will be converted to CTL type. Same applies in reverse - CTL type returned by your expression will be converted to Wrangler column type on write. This conversion may fail if there is no corresponding Wrangler type to map your expression result to.

`if` function

CTL does not have an if function but uses an if statement. They are functionally equivalent with Wrangler syntax being more natural in formulas since it directly returns a value which can be used right away. Both languages have ternary operator which works in the same way.

Wrangler syntax Corresponding CTL

Wrangler syntax	Corresponding CTL
`if(condition, value_if_true)`	`if (condition) { return value_if_true; }`
`if(condition, value_if_true, value_if_false)`	`if (condition) { return value_if_true; } else { return value_if_false; }`

if(condition, value_if_true)

if (condition) {
    return value_if_true;
}

if(condition, value_if_true, value_if_false)

if (condition) {
    return value_if_true;
} else {
   return value_if_false;
}

How to use CTL functions in Wrangler?

You can use CTL functions in Wrangler formulas directly by calling them with their CTL name. When calling CTL functions, be mindful of data types in Wrangler - Wrangler will automatically convert to/from CTL types based on the table and rules mentioned above.

Note that even though Wrangler does not support variant type, lists or maps as column types, they are available in Wrangler expression language. Therefore, following is a valid formula in Wrangler:

in($transactionType, ["CASH", "CARD", "WIRE"])

The formula uses in function to test whether value of a $transactionType column is one of the three values provided as a CTL string[] (list of strings) type. The formula will return true if the $transactionType is one of the three values and false otherwise.

List of common functions

Date functions

List of useful date functions for reference (full list of all date functions can be found here):

Function	Description
dateAdd	adds a number of time units to the specified date and returns a new date
dateDiff	returns the difference of two dates in a specified time unit
extractDate	extracts year, month, and day from a date
extractTime	extracts hours, minutes, seconds, and milliseconds from a date
getYear	extracts years from a date
getMonth	extracts months from a date
getDay	extracts days from a date
getHour	extracts hours from a date
getMinute	extracts minutes from a date
getSecond	extracts seconds from a date
getMillisecond	extracts milliseconds from a date
today	returns today’s date and time

Function

Description

dateAdd

adds a number of time units to the specified date and returns a new date

dateDiff

returns the difference of two dates in a specified time unit

extractDate

extracts year, month, and day from a date

extractTime

extracts hours, minutes, seconds, and milliseconds from a date

getYear

extracts years from a date

getMonth

extracts months from a date

getDay

extracts days from a date

getHour

extracts hours from a date

getMinute

extracts minutes from a date

getSecond

extracts seconds from a date

getMillisecond

extracts milliseconds from a date

today

returns today’s date and time

String functions

List of useful string (text) functions (full list of all string functions can be found here):

Function	Description
chop	removes specified characters or line feed and carriage return characters
concat	concatenates multiple string values
contains	returns true/false values if the specified characters are found
endsWith	returns true if the specified characters are found at the end of a string value, otherwise returns false
isBlank	returns true if input consists of spaces, otherwise returns false
isDate	returns true if input matches the date pattern, otherwise returns false
isDecimal	returns true if input matches a decimal pattern, otherwise returns false
isEmpty	returns true if input is empty, otherwise returns false
isInteger	returns true if input is an integer, otherwise returns false
isNumber	returns true if input is a number, otherwise returns false
isUrl	returns true if input is an URL, otherwise returns false
lowerCase	converts all characters to lowercase
removeBlankSpace	removes blank spaces
removeDiacritic	removes diacritical marks
removeNonAscii	removes non-Ascii characters
reverse	returns reverted string
startsWith	returns true if the specified characters are found at the end of a string value, otherwise returns false
trim	takes one string argument and returns another string with leading and trailing white spaces removed
upperCase	converts all characters to uppercase

Function

Description

chop

removes specified characters or line feed and carriage return characters

concat

concatenates multiple string values

contains

returns true/false values if the specified characters are found

endsWith

returns true if the specified characters are found at the end of a string value, otherwise returns false

isBlank

returns true if input consists of spaces, otherwise returns false

isDate

returns true if input matches the date pattern, otherwise returns false

isDecimal

returns true if input matches a decimal pattern, otherwise returns false

isEmpty

returns true if input is empty, otherwise returns false

isInteger

returns true if input is an integer, otherwise returns false

isNumber

returns true if input is a number, otherwise returns false

isUrl

returns true if input is an URL, otherwise returns false

lowerCase

converts all characters to lowercase

removeBlankSpace

removes blank spaces

removeDiacritic

removes diacritical marks

removeNonAscii

removes non-Ascii characters

reverse

returns reverted string

startsWith

returns true if the specified characters are found at the end of a string value, otherwise returns false

trim

takes one string argument and returns another string with leading and trailing white spaces removed

upperCase

converts all characters to uppercase

Conversion functions

List of useful conversion functions (full list of all conversion functions can be found here):

Function	Description
bool2num	converts boolean values to number 0 or 1
date2num	returns the number of specified time units from the entered date
date2str	converts the input date to the string data type according to the specified pattern, locale and target timeZone
decimal2integer	converts a decimal value to an integer
num2bool	converts a number to a boolean value
num2str	converts any numeric type to the string decimal representation
str2bool	converts a given string argument to the corresponding boolean value
str2date	converts the input to the date data type using the specified pattern, locale and timeZone
str2decimal	converts a given string argument into a decimal value
str2integer	converts a given string argument to the corresponding integer value, the conversion can be determined by a numeral system, format or locale
toString	converts the given argument (integer, decimal or boolean) to its string representation

Function

Description

bool2num

converts boolean values to number 0 or 1

date2num

returns the number of specified time units from the entered date

date2str

converts the input date to the string data type according to the specified pattern, locale and target timeZone

decimal2integer

converts a decimal value to an integer

num2bool

converts a number to a boolean value

num2str

converts any numeric type to the string decimal representation

str2bool

converts a given string argument to the corresponding boolean value

str2date

converts the input to the date data type using the specified pattern, locale and timeZone

str2decimal

converts a given string argument into a decimal value

str2integer

converts a given string argument to the corresponding integer value, the conversion can be determined by a numeral system, format or locale

toString

converts the given argument (integer, decimal or boolean) to its string representation

{{{ highlightedName }}}