Text manipulation steps
-
Trim: trim leading and trailing spaces
-
Substring: extract characters from a specified position in the text
-
Left substring: extract characters from the beginning of the text
-
Right substring: extract characters from the end of the text
-
Pad left: pad value to specified length by adding characters at the beginning
-
Pad right: pad value to specified length by adding characters at the end
-
Normalize whitespaces: replace multiple whitespaces with single space
-
Remove whitespaces: remove all whitespace characters from text
-
Remove accents: remove all accents from characters
-
Lowercase: convert letters in input text to lowercase
-
Propercase: capitalize first letters in input text
-
Uppercase: convert letters in input text to uppercase
-
Remove non-alphanumeric characters: remove non-alphanumeric characters from input text
-
Remove non-ASCII characters: remove non-ASCII characters from input text
-
Remove non-numeric characters: remove non-numeric characters from input text
-
Remove non-printable characters: remove non-printable characters from input text
-
Replace text: replace matches of regular expression within text with another text
-
Calculate formula: use custom formulas to manipulate text
Left substring
Left substring allows you to extract specific number of characters from the beginning of text (i.e., it extracts first characters).
Parameters
-
Input column: required, a string column containing input text.
-
Number of characters: number of characters to extract from the input string. The value must be greater than or equal to zero. You can use values greater than the length of the input string - whole input will be returned in such case (no padding is done).
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Number of characters | Output value | Description |
---|---|---|---|
"INV-20230328" |
3 |
"INV" |
Extract first three characters to determine type of a document ("Invoice"). |
"short" |
10 |
"short" |
Using higher number of characters than the length of the input returns the whole input value. |
"INV-20230328" |
0 |
"" |
Setting number of characters to 0 returns an empty string. Note that empty string is not the same as No value (which is stored as |
No value |
5 |
No value |
Returns No value when called on No value input. |
Error |
any |
Error |
Calling the step on a cell with Error will result in an Error. |
Remarks
-
Calling the step on a No value string results in No value.
-
If number of characters specified in the parameter is greater than the length of the input string, the whole input string is returned.
-
Calling the step on a cell with Error will result in an Error.
Right substring
Right substring allows you to extract specific number of characters from the end of text (i.e., it extracts last characters).
Parameters
-
Input column: required, a string column containing input text.
-
Number of characters: number of characters to extract from the input string. The value must be greater than or equal to zero. You can use values greater than the length of the input string - whole input will be returned in such case (no padding is done).
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Number of characters | Output value | Description |
---|---|---|---|
"INV-20230328" |
8 |
"20230328" |
Extract date part from an invoice number - last 8 characters in this example. |
"short" |
10 |
"short" |
using higher number of characters than the length of the input returns the whole input value. |
"INV-20230328" |
0 |
"" |
Setting the number of characters to 0 returns an empty string. |
No value |
5 |
No value |
Returns No value when called on No value input. |
Error |
any |
Error |
Calling the step on a cell with Error will result in an Error. |
Remarks
-
Calling the step on a No value string results in No value.
-
If number of characters specified in the parameter is greater than the length of the input string, the whole input string is returned.
-
Calling the step on a cell with Error will result in an Error.
Substring
Substring step returns specific number of characters from provided text starting with specified character position. First character has position of 0.
Parameters
-
Input column: required, a string column containing input text.
-
Start position: required, starting position of the substring. First character in the string has position of 0, last character has position of
length($Input) - 1
. Negative start positions are not allowed. Start position greater than or equal to the length of the input is allowed. -
Number of characters: required, number of characters returned to copy from the input string. Value must be greater or equal to 0. Values longer than the remainder of the string from Start position are allowed.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Starting position | Number of characters | Output value | Description |
---|---|---|---|---|
"INV-20230328" |
4 |
4 |
"2023" |
Returns 4 characters starting from the fifth one (character with index 4 is the fifth character). |
"INV-20230328" |
4 |
20 |
"20230328" |
If Number of characters is longer than the rest of the input after Start position, only the available characters are returned. |
"London" |
10 |
5 |
"" (empty string) |
Start position is after the end of the input, empty string is returned. |
"INV-20230328" |
0 |
3 |
"INV" |
First three characters of the input. Same result as using Left substring step with Number of characters set to 3. |
"INV-20230328" |
4 |
8 |
"20230328" |
Simpler solution would be reached by using Right substring step |
No value |
any |
any |
No value |
No value input produces No value output. |
Error |
any |
any |
Error |
Error input value leads to Error on output. |
Remarks
-
If Starting position is greater than the length of the input text, empty string "" is returned.
-
If Starting position is less than the length of the input text and the Number of characters exceeds the length of the remainder of the input, only characters up to the end of the input are returned (no padding is done).
Trim
Trim step removes leading and trailing spaces from text. Spaces within the text (i.e., spaces between words) are left without any changes.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
"Spaces after " |
"Spaces after" |
Trailing spaces are removed. |
" Spaces before" |
"Spaces before" |
Leading spaces are removed. |
" Space before and after " |
"Space before and after" |
Both leading and trailing spaces are removed. |
" Spaces in the middle " |
"Spaces in the middle" |
Leading and trailing spaces are removed but spaces between words are left unchanged. |
" " (spaces only) |
"" (empty text) |
Trim of text containing only spaces results in empty text - a string with zero length. Note this is different than No value. |
"" (empty text) |
"" (empty text) |
Trim of empty text produces empty text (zero-length string). Note this is different than No value. |
No value |
No value |
Trim of a No value cell results in a No value output. |
Error |
Error |
Running Trim on errors propagates error. |
Remarks
-
Trim removes variety of whitespace characters not just "regular" spaces:
-
Space character (various variants of Unicode space characters are removed)
-
New line characters and their combinations as used in Windows, Mac or Linux (line feed
\n
, form feed\f
, carriage return\r
, line separator\u2028
, paragramp separator\u2029
) -
Tab (both horizontal tab
\t
as well as vertical tab\u000B
) -
Unicode separator character (file, group, record, unit separators)
-
-
Calling Trim step on a cell with Error will result in an Error.
Pad left
Pad left allows you to pad string values to specified minimum length by adding a designated character at the beginning of the string. If the input is longer than specified length, no change is made.
Parameters
-
Input column: required, a string column containing input text.
-
Length: a numeric value containing minimal length of an output string. The value must be greater than 0.
-
Padding char: required, a character used as a filler character. Only a single character can be used here.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Length | Padding char | Output value | Description |
---|---|---|---|---|
"1234" |
6 |
"0" |
"001234" |
Pad to 6 characters by adding zeroes at the beginning. |
"1234" |
2 |
"0" |
"1234" |
Length value is smaller than the length of the Input value, no change to input is made. |
"1234" |
6 |
"00" |
Error |
Padding character cannot contain multiple characters. |
No value |
any |
any |
No value |
Calling the step on No value input will return No value. |
Error |
any |
any |
Error |
Calling the step on an Error will return an Error. |
Remarks
-
Padding char must contain exactly one character (including whitespace characters). Longer padding string will result in an error.
-
If length of the input is longer than the specified Length, the input is returned unchanged.
Pad right
Pad right allows you to pad string values to specified minimum length by adding a designated character at the end of the string. If the input is longer than specified length, no change is made.
Parameters
-
Input column: required, a string column containing input text.
-
Length: a numeric value containing minimal length of an output string. The value must be greater than 0.
-
Padding char: required, a character used as a filler character. Only a single character can be used here.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Length | Padding char | Output value | Description |
---|---|---|---|---|
"1234" |
8 |
" " (space) |
"1234 " |
Pad to 8 characters by adding spaces at the end. |
"1234" |
2 |
"0" |
"1234" |
Length value is smaller than the length of the Input value, no change to input is made. |
"1234" |
6 |
"00" |
Error |
Padding character cannot contain multiple characters. |
No value |
any |
any |
No value |
Calling the step on No value input will return No value. |
Error |
any |
any |
Error |
Calling the step on an Error will return an Error. |
Remarks
-
Padding char must contain exactly one character (including whitespace characters). Longer padding string will result in an error.
-
If length of the input is longer than the specified Length, the input is returned unchanged.
Normalize whitespaces
Normalize whitespaces step removes tabs, new line or carriage return characters, and extra spaces from the input text and replaces them with a single space character when in between words, or removes them completely when before the first word or after the last word. This step can help you optimize your data and quickly remove unwanted characters that were entered in error or that are in an unwanted format.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the Input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the Input column.
-
Examples
Input value | Output value | Explanation |
---|---|---|
"Text with single spaces." |
"Text with single spaces." |
No change because there are no extra whitespace characters. |
"Text with lots of space characters." |
"Text with lots of space characters." |
All extra spaces are removed. |
"Text with tabs and |
"Text with tabs and a new line." |
All extra spaces, tabs, and new line/carriage return characters are removed. |
" Test word " |
"Test word" |
Extra spaces at the beginning and end are removed. |
No value |
No value |
Empty value produces empty value on output. |
Error |
Error |
Applying the step to an error results in another Error. |
Remove whitespaces
Remove whitespaces step removes all whitespace (invisible) characters from text including leading and trailing whitespaces as well as any whitespace characters within the text.
Parameters
-
Input column: required, a string column to remove the whitespaces from.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
"Text with spaces." |
"Textwithspaces." |
Any space in the string is removed. |
" Text with lots of whitespace characters. " |
"Textwithlotsofwhitespacecharacters." |
Any space including leading and trailing spaces are removed. |
"Text with |
"Textwithanewline." |
All whitespace characters are removed; including new lines and tabs. |
"1 st" |
"1st" |
All spaces in a sentence or between words are removed. |
Remarks
-
Remove whitespaces step only works with string columns.
-
Removes variety of "invisible" spacing characters not just spaces:
-
Regular space
-
New line (both Windows and Linux type)
-
Tab
-
-
To remove only leading and trailing whitespace characters, use Trim step.
Remove accents
Remove accents removes all accents from the text by converting all accented characters into the same character without an accent.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
Côte d’Azure |
Cote d’Azure |
"ô" → "o". Apostrophe is not removed since it is not an accent. |
Saarbrücken |
Saarbrucken |
"ü" → "u" |
No value |
No value |
Calling the step on No value returns No value. |
Error |
Error |
Calling the step on Error returns Error. |
Remarks
-
Calling the step on No value returns No value.
-
Calling the step on cell with an Error returns Error.
Lowercase
Lowercase step converts all letter characters to their lowercase form.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
"CLOVERDX" |
"cloverdx" |
All uppercase letters are transformed to lowercase. |
"camelCase" |
"camelcase" |
All uppercase letters are transformed to lowercase. |
"Corporation Inc."" |
"corporation inc." |
All uppercase letters are transformed to lowercase. |
"INV07282710PH"" |
"inv0728271ph" |
All uppercase letters are transformed to lowercase while numerical characters stay the same. |
No value |
No value |
Returns No value when called on No value input. |
Error |
Error |
Calling the step on a cell with Error will result in an Error. |
Remarks
-
Non-letter characters (e.g., digits, symbols, …) remain without any changes.
Propercase
Propercase step capitalizes first letters of each word in string columns. Note that it does convert any of the following letters to lowercase. If you want to make sure that there are no capital letters within words, use the Lowercase step first to convert all letters to lowercase.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. The output will always be of string type.
-
Write result to the current column: overwrite the Input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the Input column.
-
Examples
Input value | Output value | Explanation |
---|---|---|
"product code name" |
"Product Code Name" |
First letters of all words are capitalized. |
"PRODUCT CODE NAME" |
"PRODUCT CODE NAME" |
No change; words in uppercase will remain in uppercase. |
"ProDuct cOde name" |
"ProDuct COde Name" |
First letters of all words are capitalized, and the uppercase letters in the middle of the words are retained. |
Uppercase
Uppercase step converts all letter characters to uppercase form (i.e., it capitalizes all letters in the input).
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
"cloverdx" |
"CLOVERDX" |
All lowercase letters are transformed to uppercase. |
"camelCase" |
"CAMELCASE" |
All lowercase letters are transformed to uppercase. |
"Corporation Inc."" |
"CORPORATION INC." |
All lowercase letters are transformed to uppercase. |
"inv07282710ph"" |
"INV0728271PH" |
All lowercase letters are transformed to uppercase while numerical characters stay the same. |
No value |
No value |
Returns No value when called on No value input. |
Error |
Error |
Calling the step on a cell with Error will result in an Error. |
Remarks
-
Non-letter characters (e.g., digits, symbols, …) remain without any changes.
Remove non-alphanumeric characters
Remove non-alphanumeric characters removes whitespaces, punctuation, math operators and other special characters from the text, only letters and numbers are kept.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
(8+4)*2 |
842 |
The math operators (), +, and * are removed. |
34% of books |
34ofbooks |
The math operator % is removed along with whitespaces. |
gâteau |
gâteau |
All text is alphanumeric so it is unmodified. |
おはよう日本。 |
おはよう日本 |
Japanese characters are not letters but are considered alpha-numeric and are not removed. The period character at the end is removed. |
No value |
No value |
Calling the step on No value returns No value. |
Error |
Error |
Calling the step on Error returns Error. |
Remarks
-
Calling the step on No value returns No value.
-
Calling the step on cell with an Error returns Error.
Remove non-ASCII characters
Remove non-ASCII characters removes all characters that are not in US ASCII encoding.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
Voyez le brick géant que j’examine. |
Voyez le brick gant que j’examine |
Only the "é" is a non-US ASCII character therefore it was removed |
Příšerný žluťoučký kůň úpěl ďábelské ódy. |
Pern luouk k pl belsk dy. |
This string has many non-US ASCII characters therefore many parts were removed |
おはよう日本。 |
"" (empty string) |
Japanese text does not contain any ASCII characters and is completely removed. |
No value |
No value |
Calling the step on No value returns No value. |
Error |
Error |
Calling the step on Error returns Error. |
Remarks
-
Calling the step on No value returns No value.
-
Calling the step on cell with an Error returns Error.
Remove non-numeric characters
Remove non-numeric characters removes all characters except for (0-9) or decimal point (period character).
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
6.125% |
6.125 |
Only characters 0-9 and decimal point are kept, % character is removed. |
123 2nd Street apartment 3b |
12323 |
All of the whitespaces and alphabetical characters are removed. |
No value |
No value |
Calling the step on No value returns No value. |
Error |
Error |
Calling the step on Error returns Error. |
Remarks
-
Calling the step on No value returns No value.
-
Calling the step on cell with an Error returns Error.
Remove non-printable characters
Remove non-printable characters removes all non-printable characters from value - tabs, line breaks, page breaks and variety of control characters. Note that regular spaces as used between words are not removed.
Parameters
-
Input column: required, a string column containing input text.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Output value | Description |
---|---|---|
Welcome:⟶John Doe! |
Welcome:John Doe! |
The tab (denoted by the ⟶ symbol in the example) is removed while the four regular spaces between words are kept. |
Please sign here: |
Please sign here:John Doe |
The spaces are kept while new line characters are removed. |
No value |
No value |
Calling the step on No value returns No value. |
Error |
Error |
Calling the step on Error returns Error. |
Remarks
-
Remove non-printable characters removes all non-printable characters. you can find a full list here.
-
Calling the step on No value returns No value.
-
Calling the step on cell with an Error returns Error.
Replace text
Replace text replaces all matches of a regular expression within text with the replacement text.
Parameters
-
Input column: required, a string column containing input text.
-
Replaced text: required, a regular expression to search for in the input text. Note that the replace function is case-sensitive.
-
Replacement: optional, text which will be used as replacement for each occurrence of the Replaced text in the input. If left empty, the value in the Replaced text field will be deleted.
-
Target column: required, configure the column which will receive the output. Output will always be of string type.
-
Write result to the current column: overwrite the input column with the result.
-
Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.
-
Examples
Input value | Replaced text | Replacement | Result | Description |
---|---|---|---|---|
Two boxes: red box and blue BOX |
box |
car |
Two cares: red car and blue BOX |
Simple replacements without special symbols can be done directly. Search for instances of "box" and replace with "car". Search is case-sensitive, so word "BOX" is left as is. |
Hello, Emily. |
|
|
H-X-llo, -X-mily |
Square brackets define character group. Here we define a group that contains "E" and "e"All occurrences of letters from the group are replaced with string |
Hello, Emily |
e(l+) |
a$1 |
Hallo, Emily |
In this example we define capture group using round brackets and use it to capture all lower-case Ls. The group is then referenced in the replacement text using |
No value |
No value |
Calling the step on No value returns No value. |
Error |
Error |
Remarks
-
Calling the step on No value returns No value.
-
Calling the step on cell with an Error returns Error.