Data anonymization steps

End user’s guide > Wrangler user guide > Transformation steps > Data anonymization steps

Add noise to date: replace the date with new random date near the original value to anonymize your date values.
Add noise to number: replace the original value with newly generated number that is near the original to anonymize numeric data.
Mask text: mask parts of text to hide personal or private data.

Add noise to date

The Add noise to date step can be used to anonymize date values. This step produces a new date that is within the specified range around the original value. See more details in How are the new values calculated.

Parameters

Input column: required, select a date column.
Noise amount: required, enter the amount and unit by which to change the new date that will be randomly created from the original value. See below for more details about how the new values are generated.
- The default value is 10 days.
- Time units can be: days, weeks, months, years, hours, minutes, seconds, or milliseconds.
Truncate to date: selected by default. When selected, time units are disregarded and always output as 0:00:00 in the result values.
Target column: required, configure the column which will receive the output.
- Write result to the current column: outputs data into the Input column.
- Create new column with name: create a new column with the specified name. Name of the new column can contain spaces or special characters - the technical column name will be created automatically. The new column will be placed right next to the Input column.

How are the new values calculated

The step performs an operation that can be imagined as randomly shifting the original value by an amount that is smaller than the provided maximum.

It works by calculating lower and upper bounds for the new value. These are both based on the Noise amount parameter:

Lower bound is calculated as $originalValue - noiseAmount
Upper bound is calculated as $originalValue + noiseAmount

The calculation properly accounts for complexities of date math - short/long months, DST "jumps", leap years, etc.

Once the lower and upper bound is computed, a new random date is picked in the interval between those two values. This effectively means that you get a new random date and time value that is never more than specified amount of time away from the original value.

Examples

The example date below is in the yyyy-MM-dd format.

Original date Noise amount Truncate to date only Calculated range Result amount

Original date	Noise amount	Truncate to date only	Calculated range	Result amount
2023-03-01	10 days	Any	±10 days	The result date will be between `February 19th, 2023` and `March 11th, 2023`.
2023-03-01	10 weeks	Any	±10 weeks	The result date will be between `December 21st, 2022` and `May 10th, 2023`.
2023-03-01 08:00:00	10 weeks	Yes (checked)	±10 weeks	The result date will be between `December 21st, 2022, 0:00:00` and `May 10th, 2023, 0:00:00`.
2023-03-01 08:00:00	10 weeks	No (unchecked)	±10 weeks	The result date and time will be between `December 21st, 2022, 0:00:00` and `May 10th, 2023, 23:59:59`.
2023-03-01 08:00:00	12 hours	Yes (checked)	±12 hours	The result date and time will be between `February 28th, 2023, 0:00:00` and `March 1st, 2023, 0:00:00`.
2023-03-01 08:00:00	12 hours	No (unchecked)	±12 hours	The result date and time will be between `February 28th, 2023, 20:00:00` and `March 1st, 2023, 20:00:00`.
No value	12 hours	Any	Not calculated	`No value` (the result is an empty value).

2023-03-01

10 days

Any

±10 days

The result date will be between February 19th, 2023 and March 11th, 2023.

2023-03-01

10 weeks

Any

±10 weeks

The result date will be between December 21st, 2022 and May 10th, 2023.

2023-03-01 08:00:00

10 weeks

Yes (checked)

±10 weeks

The result date will be between December 21st, 2022, 0:00:00 and May 10th, 2023, 0:00:00.

2023-03-01 08:00:00

10 weeks

No (unchecked)

±10 weeks

The result date and time will be between December 21st, 2022, 0:00:00 and May 10th, 2023, 23:59:59.

2023-03-01 08:00:00

12 hours

Yes (checked)

±12 hours

The result date and time will be between February 28th, 2023, 0:00:00 and March 1st, 2023, 0:00:00.

2023-03-01 08:00:00

12 hours

No (unchecked)

±12 hours

The result date and time will be between February 28th, 2023, 20:00:00 and March 1st, 2023, 20:00:00.

No value

12 hours

Any

Not calculated

No value (the result is an empty value).

Remarks

The dates are randomized again with every job run - you will get new values every time.
The step does not modify empty values - they will remain empty even after the step. If you wish to handle these in some other way, please see Working with empty values for more information.

Add noise to number

The Add noise to number step can be used to anonymize data in integer or decimal columns. This step generates a new random value that is within specified distance from the original value. See more details in How are the new values calculated.

Parameters

Input column: required, select a decimal or integer column.
Noise amount: required, enter a *number* or *percentage* that will be used to calculate the range from which new random numbers is generated.
- Default value is 10%.
Target column: required, configure the column which will receive the output.
- Write result to the current column: outputs data into the Input column.
- Create new column with name: create a new column with the specified name. Name of the new column can contain spaces or special characters - the technical column name will be created automatically. The new column will be placed right next to the Input column.

How are the new values calculated

The step works by randomly generating a new value that is within the distance specified by the Noise amount parameter from the original value.

It works by calculating lower and upper bounds for the new value. These are both based on the Noise amount parameter:

Lower bound is calculated as $originalValue - noiseAmount if the Noise amount is specified as absolute number or as $originalValue * (100 - noiseAmount) / 100 if the Noise amount is specified as percentage value.
Upper bound is calculated as $originalValue + noiseAmount if the Noise amount is specified as absolute number or as $originalValue * (100 + noiseAmount) / 100 if the Noise amount is specified as percentage value.

Note that in both cases the algorithm properly accounts for values that either too large or too small and will not cause overflows.

Once the lower and upper bound is computed, a new random number is picked in the interval between those two values. This effectively means that you get a new random number that is never more than specified amount of time away from the original value.

Examples

Original amount Noise amount Display format Calculated range Result amount

Original amount	Noise amount	Display format	Calculated range	Result amount
2000	10%	Any	±200 (10% of 2000)	The result amount will be between `1800` and `2200`.
2,000	400	Any	±400	The result amount will be between `1600` and `2400`.
99.99	10%	Not set	±9.99	The result amount will be between `90.0000000000` and `109.9800000000`.
99.99	10%	#.##	±9.99	The result amount will be between `90.00` and `109.98`.
0	10%	Any	0	The result amount will be `0`.
0	10	Any	±10	The result amount will be between `-10` and `10`.
No value	Any amount or percentage	Any	Not calculated	`No value` (the result is an empty value).

2000

10%

Any

±200 (10% of 2000)

The result amount will be between 1800 and 2200.

2,000

400

Any

±400

The result amount will be between 1600 and 2400.

99.99

10%

Not set

±9.99

The result amount will be between 90.0000000000 and 109.9800000000.

99.99

10%

#.##

±9.99

The result amount will be between 90.00 and 109.98.

10%

Any

The result amount will be 0.

Any

±10

The result amount will be between -10 and 10.

No value

Any amount or percentage

Any

Not calculated

No value (the result is an empty value).

Remarks

New random values are generated with every job run.
When adding noise to decimal values, the number of decimal places in the randomized values depends on the display format of the original column. See Working with decimals for more information on how to change the display format. When no display format is specified, the randomized numbers can have up to 10 decimal places.
This step does not affect empty values (see Working with empty values for more information on empty values in Wrangler).

Mask text

Mask text step implements data masking algorithms that allow you to hide private data in simple way. For example, you can use it to mask out parts of phone numbers, credit cards, personal names, and so on.

Parameters

Input column: required, a string column containing input text to mask.
Characters to mask: required, determines what kind of characters to mask. Following options are available:
- Letters and digits: default, only mask letters and digits, whitespaces and other characters will remain as they were.
- Letters only: only mask letters, all other characters will remain as they were.
- Digits only: only mask digits, all other characters will remain as they were.
- All characters: mask all characters regardless of their type (this is the most generic option).
Masking character: required, a single character that will be used as a replacement for each character selected in the Characters to mask parameter.
Mask suffix: configure how many characters from the input value to mask in each group of characters of the same type. Value must be an integer greater than or equal to 0. Two options are available:
- All: all characters are considered for masking.
- Number of characters from a string group to preserve unmasked: allows you to specify the number of characters from the beginning of each group of characters that will not be masked. Value must be at least 1.
Target column: required, configure the column which will receive the output. Output will always be of string type.
- Write result to the current column: overwrite the input column with the result.
- Create new column with name: create a new column with specified name. Name of the new column can contain spaces or special characters - technical column name will be created automatically. The new column will be placed right after the input column.

Examples

Input value	Characters to mask	Masking character	Mask suffix	Result	Description
"123-456-7890"	Digits only	"x"	All	"xxx-xxx-xxxx"	Mask all digits in each group.
"123-456-7890"	Letters only	"x"	All	"123-456-7890"	No change - input value does not contain any letters.
"123-456-7890"	All	"x"	All	"xxxxxxxxxxxx"	Mask all characters - the result will be the same length as input and will consist entirely of the mask characters.
"123-456-7890"	Digits only	"x"	2	"12x-45x-78xx"	Find all digits and from each group of digits keep the first two (based on prefix length) and replace the remaining digits with "x".
"" (empty string)	Any	Any	Any	"" (empty string)	Calling the step on an empty string will return empty string.
No value	Any	Any	Any	No value	Calling the step on No value input will return No value.
Error	Any	Any	Any	Error	Calling the step on an Error will return an Error.

Input value

Characters to mask

Masking character

Mask suffix

Result

Description

"123-456-7890"

Digits only

"x"

All

"xxx-xxx-xxxx"

Mask all digits in each group.

"123-456-7890"

Letters only

"x"

All

"123-456-7890"

No change - input value does not contain any letters.

"123-456-7890"

All

"x"

All

"xxxxxxxxxxxx"

Mask all characters - the result will be the same length as input and will consist entirely of the mask characters.

"123-456-7890"

Digits only

"x"

"12x-45x-78xx"

Find all digits and from each group of digits keep the first two (based on prefix length) and replace the remaining digits with "x".

"" (empty string)

Any

"" (empty string)

Calling the step on an empty string will return empty string.

No value

Any

No value

Calling the step on No value input will return No value.

Error

Any

Error

Calling the step on an Error will return an Error.

Remarks

Calling the step on a No value string results in No value.
Calling the step on a cell with Error will result in an Error.

Data anonymization steps

Add noise to date

Parameters

How are the new values calculated

Examples

Remarks

See also

Add noise to number

Parameters

How are the new values calculated

Examples

Remarks

See also

Mask text

Parameters

Examples

Remarks

See also