Mask
Mask functions can be especially useful in the following situations:
To mask columns that contain directly identifiable information, such as Personally Identifiable Information (PII).
To mask columns that contain indirectly identifiable information, such as Birthdate columns.
When data needs to remain recognizable in format.
Open your Workspace.
On the Job Configuration tab, select the column icon on the top left of the column where you want to apply a mask function.
Under Column settings > Generation Method, select Mask to view the list of available mask functions.
Select the Mask function that you wish to apply from the dropdown list of available mask functions.
Set the relevant mask parameters.
Select Confirm.
To edit any mask settings you have applied previously:
Open your Workspace.
Then do one of the following:
On the Job Configuration tab, select the column icon on the top left of the column where you want to edit a mask function.
On the Job Configuration tab, under Applied steps, select the Edit icon next to the column name where you want to edit a mask function.
On the PII tab, select the Edit icon next to the column where you want to edit a mask function.
Under Generation Method, adjust the parameters that you want to change.
Select Confirm.
Syntho offers various masking functions. Each function is designed to handle different types of sensitive data. To understand which function is designed for each data type, check the table below:
Data Type | Mask Functions |
---|---|
Datetime | Datetime Noise, Hasher |
Integer | Hasher, Numeric Noise, Random Character Swap |
String (VARCHAR) | Format Preserving Encryption, Numeric Hasher, Random Character Swap |
Decimal | Mask |
The Format Preserving Encryption (FPE) function utilizes the FF3 algorithm to encrypt sensitive data while preserving its original format and length. This makes it ideal for fields where the data's structure must remain intact (e.g., credit card numbers or dates).
Preserves data format during encryption
Supports unique and randomized subsets for varied datasets
The Numerical Hasher generator provides secure hashing for numerical and categorical values. This method replaces original values with a hashed representation, ideal for ensuring data privacy while maintaining referential integrity in numerical datasets.
Hashing of categorical data
Maintains the original structure of hashed fields
The Random Character Swap function replaces individual characters in strings while preserving the structure of punctuation, spaces, and symbols. Characters are swapped within their respective categories (letters with letters, digits with digits), ensuring that the field's overall format remains usable. This function also supports consistent mapping, where a specific character is always replaced by the same mapped character, allowing for reliable and consistent masking during data anonymization.
Randomly swaps characters within their own categories (e.g., letters are replaced with other letters, digits with other digits), so the original data type and structure of each field (letters, numbers, symbols) are preserved
Supports consistent mapping
Preserves non-alphabetic characters (e.g., punctuation, spaces)
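The behavior described above can be illustrated with a minimal Python sketch. It is not Syntho's implementation: the function names `build_mapping` and `random_character_swap`, the seed, and the choice to preserve letter case are all assumptions made for illustration.

```python
import random
import string

# Character categories used by this sketch (ASCII only, an assumption).
CATEGORIES = (string.ascii_lowercase, string.ascii_uppercase, string.digits)


def build_mapping(seed):
    """Build a fixed substitution table: each character maps within its own category."""
    rng = random.Random(seed)
    mapping = {}
    for alphabet in CATEGORIES:
        shuffled = list(alphabet)
        rng.shuffle(shuffled)
        mapping.update(zip(alphabet, shuffled))
    return mapping


def random_character_swap(value, mapping=None, rng=None):
    """Replace letters with letters and digits with digits; keep everything else."""
    rng = rng or random.Random()
    out = []
    for ch in value:
        if mapping is not None and ch in mapping:
            # Consistent mapping: the same character always gets the same replacement.
            out.append(mapping[ch])
            continue
        for alphabet in CATEGORIES:
            if ch in alphabet:
                # No mapping supplied: pick a fresh random character from the same category.
                out.append(rng.choice(alphabet))
                break
        else:
            # Punctuation, spaces, and symbols are left untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    mapping = build_mapping(seed=42)
    print(random_character_swap("AB-12 cd", mapping))  # same result on every run
    print(random_character_swap("AB-12 cd"))           # varies between runs
```

With a mapping supplied, repeated calls on the same input return the same masked string; without one, each call draws new replacements while the positions of punctuation and spaces stay fixed.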
The Datetime Noise function adds a random shift to a datetime value based on specified minimum and maximum shift parameters.
Date Part: The unit of time for the shift (e.g., Day, Month, Year).
Minimum Shift: The minimum amount the date can be shifted from the original value. Use negative numbers to indicate a shift to the past.
Maximum Shift: The maximum amount the date can be shifted from the original value.
Consistent mapping: A toggle to determine if the shift should be consistent for the same input values.
The Numeric Noise function adds noise to numeric data based on a uniform distribution, so that values are randomized while the overall structure of the dataset is preserved. This is useful for anonymizing numerical fields where consistency and distribution must be maintained.
Applies uniform noise to numeric data
Supports consistent mapping
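As a rough illustration of uniform noise with optional consistent mapping, consider the following Python sketch. The parameter names `low`, `high`, `consistent`, and `secret` are hypothetical, and deriving the random seed from a hash of the input is only one possible way to make the noise repeatable; the source does not state how Syntho does it.

```python
import hashlib
import random


def numeric_noise(value, low, high, consistent=False, secret="demo-key"):
    """Add uniform noise drawn from [low, high] to a numeric value."""
    if consistent:
        # Assumption: seed the generator from the input so equal inputs get equal noise.
        digest = hashlib.sha256(f"{secret}:{value!r}".encode()).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    else:
        rng = random.Random()
    noisy = value + rng.uniform(low, high)
    # Keep integers as integers so the column's data type is unchanged.
    return round(noisy) if isinstance(value, int) else noisy


if __name__ == "__main__":
    print(numeric_noise(100, -5, 5))                   # varies between runs
    print(numeric_noise(100, -5, 5, consistent=True))  # same value on every run
```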
This function uses the FarmHash library to hash categorical values consistently. The Numeric Hasher maps one numeric value to another based on the FarmHash implementation, ensuring consistent results for identical inputs. This function is particularly useful when you need to anonymize numerical fields while maintaining a one-to-one relationship between input and output values.
Utilizes the FarmHash algorithm for consistent numeric hashing
Hashed values preserve the uniqueness and the format of the original data
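The idea of deterministic numeric hashing can be sketched in Python as follows. FarmHash is not part of the Python standard library, so this example substitutes keyed BLAKE2b from `hashlib` as a stand-in; the function name `numeric_hash` and the `secret` parameter are hypothetical. The sketch keeps the sign and digit count of the input, but because it reduces the hash with a modulo it does not attempt to reproduce any uniqueness guarantee.

```python
import hashlib


def numeric_hash(value, secret=b"demo-key"):
    """Map an integer to another integer with the same sign and number of digits."""
    digits = len(str(abs(value)))
    # Stand-in for FarmHash: a keyed BLAKE2b digest of the decimal representation.
    digest = hashlib.blake2b(str(value).encode(), key=secret, digest_size=8).digest()
    number = int.from_bytes(digest, "big")
    # Smallest value with the required digit count (0 for single-digit inputs).
    low = 10 ** (digits - 1) if digits > 1 else 0
    span = 10 ** digits - low
    hashed = low + number % span
    return -hashed if value < 0 else hashed


if __name__ == "__main__":
    print(numeric_hash(123456))  # a six-digit number, identical on every run
    print(numeric_hash(-42))     # a negative two-digit number
```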
When setting the parameters for a mask function, you have various options to tailor the data according to your needs. Here are the main mask parameters that are shared across mask functions:
Consistent mapping
Description: Enabling consistent mapping generates the same masked values for the same original values every time the mask function is applied.
Options:
Enable: Turn on to consistently generate the same masked values for the same original values.
Disable: Turn off consistent mapping to generate random masked data.
Considerations: Different original input values can be mapped to the same masked output value. For example, John and Mike in the original data can both be mapped to the same value in the masked data.
Usage: Enable this option when you need to consistently generate the same masked values, for example for testing or demonstration purposes.
For more information on consistent mapping, please check Consistent mapping.
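The difference between the two options, and the collision noted under Considerations, can be shown with a small Python sketch. The replacement pool `REPLACEMENTS`, the function `mask_name`, and the hash-based selection are assumptions for illustration only, not a description of how the product implements consistent mapping.

```python
import hashlib
import random

# Hypothetical pool of replacement values, deliberately small.
REPLACEMENTS = ["Alex", "Sam", "Robin"]


def mask_name(value, consistent, secret="demo-key"):
    """Return a replacement name, either deterministically or at random."""
    if consistent:
        # Same input and secret always select the same replacement.
        digest = hashlib.sha256(f"{secret}:{value}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(REPLACEMENTS)
        return REPLACEMENTS[index]
    # Consistent mapping disabled: every call draws independently.
    return random.choice(REPLACEMENTS)


if __name__ == "__main__":
    for name in ["John", "Mike", "Anna", "Lena"]:
        print(name, "->", mask_name(name, consistent=True))
```

Because four inputs are mapped onto three possible outputs, at least two different names must share a masked value. Consistent mapping guarantees repeatability for a given input, not distinct outputs for distinct inputs.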
Date Part Selection:
From the Date Part dropdown list, select the unit of time (e.g., second, minute, hour, day, month, year) that will define the granularity of the shift.
The selected unit will be applied to both the minimum and maximum shift fields.
Minimum Shift:
In the Minimum Shift field, input the smallest amount the date can be adjusted, relative to the original date.
Use negative numbers to shift the date into the past.
Example: If the date part is set to "Day" and the minimum shift is set to -5, the date will not be shifted earlier than 5 days prior to the original date.
A positive number shifts the date forward.
Example: If the minimum shift is 5, the date will not shift earlier than 5 days after the original date.
Maximum Shift:
In the Maximum Shift field, input the largest amount by which the date can be adjusted from the original value.
Example: If the date part is set to "Day" and the maximum shift is set to 5, the date will not be shifted later than 5 days after the original date.
Consistent Mapping:
Toggle the Consistent Mapping setting to specify whether the date shifts should be consistent across similar operations.
Default Setting: Consistent Mapping is disabled, meaning that shifts will vary each time the generator is applied.
Enabled: When enabled, the generator will produce the same shift for identical datetime values during repeated operations.
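The four settings above can be combined in a minimal Python sketch. It assumes whole-number shifts, clamps the day of month when a month or year shift lands on a shorter month, and seeds the random generator from the input value to emulate consistent mapping. The names `datetime_noise`, `add_months`, and `secret` are hypothetical, and none of these choices are confirmed by the source.

```python
import calendar
import hashlib
import random
from datetime import datetime, timedelta

# Date parts with a fixed length map directly onto timedelta keyword arguments.
FIXED_PARTS = {"second": "seconds", "minute": "minutes", "hour": "hours", "day": "days"}


def add_months(value, months):
    """Shift by calendar months, clamping the day to the target month's length."""
    total = value.year * 12 + (value.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    day = min(value.day, calendar.monthrange(year, month)[1])
    return value.replace(year=year, month=month, day=day)


def datetime_noise(value, date_part, min_shift, max_shift,
                   consistent=False, secret="demo-key"):
    """Shift a datetime by a random whole number of date parts in [min_shift, max_shift]."""
    if min_shift > max_shift:
        raise ValueError("min_shift must not exceed max_shift")
    if consistent:
        # Assumption: identical inputs produce an identical seed, hence an identical shift.
        digest = hashlib.sha256(f"{secret}:{value.isoformat()}".encode()).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    else:
        rng = random.Random()
    shift = rng.randint(min_shift, max_shift)  # negative values shift into the past
    if date_part in FIXED_PARTS:
        return value + timedelta(**{FIXED_PARTS[date_part]: shift})
    if date_part == "month":
        return add_months(value, shift)
    if date_part == "year":
        return add_months(value, 12 * shift)
    raise ValueError(f"unsupported date part: {date_part}")


if __name__ == "__main__":
    original = datetime(2024, 3, 15, 12, 0)
    # Date part "day", minimum shift -5, maximum shift 5, as in the examples above.
    print(datetime_noise(original, "day", -5, 5))
    print(datetime_noise(original, "day", -5, 5, consistent=True))
```

With the date part set to "day" and shifts of -5 and 5, every result falls within five days on either side of the original date. Enabling `consistent` returns the same shifted value each time the same datetime is passed in.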