AI synthesize
AI synthesize generates realistic synthetic data using machine learning models trained on your original dataset. This method maintains statistical fidelity while ensuring privacy and unlinkability to the source records.
When to use
To create synthetic datasets for machine learning or analytics
When high statistical accuracy and maximum privacy are required
To expand datasets while preserving original distributions
When not to use
When working with multiple related tables
When data consistency across systems is required
When you need to be able to revert to original records
If entirely new, unseen text values must be generated
If the data needs to follow specific rules with 100% certainty
Supported data types
The Syntho platform supports a wide variety of data types. Under the hood, Syntho uses an encoding scheme where each data type is mapped to one of the following encoding types.
Numerical counts (e.g. number of visits)
Continuous values (e.g. weight, temperature)
Predefined values (e.g. blood type, country)
Timestamps and dates (e.g. created at)
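As an illustration of what mapping a column to one of these four encoding types can look like, here is a minimal sketch in plain Python. It is not Syntho's implementation: the function name `guess_encoding_type`, the returned labels, and the rule that non-negative integers are treated as counts are all assumptions made for this example.

```python
from datetime import date, datetime


def guess_encoding_type(values):
    """Guess an encoding type for one column (hypothetical helper, not a Syntho API)."""
    present = [v for v in values if v is not None]  # ignore missing values
    if not present:
        return "categorical"
    # Timestamps and dates (datetime is a subclass of date)
    if all(isinstance(v, (date, datetime)) for v in present):
        return "datetime"
    # bool is a subclass of int in Python, so exclude it from numeric checks
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if numeric:
        # Assumption: whole, non-negative numbers are treated as counts
        if all(isinstance(v, int) and v >= 0 for v in present):
            return "count"
        return "continuous"
    # Everything else is treated as a predefined (categorical) value
    return "categorical"


visits = guess_encoding_type([3, 0, 12])           # "count"
weight = guess_encoding_type([71.5, 68.0])         # "continuous"
blood_type = guess_encoding_type(["A", "O", None])  # "categorical"
created_at = guess_encoding_type([date(2024, 1, 1)])  # "datetime"
```

In practice the platform decides the encoding type itself; the sketch only shows why a column such as number of visits and a column such as weight can end up with different encodings even though both are numeric.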
Interactive guide: How to apply AI synthesize
Follow the interactive guide below to apply AI synthesize.
To protect privacy, Syntho can automatically replace infrequent values in categorical columns:
Threshold: minimum frequency a value must reach; values that occur less often are considered rare (default = 10)
Replacement: value used to replace rare categories (default = *)
Max rows used for training: limit data for faster performance
Take random sample: randomly sample rows for training
Clipping thresholds: restrict extreme values in numeric/date columns
Locale: set language model context for text/PII
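The effect of the rare-value, sampling, and clipping settings can be sketched in plain Python. This is only an illustration under stated assumptions, not Syntho's code: the function names and parameters are hypothetical, the defaults of 10 and `*` are taken from the settings above, and the sketch assumes a value is rare when it occurs fewer times than the threshold.

```python
import random
from collections import Counter


def replace_rare_values(values, threshold=10, replacement="*"):
    """Replace values that occur fewer than `threshold` times (assumed boundary)."""
    counts = Counter(values)
    return [v if counts[v] >= threshold else replacement for v in values]


def sample_rows(rows, max_rows, take_random_sample=True, seed=None):
    """Limit the rows used for training, either randomly or by taking the first rows."""
    if max_rows is None or len(rows) <= max_rows:
        return list(rows)
    if take_random_sample:
        # Sampling without replacement; a seed makes the sample repeatable
        return random.Random(seed).sample(list(rows), max_rows)
    return list(rows[:max_rows])


def clip_values(values, lower, upper):
    """Restrict extreme values to the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# 12 x "A" and 10 x "O" are kept; 2 x "AB" falls below the threshold
blood_types = ["A"] * 12 + ["O"] * 10 + ["AB"] * 2
protected = replace_rare_values(blood_types)

training_rows = sample_rows(list(range(50)), max_rows=20, seed=1)
clipped_ages = clip_values([-5, 3, 120], lower=0, upper=100)  # [0, 3, 100]
```

Replacing rare categories with a single placeholder means that an unusual value, which could otherwise point to one individual, does not appear in the training data at all. Clipping plays a similar role for extreme numeric and date values.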