2. Introduction to data anonymization
Syntho offers a flexible set of data generators that help anonymize sensitive data based on the nature of the dataset, privacy requirements, and use case. Below is a summary of the main generator types and when to use each.
Trains a generative model to create synthetic rows that mimic the original dataset, without any one-to-one relation to the original records. Use when: you need statistical fidelity and privacy, e.g. for machine learning or testing large datasets. Avoid when: you need data consistency across related tables.
Generates fully random, user-defined values. Use when: format matters, but the relationship to original values is not important. Avoid when: consistency or referential integrity is needed.
Maps original values to consistent mock values. Use when: consistent replacement of values is needed across datasets or environments. Avoid when: randomness is more important than consistency.
Directly modifies original values while preserving format. Use when: the output must remain in a recognizable or valid format. Avoid when: preserving exact values or reversibility is required.
Uses business logic to generate values. Use when: you need calculated outputs based on specific conditions. Avoid when: data generation is simple and logic is not required.
Creates or transforms keys while maintaining or removing relational links. Use when: managing primary and foreign keys across tables. Avoid when: relationships are not needed.
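The difference between random generation, consistent mapping, and format-preserving modification can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Syntho's implementation or API: the function names (`random_mock`, `consistent_mock`, `mask_preserving_format`), the name list, and the use of a keyed hash to achieve consistency are all hypothetical choices made for this example.

```python
import hashlib
import hmac
import random
import string

# Hypothetical pool of replacement values (an assumption for this sketch).
FIRST_NAMES = ["Alex", "Sam", "Robin", "Jamie", "Taylor", "Morgan"]


def random_mock(rng: random.Random) -> str:
    """Random generation: the output has no link to any original value."""
    return rng.choice(FIRST_NAMES)


def consistent_mock(value: str, secret: bytes) -> str:
    """Consistent mapping: the same input and secret always give the same mock value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Format-preserving modification: digits stay digits, ASCII letters keep
    their case, and every other character (spaces, dashes) is left untouched."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random()
    secret = b"example-secret"  # hypothetical key; keep a real one out of source code

    print("random:    ", random_mock(rng), random_mock(rng))
    print("consistent:", consistent_mock("Alice", secret), consistent_mock("Alice", secret))
    print("masked:    ", mask_preserving_format("AB-1234 x", rng))
```

Because `consistent_mock` depends only on the input and the secret, applying it to the same column in two tables or environments yields matching replacements, which is what keeps joins and lookups working. `random_mock` gives no such guarantee, and `mask_preserving_format` keeps the shape of a value but cannot be reversed.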