2. Introduction to data anonymization

Syntho offers a flexible set of data generators for anonymizing sensitive data. Which one to use depends on the nature of the dataset, the privacy requirements, and the use case. Below is a summary of the main generator types and when to use each.

Trains a generative model to create synthetic rows that mimic the statistical properties of the original dataset, with no one-to-one relationship between synthetic and original rows. Use when: you need both statistical fidelity and privacy, e.g. for machine learning or for testing with large datasets. Avoid when: you need data consistency across related tables.
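The idea of learning from the original data and then sampling brand-new rows can be illustrated with a deliberately tiny sketch. This is not Syntho's model: it only counts how often each value occurs per column and samples every column independently, so it keeps the per-column distributions but loses correlations between columns, which a real generative model is trained to capture. The function names `fit_marginals` and `sample_rows` are hypothetical.

```python
import random
from collections import Counter

def fit_marginals(rows):
    """'Train' by counting how often each value occurs in each column (hypothetical helper)."""
    columns = rows[0].keys()
    return {col: Counter(row[col] for row in rows) for col in columns}

def sample_rows(marginals, n, seed=None):
    """Generate n synthetic rows, sampling each column independently by frequency (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            # Values that were common in the original data are drawn more often.
            row[col] = rng.choices(list(counts), weights=list(counts.values()), k=1)[0]
        synthetic.append(row)
    return synthetic

original = [
    {"city": "Utrecht", "plan": "basic"},
    {"city": "Utrecht", "plan": "premium"},
    {"city": "Amsterdam", "plan": "basic"},
]
model = fit_marginals(original)
synthetic = sample_rows(model, n=5, seed=42)
```

Each synthetic row is drawn from the learned distributions rather than copied from a specific original row, which is what removes the one-to-one link.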

Generates fully random values according to user-defined settings. Use when: the format matters, but the relationship to the original values does not. Avoid when: consistency or referential integrity is needed.
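As a minimal sketch of random, format-driven generation, the snippet below fills a user-defined pattern with random characters. The pattern syntax (`A` for an uppercase letter, `9` for a digit, anything else copied literally) and the function name `mock_from_pattern` are hypothetical illustrations, not Syntho settings. Note that the original value is never read, which is why this approach cannot keep consistency or referential integrity.

```python
import random
import string

def mock_from_pattern(pattern: str, rng: random.Random) -> str:
    """Fill a hypothetical pattern: 'A' -> random uppercase letter, '9' -> random digit."""
    out = []
    for ch in pattern:
        if ch == "A":
            out.append(rng.choice(string.ascii_uppercase))
        elif ch == "9":
            out.append(rng.choice(string.digits))
        else:
            # Literal characters such as '-' are kept as-is to preserve the format.
            out.append(ch)
    return "".join(out)

rng = random.Random(7)
plates = [mock_from_pattern("AA-999-A", rng) for _ in range(3)]
```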

Maps original values to consistent mock values, so the same original value is always replaced by the same mock value. Use when: values must be replaced consistently across datasets or environments. Avoid when: randomness is more important than consistency.
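One common way to get a consistent mapping without storing a lookup table is a keyed hash. The sketch below assumes that approach (it is not necessarily how Syntho implements consistent mapping): HMAC-SHA256 of the original value picks an entry from a pool of mock values, so the same value and the same secret key always give the same result, in any dataset or environment. `MOCK_FIRST_NAMES`, `consistent_mock`, and the key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical pool of replacement values; a real tool would use much larger lists.
MOCK_FIRST_NAMES = ["Alex", "Sam", "Robin", "Kim", "Charlie", "Jordan"]

def consistent_mock(value: str, secret_key: bytes, pool=MOCK_FIRST_NAMES) -> str:
    """Map a value to a mock value; the same value and key always give the same result."""
    # A keyed hash (HMAC-SHA256) makes the mapping deterministic, while someone
    # without the key cannot reproduce it the way they could with a plain hash.
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    # Use the first 8 bytes of the digest as an index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]

# Hypothetical key, shared by every environment that must stay consistent.
key = b"keep-this-secret"
print(consistent_mock("Maria", key))
```

Because the pool is small, two different originals can receive the same mock value. If a strictly one-to-one replacement is required, a persisted lookup table that hands out unused mock values is the usual alternative.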

Directly modifies the original values while preserving their format. Use when: the output must remain in a recognizable or valid format. Avoid when: preserving exact values or reversibility is required.
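A minimal sketch of format-preserving modification, assuming a simple character-class scramble: each digit becomes a random digit, each ASCII letter a random letter of the same case, and every other character stays in place. The function name `scramble_preserving_format` is hypothetical, and this is a generic technique rather than Syntho's implementation.

```python
import random
import string

def scramble_preserving_format(value: str, rng: random.Random) -> str:
    """Replace digits and ASCII letters with random ones of the same kind; keep everything else."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            # Separators, spaces, and other characters stay in place, so the layout survives.
            out.append(ch)
    return "".join(out)

rng = random.Random(1)
masked = scramble_preserving_format("1234 AB", rng)  # a postal-code-like value
```

This keeps the recognizable layout but does not enforce validity rules such as check digits (for example in an IBAN); those would need extra, field-specific logic. The original value cannot be recovered from the output, which is why this kind of generator is a poor fit when reversibility is required.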

Uses business logic to generate values. Use when: you need calculated outputs based on specific conditions. Avoid when: data generation is simple and logic is not required.
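To show what "calculated outputs based on specific conditions" can look like, here is a sketch that derives two columns from other columns in the same row. Both rules (a line total and an age group) and the function names `age_on` and `add_calculated_columns` are hypothetical examples of business logic, not Syntho configuration.

```python
from datetime import date

def age_on(birth_date: date, reference: date) -> int:
    """Whole years between birth_date and reference."""
    years = reference.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def add_calculated_columns(row: dict, reference: date) -> dict:
    """Return a copy of row with two hypothetical derived columns."""
    result = dict(row)
    # Rule 1: the line total is always quantity times unit price.
    result["total_price"] = round(row["quantity"] * row["unit_price"], 2)
    # Rule 2: the age group follows from the birth date.
    result["age_group"] = "minor" if age_on(row["birth_date"], reference) < 18 else "adult"
    return result

row = {"quantity": 3, "unit_price": 2.5, "birth_date": date(2010, 6, 1)}
enriched = add_calculated_columns(row, reference=date(2025, 1, 1))
```

Deriving such columns from already generated values keeps them logically consistent with the rest of the row, which independent generation would not guarantee.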

Creates or transforms keys while maintaining or removing relational links. Use when: managing primary and foreign keys across tables. Avoid when: relationships are not needed.
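The following sketch shows one way to transform keys while maintaining relational links: parent rows get new surrogate primary keys, and the same old-to-new mapping is applied to the child table's foreign keys. The table shapes, the column names `id` and `customer_id`, and the function `remap_keys` are hypothetical; the sketch assumes primary keys are unique.

```python
import itertools

def remap_keys(parents, children, pk="id", fk="customer_id", start=1000):
    """Give parent rows new surrogate keys and rewrite child foreign keys to match."""
    counter = itertools.count(start)
    mapping = {}  # original primary key -> new primary key
    new_parents = []
    for row in parents:
        new_id = next(counter)
        mapping[row[pk]] = new_id
        new_parents.append({**row, pk: new_id})
    # Reuse the same mapping for the foreign keys so every child still points
    # at its (renamed) parent; an orphaned foreign key raises KeyError.
    new_children = [{**row, fk: mapping[row[fk]]} for row in children]
    return new_parents, new_children

customers = [{"id": 17, "name": "A"}, {"id": 42, "name": "B"}]
orders = [{"order_id": 1, "customer_id": 42}, {"order_id": 2, "customer_id": 17}]
new_customers, new_orders = remap_keys(customers, orders)
```

To remove relational links instead, the child table's foreign keys would be generated independently rather than looked up through the mapping.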
