AI synthesize


Last updated 15 days ago


AI synthesize allows you to generate realistic data using machine learning models trained on your original dataset. This method preserves statistical fidelity while protecting privacy and ensuring the synthetic records cannot be linked back to the source records.

When to use

  • To create synthetic datasets for machine learning or analytics

  • When high statistical accuracy and maximum privacy are required

  • To expand datasets while preserving original distributions

When not to use

  • When working with multiple related tables

  • When data consistency across systems is required

  • When you need to be able to revert to original records

  • If entirely new, unseen text values must be generated

  • If the data needs to follow specific rules with 100% certainty

Interactive guide: How to apply AI synthesize

Follow the interactive guide below to apply AI synthesize.

Rare category protection

To protect privacy, Syntho can automatically replace infrequent values in categorical columns:

  • Threshold: the minimum number of times a value must occur; values that occur less often are considered rare (default = 10)

  • Replacement: the value used to replace rare categories (default = *)

Advanced settings

Advanced settings are available at both generator level and column level:

  • Max rows used for training: limits the number of rows used to train the model, for faster performance

  • Take random sample: trains on a random sample of rows

  • Clipping thresholds: restrict extreme values in numeric and datetime columns

  • Locale: sets the language context used for text and PII values
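The rare category replacement described above (Threshold and Replacement) can be illustrated with a short sketch. `protect_rare_categories` is a hypothetical helper, not Syntho's implementation. It assumes a category counts as rare when it occurs strictly fewer times than the threshold.

```python
from collections import Counter


def protect_rare_categories(values, threshold=10, replacement="*"):
    """Replace categories that occur fewer than `threshold` times.

    Hypothetical helper for illustration only; it is not Syntho's code.
    Assumes a category is rare when its count is strictly below `threshold`.
    """
    counts = Counter(values)
    return [v if counts[v] >= threshold else replacement for v in values]


# Example column: "AB" (3 rows) and "O-" (1 row) fall below the default threshold of 10
blood_types = ["A"] * 12 + ["B"] * 10 + ["AB"] * 3 + ["O-"]
protected = protect_rare_categories(blood_types)
print(Counter(protected))  # Counter({'A': 12, 'B': 10, '*': 4})
```

All rare values are grouped under a single placeholder. As a result, an unusual category can no longer single out the few records that carry it.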

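The following sketch shows what three of the settings listed above do conceptually: Max rows used for training, Take random sample and Clipping thresholds. Both helpers, `select_training_rows` and `clip_values`, are hypothetical and are not Syntho's API. The sketch also rests on two assumptions. First, when random sampling is off, it takes the first rows. Second, the clipping bounds are passed in directly; how Syntho determines its thresholds is not covered here.

```python
import random


def select_training_rows(rows, max_rows, take_random_sample=True, seed=None):
    """Return at most `max_rows` rows to train on (hypothetical helper)."""
    if len(rows) <= max_rows:
        return list(rows)
    if take_random_sample:
        # Sample without replacement so no row is used twice
        return random.Random(seed).sample(list(rows), max_rows)
    # Otherwise fall back to the first `max_rows` rows (an assumption)
    return list(rows[:max_rows])


def clip_values(values, lower, upper):
    """Clamp every value into [lower, upper]; works for numbers or datetimes."""
    return [min(max(v, lower), upper) for v in values]


rows = list(range(1_000))
training_rows = select_training_rows(rows, max_rows=100, seed=42)
print(len(training_rows))  # 100
print(clip_values([-5, 20, 37, 120], lower=0, upper=100))  # [0, 20, 37, 100]
```

Training on fewer rows trades some fidelity for speed. Clipping keeps a handful of extreme outliers from being learned and reproduced.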

Supported data types

The Syntho platform supports a wide variety of data types. Internally, Syntho maps each data type to one of the following encoding types:

  • Discrete: numerical counts (e.g. number of visits)

  • Continuous: continuous values (e.g. weight, temperature)

  • Categorical: predefined values (e.g. blood type, country)

  • Datetime: timestamps and dates (e.g. created at)
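To make the four encoding types above concrete, here is a sketch that assigns one to a column based on its Python values. The `EncodingType` enum and the `infer_encoding_type` rules are hypothetical. Syntho's actual mapping from database data types is not documented on this page, and Python value types serve here only as a stand-in.

```python
from datetime import date, datetime
from enum import Enum


class EncodingType(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"
    CATEGORICAL = "categorical"
    DATETIME = "datetime"


def infer_encoding_type(column):
    """Guess an encoding type from a column's Python values (hypothetical rules)."""
    values = [v for v in column if v is not None]  # ignore missing values
    if not values:
        return EncodingType.CATEGORICAL
    # bool is a subclass of int, so treat booleans as categories before the int check
    if all(isinstance(v, bool) for v in values):
        return EncodingType.CATEGORICAL
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return EncodingType.DISCRETE
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return EncodingType.CONTINUOUS
    # datetime is a subclass of date, so this covers both
    if all(isinstance(v, date) for v in values):
        return EncodingType.DATETIME
    return EncodingType.CATEGORICAL


columns = {
    "visits": [3, 0, 7],
    "weight_kg": [70.5, 82.0, None],
    "blood_type": ["A", "O", "AB"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 2, 1)],
}
for name, values in columns.items():
    print(name, infer_encoding_type(values).value)
```

Anything that does not clearly fit a numeric or datetime rule falls back to categorical in this sketch.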