Additional privacy controls


Last updated 3 months ago

AI-powered data generation already offers a high level of privacy. To strengthen it further, Syntho provides the following additional privacy controls:

1. Overfitting Prevention

Prevents the model from memorizing specific patterns or properties of the original data, thus enhancing data confidentiality. During the training phase, Syntho minimizes overfitting by applying a noise ratio that ensures synthetic data reflects general patterns rather than specific entries. This safeguard prevents individual data points from appearing in the synthetic dataset.

2. Rare Category Protection

Protects the uniqueness of categorical data by substituting rare values. Rare categories, meaning those that fall below a user-set threshold, are replaced with a placeholder (default: "*"). This prevents overfitting on unique, infrequent categories and protects against potential identification based on rare data points.
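The substitution described above can be illustrated with a minimal Python sketch. It is not Syntho's implementation: the function name `protect_rare_categories`, the count-based meaning of `threshold`, and the sample data are assumptions made for illustration only.

```python
from collections import Counter


def protect_rare_categories(values, threshold=2, placeholder="*"):
    """Replace categories that occur fewer than `threshold` times.

    Assumption: the threshold is an absolute occurrence count. The
    actual product setting may be defined differently.
    """
    counts = Counter(values)
    return [v if counts[v] >= threshold else placeholder for v in values]


# Hypothetical column with one category that appears only once.
cities = ["Amsterdam", "Amsterdam", "Utrecht", "Utrecht", "Giethoorn"]
protected = protect_rare_categories(cities, threshold=2)
print(protected)  # ['Amsterdam', 'Amsterdam', 'Utrecht', 'Utrecht', '*']
```

A higher threshold replaces more categories, which trades some utility for stronger protection of infrequent values.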

3. Extreme Value Protection

Removes outliers in numerical and date-time data to prevent re-identification based on extreme values. Outliers are detected and removed during the preprocessing phase, ensuring that potentially sensitive or identifiable extreme values do not appear in the synthetic data.
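The documentation does not state which detection rule is used. As a rough sketch only, the example below assumes a common interquartile-range (IQR) rule; the function name `remove_extreme_values` and the factor `k` are hypothetical and chosen for illustration.

```python
import statistics


def remove_extreme_values(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: an IQR rule stands in for the unspecified outlier
    detection that runs during preprocessing.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical age column with one extreme, potentially identifying value.
ages = [23, 25, 27, 29, 31, 33, 35, 37, 118]
kept = remove_extreme_values(ages)
print(kept)  # [23, 25, 27, 29, 31, 33, 35, 37]
```

Date-time values could be handled the same way after converting them to numeric timestamps.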

4. Extreme Sequence Length Protection

Limits the inclusion of unusually long sequences in subject-based data to prevent potential re-identification. Sequence lengths are capped at a threshold, filtering out excessively long sequences that could lead to confidentiality risks.
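One way to picture this control is the sketch below, which assumes that subjects whose sequence exceeds the threshold are dropped entirely (truncating them instead would be another plausible reading). The function name `filter_long_sequences`, the `patient_id` key, and the data are hypothetical.

```python
from collections import Counter


def filter_long_sequences(rows, subject_key, max_length):
    """Keep only rows of subjects with at most `max_length` rows.

    Assumption: over-long sequences are filtered out as a whole
    rather than truncated.
    """
    lengths = Counter(row[subject_key] for row in rows)
    return [row for row in rows if lengths[row[subject_key]] <= max_length]


# Hypothetical visit log: patient A has 2 visits, patient B has 5.
visits = [{"patient_id": "A", "visit": i} for i in range(1, 3)]
visits += [{"patient_id": "B", "visit": i} for i in range(1, 6)]

kept_visits = filter_long_sequences(visits, "patient_id", max_length=3)
print(sorted({row["patient_id"] for row in kept_visits}))  # ['A']
```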

5. Random Noise Injection

Adds random noise to synthetic values to further enhance privacy. Random noise can be injected into generated synthetic data, introducing slight variations that enhance privacy while maintaining data utility. This optional feature is configurable within Advanced Settings.
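As an illustration of the idea, the sketch below adds zero-mean Gaussian noise to already generated values. The noise distribution, the `noise_scale` parameter (a fraction of the column's standard deviation), and the function name `inject_noise` are assumptions; they do not describe the actual setting exposed in the product.

```python
import random
import statistics


def inject_noise(values, noise_scale=0.05, seed=None):
    """Add Gaussian noise with sigma = noise_scale * std(values).

    Assumption: additive, zero-mean Gaussian noise scaled to the
    spread of the column.
    """
    rng = random.Random(seed)
    sigma = noise_scale * statistics.pstdev(values)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical synthetic salary column.
synthetic_salaries = [52000, 61000, 48000, 75000, 58000]
noisy_salaries = inject_noise(synthetic_salaries, noise_scale=0.05, seed=42)
print(noisy_salaries)
```

A small `noise_scale` keeps the overall distribution close to the generated one while making exact value matches less likely.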

6. Evaluation and Transparency through Syntho QA Report

Provides transparency and confidence in synthetic data quality and privacy. Syntho leverages open-source synthetic data evaluation libraries like SDMetrics to provide a transparent assessment of synthetic data quality and privacy. The platform includes an evaluation notebook that contains quality and privacy metrics, allowing you to see how your synthetic data performs against industry standards for confidentiality and utility.
