QA report

Coming Soon

The QA Report in Syntho assesses the privacy and utility of your synthetic data. This page outlines the key metrics in the report, which help you evaluate the statistical integrity and privacy of the generated data so that you can make data-driven decisions with confidence.

Enable/Disable QA Report

To enable or disable the QA report, click the Generate button in your workspace. After the source and destination databases have been validated, go to the Advanced Generation Settings and enable or disable QA Report generation. It is enabled by default.

Access the Quality Report

Open your workspace, go to the Jobs section, and select a job to see its Job Summary. For columns that use AI-powered generation, the Job Summary panel shows a View Quality Report option, which gives you quick access to the QA insights.

Synthetic Quality Summary

  1. Row and Column Counts: The report includes row and column counts for each table, ensuring completeness and consistency across datasets.

  2. Summary Utility Score: Syntho calculates an overall utility score that combines three core metrics (Column Correlation, PCA, and Column Distribution Scores) into a single figure from 0% to 100%. This score reflects how well the synthetic data maintains the statistical properties of the original data, helping you evaluate its suitability for analysis.

  3. Column Correlation Score: This score assesses the stability of field correlations by calculating the average absolute difference between pairwise field correlations in the original and synthetic data. A lower score indicates stronger stability and alignment with the original data.

  4. PCA Score: This score uses Principal Component Analysis (PCA) to measure the fidelity of deeper, multi-field distributions within the synthetic data. It compares the principal components of the original and synthetic data to assess structural similarity, which makes it particularly useful for machine learning applications.

  5. Column Distribution Score: Using Jensen-Shannon Distance, this score measures how closely the field distributions in the synthetic data mirror those in the original data. A lower average score indicates greater alignment, which reflects stable distribution patterns across fields.
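
The Column Correlation and Column Distribution scores described above can be illustrated with a small sketch. This is not Syntho's implementation: the function names, the use of Pearson correlation, the base-2 logarithm, and the treatment of columns as categorical are all assumptions made for illustration. It only shows the general idea of averaging absolute differences between pairwise correlations and of computing a Jensen-Shannon distance between two column distributions.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat constant columns as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_difference(original, synthetic):
    """Mean absolute difference between pairwise column correlations.

    `original` and `synthetic` map column names to lists of numbers
    (hypothetical input format, chosen for this sketch).
    """
    pairs = list(combinations(sorted(original), 2))
    diffs = [
        abs(pearson(original[a], original[b]) - pearson(synthetic[a], synthetic[b]))
        for a, b in pairs
    ]
    return sum(diffs) / len(diffs)


def jensen_shannon_distance(original_values, synthetic_values):
    """Jensen-Shannon distance (base 2) between two categorical columns."""
    p_counts, q_counts = Counter(original_values), Counter(synthetic_values)
    n_p, n_q = len(original_values), len(synthetic_values)
    divergence = 0.0
    for category in set(p_counts) | set(q_counts):
        p = p_counts[category] / n_p
        q = q_counts[category] / n_q
        m = (p + q) / 2
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # The distance is the square root of the divergence.
    return math.sqrt(max(divergence, 0.0))
```

With base-2 logarithms the distance ranges from 0 (identical distributions) to 1 (no shared categories), so lower values in both functions indicate closer alignment with the original data, consistent with the descriptions above.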

Privacy Configuration

  1. Duplicate Row Count: This metric shows the number of duplicate rows in your synthetic data relative to the original, helping ensure the synthetic data does not inadvertently reproduce real data records.

  2. Rare Category Protection: The report shows the number of columns in which rare categories have been protected, based on a user-defined threshold (by default, categories with fewer than 10 occurrences). This enhances privacy.

  3. Overfitting Prevention: Syntho checks for overfitting by evaluating the "sample noise ratio" (recommended to be < 0.001). This ensures that synthetic data maintains privacy while still being statistically representative of the original data.

  4. Privacy Protection Score: Syntho provides an overall privacy score (High, Normal, Low), based on duplicate row counts, rare category protection, and overfitting prevention, giving you a clear view of the privacy level.
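
As a rough illustration of the first two checks, the sketch below counts synthetic rows that exactly match an original row and lists the categories that fall under an occurrence threshold. The function names and input formats are hypothetical, and interpreting "duplicate" as an exact row match is an assumption; Syntho's own calculation may differ.

```python
from collections import Counter


def count_duplicate_rows(original_rows, synthetic_rows):
    """Count synthetic rows that exactly match some original row.

    Assumption: a "duplicate" means every field of the row is identical.
    """
    original_set = {tuple(row) for row in original_rows}
    return sum(1 for row in synthetic_rows if tuple(row) in original_set)


def rare_categories(values, threshold=10):
    """Return the categories that occur fewer than `threshold` times."""
    counts = Counter(values)
    return {category for category, n in counts.items() if n < threshold}
```

A column would count toward the Rare Category Protection figure in this sketch whenever `rare_categories` returns a non-empty set for it.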

Last updated 2 months ago