QA Report

Coming Soon

The QA Report in Syntho assesses the privacy and utility of your synthetic data, helping you check that it meets your quality and privacy requirements. This section outlines the key metrics for evaluating the statistical integrity and privacy of generated synthetic data.

Enable/Disable QA Report

To enable or disable the QA Report, click the Generate button in your workspace. After the source and destination databases have been validated, open the Advanced Generation Settings and turn QA Report generation on or off. It is enabled by default.

Access the Quality Report

Open your workspace, go to the Jobs section, and select a job to see its Job Summary. The View Quality Report option is available on the Job Summary panel for datasets in Synthesize mode, giving you quick access to QA insights.

Synthetic Quality Summary

  1. Row and Column Counts: The report includes row and column counts for each table, ensuring completeness and consistency across datasets.

  2. Summary Utility Score: Syntho calculates an overall utility score that combines three core metrics (Column Correlation, PCA, and Column Distribution Scores) into a single figure from 0% to 100%. This score reflects how well the synthetic data maintains the statistical properties of the original data, helping you evaluate its suitability for analysis.

  3. Column Correlation Score: This score assesses the stability of field correlations by calculating the average absolute difference between pairwise field correlations in the original and synthetic data. A lower score indicates stronger stability and alignment with the original data.

  4. PCA Score: This score uses Principal Component Analysis (PCA) to assess how well the synthetic data preserves deeper, multi-field distributions. It compares the principal components of the original and synthetic data to assess structural similarity, making it particularly useful for machine learning applications.

  5. Column Distribution Score: Using Jensen-Shannon Distance, this score measures how closely the field distributions in the synthetic data mirror those in the original data. A lower average score indicates greater alignment, which reflects stable distribution patterns across fields.
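The two difference-based metrics above can be illustrated with a minimal sketch. This is not Syntho's implementation: the function names, the use of Pearson correlation, the base-2 logarithm, and the treatment of constant columns are all assumptions made for illustration. It only shows what "average absolute difference between pairwise correlations" and "Jensen-Shannon distance" mean in practice.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat constant columns as uncorrelated
    return cov / (sx * sy)


def correlation_difference(original, synthetic):
    """Average absolute difference between pairwise column correlations.

    Both arguments map a column name to a list of numeric values
    (hypothetical input format). Lower means closer to the original.
    """
    pairs = list(combinations(sorted(original), 2))
    diffs = [
        abs(pearson(original[a], original[b]) - pearson(synthetic[a], synthetic[b]))
        for a, b in pairs
    ]
    return sum(diffs) / len(diffs)


def js_distance(orig_values, synth_values):
    """Jensen-Shannon distance (base 2) between two categorical columns.

    Returns a value in [0, 1]; 0 means identical distributions.
    """
    p_counts, q_counts = Counter(orig_values), Counter(synth_values)
    n_p, n_q = len(orig_values), len(synth_values)
    divergence = 0.0
    for cat in set(p_counts) | set(q_counts):
        p = p_counts[cat] / n_p
        q = q_counts[cat] / n_q
        m = (p + q) / 2
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))
```

In this sketch, a column compared with an identical copy yields a distance of 0, and two columns with no categories in common yield the maximum distance of 1. Numeric columns would need to be binned before being passed to `js_distance`.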

Privacy Configuration

  1. Duplicate Row Count: This metric shows the number of duplicate rows in your synthetic data relative to the original, helping ensure the synthetic data does not inadvertently reproduce real data records.

  2. Rare Category Protection: The report shows the number of columns in which rare categories have been protected, based on a user-defined threshold (by default, categories with fewer than 10 occurrences), enhancing privacy.

  3. Overfitting Prevention: Syntho checks for overfitting by evaluating the "sample noise ratio" (recommended to be < 0.001). This ensures that synthetic data maintains privacy while still being statistically representative of the original data.

  4. Privacy Protection Score: Syntho provides an overall privacy score (High, Normal, Low), based on duplicate row counts, rare category protection, and overfitting prevention, giving you a clear view of the privacy level.
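As a rough illustration of the first two privacy checks, the sketch below counts synthetic rows that exactly match an original row and lists the categories that fall under a rarity threshold. The function names and the exact-match interpretation of "duplicate" are assumptions, not Syntho's actual logic. The default threshold of 10 follows the default stated above.

```python
from collections import Counter


def count_duplicate_rows(original_rows, synthetic_rows):
    """Count synthetic rows that exactly match some original row.

    Assumption: a "duplicate" is an exact match on every column value.
    """
    original_set = {tuple(row) for row in original_rows}
    return sum(1 for row in synthetic_rows if tuple(row) in original_set)


def rare_categories(values, threshold=10):
    """Return the categories that occur fewer than `threshold` times."""
    counts = Counter(values)
    return sorted(cat for cat, n in counts.items() if n < threshold)


# Usage with small, made-up data
original = [(1, "a"), (2, "b")]
synthetic = [(1, "a"), (3, "c"), (1, "a")]
print(count_duplicate_rows(original, synthetic))  # 2

column = ["x"] * 12 + ["y"] * 3 + ["z"]
print(rare_categories(column))  # ['y', 'z']
```

A non-zero duplicate count in a check like this would suggest that real records may have been reproduced, and any category returned by `rare_categories` would be a candidate for protection.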
