6. Referential integrity & foreign keys


Last updated 21 days ago


Referential integrity ensures consistency between related tables in a relational database. In Syntho, preserving referential integrity is essential when generating test data or anonymizing production data across linked tables.

Foreign keys link two tables together. For example, a medication table can hold foreign keys that reference the patient IDs (primary keys) in a patient table.


What is referential integrity?

Referential integrity ensures that relationships between tables remain valid. For example, if Patient ID 3456 exists in the Patients table, any reference to this patient in the Medications table must point to that exact ID.

In test or synthetic environments, maintaining referential integrity ensures:

  • Consistency across linked datasets

  • Valid references between primary and foreign keys

  • Reliability of test results, especially in integration testing and staging environments
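The check behind these guarantees can be sketched in a few lines of Python. This is only an illustration: the sample rows and the helper name `find_orphans` are assumptions made for this example and are not part of Syntho. Patient ID 3456 is taken from the example above.

```python
def find_orphans(primary_keys, foreign_keys):
    """Return foreign key values that have no matching primary key."""
    known = set(primary_keys)
    return sorted({fk for fk in foreign_keys if fk not in known})


# Hypothetical sample data: patient IDs, and the patient ID stored on each medication row.
patient_ids = [3456, 3457, 3458]
medication_patient_ids = [3456, 3456, 3458, 9999]

orphans = find_orphans(patient_ids, medication_patient_ids)
print(orphans)  # [9999]
```

An empty result means every reference in the child table points to an existing parent row, which is what referential integrity requires.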


Interactive guide: How to manage foreign keys

Follow the interactive guide in the online version of this page to manage foreign keys.

Best practices

  • Define foreign keys in your source database where possible

  • Use Hash to anonymize keys while keeping relationships intact

  • Use Generate to create entirely new key structures

  • Avoid Duplicate when privacy or transformation of key values is required
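As a rough illustration of the first practice, the sketch below declares a foreign key in a source database and shows the database rejecting an invalid reference. It assumes SQLite only because it ships with Python; the table and column names are hypothetical, and the exact syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE medications (
        medication_id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patients (patient_id),
        drug TEXT
    )
    """
)

conn.execute("INSERT INTO patients VALUES (3456, 'Example Patient')")
conn.execute("INSERT INTO medications VALUES (1, 3456, 'Example Drug')")

# A medication row that points to a non-existent patient is rejected.
try:
    conn.execute("INSERT INTO medications VALUES (2, 9999, 'Example Drug')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

Declaring the relationship in the schema makes it explicit in the database itself, rather than leaving it implied by matching column names.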

Foreign key management in Syntho

Syntho supports three types of key generators to handle referential integrity:

Duplicate

  • Description: Copies the original key values exactly as they appear in the source data, preserving both the correlations and the referential integrity between primary and foreign keys.

  • When to use: When it is essential to keep the original key values and relationships, particularly in de-identification scenarios where the data structure must be preserved without generating new keys.

  • When not to use: When upsampling, which Duplicate does not support because the original keys are copied rather than expanded. It is also not recommended when the keys are sensitive and need protection, because the original key values are retained without obfuscation.

Generate

  • Description: Creates new, synthetic key values that do not correspond to the original keys. It preserves referential integrity only, not the correlations between key columns.

  • When to use: For upsampling, or for creating synthetic datasets where relationships with the original data do not need to be maintained. It can also be used when creating data from scratch.

  • When not to use: When correlations and key order must be maintained. Generate creates new keys independently of the original key order, which disrupts correlations.

Hash

  • Description: Converts original key values into hashed representations. Both the correlations between tables and referential integrity are maintained.

  • When to use: When you need to obscure the original key values while preserving correlations and referential integrity.

  • When not to use: When upsampling, or when the original key values must be kept for direct referencing, for example where exact key values are essential for business logic (e.g. country codes) or for traceability in audit scenarios.
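The difference between the three methods can be illustrated with a small Python sketch. It is a conceptual model only and does not show how Syntho implements its key generators: the salted SHA-256 digest, the random re-linking in `generate`, and all function names are assumptions made for this example.

```python
import hashlib
import random

# Hypothetical source data: patient primary keys and the patient ID on each medication row.
patient_ids = [3456, 3457, 3458]
medication_patient_ids = [3456, 3456, 3458]


def duplicate(pks, fks):
    """Duplicate: copy the original key values unchanged."""
    return list(pks), list(fks)


def hash_keys(pks, fks, salt="demo-salt"):
    """Hash: map every key through the same deterministic function."""
    def digest(value):
        return hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()[:12]
    return [digest(v) for v in pks], [digest(v) for v in fks]


def generate(pks, fks, seed=0):
    """Generate: create fresh keys and link child rows to them at random."""
    rng = random.Random(seed)
    new_pks = list(range(1, len(pks) + 1))
    new_fks = [rng.choice(new_pks) for _ in fks]
    return new_pks, new_fks


def is_consistent(pks, fks):
    """Referential integrity: every foreign key exists as a primary key."""
    return set(fks) <= set(pks)


def link_pattern(pks, fks):
    """Which parent row (by position) each child row points to."""
    return [pks.index(fk) for fk in fks]


original = link_pattern(patient_ids, medication_patient_ids)
for name, method in [("Duplicate", duplicate), ("Hash", hash_keys), ("Generate", generate)]:
    pks, fks = method(patient_ids, medication_patient_ids)
    print(name, is_consistent(pks, fks), link_pattern(pks, fks) == original)
```

In this model all three methods keep referential integrity. Duplicate and Hash also keep each child row attached to the same parent row, while Generate only guarantees that every foreign key points to some valid new key.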
