6. Referential integrity & foreign keys
Last updated
Was this helpful?
Last updated
Was this helpful?
Referential integrity ensures consistency between related tables in a relational database. In Syntho, preserving referential integrity is essential when generating test data or anonymizing production data across linked tables.
Foreign keys are used to link two tables together—for example, a patient table (with primary keys) and a medication table (with foreign keys referencing the patient IDs).
Referential integrity ensures that relationships between tables remain valid. For example, if Patient ID 3456
exists in the Patients
table, any reference to this patient in the Medications
table must point to that exact ID.
In test or synthetic environments, maintaining referential integrity ensures:
Consistency across linked datasets
Valid references between primary and foreign keys
Reliability of test results, especially in integration testing and staging environments
Follow the interactive guide below to manage foreign keys
Define foreign keys in your source database where possible
Use Hash to anonymize keys while keeping relationships intact
Use Generate to create entirely new key structures
Avoid Duplicate when privacy or transformation of key values is required
Syntho supports three types of key generators to handle referential integrity:
Duplicate
Copies the original key values exactly as they appear in the source data, preserving both the correlations and referential integrity between primary and foreign keys.
When it’s essential to maintain the original key values and relationships, particularly in de-identification scenarios where the data structure must be preserved without generating new keys.
Upsampling is not supported when using Duplicate, as the original keys are simply copied, not expanded. Additionally, it is not recommended when the keys are sensitive and need to be protected, as this method retains the original key values without obfuscation.
Generate
Creates new, synthetic key values that do not correspond to the original keys. It preserves only the referential integrity, but not the correlations between key columns.
Use Generate for upsampling or creating synthetic datasets where there is no need to maintain relationships with the original data. It can also be used when creating data from scratch.
The Generate function creates new keys independently of the original key order, which disrupts correlations. As a result, it is unsuitable for scenarios where maintaining the correlations and order is essential.
Hash
Converts original key values into hashed representations. Both correlations between tables and relational integrity are maintained.
Use Hash when you need to obscure the original key values, while ensuring correlations and referential integrity are preserved.
Upsampling, or situations where the original key values must be maintained for direct referencing, such as cases where exact key values are essential for business logic (e.g. country codes) or traceability in audit scenarios.