Table relationships

To get the best possible data utility with the least amount of resources when using Syntho's AI-powered synthetic data generation, it is best practice to prepare your data as a single entity table. If you must generate data for multiple tables, however, Syntho offers three options:

  1. Synthesize individual tables with automatic key matching: To keep hardware resource usage within reasonable limits, Syntho by default synthesizes each table separately from the others and afterwards generates new keys for each table. This method doesn't maintain inherent relationships between tables (i.e., relationships between key and non-key columns). For example, a Pregnancy diagnosis in the synthetic Diagnosis table could point to a Male patient in the synthetic Patients table. Nonetheless, it upholds technical referential integrity by generating new keys, ensuring that each foreign key corresponds to an existing primary key in another table. If you must preserve cross-table relationships, you have three options: convert the relevant information from the Diagnosis table and the Patients table into a single entity table and then synthesize it, synthesize using Syntho's sequence model (up to 2 tables), or apply PII de-identification (unlimited tables).

  2. Synthesize using sequence model: If you want to preserve cross-table relationships between 2 related tables, including the relationships between key and non-key columns, you can use Syntho's synthetic data sequence model. This feature is especially valuable if you want to synthesize sequence data (e.g., time series or trajectories).

  3. PII de-identification: As an alternative to synthesis, the Syntho platform can de-identify your PII columns with the help of the Syntho PII scanner and Syntho mockers, leaving all remaining columns intact. This approach has the benefit of preserving cross-table relationships and is most popular for testing and development use cases.
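The difference between technical referential integrity and preserved cross-table relationships in option 1 can be illustrated with a small sketch. This is not Syntho's implementation: the tables, the column names, and the `assign_keys` helper are hypothetical, and the random key assignment is only an assumption about what "generating new keys" could look like. It shows that every foreign key can point at an existing primary key while a Pregnancy diagnosis still ends up linked to a Male patient.

```python
import random

# Toy stand-ins for two tables that were synthesized independently.
# All names and values are illustrative, not Syntho output.
patients = [
    {"sex": "Male"},
    {"sex": "Female"},
    {"sex": "Male"},
]
diagnoses = [
    {"diagnosis": "Pregnancy"},
    {"diagnosis": "Fracture"},
    {"diagnosis": "Pregnancy"},
    {"diagnosis": "Asthma"},
]


def assign_keys(parents, children, seed=0):
    """Give every parent a fresh primary key and every child a foreign key
    drawn from those primary keys (hypothetical key-matching step)."""
    rng = random.Random(seed)
    for new_id, row in enumerate(parents, start=1):
        row["patient_id"] = new_id
    valid_keys = [row["patient_id"] for row in parents]
    for new_id, row in enumerate(children, start=1):
        row["diagnosis_id"] = new_id
        # The key is chosen without looking at any non-key column.
        row["patient_id"] = rng.choice(valid_keys)


def has_referential_integrity(parents, children):
    """True if every foreign key points at an existing primary key."""
    primary_keys = {row["patient_id"] for row in parents}
    return all(row["patient_id"] in primary_keys for row in children)


def implausible_rows(parents, children):
    """Pregnancy diagnoses whose foreign key points at a Male patient."""
    sex_by_id = {row["patient_id"]: row["sex"] for row in parents}
    return [
        row for row in children
        if row["diagnosis"] == "Pregnancy"
        and sex_by_id[row["patient_id"]] == "Male"
    ]


assign_keys(patients, diagnoses)
print("referential integrity:", has_referential_integrity(patients, diagnoses))
print("implausible rows:", implausible_rows(patients, diagnoses))
```

The integrity check always passes because keys are drawn only from existing primary keys. Whether implausible rows appear depends on the random draw, which is exactly the risk described for option 1.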

Below is a summary of the key approaches Syntho offers to preserve table relationships.

| Approach | Cross-table relationships | Referential integrity | Upsampling | Preserve sequence information | Table limit |
| --- | --- | --- | --- | --- | --- |
| Synthesize individual tables with automatic key matching | | | | | Unlimited (without preserving cross-table relations) |
| Synthesize using sequence model | | | | | 2 |
| PII de-identification | | | | | Unlimited |
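For the single entity table route mentioned under option 1, one possible way to fold the relevant Diagnosis information into the Patients table before synthesis is sketched below. The data, the column names, and the `to_entity_table` helper are hypothetical. Representing diagnoses as one boolean column per diagnosis is an assumption, chosen so that each row describes exactly one patient.

```python
from collections import defaultdict

# Illustrative source tables; values are made up for this sketch.
patients = [
    {"patient_id": 1, "sex": "Female", "age": 31},
    {"patient_id": 2, "sex": "Male", "age": 54},
    {"patient_id": 3, "sex": "Female", "age": 47},
]
diagnoses = [
    {"patient_id": 1, "diagnosis": "Pregnancy"},
    {"patient_id": 2, "diagnosis": "Hypertension"},
    {"patient_id": 2, "diagnosis": "Asthma"},
]


def to_entity_table(parents, children, key="patient_id"):
    """Return one row per parent, with a boolean flag per child category."""
    # Collect the set of diagnoses recorded for each patient.
    by_parent = defaultdict(set)
    for child in children:
        by_parent[child[key]].add(child["diagnosis"])

    categories = sorted({child["diagnosis"] for child in children})
    table = []
    for parent in parents:
        row = dict(parent)  # copy so the source table is not modified
        for name in categories:
            row[f"has_{name.lower()}"] = name in by_parent[parent[key]]
        table.append(row)
    return table


entity_table = to_entity_table(patients, diagnoses)
for row in entity_table:
    print(row)
```

Because sex and diagnosis now sit in the same row, their relationship becomes part of a single table rather than something spread across two tables linked by keys.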

