Cross-table relationship limitations

AI-powered data generation gives the best data utility for the least resources when your data is prepared as a single entity table. If you need to synthesize more than one table, you have the following options, each with its own limitations:

  1. Synthesize individual tables with automatic key matching: By default, Syntho synthesizes each table separately and then generates new keys for each table. This upholds referential integrity, because each foreign key corresponds to an existing primary key in another table. However, cross-table correlations aren't preserved. For example, a Pregnancy diagnosis in the synthetic Diagnosis table could point to a Male patient in the synthetic Patients table. If you must preserve cross-table relationships, you have three options: combine the relevant information from the Diagnosis and Patients tables into a single entity table and then synthesize it, synthesize using Syntho's sequence model (up to 2 tables), or apply de-identification (unlimited tables).

  2. Synthesize using sequence model: To preserve cross-table relationships between 2 related tables, including relationships between key and non-key columns, you can use Syntho's synthetic data sequence model. This feature is especially valuable for synthesizing sequence data (e.g., time series or trajectories).
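The difference between referential integrity and cross-table correlation can be checked on synthetic output with a short script. The following is a minimal sketch in plain Python, not part of Syntho's product or API: the table contents, the column names (`patient_id`, `sex`, `diagnosis`), and the helper functions are all hypothetical and chosen only to mirror the Pregnancy/Male example above.

```python
# Hypothetical synthetic output; table and column names are illustrative only.
patients = [
    {"patient_id": 1, "sex": "Male"},
    {"patient_id": 2, "sex": "Female"},
]
diagnoses = [
    {"diagnosis_id": 10, "patient_id": 1, "diagnosis": "Pregnancy"},
    {"diagnosis_id": 11, "patient_id": 2, "diagnosis": "Fracture"},
]


def orphan_foreign_keys(child_rows, parent_rows, key):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


def join_to_single_table(child_rows, parent_rows, key):
    """Flatten child and parent rows into one row per child record."""
    parents_by_key = {row[key]: row for row in parent_rows}
    return [
        {**parents_by_key[row[key]], **row}
        for row in child_rows
        if row[key] in parents_by_key
    ]


def implausible_rows(flat_rows):
    """Flag one example of a broken cross-table correlation."""
    return [
        row for row in flat_rows
        if row["diagnosis"] == "Pregnancy" and row["sex"] == "Male"
    ]


orphans = orphan_foreign_keys(diagnoses, patients, "patient_id")
flat = join_to_single_table(diagnoses, patients, "patient_id")
broken = implausible_rows(flat)

print("orphan foreign keys:", len(orphans))
print("implausible rows:", [row["diagnosis_id"] for row in broken])
```

In this sample every foreign key resolves to a patient, so referential integrity holds, yet the joined view still exposes a Pregnancy diagnosis attached to a Male patient. The same join is one way to build a single entity table before synthesis, assuming each diagnosis belongs to exactly one patient.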

| Approach | Cross-table correlations | Referential integrity | Preserve sequence information | Table limit |
|---|---|---|---|---|
| Synthesize individual tables with automatic key matching | No | Yes | | Unlimited |
| Synthesize using sequence model | Yes | | Yes | 2 |
