Use Case 8: Cloud & data migration

Validate data workflows during migrations.

Use this use case when you need privacy-safe data to validate migrations. This includes scenarios where schemas and infrastructure are changing.

What problem this use case solves

Teams need to validate data generation workflows across environments. They often need repeatable runs as schemas evolve.

Classic anonymization can over-generalize values and reduce realism. It can also be slow to repeat at scale.

When to choose this use case

Pick this when you validate a migration across environments:

  • You compare behavior across old and new platforms.

  • You expect schema drift and need repeatable re-sync cycles.

  • You need privacy-safe data in temporary migration environments.

  • You need the same masking logic on both targets.

If you’re unsure, start with De-identify and keep generator configs identical across both environments. Validate after every schema change.

When to avoid this use case

Skip this when migration validation is not the target.

This setup is optimized for repeatable migration validation. It assumes that you expect schema drift and that you need the same generation logic on old and new platforms.

1

Prerequisites

Checklist

2

Source & destination management

Create separate workspaces per target platform or environment. Example: onprem-test and cloud-test.

Baseline rules

  • Keep the source stable. Prefer snapshots or backups.

  • Avoid a live production source for iterative work.

  • Keep the destination isolated. Never write into production.

  • Keep schemas aligned between source, workspace, and destination.

  • Use views when you need only a subset of the original database.

Lifecycle rule of thumb

  • Keep the source connection when you expect schema changes.

  • Remove the source connection when you do not expect another run for a long time.

  • Revalidate after schema changes. Use Validate and synchronize workspace.

Nuances for this use case

  • Use the same source snapshot across environments. Otherwise drift looks like migration defects.

  • Keep generator configs aligned across the two workspaces. Masking differences can look like migration defects.

  • Prefer behavioral equivalence checks. Avoid row-level equality unless you truly need it.
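
One way to catch masking differences before they show up as apparent migration defects is to diff the generator settings of the two workspaces. The sketch below assumes the settings have already been exported into plain dictionaries keyed by `table.column`. That export shape and the generator names (`mask_email`, `mask_iban`) are hypothetical and are not a Syntho format.

```python
def diff_generator_configs(old: dict, new: dict) -> dict:
    """Return the columns whose generator settings differ between two workspaces."""
    differences = {}
    for column in sorted(set(old) | set(new)):
        left = old.get(column)    # None means the column is missing on that side
        right = new.get(column)
        if left != right:
            differences[column] = {"onprem-test": left, "cloud-test": right}
    return differences


# Hypothetical exported settings, keyed by "table.column".
onprem = {
    "customers.email": {"generator": "mask_email", "consistent_mapping": True},
    "customers.iban": {"generator": "mask_iban", "consistent_mapping": True},
}
cloud = {
    "customers.email": {"generator": "mask_email", "consistent_mapping": False},
    "customers.iban": {"generator": "mask_iban", "consistent_mapping": True},
}

drift = diff_generator_configs(onprem, cloud)
for column, sides in drift.items():
    print(column, sides)
```

An empty result means the two configurations match. Any entry is worth resolving before you compare data across platforms.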

3

Configure generators

Workspace initialization mode

Choose a workspace mode. It applies baseline generator suggestions during workspace creation.

Match the mode to your migration phase. Early phases need iteration. Later phases need locked configs.

Recommended modes for this use case:

  • De-identify when you need migration parity across many related tables and you compare behavior across platforms.

  • Mock or mask all for fast iteration when you mainly test schema compatibility and basic application flows.

  • Synthesize all only when migration validation is based on a single entity table and you’re not comparing record-level behavior.

AI-generated synthesis

Use this only when migration validation is behavioral and scoped to a single entity table. Avoid it when you need parity across many joined tables.

Example (warehouse-only validation): synthesize a migr_orders_entity_view into both old and new platforms to validate query performance and datatype behavior without requiring record-level parity.

Rule-based generation

Use this to stress migration edge cases like length truncation, nullability, and timezone boundaries. Use Calculated columns to inject boundary values.

Example (precision boundary): enforce a destination-safe numeric scale and inject a small percentage of boundary values.
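
To make the precision example concrete, the following sketch lists boundary values for a `DECIMAL(precision, scale)` column and swaps them into a small fraction of rows. It is plain Python for illustration only. In the product you would express this with Calculated columns, and the function names here are hypothetical.

```python
import random
from decimal import Decimal


def decimal_boundaries(precision: int, scale: int) -> list:
    """Boundary values that still fit into DECIMAL(precision, scale)."""
    step = Decimal(10) ** -scale                          # smallest increment
    largest = Decimal(10) ** (precision - scale) - step   # all digits are 9
    return [-largest, -step, Decimal(0).quantize(step), step, largest]


def inject_boundaries(values, boundaries, fraction, seed=0):
    """Replace roughly `fraction` of the values with a boundary value."""
    rng = random.Random(seed)  # fixed seed so both workspaces get the same rows
    return [rng.choice(boundaries) if rng.random() < fraction else v for v in values]


amounts = [Decimal("19.99")] * 1000
stressed = inject_boundaries(amounts, decimal_boundaries(18, 2), fraction=0.02)
```

Using the same seed and the same boundary list in both workspaces keeps the old vs new comparison fair.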

Masking

This is the default for migration comparisons. It preserves formats and keeps multi-table behavior close to production while removing identifiers.

Example (same masking on both targets): mask email, iban, and phone_number using identical generator settings in both workspaces. Enable consistent mapping for join keys so old vs new comparisons are fair.
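
The sketch below illustrates why consistent mapping matters for join keys: the same input always yields the same masked value, so joins still line up after masking. It uses a keyed hash (HMAC-SHA256) from the Python standard library as a stand-in. This is an assumption made for illustration and is not how Syntho implements consistent mapping. The secret and the sample keys are made up.

```python
import hashlib
import hmac


def consistent_token(value: str, secret: bytes, length: int = 16) -> str:
    """Map a value to a stable pseudonym; the same value and secret give the same token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


SECRET = b"shared-by-both-workspaces"  # hypothetical; must be identical on both targets

customers = [{"customer_id": "C-1001"}, {"customer_id": "C-1002"}]
orders = [{"order_id": 1, "customer_id": "C-1001"}, {"order_id": 2, "customer_id": "C-1001"}]

masked_customers = {consistent_token(c["customer_id"], SECRET) for c in customers}
masked_orders = [consistent_token(o["customer_id"], SECRET) for o in orders]

# Every masked order key still resolves to a masked customer key.
joins_intact = all(key in masked_customers for key in masked_orders)
```

If the two workspaces used different secrets or settings, the masked keys would differ between targets and the comparison would no longer be like for like.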

Hybrid

Use this when you want parity for the core schema, plus targeted stress cases.

Example (parity + boundary tests): de-identify the full relational dataset for both targets, then inject the same boundary values in both workspaces so comparisons stay fair.

Minimal configuration steps

  1. Keep both workspaces aligned (same tables, same generators).

  2. Use de-identification + masking for multi-table parity.

  3. Add boundary injections (precision, dates) with calculated columns.

  4. Validate and compare using behavioral checks (counts, nulls, distributions).

4

Handle keys and relationships (relational schemas)

If your migration validation is single-table (or you validate only a curated layer), you can skip this step.

Migration failures often show up as broken joins. Make FK behavior explicit.

Use Manage foreign keys and add virtual foreign keys when needed. Keep FK configs consistent across the two workspaces.
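
A quick way to spot broken joins is to count child rows whose foreign key has no matching parent. The sketch below runs that check against an in-memory SQLite database. The `customers` and `orders` tables are hypothetical, and on a real target you would run the equivalent query through that platform's own driver.

```python
import sqlite3


def count_orphans(conn, child, fk_column, parent, pk_column):
    """Count child rows whose non-null foreign key has no parent row."""
    # Identifiers are interpolated directly, so pass trusted names only.
    query = (
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk_column} = p.{pk_column} "
        f"WHERE c.{fk_column} IS NOT NULL AND p.{pk_column} IS NULL"
    )
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
    """
)

orphans = count_orphans(conn, "orders", "customer_id", "customers", "id")
print(f"orphaned orders: {orphans}")
```

Run the same check in both environments. A difference in the orphan count points at a relationship that did not survive the migration or the generation run.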

5

Validate and sync

Schema drift is normal during migration. Treat validation as part of every run.

Run validation after schema changes and before comparing results across platforms. Use Validate and synchronize workspace.

Compare old vs new (quick checklist)

  • Row counts per table/partition.

  • Null rates and distinct counts for key business fields.

  • Schema-level constraints that changed (nullable, lengths, precision).
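
The first two checklist items can be scripted. The sketch below profiles one column per environment (row count, null rate, distinct count) and reports which metrics differ. It uses in-memory SQLite databases as stand-ins for the old and new platforms. The `customers.email` column and the sample values are made up.

```python
import sqlite3


def profile_column(conn, table, column):
    """Row count, null rate, and distinct count for one column."""
    # Identifiers are interpolated directly, so pass trusted names only.
    total, non_null, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    null_rate = 0.0 if total == 0 else (total - non_null) / total
    return {"rows": total, "null_rate": null_rate, "distinct": distinct}


def compare_profiles(old, new, null_rate_tolerance=0.0):
    """Return the names of the metrics that differ between two profiles."""
    mismatches = []
    if old["rows"] != new["rows"]:
        mismatches.append("rows")
    if abs(old["null_rate"] - new["null_rate"]) > null_rate_tolerance:
        mismatches.append("null_rate")
    if old["distinct"] != new["distinct"]:
        mismatches.append("distinct")
    return mismatches


def make_db(emails):
    """Build a throwaway database that stands in for one environment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(e,) for e in emails])
    return conn


old_db = make_db(["a@x.test", "b@x.test", None, "a@x.test"])
new_db = make_db(["a@x.test", "b@x.test", None, None])

old_profile = profile_column(old_db, "customers", "email")
new_profile = profile_column(new_db, "customers", "email")
mismatches = compare_profiles(old_profile, new_profile)
print(mismatches)
```

A small `null_rate_tolerance` can be useful when the two targets are generated in separate runs and exact equality is not expected.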

Optional: deeper migration checks

Focus on equivalence of behavior, not identical rows.

Common migration edge cases to watch:

  • VARCHAR length truncation (destination max length < source).

  • Timestamp timezone behavior differences (TIMESTAMP vs TIMESTAMPTZ).

  • Numeric precision/scale mismatches (e.g., DECIMAL(18,2) vs FLOAT).
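
Two of these edge cases are easy to probe before a run. The sketch below flags strings that exceed a destination length and decimal values that change when stored as a binary float. The helper names are hypothetical. The length check counts characters, which is an assumption, because some platforms define VARCHAR limits in bytes.

```python
from decimal import Decimal


def too_long(values, max_length):
    """Values that would not fit into a VARCHAR(max_length) column."""
    return [v for v in values if v is not None and len(v) > max_length]


def changes_as_float(value: Decimal) -> bool:
    """True if the value is altered by a round trip through a binary float."""
    return Decimal(repr(float(value))) != value


names = ["Ada", "Bartholomew-Fitzgerald", None]
flagged = too_long(names, max_length=10)

large_amount = Decimal("9999999999999999.99")  # fits DECIMAL(18,2)
small_amount = Decimal("0.10")
print(flagged, changes_as_float(large_amount), changes_as_float(small_amount))
```

Values near the top of a DECIMAL(18,2) range have more significant digits than a double-precision float can hold, which is why the large amount is flagged and the small one is not.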

If you need a quick “same-ness” signal, compute checksums on a stable projection of columns in each environment and compare those checksums per table or per partition.
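
A minimal sketch of such a checksum follows. It hashes a fixed projection of columns per row and sums the row hashes, so the result does not depend on row order. The row format (a list of dictionaries) and the column names are assumptions for illustration. In practice you would fetch the projection from each platform and normalize values first, for example by formatting decimals and timestamps identically on both sides.

```python
import hashlib


def table_checksum(rows, columns):
    """Order-independent checksum over a stable projection of columns."""
    total = 0
    for row in rows:
        # "\x00" marks NULL so that it cannot collide with an empty string.
        fields = ["\x00" if row[c] is None else str(row[c]) for c in columns]
        digest = hashlib.sha256("\x1f".join(fields).encode("utf-8")).digest()
        # Summing (instead of XOR) keeps duplicate rows from cancelling out.
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
    return f"{total:064x}"


old_rows = [
    {"id": 1, "status": "paid", "loaded_at": "2024-01-01"},
    {"id": 2, "status": "open", "loaded_at": "2024-01-01"},
]
# Same content, different row order and a different load timestamp.
new_rows = [
    {"id": 2, "status": "open", "loaded_at": "2024-02-01"},
    {"id": 1, "status": "paid", "loaded_at": "2024-02-01"},
]

projection = ["id", "status"]  # leave out volatile columns such as load timestamps
same = table_checksum(old_rows, projection) == table_checksum(new_rows, projection)
print(same)
```

Matching checksums are a quick signal, not proof of equivalence. A mismatch tells you which table or partition to inspect with the behavioral checks above.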

6

Tune generation settings

Tune for “migrate-and-validate” loops. You need stable runtime and predictable writes.

Once the config is correct, use two guides: View and adjust generation settings, and Large workloads tuning.

Common pitfalls & misconfigurations

Use-case specific pitfalls

  • Treating a migration run as “one-off” when the schema is still evolving.

  • Datatype mismatches between old and new platforms.

General pitfalls

These pitfalls show up in most projects:

Governance, compliance, and automation

Use-case specific recommendations

  • Run paired workspaces per target (onprem, cloud). Keep generator configs in lockstep by policy.

  • Automate side-by-side checks: row counts, null rates, distinct counts for keys, and a checksum over a stable projection.

  • Require “same snapshot” as an explicit prerequisite in the migration checklist. Otherwise comparisons are meaningless.

  • Log every schema drift event with a resync/validation run. Treat it as part of the migration timeline.

General recommendations

Use these recommendations for most workspaces.

Ownership and change control

  • Assign a single workspace owner (data steward / privacy lead / DBA).

  • Require a ticket or change request for generator changes.

  • Duplicate a workspace before large edits. Keep the previous version as rollback.

Access control

  • Default to read-only access for source connections.

  • Restrict who can view source data in the UI.

  • Use separate workspaces per environment or audience.

Automation (baseline)

  • Use the Syntho REST API to standardize scans and runs.

  • Automate data generation, not workspace configuration.

  • Keep job logs for failed runs. This reduces back-and-forth during support.
