Use Case 8: Cloud & data migration
Validate data workflows during migrations.
Use this use case when you need privacy-safe data to validate migrations. This includes scenarios where schemas and infrastructure are changing.
What problem this use case solves
Teams need to validate data generation workflows across environments. They often need repeatable runs as schemas evolve.
Classic anonymization can over-generalize values and reduce realism. It can also be slow to repeat at scale.
When to choose this use case
Pick this when you validate a migration across environments.
You compare behavior across old and new platforms.
You expect schema drift and need repeatable re-sync cycles.
You need privacy-safe data in temporary migration environments.
You need the same masking logic on both targets.
If you’re unsure, start with De-identify and keep generator configs identical across both environments. Validate after every schema change.
When to avoid this use case
Skip this when migration validation is not the goal.
You need an analytics sandbox for exploration. See Use Case 7: Analytics Sandboxes.
You mainly need load testing at scale. See Use Case 2: Load & Stress.
You need end-to-end pipeline testing in a stable environment (not a platform migration). See Use Case 4: ETL & Data Pipeline Testing.
You need ML training data. Prefer an entity-table-style dataset and AI-generated synthesis.
Recommended Syntho configuration
This setup is optimized for repeatable migration validation. It assumes that you expect schema drift and need the same generation logic on the old and new platforms.
Prerequisites
Checklist
Don’t compare two targets fed by two different source snapshots. Differences between the snapshots will look like migration defects.
Use the Prerequisites checklist.
Source & destination management
Create separate workspaces per target platform or environment. Example: onprem-test and cloud-test.
Baseline rules
Keep the source stable. Prefer snapshots or backups.
Avoid a live production source for iterative work.
Keep the destination isolated. Never write into production.
Keep schemas aligned between source, workspace and destination.
Use views when you need only a subset of the original database.
Lifecycle rule of thumb
Keep the source connection when you expect schema changes.
Remove the source connection when you don’t expect another run for a long time.
Revalidate after schema changes. Use Validate and synchronize workspace.
Nuances for this use case
Use the same source snapshot across environments. Otherwise drift looks like migration defects.
Keep generator configs aligned across the two workspaces. Masking differences can look like migration defects.
Prefer behavioral equivalence checks. Avoid row-level equality unless you truly need it.
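One way to catch masking differences before they are mistaken for migration defects is to diff the generator settings of the two workspaces. The sketch below is plain Python. The dictionary layout (a column name mapped to a settings dict) and the setting names are hypothetical stand-ins for however you record your configuration. They are not a Syntho export format.

```python
def diff_generator_configs(left: dict, right: dict) -> dict:
    """Return every column whose settings differ between two workspaces.

    Both arguments map "table.column" to a settings dict. This layout is
    hypothetical; adapt it to however you record your configuration.
    """
    diffs = {}
    for column in sorted(set(left) | set(right)):
        if left.get(column) != right.get(column):
            diffs[column] = (left.get(column), right.get(column))
    return diffs


# Illustrative settings for the two example workspaces.
onprem = {
    "customers.email": {"generator": "mask", "consistent_mapping": True},
    "customers.iban": {"generator": "mask", "consistent_mapping": False},
}
cloud = {
    "customers.email": {"generator": "mask", "consistent_mapping": False},
    "customers.iban": {"generator": "mask", "consistent_mapping": False},
}

# Only customers.email differs, so only that column is reported.
for column, (old, new) in diff_generator_configs(onprem, cloud).items():
    print(column, old, new)
```

An empty result means the two workspaces are configured identically for every listed column.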
Configure generators
Workspace initialization mode
Choose a workspace mode. It applies baseline generator suggestions during workspace creation.
Match the mode to your migration phase. Early phases need iteration. Later phases need locked configs.
Recommended modes for this use case:
De-identify when you need migration parity across many related tables and you compare behavior across platforms.
Mock or mask all for fast iteration when you mainly test schema compatibility and basic application flows.
Synthesize all only when migration validation is based on a single entity table and you’re not comparing record-level behavior.
AI-generated synthesis
Use this only when migration validation is behavioral and scoped to a single entity table. Avoid it when you need parity across many joined tables.
Example (warehouse-only validation): synthesize migr_orders_entity_view into both the old and new platforms to validate query performance and datatype behavior without requiring record-level parity.
Rule-based generation
Use this to stress migration edge cases like length truncation, nullability, and timezone boundaries. Use Calculated columns to inject boundary values.
Example (precision boundary): enforce a destination-safe numeric scale and inject a small percentage of boundary values.
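The precision-boundary idea can be prototyped outside the platform to decide which values to inject. This is a minimal Python sketch, not Syntho's Calculated columns syntax. The DECIMAL(18,2) target, the 5% rate, the seed, and all function names are illustrative assumptions.

```python
import random
from decimal import Decimal, ROUND_HALF_UP

# Assumed destination column type: DECIMAL(18,2). Adjust to your target.
PRECISION, SCALE = 18, 2
QUANTUM = Decimal(10) ** -SCALE                            # smallest step: 0.01
MAX_VALUE = Decimal(10) ** (PRECISION - SCALE) - QUANTUM   # largest storable value
BOUNDARY_VALUES = [MAX_VALUE, -MAX_VALUE, Decimal("0.00"), QUANTUM, -QUANTUM]


def to_destination_scale(value: Decimal) -> Decimal:
    """Round a value to the destination scale before it is written."""
    return value.quantize(QUANTUM, rounding=ROUND_HALF_UP)


def inject_boundaries(values, rate=0.05, seed=42):
    """Replace roughly `rate` of the values with boundary values.

    A fixed seed makes the injection repeatable, so both workspaces
    receive the same boundary cases in the same positions.
    """
    rng = random.Random(seed)
    result = []
    for value in values:
        if rng.random() < rate:
            result.append(rng.choice(BOUNDARY_VALUES))
        else:
            result.append(to_destination_scale(value))
    return result
```

Using the same seed and rate for both targets keeps the comparison fair: any difference in how the boundary rows are stored then comes from the platforms, not from the test data.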
Masking
This is the default for migration comparisons. It preserves formats and keeps multi-table behavior close to production while removing identifiers.
Example (same masking on both targets): mask email, iban, and phone_number using identical generator settings in both workspaces. Enable consistent mapping for join keys so old vs new comparisons are fair.
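Consistent mapping matters for join keys because the same input must always produce the same masked output. The sketch below illustrates that property with a keyed hash from the Python standard library. It is not how Syntho implements consistent mapping; the key and the function name are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret; both workspaces would need the same one.
MAPPING_KEY = b"replace-with-a-managed-secret"


def consistent_token(value: str, key: bytes = MAPPING_KEY, length: int = 16) -> str:
    """Map a value to a stable pseudonym: same input and key, same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]  # truncation shortens the token but raises collision risk


# The same customer key, masked once per platform, still joins afterwards.
old_platform_key = consistent_token("CUST-000123")
new_platform_key = consistent_token("CUST-000123")
assert old_platform_key == new_platform_key
```

The same stability is what makes records linkable across runs, so it is worth limiting this behavior to the columns that actually take part in joins.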
Hybrid
Use this when you want parity for the core schema, plus targeted stress cases.
Example (parity + boundary tests): de-identify the full relational dataset for both targets, then inject the same boundary values in both workspaces so comparisons stay fair.
Minimal configuration steps
Keep both workspaces aligned (same tables, same generators).
Use de-identification + masking for multi-table parity.
Add boundary injections (precision, dates) with calculated columns.
Validate and compare using behavioral checks (counts, nulls, distributions).
Handle keys and relationships (relational schemas)
If your migration validation is single-table (or you validate only a curated layer), you can skip this step.
Migration failures often show up as broken joins. Make FK behavior explicit.
Use Manage foreign keys and add virtual foreign keys when needed. Keep FK configs consistent across the two workspaces.
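A broken join usually shows up as child rows whose foreign key has no matching parent. The following is a minimal sketch of such an orphan check. It uses Python's built-in sqlite3 as a stand-in for your actual platforms, and the table and column names are hypothetical.

```python
import sqlite3


def count_orphans(conn, child_table, fk_column, parent_table, pk_column):
    """Count child rows whose foreign key has no matching parent row."""
    # Identifiers are interpolated into the SQL text; pass trusted names only.
    sql = (
        f"SELECT COUNT(*) FROM {child_table} c "
        f"LEFT JOIN {parent_table} p ON c.{fk_column} = p.{pk_column} "
        f"WHERE c.{fk_column} IS NOT NULL AND p.{pk_column} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


# Tiny in-memory demo: order 12 points at a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
    """
)
print(count_orphans(conn, "orders", "customer_id", "customers", "id"))  # prints 1
```

Running the same check against both targets shows whether an orphan was already present in the generated data or was introduced by the migration.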
Validate and sync
Schema drift is normal during migration. Treat validation as part of every run.
Run validation after schema changes and before comparing results across platforms. Use Validate and synchronize workspace.
Compare old vs new (quick checklist)
Row counts per table/partition.
Null rates and distinct counts for key business fields.
Schema-level constraints that changed (nullable, lengths, precision).
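The count-based checks in this list are easy to script. The sketch below profiles one column on each platform and reports which metrics differ. It assumes DB-API style connections and uses sqlite3 only so the example is self-contained; the function names and the tolerance parameter are illustrative.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return row count, null rate, and distinct count for one column."""
    # Identifiers are interpolated into the SQL text; pass trusted names only.
    total, nulls, distinct = conn.execute(
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    nulls = nulls or 0  # SUM returns NULL for an empty table
    return {
        "rows": total,
        "null_rate": nulls / total if total else 0.0,
        "distinct": distinct,
    }


def compare_profiles(old, new, null_rate_tolerance=0.0):
    """List the metrics that differ between the old and new platform."""
    diffs = []
    if old["rows"] != new["rows"]:
        diffs.append("rows")
    if abs(old["null_rate"] - new["null_rate"]) > null_rate_tolerance:
        diffs.append("null_rate")
    if old["distinct"] != new["distinct"]:
        diffs.append("distinct")
    return diffs
```

A typical call would be `compare_profiles(profile_column(old_conn, "customers", "email"), profile_column(new_conn, "customers", "email"))`, where an empty list means the two platforms agree on all three metrics.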
Optional: deeper migration checks
Focus on equivalence of behavior, not identical rows.
Common migration edge cases to watch:
VARCHAR length truncation (destination max length < source).
Timestamp timezone behavior differences (TIMESTAMP vs TIMESTAMPTZ).
Numeric precision/scale mismatches (e.g., DECIMAL(18,2) vs FLOAT).
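The three edge cases listed above can be flagged from column metadata before any data is moved. This is a minimal sketch under simplifying assumptions. `ColumnSpec` is a hypothetical structure that you would fill from each platform's catalog, and the type names are plain strings rather than a real type system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical, simplified column description."""
    name: str
    data_type: str                    # e.g. "VARCHAR", "DECIMAL", "TIMESTAMP"
    max_length: Optional[int] = None  # for character types
    scale: Optional[int] = None       # for exact numeric types


def migration_risks(source: ColumnSpec, dest: ColumnSpec) -> list[str]:
    """Flag truncation, timezone, and precision risks for one column pair."""
    risks = []
    if (source.max_length is not None and dest.max_length is not None
            and dest.max_length < source.max_length):
        risks.append("length truncation")
    if {source.data_type, dest.data_type} == {"TIMESTAMP", "TIMESTAMPTZ"}:
        risks.append("timezone behavior")
    if source.data_type == "DECIMAL" and dest.data_type in ("FLOAT", "DOUBLE"):
        risks.append("precision loss")
    elif (source.scale is not None and dest.scale is not None
            and dest.scale < source.scale):
        risks.append("scale reduction")
    return risks


# A VARCHAR(255) column that lands in VARCHAR(100) is reported as a risk.
print(migration_risks(
    ColumnSpec("name", "VARCHAR", max_length=255),
    ColumnSpec("name", "VARCHAR", max_length=100),
))
```

Columns that are flagged here are good candidates for targeted boundary values, because those are the places where the old and new platforms are most likely to behave differently.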
If you need a quick “same-ness” signal, compute checksums on a stable projection of columns in each environment and compare those checksums per table or per partition.
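A checksum over a stable projection could look like the sketch below: a fixed column list, a fixed row order, and a fixed text encoding, hashed with SHA-256. It uses sqlite3 so the example is self-contained. The function name and the choice of hash are assumptions, and the signal is only useful when both targets were loaded with the same generated data.

```python
import hashlib
import sqlite3


def table_checksum(conn, table, columns, order_by):
    """Hash a stable projection of a table and return the hex digest."""
    # Identifiers are interpolated into the SQL text; pass trusted names only.
    cols = ", ".join(columns)
    keys = ", ".join(order_by)
    digest = hashlib.sha256()
    for row in conn.execute(f"SELECT {cols} FROM {table} ORDER BY {keys}"):
        # Render NULL explicitly and separate fields with a unit separator,
        # so that ('ab', 'c') and ('a', 'bc') produce different hashes.
        line = "\x1f".join("\\N" if value is None else str(value) for value in row)
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

To get per-partition checksums, call the function once per partition through a view or a filtered table. Note that `str()` rendering can differ between drivers (for example for decimals and timestamps), so normalize those columns in the projection before trusting a mismatch.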
Tune generation settings
Tune for “migrate-and-validate” loops. You need stable runtime and predictable writes.
Use View and adjust generation settings and Large workloads tuning once the config is correct.
Common pitfalls & misconfigurations
Use-case specific pitfalls
Treating a migration run as “one-off” when the schema is still evolving.
Datatype mismatches between old and new platforms.
Validate schema alignment early. See Prerequisites.
General pitfalls
These pitfalls show up in most projects:
Running full-scale jobs before a small validation run.
Skipping workspace validation/sync after schema changes. Use Validate and synchronize workspace.
Breaking relational integrity (missing PK/FK setup, missing foreign keys, missing virtual foreign keys). Start with Manage foreign keys and virtual foreign keys.
Overusing Consistent mapping (it slows down data generation and increases linkability).
Governance, compliance, and automation
Use-case specific recommendations
Run paired workspaces per target (onprem, cloud). Keep generator configs in lockstep by policy.
Automate side-by-side checks: row counts, null rates, distinct counts for keys, and a checksum over a stable projection.
Require “same snapshot” as an explicit prerequisite in the migration checklist. Otherwise comparisons are meaningless.
Log every schema drift event with a resync/validation run. Treat it as part of the migration timeline.
General recommendations
Use these recommendations for most workspaces.
Ownership and change control
Assign a single workspace owner (data steward / privacy lead / DBA).
Require a ticket or change request for generator changes.
Duplicate a workspace before large edits. Keep the previous version as rollback.
Access control
Default to read-only access for source connections.
Restrict who can view source data in the UI.
Use separate workspaces per environment or audience.
Automation (baseline)
Use the Syntho REST API to standardize scans and runs.
Automate data generation, not workspace configuration.
Keep job logs for failed runs. This reduces back-and-forth during support.

