# Use Case 8: Cloud & data migration

Use this use case when you need privacy-safe data to validate a migration, including migrations where schemas and infrastructure change along the way.

### What problem this use case solves

Teams need to validate data generation workflows across environments. They often need repeatable runs as schemas evolve.

Classic anonymization can over-generalize values and reduce realism. It can also be slow to repeat at scale.

### When to choose this use case

Pick this when you validate a migration across environments.

* You compare behavior across old and new platforms.
* You expect schema drift and need repeatable re-sync cycles.
* You need privacy-safe data in temporary migration environments.
* You need the same masking logic on both targets.

If you’re unsure, start with **De-identify** and keep generator configs identical across both environments. Validate after every schema change.

### When to avoid this use case

Skip this when migration validation is not the goal.

* You need an analytics sandbox for exploration. Use [Use Case 7: Analytics Sandboxes](/overview/get-started/use-cases-and-configuration/use-case-7-analytics-sandboxes.md).
* You mainly need load testing at scale. Use [Use Case 2: Load & Stress](/overview/get-started/use-cases-and-configuration/use-case-2-load-and-stress-testing.md).
* You need end-to-end pipeline testing in a stable environment (not a platform migration). Use [Use Case 4: ETL & Data Pipeline Testing](/overview/get-started/use-cases-and-configuration/use-case-4-etl-and-data-pipeline-testing.md).
* You need ML training data. Prefer an entity-table style dataset and [AI synthesize](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md).

### Recommended Syntho configuration

This setup is optimized for **repeatable migration validation**. You expect schema drift. You need the same generation logic on old and new platforms.

{% stepper %}
{% step %}

#### Prerequisites

**Checklist**

* [ ] “Migration success” checks defined (counts, null rates, key distincts).
* [ ] Same source snapshot will be used for both targets.
* [ ] Generator config will be identical across workspaces.
* [ ] Privacy posture decided for each target environment.

{% hint style="warning" %}
Don’t compare two targets fed by two different source snapshots. Drift looks like migration defects.
{% endhint %}

* Use the [Prerequisites](/overview/get-started/prerequisites.md) checklist.
  {% endstep %}

{% step %}

#### Source & destination management

Create separate workspaces per target platform or environment. Example: `onprem-test` and `cloud-test`.

#### Baseline rules

* Keep the **source stable**. Prefer snapshots or backups.
* Avoid a **live production** source for iterative work.
* Keep the **destination isolated**. Never write into production.
* Keep **schemas aligned** between source, workspace and destination.
* Use **views** when you need only a subset of the original database.

#### Lifecycle rule of thumb

* Keep the source connection when you expect schema changes.
* Remove the source connection when you don't expect another run for a long time.
* Revalidate after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

**Nuances for this use case**

* Use the same source snapshot across environments. Otherwise drift looks like migration defects.
* Keep generator configs aligned across the two workspaces. Masking differences can look like migration defects.
* Prefer behavioral equivalence checks. Avoid row-level equality unless you truly need it.
* [Create a workspace](/setup-workspaces/create-a-workspace.md)
* [Workspace modes](/setup-workspaces/create-a-workspace/workspace-modes.md)
  {% endstep %}

{% step %}

#### Configure generators

**Workspace initialization mode**

Choose a [workspace mode](/setup-workspaces/create-a-workspace/workspace-modes.md). It applies baseline generator suggestions during workspace creation.

Match the mode to your migration phase. Early phases need iteration. Later phases need locked configs.

Recommended modes for this use case:

* **De-identify** when you need migration parity across many related tables and you compare behavior across platforms.
* **Mock or mask all** for fast iteration when you mainly test schema compatibility and basic application flows.
* **Synthesize all** only when migration validation is based on a single entity table and you’re not comparing record-level behavior.

**AI-generated synthesis**

Use this only when migration validation is **behavioral** and scoped to a **single entity table**. Avoid it when you need parity across many joined tables.

**Example (warehouse-only validation):** synthesize a `migr_orders_entity_view` into both old and new platforms to validate query performance and datatype behavior without requiring record-level parity.

**Rule-based generation**

Use this to stress migration edge cases like **length truncation**, **nullability**, and **timezone boundaries**. Use [Calculated columns](/configure-a-data-generation-job/configure-column-settings/calculated-columns.md) to inject boundary values.

**Example (precision boundary):** flag a small percentage of rows as boundary cases and round their amounts to a destination-safe numeric scale.

```excel-formula
// New column: MIG_BOUNDARY (≈0.5% of rows)
RAND() < 0.005
```

```excel-formula
// New column: order_amount_2dp (common migration pitfall: precision/scale)
IF([MIG_BOUNDARY], ROUND([order_amount], 2), [order_amount])
```

**Masking**

This is the default for migration comparisons. It preserves **formats** and keeps multi-table behavior close to production while removing identifiers.

**Example (same masking on both targets):** mask `email`, `iban`, and `phone_number` using identical generator settings in both workspaces. Enable consistent mapping for join keys so old vs new comparisons are fair.
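
The idea behind consistent mapping can be illustrated outside Syntho with a keyed hash. This is a minimal sketch, not Syntho's implementation: `pseudonymize_email`, the `SECRET` value, and the `user_…@example.com` output format are all assumptions made for the example. It shows why the same input produces the same masked value in both workspaces when the settings are identical.

```python
import hashlib
import hmac


def pseudonymize_email(value: str, secret: bytes) -> str:
    """Map an email to a stable pseudonym that still looks like an email (illustrative only)."""
    # Lower-casing first means "Jane@corp.com" and "jane@corp.com" map to the same value.
    digest = hmac.new(secret, value.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


# Hypothetical shared setting: it must be identical in both workspaces.
SECRET = b"shared-between-both-workspaces"

onprem = pseudonymize_email("Jane.Doe@corp.com", SECRET)
cloud = pseudonymize_email("jane.doe@corp.com", SECRET)

# Same input and same key give the same output, so joins on the masked value still line up.
print(onprem == cloud)
```

A different key in one workspace would produce different masked values, and old vs new comparisons on that column would no longer be fair.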

**Hybrid**

Use this when you want parity for the core schema, plus targeted stress cases.

**Example (parity + boundary tests):** de-identify the full relational dataset for both targets, then inject the same boundary values in both workspaces so comparisons stay fair.

```excel-formula
// Override: created_at (inject migration boundary dates)
IF([MIG_BOUNDARY], DATE(YEAR(TODAY()), 3, 31), [created_at])
```
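
If comparisons depend on the same rows being flagged on both targets, one option is to derive the flag from a stable key instead of a random draw. The sketch below runs outside Syntho and is not a calculated-column formula. `is_boundary_row`, the `salt`, and the `order-…` keys are hypothetical names used only to illustrate deterministic selection at roughly the same 0.5% rate.

```python
import hashlib


def is_boundary_row(key: str, rate: float = 0.005, salt: str = "mig-boundary") -> bool:
    """Return True for roughly `rate` of keys, and always for the same keys given the same salt."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


keys = [f"order-{i}" for i in range(100_000)]
flagged = [k for k in keys if is_boundary_row(k)]

# Re-running the selection flags exactly the same keys.
print(flagged == [k for k in keys if is_boundary_row(k)])
```

Because the flag depends only on the key and the salt, two environments that share both would mark the same rows as boundary cases.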

**Minimal configuration steps**

1. Keep both workspaces aligned (same tables, same generators).
2. Use de-identification + masking for multi-table parity.
3. Add boundary injections (precision, dates) with calculated columns.
4. Validate and compare using behavioral checks (counts, nulls, distributions).

* [Automatic PII discovery with PII scanner](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md)
* [Manage personally identifiable information (PII)](/configure-a-data-generation-job/manage-personally-identifiable-information-pii.md)
  {% endstep %}

{% step %}

#### Handle keys and relationships (relational schemas)

If your migration validation is **single-table** (or you validate only a curated layer), you can skip this step.

Migration failures often show up as broken joins. Make FK behavior explicit.

Use [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md) and add [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md) when needed. Keep FK configs consistent across the two workspaces.
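
A broken join usually shows up as child rows whose foreign key has no matching parent. The sketch below is one way to count such orphans after a run. It assumes you have already fetched the key values from the target. `count_orphans` and the sample data are illustrative, not part of Syntho.

```python
from typing import Hashable, Iterable


def count_orphans(child_fk_values: Iterable[Hashable], parent_keys: Iterable[Hashable]) -> int:
    """Count child rows whose non-null FK value has no matching parent key."""
    parents = set(parent_keys)
    # NULL foreign keys are skipped: they reference nothing, so they are not orphans.
    return sum(1 for fk in child_fk_values if fk is not None and fk not in parents)


customers = [1, 2, 3]
orders_customer_id = [1, 1, 2, None, 4]  # 4 has no matching customer

orphans = count_orphans(orders_customer_id, customers)
print(orphans)
```

Run the same check on both targets. A difference in orphan counts points to a key or relationship setting that is not aligned.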

* [Key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md)
  {% endstep %}

{% step %}

#### Validate and sync

Schema drift is normal during migration. Treat validation as part of every run.

Run validation after schema changes and before comparing results across platforms. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

* [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md)

**Compare old vs new (quick checklist)**

* Row counts per table/partition.
* Null rates and distinct counts for key business fields.
* Schema-level constraints that changed (nullable, lengths, precision).
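
The first two checks can be sketched as a small profile-and-diff routine. This is a minimal example under stated assumptions: rows are already loaded as dictionaries, and `profile`, `diff_profiles`, and the sample rows are hypothetical names and data. In practice you would more likely compute the same metrics with queries on each platform.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def profile(rows: List[Row], columns: List[str]) -> Dict[str, Any]:
    """Compute the row count plus per-column null rate and distinct count."""
    n = len(rows)
    per_column: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        non_null = [r.get(col) for r in rows if r.get(col) is not None]
        per_column[col] = {
            "null_rate": (n - len(non_null)) / n if n else 0.0,
            "distinct": len(set(non_null)),
        }
    return {"row_count": n, "columns": per_column}


def diff_profiles(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List the metrics that differ between two profiles built over the same columns."""
    issues = []
    if old["row_count"] != new["row_count"]:
        issues.append(f"row_count: {old['row_count']} != {new['row_count']}")
    for col, metrics in old["columns"].items():
        for name, old_value in metrics.items():
            new_value = new["columns"][col][name]
            if old_value != new_value:
                issues.append(f"{col}.{name}: {old_value} != {new_value}")
    return issues


old_rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}, {"id": 3, "email": "b@x.com"}]
# Illustrative defect: a NULL became an empty string on the new platform.
new_rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": ""}, {"id": 3, "email": "b@x.com"}]

issues = diff_profiles(profile(old_rows, ["id", "email"]), profile(new_rows, ["id", "email"]))
for issue in issues:
    print(issue)
```

Row counts match in this example, yet the null rate and distinct count for `email` differ. That is the kind of behavioral signal these checks are meant to surface.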

<details>

<summary>Optional: deeper migration checks</summary>

Focus on equivalence of behavior, not identical rows.

Common migration edge cases to watch:

* `VARCHAR` length truncation (destination max length < source).
* Timestamp timezone behavior differences (`TIMESTAMP` vs `TIMESTAMPTZ`).
* Numeric precision/scale mismatches (e.g., `DECIMAL(18,2)` vs `FLOAT`).
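
Two of these cases are easy to reproduce in isolation. The sketch below is illustrative only: `truncation_risks` and the sample names are hypothetical, and it counts characters, while some platforms measure `VARCHAR` length in bytes.

```python
from decimal import Decimal
from typing import List, Optional


def truncation_risks(values: List[Optional[str]], max_length: int) -> List[str]:
    """Return the values that would not fit a destination column of `max_length` characters."""
    return [v for v in values if v is not None and len(v) > max_length]


names = ["Ann", "Maximiliano", None]
too_long = truncation_risks(names, max_length=10)  # "Maximiliano" has 11 characters

# Decimal arithmetic is exact for these values; binary floats are not.
exact = Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
approx = 0.1 + 0.2 == 0.3

print(too_long, exact, approx)
```

The float comparison fails because 0.1 and 0.2 have no exact binary representation. This is why sums over a `FLOAT` column can differ slightly from sums over a `DECIMAL` column holding the same data.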

If you need a quick “same-ness” signal, compute checksums on a stable projection of columns in each environment and compare those checksums per table or per partition.
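
A checksum over a stable projection could look like the sketch below. It is one possible approach, not a Syntho feature: `table_checksum`, the separator characters, and the sample rows are assumptions. Values are normalized with `str()`, so both environments must return types that render identically (for example, decimals with the same scale and timestamps in the same timezone).

```python
import hashlib
from typing import Any, Dict, Iterable, List


def table_checksum(rows: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Compute an order-independent checksum over a fixed projection of columns."""
    row_hashes = []
    for row in rows:
        # NULL gets its own marker so that NULL and the empty string hash differently.
        parts = ["\x00NULL" if row.get(c) is None else str(row[c]) for c in columns]
        row_hashes.append(hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest())
    # Sort the row hashes so physical row order does not affect the result.
    combined = "\n".join(sorted(row_hashes))
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


old_rows = [{"id": 1, "amount": "10.00", "note": "x"}, {"id": 2, "amount": None, "note": "y"}]
# Same data in a different order; `note` differs but is outside the projection.
new_rows = [{"id": 2, "amount": None, "note": "z"}, {"id": 1, "amount": "10.00", "note": "x"}]

projection = ["id", "amount"]
print(table_checksum(old_rows, projection) == table_checksum(new_rows, projection))
```

Keep the projection limited to columns you expect to be identical on both targets. A mismatch tells you that something differs, not what, so follow up with the per-column checks.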

</details>
{% endstep %}

{% step %}

#### Tune generation settings

Tune for “migrate-and-validate” loops. You need stable runtime and predictable writes.

Use [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md) and [Large workloads](/overview/get-started/syntho-bootcamp/9.-large-workloads.md) tuning once the config is correct.

* [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md)
* [Large workloads](/overview/get-started/syntho-bootcamp/9.-large-workloads.md)
  {% endstep %}
  {% endstepper %}

### Common pitfalls & misconfigurations

#### Use-case specific pitfalls

* Treating a migration run as “one-off” when the schema is still evolving.
* Datatype mismatches between old and new platforms.
  * Validate schema alignment early. See [Prerequisites](/overview/get-started/prerequisites.md).

<details>

<summary>General pitfalls</summary>

These pitfalls show up in most projects:

* Running full-scale jobs before a small validation run.
* Skipping workspace validation/sync after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).
* Breaking relational integrity (missing primary keys, foreign keys, or virtual foreign keys). Start with [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md) and [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md).
* Leaving sensitive columns on [**Duplicate**](/configure-a-data-generation-job/configure-column-settings/duplicate.md), or trusting the [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) without reviewing false positives/negatives.
* Overusing [**Consistent mapping**](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) (it slows down data generation and increases linkability).

</details>

### Governance, compliance, and automation

#### Use-case specific recommendations

* Run paired workspaces per target (`onprem`, `cloud`). Keep generator configs in lockstep by policy.
* Automate side-by-side checks: row counts, null rates, distinct counts for keys, and a checksum over a stable projection.
* Require “same snapshot” as an explicit prerequisite in the migration checklist. Otherwise comparisons are meaningless.
* Log every schema drift event with a resync/validation run. Treat it as part of the migration timeline.
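
Schema drift events can be detected and logged by comparing two schema snapshots. The sketch below assumes each snapshot is a plain mapping from column name to declared type, captured however your platform exposes it. `schema_drift` and the sample schemas are hypothetical.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> declared type


def schema_drift(previous: Schema, current: Schema) -> List[str]:
    """Describe added, removed, and retyped columns between two schema snapshots."""
    events = []
    for col in sorted(current.keys() - previous.keys()):
        events.append(f"added: {col} {current[col]}")
    for col in sorted(previous.keys() - current.keys()):
        events.append(f"removed: {col}")
    for col in sorted(previous.keys() & current.keys()):
        if previous[col] != current[col]:
            events.append(f"retyped: {col} {previous[col]} -> {current[col]}")
    return events


previous = {"id": "INTEGER", "amount": "DECIMAL(18,2)", "legacy_flag": "CHAR(1)"}
current = {"id": "BIGINT", "amount": "DECIMAL(18,2)", "created_at": "TIMESTAMPTZ"}

for event in schema_drift(previous, current):
    print(event)
```

A non-empty result is the trigger to rerun workspace validation and record the event in the migration timeline.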

<details>

<summary>General recommendations</summary>

Use these recommendations for most workspaces.

#### Ownership and change control

* Assign a single **workspace owner** (data steward / privacy lead / DBA).
* Require a ticket or change request for generator changes.
* Duplicate a workspace before large edits. Keep the previous version as rollback.

#### Access control

* Default to **read-only** access for source connections.
* Restrict **who can view source data** in the UI.
* Use separate workspaces per environment or audience.

#### Automation (baseline)

* Use the [Syntho REST API](/syntho-api/syntho-rest-api.md) to standardize scans and runs.
* Automate data generation, not workspace configuration.
* Keep job logs for failed runs. This reduces back-and-forth during support.

</details>

