# Use Case 1: Application & API testing

Use this use case when you need privacy-safe, production-like data for application and API testing. You keep realistic system behavior while avoiding exposure of PII/PHI in non-production environments.

### What problem this use case solves

Teams need stable test data for UI flows, integration tests, and API tests. Production copies are often blocked by privacy regulations. Manual test data creation is slow and not representative.

### When to choose this use case

Pick this when you need production-like test data without privacy risks:

* You test applications and APIs in `dev`, `test`, or `accept`.
* Your tests need valid formats (email, UUID, IBAN, dates).
* Your tests need stable joins across tables and refreshes.
* Your schema changes over time and you need reproducible failures.

### When to avoid this use case

Skip this use case when functional testing is not your main goal:

* You mainly need load or stress testing at huge volumes. Use [Use Case 2: Load & Stress](/overview/get-started/use-cases-and-configuration/use-case-2-load-and-stress-testing.md).
* You do not have production data available yet. Use [Use Case 5: Feature Development](/overview/get-started/use-cases-and-configuration/use-case-5-feature-development.md).
* You need maximum protection for external data sharing. Use [Use Case 9: Data Sharing & Monetization](/overview/get-started/use-cases-and-configuration/use-case-9-data-sharing-and-monetization.md).

### Recommended Syntho configuration

This setup is optimized for **DTAP-style application and API testing** (development, test, acceptance, production). You keep schemas and relationships intact. You replace identifiers and free text that can leak PII/PHI.

{% stepper %}
{% step %}

#### Prerequisites

* Use the [Prerequisites](/overview/get-started/prerequisites.md) checklist.

**Checklist**

* [ ] Source is **non-production** (snapshot/copy), not live production.
* [ ] Destination is isolated (separate DB or schema).
* [ ] You know which columns must change (PII/PHI + quasi-identifiers).
* [ ] You have a small smoke test to validate the output.

{% hint style="warning" %}
Never write generated data back into production. Keep the destination isolated.
{% endhint %}
{% endstep %}

{% step %}

#### Source & destination management

Create one workspace per non-production environment. Examples: `dev`, `test`, `accept`.

#### Baseline rules

* Keep the **source stable**. Prefer snapshots or back-ups.
* Avoid a **live production** source for iterative work.
* Keep the **destination isolated**. Never write into production.
* Keep **schemas aligned** between source, workspace and destination.
* Use **views** when you need only a subset of the original database.
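
As a minimal sketch of a subset view, the snippet below uses an in-memory SQLite database as a stand-in for your source; the `customers`/`orders` tables and the `*_nl` views are hypothetical, and your database's SQL dialect may differ. Filtering child tables through the parent view keeps the subset referentially closed, so every foreign key in it still resolves.

```python
import sqlite3

# Hypothetical source schema; table, column, and view names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id), amount REAL);

    INSERT INTO customers VALUES (1, 'a@example.com', 'NL'), (2, 'b@example.com', 'DE'), (3, 'c@example.com', 'NL');
    INSERT INTO orders VALUES (10, 1, 9.99), (11, 2, 20.00), (12, 3, 5.00);

    -- Subset: only Dutch customers, and only the orders that belong to them.
    CREATE VIEW customers_nl AS
        SELECT * FROM customers WHERE country = 'NL';
    CREATE VIEW orders_nl AS
        SELECT o.* FROM orders o JOIN customers_nl c ON o.customer_id = c.id;
    """
)

def count(view: str) -> int:
    """Row count of a table or view (view names are trusted constants here)."""
    return conn.execute(f"SELECT COUNT(*) FROM {view}").fetchone()[0]

print(count("customers_nl"), count("orders_nl"))  # 2 2
```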

#### Lifecycle rule of thumb

* Keep the source connection when you expect schema changes.
* Remove the source connection when the next run is far off.
* Revalidate after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

**Nuances for this use case**

* Use the **exact production schema** (types, constraints, indexes). Tests often fail on subtle schema drift.
* Avoid shared destination schemas. Collisions between teams create flaky tests.
* Keep join keys in scope. Dropped join keys break joins and consistent mapping.
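
To catch subtle schema drift before it turns into a confusing test failure, you can diff column definitions between source and destination. This is a minimal sketch using SQLite's `PRAGMA table_info` as a stand-in for your database's catalog; the `users` table is hypothetical, and the check covers columns only (extend it with `PRAGMA index_list` and `PRAGMA foreign_key_list` for indexes and foreign keys).

```python
import sqlite3

def column_signature(conn: sqlite3.Connection, table: str) -> list[tuple]:
    """Return (name, declared type, NOT NULL, primary-key position) per column."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(name, col_type.upper(), bool(notnull), pk) for _, name, col_type, notnull, _, pk in rows]

def schema_drift(source, destination, table: str) -> list[str]:
    """List column-level differences for one table; an empty list means aligned."""
    src = {c[0]: c for c in column_signature(source, table)}
    dst = {c[0]: c for c in column_signature(destination, table)}
    issues = []
    for name in sorted(src.keys() - dst.keys()):
        issues.append(f"{table}.{name}: missing in destination")
    for name in sorted(dst.keys() - src.keys()):
        issues.append(f"{table}.{name}: only in destination")
    for name in sorted(src.keys() & dst.keys()):
        if src[name] != dst[name]:
            issues.append(f"{table}.{name}: {src[name][1:]} != {dst[name][1:]}")
    return issues

# Hypothetical example: the destination lost a NOT NULL constraint.
source = sqlite3.connect(":memory:")
destination = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, external_id TEXT NOT NULL, created_at TEXT)")
destination.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, external_id TEXT, created_at TEXT)")

for issue in schema_drift(source, destination, "users"):
    print(issue)  # users.external_id: ('TEXT', True, 0) != ('TEXT', False, 0)
```
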
  {% endstep %}

{% step %}

#### Configure generators

**Workspace initialization mode**

Choose a [workspace mode](/setup-workspaces/create-a-workspace/workspace-modes.md). It applies baseline generator suggestions during workspace creation. For DTAP, you start from an existing schema and data.

Recommended modes for this use case:

* **De-identify** when you have a production-like copy and only need to replace PII/PHI while keeping non-PII intact.
* **From scratch** when you only test a few tables and want full manual control.

**AI-generated synthesis**

Use this when your data contains indirect (quasi-)identifiers that need protection, for example columns such as gender, age, and weight. AI synthesis generates new values while preserving the correlations between these columns.

**Rule-based generation**

Use this when tests must hit **explicit branches** and **edge cases**. Use [Calculated columns](/configure-a-data-generation-job/configure-column-settings/calculated-columns.md) to inject controlled exceptions.

**Example (negative API tests with a stable base):** create an `EDGE_FLAG` and only corrupt one validator field on those rows.

```excel-formula
// New column: EDGE_FLAG (≈1% of rows)
RAND() < 0.01
```

```excel-formula
// Override: postal_code (break format only for edge rows)
IF([EDGE_FLAG], "XX-INVALID", [postal_code])
```
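
On the test side, the flag lets you assert that exactly the flagged rows are rejected. This Python sketch simulates both calculated columns locally; the `1234 AB` postal-code rule and `api_accepts` are hypothetical stand-ins for your API's validator, and the sketch seeds its own generator so reruns flag the same rows.

```python
import random
import re

# Hypothetical validator: Dutch-style postal code "1234 AB" (assumption, not from Syntho).
POSTAL_CODE = re.compile(r"^\d{4} ?[A-Z]{2}$")

def api_accepts(postal_code: str) -> bool:
    """Stand-in for the API's input validation."""
    return bool(POSTAL_CODE.match(postal_code))

def generate_rows(n: int, seed: int) -> list[dict]:
    """Mimic the two calculated columns: EDGE_FLAG (~1%) and the overridden postal_code."""
    rng = random.Random(seed)  # seeded, so the same rows are flagged on every run
    rows = []
    for i in range(n):
        edge = rng.random() < 0.01             # like RAND() < 0.01
        base = f"{1000 + i % 9000} AB"         # always a valid 4-digit code
        rows.append({"EDGE_FLAG": edge,
                     "postal_code": "XX-INVALID" if edge else base})  # like IF([EDGE_FLAG], ...)
    return rows

rows = generate_rows(10_000, seed=42)
edge_rows = [r for r in rows if r["EDGE_FLAG"]]
happy_rows = [r for r in rows if not r["EDGE_FLAG"]]

# Negative tests: every flagged row must be rejected; every other row must pass.
assert all(not api_accepts(r["postal_code"]) for r in edge_rows)
assert all(api_accepts(r["postal_code"]) for r in happy_rows)
print(f"{len(edge_rows)} edge rows out of {len(rows)}")
```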

**Masking**

Use this when you need **format-preserving replacements** and **stable keys/relationships** across tables and refreshes. This is the default workhorse for API/UI validators.

**Example:** apply **Mask → Email** on `customers.email`, **Mask → UUID** on `users.external_id`, and enable **Consistent mapping** for shared identifiers so joins and app flows stay intact.
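
To see why consistent mapping keeps joins intact (and why it increases linkability), here is a minimal sketch of deterministic, format-preserving pseudonymization with a keyed hash (HMAC-SHA-256). It illustrates the general technique only; it is not how Syntho implements Mask or Consistent mapping, and the key, the `example.com` domain, and the helper names are hypothetical.

```python
import hashlib
import hmac
import uuid

# Hypothetical per-environment secret; keep it out of source control.
KEY = b"test-env-secret"

def _digest(value: str) -> bytes:
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).digest()

def mask_email(email: str) -> str:
    """Deterministic replacement that still looks like an email address."""
    return f"user_{_digest(email.lower()).hex()[:12]}@example.com"

def mask_uuid(external_id: str) -> str:
    """Deterministic replacement that is still a valid (version 4) UUID string."""
    return str(uuid.UUID(bytes=_digest(external_id)[:16], version=4))

# The same input maps to the same output in every table, so joins on the masked value still work.
customers = [{"email": "Jane.Doe@corp.nl"}]
newsletter = [{"email": "jane.doe@corp.nl"}]
assert mask_email(customers[0]["email"]) == mask_email(newsletter[0]["email"])

masked = mask_uuid("8f14e45f-ceea-467f-a0e6-3e3f7f6b2a11")
uuid.UUID(masked)  # parses, so UUID validators still pass
print(mask_email("jane.doe@corp.nl"), masked)
```

Because the mapping is deterministic, anyone holding the key can re-link values across tables and refreshes. That is the linkability trade-off behind the consistent-mapping warning in this step.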

**Hybrid**

Use this when you need **relational correctness** plus **deterministic business logic**. Hybrid is the default choice for end-to-end application testing. It maps to the patterns in [Example data generation scenarios](/overview/get-started/syntho-bootcamp/example-data-generation-scenarios.md).

**Example (deterministic relations + safe identifiers):** enforce “gender → name style” while keeping joins intact.

1. Keep PK/FK behavior stable with key generators and consistent mapping where needed.
2. Mask/replace direct identifiers (emails, phone numbers) for validator safety.
3. Use calculated columns to enforce deterministic relations inside a table.

```excel-formula
// New column: first_name_generated (deterministic by gender)
IFS(
  [gender] = "M", MOCK_FIRST_NAME_MALE,
  [gender] = "F", MOCK_FIRST_NAME_FEMALE,
  TRUE,          MOCK_FIRST_NAME
)
```
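
After a run, a post-generation check can confirm that the rule held. This sketch assumes you maintain your own reference name pools per gender; the sets below are hypothetical, since Syntho's `MOCK_*` generators draw from their own lists.

```python
# Hypothetical name pools; Syntho's MOCK_* generators use their own lists.
MALE_NAMES = {"Daan", "Sem", "Lucas"}
FEMALE_NAMES = {"Emma", "Julia", "Mila"}

def relation_violations(rows: list[dict]) -> list[dict]:
    """Return rows where first_name_generated does not match the gender rule."""
    bad = []
    for row in rows:
        gender, name = row["gender"], row["first_name_generated"]
        if gender == "M" and name not in MALE_NAMES:
            bad.append(row)
        elif gender == "F" and name not in FEMALE_NAMES:
            bad.append(row)
        # Any other gender value falls through to MOCK_FIRST_NAME and is not checked.
    return bad

output = [
    {"id": 1, "gender": "M", "first_name_generated": "Sem"},
    {"id": 2, "gender": "F", "first_name_generated": "Emma"},
    {"id": 3, "gender": "X", "first_name_generated": "Alex"},
    {"id": 4, "gender": "F", "first_name_generated": "Lucas"},  # violation
]
print([row["id"] for row in relation_violations(output)])  # [4]
```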

**Minimal configuration steps**

1. Run a [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md).
2. Set identifier columns to **Mask** or **Mock**.
3. Enable [Consistent mapping](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) only for shared identifiers.
4. Use [Free text de-identification](/overview/get-started/syntho-bootcamp/5.-generators/free-text-de-identification.md) only on the few text columns that need it.

{% hint style="warning" %}
Consistent mapping increases linkability. Enable it only for columns needed for joins and flows.
{% endhint %}

* [Automatic PII discovery with PII scanner](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md)
* [Manage personally identifiable information (PII)](/configure-a-data-generation-job/manage-personally-identifiable-information-pii.md)

<details>

<summary>Concrete example: from PII scan output to generator choices</summary>

Example workspace name: `test-api-contracts` (stable dataset refreshed per sprint).

Example PII scan findings you should expect to review (illustrative):

```text
customers.email            -> EMAIL_ADDRESS (confidence 0.97)
customers.phone_number     -> PHONE_NUMBER  (confidence 0.93)
customers.date_of_birth    -> DATE_OF_BIRTH (confidence 0.88)
customers.notes            -> FREE_TEXT     (confidence 0.74)
orders.billing_iban        -> IBAN          (confidence 0.96)
```

Example generator mapping for API-friendly realism:

* `customers.email`: **Mask → Email** (keeps a valid email structure for validators).
* `customers.phone_number`: **Mask → Phone** (keeps country/format patterns).
* `orders.billing_iban`: **Mask → IBAN** (format-preserving, prevents checksum failures).
* `customers.date_of_birth`: **Mock** (often safer than masking when age distribution is not tested).
* `customers.notes`: **Free text de-identification** (only on this column, not the whole table).
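
If you keep findings like these in a text file, a small script can turn them into a review list. This sketch parses the illustrative line format shown in this example (not a documented Syntho export format), applies the mapping above, and flags low-confidence findings; the 0.80 threshold is an assumption, and columns the scanner missed still need a manual review.

```python
import re

# The illustrative scan findings from this example, as plain text.
SCAN_OUTPUT = """\
customers.email            -> EMAIL_ADDRESS (confidence 0.97)
customers.phone_number     -> PHONE_NUMBER  (confidence 0.93)
customers.date_of_birth    -> DATE_OF_BIRTH (confidence 0.88)
customers.notes            -> FREE_TEXT     (confidence 0.74)
orders.billing_iban        -> IBAN          (confidence 0.96)
"""

# Entity type -> generator choice, mirroring the mapping in this example.
GENERATOR_FOR = {
    "EMAIL_ADDRESS": "Mask → Email",
    "PHONE_NUMBER": "Mask → Phone",
    "IBAN": "Mask → IBAN",
    "DATE_OF_BIRTH": "Mock",
    "FREE_TEXT": "Free text de-identification",
}
REVIEW_BELOW = 0.80  # hypothetical threshold for manual review

LINE = re.compile(r"^(\S+)\s+->\s+(\S+)\s+\(confidence ([\d.]+)\)$")

def plan(scan_text: str) -> list[tuple[str, str, bool]]:
    """Return (column, generator, needs_review) for every parsed finding."""
    result = []
    for line in scan_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        column, entity, confidence = match.group(1), match.group(2), float(match.group(3))
        generator = GENERATOR_FOR.get(entity, "REVIEW: no default")
        result.append((column, generator, confidence < REVIEW_BELOW))
    return result

for column, generator, review in plan(SCAN_OUTPUT):
    print(f"{column:<26} {generator:<30} {'review' if review else ''}")
```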

</details>
{% endstep %}

{% step %}

#### Handle keys and relationships (relational schemas)

If your test dataset is a **single table** with no joins, you can skip this step. Broken PK/FK chains are a common cause of app and API test failures.

Verify [foreign key inheritance](/configure-a-data-generation-job/manage-foreign-keys/foreign-key-inheritance.md). Add [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md) where the database doesn’t define them.
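
To catch broken PK/FK chains right after a run, count orphaned child rows for each relationship, including virtual foreign keys that the database itself does not enforce. A minimal SQLite sketch; the table names are hypothetical, and the same `LEFT JOIN ... IS NULL` query works in most SQL databases.

```python
import sqlite3

def orphan_count(conn, child: str, fk: str, parent: str, pk: str = "id") -> int:
    """Count child rows whose foreign key has no matching parent row (NULL FKs ignored)."""
    sql = (f"SELECT COUNT(*) FROM {child} c "
           f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
           f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL")
    return conn.execute(sql).fetchone()[0]

# Hypothetical destination with one broken reference (customer 99 does not exist).
dest = sqlite3.connect(":memory:")
dest.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
""")

# List every declared and virtual relationship you configured, not only database FKs.
relationships = [("orders", "customer_id", "customers")]
for child, fk, parent in relationships:
    print(child, fk, "->", parent, "orphans:", orphan_count(dest, child, fk, parent))
```
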
{% endstep %}

{% step %}

#### Validate and sync

Validate the source via [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md). Run a small smoke suite. Include at least one API happy path.

Re-run validation whenever schemas drift. DTAP schemas drift often.
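
Alongside the API happy path, a row-level format check catches masked values that would trip validators. This sketch checks formats named on this page (email, UUID, ISO date, IBAN with the ISO 13616 mod-97 checksum); the field names and the loose email pattern are assumptions, and your API's real validators remain the source of truth.

```python
import re
import uuid
from datetime import date

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose check, not full RFC 5322

def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97 check: move the first 4 chars to the end, letters -> 10..35."""
    s = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{1,30}", s):
        return False
    digits = "".join(str(int(ch, 36)) for ch in s[4:] + s[:4])
    return int(digits) % 97 == 1

def uuid_valid(value: str) -> bool:
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False

def date_valid(value: str) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

def smoke_check(row: dict) -> list[str]:
    """Return the names of fields that would fail typical API validators."""
    checks = {
        "email": lambda v: bool(EMAIL.match(v)),
        "external_id": uuid_valid,
        "date_of_birth": date_valid,
        "billing_iban": iban_valid,
    }
    return [field for field, ok in checks.items() if field in row and not ok(row[field])]

row = {
    "email": "user_3f2a@example.com",
    "external_id": "8f14e45f-ceea-467f-a0e6-3e3f7f6b2a11",
    "date_of_birth": "1987-02-30",                  # February 30th does not exist
    "billing_iban": "GB82 WEST 1234 5698 7654 32",  # well-known example IBAN, valid checksum
}
print(smoke_check(row))  # ['date_of_birth']
```
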
{% endstep %}

{% step %}

#### Tune generation settings

Tune for repeatability, not just speed. Keep settings stable across runs: stable settings make test failures debuggable and reduce flaky tests.

Once the configuration is correct, tune it with [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md) and [Large workloads](/overview/get-started/syntho-bootcamp/9.-large-workloads.md).
{% endstep %}
{% endstepper %}

### Common pitfalls & misconfigurations

#### Use case-specific pitfalls

* Masking values but breaking API validators (emails, UUIDs, date formats).
* De-identifying large text fields without scoping. It can increase runtime. See [Free text de-identification](/overview/get-started/syntho-bootcamp/5.-generators/free-text-de-identification.md).
* Tuning only at full scale. Start small, then refine the configuration.

<details>

<summary>General pitfalls</summary>

These pitfalls show up in most projects:

* Running full-scale jobs before a small validation run.
* Skipping workspace validation/sync after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).
* Breaking relational integrity (missing PK/FK setup, missing foreign keys, missing virtual foreign keys). Start with [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md) and [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md).
* Leaving sensitive columns on [**Duplicate**](/configure-a-data-generation-job/configure-column-settings/duplicate.md), or trusting the [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) without reviewing false positives/negatives.
* Overusing [**Consistent mapping**](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) (it slows down data generation and increases linkability).

</details>

### Governance, compliance, and automation

#### Use case-specific recommendations

* Use a stable seed per environment (`dev`, `test`, `accept`) for reproducible failures.
* Automate refreshes per environment.
* Treat FK/virtual FK changes as breaking changes. Require sync/validation after schema migrations before testing.
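
One way to keep seeds stable without maintaining a lookup table is to derive them from the environment name. This is a hypothetical convention sketched in Python; pass the resulting value wherever your Syntho setup accepts a seed, and treat the `dataset_version` suffix as an assumption you bump deliberately when you want fresh data.

```python
import hashlib
import random

def seed_for(environment: str, dataset_version: str = "v1") -> int:
    """Derive a stable 32-bit seed from the environment name (hypothetical convention)."""
    digest = hashlib.sha256(f"{environment}:{dataset_version}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

SEEDS = {env: seed_for(env) for env in ("dev", "test", "accept")}

# Same seed -> same random sequence, so a failing test can be reproduced on the next refresh.
first = random.Random(SEEDS["test"]).sample(range(1000), 5)
second = random.Random(SEEDS["test"]).sample(range(1000), 5)
assert first == second
print(SEEDS)
```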

<details>

<summary>General recommendations</summary>

Use these recommendations for most workspaces.

#### Ownership and change control

* Assign a single **workspace owner** (data steward / privacy lead / DBA).
* Require a ticket or change request for generator changes.
* Duplicate a workspace before large edits. Keep the previous version as rollback.

#### Access control

* Default to **read-only** access for source connections.
* Restrict **who can view source data** in the UI.
* Use separate workspaces per environment or audience.

#### Automation (baseline)

* Use the [Syntho REST API](/syntho-api/syntho-rest-api.md) to standardize scans and runs.
* Automate data generation, not workspace configuration.
* Keep job logs for failed runs. This reduces back-and-forth during support.

</details>

