Use Case 1: Application & API testing

Privacy-safe, production-like test data for application and API tests in non-production.

Choose this use case when application and API tests need privacy-safe, production-like data. You keep realistic system behavior and avoid exposing PII/PHI in non-production environments.

What problem this use case solves

Teams need stable test data for UI flows, integration tests, and API tests. Production copies are often blocked by privacy regulations. Manual test data creation is slow and not representative.

When to choose this use case

Pick this when you need production-like test data without privacy risks:

  • You test applications and APIs in dev, test, or accept.

  • Your tests need valid formats (email, UUID, IBAN, dates).

  • Your tests need stable joins across tables and refreshes.

  • Your schema changes over time and you need reproducible failures.
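One way to confirm that generated values still pass typical validators is a small standalone check. The sketch below is illustrative and not part of the product: the function names are hypothetical, the email pattern is deliberately loose, and the IBAN check covers only the ISO 13616 mod-97 checksum, not country-specific lengths.

```python
import re
import uuid
from datetime import date

# Deliberately simple shape check; real API validators are often stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_email(value: str) -> bool:
    return EMAIL_RE.fullmatch(value) is not None


def is_uuid(value: str) -> bool:
    # uuid.UUID also accepts braces and un-hyphenated hex; tighten if your API does not.
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_iban(value: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map A-Z to 10-35."""
    iban = value.replace(" ", "").upper()
    if not (15 <= len(iban) <= 34) or not (iban.isascii() and iban.isalnum()):
        return False
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

Running checks like these against a sample of the destination data gives early warning before the API test suite starts failing on format errors.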

When to avoid this use case

Skip this use case when functional testing is not your main goal.

This setup is optimized for DTAP-style application and API testing. You keep schemas and relationships intact. You replace identifiers and free text that can leak PII/PHI.

1

Prerequisites

Checklist

2

Source & destination management

Create one workspace per non-production environment. Examples: dev, test, accept.

Baseline rules

  • Keep the source stable. Prefer snapshots or backups.

  • Avoid a live production source for iterative work.

  • Keep the destination isolated. Never write into production.

  • Keep schemas aligned between source, workspace and destination.

  • Use views when you need only a subset of the original database.

Lifecycle rule of thumb

  • Keep the source connection when you expect schema changes.

  • Remove the source connection when you do not expect another run for a long time.

  • Revalidate after schema changes. Use Validate and synchronize workspace.

Nuances for this use case

  • Use the exact production schema (types, constraints, indexes). Tests often fail on subtle schema drift.

  • Avoid shared destination schemas. Collisions between teams create flaky tests.

  • Keep join keys in place. Dropped join keys break joins and consistent mapping.
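Schema drift can also be checked outside the platform before a run. The following minimal sketch compares column definitions for one table across two connections. It uses in-memory SQLite as a stand-in for the real source and destination; the helper names are hypothetical, and it covers columns only, not indexes or check constraints.

```python
import sqlite3


def column_signature(conn: sqlite3.Connection, table: str) -> dict:
    """Map column name -> (declared type, NOT NULL flag, primary-key position)."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return {
        name: (ctype.upper(), bool(notnull), pk)
        for _, name, ctype, notnull, _, pk in rows
    }


def schema_drift(source: sqlite3.Connection, destination: sqlite3.Connection, table: str) -> dict:
    """Report columns that are missing, extra, or defined differently in the destination."""
    src = column_signature(source, table)
    dst = column_signature(destination, table)
    return {
        "missing_in_destination": sorted(src.keys() - dst.keys()),
        "extra_in_destination": sorted(dst.keys() - src.keys()),
        "changed": sorted(c for c in src.keys() & dst.keys() if src[c] != dst[c]),
    }


# Stand-in databases with a deliberately drifted destination.
source = sqlite3.connect(":memory:")
destination = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL, created_at TEXT)"
)
destination.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, nickname TEXT)"
)

drift = schema_drift(source, destination, "customers")
```

An empty report for every table is a reasonable gate before starting a generation run.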

3

Configure generators

Workspace initialization mode

Choose a workspace mode. It applies baseline generator suggestions during workspace creation. For DTAP, you start from an existing schema and data.

Recommended modes for this use case:

  • De-identify when you have a production-like copy and only need to replace PII/PHI while keeping non-PII intact.

  • From scratch when you only test a few tables and want full manual control.

AI-generated synthesis

Use this when your data contains indirect identifiers that need masking, for example columns such as gender, age, and weight. AI synthesis generates new values while preserving the correlations between them.

Rule-based generation

Use this when tests must hit explicit branches and edge cases. Use Calculated columns to inject controlled exceptions.

Example (negative API tests with a stable base): create an EDGE_FLAG and only corrupt one validator field on those rows.
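The idea can be sketched in plain Python. This is not the product's Calculated columns syntax; the column names customer_id and email and the 1-in-10 ratio are assumptions for illustration. The flag is derived from a stable hash of the key, so the same rows are corrupted on every refresh.

```python
import hashlib


def edge_flag(row_key: str, every_n: int = 10) -> bool:
    """Deterministically flag roughly 1 in every_n rows from a stable hash of the key."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % every_n == 0


def apply_edge_case(row: dict) -> dict:
    """Add EDGE_FLAG and corrupt only the email field on flagged rows."""
    out = dict(row)
    out["EDGE_FLAG"] = edge_flag(str(row["customer_id"]))
    if out["EDGE_FLAG"]:
        # Remove the "@" so the email validator rejects the value; leave all other fields intact.
        out["email"] = row["email"].replace("@", "")
    return out
```

Because unflagged rows are untouched, happy-path tests and negative tests can share one dataset.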

Masking

Use this when you need format-preserving replacements and stable keys/relationships across tables and refreshes. This is the default workhorse for API/UI validators.

Example: apply Mask → Email on customers.email, Mask → UUID on users.external_id, and enable Consistent mapping for shared identifiers so joins and app flows stay intact.
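Conceptually, consistent mapping means the same input always yields the same replacement, in every table and on every refresh. The sketch below illustrates that property with a keyed hash. It is not how the product implements masking, and the key, the output domain, and the function name are assumptions.

```python
import hashlib
import hmac

# Hypothetical secret; in practice keep it out of source control.
SECRET_KEY = b"per-environment-secret"


def mask_email(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic, syntactically valid replacement for an email address."""
    normalized = value.strip().lower()
    token = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return f"user_{token}@example.test"


# The same address appears in two tables with different casing.
customers = [{"customer_id": 1, "email": "Jane.Doe@corp.example"}]
orders = [{"order_id": 10, "contact_email": "jane.doe@corp.example"}]

masked_customers = [{**c, "email": mask_email(c["email"])} for c in customers]
masked_orders = [{**o, "contact_email": mask_email(o["contact_email"])} for o in orders]
```

Both masked rows carry the same replacement value, so a join on the email column still matches after masking.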

Hybrid

Use this when you need relational correctness plus deterministic business logic. Hybrid is the default for serious app testing. It maps to the patterns in Example data generation scenarios.

Example (deterministic relations + safe identifiers): enforce “gender → name style” while keeping joins intact.

  1. Keep PK/FK behavior stable with key generators and consistent mapping where needed.

  2. Mask/replace direct identifiers (emails, phone numbers) for validator safety.

  3. Use calculated columns to enforce deterministic relations inside a table.
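Step 3 can be illustrated with a small Python sketch of a deterministic "gender → name style" rule. The name pools, the column names, and the helper functions are hypothetical, and the code shows the logic only, not the product's calculated-column syntax.

```python
import hashlib

# Hypothetical name pools keyed by the value of the gender column.
NAME_POOLS = {
    "F": ["Anna", "Maria", "Sophie"],
    "M": ["Jan", "Peter", "Thomas"],
}
NEUTRAL_POOL = ["Alex", "Robin", "Sam"]


def stable_index(key: str, size: int) -> int:
    """Pick a repeatable position in a pool from a hash of the row key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def calculated_first_name(row: dict) -> str:
    """Derive first_name from gender, using the primary key for a stable choice."""
    pool = NAME_POOLS.get(row["gender"], NEUTRAL_POOL)
    return pool[stable_index(str(row["customer_id"]), len(pool))]
```

The primary key is read but never changed, so joins are unaffected while the name stays consistent with the gender column on every run.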

Minimal configuration steps

  1. Run a PII scan.

  2. Set identifier columns to Mask or Mock.

  3. Enable Consistent mapping only for shared identifiers.

  4. Use Free text de-identification only on the few text columns that need it.

Concrete example: from PII scan output to generator choices

Example workspace name: test-api-contracts (stable dataset refreshed per sprint).

Example PII scan findings you should expect to review (illustrative):

Example generator mapping for API-friendly realism:

  • customers.email: Mask → Email (keeps a valid email structure for validators).

  • customers.phone_number: Mask → Phone (keeps country/format patterns).

  • orders.billing_iban: Mask → IBAN (format-preserving, prevents checksum failures).

  • customers.date_of_birth: Mock (often safer than masking when age distribution is not tested).

  • customers.notes: Free text de-identification (only on this column, not the whole table).

4

Handle keys and relationships (relational schemas)

If your test dataset is a single table with no joins, you can skip this step. Otherwise, work through it carefully. Most app and API test failures trace back to broken PK/FK chains.

Verify foreign key inheritance. Add virtual foreign keys where the database doesn’t define them.
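A quick way to verify a relationship after generation is to look for orphaned child rows. The sketch below assumes a virtual foreign key from orders.customer_id to customers.id and uses in-memory SQLite as a stand-in for the destination database; the table, column, and function names are illustrative.

```python
import sqlite3


def orphaned_keys(conn, child, fk_column, parent, pk_column):
    """Return child foreign-key values that have no matching parent row."""
    query = (
        f'SELECT c."{fk_column}" FROM "{child}" AS c '
        f'LEFT JOIN "{parent}" AS p ON c."{fk_column}" = p."{pk_column}" '
        f'WHERE c."{fk_column}" IS NOT NULL AND p."{pk_column}" IS NULL '
        f'ORDER BY c."{fk_column}"'
    )
    return [row[0] for row in conn.execute(query)]


# Stand-in destination: no FK constraint is declared, as with a virtual foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO orders (order_id, customer_id) VALUES (?, ?)",
    [(10, 1), (11, 2), (12, 99), (13, None)],
)

orphans = orphaned_keys(conn, "orders", "customer_id", "customers", "id")
```

Here the order that points at customer 99 is reported, while the order with a NULL key is ignored. An empty result for every relationship suggests the PK/FK chain survived generation.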

5

Validate and sync

Validate the source with Validate and synchronize workspace. Then run a small smoke suite that includes at least one API happy path.

Re-run validation whenever schemas drift. DTAP schemas drift often.

6

Tune generation settings

Tune for repeatability, not just speed. Keep settings stable across runs so that test failures stay debuggable and tests are less flaky.

Use View and adjust generation settings and Large workloads tuning once the config is correct.

Common pitfalls & misconfigurations

Use case-specific pitfalls

  • Masking values but breaking API validators (emails, UUIDs, date formats).

  • De-identifying large text fields without scoping. It can increase runtime. See Free text de-identification.

  • Tuning only at full scale. Start small, then refine configuration.

General pitfalls

These pitfalls show up in most projects:

Governance, compliance, and automation

Use case-specific recommendations

  • Use a stable seed per environment (dev, test, accept) for reproducible failures.

  • Automate refreshes per environment.

  • Treat FK/virtual FK changes as breaking changes. Require sync/validation after schema migrations before testing.
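The effect of a stable seed per environment can be shown with a short Python sketch. It does not show how a seed is configured in the product. The derivation from a project and environment name, and both function names, are assumptions chosen for illustration.

```python
import hashlib
import random


def environment_seed(environment: str, project: str = "test-api-contracts") -> int:
    """Derive a repeatable 32-bit seed from the project and environment names."""
    digest = hashlib.sha256(f"{project}:{environment}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def sample_order_amounts(environment: str, n: int = 5) -> list:
    """Generate the same pseudo-random amounts every time for a given environment."""
    rng = random.Random(environment_seed(environment))
    return [round(rng.uniform(1, 500), 2) for _ in range(n)]
```

Calling sample_order_amounts("test") twice returns identical values, so a failing test can be reproduced against the same data, while dev, test, and accept each get their own stable dataset.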

General recommendations

Use these recommendations for most workspaces.

Ownership and change control

  • Assign a single workspace owner (data steward / privacy lead / DBA).

  • Require a ticket or change request for generator changes.

  • Duplicate a workspace before large edits. Keep the previous version as rollback.

Access control

  • Default to read-only access for source connections.

  • Restrict who can view source data in the UI.

  • Use separate workspaces per environment or audience.

Automation (baseline)

  • Use the Syntho REST API to standardize scans and runs.

  • Automate data generation, not workspace configuration.

  • Keep job logs for failed runs. This reduces back-and-forth during support.
