Use Case 1: Application & API testing
Privacy-safe, production-like test data for application and API tests in non-production.
Use this use case when you need privacy-safe, production-like data for application and API testing. You keep realistic system behavior. You avoid exposing PII/PHI in non-production.
What problem this use case solves
Teams need stable test data for UI flows, integration tests, and API tests. Production copies are often blocked by privacy regulations. Manual test data creation is slow and not representative.
When to choose this use case
Pick this when you need production-like test data without privacy risks:
You test applications and APIs in dev, test, or accept.
Your tests need valid formats (email, UUID, IBAN, dates).
Your tests need stable joins across tables and refreshes.
Your schema changes over time and you need reproducible failures.
When to avoid this use case
Skip this use case when functional testing is not your main goal:
You mainly need load or stress testing at huge volumes. Use Use Case 2: Load & Stress.
You do not have production data available yet. Use Use Case 5: Feature Development.
You need maximum protection for external data sharing. Use Use Case 9: Data Sharing & Monetization.
Recommended Syntho configuration
This setup is optimized for DTAP-style application and API testing. You keep schemas and relationships intact. You replace identifiers and free text that can leak PII/PHI.
Prerequisites
Use the Prerequisites checklist.
Checklist
Never write generated data back into production. Keep the destination isolated.
Source & destination management
Create one workspace per non-production environment. Examples: dev, test, accept.
Baseline rules
Keep the source stable. Prefer snapshots or backups.
Avoid a live production source for iterative work.
Keep the destination isolated. Never write into production.
Keep schemas aligned between source, workspace and destination.
Use views when you need only a subset of the original database.
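As an illustration of the view-based subset, here is a minimal sketch using Python's built-in sqlite3 module. The table, columns, and view name (`orders`, `internal_note`, `v_open_orders`) are hypothetical; a real source database would use its own SQL dialect and naming.

```python
import sqlite3


def build_subset_view(conn: sqlite3.Connection) -> None:
    """Create a demo table and a view exposing only what the tests need."""
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            status TEXT NOT NULL,
            internal_note TEXT
        );
        INSERT INTO orders VALUES
            (1, 10, 'open', 'a'),
            (2, 11, 'closed', 'b'),
            (3, 10, 'open', 'c');
        -- Hypothetical subset: open orders only, without the internal note.
        CREATE VIEW v_open_orders AS
            SELECT id, customer_id, status
            FROM orders
            WHERE status = 'open';
        """
    )


view_conn = sqlite3.connect(":memory:")
build_subset_view(view_conn)
subset_rows = view_conn.execute(
    "SELECT id, customer_id, status FROM v_open_orders ORDER BY id"
).fetchall()
print(subset_rows)
```

The view keeps the join key (`customer_id`) and drops the column that is not needed, so the subset stays joinable.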
Lifecycle rule of thumb
Keep the source connection when you expect schema changes.
Remove the source connection when you expect a new run only much later.
Revalidate after schema changes. Use Validate and synchronize workspace.
Nuances for this use case
Use the exact production schema (types, constraints, indexes). Tests often fail on subtle schema drift.
Avoid shared destination schemas. Collisions between teams create flaky tests.
Dropped join keys break joins and consistent mapping.
Configure generators
Workspace initialization mode
Choose a workspace mode. It applies baseline generator suggestions during workspace creation. For DTAP, you start from an existing schema and data.
Recommended modes for this use case:
De-identify when you have a production-like copy and only need to replace PII/PHI while keeping non-PII intact.
From scratch when you only test a few tables and want full manual control.
AI-generated synthesis
Use this when you have indirect identifiers in your data that need masking. For example, a column with gender, age, and weight. AI synthesis will generate new values while preserving correlations.
Rule-based generation
Use this when tests must hit explicit branches and edge cases. Use Calculated columns to inject controlled exceptions.
Example (negative API tests with a stable base): create an EDGE_FLAG and only corrupt one validator field on those rows.
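The EDGE_FLAG idea can be sketched in plain Python to show the intended logic. This is not Syntho's calculated-column syntax; the function name, the `every_n` rule, and the choice to remove the "@" from the email are assumptions made for illustration.

```python
from typing import Dict, List


def add_edge_cases(rows: List[Dict[str, str]], every_n: int = 5) -> List[Dict[str, str]]:
    """Flag every n-th row and corrupt only its email; other rows stay valid."""
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy, so the stable base is never modified
        is_edge = i % every_n == 0
        new_row["EDGE_FLAG"] = "1" if is_edge else "0"
        if is_edge:
            # Corrupt exactly one validator field: drop the "@" from the email.
            new_row["email"] = row["email"].replace("@", "")
        out.append(new_row)
    return out


base_rows = [{"id": str(i), "email": f"user{i}@example.com"} for i in range(10)]
edge_rows = add_edge_cases(base_rows)
```

Because the flag depends only on row position, the same rows become edge cases on every run, which keeps negative tests reproducible.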
Masking
Use this when you need format-preserving replacements and stable keys/relationships across tables and refreshes. This is the default workhorse for API/UI validators.
Example: apply Mask → Email on customers.email, Mask → UUID on users.external_id, and enable Consistent mapping for shared identifiers so joins and app flows stay intact.
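To show why consistent mapping keeps joins intact, here is a conceptual sketch based on a keyed hash (HMAC). It illustrates the property only and is not how Syntho implements Consistent mapping; `SECRET_KEY`, `mask_email`, and the `example.com` output domain are assumptions.

```python
import hashlib
import hmac

# Hypothetical demo key; a real setup would manage secrets properly.
SECRET_KEY = b"demo-key-not-for-real-use"


def mask_email(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an email to a valid-looking replacement, identically every time."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


# The same source value appears in two tables.
customers = [{"email": "alice@corp.test"}]
logins = [{"email": "alice@corp.test"}]

masked_customers = [mask_email(r["email"]) for r in customers]
masked_logins = [mask_email(r["email"]) for r in logins]
```

Both tables receive the same replacement for the same input, so a join on the masked column still matches. The same property is what makes consistently mapped columns linkable.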
Hybrid
Use this when you need relational correctness plus deterministic business logic. Hybrid is the default for serious app testing. It maps to the patterns in Example data generation scenarios.
Example (deterministic relations + safe identifiers): enforce “gender → name style” while keeping joins intact.
Keep PK/FK behavior stable with key generators and consistent mapping where needed.
Mask/replace direct identifiers (emails, phone numbers) for validator safety.
Use calculated columns to enforce deterministic relations inside a table.
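The "gender → name style" relation can be sketched as a deterministic function of the row. This is plain Python to show the logic, not Syntho's calculated-column syntax; the name pools, the fallback pool, and `first_name_for` are hypothetical.

```python
import hashlib

# Hypothetical name pools keyed by the gender value in the same row.
NAMES = {
    "F": ["Anna", "Maria", "Sophie"],
    "M": ["Jan", "Peter", "Lucas"],
}
NEUTRAL = ["Alex", "Robin", "Sam"]  # used when the gender value is unknown


def first_name_for(gender: str, row_key: str) -> str:
    """Pick a name from the pool matching `gender`, stable for a given row key."""
    pool = NAMES.get(gender, NEUTRAL)
    # hashlib gives the same result in every process, unlike built-in hash().
    idx = int(hashlib.sha256(row_key.encode("utf-8")).hexdigest(), 16) % len(pool)
    return pool[idx]


print(first_name_for("F", "42"), first_name_for("M", "42"))
```

Deriving the choice from the row key instead of a random draw means a refresh yields the same name for the same row.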
Minimal configuration steps
Run a PII scan.
Set identifier columns to Mask or Mock.
Enable Consistent mapping only for shared identifiers.
Use Free text de-identification only on the few text columns that need it.
Consistent mapping increases linkability. Enable it only for columns needed for joins and flows.
Concrete example: from PII scan output to generator choices
Example workspace name: test-api-contracts (stable dataset refreshed per sprint).
Example PII scan findings you should expect to review (illustrative):
Example generator mapping for API-friendly realism:
customers.email: Mask → Email (keeps a valid email structure for validators).
customers.phone_number: Mask → Phone (keeps country/format patterns).
orders.billing_iban: Mask → IBAN (format-preserving, prevents checksum failures).
customers.date_of_birth: Mock (often safer than masking when age distribution is not tested).
customers.notes: Free text de-identification (only on this column, not the whole table).
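A small smoke check on generated values can confirm that they still pass typical validators. The sketch below uses only the Python standard library; the email pattern is deliberately simple and the function names are assumptions, so align them with the validators your application actually uses.

```python
import re
import uuid

# Deliberately simple pattern: one "@", no whitespace, a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(value: str) -> bool:
    return EMAIL_RE.match(value) is not None


def is_valid_uuid(value: str) -> bool:
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def is_valid_iban(value: str) -> bool:
    """Check the IBAN mod-97 checksum (structure only, not bank existence)."""
    iban = value.replace(" ", "").upper()
    if not (15 <= len(iban) <= 34) or not (iban.isascii() and iban.isalnum()):
        return False
    # Move the first four characters to the end, map letters to 10..35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))
```

Running checks like these on a sample of the destination data after each refresh catches broken formats before the API test suite does.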
Handle keys and relationships (relational schemas)
If your test dataset is a single table with no joins, you can skip this step. Most app and API test failures trace back to broken PK/FK chains.
Verify foreign key inheritance. Add virtual foreign keys where the database doesn’t define them.
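One way to verify a relationship in the destination, including one the database does not declare, is to count child rows without a matching parent. The sketch uses sqlite3 with hypothetical `customers` and `orders` tables; `count_orphans` is an illustrative helper, not part of Syntho.

```python
import sqlite3


def count_orphans(conn: sqlite3.Connection, child: str, fk_col: str,
                  parent: str, pk_col: str) -> int:
    """Count child rows whose foreign key value has no matching parent row."""
    # Identifiers are interpolated directly, so pass trusted names only.
    query = (
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk_col} = p.{pk_col} "
        f"WHERE c.{fk_col} IS NOT NULL AND p.{pk_col} IS NULL"
    )
    return conn.execute(query).fetchone()[0]


fk_conn = sqlite3.connect(":memory:")
fk_conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    -- Order 102 points at a customer that does not exist.
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 3);
    """
)
orphans = count_orphans(fk_conn, "orders", "customer_id", "customers", "id")
print(orphans)
```

A non-zero count after generation suggests that a key column was transformed without a matching rule on the other side of the relationship.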
Validate and sync
Validate source via Validate and synchronize workspace. Run a small smoke suite. Include at least one API happy path.
Re-run validation whenever schemas drift. DTAP schemas drift often.
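Schema drift can be detected outside the product by comparing a fingerprint of the source and destination schemas before each run. The sketch below uses sqlite3 and its `PRAGMA table_info`; other databases expose the same details through their own catalog views. `schema_fingerprint` is a hypothetical helper.

```python
import hashlib
import sqlite3


def schema_fingerprint(conn: sqlite3.Connection) -> str:
    """Hash table names, column names, types, NOT NULL and PK flags."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    parts = []
    for table in tables:
        # Each row: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, ctype, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            parts.append(f"{table}.{name}:{ctype}:{notnull}:{pk}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


source_db = sqlite3.connect(":memory:")
dest_db = sqlite3.connect(":memory:")
ddl = "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
source_db.execute(ddl)
dest_db.execute(ddl)
in_sync = schema_fingerprint(source_db) == schema_fingerprint(dest_db)

# Simulate a migration applied to the source only.
source_db.execute("ALTER TABLE customers ADD COLUMN phone TEXT")
drifted = schema_fingerprint(source_db) != schema_fingerprint(dest_db)
```

A mismatch is the signal to re-run workspace validation before generating data again.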
Tune generation settings
Tune for repeatability, not just speed. Keep settings stable across runs. This makes test failures debuggable. This also reduces flaky tests.
Use View and adjust generation settings and Large workloads tuning once the config is correct.
Common pitfalls & misconfigurations
Use case-specific pitfalls
Masking values but breaking API validators (emails, UUIDs, date formats).
De-identifying large text fields without scoping. It can increase runtime. See Free text de-identification.
Tuning only at full scale. Start small, then refine configuration.
General pitfalls
These pitfalls show up in most projects:
Running full-scale jobs before a small validation run.
Skipping workspace validation/sync after schema changes. Use Validate and synchronize workspace.
Breaking relational integrity (missing PK/FK setup, missing foreign keys, missing virtual foreign keys). Start with Manage foreign keys and virtual foreign keys.
Overusing Consistent mapping (it slows down data generation and increases linkability).
Governance, compliance, and automation
Use case-specific recommendations
Use a stable seed per environment (dev, test, accept) for reproducible failures.
Automate refreshes per environment.
Treat FK/virtual FK changes as breaking changes. Require sync/validation after schema migrations before testing.
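A stable per-environment seed can be derived from the environment name rather than stored by hand. This is a sketch of the idea in an automation script; how the seed is passed to a generation job is not shown, and `seed_for`, `sample_ids`, and the project label are hypothetical.

```python
import hashlib
import random


def seed_for(environment: str, project: str = "test-api-contracts") -> int:
    """Derive a 64-bit seed that is identical on every machine and run."""
    # hashlib is used because Python's built-in hash() of strings is
    # randomized per process and therefore not reproducible.
    digest = hashlib.sha256(f"{project}:{environment}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sample_ids(environment: str, n: int = 3) -> list:
    """Draw reproducible pseudo-random IDs for one environment."""
    rng = random.Random(seed_for(environment))
    return [rng.randrange(1_000_000) for _ in range(n)]


for env in ("dev", "test", "accept"):
    print(env, seed_for(env))
```

The same environment always gets the same seed, so a failing test can be reproduced against an identical dataset, while each environment still gets its own data.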
General recommendations
Use these recommendations for most workspaces.
Ownership and change control
Assign a single workspace owner (data steward / privacy lead / DBA).
Require a ticket or change request for generator changes.
Duplicate a workspace before large edits. Keep the previous version as rollback.
Access control
Default to read-only access for source connections.
Restrict who can view source data in the UI.
Use separate workspaces per environment or audience.
Automation (baseline)
Use the Syntho REST API to standardize scans and runs.
Automate data generation, not workspace configuration.
Keep job logs for failed runs. This reduces back-and-forth during support.

