Use Case 5: Feature development

Shift-left testing with realistic synthetic data when production data is unavailable or restricted.

Choose this use case when you need data to build and test features early, including when you have little or no input data.

What problem this use case solves

Teams need realistic data to build features. Often, production data access is limited or not available yet.

Classic anonymization requires access to production-like data. It may also be slow when requirements change frequently.

When to choose this use case

Pick this when you need data early, before production access exists.

If you’re unsure, start with Mock all, configure PK/FK with key generators, and add rules later.

  • You have little or no production-like data.

  • You need “new-shaped” data for new features and enums.

  • You iterate fast and configs change often.

  • You need dev data that is safe by default.

  • Only a few tables matter and you want tight control. In that case, use From scratch.

When to avoid this use case

Skip this when you need production parity.

This setup is optimized for fast iteration when production data is missing. You generate values quickly. You encode rules only where the feature depends on them.

1. Prerequisites

Checklist

2. Source & destination management

Create one workspace per team or feature area. This prevents generator churn across teams.

  • Duplicate a working workspace before big changes. This gives you a rollback point.

  • Use simple versioned names like v1, v2, baseline, or pilot-partner-x.

Baseline rules

  • Keep the source stable. Prefer snapshots or backups.

  • Avoid a live production source for iterative work.

  • Keep the destination isolated. Never write into production.

  • Keep schemas aligned between source, workspace and destination.

  • Use views when you need only a subset of the original database.

Lifecycle rule of thumb

  • Keep the source connection when you expect schema changes.

  • Remove the source connection when you don’t expect another run for a long time.

  • Revalidate after schema changes. Use Validate and synchronize workspace.

Nuances for this use case

  • “Mock-first” is common. Use no source data, or only small reference seeds.

  • Treat dev destinations as disposable. Reset often. Don’t patch generated data by hand.

  • Don’t share one workspace across teams. Generator edits will churn and break others’ flows.

  • “Mock all” won’t preserve relational behavior. Keys and relationships still need explicit setup.

3. Configure generators

Workspace initialization mode

Choose a workspace mode. It applies baseline generator suggestions during workspace creation.

Recommended modes for this use case:

  • Mock all when there is little or no reference data and you’re generating new values.

  • From scratch when only a few tables matter and you want tight control.

  • Mock or mask all when you have a baseline dataset but want to replace most fields quickly.

AI-generated synthesis

Use this when you have enough rows and want realistic distributions quickly, especially for analytics-like features used during development.

Example (realistic transaction amounts): if you have a small seed table payments_seed with representative columns, apply AI synthesize to generate realistic amount, currency, and merchant_category values for UI and backend feature testing.

Rule-based generation

Use this when new features require explicit business rules or new enums that don’t exist in any source yet. Use Calculated columns for maintainable logic.

Example (new status enum): add a calculated review_status column that produces PENDING_REVIEW, APPROVED, REJECTED at fixed ratios. Use it to test UI filters and backend branching.
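As a rough illustration of what "fixed ratios" means here, the sketch below draws statuses with weighted random sampling in plain Python. It is not the product's calculated-column syntax. The ratios (60/30/10), the function name, and the seed are assumptions chosen for the example.

```python
import random

# Assumed ratios for illustration only; the page does not specify them.
REVIEW_STATUS_RATIOS = {
    "PENDING_REVIEW": 0.6,
    "APPROVED": 0.3,
    "REJECTED": 0.1,
}


def generate_review_statuses(row_count: int, seed: int = 42) -> list[str]:
    """Return one status per row, drawn with the configured weights."""
    rng = random.Random(seed)  # fixed seed keeps repeated dev resets reproducible
    statuses = list(REVIEW_STATUS_RATIOS)
    weights = list(REVIEW_STATUS_RATIOS.values())
    return rng.choices(statuses, weights=weights, k=row_count)
```

Weighted sampling only approximates the ratios on small tables. If a UI filter test needs an exact count per status, allocate the counts first and shuffle instead.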

Masking

Use this when dev environments require format-valid identifiers (UUIDs, emails) and you want to keep relational joins stable.

Example (format + joins): mask user_email and external_reference to valid formats, and keep consistent mapping on account_number so a user’s account references remain stable across tables during repeated dev resets.
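The property that matters in this example is consistent mapping: the same input always yields the same masked value, so joins survive a reset. The sketch below shows one common way to get that property, a keyed hash (HMAC). It is an illustration, not a description of how the product's maskers work. `MASKING_KEY`, both function names, and the `example.com` domain are assumptions.

```python
import hashlib
import hmac

# Hypothetical key for the example; a real one would come from a secret store.
MASKING_KEY = b"dev-only-masking-key"


def mask_account_number(account_number: str, digits: int = 10) -> str:
    """Map an account number to a stable, digits-only pseudonym."""
    digest = hmac.new(
        MASKING_KEY, account_number.encode("utf-8"), hashlib.sha256
    ).digest()
    # Reduce the digest to a fixed number of digits; rare collisions are possible.
    value = int.from_bytes(digest, "big") % (10 ** digits)
    return str(value).zfill(digits)


def mask_email(email: str) -> str:
    """Replace an email with a format-valid address on a reserved test domain."""
    token = hmac.new(
        MASKING_KEY, email.lower().encode("utf-8"), hashlib.sha256
    ).hexdigest()[:12]
    return f"user_{token}@example.com"
```

Because the mapping depends only on the input and the key, masking `account_number` in two tables independently still produces matching values on both sides of the join.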

Hybrid

Use this when you need mock-first speed, plus a few hard business rules. It maps to “new data creation” in Example data generation scenarios.

Example (tenant-style dev data with realistic emails):

  1. Mock names and basic attributes.

  2. Use a calculated column to build an email from those generated values.
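The second step can be pictured as a small function over the mocked name values. The Python sketch below only illustrates the logic; the real calculated column would use the product's own expression syntax. `build_email` and the default domain are assumptions.

```python
import re
import unicodedata


def build_email(first_name: str, last_name: str, domain: str = "example.com") -> str:
    """Build a lowercase, ASCII-only email address from generated name parts."""

    def slug(value: str) -> str:
        # Strip accents, then drop everything except letters and digits.
        ascii_value = (
            unicodedata.normalize("NFKD", value)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        return re.sub(r"[^a-z0-9]+", "", ascii_value.lower())

    return f"{slug(first_name)}.{slug(last_name)}@{domain}"
```

For example, `build_email("Zoë", "van Dijk")` returns `zoe.vandijk@example.com`. Names written entirely in non-Latin scripts would produce an empty local part, so a fallback would be needed for such data.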

Minimal configuration steps

  1. Apply mockers for most columns.

  2. Configure PK/FK via key generators and Manage foreign keys.

  3. Add calculated columns only for the behaviors your feature relies on.

When you need “new-shaped” data

Use explicit generators when the source cannot contain the new behavior:

  • New enums/statuses (PENDING_REVIEW, ESCALATED, CANCELLED_BY_USER).

  • JSON payload shapes.

  • Free-text fields that may contain identifiers.


Practical example: versioning generator configurations

Feature work is experimental. Treat generator configs as versioned artifacts.

Recommended pattern:

  • Keep a stable baseline workspace: feature-payments_v1.

  • Duplicate before large edits: feature-payments_v2.

  • Roll back by reusing the previous workspace if tests fail.

This is also how you can compare two approaches (e.g., “more rules” vs “more mock”) without losing a known-good setup.
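If you script the naming convention, a small helper keeps the `_vN` suffix consistent. This is a minimal sketch that assumes the `name_vN` pattern shown above; `next_workspace_name` is a hypothetical helper and not part of any product API.

```python
import re


def next_workspace_name(name: str) -> str:
    """Return the next versioned name, e.g. feature-payments_v1 -> feature-payments_v2."""
    match = re.fullmatch(r"(.+)_v(\d+)", name)
    if match is None:
        # An unversioned name is treated as the starting point for v1.
        return f"{name}_v1"
    base, version = match.groups()
    return f"{base}_v{int(version) + 1}"
```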

4. Handle keys and relationships (relational schemas)

If you only test a single table (no joins), you can skip this step.

Mockers can’t be applied to PK/FK columns. You must configure keys explicitly. Otherwise, joins in the generated data break and your app won’t work.

Use key generators for primary keys and enforce relationships via Manage foreign keys.
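To see why keys need explicit handling, consider the sketch below. It generates parent keys first, draws child foreign keys only from those parent keys, and then checks for orphans. The `customers` and `orders` tables, column names, and functions are all hypothetical. Sequential integers stand in for whatever a key generator would produce.

```python
import random


def generate_customers(count: int) -> list[dict]:
    """Create parent rows with unique primary keys."""
    return [{"customer_id": i} for i in range(1, count + 1)]


def generate_orders(count: int, customers: list[dict], seed: int = 7) -> list[dict]:
    """Create child rows whose foreign key always points at an existing parent."""
    rng = random.Random(seed)
    parent_ids = [c["customer_id"] for c in customers]
    return [
        {"order_id": i, "customer_id": rng.choice(parent_ids)}
        for i in range(1, count + 1)
    ]


def orphaned_orders(orders: list[dict], customers: list[dict]) -> list[dict]:
    """Return child rows that reference a parent key that does not exist."""
    valid_ids = {c["customer_id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in valid_ids]
```

If the foreign key column were filled with independent random values instead, `orphaned_orders` would typically return rows, which is the failure mode this step prevents.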

5. Validate and sync

Validate after every schema change via Validate and synchronize workspace. Feature development involves constant schema drift.

Validate before you blame failing tests on the feature.
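A quick drift check in your own test setup can tell you whether a failure comes from the schema rather than the feature. The sketch below compares two schema descriptions held as plain dictionaries. It is independent of the Validate and synchronize workspace feature; the `Schema` shape and `schema_drift` function are assumptions for illustration.

```python
# table name -> column name -> column type
Schema = dict[str, dict[str, str]]


def schema_drift(expected: Schema, actual: Schema) -> list[str]:
    """List the differences between the schema tests expect and the one deployed."""
    issues: list[str] = []
    for table in sorted(expected.keys() - actual.keys()):
        issues.append(f"missing table: {table}")
    for table in sorted(actual.keys() - expected.keys()):
        issues.append(f"unexpected table: {table}")
    for table in sorted(expected.keys() & actual.keys()):
        exp_cols, act_cols = expected[table], actual[table]
        for col in sorted(exp_cols.keys() - act_cols.keys()):
            issues.append(f"missing column: {table}.{col}")
        for col in sorted(act_cols.keys() - exp_cols.keys()):
            issues.append(f"unexpected column: {table}.{col}")
        for col in sorted(exp_cols.keys() & act_cols.keys()):
            if exp_cols[col] != act_cols[col]:
                issues.append(
                    f"type changed: {table}.{col} {exp_cols[col]} -> {act_cols[col]}"
                )
    return issues
```

An empty list means the two descriptions match. Any entry is a reason to revalidate the workspace before debugging the feature.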

6. Tune generation settings

Optimize for small, frequent runs. This matches a dev workflow and keeps feedback fast.

Use View and adjust generation settings when job time becomes a bottleneck.

Common pitfalls & misconfigurations

Use case-specific pitfalls

  • Expecting “Mock all” to preserve original distributions or relationships.

  • Forgetting that mockers cannot be applied to PK/FK columns.

General pitfalls

These pitfalls show up in most projects:

Governance, compliance, and automation

Use case-specific recommendations

  • Treat feature work data as disposable. Create workspaces per feature area or team. Delete or retire them when the work merges.

  • When you introduce new enums/statuses, document the allowed values and generation ratios. This prevents UI/test drift.

  • Enforce a hard rule: no production source connections for early feature work. Use mock-first or approved seeds only.

  • Automate “fast feedback” runs: small generate → validate → scale only when the feature needs it.
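The "small generate → validate → scale" loop from the last recommendation can be sketched as follows. The `generate` and `validate` callables are placeholders you would supply yourself. They do not represent the Syntho REST API, and the row counts are assumptions.

```python
from typing import Callable, Sequence


def fast_feedback_run(
    generate: Callable[[int], list[dict]],
    validate: Callable[[list[dict]], list[str]],
    row_counts: Sequence[int] = (100, 10_000),
) -> list[dict]:
    """Generate at increasing sizes and stop at the first size that fails validation."""
    rows: list[dict] = []
    for count in row_counts:
        rows = generate(count)
        problems = validate(rows)
        if problems:
            # Fail on the small run so the larger, slower run never starts.
            raise ValueError(f"validation failed at {count} rows: {problems}")
    return rows
```

The design choice is to spend time on the large run only after the small one has passed, which keeps feedback fast while configs are still changing.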

General recommendations

Use these recommendations for most workspaces.

Ownership and change control

  • Assign a single workspace owner (data steward / privacy lead / DBA).

  • Require a ticket or change request for generator changes.

  • Duplicate a workspace before large edits. Keep the previous version as rollback.

Access control

  • Default to read-only access for source connections.

  • Restrict who can view source data in the UI.

  • Use separate workspaces per environment or audience.

Automation (baseline)

  • Use the Syntho REST API to standardize scans and runs.

  • Automate data generation, not workspace configuration.

  • Keep job logs for failed runs. This reduces back-and-forth during support.
