# Use Case 5: Feature development

Use this use case when you need data to build and test features early, including when you have little or no input data.

### What problem this use case solves

Teams need realistic data to build features. Often, production data access is limited or not available yet.

Classic anonymization requires access to production-like data. It may also be slow when requirements change frequently.

### When to choose this use case

Pick this when you need data early, before production access exists.

* You have little or no production-like data.
* You need “new-shaped” data for new features and enums.
* You iterate fast and configs change often.
* You need dev data that is safe by default.

If you’re unsure, start with **Mock all**, configure PK/FK with [key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md), and add rules later. Use **From scratch** when only a few tables matter and you want tight control.

### When to avoid this use case

Skip this when you need production parity.

* You must preserve original relationships for reconciliation. Use [Use Case 4: ETL & Data Pipeline Testing](/overview/get-started/use-cases-and-configuration/use-case-4-etl-and-data-pipeline-testing.md).
* You need production-parity behavior across many related tables. Use [Use Case 1: Application & API Testing](/overview/get-started/use-cases-and-configuration/use-case-1-application-and-api-testing.md).
* You mainly need performance testing at huge volumes. Use [Use Case 2: Load & Stress](/overview/get-started/use-cases-and-configuration/use-case-2-load-and-stress-testing.md).
* If the dataset will leave your team, treat it like a controlled release. Avoid stable pseudonyms by default.

### Recommended Syntho configuration

This setup is optimized for **fast iteration when production data is missing**. You generate values quickly. You encode rules only where the feature depends on them.

{% stepper %}
{% step %}

#### Prerequisites

**Checklist**

* [ ] Schema is stable enough for this iteration.
* [ ] “Must-have” rules listed (enums, ranges, required relationships).
* [ ] PK/FK strategy decided (key generators + foreign keys).

- Use the [Prerequisites](/overview/get-started/prerequisites.md) checklist.
  {% endstep %}

{% step %}

#### Source & destination management

Create one workspace per team or feature area. This prevents generator churn across teams.

* Duplicate a working workspace before big changes. This gives you a rollback point.
* Use simple versioned names like `v1`, `v2`, `baseline`, or `pilot-partner-x`.

#### Baseline rules

* Keep the **source stable**. Prefer snapshots or backups.
* Avoid a **live production** source for iterative work.
* Keep the **destination isolated**. Never write into production.
* Keep **schemas aligned** between source, workspace and destination.
* Use **views** when you need only a subset of the original database.

#### Lifecycle rule of thumb

* Keep the source connection when you expect schema changes.
* Remove the source connection when you don’t expect another run for a long time.
* Revalidate after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

**Nuances for this use case**

* “Mock-first” is common. Use no source data, or only small reference seeds.
* Treat dev destinations as disposable. Reset often. Don’t patch generated data by hand.
* Don’t share one workspace across teams. Generator edits will churn and break others’ flows.
* “Mock all” won’t preserve relational behavior. Keys and relationships still need explicit setup.
* [Create a workspace](/setup-workspaces/create-a-workspace.md)
* [Duplicate a workspace](/setup-workspaces/duplicate-a-workspace.md)
  {% endstep %}

{% step %}

#### Configure generators

**Workspace initialization mode**

Choose a [workspace mode](/setup-workspaces/create-a-workspace/workspace-modes.md). It applies baseline generator suggestions during workspace creation.

Recommended modes for this use case:

* **Mock all** when there is little/no reference data and you’re generating new values.
* **From scratch** when only a few tables matter and you want tight control.
* **Mock or mask all** when you have a baseline dataset but want to replace most fields quickly.

**AI-generated synthesis**

Use this when you have enough rows and want realistic distributions quickly, especially for analytics-like features used during development.

**Example (realistic transaction amounts):** if you have a small seed table `payments_seed` with representative columns, apply **AI synthesize** to generate realistic `amount`, `currency`, and `merchant_category` values for UI and backend feature testing.

**Rule-based generation**

Use this when new features require **explicit business rules** or new enums that don’t exist in any source yet. Use [Calculated columns](/configure-a-data-generation-job/configure-column-settings/calculated-columns.md) for maintainable logic.

**Example (new status enum):** add a calculated `review_status` column that produces `PENDING_REVIEW`, `APPROVED`, `REJECTED` at fixed ratios. Use it to test UI filters and backend branching.

```excel-formula
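// New column: review_status (weighted distribution)
// Note: if each RAND() call is drawn independently, the second threshold only
// applies to rows that fail the first test, so the effective ratios are
// about 10% / 76.5% / 13.5%.
SWITCH(TRUE,
  RAND() < 0.10, "PENDING_REVIEW",
  RAND() < 0.85, "APPROVED",
  "REJECTED"
)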
```
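
The ratios a chained-threshold formula produces depend on whether all branches share one random draw or each branch takes its own. The Python sketch below is not Syntho formula syntax, and its function names are illustrative only. It computes both cases exactly for the thresholds `0.10` and `0.85` used above.

```python
from fractions import Fraction

STATUSES = ["PENDING_REVIEW", "APPROVED", "REJECTED"]


def ratios_independent_draws(t1: Fraction, t2: Fraction) -> dict:
    """Each branch draws its own random number (one RAND() call per branch)."""
    pending = t1
    approved = (1 - t1) * t2        # reached only when the first test fails
    rejected = (1 - t1) * (1 - t2)  # both tests fail
    return dict(zip(STATUSES, [pending, approved, rejected]))


def ratios_shared_draw(t1: Fraction, t2: Fraction) -> dict:
    """All branches compare one shared draw against cumulative thresholds."""
    return dict(zip(STATUSES, [t1, t2 - t1, 1 - t2]))


t1, t2 = Fraction(10, 100), Fraction(85, 100)
independent = {k: float(v) for k, v in ratios_independent_draws(t1, t2).items()}
shared = {k: float(v) for k, v in ratios_shared_draw(t1, t2).items()}
print(independent)  # PENDING_REVIEW 0.1, APPROVED 0.765, REJECTED 0.135
print(shared)       # PENDING_REVIEW 0.1, APPROVED 0.75, REJECTED 0.15
```

If the feature needs exactly 10% / 75% / 15% and the draws are independent, one option is to lower the second threshold to 0.75 / 0.90 ≈ 0.833.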

**Masking**

Use this when dev environments require **format-valid identifiers** (UUIDs, emails) and you want to keep relational joins stable.

**Example (format + joins):** mask `user_email` and `external_reference` to valid formats, and keep consistent mapping on `account_number` so a user’s account references remain stable across tables during repeated dev resets.
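
To illustrate why a consistent mapping keeps joins stable, here is a minimal Python sketch of a deterministic keyed pseudonym. It is a generic technique and not a description of how Syntho implements Consistent mapping. The key, the function name, and the sample tables are assumptions made for the example.

```python
import hashlib
import hmac

# Assumption: a per-environment secret; any byte string works for the sketch.
SECRET_KEY = b"dev-only-secret"


def pseudonymize_account(account_number: str, key: bytes = SECRET_KEY) -> str:
    """Map an account number to a stable 10-digit pseudonym."""
    digest = hmac.new(key, account_number.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the first 8 bytes to 10 decimal digits, zero-padded.
    return f"{int.from_bytes(digest[:8], 'big') % 10**10:010d}"


# Hypothetical tables that share the account_number column.
accounts = [{"account_number": "ACC-001"}, {"account_number": "ACC-002"}]
payments = [{"account_number": "ACC-002", "amount": 12.5}]

masked_accounts = [
    {**row, "account_number": pseudonymize_account(row["account_number"])}
    for row in accounts
]
masked_payments = [
    {**row, "account_number": pseudonymize_account(row["account_number"])}
    for row in payments
]

# The same input always yields the same pseudonym, so the join still resolves.
known_accounts = {row["account_number"] for row in masked_accounts}
joins_ok = all(row["account_number"] in known_accounts for row in masked_payments)
print(joins_ok)  # True
```

The same stability is what makes such values linkable across datasets. Truncating to 10 digits also allows rare collisions, so treat this as an illustration rather than a production masking scheme.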

**Hybrid**

Use this when you need mock-first speed, plus a few **hard business rules**. It maps to “new data creation” in [Example data generation scenarios](/overview/get-started/syntho-bootcamp/example-data-generation-scenarios.md).

**Example (tenant-style dev data with realistic emails):**

1. Mock names and basic attributes.
2. Use a calculated column to build an email from those generated values.

```excel-formula
// New column: user_email (derived from generated names)
LOWER(CONCATENATE([first_name], ".", [last_name], "@", MOCK_FREE_EMAIL_DOMAIN_0))
```
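
Derived emails are only as clean as the names they are built from. The Python sketch below mirrors the concatenate-then-lowercase logic of the formula above and adds a simplified format check. The function names and the regular expression are assumptions for illustration; the pattern is deliberately strict and is not a full email-address validator.

```python
import re

# Simplified pattern for dev data checks, not a complete address grammar.
EMAIL_PATTERN = re.compile(r"^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,}$")


def build_email(first_name: str, last_name: str, domain: str) -> str:
    """Join first name, '.', last name, '@', domain, then lowercase."""
    return f"{first_name}.{last_name}@{domain}".lower()


def is_format_valid(email: str) -> bool:
    """Return True when the email matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(email) is not None


clean = build_email("Ada", "Lovelace", "Example.org")
messy = build_email("Mary Ann", "O'Neil", "example.org")
print(clean, is_format_valid(clean))  # ada.lovelace@example.org True
print(messy, is_format_valid(messy))  # mary ann.o'neil@example.org False
```

If mocked names can contain spaces or apostrophes, consider cleaning them before concatenation so the resulting values stay format-valid.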

**Minimal configuration steps**

1. Apply mockers for most columns.
2. Configure PK/FK via [key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md) and [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md).
3. Add calculated columns only for the behaviors your feature relies on.

<details>

<summary>When you need “new-shaped” data</summary>

Use explicit generators when the source cannot contain the new behavior:

* New enums/statuses (`PENDING_REVIEW`, `ESCALATED`, `CANCELLED_BY_USER`).
* JSON payload shapes.
* Free-text fields that may contain identifiers.

Relevant docs:

* [JSON de-identification](/configure-a-data-generation-job/configure-column-settings/json-de-identification.md)
* [Free text de-identification](/overview/get-started/syntho-bootcamp/5.-generators/free-text-de-identification.md)

</details>

<details>

<summary>Practical example: versioning generator configurations</summary>

Feature work is experimental. Treat generator configs like versions.

Recommended pattern:

* Keep a stable baseline workspace: `feature-payments_v1`.
* Duplicate before large edits: `feature-payments_v2`.
* Roll back by reusing the previous workspace if tests fail.

This is also how you can compare two approaches (e.g., “more rules” vs “more mock”) without losing a known-good setup.

</details>
{% endstep %}

{% step %}

#### Handle keys and relationships (relational schemas)

If you only test a **single table** (no joins), you can skip this step.

Mockers can’t be applied to PK/FK columns. Configure keys explicitly, or joins and foreign key constraints in your app will break.

Use [key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md) for primary keys and enforce relationships via [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md).
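
The reason keys need explicit handling can be shown with a small Python sketch: child rows must draw their foreign keys from the set of generated parent keys, otherwise they become orphans. This is a conceptual illustration, not Syntho's key generator logic. The function names and the `users`/`orders` example are assumptions.

```python
import random


def generate_primary_keys(count: int, start: int = 1) -> list:
    """Generate unique, sequential primary keys."""
    return list(range(start, start + count))


def assign_foreign_keys(child_count: int, parent_keys: list, rng: random.Random) -> list:
    """Pick each child's foreign key from the existing parent keys."""
    return [rng.choice(parent_keys) for _ in range(child_count)]


def orphaned_keys(foreign_keys: list, parent_keys: list) -> list:
    """Return foreign key values that have no matching parent key."""
    return sorted(set(foreign_keys) - set(parent_keys))


rng = random.Random(42)  # seeded so repeated dev resets give the same data
user_ids = generate_primary_keys(5)
order_user_ids = assign_foreign_keys(20, user_ids, rng)

print(orphaned_keys(order_user_ids, user_ids))  # []
print(orphaned_keys([3, 99], user_ids))         # [99]
```

The second call shows what an independently generated value looks like from the database's point of view: `99` has no parent row, so a join or foreign key constraint on it fails.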

* [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md)
* [Key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md)
  {% endstep %}

{% step %}

#### Validate and sync

Validate after every schema change via [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md). Feature development involves constant schema drift.

Validate before you blame failing tests on the feature.
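
Validation happens in the product itself. As a conceptual aid, the Python sketch below shows the kind of drift such a check surfaces. The dictionary-based schema representation, the function name, and the sample tables are assumptions for illustration and do not reflect Syntho's internal format.

```python
def schema_drift(expected: dict, actual: dict) -> list:
    """Compare two {table: {column: type}} mappings and list the differences."""
    findings = []
    for table in sorted(expected.keys() | actual.keys()):
        expected_cols = expected.get(table)
        actual_cols = actual.get(table)
        if expected_cols is None:
            findings.append(f"{table}: new table")
            continue
        if actual_cols is None:
            findings.append(f"{table}: table removed")
            continue
        for column in sorted(expected_cols.keys() | actual_cols.keys()):
            if column not in actual_cols:
                findings.append(f"{table}.{column}: column removed")
            elif column not in expected_cols:
                findings.append(f"{table}.{column}: new column")
            elif expected_cols[column] != actual_cols[column]:
                findings.append(
                    f"{table}.{column}: type {expected_cols[column]} -> {actual_cols[column]}"
                )
    return findings


# Hypothetical schemas: what the workspace was configured against vs. the database now.
workspace_schema = {"payments": {"id": "int", "amount": "decimal"}}
database_schema = {
    "payments": {"id": "int", "amount": "decimal", "review_status": "varchar"}
}

print(schema_drift(workspace_schema, database_schema))
# ['payments.review_status: new column']
```

A non-empty result is a signal to revalidate and synchronize the workspace before investigating the feature code.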

* [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md)
  {% endstep %}

{% step %}

#### Tune generation settings

Optimize for small, frequent runs. This matches a dev workflow and keeps feedback fast.

Use [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md) when job time becomes a bottleneck.

* [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md)
  {% endstep %}
  {% endstepper %}

### Common pitfalls & misconfigurations

#### Use case-specific pitfalls

* Expecting “Mock all” to preserve original distributions or relationships.
* Forgetting that mockers cannot be applied to PK/FK columns.
  * Use [Key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md).

<details>

<summary>General pitfalls</summary>

These pitfalls show up in most projects:

* Running full-scale jobs before a small validation run.
* Skipping workspace validation/sync after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).
* Breaking relational integrity (missing PK/FK setup, missing foreign keys, missing virtual foreign keys). Start with [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md) and [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md).
* Leaving sensitive columns on [**Duplicate**](/configure-a-data-generation-job/configure-column-settings/duplicate.md), or trusting the [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) without reviewing false positives/negatives.
* Overusing [**Consistent mapping**](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) (it slows down data generation and increases linkability).

</details>

### Governance, compliance, and automation

#### Use case-specific recommendations

* Treat feature work data as disposable. Create workspaces per feature area or team. Delete/retire when work merges.
* When you introduce new enums/statuses, document the allowed values and generation ratios. This prevents UI/test drift.
* Enforce a hard rule: no production source connections for early feature work. Use mock-first or approved seeds only.
* Automate “fast feedback” runs: small generate → validate → scale only when the feature needs it.
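
The “small generate → validate → scale” loop can be sketched in Python as follows. The `generate` and `validate` callables are hypothetical wrappers around whatever you use to trigger a run and check its output. No Syntho API endpoints or parameters are assumed here.

```python
from typing import Callable, List, Optional, Tuple


def fast_feedback_run(
    generate: Callable[[int], None],
    validate: Callable[[], List[str]],
    row_counts: List[int],
) -> Tuple[Optional[int], List[str]]:
    """Run the smallest size first and scale up only while validation passes.

    Returns the largest row count that passed and the errors that stopped scaling.
    """
    last_passing = None
    for rows in row_counts:
        generate(rows)
        errors = validate()
        if errors:
            return last_passing, errors
        last_passing = rows
    return last_passing, []


# Stub callables standing in for a real job trigger and real checks.
generated_sizes: List[int] = []


def fake_generate(rows: int) -> None:
    generated_sizes.append(rows)


def fake_validate() -> List[str]:
    # Pretend a problem only shows up at large volumes.
    return ["orphaned foreign keys found"] if generated_sizes[-1] > 1000 else []


result = fast_feedback_run(fake_generate, fake_validate, [100, 1000, 100000])
print(result)  # (1000, ['orphaned foreign keys found'])
```

Stopping at the first failing size keeps the expensive runs for configurations that already passed the cheap ones.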

<details>

<summary>General recommendations</summary>

Use these recommendations for most workspaces.

#### Ownership and change control

* Assign a single **workspace owner** (data steward / privacy lead / DBA).
* Require a ticket or change request for generator changes.
* Duplicate a workspace before large edits. Keep the previous version as rollback.

#### Access control

* Default to **read-only** access for source connections.
* Restrict **who can view source data** in the UI.
* Use separate workspaces per environment or audience.

#### Automation (baseline)

* Use the [Syntho REST API](/syntho-api/syntho-rest-api.md) to standardize scans and runs.
* Automate data generation, not workspace configuration.
* Keep job logs for failed runs. This reduces back-and-forth during support.

</details>

