# Use Case 11: Accelerate PoCs & pilots

Choose this use case when you need to validate a new idea fast, whether as an internal proof of concept (PoC) or an external pilot.

The focus is **speed** and **safe collaboration**. You want data that behaves like production. You do not want production risk.

### What problem this use case solves

PoCs and pilots often stall on data access.

Teams wait on approvals, exports, and manual data prep. When data finally arrives, it is incomplete or unrealistic.

You need a repeatable way to provision privacy-safe, production-like datasets. You also need to share them with stakeholders.

### When to choose this use case

Pick this when speed is the primary constraint:

* You need a usable dataset in days, not weeks.
* You test one or two “must-work” workflows end-to-end.
* You collaborate across teams, vendors, or partners.
* Privacy rules block production copies.

If you’re unsure, start with **De-identify**, keep scope to one “must-work” flow, and only mask fields validated by the app/API. In every case:

* Run a [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) before sharing outputs.
* Duplicate the workspace before big changes. See [Duplicate a workspace](/setup-workspaces/duplicate-a-workspace.md).

### When to avoid this use case

Skip this when you need a long-lived, governed setup.

* You need stable DTAP baselines for many teams. Use [Use Case 1: Application & API Testing](/overview/get-started/use-cases-and-configuration/use-case-1-application-and-api-testing.md).
* You need stable, repeatable demo narratives for sales or pre-sales. Use [Use Case 3: Demo Data](/overview/get-started/use-cases-and-configuration/use-case-3-demo-data.md).
* You need formal external data sharing approvals. Use [Use Case 9: Data Sharing & Monetization](/overview/get-started/use-cases-and-configuration/use-case-9-data-sharing-and-monetization.md).
* You mainly need upsampling. Focus on [AI synthesize](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md) and performance tuning instead.

### Recommended Syntho configuration

This setup is optimized for **fast iteration with safe sharing**.

You start simple. You get a working dataset. You then tighten rules where the PoC depends on them.

{% stepper %}
{% step %}

#### Prerequisites

**Checklist**

* [ ] “Must-work” workflow defined (1–2 flows).
* [ ] Sharing boundary defined (internal vs external pilot).
* [ ] Reset cadence defined (ad-hoc vs weekly).

* Use the [Prerequisites](/overview/get-started/prerequisites.md) checklist.

<details>

<summary>Quick start checklist: first 48 hours</summary>

Use this when you need momentum fast and you don’t want to over-design early.

**Day 0–1 (get a first dataset):**

* Choose one “must-work” flow (e.g., login → search → checkout).
* Agree on success metrics (example: “demo the flow with 0 PII leakage” and “refresh in <30 minutes”).
* Create a workspace via [Create a workspace](/setup-workspaces/create-a-workspace.md) and run a first [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md).
* Apply masking for validator-critical fields (emails, UUIDs, IBANs).
* Generate a small dataset first (smoke test), then scale up.
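
A quick leakage smoke test can back a metric such as “0 PII leakage”. The sketch below is a minimal, hypothetical Python check, not a Syntho feature: it assumes you can export one sensitive column (here, emails) from both the source and the generated output, and it lists values that reappear verbatim. The function name and sample values are illustrative. It complements the PII scan and does not replace it.

```python
def leaked_values(source_values, generated_values):
    """Return source values that reappear verbatim in the generated data.

    Hypothetical helper: comparison ignores case and surrounding whitespace.
    """
    def normalize(value):
        return value.strip().lower()

    source = {normalize(v) for v in source_values if v}
    generated = {normalize(v) for v in generated_values if v}
    return sorted(source & generated)


# Illustrative sample data (assumption, not real exports).
source_emails = ["Anna@example.com", "bob@example.com"]
generated_emails = ["x1@example.org", "anna@example.com "]

print(leaked_values(source_emails, generated_emails))  # ['anna@example.com']
```

An empty result on a small smoke-test dataset is a cheap signal that masking is applied to that column before you scale up.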

**Day 1–2 (iterate with stakeholders):**

* Review the dataset with the PoC team and privacy lead.
* Capture issues as a short backlog (missing tables, broken joins, unrealistic values).
* Duplicate the workspace before large changes. See [Duplicate a workspace](/setup-workspaces/duplicate-a-workspace.md). This makes rollbacks easy.

</details>
{% endstep %}

{% step %}

#### Source & destination management

Create one workspace per PoC or pilot. Examples: `poc-crm-integration`, `pilot-partner-x`.

* Duplicate a working workspace before big changes. This gives you a rollback point.
* Use simple versioned names like `v1`, `v2`, `baseline`, or `pilot-partner-x`.

#### Baseline rules

* Keep the **source stable**. Prefer snapshots or backups.
* Avoid a **live production** source for iterative work.
* Keep the **destination isolated**. Never write into production.
* Keep **schemas aligned** between source, workspace and destination.
* Use **views** when you need only a subset of the original database.

#### Lifecycle rule of thumb

* Keep the source connection when you expect schema changes.
* Remove the source connection when you don’t expect another run for a long time.
* Revalidate after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

**Nuances for this use case**

* For external pilots, treat the environment like an external share. Restrict access and default to stronger unlinkability.
* Don’t reuse a PoC workspace for a new initiative. Old generator decisions silently carry over.
* Clean up connectors and access after the pilot. Make it part of close-out.
* [Create a workspace](/setup-workspaces/create-a-workspace.md)
* [Duplicate a workspace](/setup-workspaces/duplicate-a-workspace.md)
  {% endstep %}

{% step %}

#### Configure generators

**Workspace initialization mode**

Choose a [workspace mode](/setup-workspaces/create-a-workspace/workspace-modes.md). It applies baseline generator suggestions during workspace creation.

Recommended modes for this use case:

* **De-identify** when you start from a production-like copy and need fast, safe parity.
* **Mock or mask all** when you want stronger separation from the source but still need realistic formats.
* **Mock all** when you have no usable source data yet (early product work).

**AI-generated synthesis**

Use this when the PoC needs **realistic distributions** or **more rows** quickly, and you’re not asserting row-level parity.

**Example (integration at scale):** synthesize a `poc_contacts_entity_view` to generate 5× more contacts so you can validate connector throughput and UI pagination without using real identifiers.

**Rule-based generation**

Use this to guarantee the PoC has the exact scenarios stakeholders will test. Use [Calculated columns](/configure-a-data-generation-job/configure-column-settings/calculated-columns.md) to keep demos repeatable.

**Example (must-have workflow states):** assign a stable stage based on a numeric key so every stage appears across refreshes.

```excel-formula
// New column: onboarding_stage (stable by id modulo)
SWITCH(MOD([customer_id], 3),
  0, "NEW",
  1, "IN_PROGRESS",
  2, "COMPLETED",
  "NEW"
)
```

Use an integer-like key column here. If your IDs are UUIDs, use a numeric surrogate key for the PoC.
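
To see why the modulo rule is stable and when it can still miss a stage, here is a small Python sketch that mirrors the formula above outside Syntho. The function names and sample IDs are assumptions for illustration only. It shows that the same ID always maps to the same stage, and that a sparse set of IDs (for example, only multiples of 3) can leave stages empty.

```python
STAGES = ("NEW", "IN_PROGRESS", "COMPLETED")


def onboarding_stage(customer_id: int) -> str:
    """Mirror SWITCH(MOD([customer_id], 3), ...) from the formula above."""
    return STAGES[customer_id % len(STAGES)]


def missing_stages(customer_ids) -> set:
    """Return the stages that no ID in the sample maps to."""
    seen = {onboarding_stage(cid) for cid in customer_ids}
    return set(STAGES) - seen


# Consecutive IDs cover every stage.
print(missing_stages(range(1, 11)))  # set()

# Sparse IDs (all multiples of 3) only ever produce "NEW".
print(sorted(missing_stages([3, 6, 9])))  # ['COMPLETED', 'IN_PROGRESS']
```

If the second case applies to your key column, pick a different key or a different divisor before relying on the rule in a demo.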

**Masking**

Use this when partner systems validate formats and you need stable keys and joins across multiple PoC refreshes.

**Example (API contract fields):** mask `email`, `phone`, and `external_id` to valid formats, and enable consistent mapping for identifiers used across tables so the integration doesn’t break between runs.
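
The property that consistent mapping provides (the same input always yields the same masked output, so joins survive refreshes) can be illustrated with a keyed hash. This is a conceptual Python sketch only. It is not how Syntho implements consistent mapping, and the `pseudonymize` helper, the `ext_` prefix, and the secret are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes, prefix: str = "ext_") -> str:
    """Deterministically map a value to a pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}{digest[:12]}"


secret = b"poc-only-secret"  # assumption: kept outside the shared dataset

# The same identifier in two tables maps to the same masked value,
# so a join on external_id still matches after masking.
contacts = {"external_id": pseudonymize("CUST-001", secret)}
orders = {"external_id": pseudonymize("CUST-001", secret)}
print(contacts["external_id"] == orders["external_id"])  # True

# A different identifier maps to a different masked value.
print(pseudonymize("CUST-002", secret) == contacts["external_id"])  # False
```

The same determinism is what increases linkability, which is why consistent mapping for external sharing warrants a privacy review.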

**Hybrid**

Use this when you need speed plus just enough realism and governance to share safely.

**Example (fast baseline + scenario control):**

1. De-identify a production-like snapshot for quick parity.
2. Add calculated “scenario columns” that drive the PoC story (workflow stage, flags).
3. Publish a flattened stakeholder view and synthesize it if you need unlinkability.

A simple “demo label” trick that makes filtering obvious in UIs:

```excel-formula
// New column: demo_account_label
CONCATENATE(PROPER(MOCK_COMPANY_NAME), " - ", [onboarding_stage])
```

**Minimal configuration steps**

1. Start with de-identification for parity (or mock-first if no source exists).
2. Mask only the fields the integration validates (emails, UUIDs, IBANs).
3. Add 1–2 calculated “scenario” columns to drive the PoC story.
4. Run a [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) before sharing.

* [Automatic PII discovery with PII scanner](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md)
* [Manage personally identifiable information (PII)](/configure-a-data-generation-job/manage-personally-identifiable-information-pii.md)
  {% endstep %}

{% step %}

#### Handle keys and relationships (relational schemas)

If your PoC uses a **single table** (no joins), you can skip this step.

PoCs fail on broken relationships.

Validate primary keys and foreign keys early with [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md). If the database schema is incomplete, add [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md).
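
After a generation run, a quick orphan check on the destination can confirm that relationships survived. The sketch below is a minimal Python example using an in-memory SQLite database as a stand-in for your destination. The table and column names (`customers`, `orders`, `customer_id`) and the `count_orphans` helper are assumptions for illustration, not part of Syntho.

```python
import sqlite3


def count_orphans(conn, child, fk, parent, pk):
    """Count child rows whose foreign key has no matching parent row.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    query = f"""
        SELECT COUNT(*)
        FROM {child} AS c
        LEFT JOIN {parent} AS p ON c.{fk} = p.{pk}
        WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL
    """
    return conn.execute(query).fetchone()[0]


# Stand-in destination with one deliberately broken reference (99).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

print(count_orphans(conn, "orders", "customer_id", "customers", "id"))  # 1
```

A non-zero count on a relationship the PoC depends on usually points to a missing foreign key or virtual foreign key in the workspace.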

* [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md)
* [Key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md)
  {% endstep %}

{% step %}

#### Validate and sync

Validate quickly on a subset.

Run the “happy path” scenario the PoC exists to prove.

Re-run validation whenever schemas or requirements change. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

* [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md)
  {% endstep %}

{% step %}

#### Tune generation settings

Optimize for short feedback loops.

Keep settings stable so results are comparable across iterations.

Use [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md) when runtime becomes the bottleneck.

* [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md)
  {% endstep %}
  {% endstepper %}

### Common pitfalls & misconfigurations

#### Use-case specific pitfalls

* Starting the PoC without a clear success definition.
* Using real production data in pilot environments.
* Enabling consistent mapping for external sharing without a privacy review.
* Over-modeling the dataset.
  * Get the critical flows working first.

<details>

<summary>General pitfalls</summary>

These pitfalls show up in most projects:

* Running full-scale jobs before a small validation run.
* Skipping workspace validation/sync after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).
* Breaking relational integrity (missing PK/FK setup or missing virtual foreign keys). Start with [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md) and [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md).
* Leaving sensitive columns on [**Duplicate**](/configure-a-data-generation-job/configure-column-settings/duplicate.md), or trusting the [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) without reviewing false positives/negatives.
* Overusing [**Consistent mapping**](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) (it slows down data generation and increases linkability).

</details>

<details>

<summary>Governance, compliance, and automation</summary>

#### Governance, access control, and audit evidence

Keep the workspace configuration as a controlled artifact. Treat it like a “test data release”.

#### Recommended roles

* **Workspace Owner**: data steward or privacy lead. Approves generator choices and sharing.
* **Workspace Editor**: data engineer or platform engineer. Implements configuration changes.
* **Workspace Reader**: testers, analysts, or trainees. Can run jobs but should not change rules.

See [Workspace & user management](/overview/get-started/syntho-bootcamp/8.-workspace-and-user-management.md) and [Share a workspace](/setup-workspaces/share-a-workspace.md).

#### Access control checklist

* Use **read-only** access to the **source** database for day-to-day users.
* Restrict **who can view source data** in the UI. Don’t default to broad access.
* Use a **dedicated destination** per environment (`dev`, `test`, `accept`, `sandbox`).
* Keep external recipients in a **separate workspace** with stricter settings.

#### Evidence for auditors (lightweight but useful)

Capture these items per delivery or refresh:

* Workspace name, owner, and intended audience.
* PII scan results and the final list of “PII columns + applied generator type”.
* Any enabled privacy controls (e.g., rare category protection, free-text de-identification scope).
* Validation output and/or QA report (when applicable).
* Approval notes (ticket link, privacy board sign-off, or risk acceptance).
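
One lightweight way to capture these items is a small structured record stored next to the ticket. The Python sketch below is a hypothetical layout, not a Syntho export format: the `RefreshEvidence` class, its field names, and the sample values are assumptions that mirror the list above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RefreshEvidence:
    """Hypothetical per-refresh evidence record mirroring the checklist."""
    workspace_name: str
    owner: str
    intended_audience: str
    pii_columns: dict  # column name -> applied generator type
    privacy_controls: list = field(default_factory=list)
    validation_report: str = ""   # path or link, when applicable
    approval_reference: str = ""  # ticket link or sign-off note


# Illustrative values only.
record = RefreshEvidence(
    workspace_name="pilot-partner-x",
    owner="privacy-lead",
    intended_audience="external pilot",
    pii_columns={"email": "mask", "external_id": "mask"},
    privacy_controls=["rare category protection"],
)

# Serialize to JSON so the record can be attached to a ticket.
print(json.dumps(asdict(record), indent=2))
```

Keeping one record per delivery or refresh makes it easy to show what was shared, with whom, and under which approval.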

#### Automation and deployment (reference)

You can automate workspace setup, scans, and generation runs via the [Syntho REST API](/syntho-api/syntho-rest-api.md).

</details>

