# Use Case 12: Training & Education

Choose this use case when you need **safe, realistic data** for onboarding, workshops, and hands-on learning.

The focus is **repeatable training scenarios** without exposing real customer data.

### What problem this use case solves

Training requires data that feels real.

Real production data is usually blocked by privacy, security, and access controls. Manually created demo data often lacks realism and breaks workflows.

You need a dataset that supports realistic exercises. You also need a quick reset between sessions.

### When to choose this use case

Pick this when people learn hands-on with realistic data:

* You run onboarding, enablement, or workshops.
* You need stable scenarios that always work.
* Many trainees share the same dataset.
* You need quick resets during sessions.

Use [Consistent mapping](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) only for storytelling, when the same entity must stay recognizable across tables.

If you’re unsure, start with **Mock or mask all**, keep the dataset small, and duplicate the workspace before every session.

### When to avoid this use case

Skip this when training is not the purpose.

* You share data externally beyond the training boundary. Use [Use Case 9: Data Sharing & Monetization](/overview/get-started/use-cases-and-configuration/use-case-9-data-sharing-and-monetization.md).
* You need analytics-grade statistical utility. Use [Use Case 7: Analytics Sandboxes](/overview/get-started/use-cases-and-configuration/use-case-7-analytics-sandboxes.md).
* You need stable demo narratives for product walkthroughs. Use [Use Case 3: Demo Data](/overview/get-started/use-cases-and-configuration/use-case-3-demo-data.md).
* You need load testing at scale. Prioritize volume profiles and destination tuning instead.

### Recommended Syntho configuration

This setup is optimized for **repeatable training exercises**. It favors datasets that reset fast and stable examples for step-by-step instructions.

{% stepper %}
{% step %}

#### Prerequisites

**Checklist**

* [ ] Learning goals defined (PII scan, generators, FK handling).
* [ ] Dataset size kept small (fast resets).
* [ ] Trainees’ access decided (no source access by default).

- Use the [Prerequisites](/overview/get-started/prerequisites.md) checklist.
  {% endstep %}

{% step %}

#### Source & destination management

Create one workspace per training track. Examples: `training-basics`, `training-pii`, `training-foreign-keys`.

* Duplicate a working workspace before big changes. This gives you a rollback point.
* Use simple versioned names like `v1`, `v2`, `baseline`, or `pilot-partner-x`.

#### Baseline rules

* Keep the **source stable**. Prefer snapshots or back-ups.
* Avoid a **live production** source for iterative work.
* Keep the **destination isolated**. Never write into production.
* Keep **schemas aligned** between source, workspace and destination.
* Use **views** when you need only a subset of the original database.
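
A subset view is only useful for training if child rows follow the parent subset. The following is a minimal SQLite sketch, run from Python, with hypothetical `customers` and `orders` tables. It derives the child view from the parent view, so no order points to a customer outside the subset. Views generally do not carry declared foreign keys, so you may need to add the relationship as a virtual foreign key in the workspace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source schema: orders.customer_id references customers.id.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU'), (4, 'APAC');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 35.5), (12, 3, 12.0), (13, 3, 8.0), (14, 4, 50.0);

-- Parent subset: only EU customers.
CREATE VIEW training_customers_view AS
    SELECT * FROM customers WHERE region = 'EU';

-- Child subset derived from the parent subset, so every order keeps its customer.
CREATE VIEW training_orders_view AS
    SELECT o.* FROM orders AS o
    JOIN training_customers_view AS c ON c.id = o.customer_id;
""")

customers = conn.execute("SELECT id FROM training_customers_view ORDER BY id").fetchall()
orders = conn.execute("SELECT id, customer_id FROM training_orders_view ORDER BY id").fetchall()
print(customers)  # [(1,), (3,)]
print(orders)     # [(10, 1), (12, 3), (13, 3)]
```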

#### Lifecycle rule of thumb

* Keep the source connection while you still expect schema changes.
* Remove the source connection when the next run is far off.
* Revalidate after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

**Nuances for this use case**

* Prefer mock-first sources. Don’t use production copies for training.
* Keep datasets small. Resets should take minutes, not hours.
* Don’t give trainees source access. Training should not be a backdoor to production-like data.
* Don’t mix “storytelling” and “privacy” goals in one dataset. Use separate tracks or separate workspaces.
* [Create a workspace](/setup-workspaces/create-a-workspace.md)
* [Duplicate a workspace](/setup-workspaces/duplicate-a-workspace.md)
  {% endstep %}

{% step %}

#### Configure generators

**Workspace initialization mode**

Choose a [workspace mode](/setup-workspaces/create-a-workspace/workspace-modes.md). It applies baseline generator suggestions during workspace creation.

Recommended modes for this use case:

* **Mock or mask all** when you want safe, realistic values with minimal reliance on the source.
* **Mock all** when you want trainees to learn configuration from scratch without any production-like input.
* **De-identify** when you have a production-like training dataset and you want to teach privacy-safe replacement patterns.
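
The mode choice above works like a lookup from training goal to mode. The following sketch is for trainers who script their track setup. The enum labels are hypothetical, and the mode names are the ones used on this page. The fallback follows the "if you’re unsure" advice in *When to choose this use case*.

```python
from enum import Enum
from typing import Optional


class TrainingGoal(Enum):
    """Hypothetical labels for the three training goals described above."""
    SAFE_REALISM = "safe, realistic values with minimal reliance on the source"
    LEARN_FROM_SCRATCH = "configure from scratch without production-like input"
    TEACH_DEIDENTIFICATION = "teach privacy-safe replacement on a production-like dataset"


# Workspace modes as named on this page.
MODE_FOR_GOAL = {
    TrainingGoal.SAFE_REALISM: "Mock or mask all",
    TrainingGoal.LEARN_FROM_SCRATCH: "Mock all",
    TrainingGoal.TEACH_DEIDENTIFICATION: "De-identify",
}


def recommend_workspace_mode(goal: Optional[TrainingGoal] = None) -> str:
    """Return the recommended mode, falling back to the safe default when unsure."""
    if goal is None:
        return MODE_FOR_GOAL[TrainingGoal.SAFE_REALISM]
    return MODE_FOR_GOAL[goal]


print(recommend_workspace_mode(TrainingGoal.LEARN_FROM_SCRATCH))  # Mock all
```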

**AI-generated synthesis**

Use this when training includes analytics/ML concepts and you want realistic correlations without exposing real people.

**Example (training on “churn”):** synthesize a `training_churn_features_view` so participants can build a simple model or dashboard with realistic feature relationships.

**Rule-based generation**

Use this to make exercises deterministic and repeatable. Use [Calculated columns](/configure-a-data-generation-job/configure-column-settings/calculated-columns.md) to keep labs stable.

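**Example (scripted “bad rows” lab):** inject a small, controlled share of malformed values that learners must find and fix. Because `RAND()` is random, the flagged rows can differ between runs. Only the share (about 2%) stays stable.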

```excel-formula
// New column: LAB_BAD_ROW (≈2% of rows)
RAND() < 0.02
```

```excel-formula
// Override: email (invalid only for lab rows)
IF([LAB_BAD_ROW], "not-an-email", [email])
```
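
If a lab needs the same rows flagged after every reset, one option is to derive the flag from a stable key rather than from `RAND()`. The following Python sketch runs outside Syntho and mirrors the two formulas above, with a hash-based flag swapped in for `RAND()`. This is not a Syntho function, and the `id` and `email` columns are assumptions. The sketch also includes the kind of check a learner might write to find the bad rows.

```python
import hashlib
import re
from typing import Dict, List

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_lab_bad_row(row_id: int, rate_percent: int = 2) -> bool:
    """Hash-based stand-in for RAND() < 0.02: the same id always gets the same flag."""
    digest = hashlib.sha256(str(row_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100 < rate_percent


def apply_lab_override(row: Dict) -> Dict:
    """Mirror of IF([LAB_BAD_ROW], "not-an-email", [email])."""
    flagged = is_lab_bad_row(row["id"])
    return {**row, "LAB_BAD_ROW": flagged, "email": "not-an-email" if flagged else row["email"]}


def find_invalid_emails(rows: List[Dict]) -> List[int]:
    """A check a learner might write to locate the injected rows."""
    return [r["id"] for r in rows if not EMAIL_PATTERN.match(r["email"])]


rows = [apply_lab_override({"id": i, "email": f"user{i}@example.com"}) for i in range(1, 1001)]
found = find_invalid_emails(rows)
expected = [r["id"] for r in rows if r["LAB_BAD_ROW"]]
print(f"{len(found)} malformed emails; learner check matches the flag: {found == expected}")
```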

**Masking**

Use this when labs require format-valid fields for validation exercises and relational joins.

**Example (PII lab):** mask `email` and `phone_number`, then enable consistent mapping for `customer_id` so learners can see that joins still work after de-identification.
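
Consistent mapping is what keeps the `customer_id` join intact. The following is a minimal sketch of the idea. It assumes a keyed hash (HMAC-SHA256) as the mapping function; Syntho's implementation may differ. The key, tables, and values are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; keep it outside the training dataset.
MAPPING_KEY = b"training-pii-track-key"


def pseudonymize(customer_id: str) -> str:
    """Deterministic keyed mapping: equal inputs give equal outputs."""
    return hmac.new(MAPPING_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


customers = [{"customer_id": "C-001"}, {"customer_id": "C-002"}]
orders = [
    {"order_id": 1, "customer_id": "C-001"},
    {"order_id": 2, "customer_id": "C-001"},
    {"order_id": 3, "customer_id": "C-002"},
]

# Apply the same mapping to the key column in both tables.
masked_customers = [{**c, "customer_id": pseudonymize(c["customer_id"])} for c in customers]
masked_orders = [{**o, "customer_id": pseudonymize(o["customer_id"])} for o in orders]

# The join still resolves every order to a customer.
known_ids = {c["customer_id"] for c in masked_customers}
joined = [o for o in masked_orders if o["customer_id"] in known_ids]
print(f"{len(joined)} of {len(masked_orders)} orders still join")  # 3 of 3 orders still join
```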

**Hybrid**

Use this when you want safe realism, plus scripted teaching scenarios.

**Example (progressive lab setup):** follow the hybrid patterns in [Example data generation scenarios](/overview/get-started/syntho-bootcamp/example-data-generation-scenarios.md).

1. Mock names/addresses for safety.
2. Mask format-critical fields for validators (emails, UUIDs).
3. Use calculated columns to inject edge cases (`LAB_BAD_ROW`) and deterministic relations (e.g., gender → name).
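
A "deterministic relation" in step 3 means one column's value follows from another. The following is a minimal Python sketch of that logic, outside Syntho. The name lists, the fallback for other gender values, and the hash-based pick are all assumptions for illustration, not how Syntho implements it.

```python
import hashlib

# Hypothetical lookup lists; a real lab would use larger, locale-appropriate lists.
NAMES_BY_GENDER = {
    "F": ["Anna", "Maria", "Sofia"],
    "M": ["Lucas", "Noah", "Daan"],
}
FALLBACK_NAMES = ["Alex", "Sam", "Robin"]


def name_for(row_id: int, gender: str) -> str:
    """Pick a first name that matches the gender column, stable per row id."""
    candidates = NAMES_BY_GENDER.get(gender, FALLBACK_NAMES)
    index = int(hashlib.sha256(str(row_id).encode("utf-8")).hexdigest(), 16) % len(candidates)
    return candidates[index]


rows = [{"id": 1, "gender": "F"}, {"id": 2, "gender": "M"}, {"id": 3, "gender": "X"}]
for row in rows:
    row["first_name"] = name_for(row["id"], row["gender"])
print(rows)
```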

If you want trainees to practice a classic “absolute calculation” (a value derived from another column with a fixed offset), add this exercise:

```excel-formula
// New column: trial_end_date (derived field exercise)
DATEADD([signup_date], 14, "day")
```
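
Trainers can keep a short answer key for this exercise. The following Python sketch assumes the `signup_date` values are available as dates. It is a grading aid, not part of Syntho's formula engine. The leap-year case is a useful trap to include.

```python
from datetime import date, timedelta

TRIAL_DAYS = 14  # same offset as in the formula above


def expected_trial_end(signup: date) -> date:
    """Reference value for trial_end_date."""
    return signup + timedelta(days=TRIAL_DAYS)


def check_answer(signup: date, learner_value: date) -> bool:
    """True when the learner's derived value matches the reference."""
    return learner_value == expected_trial_end(signup)


print(expected_trial_end(date(2024, 2, 20)))  # 2024-03-05 (2024 is a leap year)
```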

**Minimal configuration steps**

1. Run a PII scan and review findings.
2. Apply mock/mask for the exercise scope.
3. Use calculated columns to inject lab tasks or derived fields.

* [Automatic PII discovery with PII scanner](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md)
* [Manage personally identifiable information (PII)](/configure-a-data-generation-job/manage-personally-identifiable-information-pii.md)

<details>

<summary>Progressive training scenarios (recommended)</summary>

Design training so learners build confidence first, then take on complexity.

**Scenario A (Basics):** PII scan + safe replacements

* Goal: identify PII and apply mock/mask correctly.
* Exercise: run a PII scan, fix one false positive and one false negative, then generate.

**Scenario B (Relational correctness):** keys + foreign keys

* Goal: keep joins working.
* Exercise: add one virtual FK, validate, and re-run generation.

**Scenario C (Edge cases):** inject rare cases for testing

* Goal: produce rows that trigger special logic.
* Exercise: add an `EDGE_FLAG` and override one column.

```excel-formula
// New column: EDGE_FLAG (≈2% of rows)
RAND() < 0.02
```

```excel-formula
// Example override: subscription_status
IF([EDGE_FLAG], "PAST_DUE", [subscription_status])
```

</details>
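
Scenario A needs an answer key so trainers can grade the false positive and false negative exercise. The following is a minimal sketch. It assumes you list the scan's flagged column names by hand. The column names are hypothetical, and this is not the PII scanner's output format.

```python
from typing import Dict, Set


def review_pii_scan(flagged: Set[str], actual_pii: Set[str]) -> Dict[str, Set[str]]:
    """Compare scanner findings with the trainer's answer key."""
    return {
        "true_positives": flagged & actual_pii,
        "false_positives": flagged - actual_pii,  # flagged, but not PII
        "false_negatives": actual_pii - flagged,  # PII the scan missed
    }


# Hypothetical columns for Scenario A.
scan_findings = {"email", "phone_number", "product_code"}
answer_key = {"email", "phone_number", "notes_free_text"}

result = review_pii_scan(scan_findings, answer_key)
print(result["false_positives"])  # {'product_code'}
print(result["false_negatives"])  # {'notes_free_text'}
```
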
{% endstep %}

{% step %}

#### Handle keys and relationships (relational schemas)

If the training dataset is **single-table**, you can skip this step.

Training exercises break quickly when relationships are missing.

Validate foreign keys early. Use [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md). Add [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md) if the schema is incomplete.
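
A quick orphan check shows whether a relationship actually holds in the data, before and after generation. The following is a minimal sketch using SQLite from Python. The table names are hypothetical. The `orders` table deliberately declares no foreign key, mimicking an incomplete schema where you would add a virtual foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; orders declares no FOREIGN KEY, like an incomplete schema.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1), (2);
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
""")

# Non-null references with no matching parent row: these break joins in a lab.
orphans = conn.execute("""
    SELECT o.id, o.customer_id
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    WHERE o.customer_id IS NOT NULL AND c.id IS NULL
""").fetchall()
print(orphans)  # [(12, 99)]
```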

* [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md)
* [Key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md)
  {% endstep %}

{% step %}

#### Validate and sync

Validate a small slice first.

Run the exercises end-to-end as a trainee would.

Re-run validation whenever the training schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

* [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md)
  {% endstep %}

{% step %}

#### Tune generation settings

Prioritize fast reset times.

Stable settings make labs reproducible.

Use [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md) once the exercises are stable.

* [View and adjust generation settings](/configure-a-data-generation-job/generation-and-validation/view-and-adjust-generation-settings.md)
  {% endstep %}
  {% endstepper %}

### Common pitfalls & misconfigurations

#### Use-case specific pitfalls

* Using production copies for training environments.
* Making datasets too big.
  * Training should reset in minutes.
* Changing generator configs right before a session.
  * Duplicate a working workspace instead. See [Duplicate a workspace](/setup-workspaces/duplicate-a-workspace.md).
* Using consistent mapping by default.
  * Decide based on training goals: stable storytelling vs strict unlinkability.
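
The trade-off in the last bullet can be shown in a few lines. The following sketch assumes a keyed hash stands in for consistent mapping; Syntho's implementation may differ. Reusing the key keeps stories stable across refreshes, and it also makes the same individual linkable across them.

```python
import hashlib
import hmac
import secrets


def keyed_map(value: str, key: bytes) -> str:
    """Stand-in mapping function: HMAC-SHA256, truncated for readability."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


# Consistent mapping across refreshes: the same (hypothetical) key is reused.
reused_key = b"hypothetical-shared-key"
refresh_1 = keyed_map("C-001", reused_key)
refresh_2 = keyed_map("C-001", reused_key)

# No consistent mapping: a fresh random key per refresh.
isolated_1 = keyed_map("C-001", secrets.token_bytes(32))
isolated_2 = keyed_map("C-001", secrets.token_bytes(32))

print("linkable across refreshes:", refresh_1 == refresh_2)
print("linkable without consistent mapping:", isolated_1 == isolated_2)
```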

<details>

<summary>General pitfalls</summary>

These pitfalls show up in most projects:

* Running full-scale jobs before a small validation run.
* Skipping workspace validation/sync after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).
* Breaking relational integrity (missing PK/FK setup, missing foreign keys, missing virtual foreign keys). Start with [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md) and [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md).
* Leaving sensitive columns on [**Duplicate**](/configure-a-data-generation-job/configure-column-settings/duplicate.md), or trusting the [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) without reviewing false positives/negatives.
* Overusing [**Consistent mapping**](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) (it slows down data generation and increases linkability).

</details>

<details>

<summary>Governance, compliance, and automation</summary>

#### Governance, access control, and audit evidence

Keep the workspace configuration as a controlled artifact. Treat it like a “test data release”.

#### Recommended roles

* **Workspace Owner**: data steward or privacy lead. Approves generator choices and sharing.
* **Workspace Editor**: data engineer or platform engineer. Implements configuration changes.
* **Workspace Reader**: testers, analysts, or trainees. Can run jobs but should not change rules.

See [Workspace & user management](/overview/get-started/syntho-bootcamp/8.-workspace-and-user-management.md) and [Share a workspace](/setup-workspaces/share-a-workspace.md).

#### Access control checklist

* Use **read-only** access to the **source** database for day-to-day users.
* Restrict **who can view source data** in the UI. Don’t default to broad access.
* Use a **dedicated destination** per environment (`dev`, `test`, `accept`, `sandbox`).
* Keep external recipients in a **separate workspace** with stricter settings.

#### Evidence for auditors (lightweight but useful)

Capture these items per delivery or refresh:

* Workspace name, owner, and intended audience.
* PII scan results and the final list of “PII columns + applied generator type”.
* Any enabled privacy controls (e.g., rare category protection, free-text de-identification scope).
* Validation output and/or QA report (when applicable).
* Approval notes (ticket link, privacy board sign-off, or risk acceptance).
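
If you collect this evidence in a script, one record per refresh can be a small structured object. The following is a minimal sketch. The schema, field names, and example values are hypothetical, and it does not use any Syntho export format. The validation report is optional because the list marks it "when applicable".

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class RefreshEvidence:
    """One audit record per delivery or refresh (hypothetical schema)."""
    workspace: str
    owner: str
    audience: str
    pii_columns: Dict[str, str]  # column name -> applied generator type
    privacy_controls: List[str] = field(default_factory=list)
    validation_report: Optional[str] = None  # link to validation/QA output, if any
    approval: str = ""  # ticket link, sign-off, or risk acceptance

    def missing_items(self) -> List[str]:
        """List required evidence fields that are still empty."""
        missing = []
        if not self.pii_columns:
            missing.append("pii_columns")
        if not self.approval:
            missing.append("approval")
        return missing


record = RefreshEvidence(
    workspace="training-pii-v2",
    owner="privacy-lead@example.com",
    audience="internal trainees",
    pii_columns={"email": "Mask", "first_name": "Mock"},
    privacy_controls=["rare category protection"],
    approval="TICKET-123",
)
print(json.dumps(asdict(record), indent=2))
print(record.missing_items())  # []
```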

#### Automation and deployment (reference)

You can automate workspace setup, scans, and generation runs via the [Syntho REST API](/syntho-api/syntho-rest-api.md).

</details>


