# Use Case 9: Data sharing & monetization

Use this use case when data must be made available beyond the original production boundary. Approvals by privacy officers or boards may be needed.

### What problem this use case solves

Teams need to make data available to other consumers. They need strong privacy protection with measurable quality.

Classic anonymization can leak secondary identifiers. It can also reduce utility through broad generalization.

### When to choose this use case

Pick this when data leaves your team or organization:

* You share data with partners, vendors, or customers.
* You need sign-off and a clear privacy posture.
* You need privacy and utility evidence per delivery.
* You need low linkability across releases and recipients.
* You can avoid stable pseudonyms, except where a contract requires them.

If you’re unsure, start with **Synthesize all**, avoid [Consistent mapping](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md), and run the [QA report](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation/qa-report.md) before each release.

### When to avoid this use case

Skip this when the data stays internal.

* You only need internal demo datasets. Use [Use Case 3: Demo Data](/overview/get-started/use-cases-and-configuration/use-case-3-demo-data.md).
* You only need internal training datasets. Use [Use Case 12: Training & Education](/overview/get-started/use-cases-and-configuration/use-case-12-training-and-education.md).
* You need stable, refreshable access for internal analysts. Use [Use Case 7: Analytics Sandboxes](/overview/get-started/use-cases-and-configuration/use-case-7-analytics-sandboxes.md).
* You need reversible mapping back to original records for internal debugging. Prefer de-identification with controlled access and tight auditing.

### Recommended Syntho configuration

This setup is optimized for **external sharing with a strong privacy posture**. You minimize linkability. You validate privacy and utility before distribution.

{% stepper %}
{% step %}

#### Prerequisites

**Checklist**

* [ ] Recipient, purpose, and retention period defined.
* [ ] Share contract defined (view or extract).
* [ ] Approval/evidence requirements defined (QA report, privacy board).

{% hint style="warning" %}
Avoid stable pseudonyms unless contractually required. They increase linkability across releases.
{% endhint %}

* Use the [Prerequisites](/overview/get-started/prerequisites.md) checklist.
  {% endstep %}

{% step %}

#### Source & destination management

Create one workspace per external consumer or per policy. This prevents accidental reuse of a “less strict” config.

#### Baseline rules

* Keep the **source stable**. Prefer snapshots or backups.
* Avoid a **live production** source for iterative work.
* Keep the **destination isolated**. Never write into production.
* Keep **schemas aligned** between source, workspace, and destination.
* Use **views** when you need only a subset of the original database.

#### Lifecycle rule of thumb

* Keep the source connection when you expect schema changes.
* Remove the source connection when you do not expect another run for a long time.
* Revalidate after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).

**Nuances for this use case**

* Lock down who can change generators. Treat generator changes like a release.
* Prefer “pull once, publish many”. Generate into staging, then distribute from the output.
* Don’t share the same database boundary as production. It blurs governance and access control.
* Don’t reuse a workspace for two recipients. One recipient’s requirements can silently weaken another’s posture.
* [Create a workspace](/setup-workspaces/create-a-workspace.md)
* [Share a workspace](/setup-workspaces/share-a-workspace.md)
  {% endstep %}

{% step %}

#### Configure generators

**Workspace initialization mode**

Choose a [workspace mode](/setup-workspaces/create-a-workspace/workspace-modes.md). It applies baseline generator suggestions during workspace creation.

Recommended modes for this use case:

* **Synthesize all** when you want maximum unlinkability and strong statistical utility (best default for external sharing).
* **Mock all** when you must avoid learning from the source altogether (highest separation, often lower utility).
* **De-identify** only when the sharing contract explicitly requires production-like multi-table behavior and you can accept higher linkage risk.

**AI-generated synthesis**

This is typically the best option for external sharing. Use it for **strong unlinkability** with good statistical utility.

**Example (partner dataset as one table):** create `partner_share_view` with only approved columns, then AI synthesize it into a single share table. Apply rare category protection before delivery.

**Rule-based generation**

Use this to enforce **release policy** rules such as bucketing, top-coding, or redaction of sensitive derived fields. Use [Calculated columns](/configure-a-data-generation-job/configure-column-settings/calculated-columns.md) to encode these policies.

**Example (k-anonymity-friendly bucketing):** replace `age` with calculated `age_band` and replace `income` with top-coded buckets. This reduces re-identification risk while keeping utility.

```excel-formula
// New column: age_band
IFS(
  [age] < 18,  "0-17",
  [age] < 35,  "18-34",
  [age] < 55,  "35-54",
  TRUE,        "55+"
)
```

```excel-formula
// New column: income_bucket (top-coded)
SWITCH(TRUE,
  [income] >= 200000, "200k+",
  [income] >= 100000, "100-199k",
  [income] >= 50000,  "50-99k",
  "0-49k"
)
```

**Masking**

Use this only when you must keep **format-valid values** for consumer systems. Be careful with stable pseudonyms across releases.

**Example (format contract without linkability):** mask `customer_reference` to a UUID format **without** consistent mapping so the same person cannot be linked across deliveries.
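
The difference between a stable pseudonym and an unlinkable mask can be illustrated with Python's standard `uuid` module. This is a conceptual sketch only and does not show how Syntho implements masking. The namespace value and function names are assumptions made for the illustration.

```python
import uuid

# Hypothetical fixed namespace, used only to make the stable variant deterministic.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def stable_pseudonym(customer_reference: str) -> str:
    """Deterministic: the same input yields the same UUID in every release."""
    return str(uuid.uuid5(NAMESPACE, customer_reference))


def unlinkable_mask(customer_reference: str) -> str:
    """Random: a fresh UUID on every call, unrelated to the input value."""
    return str(uuid.uuid4())


# Two deliveries containing the same customer.
stable_release_1 = stable_pseudonym("CUST-001")
stable_release_2 = stable_pseudonym("CUST-001")
random_release_1 = unlinkable_mask("CUST-001")
random_release_2 = unlinkable_mask("CUST-001")

print(stable_release_1 == stable_release_2)  # stable pseudonyms match across releases
print(random_release_1 == random_release_2)  # random masks do not
```

Both variants produce format-valid UUIDs, but only the random variant prevents a recipient from joining two deliveries on the masked column.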

**Hybrid**

Use this when you need synthesis for unlinkability, plus explicit policy constraints. It mirrors the “rare scenario / policy guardrails” mindset from [Example data generation scenarios](/overview/get-started/syntho-bootcamp/example-data-generation-scenarios.md).

**Example (synthesize + guardrails):**

1. AI synthesize a single “share contract” view (entity table).
2. Add calculated columns for bucketing and top-coding.
3. Use rare category protection and additional privacy controls before release.

One practical guardrail is coarsening timestamps to month level. This removes day-level detail while keeping most of the time-based utility:

```excel-formula
// New column: event_month (coarsen timestamps for sharing)
// Note: MONTH returns a number, so months are likely not zero-padded (for example "2024-3").
CONCATENATE(YEAR([event_date]), "-", MONTH([event_date]))
```

Then exclude the original `event_date` from the shared dataset.

**Minimal configuration steps**

1. Build a share-contract view with approved columns only.
2. Prefer **AI synthesize** on that view.
3. Add calculated columns for bucketing/top-coding.
4. Run the [QA report](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation/qa-report.md) and capture evidence for release.

* [Automatic PII discovery with PII scanner](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md)
* [Manage personally identifiable information (PII)](/configure-a-data-generation-job/manage-personally-identifiable-information-pii.md)

{% hint style="warning" %}
Avoid [Consistent mapping](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) unless the sharing contract explicitly requires stable pseudonyms.
{% endhint %}
{% endstep %}

{% step %}

#### Handle keys and relationships (relational schemas)

If you share a **single flattened table** (common), you can skip this step.

Only keep multi-table relationships if the consumer truly needs joins. Every join key increases linkage risk.

If you must keep multiple tables, keep join keys consistent and document them. Otherwise, prefer a flattened entity table for sharing.

* [Key generators](/configure-a-data-generation-job/configure-column-settings/key-generators.md)

**Single-table sharing is the common path**

Many external consumers want one table that is easy to import and understand.

Typical approach:

1. Create a **view** that represents the shareable dataset (entity table).
2. Use that view as the input to Syntho.
3. Publish the generated output as a single table or file export.

See [Use SQL views as input tables](/setup-workspaces/create-a-workspace/use-sql-views-as-input-tables.md).
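
Step 1 of the approach above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming a hypothetical `customers` table and column names. In practice you would create the view in your own source database with the columns your share contract approves.

```python
import sqlite3

# In-memory database with a hypothetical source table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, full_name TEXT, email TEXT, "
    "age INTEGER, region TEXT, product TEXT)"
)
conn.execute(
    "INSERT INTO customers VALUES "
    "(1, 'Ada Example', 'ada@example.com', 41, 'North', 'Basic')"
)

# The share-contract view exposes approved columns only.
# Direct identifiers (customer_id, full_name, email) are left out.
conn.execute(
    "CREATE VIEW partner_share_view AS "
    "SELECT age, region, product FROM customers"
)

cursor = conn.execute("SELECT * FROM partner_share_view")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

Defining the approved columns in the view keeps the share contract in one reviewable place, separate from the generator configuration.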
{% endstep %}

{% step %}

#### Validate and sync

Run the [QA report](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation/qa-report.md) when available. Treat the QA output as part of your release evidence.

Revalidate on every delivery. A small schema change can reintroduce sensitive fields.
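
One way to catch reintroduced fields is to compare the header of each export against an approved column list before it ships. The sketch below assumes a CSV export and a hypothetical allowlist. It complements Syntho's own validation and does not replace it.

```python
import csv
import io

# Hypothetical allowlist taken from the share contract.
APPROVED_COLUMNS = {"age_band", "income_bucket", "region", "product", "event_month"}


def unapproved_columns(csv_text: str) -> list[str]:
    """Return header columns that are not on the approved list, sorted by name."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return sorted(set(header) - APPROVED_COLUMNS)


# Sample export in which a schema change brought back the raw event_date column.
sample_export = (
    "age_band,region,product,event_month,event_date\n"
    "35-54,North,Basic,2024-3,2024-03-14\n"
)

leaked = unapproved_columns(sample_export)
if leaked:
    print(f"Do not ship: unapproved columns found: {leaked}")
```

Running a check like this as a release gate turns the revalidation step into something that fails loudly instead of relying on manual review.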

**Privacy metrics and approvals (what to document)**

Treat sharing like a controlled release. Privacy reviewers typically expect:

* The intended recipient, purpose, and retention period.
* A list of removed/transformed identifiers and any stable pseudonyms used.
* Utility and privacy evidence (QA report where applicable, plus privacy controls used).
* A clear threshold statement (example): “k-anonymity ≥ 10 on {age\_band, region, product} after rare category protection”.

If your organization has a privacy board, capture sign-off before distributing the dataset.
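
A threshold statement such as the k-anonymity example above can be checked on the generated output before sign-off. The following is a minimal Python sketch, assuming rows are available as dictionaries. The function names and sample data are hypothetical, and the check is independent of Syntho's QA report.

```python
from collections import Counter


def min_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(row[column] for column in quasi_identifiers) for row in rows
    )
    # An empty dataset has no groups; report 0 so any k >= 1 fails.
    return min(groups.values()) if groups else 0


def satisfies_k_anonymity(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination occurs at least k times."""
    return min_group_size(rows, quasi_identifiers) >= k


# Hypothetical sample: three rows share one combination, one row is unique.
sample_rows = [
    {"age_band": "35-54", "region": "North", "product": "Basic"},
    {"age_band": "35-54", "region": "North", "product": "Basic"},
    {"age_band": "35-54", "region": "North", "product": "Basic"},
    {"age_band": "55+", "region": "South", "product": "Premium"},
]
quasi_identifiers = ["age_band", "region", "product"]

smallest = min_group_size(sample_rows, quasi_identifiers)
passes = satisfies_k_anonymity(sample_rows, quasi_identifiers, k=2)
print(smallest, passes)
```

Here the unique `55+ / South / Premium` row makes the smallest group size 1, so the dataset fails a threshold of k = 2. Recording the measured minimum alongside the threshold gives reviewers a concrete number to sign off on.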

**Package and distribute (practical)**

Make the dataset easier (and safer) to consume:

* Include a short data dictionary (column meaning, units, allowed values).
* Include metadata: refresh date, version, and known limitations.
* Use explicit licensing/usage terms when data leaves your organization.
  {% endstep %}

{% step %}

#### Tune generation settings

Apply [Additional privacy controls](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation/privacy-controls.md) before shipping. These controls matter more here than runtime.

Tune performance only after the privacy posture is accepted. Speed is not the primary constraint for sharing.
{% endstep %}
{% endstepper %}

### Common pitfalls & misconfigurations

#### Use-case specific pitfalls

* Using consistent mapping for identifiers in external shares.
* Shipping outputs without capturing QA/validation evidence.
* Accidentally leaking indirect identifiers through rare categories.
  * Review rare category protection. See [AI synthesize](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md).
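
To spot rare categories before a release, you can count how often each value occurs in a column. This is a minimal Python sketch with a hypothetical threshold and sample data. It is a manual pre-release check and does not show how Syntho's rare category protection works internally.

```python
from collections import Counter


def rare_categories(values, min_count):
    """Return each category that occurs fewer than min_count times, with its count."""
    counts = Counter(values)
    return {value: count for value, count in counts.items() if count < min_count}


# Hypothetical occupation column from a generated dataset.
occupations = ["Nurse"] * 5 + ["Teacher"] * 4 + ["Astronaut"]

flagged = rare_categories(occupations, min_count=3)
print(flagged)
```

A value that appears only once, such as `Astronaut` here, can act as an indirect identifier even when all direct identifiers are removed.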

<details>

<summary>General pitfalls</summary>

These pitfalls show up in most projects:

* Running full-scale jobs before a small validation run.
* Skipping workspace validation/sync after schema changes. Use [Validate and synchronize workspace](/configure-a-data-generation-job/generation-and-validation/validate-and-synchronize-workspace.md).
* Breaking relational integrity (missing PK/FK setup, missing foreign keys, missing virtual foreign keys). Start with [Manage foreign keys](/configure-a-data-generation-job/manage-foreign-keys.md) and [virtual foreign keys](/configure-a-data-generation-job/manage-foreign-keys/add-virtual-foreign-keys/add-virtual-foreign-keys.md).
* Leaving sensitive columns on [**Duplicate**](/configure-a-data-generation-job/configure-column-settings/duplicate.md), or trusting the [PII scan](/configure-a-data-generation-job/privacy-dashboard/automatic-pii-discovery-with-pii-scanner.md) without reviewing false positives/negatives.
* Overusing [**Consistent mapping**](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md) (it slows down data generation and increases linkability).

</details>

<details>

<summary>Governance, compliance, and automation</summary>

#### Governance, access control, and audit evidence

Keep the workspace configuration as a controlled artifact. Treat each change to it like a “test data release”.

#### Recommended roles

* **Workspace Owner**: data steward or privacy lead. Approves generator choices and sharing.
* **Workspace Editor**: data engineer or platform engineer. Implements configuration changes.
* **Workspace Reader**: testers, analysts, or trainees. Can run jobs but should not change rules.

See [Workspace & user management](/overview/get-started/syntho-bootcamp/8.-workspace-and-user-management.md) and [Share a workspace](/setup-workspaces/share-a-workspace.md).

#### Access control checklist

* Use **read-only** access to the **source** database for day-to-day users.
* Restrict **who can view source data** in the UI. Don’t default to broad access.
* Use a **dedicated destination** per environment (`dev`, `test`, `accept`, `sandbox`).
* Keep external recipients in a **separate workspace** with stricter settings.

#### Evidence for auditors (lightweight but useful)

Capture these items per delivery or refresh:

* Workspace name, owner, and intended audience.
* PII scan results and the final list of “PII columns + applied generator type”.
* Any enabled privacy controls (e.g., rare category protection, free-text de-identification scope).
* Validation output and/or QA report (when applicable).
* Approval notes (ticket link, privacy board sign-off, or risk acceptance).
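
The items above can be captured as a small structured record that travels with each delivery. The sketch below is one possible shape, using Python's standard `dataclasses` and `json` modules. All field names and values are hypothetical placeholders and do not represent a Syntho export format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReleaseEvidence:
    workspace_name: str
    owner: str
    intended_audience: str
    pii_columns: dict          # column name -> applied generator type
    privacy_controls: list     # enabled privacy controls
    qa_report_reference: str   # where the QA report or validation output is stored
    approval_reference: str    # ticket link or sign-off note


# Placeholder values for illustration only.
record = ReleaseEvidence(
    workspace_name="partner-share-workspace",
    owner="privacy-lead@example.com",
    intended_audience="External partner (single recipient)",
    pii_columns={"customer_reference": "Mask (no consistent mapping)"},
    privacy_controls=["rare category protection"],
    qa_report_reference="qa-report-placeholder",
    approval_reference="approval-ticket-placeholder",
)

evidence_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(evidence_json)
```

Storing the record as JSON next to the delivered dataset makes it easy to diff evidence between releases.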

#### Automation and deployment (reference)

You can automate workspace setup, scans, and generation runs via the [Syntho REST API](/syntho-api/syntho-rest-api.md).

</details>


