# AI synthesize

[AI synthesize](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md) generates realistic synthetic data using machine learning models trained on your original dataset. The method preserves the statistical properties of the original data while protecting privacy, because synthetic records cannot be linked back to source records.

#### When to use

* To create synthetic datasets for machine learning or analytics
* When high statistical accuracy and maximum privacy are required
* To expand datasets while preserving original distributions

#### When not to use

* When working with multiple related tables
* When data consistency across systems is required
* When you need to be able to revert to original records
* If entirely new, unseen text values must be generated
* If the data needs to follow specific rules with 100% certainty

The Syntho platform supports a wide variety of data types. Under the hood, Syntho uses an encoding scheme where each data type is mapped to one of the following encoding types.

| Encoding type                                                                                                  | Description                                  |
| -------------------------------------------------------------------------------------------------------------- | -------------------------------------------- |
| [Discrete](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#discrete)       | Numerical counts (e.g. number of visits)     |
| [Continuous](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#continuous)   | Continuous values (e.g. weight, temperature) |
| [Categorical](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#categorical) | Predefined values (e.g. blood type, country) |
| [Datetime](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#datetime)       | Timestamps and dates (e.g. created at)       |

#### Interactive guide: How to apply AI synthesize

Follow the interactive guide below to apply AI synthesize.

{% embed url="<https://www.guidejar.com/guides/4ebdd966-24fe-4c67-b1a4-2ec1690d8c41>" %}

#### [Rare category protection](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#rare-category-protection)

To protect privacy, Syntho can automatically replace infrequent values in categorical columns:

* Threshold: minimum frequency before a value is considered rare (default = 10)
* Replacement: value used to replace rare categories (default = `*`)
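The effect of these two settings can be illustrated with a minimal Python sketch. This is not Syntho's implementation: the function name `protect_rare_categories` is hypothetical, and the sketch assumes that a value occurring exactly `threshold` times is kept (the exact boundary behavior is not stated on this page).

```python
from collections import Counter


def protect_rare_categories(values, threshold=10, replacement="*"):
    """Replace values that occur fewer than `threshold` times.

    Hypothetical helper for illustration only. Assumption: a value that
    occurs exactly `threshold` times is NOT treated as rare.
    """
    counts = Counter(values)
    return [v if counts[v] >= threshold else replacement for v in values]


# Example column: "AB" occurs only 3 times, below the default threshold of 10.
blood_types = ["A"] * 12 + ["O"] * 10 + ["AB"] * 3
protected = protect_rare_categories(blood_types)

print(Counter(protected))
```

In this example the three `AB` entries are replaced by `*`, while `A` and `O` are left unchanged because they meet the threshold.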

#### [Advanced settings](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#advanced-settings)

[Generator-level](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#advanced-generator-settings)

* Max rows used for training: limit data for faster performance
* Take random sample: randomly sample rows for training
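As a rough illustration of how these two generator-level settings interact, the following Python sketch selects the rows used for training. The function `select_training_rows` and its parameters are hypothetical, and the sketch assumes that the first rows are used when random sampling is turned off; the platform's actual behavior may differ.

```python
import random


def select_training_rows(rows, max_rows=None, take_random_sample=False, seed=None):
    """Pick the subset of rows a model would be trained on.

    Hypothetical helper for illustration only. Assumption: when
    `take_random_sample` is False, the first `max_rows` rows are used.
    """
    if max_rows is None or max_rows >= len(rows):
        return list(rows)  # no limit applies: train on everything
    if take_random_sample:
        # Sample without replacement; `seed` makes the sketch reproducible.
        return random.Random(seed).sample(rows, max_rows)
    return list(rows[:max_rows])


rows = list(range(1000))  # stand-in for table rows
first_rows = select_training_rows(rows, max_rows=100)
random_rows = select_training_rows(rows, max_rows=100, take_random_sample=True, seed=42)
print(len(first_rows), len(random_rows))
```

Limiting the row count shortens training time. If the table is sorted, for example by date, a random sample is generally more representative than the first rows.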

[Column-level](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#advanced-column-settings)

* Clipping thresholds: restrict extreme values in numeric/date columns
* Locale: set language model context for text/PII
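The idea behind clipping thresholds can be sketched in a few lines of Python. This is an assumption-laden illustration, not the platform's implementation: `clip_values` is a hypothetical helper, and the explicit `lower`/`upper` bounds stand in for however the thresholds are actually configured.

```python
def clip_values(values, lower=None, upper=None):
    """Restrict each value to the range [lower, upper].

    Hypothetical helper for illustration only. Works for any comparable
    type, so the same logic applies to numbers and to dates.
    """
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        clipped.append(v)
    return clipped


weights_kg = [3.5, 72.0, 68.4, 410.0]
print(clip_values(weights_kg, lower=30.0, upper=200.0))  # [30.0, 72.0, 68.4, 200.0]
```

Clipping keeps extreme outliers, which can be identifying on their own, from being reproduced in the synthetic data.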

