# 10. Data pre-processing when using AI synthesis

## Preparing your data

Proper data preparation is essential for high-quality synthetic output from AI synthesis. The guidelines below describe how to prepare your dataset before you start a generation job.

***

### Preparing your data – entity table

When working with standalone or flat tables, consider the following practices:

1. [**Maintain a column-to-row ratio of at least 1:500**](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#entity-tables)\
   This reduces privacy risk and helps the model generalize. For example, a table with 6 columns should have at least 3,000 rows.
2. [**Each entity should be described in one row**](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#entity-tables)\
   One row per unique entity avoids data fragmentation.
3. [**Ensure each row is independent**](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#entity-tables)\
   The order of rows should not affect the dataset. Each row must be self-contained and analyzable on its own.
4. [**Avoid privacy-sensitive column names**](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#entity-tables)\
   For instance, avoid a column name such as `patient_a_medications`, which embeds a specific individual in the name. Instead, store the identifying value in a generic column such as `patient`.
5. [**Remove derived or redundant columns**](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#entity-tables)\
   If one column is a direct function of another (e.g., `duration = end_time - start_time`), remove the derived column. This also includes categorical redundancies, such as having both `treatment` and `treatment_category`.
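
Some of these guidelines can be checked before you upload a table. The following is a minimal sketch in plain Python, not part of the Syntho product: the function names, the `patient_id` key column, and the sample rows are all hypothetical and only illustrate guidelines 1, 2, and 5.

```python
from collections import Counter


def check_entity_table(rows, entity_key, min_rows_per_column=500):
    """Return a list of warnings for a flat table given as a list of dicts.

    `entity_key` is a hypothetical column that identifies each entity.
    """
    if not rows:
        return ["table is empty"]

    warnings = []
    columns = list(rows[0].keys())

    # Guideline 1: a 1:500 column-to-row ratio means 500 rows per column.
    required = len(columns) * min_rows_per_column
    if len(rows) < required:
        warnings.append(
            f"{len(rows)} rows for {len(columns)} columns; "
            f"at least {required} recommended"
        )

    # Guideline 2: one row per entity, so the entity key must not repeat.
    counts = Counter(row[entity_key] for row in rows)
    repeated = sorted(key for key, n in counts.items() if n > 1)
    if repeated:
        warnings.append(f"entities on multiple rows: {repeated}")

    return warnings


def is_derived_difference(rows, derived, end, start):
    """Guideline 5: True if `derived` always equals `end - start`."""
    return all(row[derived] == row[end] - row[start] for row in rows)


# Hypothetical sample data: too few rows, a repeated entity, a derived column.
rows = [
    {"patient_id": 1, "start_time": 0, "end_time": 5, "duration": 5},
    {"patient_id": 1, "start_time": 2, "end_time": 9, "duration": 7},
]
for warning in check_entity_table(rows, "patient_id"):
    print(warning)
if is_derived_difference(rows, "duration", "end_time", "start_time"):
    print("duration is derived from end_time and start_time; consider removing it")
```

Row independence and privacy-sensitive column names (guidelines 3 and 4) depend on what the data means, so they are left to manual review here.
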

***

Following these guidelines helps the AI model learn meaningful patterns, avoid overfitting on redundant information, and respect privacy constraints, which leads to more reliable synthetic data.

