# Duplicate

Duplicate can be especially useful in the following situations:

1. When data does not contain personally identifiable information (PII) or sensitive elements, duplicating it allows for efficient replication without modification.

## Apply duplicate

1. Open your **Workspace**.
2. From the **Main hub** or **Table view** tab, select the column where you want to apply a generator.
3. Under **Generator**, select **Duplicate** to copy the column from the source table to the destination table *as-is*.
4. Set the relevant duplicate parameters.
5. Select **Confirm**.

<figure><img src="/files/JvTBKml1mLOpZwSu0BW3" alt="" width="563"><figcaption><p>Selecting Duplicate in Generation Method panel</p></figcaption></figure>

{% hint style="info" %}
**Note:** When you duplicate a column, the column is still used during the training process, as it can contain valuable information.

This means, however, that duplicating columns *cannot* be used to reduce hardware requirements or increase the speed of your synthetic data jobs.
{% endhint %}

## Shuffle data

Enable **Shuffle** to shuffle the generated values while preserving the overall frequency of each value. For example, if the source database contains 4 High, 3 Medium and 5 Low values, the destination database contains the same counts of each value, but in a shuffled order.

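Shuffle works batch-wise: each generated batch, sized according to the configured **Batch size**, is shuffled independently. Values are therefore reordered only within their own batch and are never moved between batches.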

Note that `NULL` is treated as a distinct value and is shuffled like any other value.
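The batch-wise shuffle described above can be sketched in Python as follows. This is a minimal illustration, not Syntho's implementation: it assumes a column is a plain list of values, that `None` stands in for `NULL`, and that batches are consecutive slices of `batch_size` rows. The function name `shuffle_batchwise` is hypothetical. Because each batch is permuted on its own, value counts are preserved both within every batch and across the whole column.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def shuffle_batchwise(
    values: Sequence[Optional[str]],
    batch_size: int,
    seed: Optional[int] = None,
) -> list[Optional[str]]:
    """Hypothetical sketch: shuffle values within consecutive batches.

    Values never cross batch boundaries, so every batch keeps exactly the
    same value counts. None (standing in for NULL) is shuffled like any
    other value.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    result: list[Optional[str]] = []
    for start in range(0, len(values), batch_size):
        batch = list(values[start:start + batch_size])
        rng.shuffle(batch)  # permute this batch only, in place
        result.extend(batch)
    return result


# Example from the text (4 High, 3 Medium, 5 Low) plus two NULLs.
source = ["High"] * 4 + ["Medium"] * 3 + ["Low"] * 5 + [None] * 2
shuffled = shuffle_batchwise(source, batch_size=5, seed=42)

# Overall value frequencies are unchanged; only the order differs.
print(Counter(source) == Counter(shuffled))
```

A consequence of shuffling per batch is that a small **Batch size** limits how far a value can move from its original position: with a batch size of 5, a value can only land somewhere within its own group of 5 rows.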

## Detect and obfuscate PII

{% hint style="warning" %}
**Caution:** Because it uses the same underlying modelling techniques as the [PII text obfuscation module](/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns.md), the *Detect and obfuscate PII* feature can take a long time to run.
{% endhint %}

Enable the **Detect and obfuscate PII** toggle to use Syntho's [PII text obfuscation module](/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns.md) to detect and obfuscate PII entities in columns that contain free text.

When this option is enabled, select the **Locale** that matches the data in your text column, so that Syntho uses the appropriate language models to identify and obfuscate PII in that column.

After you enable this option and set the correct locale, any identified PII entities are obfuscated before the column is copied to the destination table.

<figure><img src="/files/5RPjwsueyebe85qLEYGM" alt="" width="538"><figcaption><p>Detect and obfuscate PII</p></figcaption></figure>

## Rare category protection

Syntho automatically replaces infrequent categorical values in a column with a user-defined value, so that rare (and therefore potentially identifying) values do not appear in the synthetic output.

* **Rare category protection threshold**: Column values that appear with a frequency at or below this threshold are automatically replaced to prevent data leakage.
* **Rare category replacement value**: Values whose frequency is at or below the threshold are substituted with this user-specified replacement value.

<figure><img src="/files/j5Jk1PoZOEXA911NyqFj" alt="" width="530"><figcaption><p>Rare category protection</p></figcaption></figure>

By default, the rare category protection threshold is set to 10, meaning any value that appears 10 times or fewer will be replaced. The default replacement value is an asterisk (\*), so all values at or below the threshold are replaced with (\*).
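The replacement rule can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Syntho's implementation: it assumes "frequency" means the absolute number of occurrences of a value across the whole column (matching "appears 10 times or fewer" above), and it uses the documented defaults of 10 and `*`. The function name `protect_rare_categories` and the sample country codes are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def protect_rare_categories(
    values: Sequence[Hashable],
    threshold: int = 10,
    replacement: Hashable = "*",
) -> list[Hashable]:
    """Hypothetical sketch: replace values occurring at most `threshold` times.

    Defaults mirror the documented defaults (threshold 10, replacement "*").
    """
    counts = Counter(values)  # occurrences of each distinct value in the column
    return [replacement if counts[v] <= threshold else v for v in values]


# Illustrative column: NL appears 25 times, BE exactly 10 times, LU twice.
column = ["NL"] * 25 + ["BE"] * 10 + ["LU"] * 2
protected = protect_rare_categories(column)

# NL is kept; BE (at the threshold) and LU (below it) are both replaced with "*".
print(Counter(protected))
```

Note that a value occurring exactly as often as the threshold (BE in this example) is also replaced, because the comparison is "at or below".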

## Ordering and indexing considerations

To ensure accurate ordering, please see [ordering and indexing considerations](/configure-a-data-generation-job/configure-column-settings/consistent-mapping.md#ordering-and-indexing-considerations).

## Supported data types

| Generator | Supported data types                                                                          |
| --------- | --------------------------------------------------------------------------------------------- |
| Duplicate | Categorical, Continuous, Discrete, Datetime, Bytes, Bool, UUID, JSON, XML, Geo, Sets, Unknown |


