# Workspace default settings

The **Workspace default settings** menu lets you configure default parameters for a workspace. These settings help keep data processing, privacy protection, and synthetic data generation consistent. The available options are explained below.

{% hint style="info" %}
These defaults may vary by deployment/version; your admin may expose a subset.
{% endhint %}

## Access the workspace default settings

You must be an **Owner** or **Editor** to access the **Workspace default settings** menu.

1. Go to **Edit workspace** via the **Saved workspaces** panel or from the **Configuration panel**.
2. Select **Default settings**.

## How to modify settings

1. Access the **Workspace default settings** menu.
2. Modify the required values directly.
3. **Confirm** changes to apply them to the workspace.

## Configuration options

The tables below list each default setting with its default value, the values it accepts, and what it does.

{% hint style="info" %}
`pii_model_settings` controls models used for PII scanning/mocking. `default_text_processor_model_settings` uses the same structure (engine/models/gpu), but applies to text processing defaults.
{% endhint %}

### Consistency (seed) settings

For more information, see [Configure to use other NLP models (limited support)](/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns.md#configure-to-use-other-nlp-models-limited-support).

### AI synthesis (performance)

| **Parameter**                    | **Default**  | **Possible Values**                   | **Description**                                                                                                                                    |
| -------------------------------- | ------------ | ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `n_parallel_pipeline_processes`  | 1            | Any integer ≥ 1                       | Number of column-level processing tasks that run in parallel. Higher values can speed up runs, but increase CPU/memory usage.                      |
| `default_n_training_rows`        | 100000       | Any integer ≥ 1                       | Default [number of rows](/configure-a-data-generation-job/configure-table-settings.md#adjust-the-number-of-rows-to-generate) used to train models. |
| `default_read_random_subset`     | `false`      | `true` \| `false` (Boolean)           | If `true`, components that support it may read a random subset of rows by default.                                                                 |
| `default_max_n_feat_per_model`   | -1           | Any integer (use `-1` for “no limit”) | Default cap on number of features per AI synthesis model.                                                                                          |
| `default_feat_model_train_order` | `"as_given"` | Currently supported: `"as_given"`     | Default training order for AI synthesis models.                                                                                                    |

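The constraints in the table above can be checked before values are entered in the menu. The following is a minimal sketch in Python, not part of the product: the function name `validate_performance_settings` and the idea of holding the settings in a plain dictionary are assumptions made for illustration. Only the parameter names, defaults, and allowed values come from the table.

```python
# Defaults copied from the "AI synthesis (performance)" table.
PERFORMANCE_DEFAULTS = {
    "n_parallel_pipeline_processes": 1,
    "default_n_training_rows": 100000,
    "default_read_random_subset": False,
    "default_max_n_feat_per_model": -1,
    "default_feat_model_train_order": "as_given",
}


def validate_performance_settings(settings):
    """Hypothetical helper: return a list of problems (empty if all checks pass)."""
    problems = []

    # Both parameters are documented as "any integer >= 1".
    for key in ("n_parallel_pipeline_processes", "default_n_training_rows"):
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be an integer >= 1, got {value!r}")

    if not isinstance(settings["default_read_random_subset"], bool):
        problems.append("default_read_random_subset must be true or false")

    # Any integer is accepted; -1 means "no limit".
    cap = settings["default_max_n_feat_per_model"]
    if isinstance(cap, bool) or not isinstance(cap, int):
        problems.append("default_max_n_feat_per_model must be an integer (-1 means no limit)")

    # The table lists "as_given" as the only currently supported value.
    if settings["default_feat_model_train_order"] != "as_given":
        problems.append('default_feat_model_train_order currently supports only "as_given"')

    return problems


# Example: an invalid override of one documented default.
overrides = {**PERFORMANCE_DEFAULTS, "n_parallel_pipeline_processes": 0}
print(validate_performance_settings(PERFORMANCE_DEFAULTS))
print(validate_performance_settings(overrides))
```

The first call checks the documented defaults, and the second shows how an out-of-range value for `n_parallel_pipeline_processes` is reported.
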
### AI synthesis (privacy)

| **Parameter**                       | **Default** | **Possible Values**         | **Description**                                                                                                                                                                                                                                                                                                                                          |
| ----------------------------------- | ----------- | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_sample_noise_ratio`        | 0.0001      | Any number                  | <p>Default noise ratio used by components that support noise injection.</p><p></p><p>A decimal value lower than 1 makes the noise factor relative; for example, 0.001 gives better privacy than 0.0001. An integer value above 1 makes the noise factor absolute; higher values add more noise and therefore more privacy.</p> |
| `default_noise_factor`              | 0.0001      | Any number                  | Default factor used by components that scale noise based on data.                                                                                                                                                                                                                                                                                        |
| `default_min_sample_size`           | 5           | Any integer ≥ 1             | Default minimum sample size used by components during training.                                                                                                                                                                                                                                                                                          |
| `default_cardinality_threshold`     | 10          | Any integer ≥ 1             | Categories with occurrences below this threshold are treated as rare (see [rare category protection](/configure-a-data-generation-job/configure-column-settings/ai-powered-generation.md#rare-category-protection)).                                                                                                                                     |
| `default_rare_category_replacement` | `"*"`       | Any string                  | Replacement value for rare categories.                                                                                                                                                                                                                                                                                                                   |
| `default_clip_threshold`            | 0           | Any number                  | Default clip threshold used by components that support clipping/extreme-value limiting.                                                                                                                                                                                                                                                                  |
| `default_pii_mock_replace`          | `false`     | `true` \| `false` (Boolean) | Default for replacing detected PII with mock data where applicable. See [Detect and obfuscate PII](/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns.md).                                                                                                          |

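The relative/absolute distinction described for the noise settings can be sketched as a small classifier. This is an illustrative Python sketch, not the product's implementation: the function name `noise_interpretation` is hypothetical, and values that the description does not cover (such as `0`, exactly `1`, or non-integers above 1) are reported as `"unspecified"` rather than guessed.

```python
def noise_interpretation(value):
    """Hypothetical helper: classify a noise value per the table's description."""
    # Decimal values lower than 1 are described as relative noise.
    if 0 < value < 1:
        return "relative"
    # Integer values above 1 are described as absolute noise.
    if value > 1 and float(value).is_integer():
        return "absolute"
    # Anything else is not covered by the description in the table.
    return "unspecified"


for candidate in (0.0001, 0.001, 5, 1, 2.5):
    print(candidate, noise_interpretation(candidate))
```

In both modes, the table states that larger values add more noise and therefore more privacy.
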
### Text processing

| **Parameter**                           | **Default**                                                                                                                                                                                                                                                 | **Possible Values**                                                             | **Description**                                                                                                                                                                                                                                                                                           |
| --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_text_processor_model_settings` | `{"models":[{"lang_code":"en","model_name":"en_core_web_md"},{"lang_code":"nl","model_name":"nl_core_news_md"},{"lang_code":"de","model_name":"de_core_news_md"},{"lang_code":"ja","model_name":"ja_core_news_md"}],"nlp_engine_name":"spacy","gpu":false}` | Object with keys: `models` (array), `nlp_engine_name` (string), `gpu` (boolean) | Models/engine used for text processing defaults. Used by features like [Free text de-identification](/overview/get-started/syntho-bootcamp/5.-generators/free-text-de-identification.md).                                                                                                                 |
| `default_textpii_parallel_jobs`         | 2                                                                                                                                                                                                                                                           | Any integer ≥ 1                                                                 | Parallel jobs used by PII text processing (see [Detect and obfuscate PII](/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns.md)). Higher values can be faster, but use more CPU/memory and can increase contention. |
| `default_textpii_scan_batch_size`       | 1000                                                                                                                                                                                                                                                        | Any integer ≥ 1                                                                 | Batch size used by PII text processing (see [Detect and obfuscate PII](/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns.md)). Larger batches can be faster, but use more memory.                                   |
| `default_cutoff_length`                 | 1000                                                                                                                                                                                                                                                        | Any integer ≥ 1                                                                 | Default cutoff length used by components that truncate long text/sequences during processing. See [Free text de-identification](/overview/get-started/syntho-bootcamp/5.-generators/free-text-de-identification.md).                                                                                      |

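The default value of `default_text_processor_model_settings` is easier to read when expanded. The Python sketch below parses the default object exactly as listed in the table and prints the model configured for each language. The helper name `models_by_language` is an assumption made for illustration; only the JSON content comes from the table.

```python
import json

# Default value copied from the "Text processing" table, reformatted for readability.
DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS = json.loads("""
{
  "models": [
    {"lang_code": "en", "model_name": "en_core_web_md"},
    {"lang_code": "nl", "model_name": "nl_core_news_md"},
    {"lang_code": "de", "model_name": "de_core_news_md"},
    {"lang_code": "ja", "model_name": "ja_core_news_md"}
  ],
  "nlp_engine_name": "spacy",
  "gpu": false
}
""")


def models_by_language(settings):
    """Hypothetical helper: map each language code to its configured model name."""
    return {entry["lang_code"]: entry["model_name"] for entry in settings["models"]}


# The object has the three documented keys: models (array), nlp_engine_name (string), gpu (boolean).
print(sorted(DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS))
print(DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS["nlp_engine_name"])
for lang_code, model_name in models_by_language(DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS).items():
    print(lang_code, model_name)
```

Because `pii_model_settings` is described as using the same structure, the same helper should also work on that object, assuming it carries the same keys.
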
### Throughput optimization

| **Parameter**               | **Default** | **Possible Values**         | **Description**                                                                                                                                                             |
| --------------------------- | ----------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_fast_executemany`  | `false`     | `true` \| `false` (Boolean) | Enables [fast execution](/setup-workspaces/create-a-workspace/connect-to-a-database/microsoft-sql-server.md#fast-execute-many) for bulk inserts.                            |
| `default_max_pending_tasks` | 1           | Any integer ≥ 1             | Max number of tables/tasks queued for concurrent processing. Higher values can improve throughput, but increase memory usage and can increase database connection pressure. |

### Other

| **Parameter**                                  | **Default** | **Possible Values**         | **Description**                                                                                                                                  |
| ---------------------------------------------- | ----------- | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `default_consistent_integer_shuffle_threshold` | 0           | Any integer ≥ 0             | Threshold value used by consistent integer shuffle logic. `0` disables the threshold behavior.                                                   |
| `default_order_by_nr_columns`                  | `[0, 0]`    | Array of two integers       | Default order-by configuration used by [order-by](/configure-a-data-generation-job/configure-table-settings.md#order-by) logic where applicable. |
| `default_exclude_tables`                       | `false`     | `true` \| `false` (Boolean) | If `true`, tables start excluded by default. Related: [Workspace modes](/setup-workspaces/create-a-workspace/workspace-modes.md).                |

