# Workspace default settings

The **Workspace default settings** menu lets you configure default parameters for workspaces. These settings ensure consistent behavior for data processing, privacy protection, and synthetic data generation. The sections below describe the available options.

{% hint style="info" %}
These defaults may vary by deployment/version; your admin may expose a subset.
{% endhint %}

## Access the workspace default settings

You must be an **Owner** or **Editor** of the workspace to access **Workspace default settings**.

1. Go to **Edit workspace** via the **Saved workspaces** panel or from the **Configuration panel**.
2. Select **Default settings**.

## How to modify settings

1. Access the **Workspace default settings** menu.
2. Modify the required values directly.
3. **Confirm** changes to apply them to the workspace.

## Configuration options

The tables below list each default setting, its default value, its possible values, and what it controls.

{% hint style="info" %}
`pii_model_settings` controls models used for PII scanning/mocking. `default_text_processor_model_settings` uses the same structure (engine/models/gpu), but applies to text processing defaults.
{% endhint %}

### Consistency (seed) settings

For more information, see [Configure to use other NLP models (limited support)](https://docs.syntho.ai/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns#configure-to-use-other-nlp-models-limited-support).

### AI synthesis (performance)

| **Parameter**                    | **Default**  | **Possible Values**                   | **Description**                                                                                                                                                       |
| -------------------------------- | ------------ | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `n_parallel_pipeline_processes`  | 1            | Any integer ≥ 1                       | Number of column-level processing tasks that run in parallel. Higher values can speed up runs, but increase CPU/memory usage.                                         |
| `default_n_training_rows`        | 100000       | Any integer ≥ 1                       | Default [number of rows](https://docs.syntho.ai/configure-a-data-generation-job/configure-table-settings#adjust-the-number-of-rows-to-generate) used to train models. |
| `default_read_random_subset`     | `false`      | `true` \| `false` (Boolean)           | If `true`, components that support it may read a random subset of rows by default.                                                                                    |
| `default_max_n_feat_per_model`   | -1           | Any integer (use `-1` for “no limit”) | Default cap on number of features per AI synthesis model.                                                                                                             |
| `default_feat_model_train_order` | `"as_given"` | Currently supported: `"as_given"`     | Default training order for AI synthesis models.                                                                                                                       |

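The constraints in the **Possible Values** column can be checked before values are entered in the menu. The following is a minimal sketch, assuming the settings are held in a plain Python dictionary; `validate_performance_settings` and `PERFORMANCE_DEFAULTS` are hypothetical names used for illustration and are not part of the Syntho product.

```python
# Hypothetical helper: the rules below mirror the "Possible Values" column
# of the table above. This is not a Syntho API.
PERFORMANCE_DEFAULTS = {
    "n_parallel_pipeline_processes": 1,
    "default_n_training_rows": 100000,
    "default_read_random_subset": False,
    "default_max_n_feat_per_model": -1,
    "default_feat_model_train_order": "as_given",
}


def validate_performance_settings(settings):
    """Return a list of problems; an empty list means the values are valid."""
    problems = []
    # Fall back to the documented defaults for keys that are not overridden.
    merged = {**PERFORMANCE_DEFAULTS, **settings}

    for key in ("n_parallel_pipeline_processes", "default_n_training_rows"):
        value = merged[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be an integer >= 1, got {value!r}")

    if not isinstance(merged["default_read_random_subset"], bool):
        problems.append("default_read_random_subset must be true or false")

    cap = merged["default_max_n_feat_per_model"]
    if isinstance(cap, bool) or not isinstance(cap, int):
        problems.append("default_max_n_feat_per_model must be an integer")

    if merged["default_feat_model_train_order"] != "as_given":
        problems.append('default_feat_model_train_order must be "as_given"')

    return problems


print(validate_performance_settings({"n_parallel_pipeline_processes": 0}))
```

An empty override dictionary validates cleanly because every documented default satisfies its own constraint.
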
### AI synthesis (privacy)

| **Parameter**                       | **Default** | **Possible Values**         | **Description**                                                                                                                                                                                                                                                    |
| ----------------------------------- | ----------- | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `default_sample_noise_ratio`        | 0.0001      | Any number                  | Default noise ratio used by components that support noise injection.                                                                                                                                                                                               |
| `default_noise_factor`              | 0.0001      | Any number                  | Default factor used by components that scale noise based on data.                                                                                                                                                                                                  |
| `default_min_sample_size`           | 5           | Any integer ≥ 1             | Default minimum sample size used by components during training.                                                                                                                                                                                                    |
| `default_cardinality_threshold`     | 10          | Any integer ≥ 1             | Categories with occurrences below this threshold are treated as rare (see [rare category protection](https://docs.syntho.ai/configure-a-data-generation-job/configure-column-settings/ai-powered-generation#rare-category-protection)).                            |
| `default_rare_category_replacement` | `"*"`       | Any string                  | Replacement value for rare categories.                                                                                                                                                                                                                             |
| `default_clip_threshold`            | 0           | Any number                  | Default clip threshold used by components that support clipping/extreme-value limiting.                                                                                                                                                                            |
| `default_pii_mock_replace`          | `false`     | `true` \| `false` (Boolean) | Default for replacing detected PII with mock data where applicable. See [Detect and obfuscate PII](https://docs.syntho.ai/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns). |

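To illustrate how `default_cardinality_threshold` and `default_rare_category_replacement` interact, the sketch below replaces every category that occurs fewer times than the threshold. It only demonstrates the idea described in the table and is not Syntho's implementation; `protect_rare_categories` is a hypothetical function name.

```python
from collections import Counter


def protect_rare_categories(values, cardinality_threshold=10, replacement="*"):
    """Replace each category that occurs fewer than `cardinality_threshold` times."""
    counts = Counter(values)
    return [
        value if counts[value] >= cardinality_threshold else replacement
        for value in values
    ]


# "LU" occurs 3 times (below the threshold of 10); "DE" occurs exactly 10 times.
column = ["NL"] * 12 + ["DE"] * 10 + ["LU"] * 3
protected = protect_rare_categories(column)
print(Counter(protected))
```

In this example only `LU` is replaced, because `DE` occurs exactly ten times and is therefore not below the threshold.
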
### AI synthesis (sequence model parameters)

| **Parameter**                     | **Default**         | **Possible Values**                               | **Description**                                                                                                                                                                                                                        |
| --------------------------------- | ------------------- | ------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_max_sequence_length`     | 10000               | Any integer ≥ 1 (e.g., 100, 1000, 10000)          | Specifies the [maximum sequence length](https://docs.syntho.ai/configure-a-data-generation-job/configure-column-settings/ai-powered-generation/sequence-model#sequence-model-parameters) for sequential data generation or processing. |
| `default_end_of_sequence_token`   | -123456789.98765433 | Any numeric token unlikely to appear in real data | A special marker denoting the end of a sequence. Ensure this does not conflict with real data values.                                                                                                                                  |
| `default_long_sequence_threshold` | 10                  | Any integer ≥ 1 (e.g., 10, 100)                   | Threshold used by components that apply special handling to long sequences during training/processing.                                                                                                                                 |

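Because `default_end_of_sequence_token` must not collide with real values, it can help to scan numeric columns before keeping or changing the token. The following is a minimal sketch, assuming the column is available as a Python list; `token_conflicts` is a hypothetical helper and not a Syntho function.

```python
END_OF_SEQUENCE_TOKEN = -123456789.98765433  # default from the table above


def token_conflicts(values, token=END_OF_SEQUENCE_TOKEN):
    """Return True if any numeric value is exactly equal to the token."""
    for value in values:
        # Skip non-numeric entries such as None or strings; bool is excluded
        # because it is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if value == token:
            return True
    return False


print(token_conflicts([12.5, 0, None, 99.0]))
print(token_conflicts([12.5, -123456789.98765433]))
```

If the check reports a conflict, choose a different token that does not appear anywhere in the data.
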
### Text processing

| **Parameter**                           | **Default**                                                                                                                                                                                                                                                 | **Possible Values**                                                             | **Description**                                                                                                                                                                                                                                                                                                              |
| --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_text_processor_model_settings` | `{"models":[{"lang_code":"en","model_name":"en_core_web_md"},{"lang_code":"nl","model_name":"nl_core_news_md"},{"lang_code":"de","model_name":"de_core_news_md"},{"lang_code":"ja","model_name":"ja_core_news_md"}],"nlp_engine_name":"spacy","gpu":false}` | Object with keys: `models` (array), `nlp_engine_name` (string), `gpu` (boolean) | Models/engine used for text processing defaults. Used by features like [Free text de-identification](https://docs.syntho.ai/overview/get-started/syntho-bootcamp/5.-generators/free-text-de-identification).                                                                                                                 |
| `default_textpii_parallel_jobs`         | 2                                                                                                                                                                                                                                                           | Any integer ≥ 1                                                                 | Parallel jobs used by PII text processing (see [Detect and obfuscate PII](https://docs.syntho.ai/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns)). Higher values can be faster, but use more CPU/memory and can increase contention. |
| `default_textpii_scan_batch_size`       | 1000                                                                                                                                                                                                                                                        | Any integer ≥ 1                                                                 | Batch size used by PII text processing (see [Detect and obfuscate PII](https://docs.syntho.ai/configure-a-data-generation-job/configure-column-settings/duplicate/automatic-pii-discovery-and-de-identification-in-free-text-columns)). Larger batches can be faster, but use more memory.                                   |
| `default_cutoff_length`                 | 1000                                                                                                                                                                                                                                                        | Any integer ≥ 1                                                                 | Default cutoff length used by components that truncate long text/sequences during processing. See [Free text de-identification](https://docs.syntho.ai/overview/get-started/syntho-bootcamp/5.-generators/free-text-de-identification).                                                                                      |

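The default value of `default_text_processor_model_settings` is a JSON object, so its structure is easy to inspect programmatically. The sketch below parses the default shown in the table and looks up the model configured for a language code; `model_for_language` is a hypothetical helper for illustration, not a Syntho function.

```python
import json

# Default value copied from the table above.
DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS = json.loads("""
{"models":[{"lang_code":"en","model_name":"en_core_web_md"},
           {"lang_code":"nl","model_name":"nl_core_news_md"},
           {"lang_code":"de","model_name":"de_core_news_md"},
           {"lang_code":"ja","model_name":"ja_core_news_md"}],
 "nlp_engine_name":"spacy","gpu":false}
""")


def model_for_language(settings, lang_code):
    """Return the configured model name for `lang_code`, or None if absent."""
    for entry in settings["models"]:
        if entry["lang_code"] == lang_code:
            return entry["model_name"]
    return None


print(model_for_language(DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS, "nl"))
print(model_for_language(DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS, "fr"))
```

A language code without an entry in `models`, such as `fr` here, returns `None`, which signals that no default model is configured for that language.
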
### Throughput optimization

| **Parameter**               | **Default** | **Possible Values**         | **Description**                                                                                                                                                             |
| --------------------------- | ----------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_fast_executemany`  | `false`     | `true` \| `false` (Boolean) | Enables [fast execution](https://docs.syntho.ai/create-a-workspace/connect-to-a-database/microsoft-sql-server#fast-execute-many) for bulk inserts.                          |
| `default_max_pending_tasks` | 1           | Any integer ≥ 1             | Max number of tables/tasks queued for concurrent processing. Higher values can improve throughput, but increase memory usage and can increase database connection pressure. |

### Other

| **Parameter**                                  | **Default** | **Possible Values**         | **Description**                                                                                                                                                     |
| ---------------------------------------------- | ----------- | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_consistent_integer_shuffle_threshold` | 0           | Any integer ≥ 0             | Threshold value used by consistent integer shuffle logic. `0` disables the threshold behavior.                                                                      |
| `default_order_by_nr_columns`                  | `[0, 0]`    | Array of two integers       | Default order-by configuration used by [order-by](https://docs.syntho.ai/configure-a-data-generation-job/configure-table-settings#order-by) logic where applicable. |
| `default_exclude_tables`                       | `false`     | `true` \| `false` (Boolean) | If `true`, tables start excluded by default. Related: [Workspace modes](https://docs.syntho.ai/setup-workspaces/create-a-workspace/workspace-modes).                |
