Workspace default settings

The Workspace default settings menu lets you configure default parameters for workspaces. These settings ensure consistent behavior for data processing, privacy protection, and synthetic data generation. The available options are explained below.

These defaults may vary by deployment/version; your admin may expose a subset.

Access the workspace default settings

You must be an Owner or Editor of the workspace to access the Workspace default settings.

  1. Go to Edit workspace via the Saved workspaces panel or from the Configuration panel.

  2. Select Default settings.

How to modify settings

  1. Access the Workspace default settings menu.

  2. Modify the required values directly.

  3. Confirm changes to apply them to the workspace.

Configuration options

Below is an overview of the default settings and their functionalities.

pii_model_settings controls models used for PII scanning/mocking. default_text_processor_model_settings uses the same structure (engine/models/gpu), but applies to text processing defaults.

Consistency (seed) settings

For more information, see Configure to use other NLP models (limited support).

AI synthesis (performance)

| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| n_parallel_pipeline_processes | 1 | Any integer ≥ 1 | Number of column-level processing tasks that run in parallel. Higher values can speed up runs but increase CPU and memory usage. |
| default_n_training_rows | 100000 | Any integer ≥ 1 | Default number of rows used to train models. |
| default_read_random_subset | false | true or false (Boolean) | If true, components that support it may read a random subset of rows by default. |
| default_max_n_feat_per_model | -1 | Any integer (use -1 for "no limit") | Default cap on the number of features per AI synthesis model. |
| default_feat_model_train_order | "as_given" | Currently supported: "as_given" | Default training order for AI synthesis models. |
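
The value ranges above can be checked before a configuration is saved. The following is a minimal sketch in Python, assuming the settings are available as a plain dictionary; the constant PERFORMANCE_DEFAULTS and the function validate_performance_settings are hypothetical names used for illustration and are not part of the product.

```python
# Hypothetical sketch: the checks mirror the "Possible values" column above.
# This is not the product's own validation code.
PERFORMANCE_DEFAULTS = {
    "n_parallel_pipeline_processes": 1,
    "default_n_training_rows": 100000,
    "default_read_random_subset": False,
    "default_max_n_feat_per_model": -1,
    "default_feat_model_train_order": "as_given",
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_performance_settings(settings):
    """Return a list of problems; an empty list means the values are valid."""
    problems = []
    for name in ("n_parallel_pipeline_processes", "default_n_training_rows"):
        value = settings[name]
        if not _is_int(value) or value < 1:
            problems.append(f"{name} must be an integer >= 1, got {value!r}")
    if not isinstance(settings["default_read_random_subset"], bool):
        problems.append("default_read_random_subset must be true or false")
    if not _is_int(settings["default_max_n_feat_per_model"]):
        problems.append("default_max_n_feat_per_model must be an integer")
    if settings["default_feat_model_train_order"] != "as_given":
        problems.append('default_feat_model_train_order must be "as_given"')
    return problems


# Example: raise parallelism to 4 and keep every other default.
overrides = {**PERFORMANCE_DEFAULTS, "n_parallel_pipeline_processes": 4}
print(validate_performance_settings(overrides))  # prints []
```

Returning a list of problems rather than raising on the first one makes it possible to report every invalid value at once.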

AI synthesis (privacy)

| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_sample_noise_ratio | 0.0001 | Any number | Default noise ratio used by components that support noise injection. |
| default_noise_factor | 0.0001 | Any number | Default factor used by components that scale noise based on data. |
| default_min_sample_size | 5 | Any integer ≥ 1 | Default minimum sample size used by components during training. |
| default_cardinality_threshold | 10 | Any integer ≥ 1 | Categories with occurrences below this threshold are treated as rare (see rare category protection). |
| default_rare_category_replacement | "*" | Any string | Replacement value for rare categories. |
| default_clip_threshold | 0 | Any number | Default clip threshold used by components that support clipping/extreme-value limiting. |
| default_pii_mock_replace | false | true or false (Boolean) | Default for replacing detected PII with mock data where applicable. See Detect and obfuscate PII. |
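
To illustrate how default_cardinality_threshold and default_rare_category_replacement interact, here is a minimal Python sketch of rare category protection. It assumes "below this threshold" means a strict comparison (a category seen exactly 10 times is kept); the function protect_rare_categories is a hypothetical helper, not the product's implementation.

```python
from collections import Counter


def protect_rare_categories(values, cardinality_threshold=10, replacement="*"):
    """Replace categories that occur fewer than `cardinality_threshold` times."""
    counts = Counter(values)
    return [
        value if counts[value] >= cardinality_threshold else replacement
        for value in values
    ]


# "LU" occurs only 3 times, so it is replaced; "DE" occurs exactly 10 times
# and is kept under the strict "below threshold" assumption.
column = ["NL"] * 12 + ["DE"] * 10 + ["LU"] * 3
protected = protect_rare_categories(column)
print(Counter(protected))  # Counter({'NL': 12, 'DE': 10, '*': 3})
```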

AI synthesis (sequence model parameters)

| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_max_sequence_length | 10000 | Any integer ≥ 1 (e.g., 100, 1000, 10000) | Maximum sequence length for sequential data generation or processing. |
| default_end_of_sequence_token | -123456789.98765433 | Any numeric token unlikely to appear in real data | A special marker denoting the end of a sequence. Make sure it does not conflict with real data values. |
| default_long_sequence_threshold | 10 | Any integer ≥ 1 (e.g., 10, 100) | Threshold used by components that apply special handling to long sequences during training/processing. |
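
Because the end-of-sequence token must not collide with real data, it can help to scan a numeric column for the token before relying on the default. The sketch below is a hedged illustration in Python; token_conflicts is a hypothetical helper, and it assumes the column is available as a list of numbers in which missing values are represented as None.

```python
DEFAULT_END_OF_SEQUENCE_TOKEN = -123456789.98765433


def token_conflicts(values, token=DEFAULT_END_OF_SEQUENCE_TOKEN):
    """Return True if any non-missing value is exactly equal to the token."""
    return any(value == token for value in values if value is not None)


amounts = [12.5, -3.0, None, 10000.0]
print(token_conflicts(amounts))  # False
print(token_conflicts(amounts + [DEFAULT_END_OF_SEQUENCE_TOKEN]))  # True
```

If the check returns True, pick a different token value that does not occur in the data.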

Text processing

| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_text_processor_model_settings | {"models":[{"lang_code":"en","model_name":"en_core_web_md"},{"lang_code":"nl","model_name":"nl_core_news_md"},{"lang_code":"de","model_name":"de_core_news_md"},{"lang_code":"ja","model_name":"ja_core_news_md"}],"nlp_engine_name":"spacy","gpu":false} | Object with keys: models (array), nlp_engine_name (string), gpu (boolean) | Models/engine used for text processing defaults. Used by features like Free text de-identification. |
| default_textpii_parallel_jobs | 2 | Any integer ≥ 1 | Parallel jobs used by PII text processing (see Detect and obfuscate PII). Higher values can be faster, but use more CPU/memory and can increase contention. |
| default_textpii_scan_batch_size | 1000 | Any integer ≥ 1 | Batch size used by PII text processing (see Detect and obfuscate PII). Larger batches can be faster, but use more memory. |
| default_cutoff_length | 1000 | Any integer ≥ 1 | Default cutoff length used by components that truncate long text/sequences during processing. See Free text de-identification. |
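
The structure of default_text_processor_model_settings can be checked with a few lines of Python before it is pasted into the settings menu. This is a sketch under assumptions: check_model_settings is a hypothetical helper, and the requirement that each entry in models carries lang_code and model_name is inferred from the default value shown above rather than from a published schema.

```python
import json

# The default value from the table above, parsed from JSON.
DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS = json.loads("""
{
  "models": [
    {"lang_code": "en", "model_name": "en_core_web_md"},
    {"lang_code": "nl", "model_name": "nl_core_news_md"},
    {"lang_code": "de", "model_name": "de_core_news_md"},
    {"lang_code": "ja", "model_name": "ja_core_news_md"}
  ],
  "nlp_engine_name": "spacy",
  "gpu": false
}
""")


def check_model_settings(settings):
    """Return a list of structural problems; an empty list means it looks valid."""
    problems = []
    if not isinstance(settings.get("nlp_engine_name"), str):
        problems.append("nlp_engine_name must be a string")
    if not isinstance(settings.get("gpu"), bool):
        problems.append("gpu must be true or false")
    models = settings.get("models")
    if not isinstance(models, list):
        problems.append("models must be an array")
    else:
        for index, model in enumerate(models):
            # Assumed entry shape: an object with lang_code and model_name.
            required = {"lang_code", "model_name"}
            if not isinstance(model, dict) or not required.issubset(model):
                problems.append(f"models[{index}] needs lang_code and model_name")
    return problems


print(check_model_settings(DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS))  # prints []
```

Since pii_model_settings is described as using the same structure, the same check could plausibly be applied to it as well.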

Throughput optimization

| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_fast_executemany | false | true or false (Boolean) | Enables fast execution for bulk inserts. |
| default_max_pending_tasks | 1 | Any integer ≥ 1 | Maximum number of tables/tasks queued for concurrent processing. Higher values can improve throughput, but increase memory usage and can increase database connection pressure. |

Other

| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_consistent_integer_shuffle_threshold | 0 | Any integer ≥ 0 | Threshold value used by consistent integer shuffle logic. 0 disables the threshold behavior. |
| default_order_by_nr_columns | [0, 0] | Array of two integers | Default order-by configuration used by order-by logic where applicable. |
| default_exclude_tables | false | true or false (Boolean) | If true, tables start excluded by default. Related: Workspace modes. |
