Workspace default settings
The Workspace default settings menu lets you configure default parameters for workspaces. These settings ensure consistent behavior for data processing, privacy protection, and synthetic data generation. The available options are explained below.
These defaults may vary by deployment/version; your admin may expose a subset.
Access the workspace default settings
Note: You need the Owner or Editor role to access the workspace default settings.
Go to Edit workspace via the Saved workspaces panel or the Configuration panel.
Select Default settings.
How to modify settings
Access the Workspace default settings menu.
Modify the required values directly.
Confirm changes to apply them to the workspace.
Configuration options
Below is an overview of the default settings and their functionalities.
pii_model_settings controls models used for PII scanning/mocking. default_text_processor_model_settings uses the same structure (engine/models/gpu), but applies to text processing defaults.
Consistency (seed) settings
For more information, see Configure to use other NLP models (limited support).
AI synthesis (performance)
| Parameter | Default | Possible Values | Description |
| --- | --- | --- | --- |
| n_parallel_pipeline_processes | 1 | Any integer ≥ 1 | Number of column-level processing tasks that run in parallel. Higher values can speed up runs, but increase CPU/memory usage. |
| default_read_random_subset | false | true or false (Boolean) | If true, components that support it may read a random subset of rows by default. |
| default_max_n_feat_per_model | -1 | Any integer (use -1 for “no limit”) | Default cap on number of features per AI synthesis model. |
| default_feat_model_train_order | "as_given" | Currently supported: "as_given" | Default training order for AI synthesis models. |
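
As a rough illustration of the value ranges listed for the performance parameters above, the following Python sketch checks a settings dictionary against them. The helper name validate_performance_settings and the dictionary representation are assumptions made for this example; they are not part of the product.

```python
# Hypothetical helper: only mirrors the documented defaults and ranges above.
PERFORMANCE_DEFAULTS = {
    "n_parallel_pipeline_processes": 1,
    "default_read_random_subset": False,
    "default_max_n_feat_per_model": -1,
    "default_feat_model_train_order": "as_given",
}


def validate_performance_settings(settings):
    """Return a list of problems; an empty list means all values fit the documented ranges."""
    # Values missing from `settings` fall back to the documented defaults.
    merged = {**PERFORMANCE_DEFAULTS, **settings}
    problems = []

    n_proc = merged["n_parallel_pipeline_processes"]
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(n_proc, bool) or not isinstance(n_proc, int) or n_proc < 1:
        problems.append("n_parallel_pipeline_processes must be an integer >= 1")

    if not isinstance(merged["default_read_random_subset"], bool):
        problems.append("default_read_random_subset must be true or false")

    max_feat = merged["default_max_n_feat_per_model"]
    if isinstance(max_feat, bool) or not isinstance(max_feat, int):
        problems.append("default_max_n_feat_per_model must be an integer (-1 means no limit)")

    if merged["default_feat_model_train_order"] != "as_given":
        problems.append('default_feat_model_train_order currently supports only "as_given"')

    return problems
```

Calling validate_performance_settings({"n_parallel_pipeline_processes": 0}) would report one problem, because the documented minimum is 1.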
AI synthesis (privacy)
| Parameter | Default | Possible Values | Description |
| --- | --- | --- | --- |
| default_sample_noise_ratio | 0.0001 | Any number | Default noise ratio used by components that support noise injection. |
| default_noise_factor | 0.0001 | Any number | Default factor used by components that scale noise based on data. |
| default_min_sample_size | 5 | Any integer ≥ 1 | Default minimum sample size used by components during training. |
| default_cardinality_threshold | 10 | Any integer ≥ 1 | Categories with occurrences below this threshold are treated as rare (see rare category protection). |
| default_rare_category_replacement | "*" | Any string | Replacement value for rare categories. |
| default_clip_threshold | 0 | Any number | Default clip threshold used by components that support clipping/extreme-value limiting. |
| default_pii_mock_replace | false | true or false (Boolean) | Default for replacing detected PII with mock data where applicable. See Detect and obfuscate PII. |
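
To make the interaction between default_cardinality_threshold and default_rare_category_replacement more concrete, here is a minimal Python sketch of the idea. It assumes that "rare" means a category occurs fewer times than the threshold, as the description above states. The function replace_rare_categories is hypothetical and may differ from how the product implements rare category protection.

```python
from collections import Counter


def replace_rare_categories(values, cardinality_threshold=10, replacement="*"):
    """Replace every category that occurs fewer than cardinality_threshold times.

    Illustrative only: defaults mirror default_cardinality_threshold (10)
    and default_rare_category_replacement ("*").
    """
    counts = Counter(values)
    return [v if counts[v] >= cardinality_threshold else replacement for v in values]


# "Utrecht" occurs 3 times (below the threshold of 10), so it is replaced.
cities = ["Amsterdam"] * 12 + ["Utrecht"] * 3
protected = replace_rare_categories(cities)
```

With the defaults, the 12 "Amsterdam" values are kept and the 3 "Utrecht" values become "*".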
AI synthesis (sequence model parameters)
| Parameter | Default | Possible Values | Description |
| --- | --- | --- | --- |
| default_max_sequence_length | 10000 | Any integer ≥ 1 (e.g., 100, 1000, 10000) | Specifies the maximum sequence length for sequential data generation or processing. |
| default_end_of_sequence_token | -123456789.98765433 | Any numeric token unlikely to appear in real data | A special marker denoting the end of a sequence. Make sure it does not conflict with real data values. |
| default_long_sequence_threshold | 10 | Any integer ≥ 1 (e.g., 10, 100) | Threshold used by components that apply special handling to long sequences during training/processing. |
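
Because the end-of-sequence token must not collide with real data, it can help to check a column before relying on the default. The Python sketch below shows one way to do that outside the product. The function token_conflicts_with_data is a hypothetical name used only for this example.

```python
# Default value of default_end_of_sequence_token, as documented above.
END_OF_SEQUENCE_TOKEN = -123456789.98765433


def token_conflicts_with_data(values, token=END_OF_SEQUENCE_TOKEN):
    """Return True if the end-of-sequence token already appears among the real values."""
    return any(v == token for v in values)
```

If the check returns True for a numeric column, choosing a different token value for that workspace would avoid ambiguity between real values and the end marker.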
Text processing
| Parameter | Default | Possible Values | Description |
| --- | --- | --- | --- |
| default_text_processor_model_settings | {"models":[{"lang_code":"en","model_name":"en_core_web_md"},{"lang_code":"nl","model_name":"nl_core_news_md"},{"lang_code":"de","model_name":"de_core_news_md"},{"lang_code":"ja","model_name":"ja_core_news_md"}],"nlp_engine_name":"spacy","gpu":false} | Object with keys: models (array), nlp_engine_name (string), gpu (boolean) | Models/engine used for text processing defaults. Used by features like Free text de-identification. |
| default_textpii_parallel_jobs | 2 | Any integer ≥ 1 | Parallel jobs used by PII text processing (see Detect and obfuscate PII). Higher values can be faster, but use more CPU/memory and can increase contention. |
| default_textpii_scan_batch_size | 1000 | Any integer ≥ 1 | Batch size used by PII text processing (see Detect and obfuscate PII). Larger batches can be faster, but use more memory. |
| default_cutoff_length | 1000 | Any integer ≥ 1 | Default cutoff length used by components that truncate long text/sequences during processing. See Free text de-identification. |
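
The structure of default_text_processor_model_settings and the effect of a scan batch size can be sketched in Python as follows. The JSON is the documented default. The helpers model_for_language and iter_batches are hypothetical and only illustrate how such settings might be read and how a batch size splits the work; the product's internal behavior may differ.

```python
import json

# Documented default of default_text_processor_model_settings.
DEFAULT_TEXT_PROCESSOR_MODEL_SETTINGS = json.loads("""
{"models":[{"lang_code":"en","model_name":"en_core_web_md"},
           {"lang_code":"nl","model_name":"nl_core_news_md"},
           {"lang_code":"de","model_name":"de_core_news_md"},
           {"lang_code":"ja","model_name":"ja_core_news_md"}],
 "nlp_engine_name":"spacy","gpu":false}
""")


def model_for_language(settings, lang_code):
    """Return the model_name configured for lang_code, or None if the language is not listed."""
    for entry in settings["models"]:
        if entry["lang_code"] == lang_code:
            return entry["model_name"]
    return None


def iter_batches(texts, batch_size=1000):
    """Yield consecutive slices of at most batch_size texts (default mirrors default_textpii_scan_batch_size)."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]
```

For example, 2,500 texts with the default batch size would be split into batches of 1,000, 1,000, and 500, which shows why larger batch sizes hold more data in memory at once.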
Throughput optimization
| Parameter | Default | Possible Values | Description |
| --- | --- | --- | --- |
| default_max_pending_tasks | 1 | Any integer ≥ 1 | Max number of tables/tasks queued for concurrent processing. Higher values can improve throughput, but increase memory usage and can increase database connection pressure. |
Other
| Parameter | Default | Possible Values | Description |
| --- | --- | --- | --- |
| default_consistent_integer_shuffle_threshold | 0 | Any integer ≥ 0 | Threshold value used by consistent integer shuffle logic. 0 disables the threshold behavior. |
| default_order_by_nr_columns | [0, 0] | Array of two integers | Default order-by configuration used by order-by logic where applicable. |
| default_exclude_tables | false | true or false (Boolean) | If true, tables start excluded by default. Related: Workspace modes. |

