Workspace default settings
The Workspace default settings menu lets you configure default parameters for a workspace. These settings ensure consistent behavior for data processing, privacy protection, and synthetic data generation. The available options are explained in detail below.
Access the workspace default settings
Note that you must be an Owner or Editor of the workspace to access Workspace default settings.
Create or open the workspace.
Get the workspace ID from the workspace URL: https://demo.syntho.ai/hub/<WORKSPACE-ID>/hub/?tab=Main%20hub
A workspace ID looks like this, for example: c5f5aa8b-3491-4025-b637-1ccf6e83ec24
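The workspace ID can also be read from the URL programmatically. The sketch below assumes the URL layout shown above and assumes the ID is always a UUID, as the example suggests; extract_workspace_id is a hypothetical helper, not part of the product.

```python
import uuid
from urllib.parse import urlparse


def extract_workspace_id(url: str) -> str:
    """Return the workspace ID found in a hub URL, validated as a UUID."""
    # The path is expected to look like /hub/<WORKSPACE-ID>/hub/
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2 or segments[0] != "hub":
        raise ValueError(f"unexpected URL layout: {url}")
    # Assumption: the ID is a UUID; uuid.UUID raises ValueError otherwise.
    return str(uuid.UUID(segments[1]))


example_url = (
    "https://demo.syntho.ai/hub/"
    "c5f5aa8b-3491-4025-b637-1ccf6e83ec24/hub/?tab=Main%20hub"
)
print(extract_workspace_id(example_url))
```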
How to modify settings
Access the Workspace default settings menu.
Modify the required values directly.
Save changes to apply them to the workspace.
Configuration options
Below is an overview of the default settings and their functionalities:
General settings
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| seed_value | 42 | Any integer (e.g., 0, 42, 1234) | Sets the seed used by generators with consistent mapping enabled and by inherently consistent generators (e.g., hash). Consistent mapping: when use_seed is true, changing seed_value rotates the mapping so the same input maps to a different output, allowing you to periodically change synthetic outputs for security. |
| use_seed | false | true or false (Boolean) | If true, all generators that support consistent mapping (mock, mask, hash) use your specified seed_value. If false, the system does not use any seed for consistent mapping, so changes to seed_value have no effect on the mapping scheme. |
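To make the idea of seed rotation concrete, here is a minimal sketch of a seeded, consistent mapping. It is not the product's generator: consistent_token is a hypothetical function, and keyed hashing with HMAC-SHA256 is an assumption chosen only to illustrate that the same input and seed always give the same output, while a new seed gives a new mapping.

```python
import hashlib
import hmac


def consistent_token(value: str, seed_value: int) -> str:
    """Map an input to a stable token that depends on the seed."""
    # Assumption: the seed is used as the HMAC key.
    key = str(seed_value).encode("utf-8")
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


# Same input and same seed: the token is stable across calls.
first = consistent_token("alice@example.com", 42)
second = consistent_token("alice@example.com", 42)

# Same input and a new seed: the mapping rotates to a different token.
rotated = consistent_token("alice@example.com", 1234)

print(first == second, first == rotated)
```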
PII model settings
Specifies the natural language processing (NLP) models for PII detection and mockers.
Example: language code "en" with model name "en_core_web_md".
Supported languages: English, Dutch, German, Japanese.
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| nlp_engine_name | "spacy" | "spacy" or any other supported NLP engine name | Determines which NLP engine is used for PII scanning and detection. |
| models | N/A | A list of dictionaries with "lang_code" and "model_name", e.g. [{ "lang_code": "en", "model_name": "en_core_web_md" }] | Each item defines a specific language and its associated model for PII detection. |
| gpu | false | true or false (Boolean) | Toggles GPU acceleration. When true, models that support GPU run faster but require a compatible GPU setup. |
For more information, see "Configure to use other NLP models (limited support)".
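Put together, the three parameters above could be written as the following structure. The key names come from the table; the exact nesting under a single pii_model_settings object is an assumption, and check_model_settings is a hypothetical helper that only checks the shape described here.

```python
from typing import List

# Assumed layout: one object holding the engine, the models list and the GPU flag.
pii_model_settings = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "en", "model_name": "en_core_web_md"},
    ],
    "gpu": False,
}


def check_model_settings(settings: dict) -> List[str]:
    """Return a list of shape problems; an empty list means none were found."""
    problems: List[str] = []
    if not isinstance(settings.get("nlp_engine_name"), str):
        problems.append("nlp_engine_name must be a string")
    if not isinstance(settings.get("gpu"), bool):
        problems.append("gpu must be true or false")
    models = settings.get("models")
    if not isinstance(models, list):
        problems.append("models must be a list")
        return problems
    for i, model in enumerate(models):
        for key in ("lang_code", "model_name"):
            if not isinstance(model, dict) or not isinstance(model.get(key), str):
                problems.append(f"models[{i}] needs a string {key}")
    return problems


print(check_model_settings(pii_model_settings))
```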
Initialization and data handling
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| initialization_mode | "SCRATCH" | "SCRATCH", "APPEND", or "READ_ONLY" | Defines how the workspace is initialized. "SCRATCH" starts an empty workspace. "APPEND" adds data to existing tables. "READ_ONLY" prevents any modifications. |
| key_generation_method | "duplicate" | "generate", "duplicate", "hash" | Determines the method for generating keys. "generate" creates new keys. "duplicate" copies source keys. "hash" applies a hash function. |
| n_parallel_pipeline_processes | 1 | Any integer (e.g., 1, 2); -1 uses all CPUs | Controls the number of column-processing jobs (fitting, transforming, inverse-transforming) that run in parallel. Higher values can speed up processing but use more system resources. |
| default_n_training_rows | 100000 | Any positive integer (e.g., 10, 1000, 100000) | Sets the default number of rows used to train synthetic data models. If the input dataset exceeds this number, only the specified number of rows is used for training (unless otherwise configured). |
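The two numeric settings above can be sketched as follows. Both functions are hypothetical illustrations: the documentation does not say how the training rows are selected, so random sampling here is an assumption, as is resolving -1 through os.cpu_count().

```python
import os
import random
from typing import List, Sequence


def resolve_process_count(n_parallel_pipeline_processes: int) -> int:
    """Translate the setting into a concrete number of worker processes."""
    if n_parallel_pipeline_processes == -1:
        # -1 means "all CPUs"; fall back to 1 if the count is unknown.
        return os.cpu_count() or 1
    if n_parallel_pipeline_processes < 1:
        raise ValueError("expected a positive integer or -1")
    return n_parallel_pipeline_processes


def select_training_rows(
    rows: Sequence, default_n_training_rows: int = 100_000, seed: int = 0
) -> List:
    """Cap the training set at default_n_training_rows."""
    if len(rows) <= default_n_training_rows:
        return list(rows)
    # Assumption: rows are sampled at random rather than taken from the top.
    return random.Random(seed).sample(list(rows), default_n_training_rows)


print(resolve_process_count(2))
print(len(select_training_rows(list(range(250_000)))))
```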
Privacy control defaults
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_sample_noise_ratio | 0.0001 | Any positive numeric value (float). 0 < ratio ≤ 1: relative noise. > 1: absolute std. dev. | Specifies the level of noise added to synthetic data. Between 0 and 1: noise is added as a relative ratio. Above 1: noise is treated as an absolute standard deviation. |
| default_min_sample_size | 5 | Any positive integer | Minimum sample size used for model training. |
| default_cardinality_threshold | 10 | Any integer ≥ 1 (e.g., 5, 10, 20) | Any category in a categorical column with fewer occurrences than this threshold is considered "rare" and gets replaced. |
| default_rare_category_replacement | "*" | Any string (e.g., "*" or "other") | Placeholder for rare categories to preserve privacy. |
| default_clip_threshold | 0 | Any numeric value (integer or float). 0: no clipping. Positive number: outlier limit. | Limits extreme outliers in numeric columns by "clipping" values above/below a certain threshold. |
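The rare-category and noise settings can be illustrated with a short sketch. Both functions are hypothetical. In particular, the documentation does not state what relative noise is relative to; scaling by the column's standard deviation is an assumption made here for illustration.

```python
import statistics
from collections import Counter
from typing import List, Sequence


def replace_rare_categories(
    values: Sequence[str], cardinality_threshold: int = 10, replacement: str = "*"
) -> List[str]:
    """Replace categories that occur fewer times than the threshold."""
    counts = Counter(values)
    return [v if counts[v] >= cardinality_threshold else replacement for v in values]


def noise_std(column: Sequence[float], sample_noise_ratio: float = 0.0001) -> float:
    """Turn the noise setting into a standard deviation for added noise."""
    if sample_noise_ratio <= 0:
        raise ValueError("sample_noise_ratio must be positive")
    if sample_noise_ratio <= 1:
        # Assumption: "relative" means relative to the column's own spread.
        return sample_noise_ratio * statistics.pstdev(column)
    # Above 1, the value is used directly as an absolute standard deviation.
    return sample_noise_ratio


cities = ["Amsterdam"] * 12 + ["Utrecht"] * 3
print(replace_rare_categories(cities).count("*"))
print(noise_std([0.0, 2.0], 0.5), noise_std([0.0, 2.0], 5.0))
```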
Text processing
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_text_processor_model_settings | No single value | Same format as pii_model_settings (language models, engine, GPU) | Specifies NLP models for advanced text processing tasks (non-PII or general text analytics). |
| default_textpii_parallel_jobs | 2 | Any integer ≥ 1; -1 to use all available processors | Defines how many parallel jobs are used when scanning text for PII. Increasing the number of jobs speeds up scanning but uses more CPU resources. |
| default_textpii_scan_batch_size | 1000 | Any integer ≥ 1 (e.g., 100, 1000, 5000) | Batch size for PII detection in text columns. Larger batches can be faster but may consume more memory. |
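The interplay between the job count and the batch size can be sketched as below. This is not the product's scanner: scan_texts and scan_batch are hypothetical, and a simple email regex stands in for a real PII model. Threads are used only to show how work is split; they do not speed up this CPU-bound toy.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Toy stand-in for PII detection: match anything that looks like an email.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scan_batch(batch: List[str]) -> int:
    """Count the texts in one batch that contain an email-like string."""
    return sum(1 for text in batch if EMAIL.search(text))


def scan_texts(texts: List[str], parallel_jobs: int = 2, scan_batch_size: int = 1000) -> int:
    """Split texts into batches and scan the batches with a pool of workers."""
    workers = (os.cpu_count() or 1) if parallel_jobs == -1 else parallel_jobs
    batches = [
        texts[start:start + scan_batch_size]
        for start in range(0, len(texts), scan_batch_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan_batch, batches))


sample_texts = ["mail a@b.com", "nothing here", "x@y.org"] * 10
print(scan_texts(sample_texts, parallel_jobs=2, scan_batch_size=7))
```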
Sequence model parameters
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_max_sequence_length | 10000 | Any integer ≥ 1 (e.g., 100, 1000, 10000) | Specifies the maximum sequence length for sequential data generation or processing. |
| default_end_of_sequence_token | -123456789.98765433 | Any numeric token unlikely to appear in real data | A special marker denoting the end of a sequence, ensuring it is not confused with real data values. |
| default_long_sequence_threshold | 10 | Any integer ≥ 1 (e.g., 10, 100) | Defines a limit for the length of data sequences used in training, adjusting the longest sequences to the length of the Nth sequence. This helps prevent large sequences from overwhelming memory or computational resources. |
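One possible reading of these three settings is sketched below: sequences longer than the Nth longest one are cut to that length, the result is also capped at the maximum sequence length, and the end-of-sequence token is appended. This interpretation and the function cap_long_sequences are assumptions for illustration, not a description of the actual implementation.

```python
from typing import List, Sequence

END_OF_SEQUENCE_TOKEN = -123456789.98765433


def cap_long_sequences(
    sequences: Sequence[Sequence[float]],
    long_sequence_threshold: int = 10,
    max_sequence_length: int = 10000,
) -> List[List[float]]:
    """Truncate the longest sequences and mark each sequence's end."""
    if not sequences:
        return []
    lengths = sorted((len(s) for s in sequences), reverse=True)
    # Length of the Nth longest sequence (or the shortest if there are fewer than N).
    nth_length = lengths[min(long_sequence_threshold, len(lengths)) - 1]
    cap = min(nth_length, max_sequence_length)
    return [list(s[:cap]) + [END_OF_SEQUENCE_TOKEN] for s in sequences]


example = [[1.0] * 5, [2.0] * 3, [3.0] * 2, [4.0]]
capped = cap_long_sequences(example, long_sequence_threshold=2)
print([len(s) for s in capped])
```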
Optimization and advanced settings
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_ray_memory_optimization | true | true or false (Boolean) | When true, the system explicitly releases idle Ray workers between jobs, reducing memory usage. When false, workers remain alive, reducing overhead for frequent runs. |
| default_drop_indexes | false | true or false (Boolean) | When true, temporarily drops indexes before inserting synthetic data and re-creates them afterward. This often speeds up inserts, but rebuilding indexes can be time-consuming for large tables. |
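The drop-and-recreate pattern behind default_drop_indexes can be shown with a small, generic example. It uses an in-memory SQLite database from the Python standard library; the table, the index name, and bulk_insert are hypothetical and say nothing about how the product talks to your database.

```python
import sqlite3
from typing import Iterable, Tuple


def bulk_insert(
    conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]], drop_indexes: bool = False
) -> None:
    """Insert rows, optionally dropping the index first and rebuilding it after."""
    cur = conn.cursor()
    if drop_indexes:
        # Without the index, each insert skips index maintenance.
        cur.execute("DROP INDEX IF EXISTS idx_customers_city")
    cur.executemany("INSERT INTO customers (name, city) VALUES (?, ?)", rows)
    if drop_indexes:
        # Rebuilding once at the end can take long on large tables.
        cur.execute("CREATE INDEX idx_customers_city ON customers (city)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
bulk_insert(conn, [("Ann", "Utrecht"), ("Bob", "Delft")], drop_indexes=True)
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])
```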
Other
| Parameter | Default | Possible values | Description |
| --- | --- | --- | --- |
| default_locale | "en" | Supports various locales, including "en", "nl", "de", "ja" | Sets the default locale for language-based processing, such as date parsing or random text generation. |
| default_order_by_nr_columns | [3, 1] | A list of integers (e.g., [3, 1], [1, 2, 3]) | Defines the order in which columns are processed or modeled, which can be relevant for preserving data order in AI-powered generation or for certain backends. |
| default_max_pending_tasks | 5 | Any positive integer | Defines the number of tables that can be processed in parallel when using ranked scheduling. Increasing the value can improve performance through greater concurrency, but also increases memory usage. Adjust this setting gradually: start with the default, monitor system performance, and tune based on available memory and database connection limits to keep operation balanced and stable. |
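The effect of default_max_pending_tasks can be pictured as a cap on how many tables are in flight at once. The sketch below is a generic bounded-concurrency example: process_tables is hypothetical, the input list is assumed to be already ordered by rank, and a short sleep stands in for real table processing.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def process_tables(ranked_tables: List[str], max_pending_tasks: int = 5) -> Tuple[List[str], int]:
    """Process tables with bounded concurrency; return results and peak concurrency."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def process(table: str) -> str:
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for generating one table
        with lock:
            state["running"] -= 1
        return table

    # The pool size is the cap: at most max_pending_tasks tables run at once.
    with ThreadPoolExecutor(max_workers=max_pending_tasks) as pool:
        done = list(pool.map(process, ranked_tables))
    return done, state["peak"]


tables = [f"table_{i}" for i in range(12)]
finished, peak = process_tables(tables, max_pending_tasks=3)
print(len(finished), peak <= 3)
```

A higher cap lets more tables overlap, which is where the extra memory use and the additional database connections mentioned above come from.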