Workspace default settings

The Workspace Default Settings menu lets you configure default parameters for a workspace. These settings keep data processing, privacy protection, and synthetic data generation consistent. The available options are explained below.

Access the workspace default settings

Note: you must be an Owner or Editor of the workspace to access Workspace Default Settings.

  1. Create or open the workspace.

  2. Press CTRL + SHIFT + ALT + 0 to open the Workspace Default Settings menu. If this shortcut is reserved on your system, append /global_settings to the end of the workspace URL instead.
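The URL fallback can be sketched in a few lines of Python. This is only an illustration: the helper name `settings_url` and the example address are hypothetical, and it assumes the workspace URL carries no query string or fragment.

```python
def settings_url(workspace_url: str) -> str:
    """Append /global_settings to a workspace URL without doubling the slash."""
    # Strip any trailing slash first so the result has exactly one separator.
    return workspace_url.rstrip("/") + "/global_settings"


# Hypothetical workspace address, used purely for illustration.
print(settings_url("https://example.com/workspace/123/"))
```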

How to modify settings

  1. Access the Workspace Default Settings menu.

  2. Modify the required values directly.

  3. Save changes to apply them to the workspace.

Configuration options

Below is an overview of the default settings and their functionalities:

General settings:

  • seed_value: (Default: 42) Specifies the random seed value for reproducibility in data generation.

  • use_seed: (Default: false) Enables or disables the use of the seed value.
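To illustrate what a seed buys you, here is a minimal sketch using Python's standard `random` module. It does not show how the product applies the seed internally; the function `sample_values` is hypothetical and only mirrors the two setting names above.

```python
import random


def sample_values(n, seed_value=42, use_seed=False):
    """Draw n pseudo-random floats, seeded only when use_seed is true."""
    # With use_seed=False the generator is seeded from system entropy,
    # so repeated calls produce different values.
    rng = random.Random(seed_value) if use_seed else random.Random()
    return [rng.random() for _ in range(n)]


# Two seeded runs produce identical values; unseeded runs generally do not.
first = sample_values(3, use_seed=True)
second = sample_values(3, use_seed=True)
print(first == second)
```

With `use_seed` left at its default of false, the `seed_value` is ignored in this sketch, which matches the description of the two settings above.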

PII model settings:

  • pii_model_settings

    • Specifies the natural language processing (NLP) models for PII detection and mockers.

    • Example fields:

    • nlp_engine_name: (Default: "spacy") Sets the NLP engine.

    • gpu: (Default: false) Toggles GPU acceleration for model execution.

Initialization and data handling:

  • initialization_mode: (Default: "SCRATCH") Defines how the workspace is initialized.

  • key_generation_method: (Default: "duplicate") Determines the method for generating keys (e.g., duplicate or hash).

  • n_parallel_pipeline_processes: (Default: 1) Controls the number of parallel pipeline processes.

  • default_n_training_rows: (Default: 100000) Sets the default number of rows used for training.

Privacy control defaults:

  • default_sample_noise_ratio: (Default: 0.001) Specifies the level of noise added to synthetic data.

  • default_cardinality_threshold: (Default: 10) Sets the threshold used to identify rare categories, which are then replaced.

  • default_rare_category_replacement: (Default: "*") Defines the placeholder for rare categories.

  • default_clip_threshold: (Default: 0) Sets clipping limits for numerical data to remove outliers.
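Rare category replacement can be pictured with the sketch below. The exact rule the product applies is not stated here, so this assumes a category counts as rare when it occurs fewer times than the threshold; the function `replace_rare` is hypothetical.

```python
from collections import Counter


def replace_rare(values, cardinality_threshold=10, replacement="*"):
    """Replace categories seen fewer than cardinality_threshold times.

    Assumption: "rare" means an occurrence count below the threshold.
    """
    counts = Counter(values)
    return [v if counts[v] >= cardinality_threshold else replacement
            for v in values]


cities = ["Amsterdam"] * 12 + ["Delft"] * 2
masked = replace_rare(cities)
print(Counter(masked))
```

In this example the two "Delft" rows fall below the default threshold of 10 and are replaced by the default placeholder "*", while the twelve "Amsterdam" rows are kept.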

Text processing:

  • default_text_processor_model_settings:

    • Similar to pii_model_settings, specifies NLP models for text processing.

    • Supports the same languages and configuration.

  • default_textpii_parallel_jobs: (Default: 2) Number of parallel jobs for PII scanning.

  • default_textpii_scan_batch_size: (Default: 1000) Batch size for PII scanning.

Sequence model parameters:

  • default_max_sequence_length: (Default: 10000) Limits the maximum sequence length.

  • default_end_of_sequence_token: (Default: -123456789.98765433) Sentinel value used to mark the end of a sequence.

  • default_long_sequence_threshold: (Default: 10) Threshold for processing long sequences.

Optimization and advanced settings:

  • default_ray_memory_optimization: (Default: true) Toggles optimization for memory usage during parallel processing.

  • default_fast_executemany: (Default: false) Enables fast execution for bulk inserts (if supported).

  • default_drop_indexes: (Default: false) Controls whether indexes are dropped during data generation.

Other:

  • default_locale: (Default: "en") Sets the default locale for language-based processing.

  • default_order_by_nr_columns: (Default: [3, 1]) Defines the order of processing by column numbers.
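For reference, the scalar defaults listed above can be collected in one place. The sketch below is not the product's storage format: it is a plain Python dictionary holding the documented default values, plus a hypothetical helper `apply_overrides` that rejects unknown setting names. The nested model settings (pii_model_settings and default_text_processor_model_settings) are left out because their full structure is not shown here.

```python
# Documented defaults, copied from the lists above.
DEFAULTS = {
    "seed_value": 42,
    "use_seed": False,
    "initialization_mode": "SCRATCH",
    "key_generation_method": "duplicate",
    "n_parallel_pipeline_processes": 1,
    "default_n_training_rows": 100000,
    "default_sample_noise_ratio": 0.001,
    "default_cardinality_threshold": 10,
    "default_rare_category_replacement": "*",
    "default_clip_threshold": 0,
    "default_textpii_parallel_jobs": 2,
    "default_textpii_scan_batch_size": 1000,
    "default_max_sequence_length": 10000,
    "default_end_of_sequence_token": -123456789.98765433,
    "default_long_sequence_threshold": 10,
    "default_ray_memory_optimization": True,
    "default_fast_executemany": False,
    "default_drop_indexes": False,
    "default_locale": "en",
    "default_order_by_nr_columns": [3, 1],
}


def apply_overrides(overrides):
    """Return a copy of DEFAULTS with overrides applied (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Fail loudly on misspelled names instead of silently ignoring them.
        raise KeyError(f"Unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


settings = apply_overrides({"use_seed": True, "seed_value": 7})
print(settings["use_seed"], settings["seed_value"])
```

Checking names before merging catches typos such as `use_sed` early, which is useful when only a handful of values differ from the defaults.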
