9. Large workloads

Last updated 22 days ago

Large workloads

Speeding up data generation jobs and reducing memory footprint

Working with large databases can significantly impact the performance and success of your synthetic data generation jobs. These tips will help you configure your workspace for large workloads by minimizing memory consumption and optimizing execution speed.


Lowering memory footprint

To reduce memory usage and avoid potential timeouts or job failures, consider these strategies:

  • Decrease parallel connections: Lower the number of concurrent connections to reduce memory usage.

  • Decrease read and write batch size: Smaller batches consume less memory per operation.

  • Limit free text PII detection: This is a resource-intensive process. Only enable it when absolutely necessary.

  • Reduce the number of training rows (AI synthesis only): Limiting the training data size speeds up processing and conserves resources.
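
The effect of the first two settings can be approximated with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that each connection holds at most one batch in memory at a time, and the function name, the setting values, and the average row size are hypothetical rather than taken from Syntho.

```python
def estimate_peak_buffer_bytes(parallel_connections: int,
                               batch_size: int,
                               avg_row_bytes: int) -> int:
    """Rough upper bound on the bytes buffered at once.

    Assumption (not from the Syntho documentation): every connection
    holds one full batch of rows in memory at the same time.
    """
    if min(parallel_connections, batch_size, avg_row_bytes) < 1:
        raise ValueError("all arguments must be positive integers")
    return parallel_connections * batch_size * avg_row_bytes


# Hypothetical settings: 8 connections, 100,000-row batches, 500-byte rows.
baseline = estimate_peak_buffer_bytes(8, 100_000, 500)

# Fewer connections and smaller batches for the same table.
reduced = estimate_peak_buffer_bytes(2, 25_000, 500)

print(f"baseline: {baseline / 1e6:.0f} MB, reduced: {reduced / 1e6:.0f} MB")
```

Under this simple model, cutting both settings to a quarter shrinks the buffered data by a factor of 16. Real memory use also depends on the generators in use and on the database driver.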


Speeding up data generation jobs

To accelerate data generation for large-scale datasets, apply the following optimizations:

  • Increase parallel connections: More connections can speed up data reading and writing through parallel execution.

  • Enable schema-independent scheduling: By removing constraints in the destination schema, Syntho can parallelize processing based on the number of records instead of schema dependencies.

  • Write to Parquet instead of a database: Writing directly to a database is often slower. When dealing with very large datasets, consider exporting to efficient columnar file formats like Parquet.
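
To see why removing schema constraints allows more parallelism, compare two ways of splitting the same job. This is a conceptual sketch and not Syntho's actual scheduler: the table names, row counts, and both helper functions are hypothetical. With foreign keys in place, a table can only start after the tables it references. Without them, work can be divided purely by record count.

```python
from graphlib import TopologicalSorter


def dependency_waves(foreign_keys: dict[str, set[str]]) -> list[list[str]]:
    """Group tables into waves; a wave can start only after the previous one."""
    sorter = TopologicalSorter(foreign_keys)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tables whose parents are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


def record_chunks(row_counts: dict[str, int],
                  chunk_size: int) -> list[tuple[str, int, int]]:
    """Split every table into (table, start_row, end_row) work items."""
    chunks = []
    for table, rows in row_counts.items():
        for start in range(0, rows, chunk_size):
            chunks.append((table, start, min(start + chunk_size, rows)))
    return chunks


# Hypothetical schema: each table maps to the tables it references.
foreign_keys = {
    "customers": set(),
    "orders": {"customers"},
    "order_items": {"orders"},
}
row_counts = {"customers": 1_000, "orders": 2_500, "order_items": 4_000}

waves = dependency_waves(foreign_keys)
chunks = record_chunks(row_counts, chunk_size=1_000)

print(len(waves), "sequential waves with schema dependencies")
print(len(chunks), "independent chunks when scheduling by record count")
```

In this example the dependency chain forces three sequential waves of one table each, while record-based splitting yields eight work items that can all run in parallel.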


Best practice

Always aim to use the minimal viable dataset to validate your configurations before executing large jobs. Scaling up becomes much easier and more stable when you're confident in your setup.
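
One way to apply this advice is to plan a few progressively larger test runs before the full job. The sketch below is only an illustration: the helper name, the starting size of 1,000 rows, and the growth factor of 10 are assumptions, not Syntho settings.

```python
def scale_up_plan(total_rows: int,
                  start_rows: int = 1_000,
                  factor: int = 10) -> list[int]:
    """Return row counts for successive validation runs, ending at the full size."""
    if total_rows < 1 or start_rows < 1 or factor < 2:
        raise ValueError("total_rows and start_rows must be >= 1, factor >= 2")
    sizes = []
    size = start_rows
    while size < total_rows:
        sizes.append(size)  # a small run to validate the configuration
        size *= factor
    sizes.append(total_rows)  # the final, full-size run
    return sizes


# Hypothetical table with 2.5 million rows.
plan = scale_up_plan(2_500_000)
print(plan)
```

Each step in the plan is a chance to check output quality, runtime, and memory use before committing to the next, larger run.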
