View and adjust generation settings

The final step in the data generation wizard allows you to adjust the Generation settings.

Advanced generation settings

The following advanced settings can be adjusted for your generation job:

  • Read batch size: The maximum number of rows per batch to read from each table in the source database. The allowed value is an integer (number of rows); the default is 100,000 rows (100k). Increasing the value can improve reading speed at the cost of higher memory usage, and vice versa.

  • Write batch size: The maximum number of rows per batch to insert into each table in the destination database. The allowed value is an integer (number of rows); the default is 100,000 rows (100k). Increasing the value can improve writing speed at the cost of higher memory usage, and vice versa (see the sketch after this list).

  • Maximum number of connections: The maximum number of connections that can be opened to the database while writing. A higher number can speed up the process because it allows more parallel operations. However, if you are reading from a production database, a high number of connections might not be desirable.
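Conceptually, both batch sizes bound how many rows are held in memory at once while data is streamed from the source to the destination. The following minimal sketch is only an illustration of that trade-off, not Syntho's internal implementation; it assumes SQLAlchemy-style connections and a hypothetical customers table.

```python
# Illustrative sketch only (not Syntho's implementation): copying a table in
# batches so that memory use is bounded by the configured batch sizes.
# The connection URLs and the "customers" table are hypothetical placeholders.
import sqlalchemy as sa

READ_BATCH_SIZE = 100_000   # rows fetched per round trip from the source
WRITE_BATCH_SIZE = 100_000  # rows inserted per statement into the destination

source = sa.create_engine("postgresql://user:pass@source-host/db")
destination = sa.create_engine("postgresql://user:pass@destination-host/db")

metadata = sa.MetaData()
customers = sa.Table("customers", metadata, autoload_with=source)

with source.connect() as src, destination.begin() as dst:
    # stream_results keeps a server-side cursor open, so only one read batch
    # is materialised in memory at a time.
    result = src.execution_options(stream_results=True).execute(sa.select(customers))
    while batch := result.fetchmany(READ_BATCH_SIZE):
        rows = [dict(row._mapping) for row in batch]
        # Write in chunks: larger chunks mean fewer INSERT statements (faster),
        # but more memory and more bind parameters per statement.
        for i in range(0, len(rows), WRITE_BATCH_SIZE):
            dst.execute(customers.insert(), rows[i : i + WRITE_BATCH_SIZE])
```

In this sketch, a larger read batch size means fewer round trips to the source but more rows held in memory, and a larger write batch size means fewer INSERT statements but more memory and bind parameters per statement.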

Known Issues

Parameter Limit Error

When data is inserted into the destination database, you may encounter an error indicating that the SQL query parameter limit has been exceeded. This happens when a large write batch size pushes the number of bind parameters in a single query past the driver's maximum for one statement. For instance, you might see an error like:

The SQL contains 16970 parameter markers, but 1000010 parameters were supplied

If you experience this error, reduce the Write batch size to stay within the driver's parameter limits. As a starting point, halve the batch size, for example from 1 million rows (1M) to 500,000 rows (500k), and halve it again if the error persists. The exact limit varies per environment and driver.
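A multi-row INSERT uses roughly one bind parameter per column per row, so you can estimate a safe write batch size from the driver's parameter limit and the width of the widest table being written. The helper below is an illustrative sketch, not an official Syntho formula; the 65,535-parameter limit in the example is an assumed figure for demonstration and differs per driver.

```python
# Illustrative rule of thumb (assumption, not an official Syntho formula):
# a multi-row INSERT uses roughly rows_per_batch * column_count bind
# parameters, so the largest safe write batch size is the driver's parameter
# limit divided by the number of columns in the widest table being written.
def max_write_batch_size(driver_parameter_limit: int, column_count: int) -> int:
    """Largest number of rows per INSERT that stays under the driver's limit."""
    return max(1, driver_parameter_limit // column_count)

# Example: assuming a hypothetical driver limit of 65,535 parameters and a
# 20-column table, at most 3,276 rows fit in one write batch.
print(max_write_batch_size(65_535, 20))  # 3276
```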
