
Configure table settings

On the Table view tab, you can apply several configurations at the table level. In the left panel, the Database section lists all tables in your database. Click Edit to choose which tables to include or exclude.

  • Include – selected tables will be transferred to the destination database.

  • Exclude – selected tables will not be transferred.

Caution: Excluding a table could cause conflicts with foreign key constraints in your destination database.

Hint: To include or exclude several tables at once, click Bulk.
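To make the caution above concrete: a conflict arises when an included table references a table that is excluded, because the referencing rows would point to parent rows that never reach the destination database. The sketch below is a hypothetical check, not part of Syntho. It assumes foreign keys are given as `(child_table, parent_table)` pairs and uses the illustrative helper name `find_fk_conflicts`.

```python
from typing import List, Set, Tuple

# Hypothetical representation (not Syntho's API): each foreign key is
# (child_table, parent_table), i.e. child_table has a column referencing parent_table.
ForeignKey = Tuple[str, str]


def find_fk_conflicts(included: Set[str], foreign_keys: List[ForeignKey]) -> List[ForeignKey]:
    """Return foreign keys whose child table is included but whose parent table is excluded.

    Rows in such a child table would reference parent rows that are never
    written to the destination database, so the FK constraint could be violated.
    """
    return [
        (child, parent)
        for child, parent in foreign_keys
        if child in included and parent not in included
    ]


if __name__ == "__main__":
    fks = [("orders", "customers"), ("order_items", "orders")]
    # "customers" is excluded, so the orders -> customers key is flagged.
    print(find_fk_conflicts({"orders", "order_items"}, fks))
```

Excluding a child table while keeping its parent does not trigger a conflict in this sketch, since no remaining rows reference missing data.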

Adjust the number of rows to generate

By default, Syntho generates the same number of rows in the destination table as in your source table.

To change the number of rows to generate for a table:

  1. Go to the Rows to generate field in the Table settings menu, in the Job settings panel on the right.

  2. Update the field value to the desired number of destination rows.

When you adjust the destination row count, Syntho behaves as follows:

  • For tables that are included:

    • If AI synthesize or mockers are applied, Syntho generates the exact number of rows you specify.

    • If Duplicate is applied to any column, Syntho generates the specified number of rows (n) by duplicating rows from the original table, which contains n_original rows.

      • If n ≤ n_original, the original rows are copied as they are.

      • If n > n_original, all n_original original rows are copied, and the remaining n − n_original rows are randomly sampled (with replacement) from the original rows.

  • For tables that are excluded, Syntho does not generate any rows (since the table is excluded).
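The Duplicate rules above can be sketched in a few lines. This is an illustration of the documented behaviour, not Syntho's implementation: the function name `duplicate_rows` is hypothetical, and for n ≤ n_original it assumes the first n rows are kept, since the documentation only states that original rows are copied as they are.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def duplicate_rows(original: Sequence[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Sketch of the Duplicate behaviour for a requested row count n (hypothetical helper).

    Assumption: when n <= n_original, the first n original rows are kept.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    n_original = len(original)
    if n <= n_original:
        # Copy original rows as they are.
        return list(original[:n])
    if n_original == 0:
        raise ValueError("cannot oversample an empty table")
    rng = random.Random(seed)
    # Copy all n_original rows once, then sample the remaining rows with replacement.
    extra = rng.choices(original, k=n - n_original)
    return list(original) + extra


if __name__ == "__main__":
    rows = ["a", "b", "c"]
    print(duplicate_rows(rows, 2))          # first two original rows
    print(duplicate_rows(rows, 7, seed=1))  # 3 originals followed by 4 sampled rows
```

Because sampling is with replacement, the extra rows can repeat any original row, including the same row several times.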

Considerations for adjusting the number of rows to generate

  • The Rows to generate field is disabled if the table doesn't support oversampling, which can be due to the following:

    • The table's applied key generator method is something other than Generate.

  • If the row count was previously changed and the table no longer supports oversampling, the value reverts to the original row count.

  • Adjusting Rows to generate could cause conflicts with foreign key constraints in your destination database.
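The first two considerations can be modelled as a small state rule. The class `TableRowSettings` and its fields are hypothetical names chosen for illustration, not Syntho's API, and the sketch only models the one documented reason for disabling oversampling (a key generator method other than Generate).

```python
from dataclasses import dataclass


@dataclass
class TableRowSettings:
    """Hypothetical model of a table's row settings (illustration only)."""

    key_generator_method: str  # "Generate" or any other key generator method
    original_row_count: int
    rows_to_generate: int

    @property
    def supports_oversampling(self) -> bool:
        # Documented case: oversampling is unavailable when the key generator
        # method is not Generate. Other possible reasons are not modelled here.
        return self.key_generator_method == "Generate"

    def sync(self) -> None:
        """Revert a previously changed row count if oversampling is no longer supported."""
        if not self.supports_oversampling:
            self.rows_to_generate = self.original_row_count


if __name__ == "__main__":
    settings = TableRowSettings("Generate", original_row_count=100, rows_to_generate=500)
    settings.key_generator_method = "OtherMethod"  # hypothetical non-Generate method
    settings.sync()
    print(settings.rows_to_generate)  # reverted to the original row count
```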

Advanced table settings

Expand Advanced settings under Table settings to view and adjust table-level settings. These settings only apply to columns that use AI synthesize.

You can adjust the following advanced table settings:

  1. Maximum rows used for training: The maximum number of rows used for training. Using fewer rows can speed up the process, but may lower the utility of the synthetic data.

  2. Take random sample:

    • On: uses a random sample of rows for training. Depending on the database, this option can make a data generation job run significantly longer.

    • Off (default): takes the top rows as defined in the database.

  3. Choose Table Model: The generative AI model applied to all columns that use AI synthesize. This lets you manage multiple table models by choosing one of the following options:

    • Single table model

    • Sequence table model

    Note that you can create multiple sequence models, as long as the required foreign key (FK) relationships between the tables are present.
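The interaction between Maximum rows used for training and Take random sample can be sketched as follows. This is an in-memory illustration with a hypothetical function name (`select_training_rows`); the actual job reads rows from the source database, and how the sample is drawn there is not described on this page.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_training_rows(
    rows: Sequence[T],
    max_rows: int,
    take_random_sample: bool = False,
    seed: Optional[int] = None,
) -> List[T]:
    """Pick the rows used for training (hypothetical sketch of the two settings)."""
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    if max_rows >= len(rows):
        # The limit is not reached, so every row is used.
        return list(rows)
    if take_random_sample:
        # On: a random subset of max_rows rows, drawn without replacement.
        return random.Random(seed).sample(list(rows), max_rows)
    # Off (default): the top max_rows rows in the order the database returns them.
    return list(rows[:max_rows])


if __name__ == "__main__":
    rows = list(range(10))
    print(select_training_rows(rows, 3))                                   # top rows
    print(select_training_rows(rows, 3, take_random_sample=True, seed=0))  # random rows
```

With the default (Off), the training set depends on the database's row order, so the top rows may not be representative of the whole table.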

ORDER BY

Note: This setting is only available for Hive.

In the Table Settings panel, a dropdown field allows users to specify which columns should be used in the "ORDER BY" clause. This feature enables users to define a set of columns that ensure the uniqueness of the returned results for a given table. By selecting the appropriate columns, users can achieve deterministic ordering even in the absence of primary keys or indexes.

  • Order By dropdown: Located in the Table Settings panel on the right side of the Table view tab, this dropdown lets users choose the columns for the "ORDER BY" clause.

Steps to configure:

  1. Open the Table settings panel in the Table view tab.

  2. Scroll to find the "ORDER BY" dropdown.

  3. Select the desired columns from the dropdown to define the order.

Example scenario:

  • If a table has no primary key or index and its first column contains duplicates, the application may not order the data consistently. Using the "ORDER BY" dropdown, you can select a combination of columns (e.g., ColumnA, ColumnB, ColumnC) that together provide a unique ordering for the table.
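Before selecting columns in the dropdown, it can help to check that the combination really is unique. The sketch below uses hypothetical helpers (`is_unique_ordering`, `build_order_by`) and sample rows made up for illustration; the clause it builds quotes identifiers with backticks as HiveQL allows, but it is not necessarily the exact query Syntho generates.

```python
from typing import Dict, Sequence


def is_unique_ordering(rows: Sequence[Dict[str, object]], columns: Sequence[str]) -> bool:
    """Return True if the values of `columns` identify every row uniquely."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(set(keys)) == len(keys)


def build_order_by(columns: Sequence[str]) -> str:
    """Build an ORDER BY clause from the selected columns (backtick-quoted, Hive style)."""
    if not columns:
        raise ValueError("select at least one column")
    return "ORDER BY " + ", ".join(f"`{c}`" for c in columns)


if __name__ == "__main__":
    rows = [
        {"ColumnA": 1, "ColumnB": "x", "ColumnC": 10},
        {"ColumnA": 1, "ColumnB": "y", "ColumnC": 10},
        {"ColumnA": 1, "ColumnB": "x", "ColumnC": 20},
    ]
    print(is_unique_ordering(rows, ["ColumnA"]))                        # duplicates in ColumnA
    print(is_unique_ordering(rows, ["ColumnA", "ColumnB", "ColumnC"]))  # unique combination
    print(build_order_by(["ColumnA", "ColumnB", "ColumnC"]))
```

If the selected columns are not unique, rows that tie on every selected column can still come back in a different order between runs.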

Syntho uses efficient data loading mechanisms for application screens and panels, so the interface stays responsive even when the source database contains large amounts of data.
