Sequence model


Last updated 1 month ago


Note: Before using this feature, make sure your data is set up as described in the Prepare your sequence data section.

Syntho can process lists, sequences, and time series when the data follows the entity table-linked table structure.

Syntho's synthetic data sequence models allow you to capture relational information between an entity table and its linked table. Entity tables contain the profiles of data entities, while linked tables reference them.

Entity tables can be identified by their attributes, which describe privacy-sensitive information about data entities, such as names, birthdates, phone numbers, addresses, and more. Linked tables often contain event information about a referenced entity, which can span multiple rows per entity, such as a monthly salary payment.

Consider two tables, Patients and PatientMedications. Here, the Patients table is the entity table and the PatientMedications table is the linked table.
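To make the structure concrete, here is a minimal Python sketch of an entity table and a linked table. The column names and sample values are hypothetical and only illustrate the one-to-many relationship; they are not taken from Syntho's sample datasets.

```python
from collections import defaultdict

# Hypothetical sample rows; column names are illustrative only.
patients = [  # entity table: one row per patient
    {"patient_id": 1, "name": "A. Jansen", "birthdate": "1980-04-12"},
    {"patient_id": 2, "name": "B. de Vries", "birthdate": "1975-09-30"},
]
patient_medications = [  # linked table: zero or more rows per patient
    {"patient_id": 1, "medication": "metformin", "start_date": "2023-01-05"},
    {"patient_id": 1, "medication": "lisinopril", "start_date": "2023-03-17"},
    {"patient_id": 2, "medication": "atorvastatin", "start_date": "2022-11-02"},
]


def group_by_entity(linked_rows, key):
    """Collect linked-table rows into one sequence per entity, keeping row order."""
    sequences = defaultdict(list)
    for row in linked_rows:
        sequences[row[key]].append(row)
    return dict(sequences)


sequences = group_by_entity(patient_medications, "patient_id")
for patient in patients:
    meds = [row["medication"] for row in sequences.get(patient["patient_id"], [])]
    print(patient["patient_id"], meds)
```

Each patient appears once in the entity table, while the linked table holds a sequence of rows that reference the patient through the `patient_id` key.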

To synthesize these tables using Syntho's sequence models:

  1. Syntho starts by synthesizing the Patients table.

  2. Then, it synthesizes the PatientMedications table using the synthetic Patients table as context.
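The order of operations above can be sketched in a few lines of Python. This is a toy stand-in and not Syntho's model: it resamples real values, offers no privacy protection, and uses hypothetical names (`synthesize_entities`, `synthesize_linked`, `age_group`). It only shows that the entity table is produced first and that the linked rows are then generated per synthetic entity, using that entity's attributes as context.

```python
import random


def synthesize_entities(real_entities, n, rng):
    """Stage 1 stand-in: build n entities by resampling each column independently."""
    columns = [c for c in real_entities[0] if c != "id"]
    return [
        {"id": i, **{c: rng.choice(real_entities)[c] for c in columns}}
        for i in range(n)
    ]


def synthesize_linked(synthetic_entities, sequences_by_group, rng):
    """Stage 2 stand-in: pick a sequence for each synthetic entity based on its context."""
    rows = []
    for entity in synthetic_entities:
        candidates = sequences_by_group[entity["age_group"]]
        for event in rng.choice(candidates):
            # The foreign key points at the synthetic entity, not a real one.
            rows.append({"entity_id": entity["id"], "event": event})
    return rows


real_entities = [
    {"id": 10, "age_group": "adult", "sex": "F"},
    {"id": 11, "age_group": "senior", "sex": "M"},
]
sequences_by_group = {
    "adult": [["metformin"], ["ibuprofen", "metformin"]],
    "senior": [["atorvastatin", "lisinopril"]],
}

rng = random.Random(0)
synthetic_patients = synthesize_entities(real_entities, 3, rng)
synthetic_medications = synthesize_linked(synthetic_patients, sequences_by_group, rng)
```

Because the linked rows are generated after the entities, every foreign key in the synthetic linked table refers to a row that exists in the synthetic entity table.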

How to apply Syntho's synthetic data sequence model

To use Syntho's synthetic data sequence models, do the following:

  1. On the Table view tab, turn on Enable sequence modeling.

  2. Select Start generating.

Sequence model parameters

Before starting the generation process, you can adjust the sequence model parameters. Here's an overview:

  • Max sequence length: Sets a cap on sequence lengths, truncating any sequence that exceeds this limit.

  • Rare long sequence protection threshold: Defines a limit for the length of data sequences used in training, adjusting the longest sequences to the length of the Nth sequence.

  • Read batch size: The quantity of rows read from each source table per batch.

  • Write batch size: The quantity of rows inserted into each destination table per batch.

  • N connections: Specifies the number of connections.
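The effect of the length and batch parameters can be illustrated with a short Python sketch. The function names are hypothetical, and the reading of the rare long sequence protection threshold (cap every sequence at the length of the Nth longest sequence) is an assumption based on the description above, not a confirmed description of Syntho's behavior.

```python
def truncate_to_max_length(sequences, max_sequence_length):
    """Max sequence length: cut every sequence down to at most this many rows."""
    return [seq[:max_sequence_length] for seq in sequences]


def cap_rare_long_sequences(sequences, threshold_n):
    """Assumed behavior: cap sequences at the length of the N-th longest one."""
    if threshold_n < 1 or not sequences:
        return list(sequences)
    lengths = sorted((len(seq) for seq in sequences), reverse=True)
    cap = lengths[min(threshold_n, len(lengths)) - 1]
    return [seq[:cap] for seq in sequences]


def read_in_batches(rows, batch_size):
    """Read batch size: yield rows in chunks of at most batch_size."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Five sequences with lengths 2, 3, 3, 4 and one rare outlier of length 50.
sequences = [list(range(n)) for n in (2, 3, 3, 4, 50)]

capped = cap_rare_long_sequences(sequences, threshold_n=2)
print([len(seq) for seq in capped])  # [2, 3, 3, 4, 4]

truncated = truncate_to_max_length(sequences, max_sequence_length=3)
print([len(seq) for seq in truncated])  # [2, 3, 3, 3, 3]

print(list(read_in_batches(list(range(5)), batch_size=2)))  # [[0, 1], [2, 3], [4]]
```

Under this assumed reading, the threshold protects against a handful of unusually long sequences standing out in the training data, while the max sequence length is a hard cap applied to every sequence.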

Limitations & Recommendations

It is important to consider the following when using Syntho's sequence models:

  • Order of rows: Store the rows of your linked table in the correct order. The order is used to train Syntho's generative AI models, so it can lead to higher-quality synthetic data.

  • Resource consumption: This feature is resource-intensive and may slow down your data generation process. Consider reducing your input data or adjusting the sequence model parameters to reduce the time and resources your job needs.

  • 2 tables: To maximize synthetic data utility, Syntho limits its sequence models to 2 tables that are structured according to the entity table-linked table structure.
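For the row-order recommendation, one way to prepare a linked table before loading it is to sort its rows by the entity key and then by an event timestamp. The sketch below assumes hypothetical `patient_id` and `start_date` columns; use whichever key and ordering column your linked table actually has.

```python
from datetime import date

# Hypothetical linked-table rows, stored out of order.
rows = [
    {"patient_id": 2, "medication": "atorvastatin", "start_date": date(2022, 11, 2)},
    {"patient_id": 1, "medication": "lisinopril", "start_date": date(2023, 3, 17)},
    {"patient_id": 1, "medication": "metformin", "start_date": date(2023, 1, 5)},
]

# Sort by entity key first, then chronologically within each entity.
ordered = sorted(rows, key=lambda row: (row["patient_id"], row["start_date"]))

for row in ordered:
    print(row["patient_id"], row["start_date"].isoformat(), row["medication"])
```

After sorting, the rows for each patient are contiguous and appear in chronological order.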
