Prepare your sequence data

This section describes how to convert a single-table time series dataset into an entity table-linked table dataset. If your sequence data is already prepared as an entity table and a linked table, you can skip this section.

If your raw data involves a series of events in a single table, you should separate it into an entity table and a linked table. Follow the steps below to achieve this.

For example, the table below contains a series of events (baseball players' information and their statistics for every season).

1. Split single sequential datasets into entity and linked tables

Relocate the event data to another table, ensuring that this new table is connected to the entity table via a foreign key that corresponds to the entity table's primary key. In this setup, each individual or entity listed in the entity table has a corresponding ID in the linked table.

The arrangement of your sequential data is crucial. If your event data is spread across columns, reshape it into rows, where each row describes a unique event.
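
As a rough illustration, the pandas sketch below splits a single sequential table into an entity table of players and a linked table of seasons, keeping the player ID as the foreign key. The column names (player_id, name, birth_year, season, home_runs) are hypothetical and only stand in for your own schema.

```python
import pandas as pd

# Hypothetical single-table dataset: one row per player per season,
# with static player attributes repeated on every row.
df = pd.DataFrame({
    "player_id":  [1, 1, 2, 2, 2],
    "name":       ["A. Doe", "A. Doe", "B. Roe", "B. Roe", "B. Roe"],
    "birth_year": [1990, 1990, 1988, 1988, 1988],
    "season":     [2020, 2021, 2019, 2020, 2021],
    "home_runs":  [12, 15, 7, 9, 11],
})

# Entity table: one row per unique player, static columns only.
players = (
    df[["player_id", "name", "birth_year"]]
    .drop_duplicates(subset="player_id")
    .reset_index(drop=True)
)

# Linked table: the per-season event columns, keeping player_id as a
# foreign key that references players.player_id.
seasons = df[["player_id", "season", "home_runs"]].reset_index(drop=True)

print(players)
print(seasons)
```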

How to Split Data into Entity and Events

Examples of common datasets designed for a range of applications include:

  • Patient journeys, where a table of medical events is linked to individual patients.

  • Various types of sensor readings where an entity table lists sensors, and the linked table records readings associated with those sensors.

  • In e-commerce, synthetic data often originates from purchase datasets where entity tables contain customer information, and linked tables store the purchases made by those customers.

These are chronologically ordered, sequential datasets, where the sequence and timing of events provide important insights.

When organizing your datasets for further processing, adhere to these requirements:

| Entity Table | Linked Table |
| --- | --- |
| Each row represents a unique individual | Multiple rows can correspond to the same individual |
| Must have a unique entity ID (primary key) | Each row should link to a unique ID in the entity table (foreign key) |
| Rows are independent of each other | Multiple rows can be interrelated |
| Contains only static information | Contains only dynamic information; sequences should be time-ordered if possible |
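
These requirements can be sanity-checked programmatically. The sketch below is a minimal, illustrative helper (not part of Syntho) that assumes pandas DataFrames such as the players and seasons tables from the previous sketch.

```python
import pandas as pd

def check_entity_linked(entity: pd.DataFrame, linked: pd.DataFrame,
                        key: str, time_col: str) -> pd.DataFrame:
    """Check the requirements above and return the linked table time-ordered."""
    # Entity table: the primary key must be unique.
    assert entity[key].is_unique, "entity IDs must be unique"
    # Linked table: every foreign key must reference an existing entity.
    assert linked[key].isin(entity[key]).all(), "unknown foreign keys in linked table"
    # Linked table: order events chronologically per entity where possible.
    return linked.sort_values([key, time_col]).reset_index(drop=True)

# Example usage with the hypothetical tables from the previous sketch:
# seasons = check_entity_linked(players, seasons, key="player_id", time_col="season")
```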

2. Transfer all static data to the entity table

Inspect your linked table containing events. If it includes static information describing the entity, this should be moved to the entity table. For instance, consider an e-commerce scenario where each purchase event belongs to specific customers. The customer's email remains the same across various events. It's static and characterizes the customer, not the event. In such a case, the email_address column should be transferred to the entity table.
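
As an illustration of this e-commerce case, the following pandas sketch moves the static email_address column into the entity table and drops it from the linked table. The customers and purchases tables and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical tables: customers (entity) and purchases (linked), where
# the static email_address column was left in the linked table.
customers = pd.DataFrame({"customer_id": [1, 2]})
purchases = pd.DataFrame({
    "customer_id":   [1, 1, 2],
    "email_address": ["a@example.com", "a@example.com", "b@example.com"],
    "amount":        [10.0, 25.5, 8.0],
    "purchased_at":  pd.to_datetime(["2024-01-05", "2024-02-10", "2024-01-20"]),
})

# Take one email per customer (it is static, so any row will do) and
# attach it to the entity table.
emails = purchases.groupby("customer_id", as_index=False)["email_address"].first()
customers = customers.merge(emails, on="customer_id", how="left")

# Drop the static column from the linked table so it holds only
# dynamic, event-level information.
purchases = purchases.drop(columns=["email_address"])
```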

Another example is a baseball players table and a table showing their statistics per season. Here, the baseball players table should serve as the entity table: it has a primary key (player ID), its rows are independent of each other, each row represents a unique individual, and it contains only static information. The seasons table, on the other hand, can have several rows devoted to the same individual, since one player can play in more than one season. Each of its rows links to a unique ID in the entity table (foreign key), and it contains time-ordered dynamic information. See the illustrations below.

Baseball players and their statistics in one table
One table separated into entity and linked tables, showing static (players) and dynamic (seasons) information, respectively
Illustration showing the one-to-many relationship between the players and seasons tables