Amazon Simple Storage Service (S3)


Destination only

This connector can only be used as a destination for writing your generated data.

  • Supported File Types: Parquet and ORC

  • Supported Partitioning: Horizontal partitioning based on the write batch size (each batch is written to a separate file). See the Output format section below for an example of the file output structure.

Before you begin

Before you begin, gather this connection information:

  • The connection details for your S3 bucket: the bucket name, region, port, AWS access key ID, AWS secret access key, and the prefix under which Syntho should write the generated files.

File formats

Supported file formats include:

  • Parquet

  • ORC

Output format

Syntho's S3 output connector will write all generated data to files as follows:

  • Each generated table is written to one or more Parquet files using the following naming format: {schema_name}-{table_name}_part_{part_number}.parquet

  • The number of rows in a single Parquet file (part) is determined by the generation batch size (batch_generate). All Parquet parts of a single table are stored in a directory dedicated to that table.

  • Each folder name will use the following format:

    {schema_name}.{table_name}
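For illustration, the layout below shows what the output could look like for a hypothetical bucket, prefix, and two tables (public.customers and public.orders), each written in two parts. The bucket name, prefix, table names, and part numbering are placeholders; your actual output depends on your workspace configuration and batch size.

```
s3://my-bucket/my-prefix/
├── public.customers/
│   ├── public-customers_part_0.parquet
│   └── public-customers_part_1.parquet
└── public.orders/
    ├── public-orders_part_0.parquet
    └── public-orders_part_1.parquet
```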

Connect and set up the workspace

Launch Syntho and select Connect to a database, or under Create workspace > Destination Database, select S3. For a complete list of data connections, select More under From database. Then do the following:

  1. Enter the bucket name.

  2. Enter the region name.

  3. Enter the port number.

  4. Enter the AWS access key id.

  5. Enter the AWS secret access key.

  6. Enter the prefix under which the generated files should be written. If Syntho can't make the connection, verify that your credentials and bucket details are correct. If you still can't connect, the S3 endpoint may be unreachable from your network; contact your network administrator or database administrator. A quick way to check connectivity outside of Syntho is shown below.
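If you want to confirm that the bucket, region, and credentials work before configuring the workspace, a minimal check with the AWS SDK for Python (boto3) can help. This is a sketch; the bucket name, region, prefix, and key values are placeholders, and boto3 must be installed separately.

```python
# Minimal connectivity check for the S3 destination (illustrative; all values are placeholders).
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    region_name="eu-west-1",          # the region you will enter in Syntho
    aws_access_key_id="AKIA...",      # the AWS access key ID
    aws_secret_access_key="...",      # the AWS secret access key
)

try:
    # Confirms the bucket exists and the credentials can access it.
    s3.head_bucket(Bucket="my-bucket")
    # Lists a few objects under the prefix Syntho will write to (an empty result is fine).
    response = s3.list_objects_v2(Bucket="my-bucket", Prefix="my-prefix/", MaxKeys=5)
    print("Connection OK, objects under prefix:", response.get("KeyCount", 0))
except ClientError as exc:
    print("Connection failed:", exc)
```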

Supported data types

  • The supported data types for ORC files are specified in the Apache Arrow documentation.

| Logical type | Mapped Arrow type |
| --- | --- |
| BOOLEAN | Boolean |
| BYTE | Int8 |
| SHORT | Int16 |
| INT | Int32 |
| LONG | Int64 |
| FLOAT | Float32 |
| DOUBLE | Float64 |
| STRING | String/LargeString |
| BINARY | Binary/LargeBinary/FixedSizeBinary |
| TIMESTAMP | Timestamp/Date64 |
| TIMESTAMP_INSTANT | Timestamp |
| LIST | List/LargeList/FixedSizeList |
| MAP | Map |
| STRUCT | Struct |
| UNION | SparseUnion/DenseUnion |
| DECIMAL | Decimal128/Decimal256 |
| DATE | Date32 |
| VARCHAR | String |
| CHAR | String |

Errors can occur during data conversion when writing to ORC files if unsupported data types are involved.
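To see how these logical types end up in a generated file, you can inspect the Arrow schema of a written part with pyarrow. The file names below are placeholders, and pyarrow must be installed; this is only an illustration of how to check the mapping, not part of the connector itself.

```python
# Inspect the Arrow schema of generated part files (illustrative; paths are placeholders).
import pyarrow.orc as orc
import pyarrow.parquet as pq

# For an ORC part: a LONG column shows up as int64, a STRING column as string, etc.
orc_table = orc.read_table("public-customers_part_0.orc")
print(orc_table.schema)

# For a Parquet part:
pq_table = pq.read_table("public-customers_part_0.parquet")
print(pq_table.schema)
```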

Limitations & considerations

Contact your Syntho representative to discuss possible limitations of this connector.

  • For ORC files, columns of type Char, String, or Varchar that contain only None values are written to the destination as the string "None" instead of null values, as illustrated in the example below.
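If you read such an ORC file back and want real nulls instead of the literal string "None", a small post-processing step can restore them. This is a sketch using pandas and pyarrow; the file path and the choice to treat all string-typed columns are assumptions for illustration.

```python
# Replace literal "None" strings with real nulls after reading an affected ORC file
# (illustrative; the path is a placeholder).
import pandas as pd
import pyarrow.orc as orc

df = orc.read_table("public-customers_part_0.orc").to_pandas()

# Only string-typed columns are affected by this quirk.
string_columns = df.select_dtypes(include=["object", "string"]).columns
df[string_columns] = df[string_columns].replace("None", pd.NA)
```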
