Amazon Simple Storage Service (S3)
Destination only
This connector can only be used as a destination for writing your generated data.
Supported File Types: Parquet and ORC
Supported Partitioning: Horizontal partitioning based on the write batch size (i.e. each batch is written to a separate file).
Before you begin, gather this connection information:
The connection details for your S3 bucket: bucket name, region name, port number, AWS access key id, AWS secret access key, and prefix.
Supported file type formats include:
Parquet
ORC
Syntho's S3 output connector will write all generated data to files as follows:
Each generated table will be written to a Parquet file in the following format:
{schema_name}-{table_name}_part_{part_number}.parquet
The number of rows in a single Parquet file (part) is defined by the batch_generate size. All Parquet parts of a single table are stored in a directory dedicated to that table.
Each folder name will use the following format:
{schema_name}.{table_name}
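As an illustration of the layout described above, the following Python sketch lists the object keys that one table would produce. The function name `expected_object_keys`, the way the prefix is joined to the folder, and the assumption that part numbers start at 0 are all hypothetical: this page does not state the first part number or how the prefix is applied, so compare against an actual export before relying on them.

```python
import math


def expected_object_keys(schema_name, table_name, row_count, batch_size,
                         prefix="", first_part=0):
    """List the Parquet object keys one table would produce.

    Assumptions (not confirmed by the connector documentation):
    part numbers start at `first_part`, and the optional prefix is
    joined to the table folder with a single "/".
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # One file (part) per write batch; the last part may hold fewer rows.
    part_count = math.ceil(row_count / batch_size)
    folder = f"{schema_name}.{table_name}"
    if prefix:
        folder = f"{prefix.strip('/')}/{folder}"
    return [
        f"{folder}/{schema_name}-{table_name}_part_{first_part + i}.parquet"
        for i in range(part_count)
    ]


# 2500 rows with a batch size of 1000 give three parts.
keys = expected_object_keys("public", "customers", row_count=2500, batch_size=1000)
for key in keys:
    print(key)
```

With these assumed inputs the sketch prints three keys under the `public.customers` folder, the last of which would hold the remaining 500 rows.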
Launch Syntho and select Connect to a database, or under Create workspace > Destination Database, select S3. For a complete list of data connections, select More under From database. Then do the following:
Enter the bucket name.
Enter the region name.
Enter the port number.
Enter the AWS access key id.
Enter the AWS secret access key.
Enter the prefix.
If Syntho can't make the connection, verify that your credentials are correct. If you still can't connect, your computer is having trouble locating the server. Contact your network administrator or database administrator.
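Before contacting an administrator, it can help to rule out obvious mistakes in the values entered above. The sketch below is a hypothetical pre-check, not part of Syntho: the `S3Settings` class and `find_problems` function are invented for illustration. The bucket check covers only the basic S3 naming rule (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit) and does not contact AWS or verify credentials.

```python
import re
from dataclasses import dataclass

# Basic S3 bucket naming rule only; AWS applies further restrictions
# (for example, no adjacent dots) that this pattern does not check.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


@dataclass
class S3Settings:  # hypothetical container, not a Syntho class
    bucket: str
    region: str
    port: int
    access_key_id: str
    secret_access_key: str
    prefix: str = ""


def find_problems(settings):
    """Return a list of readable problems; an empty list means none found."""
    problems = []
    if not _BUCKET_RE.fullmatch(settings.bucket):
        problems.append("bucket name does not follow S3 naming rules")
    if not settings.region.strip():
        problems.append("region name is empty")
    if not 1 <= settings.port <= 65535:
        problems.append("port number is outside 1-65535")
    if not settings.access_key_id.strip():
        problems.append("AWS access key id is empty")
    if not settings.secret_access_key.strip():
        problems.append("AWS secret access key is empty")
    return problems


# Placeholder values: an invalid bucket name and a missing secret key.
example = S3Settings("My_Bucket", "eu-west-1", 443, "AKIAEXAMPLE", "")
print(find_problems(example))
```

An empty result only means that no formatting problem was found; wrong credentials or network issues still surface when Syntho attempts the connection.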
The supported data types for ORC files are specified in the Apache Arrow documentation.
Errors can occur during data conversion when writing to ORC files if unsupported data types are involved.
Contact your Syntho contact person to discuss possible limitations regarding this connector.
For ORC files, columns of type Char, String or Varchar that contain only None values are written to the destination as the string "None" rather than as None.
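If downstream consumers need real nulls, the "None" strings can be converted back after reading the ORC files. The following is a minimal sketch on a plain Python list standing in for one column; `restore_nulls` is a hypothetical helper, and how the column is actually loaded (for example with an ORC reader library) is left out.

```python
def restore_nulls(values):
    """Turn an all-"None" string column back into real nulls.

    Only a column in which every value is the literal string "None" is
    converted, mirroring the ORC behaviour described above. A column
    that legitimately contains only the text "None" would be converted
    too, so apply this only to columns known to be empty at the source.
    """
    if values and all(v == "None" for v in values):
        return [None] * len(values)
    return list(values)


print(restore_nulls(["None", "None", "None"]))   # [None, None, None]
print(restore_nulls(["None", "Alice", "None"]))  # ['None', 'Alice', 'None']
```

The second call leaves the column unchanged because the limitation described above applies only to columns that consist entirely of None values.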
Logical type | Mapped Arrow type |
---|---|
BOOLEAN | Boolean |
BYTE | Int8 |
SHORT | Int16 |
INT | Int32 |
LONG | Int64 |
FLOAT | Float32 |
DOUBLE | Float64 |
STRING | String/LargeString |
BINARY | Binary/LargeBinary/FixedSizeBinary |
TIMESTAMP | Timestamp/Date64 |
TIMESTAMP_INSTANT | Timestamp |
LIST | List/LargeList/FixedSizeList |
MAP | Map |
STRUCT | Struct |
UNION | SparseUnion/DenseUnion |
DECIMAL | Decimal128/Decimal256 |
DATE | Date32 |
VARCHAR | String |
CHAR | String |
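Because conversion errors can occur when unsupported data types are written to ORC, it may be useful to check a table's column types against the mapping above before generating data. The sketch below copies the table into a dictionary and flags columns whose Arrow type name does not appear in it. The `unsupported_columns` helper and the plain-dict schema are hypothetical simplifications, and a type missing from the table is only a hint that the write may fail, not a confirmed error.

```python
# ORC logical type -> Arrow type names, copied from the mapping table above.
ORC_TO_ARROW = {
    "BOOLEAN": ["Boolean"],
    "BYTE": ["Int8"],
    "SHORT": ["Int16"],
    "INT": ["Int32"],
    "LONG": ["Int64"],
    "FLOAT": ["Float32"],
    "DOUBLE": ["Float64"],
    "STRING": ["String", "LargeString"],
    "BINARY": ["Binary", "LargeBinary", "FixedSizeBinary"],
    "TIMESTAMP": ["Timestamp", "Date64"],
    "TIMESTAMP_INSTANT": ["Timestamp"],
    "LIST": ["List", "LargeList", "FixedSizeList"],
    "MAP": ["Map"],
    "STRUCT": ["Struct"],
    "UNION": ["SparseUnion", "DenseUnion"],
    "DECIMAL": ["Decimal128", "Decimal256"],
    "DATE": ["Date32"],
    "VARCHAR": ["String"],
    "CHAR": ["String"],
}

# Every Arrow type name that appears anywhere in the table.
SUPPORTED_ARROW_TYPES = {t for types in ORC_TO_ARROW.values() for t in types}


def unsupported_columns(schema):
    """Return the sorted names of columns whose Arrow type is not in the table.

    `schema` maps column name -> Arrow type name written as in the table
    (for example "Int32"); this plain-dict schema is a simplification.
    """
    return sorted(col for col, arrow_type in schema.items()
                  if arrow_type not in SUPPORTED_ARROW_TYPES)


# "Duration" is used here only as an example of a name absent from the table.
schema = {"id": "Int64", "name": "String", "elapsed": "Duration"}
print(unsupported_columns(schema))  # ['elapsed']
```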