Databricks
Important
This connector can only be used as a source database. The generated data can be written to Local Filesystem, Azure Data Lake Storage (ADLS) or Amazon Simple Storage Service (S3) as Parquet files.
Before you begin, gather this connection information:
Name and port number of the server that hosts the database you want to connect to
The name of the database that you want to connect to
HTTP path to the data source
Personal Access Token
In Databricks, find your cluster server hostname and HTTP path using the instructions in Construct the JDBC URL on the Databricks website.
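As a rough illustration of how the gathered values fit together, the Python sketch below assembles a Databricks-style JDBC URL from the server hostname, port, and HTTP path. The helper name `build_jdbc_url` and the example values are hypothetical, and the URL layout (`jdbc:databricks://` prefix, `httpPath`, `AuthMech=3`, `UID=token`) is assumed from the token-based pattern in the Databricks JDBC documentation. Syntho only asks for the individual fields, so this is meant for sanity-checking values before entering them, not as a description of how Syntho connects internally.

```python
def build_jdbc_url(server_hostname: str, http_path: str, port: int = 443) -> str:
    """Assemble a Databricks-style JDBC URL (illustrative helper, not part of Syntho)."""
    host = server_hostname.strip()
    # The hostname should be a bare host name, without a scheme or trailing slash.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if not host:
        raise ValueError("server hostname is empty")

    path = http_path.strip()
    if not path:
        raise ValueError("HTTP path is empty")

    # Assumption: AuthMech=3 with UID=token selects personal-access-token
    # authentication; the token itself is supplied separately as the password.
    return f"jdbc:databricks://{host}:{port};httpPath={path};AuthMech=3;UID=token"


# Hypothetical example values; substitute the ones from your own workspace.
url = build_jdbc_url(
    "https://adb-1234567890123456.7.azuredatabricks.net/",
    "/sql/1.0/warehouses/abc123",
)
print(url)
```

Keeping the Personal Access Token out of the assembled string avoids leaking it into logs or terminal history.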
Launch Syntho and select Connect to a database, or under Create workspace, select Databricks. For a complete list of data connections, select More under From database. Then do the following:
Enter the server hostname.
Enter the catalog name.
Enter the database name.
Enter the HTTP path to the data source.
Enter the Personal Access Token. (See Personal Access Tokens on the Databricks website for information on access tokens.)
Select Create Workspace.
If Syntho can't make the connection, verify that your credentials are correct. If you still can't connect, your computer may be having trouble locating or reaching the server. Contact your network administrator or database administrator.
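Before contacting an administrator, it can help to know whether the problem is name resolution or network access. The following Python sketch, which uses only the standard library, first tries to resolve the server hostname and then tries to open a TCP connection. The function name `check_server_reachable` and the default port 443 are assumptions for illustration. It does not validate credentials or the HTTP path.

```python
import socket


def check_server_reachable(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return 'ok', 'dns_failure', or 'connect_failure' for the given host and port."""
    try:
        # Step 1: can this machine resolve the hostname at all?
        addresses = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            # Step 2: can a TCP connection be opened to any resolved address?
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Try the next resolved address, if any.
            continue
    return "connect_failure"
```

A `dns_failure` result suggests the hostname is mistyped or not resolvable from your network, while `connect_failure` points toward a firewall, proxy, or wrong port. Either result is useful detail to pass on to your network or database administrator.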
The table below provides an overview of the supported Databricks versions and their corresponding Apache Spark versions.
Databricks version | Apache Spark version
16.2 | 3.5.0
15.4 LTS | 3.5.0
14.3 LTS | 3.5.0
Note: Version 13 is no longer supported.
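If you want to check a cluster's version against this list in a script, a small lookup is enough. The sketch below copies the versions listed above into a dictionary. The names `SUPPORTED_RUNTIMES` and `spark_version_for` are hypothetical, and matching on the leading version number (ignoring the `LTS` suffix) is an assumption made for convenience.

```python
from typing import Optional

# Supported Databricks versions and their Apache Spark versions, as listed above.
SUPPORTED_RUNTIMES = {
    "16.2": "3.5.0",
    "15.4": "3.5.0",  # 15.4 LTS
    "14.3": "3.5.0",  # 14.3 LTS
}


def spark_version_for(runtime: str) -> Optional[str]:
    """Return the Apache Spark version for a supported Databricks version, else None."""
    parts = runtime.strip().split()
    if not parts:
        return None
    # Compare on the leading version number so "15.4 LTS" and "15.4" both match.
    return SUPPORTED_RUNTIMES.get(parts[0])


print(spark_version_for("15.4 LTS"))
print(spark_version_for("13.3 LTS"))
```

A return value of `None` means the version is not in the supported list, as is the case for any version 13 release.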
The following table summarizes the current support limitations for various data types when using connectors with Databricks. It indicates what's supported per generator type.
INT | discrete
SMALLINT | discrete
TINYINT | discrete
BIGINT | discrete
DECIMAL | continuous
FLOAT | continuous
DOUBLE | continuous
STRING | categorical
BINARY | no active support | False | True | True | True
BOOLEAN | bool | False | False | True
DATE | datetime | False
TIMESTAMP | datetime | False
TIMESTAMP_NTZ | datetime | False
ARRAY | no active support | False | True | True | True
STRUCT | json | False | False | False
MAP | no active support | False | True | True | True
VARIANT | no active support | False | True | True | True
OBJECT | no active support | False | True | True | True
ENUM | not supported | False | False | False | False
Data types labeled "no active support" are not actively supported; however, you may still be able to apply generators (e.g., AI-powered generation, mask, mockers or calculated columns) to these columns. Duplication is fully supported for these types.
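To get a quick picture of how a table's columns would be treated, the type-to-label pairs from the overview above can be encoded in a small lookup. This is a minimal sketch: the names `GENERATOR_TYPE_BY_DATA_TYPE` and `classify_column` are hypothetical, the stripping of type parameters such as `DECIMAL(10,2)` or `ARRAY<INT>` is an assumption about how column types are written, and the True/False flags from the table are not modeled.

```python
# Data type labels copied from the overview above.
GENERATOR_TYPE_BY_DATA_TYPE = {
    "INT": "discrete",
    "SMALLINT": "discrete",
    "TINYINT": "discrete",
    "BIGINT": "discrete",
    "DECIMAL": "continuous",
    "FLOAT": "continuous",
    "DOUBLE": "continuous",
    "STRING": "categorical",
    "BOOLEAN": "bool",
    "DATE": "datetime",
    "TIMESTAMP": "datetime",
    "TIMESTAMP_NTZ": "datetime",
    "STRUCT": "json",
}
NO_ACTIVE_SUPPORT = {"BINARY", "ARRAY", "MAP", "VARIANT", "OBJECT"}
NOT_SUPPORTED = {"ENUM"}


def classify_column(data_type: str) -> str:
    """Return the label listed for a Databricks data type, or 'unknown'."""
    base = data_type.strip().upper()
    # Drop type parameters, e.g. DECIMAL(10,2) -> DECIMAL, ARRAY<INT> -> ARRAY.
    for separator in ("(", "<"):
        base = base.split(separator, 1)[0]
    base = base.strip()

    if base in GENERATOR_TYPE_BY_DATA_TYPE:
        return GENERATOR_TYPE_BY_DATA_TYPE[base]
    if base in NO_ACTIVE_SUPPORT:
        return "no active support"
    if base in NOT_SUPPORTED:
        return "not supported"
    return "unknown"


# Hypothetical column schema used only for demonstration.
schema = {"id": "BIGINT", "price": "DECIMAL(10,2)", "tags": "ARRAY<STRING>"}
for column, data_type in schema.items():
    print(column, "->", classify_column(data_type))
```

Columns that come back as "no active support" are the ones to review by hand, since generators may still apply to them but are not guaranteed to.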