Importing Data into Databricks

Once your synthetic data is written as Parquet files to a storage location (Local Filesystem, Azure Data Lake Storage (ADLS), or Amazon S3), follow these steps to import it back into Databricks:

  1. Access Databricks Workspace: Go to your Databricks workspace and navigate to the Data tab.

  2. Select Your Data Source:

    • For ADLS, select Azure Data Lake.

    • For Amazon S3, choose Amazon S3.

    • If using the Local Filesystem, upload your files to a cloud storage service like ADLS or S3 first.

  3. Mount the Storage: Mount your cloud storage (ADLS or S3) to Databricks, following the Databricks mounting documentation for your storage provider; a hedged notebook sketch is included after these steps.

  4. Read the Parquet Files: Use the Databricks Data tab or a notebook to load the Parquet files into a DataFrame (see the notebook sketch after these steps). For details, check the Databricks guide on reading files.

  5. Create or Register a Table: Use Databricks SQL commands or the user interface to create a temporary or permanent table from the loaded data. Parquet files generated by Syntho can be registered with standard Databricks SQL commands. For example:

CREATE TABLE example_table 
USING PARQUET 
LOCATION 'dbfs:/FileStore/tables/example.parquet';

This command creates an external table over the Parquet files at the specified location; the data is referenced in place rather than copied. Refer to the Databricks documentation for more information on managing tables.
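For illustration, the snippet below sketches one way to mount an S3 bucket from a notebook (step 3). The bucket name and mount point are placeholders, and it assumes the cluster already has read access to the bucket (for example via an instance profile); mounting ADLS instead requires the additional OAuth settings described in the Databricks mounting documentation.

# Minimal sketch, assuming a hypothetical bucket "my-synthetic-data" that holds
# the Parquet output and a cluster that already has read access to it.
dbutils.fs.mount(
    source="s3a://my-synthetic-data",      # placeholder bucket name
    mount_point="/mnt/synthetic-data"      # placeholder mount point in DBFS
)

# Confirm the Parquet files are visible under the mount point.
display(dbutils.fs.ls("/mnt/synthetic-data"))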
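Similarly, the following notebook sketch covers steps 4 and 5 in Python: it reads the mounted Parquet files into a DataFrame and registers the data for SQL access. The paths and table names are placeholders, not Syntho defaults.

# Minimal sketch, assuming the storage is mounted at /mnt/synthetic-data and the
# Parquet output for one table lives under a folder named "example".
df = spark.read.parquet("/mnt/synthetic-data/example")   # placeholder path

df.printSchema()          # inspect the inferred schema
print(df.count())         # quick row-count sanity check

# Register the data for SQL access: a temporary view for the current session,
# or a permanent table written to the metastore.
df.createOrReplaceTempView("example_view")
df.write.mode("overwrite").saveAsTable("example_table_managed")

# Query the registered table to verify the import.
spark.sql("SELECT * FROM example_table_managed LIMIT 10").show()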
