Importing Data into Databricks

Once your synthetic data is written as Parquet files to a storage location (Local Filesystem, Azure Data Lake Storage (ADLS), or Amazon S3), follow these steps to import it back into Databricks:

  1. Access Databricks Workspace: Go to your Databricks workspace and navigate to the Data tab.

  2. Select Your Data Source:

    • For ADLS, select Azure Data Lake.

    • For Amazon S3, choose Amazon S3.

    • If using the Local Filesystem, upload your files to a cloud storage service like ADLS or S3 first.

  3. Mount the Storage: Mount your cloud storage (ADLS or S3) to Databricks, following the Databricks mounting documentation for your storage provider; a hedged notebook sketch is included after these steps.

  4. Read the Parquet Files: Use the Databricks Data tab or a notebook to load the Parquet files into a DataFrame (see the notebook sketch after these steps). For details, check the Databricks guide on reading files.

  5. Create or Register a Table: Use Databricks SQL commands or the user interface to create a temporary or permanent table from the loaded data. Parquet files generated by Syntho can be registered with standard Databricks SQL commands. For example:

CREATE TABLE example_table 
USING PARQUET 
LOCATION 'dbfs:/FileStore/tables/example.parquet';

This command creates an external table over the Parquet files at the specified location; the data is referenced in place rather than copied. Refer to the Databricks documentation for more information on managing tables.
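For illustration, the snippet below sketches one way to mount an S3 bucket from a notebook (step 3). The bucket name and mount point are placeholders, and it assumes the cluster already has read access to the bucket (for example via an instance profile); mounting ADLS instead requires the additional OAuth settings described in the Databricks mounting documentation.

# Minimal sketch, assuming a hypothetical bucket "my-synthetic-data" that holds
# the Parquet output and a cluster that already has read access to it.
dbutils.fs.mount(
    source="s3a://my-synthetic-data",      # placeholder bucket name
    mount_point="/mnt/synthetic-data"      # placeholder mount point in DBFS
)

# Confirm the Parquet files are visible under the mount point.
display(dbutils.fs.ls("/mnt/synthetic-data"))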
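Similarly, the following notebook sketch covers steps 4 and 5 in Python: it reads the mounted Parquet files into a DataFrame and registers the data for SQL access. The paths and table names are placeholders, not Syntho defaults.

# Minimal sketch, assuming the storage is mounted at /mnt/synthetic-data and the
# Parquet output for one table lives under a folder named "example".
df = spark.read.parquet("/mnt/synthetic-data/example")   # placeholder path

df.printSchema()          # inspect the inferred schema
print(df.count())         # quick row-count sanity check

# Register the data for SQL access: a temporary view for the current session,
# or a permanent table written to the metastore.
df.createOrReplaceTempView("example_view")
df.write.mode("overwrite").saveAsTable("example_table_managed")

# Query the registered table to verify the import.
spark.sql("SELECT * FROM example_table_managed LIMIT 10").show()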
