Importing Data into Databricks
Once your synthetic data is written as Parquet files to a storage location (Local Filesystem, Azure Data Lake Storage (ADLS), or Amazon S3), follow these steps to import it back into Databricks:
Access Databricks Workspace: Go to your Databricks workspace and navigate to the Data tab.
Select Your Data Source:
For ADLS, select Azure Data Lake.
For Amazon S3, choose Amazon S3.
If you used the Local Filesystem, first upload your files to a cloud storage service such as ADLS or S3.
Mount the Storage: Mount your cloud storage (ADLS or S3) to Databricks by following the Databricks mounting documentation for the storage type you use.
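As an illustration, the sketch below mounts an ADLS Gen2 container from a notebook using a service principal whose secret is stored in a Databricks secret scope. Every name (storage account, container, secret scope, application ID, mount point) is a placeholder; adapt it to your environment as described in the mounting documentation.

```python
# Sketch only: mount an ADLS Gen2 container via OAuth with a service principal.
# All identifiers below are placeholders for your own storage account, container,
# Azure AD application, and Databricks secret scope.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<secret-scope>", key="<service-principal-secret>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/syntho",
    extra_configs=configs,
)

# For S3 the pattern is similar, e.g. dbutils.fs.mount("s3a://<bucket>", "/mnt/syntho"),
# with access typically handled through an instance profile attached to the cluster.
```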
Read the Parquet Files: Use the Databricks Data tab or a notebook to load Parquet files into a DataFrame. For details, check the Databricks guide on reading files.
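For example, assuming the storage is mounted at /mnt/syntho as in the previous step and the synthetic Parquet files sit in a synthetic_data folder (both placeholders), a notebook cell can load them into a DataFrame:

```python
# Sketch only: read the synthetic Parquet files from the (placeholder) mount point
# into a Spark DataFrame and take a quick look at the schema and a few rows.
df = spark.read.parquet("/mnt/syntho/synthetic_data/")

df.printSchema()
display(df.limit(10))  # display() is a built-in helper in Databricks notebooks
```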
Create or Register a Table: Use Databricks SQL commands or the user interface to create a temporary or permanent table from the loaded data; Parquet files generated by Syntho can be registered with standard Databricks SQL.
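For example, the sketch below (reusing the placeholder mount point and a placeholder table name) registers a permanent external table over the Parquet location:

```python
# Sketch only: register a permanent table over the Parquet files via a standard
# Databricks SQL command; the table name and location are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS syntho_synthetic_data
    USING PARQUET
    LOCATION '/mnt/syntho/synthetic_data/'
""")

# For a temporary table instead, register the DataFrame from the previous step:
# df.createOrReplaceTempView("syntho_synthetic_data_tmp")
```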
This command creates a table over the specified Parquet location. Refer to the Databricks documentation for more information on managing tables.