# Sample datasets

To give you practical examples for testing and analytics, we have selected sample datasets from well-known public repositories. For analytics, there is a **single-table dataset**; for testing, you can use a **multi-table dataset**. Each one is a practical starting point for exploring Syntho's features:

## **Census dataset**

* **Use Case**: Ideal for analytics and AI model training.
* **Description**: Contains demographic information, including age, education, occupation, and income classification.
* **Source**: [UCI Machine Learning Repository - Adult Dataset](https://archive.ics.uci.edu/dataset/2/adult).

<figure><img src="/files/lBPMSZjfonKF290PdfnF" alt=""><figcaption><p>A screenshot of the census dataset</p></figcaption></figure>

Click the link below to download the `.csv` file.

{% file src="/files/RO3muY2hNLcw74HNwQnv" %}
Census dataset
{% endfile %}
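
After downloading, a quick look at the column names and row count can confirm that the file is intact. The following is a minimal sketch using only the Python standard library. The file name `census.csv` and the helper `summarize_csv` are assumptions for illustration, so substitute the actual path of your downloaded file.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row holds the column names
        row_count = sum(1 for _ in reader)  # every remaining row is counted
    return header, row_count


if __name__ == "__main__":
    # Hypothetical file name: replace with the path of the downloaded file.
    csv_path = Path("census.csv")
    if csv_path.exists():
        columns, rows = summarize_csv(csv_path)
        print(f"{rows} rows, {len(columns)} columns: {columns}")
    else:
        print(f"{csv_path} not found; download the dataset first.")
```

The sketch assumes the file is UTF-8 encoded and that its first row is a header.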

## **COVID-19 dataset**

* **Use Case**: Useful for testing synthetic data generation on multi-table healthcare-related datasets.
* **Description**: Includes tables such as patients, conditions, and encounters, simulated for COVID-19 scenarios.
* **Source**: [Synthea COVID Patients Dataset](https://synthea.mitre.org/downloads).

<figure><img src="/files/agUZXlWk23tHQOcPN7e2" alt=""><figcaption><p>A screenshot of the patients table</p></figcaption></figure>

Click the link below to download a `.zip` file containing 10k COVID-19 patient records in CSV format. To download the version with 100k patient records, click [here](https://mitre.box.com/shared/static/wk3560f962ozlg7sd2oj1zxk73ayqvm0.zip).

{% file src="/files/ks7hURZnT9B8eyBrk8g7" %}
COVID-19 dataset with 10k records
{% endfile %}
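
Because this download is an archive of several CSV tables, it can help to list the tables and their columns before importing them. The following is a minimal sketch using only the Python standard library. The archive name `covid19.zip` and the helper `list_csv_tables` are assumptions for illustration, and the actual file names inside the archive may differ from what you expect.

```python
import csv
import io
import zipfile
from pathlib import Path


def list_csv_tables(zip_source):
    """Map each CSV member of a zip archive to its list of column names."""
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # Only the header row is read, so large tables stay cheap.
                tables[name] = next(csv.reader(text), [])
    return tables


if __name__ == "__main__":
    # Hypothetical archive name: replace with the path of the downloaded file.
    zip_path = Path("covid19.zip")
    if zip_path.exists():
        for table, columns in list_csv_tables(zip_path).items():
            print(f"{table}: {columns}")
    else:
        print(f"{zip_path} not found; download the dataset first.")
```

Reading only the first row of each member keeps the check fast even for the larger 100k version of the archive.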

## **Baseball dataset**

* **Use Case**: Suitable for analytics and relational dataset exploration.
* **Description**: Features player statistics and seasonal performance data.
* **Source**: [Lahman Baseball Dataset](https://lahman.r-forge.r-project.org/).

<figure><img src="/files/NSKhBlgyJtuwA5g90PGp" alt=""><figcaption><p>A screenshot of the players table</p></figcaption></figure>

<figure><img src="/files/IpRhkjCn4Xxt6zPONKlh" alt=""><figcaption><p>A screenshot of the seasons table</p></figcaption></figure>

Click the link below to download the `.zip` file.

{% file src="/files/mIABrc9U4TcMf1OdDoEt" %}
Baseball dataset
{% endfile %}


