# Databricks

Connect a Databricks notebook to a Tigris bucket using serverless compute (the default in Databricks). Tigris is S3-compatible, so you can use `boto3` to list and read files stored in Tigris directly from your notebooks.

![Databricks + Tigris Overview](/docs/assets/images/image1-019c30abac5052a336a5437a2ea4f3e0.png)

## Prerequisites

* Tigris **Access Key ID** and **Secret Access Key** (see the [Access Key guide](/docs/iam/manage-access-key/) if you need to create one; the sketch after this list shows one way to store them in Databricks)
* Tigris **Endpoint**: `https://t3.storage.dev`
* A Tigris **bucket** with data to read
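
Rather than pasting keys into notebook cells, you can store them with Databricks secrets and read them via `dbutils`. A minimal sketch, assuming a hypothetical secret scope named `tigris` with keys `access-key-id` and `secret-access-key`:

```
# Hypothetical scope/key names; create them with the Databricks CLI or API first
access_key_id = dbutils.secrets.get(scope='tigris', key='access-key-id')
secret_access_key = dbutils.secrets.get(scope='tigris', key='secret-access-key')
```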

## 1. Create a notebook

Log in to your Databricks workspace and create a new notebook.

![Create Notebook in Databricks](/docs/assets/images/image2-245dcc5e9d3cedea52275d10f972ce1b.png)

## 2. Install dependencies

```
%pip install boto3 pandas pyarrow s3fs
```

Then restart the Python kernel:

```
%restart_python
```

## 3. Initialize the Tigris client

```
import boto3

# Tigris is S3-compatible, so point boto3's standard S3 client at the Tigris endpoint
tigris_client = boto3.client(
    's3',
    aws_access_key_id='YOUR-ACCESS-KEY-ID',
    aws_secret_access_key='YOUR-SECRET-ACCESS-KEY',
    endpoint_url='https://t3.storage.dev',
    region_name='auto'
)
```

Always set `region_name` to `'auto'`; this value works for all Tigris buckets.

## 4. Verify the connection

List your Tigris buckets to confirm the client is configured correctly:

```
response = tigris_client.list_buckets()
print([bucket['Name'] for bucket in response['Buckets']])
```
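
You can also confirm read access by listing a few objects in a specific bucket. A minimal sketch, using the example bucket name from later in this guide:

```
# List up to 10 objects to confirm the client can read the bucket
response = tigris_client.list_objects_v2(
    Bucket='databricks-test-bucket',
    MaxKeys=10
)
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])
```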

## 5. Read a Parquet file

Download and read a Parquet file from your Tigris bucket:

```
import pandas as pd
import pyarrow.parquet as pq
from io import BytesIO

bucket_name = 'databricks-test-bucket'
key = 'test/easy-00000-of-00002.parquet'

# Download the object into an in-memory buffer
buffer = BytesIO()
tigris_client.download_fileobj(bucket_name, key, buffer)
buffer.seek(0)

# Parse the Parquet data and convert it to a pandas DataFrame
table = pq.read_table(buffer)
df = table.to_pandas()
df.head()
```

![Reading Parquet File in Databricks](/docs/assets/images/image3-7305c3455bda3cdc655fec863cf43fe2.png)

You should see a preview of your Parquet file loaded into a pandas DataFrame:

```
   column1  column2  column3
0  value_1  value_2  value_3
1  value_4  value_5  value_6
...
```

![DataFrame Output](/docs/assets/images/image4-7f5f16c53da036e5edf259d18b7bac46.png)
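
Because `s3fs` is installed, you can also skip the manual buffer and have pandas read the Parquet file straight from an `s3://` URL. A sketch using the same placeholder credentials and the bucket and key from above:

```
import pandas as pd

# s3fs lets pandas treat Tigris like an S3 filesystem
df = pd.read_parquet(
    's3://databricks-test-bucket/test/easy-00000-of-00002.parquet',
    storage_options={
        'key': 'YOUR-ACCESS-KEY-ID',
        'secret': 'YOUR-SECRET-ACCESS-KEY',
        'client_kwargs': {'endpoint_url': 'https://t3.storage.dev'},
    },
)
df.head()
```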
