DuckDB
DuckDB is an in-process analytical database optimized for fast queries, ease of use, and embedding efficient analytics inside existing applications. Think of it as the SQLite of analytics: the same zero-setup, in-process model, but focused on aggregation and Online Analytical Processing (OLAP) queries. Tigris dynamically distributes your data based on access patterns and handles lots of small files efficiently, enabling additional performance gains when you use it with DuckDB.
Tigris is also compatible with DuckLake, a lakehouse format that uses a standard SQL database for all metadata instead of a complex file-based system, while still storing data in open formats like Parquet in object storage.
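If you want to try DuckLake on Tigris, the rough shape looks like this. This is a sketch, not part of the walkthrough below: metadata.ducklake and the bucket path are placeholders, and the s3 secret configured later in this guide must already exist:

INSTALL ducklake;
-- Table metadata lives in the local DuckLake catalog file;
-- the data files themselves land in your Tigris bucket.
ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 's3://your-bucket/lake/');
USE my_lake;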
Prerequisites
Make sure you have the following information from your Tigris account:
- Tigris Access Key ID
- Tigris Secret Access Key
- Tigris Endpoint (e.g., https://t3.storage.dev)
- Tigris Bucket Name
If you don't yet have access credentials, follow the steps in the Access Key guide to create an access key.
Configure DuckDB to use Tigris
Open the DuckDB command line in your terminal:
$ duckdb
v1.2.2 7c039464e4
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D
Create a new secret with the CREATE SECRET command:
CREATE OR REPLACE SECRET tigris
( TYPE s3
, PROVIDER config
, KEY_ID 'tid_access_key_id'
, SECRET 'tsec_secret_access_key'
, REGION 'auto'
, ENDPOINT 't3.storage.dev'
, URL_STYLE 'vhost'
);
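Recent DuckDB builds typically auto-install and auto-load the httpfs extension the first time you touch an s3:// path; if yours doesn't, run INSTALL httpfs; LOAD httpfs; first. To double-check that the secret registered, list it:

-- Should show the 'tigris' secret with type s3.
SELECT name, type, provider FROM duckdb_secrets();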
Query files in DuckDB
Once you're in, you can query files in your bucket like you normally would; just open them with the s3://<bucketname> prefix. For example, you can import the data from the LinkedIn Data Jobs Dataset by creating a local table like this:
CREATE TABLE IF NOT EXISTS jobs
( id INT64 NOT NULL
, title TEXT NOT NULL
, company TEXT NOT NULL
, "location" TEXT NOT NULL
, link TEXT NOT NULL
, source TEXT NOT NULL
, date_posted DATE NOT NULL
, work_type TEXT NOT NULL
, employment_type TEXT NOT NULL
, "description" TEXT NOT NULL
);
And then importing the data from Tigris:
INSERT INTO jobs
SELECT * FROM
READ_CSV
( 's3://xe-duckdb/clean_jobs.csv'
, header = true
, columns =
{ 'id': 'INT64'
, 'title': 'TEXT'
, 'company': 'TEXT'
, 'location': 'TEXT'
, 'link': 'TEXT'
, 'source': 'TEXT'
, 'date_posted': 'DATE'
, 'work_type': 'TEXT'
, 'employment_type': 'TEXT'
, 'description': 'TEXT'
}
);
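Creating a local table first is optional: DuckDB can also scan the object in place. A quick sanity check against the same file (same bucket and path as above):

-- Count rows directly from Tigris without materializing a table.
SELECT COUNT(*) FROM read_csv('s3://xe-duckdb/clean_jobs.csv', header = true);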
Now you can query it like normal:
SELECT
*
FROM
jobs
WHERE
location LIKE 'Seattle%';
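Since DuckDB is built for OLAP, aggregations over the same table are just as direct. For example, counting postings per work type:

SELECT
  work_type
, COUNT(*) AS postings
FROM
  jobs
GROUP BY
  work_type
ORDER BY
  postings DESC;

You can also write results back to your bucket through the same secret, e.g. COPY jobs TO 's3://your-bucket/jobs.parquet' (FORMAT parquet), where the bucket path is again a placeholder.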