
Data Time Travel with DuckLake and Tigris

Xe Iaso · Katie Schilling · 11 min read

You’ve got your tunes in high gear, your editor is open, and you’re recreating some database tables with your AI agent in an epic duo. Your AI agent asks “run this SQL query?” and you click yes. The tests pass, you make a PR that’s quickly stamped and merged, when suddenly your pager goes off. And again. And again. You’ve just broken the analytics database and everything is on fire. What do you do?

If you’re using most database engines, this is a priority zero “stop the world and restore from backups” shaped problem. This is especially annoying with analytics databases, because those databases are often not just bigger than RAM, they’re bigger than your local disk, and sometimes bigger than any single disk can hold. Worse, if your AI starts renaming columns or combining data in interesting ways, it can be a mess to untangle, sometimes an impossible one. In such cases, this is an XK-class end-of-the-project scenario, which is triple-plus ungood.

However, you read our last post on DuckLake and have been storing your analytics data in Tigris so you can get that sweet, sweet global performance. How do you go back to the past where everything Just Worked? Turns out it’s easy, no DeLorean required. All you have to do is reset the timeline with a couple of simple commands.

DuckLake and you

DuckLake is an analytics data lakehouse that lets you import SQL and NoSQL data so you can run SQL queries over it. One of the really cool parts about DuckLake is that every INSERT, UPDATE, or DELETE on a DuckLake table (and even every schema change) creates a new snapshot of the database that you can roll back to. DuckLake turns your SQL database into an append-only log.

As an example, let’s create a DuckLake database backed by Tigris, insert some data, and see what happens. First, install DuckDB, then set up the DuckLake extension:

INSTALL ducklake;
LOAD ducklake;
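
Note that DuckDB needs S3 credentials to talk to Tigris. If yours aren’t already in your environment, a minimal sketch using DuckDB’s secrets manager looks like this; the key values below are placeholders for your own Tigris access key pair, and the endpoint shown is Tigris’ S3-compatible endpoint:

-- httpfs handles the S3 protocol; DuckDB usually autoloads it,
-- but being explicit doesn't hurt.
INSTALL httpfs;
LOAD httpfs;

-- Placeholder credentials: substitute your own Tigris key pair.
CREATE SECRET tigris
( TYPE S3
, KEY_ID 'tid_your_access_key'
, SECRET 'tsec_your_secret_key'
, ENDPOINT 'fly.storage.tigris.dev'
);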

Then attach to a new DuckLake database in Tigris:

ATTACH 'ducklake:timetravel_demo.ddb'
AS delorean
( DATA_PATH 's3://xe-ducklake/delorean'
);

Note: This creates the DuckLake metadata on the local filesystem, which is fine for demos like this, but for production use we suggest putting your DuckLake metadata in the cloud, backed by a Postgres database or one of the other catalog backends DuckLake supports.
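
As a sketch of what that could look like, here’s the same attach pointed at a hypothetical Postgres catalog; the connection details are made up, so swap in your own instance (see the DuckLake docs for the full list of supported catalogs):

INSTALL postgres;

-- Hypothetical Postgres instance holding the DuckLake catalog;
-- the table data itself still lives in Tigris.
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=db.example.com user=ducklake'
AS delorean
( DATA_PATH 's3://xe-ducklake/delorean'
);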

Now that we have the database, let’s create a simple table and throw some data in it:

CREATE TABLE IF NOT EXISTS delorean.youtube_videos
( id TEXT NOT NULL PRIMARY KEY
, title TEXT NOT NULL
, channel TEXT NOT NULL
);

INSERT INTO delorean.youtube_videos
( id, title, channel )
VALUES
( 'WcSCYzI2peM', 'Delfino Plaza (Super Mario Sunshine) - Mario Kart World', 'SiIvaGunner' )
, ( 'W4AcveHnDzg', 'Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night', 'SiIvaGunner' );

Awesome, let’s see what the bucket looks like:

$ aws s3 ls s3://xe-ducklake/delorean/main/youtube_videos/
2025-06-25 10:28:34 1175 ducklake-0197a77d-905b-7624-8e32-c80c69470e52.parquet

Interesting: DuckLake created a Parquet file under a prefix named after the table we inserted the data into. Let’s see what it looks like:

FROM 's3://xe-ducklake/delorean/main/youtube_videos/ducklake-0197a77d-905b-7624-8e32-c80c69470e52.parquet';
id          | title                                                                                            | channel
WcSCYzI2peM | Delfino Plaza (Super Mario Sunshine) - Mario Kart World                                         | SiIvaGunner
W4AcveHnDzg | Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night | SiIvaGunner

This is the key to how DuckLake works. Every time you write to one of its tables, it puts the rows you added into a Parquet file in object storage. Let’s see the changes we made to the delorean database:

FROM ducklake_snapshots('delorean');
snapshot_id | snapshot_time              | schema_version | changes
0           | 2025-06-25 10:19:32.897-04 | 0              | {schemas_created=[main]}
1           | 2025-06-25 10:27:33.45-04  | 1              | {tables_created=[main.youtube_videos]}
2           | 2025-06-25 10:27:40.828-04 | 2              | {tables_dropped=[1]}
3           | 2025-06-25 10:27:45.229-04 | 3              | {tables_created=[main.youtube_videos]}
4           | 2025-06-25 10:28:33.497-04 | 3              | {tables_inserted_into=[2]}

The first change is creating the main schema. Then you can see that, while I was working on this article, I made the youtube_videos table, messed up the schema, dropped it, recreated it, and finally inserted information about epic tunes into it. To really show off this time travel power, though, let’s delete the data and then add other data into the mix:

DELETE FROM delorean.youtube_videos;

INSERT INTO delorean.youtube_videos
( id, title, channel )
VALUES
( 'jhl5afLEKdo', 'Hatsune Miku World is Mine / ryo(supercell)', 'Hatsune Miku' )
, ( 'sqK-jh4TDXo', 'Machine Love (feat. Kasane Teto)', 'Jamie Page' );

So what happened to the database? Here’s what the table looks like now:

SELECT * FROM delorean.youtube_videos;
id          | title                                       | channel
jhl5afLEKdo | Hatsune Miku World is Mine / ryo(supercell) | Hatsune Miku
sqK-jh4TDXo | Machine Love (feat. Kasane Teto)            | Jamie Page

But if you look in the bucket, the old data is still there:

FROM 's3://xe-ducklake/delorean/main/youtube_videos/*.parquet';
id          | title                                                                                            | channel
WcSCYzI2peM | Delfino Plaza (Super Mario Sunshine) - Mario Kart World                                         | SiIvaGunner
W4AcveHnDzg | Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night | SiIvaGunner
jhl5afLEKdo | Hatsune Miku World is Mine / ryo(supercell)                                                     | Hatsune Miku
sqK-jh4TDXo | Machine Love (feat. Kasane Teto)                                                                | Jamie Page
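
As an aside, if you want to see exactly which rows a snapshot touched, recent DuckLake builds expose a change feed via the ducklake_table_changes function. Treat this as a sketch and check the docs for your version:

-- Row-level changes made in snapshot 5 (the big DELETE),
-- spanning start snapshot 5 through end snapshot 5.
FROM ducklake_table_changes('delorean', 'main', 'youtube_videos', 5, 5);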

How do you get the old data back? Well, for one, we can time travel directly in SQL queries! Let’s look at the database snapshots again and figure out what happened:

FROM ducklake_snapshots('delorean');
snapshot_id | snapshot_time              | schema_version | changes
4           | 2025-06-25 10:28:33.497-04 | 3              | {tables_inserted_into=[2]}
5           | 2025-06-25 10:38:19.959-04 | 3              | {tables_deleted_from=[2]}
6           | 2025-06-25 10:41:48.455-04 | 3              | {tables_inserted_into=[2]}

So it looks like the data was inserted in snapshot 4, deleted in snapshot 5, and the new data came in at snapshot 6. Let’s take the union of the data as of snapshot 4 and the data as of snapshot 6:

SELECT * FROM delorean.youtube_videos AT (VERSION => 4)
UNION ALL
SELECT * FROM delorean.youtube_videos AT (VERSION => 6);
id          | title                                                                                            | channel
WcSCYzI2peM | Delfino Plaza (Super Mario Sunshine) - Mario Kart World                                         | SiIvaGunner
W4AcveHnDzg | Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night | SiIvaGunner
jhl5afLEKdo | Hatsune Miku World is Mine / ryo(supercell)                                                     | Hatsune Miku
sqK-jh4TDXo | Machine Love (feat. Kasane Teto)                                                                | Jamie Page
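
You can also time travel by timestamp instead of snapshot version, which is handy when you know roughly when the pager went off rather than which snapshot you want. The timestamp below is just an illustrative moment between snapshots 4 and 5:

-- Query the table as it existed at a wall-clock moment;
-- DuckLake resolves this to the snapshot current at that time.
SELECT * FROM delorean.youtube_videos
AT (TIMESTAMP => TIMESTAMP '2025-06-25 10:30:00-04');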

You can see the power here, right? The data is still safely stored in your bucket, so deletes don’t matter. Querying snapshot by snapshot can get inconvenient, though, so you can also time travel the entire database at once:

ATTACH 'ducklake:timetravel_demo.ddb'
AS delorean_past
( DATA_PATH 's3://xe-ducklake/delorean'
, SNAPSHOT_VERSION 4
);

SELECT * FROM delorean_past.youtube_videos;
id          | title                                                                                            | channel
WcSCYzI2peM | Delfino Plaza (Super Mario Sunshine) - Mario Kart World                                         | SiIvaGunner
W4AcveHnDzg | Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night | SiIvaGunner

Advanced temporal mechanics

You’re not limited to running individual queries against the past; you can also connect to the whole database as of a given point in time. Combined with making a local fork of the database, this lets you get into advanced temporal mechanics.

Let’s make a local copy of the database to debug the AI agent’s changes. Connect to the database in the past before the AI model messed things up:

ATTACH 'ducklake:timetravel_demo.ddb'
AS delorean_past
( DATA_PATH 's3://xe-ducklake/delorean'
, SNAPSHOT_VERSION 4
);

Cool, then let’s make a local copy of it at that point in time:

ATTACH 'ducklake:timetravel_local_copy.ddb'
AS local_delorean
( DATA_PATH 's3://xe-ducklake/delorean'
);

COPY FROM DATABASE delorean_past TO local_delorean;
DETACH delorean;
DETACH delorean_past;
DETACH local_delorean;

The magic part of that is the COPY FROM DATABASE instruction. It makes a local copy of the database at that snapshot so you can debug the AI agent’s changes and prevent that future timeline from coming to pass the way the current one did. Then, for the cherry on top, attach the local database under the same name as the remote one so that your agent is none the wiser:

ATTACH 'ducklake:timetravel_local_copy.ddb'
AS delorean
( DATA_PATH 's3://xe-ducklake/delorean'
);

Et voilà! We have successfully forked the timeline and can now make any change we want without affecting the main timeline. Test it by running a SQL query:

SELECT * FROM delorean.youtube_videos;
id          | title                                                                                            | channel
WcSCYzI2peM | Delfino Plaza (Super Mario Sunshine) - Mario Kart World                                         | SiIvaGunner
W4AcveHnDzg | Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night | SiIvaGunner

The really cool part is that we did all that without affecting the data in Tigris:

SELECT * FROM 's3://xe-ducklake/delorean/main/youtube_videos/*.parquet';
id          | title                                                                                            | channel
WcSCYzI2peM | Delfino Plaza (Super Mario Sunshine) - Mario Kart World                                         | SiIvaGunner
W4AcveHnDzg | Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night | SiIvaGunner
jhl5afLEKdo | Hatsune Miku World is Mine / ryo(supercell)                                                     | Hatsune Miku
sqK-jh4TDXo | Machine Love (feat. Kasane Teto)                                                                | Jamie Page

When we made a copy of the data lake to hack on locally, all we copied was the schemata of the database and references to objects in Tigris. At some level, the tables don’t physically exist; they’re really just a bunch of rules and references that DuckLake uses to rebuild your database on the fly! And because the tables quack enough like SQL tables, the illusion is maintained.
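
You can even peek at those rules directly. With a local DuckDB metadata file, the catalog tables get attached alongside your data; the __ducklake_metadata_delorean name below follows DuckDB’s default naming for DuckLake metadata catalogs, but double-check yours with SHOW DATABASES:

-- List the Parquet files the catalog currently references,
-- straight out of DuckLake's own bookkeeping tables.
SELECT path, record_count
FROM __ducklake_metadata_delorean.ducklake_data_file;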

Another really cool thing is that every INSERT or UPDATE operation results in discrete Parquet files being put into Tigris:

$ aws s3 ls s3://xe-ducklake/delorean/main/youtube_videos/
2025-06-25 10:28:34 1175 ducklake-0197a77d-905b-7624-8e32-c80c69470e52.parquet
2025-06-25 10:41:49 975 ducklake-0197a789-b1b2-797e-8bd5-a20905d2d73f.parquet

These Parquet files are written once and NEVER updated. This means that as your analytics pipelines or developers all over the world access them, reads are automatically fast and local thanks to Tigris’ global performance.
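
The flip side of write-once files is that old snapshots pile up forever. If you ever need to reclaim space, DuckLake has maintenance calls for expiring snapshots and deleting unreferenced files. Treat the snippet below as a sketch and verify the names against the DuckLake docs for your version, because expiring a snapshot permanently gives up the ability to time travel to it:

-- Expire a snapshot we no longer want to travel back to...
CALL ducklake_expire_snapshots('delorean', versions => [5]);
-- ...then delete the Parquet files that nothing references anymore.
CALL ducklake_cleanup_old_files('delorean', cleanup_all => true);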

Victory achieved!

Now you can turn your analytics pipeline back on and get back to hacking up a storm. Next time you make changes like that with your AI agent, though, test them against a backup of the data lake just in case things go pear-shaped again. Make sure your AI isn’t renaming columns, deleting tables, changing the structure, and so on. Ideally your schema changes should only ever add columns, never remove them. Treat your database tables as a public API.

To make a local backup of your data lake so your agent can break whatever the tokens deem worthy without breaking prod:

ATTACH 'ducklake:timetravel_local_copy.ddb'
AS local_delorean
( DATA_PATH 's3://xe-ducklake/delorean'
);

COPY FROM DATABASE delorean TO local_delorean;
DETACH delorean;
DETACH local_delorean;

ATTACH 'ducklake:timetravel_local_copy.ddb'
AS delorean
( DATA_PATH 's3://xe-ducklake/delorean'
);

And to attach to DuckLake in read-only mode so that the agent can’t break anything even if it wanted to:

ATTACH 'ducklake:timetravel_demo.ddb'
AS stone_tablet
( DATA_PATH 's3://xe-ducklake/delorean'
, READ_ONLY -- <- attaches the ducklake database in read-only mode
);
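
To convince yourself that the stone tablet really is read-only, try writing to it. The exact error message depends on your DuckDB version, but the INSERT below should fail instead of touching your data:

-- This errors out because stone_tablet is attached READ_ONLY.
INSERT INTO stone_tablet.youtube_videos
( id, title, channel )
VALUES
( 'dQw4w9WgXcQ', 'Never Gonna Give You Up', 'Rick Astley' );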

And then you can go back to fearless vibe coding to make your dreams come true in the form of B2B SaaS! All your data will be safe in the cloud and fast to load anywhere in the world, even if you need to time travel a bit to get things working again.

Analytics databases with time travel!

Tigris lets you store your data everywhere, including your analytics data. When you use Tigris and Ducklake together, you get global performance to rival the cloud giants at a fraction of the cost. Query data from the past to bring you to a better future!