Build a Self-Updating Knowledge Base for Under $10

David Myriel, Machine Learning Engineer · 11 min read
Figure: A dark IDE panel showing a git-style diff of overnight wiki updates: new pages on binary quantization, an updated vendor release timeline, and a contradiction surfaced on Turbopuffer pricing tiers. A small status block reads 'Flush ✓ · Snapshot v412 · Presigned digest 30d'.

I track vector search news for a living, and the field ships fast enough that my browser ended up with 200 unread tabs. Half arXiv papers, half vendor changelogs and HN threads I half-read on my phone and never came back to. Two weeks ago I had to write a one-pager on filtered vector search for a partner call and spent two hours rebuilding context I'd already had: release notes I'd skimmed in March, a paper from February, an HN thread that disagreed with a vendor's own pricing page.

Figure: A close-up of one diff hunk on a wiki vendor page: a competitor flipped binary quantization on behind a feature flag in v1.29 and the wiki picked up the change overnight.

So I built llm-digest, a nightly GitHub Actions cron that reads my feeds, updates a markdown wiki on a Tigris bucket, and posts a single digest URL to Slack. I wake up to the day's reading already done, with notes on what shipped and which one paper to actually click through.

The Karpathy insight

Andrej Karpathy's LLM Wiki gist describes the pattern in a paragraph. You drop raw sources into a sources/ folder, then you point an LLM agent at it with a prompt that tells it how to maintain a wiki. The agent extracts entities and updates a parallel wiki/ folder of plain markdown, with one page per concept, vendor, paper, or person. You read the wiki; the agent writes it.

Compile-time, not query-time

The difference from RAG: synthesis happens once, at ingest time, into a durable artifact you can read. RAG re-derives the answer on every query against raw chunks. The wiki gets denser over time. New sources update old pages, contradictions surface explicitly, and by month three the page on [[binary-quantization]] is a real reference with provenance back to every paper that contributed to it.
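To make "synthesis at ingest time" concrete, here is a minimal sketch of folding one extracted claim into a durable page. In the real pipeline the LLM does this by editing markdown; the `Claim`, `WikiPage`, and `ingestClaim` names and the key/value shape are hypothetical, chosen only to show provenance accumulating and contradictions being recorded instead of overwritten.

```typescript
// Hypothetical types: the post does not show the real wiki schema.
interface Claim {
  topic: string;  // page slug, e.g. "binary-quantization"
  key: string;    // what the claim is about
  value: string;  // what the source says
  source: string; // provenance: which source said it
}

interface WikiPage {
  topic: string;
  facts: Map<string, { value: string; sources: string[] }>;
  contradictions: string[];
}

type Wiki = Map<string, WikiPage>;

// Fold one claim into the wiki once, at ingest time.
function ingestClaim(wiki: Wiki, claim: Claim): void {
  let page = wiki.get(claim.topic);
  if (!page) {
    page = { topic: claim.topic, facts: new Map(), contradictions: [] };
    wiki.set(claim.topic, page);
  }
  const existing = page.facts.get(claim.key);
  if (!existing) {
    // First time we see this fact: record it with its source.
    page.facts.set(claim.key, { value: claim.value, sources: [claim.source] });
  } else if (existing.value === claim.value) {
    // Same fact from another source: the page gets denser, not longer.
    if (existing.sources.indexOf(claim.source) === -1) {
      existing.sources.push(claim.source);
    }
  } else {
    // Disagreement: keep the old value and surface the conflict explicitly.
    page.contradictions.push(
      `${claim.key}: "${existing.value}" (${existing.sources.join(", ")}) vs "${claim.value}" (${claim.source})`
    );
  }
}
```

A query-time system would redo this comparison on every question; here it happens once and the result stays on the page.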

The catch in the gist is that ingestion is manual; you do it when you remember to, which in practice is rarely. My version doesn't wait for me to remember.

llm-digest: the wiki does the reading

A GitHub repo holds the schema, the tool implementations, and a list of RSS feeds, while a Tigris bucket holds the wiki itself. The scheduling layer is three lines of YAML:

# .github/workflows/daily-ingest.yml
on:
  schedule:
    - cron: "0 7 * * *" # 07:00 UTC daily

There's no web app, no vector store, and no custom backend behind any of this. The user-facing interface is a URL in Slack and an Obsidian vault on my laptop synced from the bucket. The 200 unread tabs are at 47 now and falling.

What's in the morning digest

A single page generated at the end of each ingest run, posted to Slack as a presigned URL:

# 2026-05-04 · Daily digest

7 sources ingested · 23 pages updated · 4 created · 1 contradiction.

**Theme:** binary quantization is moving from research to production.

- [[binary-quantization]] — added 2026 production reports section
- [[hnsw]] — a vendor flagged binary quantization in v1.29
- [[turbopuffer]] — pricing tier names contradicted by HN thread

**Recommended:** "BBQ at scale" — clearest single read of the day.

I click into the one or two pages worth reading and get on with my morning.
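In the pipeline the agent writes this page itself, so the following is only a sketch that pins down the digest's shape as a deterministic template. The `DigestInput` fields and `renderDigest` are hypothetical names derived from the sample above, not code from the repo.

```typescript
// Hypothetical input shape, derived from the sample digest.
interface PageNote {
  slug: string; // wiki page slug, rendered as [[slug]]
  note: string; // one-line summary of what changed
}

interface DigestInput {
  date: string; // YYYY-MM-DD
  sourcesIngested: number;
  pagesUpdated: number;
  pagesCreated: number;
  contradictions: number;
  theme: string;
  notes: PageNote[];
  recommended: string;
}

// Same dash the sample digest uses between a page link and its note.
const NOTE_SEPARATOR = " \u2014 ";

function renderDigest(d: DigestInput): string {
  // "1 contradiction" vs "2 contradictions"
  const plural = (n: number, word: string): string =>
    `${n} ${word}${n === 1 ? "" : "s"}`;

  const stats = [
    `${plural(d.sourcesIngested, "source")} ingested`,
    `${plural(d.pagesUpdated, "page")} updated`,
    `${d.pagesCreated} created`,
    plural(d.contradictions, "contradiction"),
  ].join(" · ");

  const lines = [
    `# ${d.date} · Daily digest`,
    "",
    `${stats}.`,
    "",
    `**Theme:** ${d.theme}`,
    "",
    ...d.notes.map((n) => `- [[${n.slug}]]${NOTE_SEPARATOR}${n.note}`),
    "",
    `**Recommended:** ${d.recommended}`,
  ];
  return lines.join("\n");
}
```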

How it actually works

The whole thing is a sandwich: the LLM is on top, the Tigris bucket is on the bottom, and @tigrisdata/agent-shell sits in the middle as a JS-virtual filesystem. Every write the agent makes goes through agent-shell's in-memory buffer; the bucket only sees those writes when the run cleanly reaches flush() at the end. If anything throws on the way there, the buffer is discarded and the bucket is byte-for-byte unchanged. The rollback is a structural property of the runtime, not a pattern I have to maintain in user code.

Diagram, top to bottom:
1. The agent: an Anthropic SDK loop that runs in your Node process and issues writeFile() · readFile() · listDir() tool calls.
2. agent-shell: an in-memory buffer. Every write stages here, not in the bucket yet (wiki/binq.md, wiki/hnsw.md, sources/…, digest.md).
3a. On throw, discard(): the buffer is dropped and the live bucket stays byte-for-byte unchanged.
3b. On clean run, flush(): atomic promote, all or nothing, into the Tigris bucket (wiki/ · sources/ · state.json).

The script at the heart of this is small. Mount the bucket via agent-shell, run a custom Anthropic SDK agent loop with the wiki schema as the system prompt, let the loop call tools that route every read and write through the shell handle. On clean return, flush. On any throw, discard.

// scripts/lib/shell.ts (the shape that matters)
export async function withMountedBucket<T>(
  fn: (shell: WikiBucketHandle) => Promise<T>
): Promise<T> {
  const shell = mountWikiBucket();
  try {
    const result = await fn(shell);
    await shell.flush(); // atomic promote to Tigris
    return result;
  } catch (e) {
    await shell.discard(); // buffer dropped; bucket untouched
    throw e;
  }
}

Nine lines of fn-and-catch is the entire rollback story. There's no separate restore primitive and no scratch prefix in the bucket to garbage-collect. The buffer lives in process memory until flush; if the process exits without flushing, the buffer is gone with it and the live bucket sees nothing.
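The semantics are easy to replay without a bucket. Below is a minimal in-memory sketch of the same buffer-then-flush shape; `StagedHandle` and `withStagedBucket` are hypothetical stand-ins, not agent-shell's API, and a `Map` plays the role of the live bucket. How agent-shell makes the promote atomic against a remote bucket is not shown in this post, so the sketch only models the observable behavior.

```typescript
// Stand-in for the live bucket: path -> contents.
type Bucket = Map<string, string>;

// Hypothetical staged-write handle; NOT agent-shell's real API.
class StagedHandle {
  private staged = new Map<string, string>();
  private live: Bucket;

  constructor(live: Bucket) {
    this.live = live;
  }

  writeFile(path: string, body: string): void {
    this.staged.set(path, body); // stays in process memory
  }

  readFile(path: string): string | undefined {
    // A run sees its own staged writes first, then the live bucket.
    return this.staged.has(path) ? this.staged.get(path) : this.live.get(path);
  }

  flush(): void {
    // Promote every staged write; a real handle would do async I/O here.
    this.staged.forEach((body, path) => this.live.set(path, body));
    this.staged.clear();
  }

  discard(): void {
    this.staged.clear(); // the live bucket was never touched
  }
}

async function withStagedBucket<T>(
  live: Bucket,
  fn: (shell: StagedHandle) => Promise<T>
): Promise<T> {
  const shell = new StagedHandle(live);
  try {
    const result = await fn(shell);
    shell.flush();
    return result;
  } catch (e) {
    shell.discard();
    throw e;
  }
}

// Usage: write a sentinel, throw, then check whether it reached the bucket.
async function sentinelLeaks(): Promise<boolean> {
  const live: Bucket = new Map([["wiki/hnsw.md", "v1"]]);
  try {
    await withStagedBucket(live, async (shell) => {
      shell.writeFile("wiki/sentinel.md", "should never land");
      throw new Error("simulated crash");
    });
  } catch (e) {
    // expected: the wrapper rethrows after discarding the buffer
  }
  return live.has("wiki/sentinel.md");
}
```

If the process dies instead of throwing, the outcome in this model is the same: `staged` is ordinary heap memory and disappears with the process.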

The practical consequence: if a runner crashes at 3am, you wake up to yesterday's wiki, not a half-edited one. The same bucket you went to bed with, plus a Slack notification telling you the run aborted.

Why the SDK and not headless Claude Code

Claude Code in -p mode runs as a separate child process and writes to the OS filesystem. agent-shell's JS-virtual buffer can't see those writes, so the atomicity guarantee only holds if the agent runs in-process. The SDK is the right runner here for that one reason.
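A rough sketch of what "in-process" buys you: every tool call is an ordinary function call that receives the shell handle, so there is no path to disk that bypasses the buffer. The shapes below are loosely modeled on Anthropic tool-use blocks but trimmed; `ModelStep` is a hypothetical stand-in for one SDK call, and the `slug` and `body` input fields and the iteration cap are assumptions. Only the tool names come from the run log.

```typescript
// Trimmed shapes, loosely modeled on tool_use / tool_result blocks.
interface ToolUse {
  id: string;
  name: string;
  input: Record<string, string>;
}
interface ModelTurn {
  stopReason: "tool_use" | "end_turn";
  toolUses: ToolUse[];
}
interface ToolResult {
  toolUseId: string;
  content: string;
}

// Hypothetical stand-in for one model call: takes the latest tool results,
// returns the next turn. A real loop would call the SDK here.
type ModelStep = (results: ToolResult[]) => Promise<ModelTurn>;

// The only surface the tools may touch.
interface ShellLike {
  writeFile(path: string, body: string): void;
  listDir(prefix: string): string[];
}

type ToolHandler = (shell: ShellLike, input: Record<string, string>) => string;

// Tool names match the run log; input field names are assumptions.
const tools: Record<string, ToolHandler> = {
  write_wiki_page: (shell, input) => {
    shell.writeFile(`wiki/${input.slug}.md`, input.body);
    return `wrote wiki/${input.slug}.md`;
  },
  list_wiki: (shell) => shell.listDir("wiki/").join("\n"),
};

// Returns the number of iterations used. Any throw propagates to the caller,
// which is what lets an outer wrapper discard the buffer.
async function runAgentLoop(
  shell: ShellLike,
  step: ModelStep,
  maxIterations = 32 // arbitrary cap for the sketch
): Promise<number> {
  let results: ToolResult[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const turn = await step(results);
    if (turn.stopReason === "end_turn") return i;
    results = turn.toolUses.map((call) => {
      const handler = tools[call.name];
      if (!handler) throw new Error(`unknown tool: ${call.name}`);
      return { toolUseId: call.id, content: handler(shell, call.input) };
    });
  }
  throw new Error("iteration budget exhausted");
}
```

A child process writing to the OS filesystem has no equivalent of the `shell` parameter, which is why its writes would land outside the buffer.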

Cleaning up the input

The agent's fetch_url tool runs the response body through Mozilla Readability (the algorithm behind Firefox Reader View), which strips nav, footers, ads, and comment threads. On real-world URLs from my feeds:

| URL | Raw HTML | After Readability | Reduction |
| --- | --- | --- | --- |
| HN discussion page | 863 t | 93 t | 89.2% |
| arXiv abstract | 12,169 t | 702 t | 94.2% |

Letting Claude eat raw HTML would burn 10–100× more tokens for the same result. This is the difference between a $5+ run and a $0.23 run; the arithmetic shows up in "I ran this last night" below.
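The Reduction column in the table above is just the share of the raw count that Readability removed. A small sketch that recomputes it from the two measured rows, reading the "t" suffix as tokens (an assumption; the table does not spell out the unit):

```typescript
// Counts copied from the table above.
const measurements = [
  { page: "HN discussion page", raw: 863, cleaned: 93 },
  { page: "arXiv abstract", raw: 12169, cleaned: 702 },
];

// Reduction = fraction of the raw count that was stripped, as a percentage.
function reductionPercent(raw: number, cleaned: number): string {
  return ((1 - cleaned / raw) * 100).toFixed(1) + "%";
}

for (const m of measurements) {
  console.log(`${m.page}: ${reductionPercent(m.raw, m.cleaned)}`);
}
// prints "HN discussion page: 89.2%" then "arXiv abstract: 94.2%"
```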

Where Tigris and agent-shell earn their place

The project follows a rule: don't pitch a tool unless it solves a real problem.

The Tigris bucket is the floor. A scheduled job needs durable shared storage that any runner can mount and find the same wiki in. Last week's GitHub Actions runner, this morning's manual workflow_dispatch from my laptop, the rclone mount on my phone for read-side browsing: all looking at the same bytes through the S3-compatible API. Table-stakes work, and Tigris does it without a Tigris-specific code path.

Snapshots cover the failures agent-shell can't. agent-shell's flush makes a single run atomic. It doesn't help when a run "succeeds" but quietly produces a wiki I don't actually want, where the agent drifted from the schema in a way that didn't trip a validator. I might not notice for three days. createBucketSnapshot runs at the top of every ingest, and fork(srcBucket, recovered, { snapshotVersion }) lets me walk back. Spin up a fork pointing at Friday's snapshot, repoint TIGRIS_STORAGE_BUCKET, and the next ingest builds on the recovered version. The verdict in TIGRIS_FEATURES.md is "convenient, not load-bearing" and that's accurate. Most nights it's a no-op. The night I need it, it's a one-line CLI call.

Presigned URLs are the delivery. The digest is the user-facing artifact of this whole pipeline; without delivery the system doesn't do anything for me. getPresignedUrl(path, 30 * 86400) turns the markdown digest into a one-line URL I drop in Slack. The phone opens it without auth, the link expires in 30 days, and the same primitive lets me share a digest with a colleague who isn't on my GitHub or paste a wiki page into a Notion comment without exporting anything.
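For a sense of the delivery step, here is a small sketch that turns an already-presigned URL into a Slack message and states when the link dies. `buildSlackPayload` is a hypothetical helper, not code from the repo; it relies only on the fact that Slack incoming webhooks accept a JSON body with a `text` field. The URL itself would come from the getPresignedUrl call described above.

```typescript
// 30 days in seconds: the TTL passed to getPresignedUrl in the post.
const THIRTY_DAYS_SECONDS = 30 * 86400; // 2,592,000

// Hypothetical helper: builds the JSON body for a Slack incoming webhook.
function buildSlackPayload(
  digestUrl: string,
  issuedAt: Date,
  ttlSeconds: number
): { text: string } {
  const expires = new Date(issuedAt.getTime() + ttlSeconds * 1000);
  const expiryDay = expires.toISOString().slice(0, 10); // YYYY-MM-DD, UTC
  return { text: `Daily digest: ${digestUrl} (link expires ${expiryDay})` };
}

// Usage with a placeholder URL:
const payload = buildSlackPayload(
  "https://example.com/digest",
  new Date("2026-05-05T07:00:00Z"),
  THIRTY_DAYS_SECONDS
);
console.log(JSON.stringify(payload));
```

Putting the expiry date in the message is a design choice of the sketch: a reader who opens an old Slack thread can tell a dead link from a broken one.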

@tigrisdata/agent-shell is what makes "scheduled and unattended" believable. A nightly job is exactly the workload that benefits from atomic-or-nothing writes. The buffer-then-flush pattern is built in; the throw-discards-it semantics come along structurally. Without it you'd own the staging prefix, the runId namespacing, the partial-flush failure mode, and the GC sweep yourself. That's about thirty lines of pattern code that goes away once agent-shell takes responsibility for the buffer.

I ran this last night

End-to-end, against a fresh Tigris bucket, with two URLs from a Hacker News RSS feed. Here's the per-iteration log straight off stderr:

ingest.start
ingest.snapshot snapshotVersion=1778000882992955252 # +2.0s
iter 1 stop=tool_use tools=[fetch_url, fetch_url] in= 512 out= 152
iter 2 stop=tool_use tools=[fetch_url, list_wiki, write_source] in= 7,425 out=1,399
iter 3 stop=tool_use tools=[fetch_url, list_wiki] in= 15,712 out=1,557
iter 4 stop=tool_use tools=[write_source] in= 17,289 out=1,855
iter 5 stop=tool_use tools=[write_wiki_page] in= 17,819 out=2,814
iter 6 stop=tool_use tools=[write_wiki_page] in= 19,088 out=4,295
iter 7 stop=tool_use tools=[write_wiki_page] in= 21,483 out=4,894
iter 8 stop=tool_use tools=[write_wiki_page] in= 22,137 out=5,551
iter 9 stop=tool_use tools=[write_wiki_page] in= 22,850 out=6,130
iter 10 stop=tool_use tools=[write_wiki_page] in= 23,482 out=6,667
iter 11 stop=tool_use tools=[write_wiki_page] in= 24,071 out=7,570
iter 12 stop=tool_use tools=[write_wiki_page] in= 25,562 out=8,049
iter 13 stop=tool_use tools=[list_wiki] in= 27,025 out=8,100
iter 14 stop=tool_use tools=[list_wiki, list_wiki, list_wiki] in= 27,142 out=8,228
iter 15 stop=tool_use tools=[save_digest] in= 27,489 out=9,262
iter 16 stop=end_turn tools=[] in= 28,798 out=9,771
ingest.flushed iterations=16 inputTokens=28,798 outputTokens=9,771
ingest.presigned digestPath=wiki/digest/2026-05-05.md
ingest.done newUrlsCount=2 digestUrl=https://...

Three minutes fifteen seconds, sixteen iterations, eight wiki pages plus two sources plus the digest. About $0.23 in API spend. Atomic flush at the end, state.json updated, presigned URL generated. A separate verification I ran right before this confirmed the discard semantics held: write a sentinel file inside withMountedBucket, throw, re-mount, list the bucket. The sentinel was nowhere to be found and the live tree was byte-for-byte identical to before the run.
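The $0.23 figure can be reproduced from the token totals on the ingest.flushed line. The sketch below takes those totals at face value and assumes rates of $3 per million input tokens and $15 per million output tokens; the post does not name the model or its pricing, so treat both rates as assumptions to swap for your own.

```typescript
// Totals from the ingest.flushed log line.
const inputTokens = 28798;
const outputTokens = 9771;

// ASSUMED rates in USD per million tokens; not stated in the post.
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;

function runCostUsd(inTok: number, outTok: number): number {
  return (inTok * INPUT_USD_PER_MTOK + outTok * OUTPUT_USD_PER_MTOK) / 1e6;
}

const cost = runCostUsd(inputTokens, outputTokens);
// (28,798 * 3 + 9,771 * 15) / 1,000,000 = (86,394 + 146,565) / 1,000,000
console.log(cost.toFixed(2)); // prints "0.23"
```

Under the same assumed rates, output tokens account for roughly 63% of the bill even though there are about a third as many of them.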

The verification artifacts live alongside the code: scripts/experiments/shell-flush.ts confirms a thrown error doesn't promote a sentinel, scripts/experiments/byte-equal-after-throw.ts confirms the bucket listing is identical pre and post. If you fork the repo and run them against your own bucket, you get the same output. The atomicity isn't a footnote in the docs; it's a script you can re-run any time you want to retest the claim against a future version of agent-shell or the SDK.

How I work with it

My week with the wiki has four touchpoints and that's all of them. Mornings I read the digest in Slack on my phone over coffee, then click into one or two pages worth reading in full. About once a week I edit config/feeds.txt if there's a new vendor or paper venue I want tracked. Maybe once a month I open Obsidian, pointed at the bucket via rclone, and browse the graph for connections the agent missed. When I find a URL between scheduled runs that I want tonight's wiki to absorb, I append it to queue.txt in the bucket and the next ingest picks it up alongside the day's RSS.
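The queue touchpoint amounts to a merge and a dedupe before the agent starts. A minimal sketch, assuming state.json keeps a list of already-ingested URLs; the `IngestState` shape and `collectNewUrls` are hypothetical, since the post does not show what state.json contains.

```typescript
// Hypothetical shape: the real state.json layout is not shown in the post.
interface IngestState {
  seenUrls: string[];
}

// Merge today's RSS URLs with the hand-queued ones, dropping anything
// already ingested and any duplicate within the batch.
function collectNewUrls(
  feedUrls: string[],
  queueText: string, // raw contents of queue.txt, one URL per line
  state: IngestState
): string[] {
  const queued = queueText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const seen = new Set(state.seenUrls);
  const fresh: string[] = [];
  for (const url of feedUrls.concat(queued)) {
    if (!seen.has(url)) {
      seen.add(url); // also dedupes within this batch
      fresh.push(url);
    }
  }
  return fresh;
}
```

Feed URLs come first in the result, so a URL that appears in both places is attributed to the feed and the queue entry is silently absorbed.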

That's the entire interaction surface. The wiki is the UI, and the wiki lives in your Obsidian vault.

The FOMO is gone. Not because I'm seeing every paper (I see fewer than before) but because I trust the wiki to surface the ones that matter, and I trust myself to walk away from the ones it summarized for me. When you can answer "what's the current state of binary quantization" in three seconds by clicking a [[wiki-link]], the anxiety goes somewhere quieter.

Build your own LLM digest

The repo is open. Fork it, point it at your feeds, set five secrets, and your wiki starts updating tonight. Your first 5 GB on Tigris are free.