
# Build a Self-Updating Knowledge Base for Under $10

David Myriel · May 5, 2026 · 11 min read

[![David Myriel](https://github.com/davidmyriel.png)](https://github.com/davidmyriel)

[David Myriel](https://github.com/davidmyriel)

Machine Learning Engineer

![A dark IDE panel showing a git-style diff of overnight wiki updates: new pages on binary quantization, an updated vendor release timeline, and a contradiction surfaced on Turbopuffer pricing tiers. A small status block reads 'Flush ✓ · Snapshot v412 · Presigned digest 30d'.](/blog/assets/images/hero-image-1703c383f064ce635e3b86a5da198479.webp)

I track vector search news for a living, and the field ships fast enough that my browser ended up with 200 unread tabs. Half arXiv papers, half vendor changelogs and HN threads I half-read on my phone and never came back to. Two weeks ago I had to write a one-pager on filtered vector search for a partner call and spent two hours rebuilding context I'd already had: release notes I'd skimmed in March, a paper from February, an HN thread that disagreed with a vendor's own pricing page.

![A close-up of one diff hunk on a wiki vendor page: a competitor flipped binary quantization on behind a feature flag in v1.29 and the wiki picked up the change overnight.](/blog/assets/images/competition-b27f342926ff62653be88af28afa720b.webp)

So I built [`llm-digest`](https://github.com/davidmyriel/llm-digest), a nightly GitHub Actions cron that reads my feeds, updates a markdown wiki on a [Tigris](https://www.tigrisdata.com/) bucket, and posts a single digest URL to Slack. I wake up to the day's reading already done, with notes on what shipped and which one paper to actually click through.

## The Karpathy insight[​](#the-karpathy-insight "Direct link to The Karpathy insight")

[Andrej Karpathy's LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) describes the pattern in a paragraph. You drop raw sources into a `sources/` folder, then you point an LLM agent at it with a prompt that tells it how to maintain a wiki. The agent extracts entities and updates a parallel `wiki/` folder of plain markdown, with one page per concept, vendor, paper, or person. You read the wiki; the agent writes it.

**Compile-time, not query-time**

The difference from RAG: synthesis happens **once, at ingest time**, into a durable artifact you can read. RAG re-derives the answer on every query against raw chunks. The wiki gets denser over time. New sources update old pages, contradictions surface explicitly, and by month three the page on `[[binary-quantization]]` is a real reference with provenance back to every paper that contributed to it.
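For a sense of where such a page ends up, here's a sketch; the section names and provenance format are illustrative, not something the gist prescribes:

```
# Binary quantization

One-bit-per-dimension compression for embedding vectors. See also [[hnsw]], [[turbopuffer]].

## 2026 production reports

- A vendor shipped it behind a feature flag in v1.29 (see [[hnsw]])

## Provenance

- sources/2026-02-bbq-paper.md
- sources/2026-05-04-bbq-at-scale.md
```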

The catch in the gist is that ingestion is manual; you do it when you remember to, which in practice is rarely. My version doesn't wait for me to remember.

## llm-digest: the wiki does the reading[​](#llm-digest-the-wiki-does-the-reading "Direct link to llm-digest: the wiki does the reading")

A [GitHub repo](https://github.com/davidmyriel/llm-digest) holds the schema, the tool implementations, and a list of RSS feeds, while a Tigris bucket holds the wiki itself. The scheduling layer is three lines of YAML:

```
# .github/workflows/daily-ingest.yml
on:
  schedule:
    - cron: "0 7 * * *" # 07:00 UTC daily
```
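The job that cron triggers is ordinary CI plumbing. A sketch of the rest of the file, with a script path and secret names that are my shorthand rather than a verbatim copy of the repo's:

```
# rest of daily-ingest.yml (sketch; entrypoint path and secret names illustrative)
jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npx tsx scripts/ingest.ts
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          TIGRIS_STORAGE_BUCKET: ${{ secrets.TIGRIS_STORAGE_BUCKET }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```

Those five env entries are the "five secrets" the closing section mentions.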

There's no web app, no vector store, no custom backend behind any of this. The user-facing interface is a URL in Slack and an Obsidian vault on my laptop synced from the bucket, and the 200 unread tabs are at 47 now and falling.

## What's in the morning digest[​](#whats-in-the-morning-digest "Direct link to What's in the morning digest")

A single page generated at the end of each ingest run, posted to Slack as a presigned URL:

```
# 2026-05-04 · Daily digest

7 sources ingested · 23 pages updated · 4 created · 1 contradiction.

**Theme:** binary quantization is moving from research to production.

- [[binary-quantization]] — added 2026 production reports section
- [[hnsw]] — a vendor flagged binary quantization in v1.29
- [[turbopuffer]] — pricing tier names contradicted by HN thread

**Recommended:** "BBQ at scale" — clearest single read of the day.
```

I click into the one or two pages worth reading and get on with my morning.

## How it actually works[​](#how-it-actually-works "Direct link to How it actually works")

The whole thing is a sandwich: the LLM is on top, the Tigris bucket is on the bottom, and [`@tigrisdata/agent-shell`](https://www.npmjs.com/package/@tigrisdata/agent-shell) sits in the middle as a JS-virtual filesystem. Every write the agent makes goes through agent-shell's in-memory buffer; the bucket only sees those writes when the run cleanly reaches `flush()` at the end. If anything throws on the way there, the buffer is discarded and the bucket is byte-for-byte unchanged. The rollback is a structural property of the runtime, not a pattern I have to maintain in user code.

The script at the heart of this is small. Mount the bucket via agent-shell, run a custom [Anthropic SDK](https://docs.anthropic.com/en/api/messages) agent loop with the wiki schema as the system prompt, let the loop call tools that route every read and write through the shell handle. On clean return, flush. On any throw, discard.

```
// scripts/lib/shell.ts (the shape that matters)
export async function withMountedBucket<T>(
  fn: (shell: WikiBucketHandle) => Promise<T>
): Promise<T> {
  const shell = mountWikiBucket();
  try {
    const result = await fn(shell);
    await shell.flush(); // atomic promote to Tigris
    return result;
  } catch (e) {
    await shell.discard(); // buffer dropped; bucket untouched
    throw e;
  }
}
```

Nine lines of fn-and-catch is the entire rollback story. There's no separate restore primitive and no scratch prefix in the bucket to garbage-collect. The buffer lives in process memory until flush; if the process exits without flushing, the buffer is gone with it and the live bucket sees nothing.

The practical consequence: if a runner crashes at 3am, you wake up to yesterday's wiki, not a half-edited one. The same bucket you went to bed with, plus a Slack notification telling you the run aborted.
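The loop itself is the standard Anthropic tool-use dance. A compressed sketch, assuming `withMountedBucket` from above plus a `TOOLS` array, a `WIKI_SCHEMA_PROMPT` string, and a `dispatchTool` router that are stand-ins for the repo's real definitions:

```
// scripts/ingest.ts (sketch; TOOLS, WIKI_SCHEMA_PROMPT, dispatchTool assumed)
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function runAgent(shell: WikiBucketHandle): Promise<void> {
  const messages: Anthropic.Messages.MessageParam[] = [
    { role: "user", content: "Ingest today's feeds and queue.txt." },
  ];
  while (true) {
    const res = await anthropic.messages.create({
      model: "claude-sonnet-4-5", // any tool-capable model
      max_tokens: 8192,
      system: WIKI_SCHEMA_PROMPT, // the wiki schema as system prompt
      tools: TOOLS, // fetch_url, list_wiki, write_source, write_wiki_page, save_digest
      messages,
    });
    if (res.stop_reason !== "tool_use") return; // end_turn, like iteration 16 in the log below
    messages.push({ role: "assistant", content: res.content });
    const results: Anthropic.Messages.ToolResultBlockParam[] = [];
    for (const block of res.content) {
      if (block.type === "tool_use") {
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          // dispatchTool routes every read and write through the shell handle
          content: await dispatchTool(shell, block.name, block.input),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
}
```

The entry point is then `await withMountedBucket(runAgent)`: a throw anywhere in the loop falls through to `discard()`, and the bucket never sees the run.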

### Why the SDK and not headless Claude Code[​](#why-the-sdk-and-not-headless-claude-code "Direct link to Why the SDK and not headless Claude Code")

Claude Code in `-p` mode runs as a separate child process and writes to the OS filesystem. agent-shell's JS-virtual buffer can't see those writes, so the atomicity guarantee only holds if the agent runs in-process. The SDK is the right runner here for that one reason.

### Cleaning up the input[​](#cleaning-up-the-input "Direct link to Cleaning up the input")

The agent's `fetch_url` tool wraps the response body in [Mozilla Readability](https://github.com/mozilla/readability) (the algorithm Firefox Reader View uses). Strips nav, footer, ads, comment threads. On real-world URLs from my feeds:

| URL                | Raw HTML (tokens) | After Readability (tokens) | Reduction |
| ------------------ | ----------------- | -------------------------- | --------- |
| HN discussion page | 863               | 93                         | 89.2%     |
| arXiv abstract     | 12,169            | 702                        | 94.2%     |

Letting Claude eat raw HTML would burn 10–100× more tokens for the same result. This is the difference between a $5+ run and a $0.23 run; the arithmetic shows up in "I ran this last night" below.
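The tool body itself is a few lines. A sketch of the shape, assuming jsdom for DOM construction; the repo's version may differ in details:

```
// fetch_url tool body (sketch; error handling elided)
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

async function fetchUrl(url: string): Promise<string> {
  const html = await (await fetch(url)).text();
  const dom = new JSDOM(html, { url }); // passing url lets relative links resolve
  const article = new Readability(dom.window.document).parse();
  // parse() returns null when no article body is found; fall back to raw text
  return article?.textContent ?? dom.window.document.body.textContent ?? "";
}
```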

## Where Tigris and agent-shell earn their place[​](#where-tigris-and-agent-shell-earn-their-place "Direct link to Where Tigris and agent-shell earn their place")

The project follows a rule: don't pitch a tool unless it solves a real problem.

**The Tigris bucket is the floor.** A scheduled job needs durable shared storage that any runner can mount and find the same wiki in. Last week's GitHub Actions runner, this morning's manual `workflow_dispatch` from my laptop, the `rclone` mount on my phone for read-side browsing: all looking at the same bytes through the S3-compatible API. Table-stakes work, and Tigris does it without a Tigris-specific code path.
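"No Tigris-specific code path" means, concretely, a stock AWS SDK client with the endpoint swapped:

```
// any S3 client works; here, the AWS SDK pointed at Tigris's endpoint
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({
  region: "auto",
  endpoint: "https://fly.storage.tigris.dev", // Tigris's S3-compatible endpoint
});
const digest = await s3.send(
  new GetObjectCommand({
    Bucket: process.env.TIGRIS_STORAGE_BUCKET,
    Key: "wiki/digest/2026-05-05.md",
  })
);
```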

**Snapshots cover the failures agent-shell can't.** agent-shell's flush makes a single run atomic. It doesn't help when a run "succeeds" but quietly produces a wiki I don't actually want, where the agent drifted from the schema in a way that didn't trip a validator. I might not notice for three days. [`createBucketSnapshot`](https://www.tigrisdata.com/docs/buckets/snapshots/) runs at the top of every ingest, and [`fork(srcBucket, recovered, { snapshotVersion })`](https://www.tigrisdata.com/docs/buckets/forking/) lets me walk back. Spin up a fork pointing at Friday's snapshot, repoint `TIGRIS_STORAGE_BUCKET`, and the next ingest builds on the recovered version. The verdict in `TIGRIS_FEATURES.md` is "convenient, not load-bearing" and that's accurate. Most nights it's a no-op. The night I need it, it's a one-line CLI call.
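Spelled out with the two calls named above (argument shapes as the docs show them; the glue around them is a sketch):

```
// createBucketSnapshot and fork come from the Tigris SDK (see the linked docs)
// nightly: snapshot before the agent touches anything
const snapshotVersion = await createBucketSnapshot(bucket);

// the bad-wiki day: fork from Friday's snapshot, then repoint the env var
await fork(bucket, "llm-digest-recovered", { snapshotVersion: fridaySnapshot });
// TIGRIS_STORAGE_BUCKET=llm-digest-recovered → the next ingest builds on the fork
```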

**Presigned URLs are the delivery.** The digest is the user-facing artifact of this whole pipeline; without delivery the system doesn't do anything for me. [`getPresignedUrl(path, 30 * 86400)`](https://www.tigrisdata.com/docs/objects/presigned-urls/) turns the markdown digest into a one-line URL I drop in Slack. The phone opens it without auth, the link expires in 30 days, and the same primitive lets me share a digest with a colleague who isn't on my GitHub or paste a wiki page into a Notion comment without exporting anything.
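The delivery end of a run, with `getPresignedUrl` as above and a stock Slack incoming webhook:

```
// post the digest link to Slack; the webhook URL comes from the repo's secrets
const url = await getPresignedUrl("wiki/digest/2026-05-05.md", 30 * 86400); // 30 days
await fetch(process.env.SLACK_WEBHOOK_URL!, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: `Daily digest ready: ${url}` }),
});
```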

**`@tigrisdata/agent-shell` is what makes "scheduled and unattended" believable.** A nightly job is exactly the workload that benefits from atomic-or-nothing writes. The buffer-then-flush pattern is built in; the throw-discards-it semantics come along structurally. Without it you'd own the staging prefix, the runId namespacing, the partial-flush failure mode, and the GC sweep yourself. That's about thirty lines of pattern code that goes away once agent-shell takes responsibility for the buffer.

## I ran this last night[​](#i-ran-this-last-night "Direct link to I ran this last night")

End-to-end, against a fresh Tigris bucket, with two URLs from a Hacker News RSS feed. Here's the per-iteration log straight off stderr:

```
ingest.start
ingest.snapshot snapshotVersion=1778000882992955252  # +2.0s
iter  1 stop=tool_use tools=[fetch_url, fetch_url]            in=    512  out=  152
iter  2 stop=tool_use tools=[fetch_url, list_wiki, write_source]   in= 7,425  out=1,399
iter  3 stop=tool_use tools=[fetch_url, list_wiki]            in= 15,712  out=1,557
iter  4 stop=tool_use tools=[write_source]                    in= 17,289  out=1,855
iter  5 stop=tool_use tools=[write_wiki_page]                 in= 17,819  out=2,814
iter  6 stop=tool_use tools=[write_wiki_page]                 in= 19,088  out=4,295
iter  7 stop=tool_use tools=[write_wiki_page]                 in= 21,483  out=4,894
iter  8 stop=tool_use tools=[write_wiki_page]                 in= 22,137  out=5,551
iter  9 stop=tool_use tools=[write_wiki_page]                 in= 22,850  out=6,130
iter 10 stop=tool_use tools=[write_wiki_page]                 in= 23,482  out=6,667
iter 11 stop=tool_use tools=[write_wiki_page]                 in= 24,071  out=7,570
iter 12 stop=tool_use tools=[write_wiki_page]                 in= 25,562  out=8,049
iter 13 stop=tool_use tools=[list_wiki]                       in= 27,025  out=8,100
iter 14 stop=tool_use tools=[list_wiki, list_wiki, list_wiki] in= 27,142  out=8,228
iter 15 stop=tool_use tools=[save_digest]                     in= 27,489  out=9,262
iter 16 stop=end_turn  tools=[]                               in= 28,798  out=9,771
ingest.flushed iterations=16 inputTokens=28,798 outputTokens=9,771
ingest.presigned digestPath=wiki/digest/2026-05-05.md
ingest.done newUrlsCount=2 digestUrl=https://...
```

Three minutes fifteen seconds, sixteen iterations, eight wiki pages plus two sources plus the digest. **About $0.23 in API spend.** Atomic flush at the end, state.json updated, presigned URL generated. A separate verification I ran right before this confirmed the discard semantics held: write a sentinel file inside `withMountedBucket`, throw, re-mount, list the bucket. The sentinel was nowhere to be found and the live tree was byte-for-byte identical to before the run.

The verification artifacts live alongside the code: `scripts/experiments/shell-flush.ts` confirms a thrown error doesn't promote a sentinel, `scripts/experiments/byte-equal-after-throw.ts` confirms the bucket listing is identical pre and post. If you fork the repo and run them against your own bucket, you get the same output. The atomicity isn't a footnote in the docs; it's a script you can re-run any time you want to retest the claim against a future version of agent-shell or the SDK.
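If you want the flavor without cloning, the sentinel experiment is roughly this; the write method on the handle is my guess at the shape, so check the repo for the real call:

```
// shell-flush.ts, in spirit: a throw must keep the sentinel out of the bucket
await withMountedBucket(async (shell) => {
  await shell.writeFile("wiki/SENTINEL.md", "must never reach the bucket");
  throw new Error("simulated mid-run crash"); // triggers discard(), not flush()
}).catch(() => {}); // swallow the rethrow; the point is what the bucket shows
// re-mount and list: no SENTINEL.md, and the tree is byte-for-byte unchanged
```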

## How I work with it[​](#how-i-work-with-it "Direct link to How I work with it")

My week with the wiki has four touchpoints and that's all of them. Mornings I read the digest in Slack on my phone over coffee, then click into one or two pages worth reading in full. About once a week I edit `config/feeds.txt` if there's a new vendor or paper venue I want tracked. Maybe once a month I open Obsidian, pointed at the bucket via `rclone`, and browse the graph for connections the agent missed. When I find a URL between scheduled runs that I want tonight's wiki to absorb, I append it to `queue.txt` in the bucket and the next ingest picks it up alongside the day's RSS.

That's the entire interaction surface. The wiki is the UI, and the wiki lives in my Obsidian vault.

The FOMO is gone. Not because I'm seeing every paper (I see fewer than before) but because I trust the wiki to surface the ones that matter, and I trust myself to walk away from the ones it summarized for me. When you can answer "what's the current state of binary quantization" in three seconds by clicking a `[[wiki-link]]`, the anxiety goes somewhere quieter.

**Build your own LLM digest**

The repo is open. Fork it, point it at your feeds, set five secrets, and your wiki starts updating tonight. Your first 5 GB on Tigris are free.

[Get the repo](https://github.com/davidmyriel/llm-digest)

**Tags:**

* [Build with Tigris](/blog/tags/build-with-tigris/)
* [AI](/blog/tags/ai/)
* [Agents](/blog/tags/agents/)
