Tokenbin: An Artifact Overflow Layer for LLM-heavy Work
Saving the human-oriented interfaces that LLMs are flooding
Aiden Bai turned on sixteen code-review bots on his React Grab repo at the same time to compare them: bugbot, greptile, coderabbit, sentry, vercel agent, claude code review, codex code review, gemini code assist, and eight more. The morning after, Rhys Sullivan posted the result: 53,395 characters of agent commentary on a 100-line PR. The tweet did 458k impressions. The line was: “like it or not, this is what software engineering in 2026 will be.”
53,395 characters is roughly 13,000 tokens. On 100 lines. How will we survive this flood?
When you start offloading low-complexity knowledge work to agents, what stays in the human-facing tools is medium- and high-complexity work. The PR thread, the Linear ticket, the support reply. Those interfaces were designed for humans applying careful judgment, and they’re tight on purpose. The tokens involved in resolving the low-complexity layer have to play nice with that surface. Right now they don’t.
I built tokenbin because that mismatch was breaking real workflows at our company. It’s a private-first artifact overflow layer for LLM-heavy work. Agents write the bulky output there. The thread or ticket gets the curated synthesis and a link. The raw analysis stays addressable for the next agent that needs it.
The kernel
The whole model is four primitives:
artifact immutable text blob, addressed by id
directory mutable index of related artifacts
pin named pointer to the current artifact that matters
capability short-lived scoped token for one directory

Directories are named by convention: github/&lt;org&gt;/&lt;repo&gt;/pull/&lt;n&gt;, linear/&lt;key&gt;, support/&lt;system&gt;/&lt;id&gt;. An agent can resolve the right directory from the work item it’s looking at. Pins carry conventional names too: brief, synthesis, comment, handoff. The capability is what an orchestrator hands a sub-agent so it can write into one directory and nothing else.
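The four primitives can be sketched as a toy in-memory model. This is a stand-in to make the shapes concrete, not the tokenbin API; the names (Store, mint_capability, and so on) are illustrative:

```python
from dataclasses import dataclass, field
import hashlib
import secrets

@dataclass(frozen=True)
class Artifact:
    """Immutable text blob, addressed by id."""
    id: str
    text: str

@dataclass
class Directory:
    """Mutable index of related artifacts, plus named pins."""
    name: str
    artifact_ids: list[str] = field(default_factory=list)
    pins: dict[str, str] = field(default_factory=dict)  # pin name -> artifact id

class Store:
    """Toy in-memory stand-in; a real deploy keeps all of this in the bucket."""
    def __init__(self) -> None:
        self.artifacts: dict[str, Artifact] = {}
        self.directories: dict[str, Directory] = {}
        self.capabilities: dict[str, str] = {}  # token -> directory name

    def put(self, directory: str, text: str) -> str:
        # Artifacts are immutable once written; the id is derived from content.
        aid = hashlib.sha256(text.encode()).hexdigest()[:16]
        self.artifacts[aid] = Artifact(aid, text)
        self.directories.setdefault(directory, Directory(directory)).artifact_ids.append(aid)
        return aid

    def pin(self, directory: str, name: str, artifact_id: str) -> None:
        # A pin is a named pointer to the current artifact that matters.
        self.directories[directory].pins[name] = artifact_id

    def list(self, directory: str) -> list[str]:
        return list(self.directories[directory].artifact_ids)

    def mint_capability(self, directory: str) -> str:
        # Short-lived token scoped to exactly one directory.
        token = secrets.token_hex(16)
        self.capabilities[token] = directory
        return token

    def put_with_capability(self, token: str, text: str) -> str:
        # A holder of the capability can write into its directory and nowhere else.
        return self.put(self.capabilities[token], text)
```

An orchestrator would mint a capability for github/&lt;org&gt;/&lt;repo&gt;/pull/&lt;n&gt; and hand it to a sub-agent, which can then write artifacts into that one directory only.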
The baseline deploy is one Cloudflare Worker, one R2 bucket, one shared API key, signed viewer URLs at /v/:id. Two minutes from a Deploy button to the first artifact. There is no database. The bucket is the index.
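Signed viewer URLs at /v/:id can work along these lines. This is an illustrative HMAC scheme, not the one in the repo; the SECRET placeholder and the exp/sig query parameters are assumptions:

```python
import hashlib
import hmac
import time

SECRET = b"shared-api-key"  # placeholder; a real deploy binds this as a secret

def sign_view_url(artifact_id: str, ttl_seconds: int = 3600, now=None) -> str:
    """Mint a time-limited viewer URL for /v/:id (illustrative scheme)."""
    exp = int((now if now is not None else time.time()) + ttl_seconds)
    msg = f"{artifact_id}.{exp}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"/v/{artifact_id}?exp={exp}&sig={sig}"

def verify(artifact_id: str, exp: int, sig: str, now=None) -> bool:
    """Check the signature and the expiry before serving the artifact."""
    if (now if now is not None else time.time()) > exp:
        return False
    expected = hmac.new(SECRET, f"{artifact_id}.{exp}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

The point of the design is that verification needs only the shared secret, so a single stateless Worker can serve viewer links without any database lookup.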
What it looks like in a PR review
A PR comes in. The orchestrator opens the directory github/<org>/<repo>/pull/1842. Three review agents (security, correctness, performance) write their full analysis as artifacts in that directory. Each one is a few thousand tokens of careful work. None of it goes in the PR thread.
A synthesis agent reads all three artifacts (it has the directory, so it gets them by listing), writes a single review comment that cites the strongest findings from each, and pins it as synthesis. The orchestrator posts only the synthesis to GitHub. Three short paragraphs, link to the directory for the reviewer who wants to read deeper.
A second round of agents critiques the synthesis. Same pattern: full analysis in the directory, a new synthesis pinned, the PR thread gets one updated comment.
The PR thread sees two comments, total. The directory sees every artifact every agent produced - addressable, durable, and out of the way. A reviewer who wants the receipts follows the link. Everyone else reads the PR like a normal PR.
One directory per work item. Agents write artifacts. One agent synthesizes. The synthesis pins. The human surface stays tight.
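Concretely, one round of the pattern looks like this. A plain dict stands in for the directory, and the agent output strings are placeholders; only the shape of the flow is the point:

```python
# One directory per work item; names follow the post's conventions.
directory = {"name": "github/acme/widgets/pull/1842", "artifacts": {}, "pins": {}}

def write_artifact(aid: str, text: str) -> str:
    """Each agent's full analysis lands in the directory, not the PR thread."""
    directory["artifacts"][aid] = text
    return aid

# Round 1: three review agents write their full analyses as artifacts.
for agent in ("security", "correctness", "performance"):
    write_artifact(f"{agent}-r1", f"[{agent}] multi-thousand-token analysis ...")

# The synthesis agent lists the directory, reads everything, and writes one
# curated comment citing the strongest finding from each artifact.
headlines = [text.split("]")[0] + "]" for text in directory["artifacts"].values()]
synthesis_id = write_artifact("synthesis-r1", "Top findings: " + " | ".join(headlines))
directory["pins"]["synthesis"] = synthesis_id

# Only the pinned synthesis (plus a link to the directory) reaches GitHub.
pr_comment = directory["artifacts"][directory["pins"]["synthesis"]]
```

A second round repeats the same moves: more artifacts in the directory, a new synthesis pinned under the same name, one updated comment on the thread.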
What it’s not
Tokenbin isn’t a task tracker, a wiki, deployment-wide search, or a binary asset store. It’s not a SaaS app. The baseline is a Worker you run on your own Cloudflare account. It’s not trying to replace GitHub, Linear, or Zendesk. It’s the overflow and coordination layer beside them.
Artifacts cap at 1 MiB of UTF-8 text. TTL classes are fixed at 1d, 7d, and 30d - long enough for a PR cycle, short enough that nothing accidentally becomes the system of record.
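The two limits are easy to enforce at write time. A sketch, assuming a validation step like this exists somewhere in the write path (the function name is illustrative; the 1 MiB cap and the three TTL classes are from the post):

```python
MAX_BYTES = 1 << 20  # 1 MiB of UTF-8 text
TTL_SECONDS = {"1d": 86_400, "7d": 7 * 86_400, "30d": 30 * 86_400}

def check_artifact(text: str, ttl_class: str, now: float) -> float:
    """Reject oversized blobs and return the expiry timestamp for the TTL class."""
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("artifact exceeds 1 MiB cap")
    if ttl_class not in TTL_SECONDS:
        raise ValueError("ttl class must be one of 1d, 7d, 30d")
    return now + TTL_SECONDS[ttl_class]
```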
It’s the right tool when most of these are true: long machine output is useful but doesn’t belong in the main UI; multiple agents are working on one item and need shared context; the human at the end needs to scan the work in seconds. If everything you generate belongs directly in the system of record, you don’t need this.
Repo
github.com/Tetra-Research/tokenbin. v0.1.1. Four shipping surfaces over one kernel: a JSON API, hosted MCP at /mcp for Claude/Cursor/etc., a thin CLI, and first-party TypeScript and Python SDKs. The OpenAPI contract is in the repo. Because every surface sits over the same kernel and contract, anything that ships on one surface can ship on the others.
Aiden and Rhys deserve credit for putting the failure mode on the timeline at scale. The number was already showing up in our own PR threads at smaller volume. Theirs just made it impossible to look away from.
If you’re feeling this (agent output flooding a surface humans were supposed to be scanning), send me what you’ve tried. I want to know what’s working.
Appendix
Deploy
Surfaces
JSON API
Hosted MCP at /mcp: works with Claude Desktop, Cursor, and other MCP clients.
TypeScript SDK + CLI: npm install tokenbin (-g for global CLI access).
Python SDK: pip install tokenbin.
OpenAPI contract: openapi/tokenbin.v1.json.