BEAM Is a Suspiciously Good Fit for Agents
The actor model from 1986 is the agent model from 2026
The default agent stack is Python or TypeScript. The runtime never enters the conversation.
It should. The BEAM, the runtime under Elixir and Erlang, was built for the shape of work agents actually do. It’s been sitting there since 1986. OpenAI shipped Symphony, their coding-agent orchestration reference implementation, primarily in Elixir. Almost no one else is building there.
What an agent actually is, as a workload
An agent is not a stateless endpoint. It's not a single long-lived connection. It's something in the messy middle: bursty, stateful compute.
Watch what happens when you give a coding agent a task. It holds a scratchpad of context across a dozen tool calls. It opens a few MCP connections and keeps them alive. It spawns sub-agents that go think about a thing and come back. It streams tokens out to a UI while it’s still mid-thought. A tool fails. The model retries, or routes to a different sub-agent, keeps going. You hit cancel. The whole tree should collapse cleanly without leaking workers.
Hold private state. Supervise children. Take messages from outside. Recover from failure. Die clean when told to.
Every agent framework I’ve used in Python or TypeScript is reinventing this in user space. Actor libraries, message buses, supervisor primitives, worker pools, registries, durable-execution layers. All built on top of asyncio or threadpools or queues, all with their own subtle bugs in supervision and cancellation and back-pressure.
There’s a runtime where every line of that list is a free primitive.
The BEAM, briefly
The BEAM was built in the 1980s at Ericsson for telecom switches. The workload: handle a hundred thousand simultaneous calls, each stateful, each independent, never bring the system down.
Lightweight processes. Not OS threads. Each one has a private heap, a private stack, a private mailbox, its own garbage collector. A fresh process is about 2.6KB. A million of them fits in a few gigabytes. Discord scaled Elixir to 5 million concurrent users on this; WhatsApp pushed 2 million TCP connections per server in 2012. One process per agent, per sub-agent, per tool call, per MCP connection. That’s mighty fine math.
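A rough way to see the per-process cost yourself, as a sketch you can paste into an `iex` session (exact numbers vary by OTP version and emulator flags):

```elixir
# Spawn 100k idle processes and estimate bytes per process.
mem_before = :erlang.memory(:processes)

pids =
  for _ <- 1..100_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

mem_after = :erlang.memory(:processes)
IO.puts("roughly #{div(mem_after - mem_before, length(pids))} bytes per idle process")

# Clean up: each process exits when told to.
Enum.each(pids, &send(&1, :stop))
```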
Mailboxes. Every process has an inbox. Send a message, it lands. Inter-agent comms map directly: delegate, ask, broadcast, cancel. No separate message bus.
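The whole primitive is `send/2` and `receive`; a minimal ask/answer round trip between two processes looks like this:

```elixir
# send/2 is asynchronous and never blocks; the message lands in the
# target process's mailbox and waits there until received.
parent = self()

agent =
  spawn(fn ->
    receive do
      {:ask, from, question} -> send(from, {:answer, "thinking about #{question}"})
    end
  end)

send(agent, {:ask, parent, "weather"})

receive do
  {:answer, text} -> IO.puts(text)
after
  1_000 -> IO.puts("no reply")
end
```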
Fan-out. Task and Task.Supervisor give you structured concurrency: spawn N units of work in parallel, supervised, with per-item timeouts and isolation. Task.Supervisor.async_stream(sup, candidates, &evaluate/1, max_concurrency: 8, timeout: 30_000). Eight in flight, each crash is local, slow ones die at 30 seconds, you get back per-item success or failure. Sub-agent dispatch is this primitive.
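The fan-out line above, expanded into a runnable sketch. `evaluate` here is a stand-in for a real sub-agent or tool call:

```elixir
{:ok, sup} = Task.Supervisor.start_link()

candidates = 1..20

# Stand-in for a sub-agent evaluation; a real one would call a model.
evaluate = fn candidate -> {:scored, candidate} end

results =
  Task.Supervisor.async_stream(sup, candidates, evaluate,
    max_concurrency: 8,      # eight in flight at once
    timeout: 30_000,         # slow items die at 30 seconds
    on_timeout: :kill_task   # without taking the caller down
  )
  |> Enum.map(fn
    {:ok, result} -> result              # per-item success
    {:exit, reason} -> {:failed, reason} # per-item failure, isolated
  end)
```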
Process registries. Name a long-lived process and address it from anywhere by name. With Horde (the distributed CRDT-backed Registry), “the agent for user 42” routes correctly whether the agent is on this node or another node in the cluster, whether it’s the original process or got restarted ten times. Without registries you’re back to a Redis lookup table mapping IDs to network endpoints. The personal-assistant case lives or dies on this.
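The single-node version of “the agent for user 42” is the stdlib `Registry` plus a `:via` name; Horde’s API has the same shape for the clustered case. (The `Agent` module here is Elixir’s stdlib state-holder, not an AI agent; any GenServer would register the same way.)

```elixir
{:ok, _} = Registry.start_link(keys: :unique, name: AgentRegistry)

# Name the process for user 42; no caller ever needs its pid.
name = {:via, Registry, {AgentRegistry, {:agent, 42}}}
{:ok, _pid} = Agent.start_link(fn -> %{history: []} end, name: name)

# Anywhere on the node, address it by name:
%{history: []} = Agent.get(name, & &1)
```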
Supervision. Wrap a process in a supervisor. It crashes, the supervisor restarts it under a strategy you declared. A tool call fails, the agent loop keeps running. Supervision fixes infrastructure failures, not LLM reasoning failures. Restarting a process that crashed because the model returned malformed JSON gets you the same malformed JSON.
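A minimal sketch of the restart behavior. `AgentLoop` is a hypothetical stand-in for the per-session agent process:

```elixir
defmodule AgentLoop do
  # Placeholder GenServer standing in for a real agent loop.
  use GenServer
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  @impl true
  def init(opts), do: {:ok, opts}
end

# :one_for_one restarts only the crashed child; more than three crashes
# inside five seconds and the supervisor gives up and escalates.
{:ok, _sup} =
  Supervisor.start_link([{AgentLoop, []}],
    strategy: :one_for_one, max_restarts: 3, max_seconds: 5)

# Kill the worker; the supervisor brings it back as a fresh process.
old = Process.whereis(AgentLoop)
Process.exit(old, :kill)
Process.sleep(50)
new = Process.whereis(AgentLoop)
true = is_pid(new) and new != old
```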
Hot reload. Swap a module’s code while processes are still running. Patch a buggy MCP handler without dropping connections. Update a tool implementation without killing in-flight sessions. (Most teams still blue/green-deploy in production, but the primitive is right there.)
Live introspection. Attach to a running node and look at what any process is doing right now: mailbox length, current state, current call, memory, reductions. No instrumentation you planned for in advance required. José Valim’s framing: “agents can programmatically inspect running processes, state, and pending work, mirroring how experienced engineers diagnose production issues.”
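What that looks like in practice, against any pid on the node:

```elixir
# No instrumentation planned in advance: spawn something, then look at it.
pid = spawn(fn -> Process.sleep(:infinity) end)
send(pid, :hello)

pid
|> Process.info([:message_queue_len, :memory, :current_function, :status])
|> IO.inspect()
# Returns a keyword list: queued messages, heap bytes, the MFA it is
# currently parked in, and its scheduler status (e.g. :waiting).
```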
Every primitive Python and TypeScript agent frameworks are reinventing in user space exists here at the runtime level, load-tested for thirty years.
It gets sharper for personal-assistant agents
Coding agents are one shape. Short-lived. Single-channel (one TTY or one IDE). No cross-session memory required. No concurrent-inbox problem. The BEAM fit there is real but moderate. You could do it in Python and be fine.
Personal-assistant agents are a different shape. Long-lived daemons. Multi-channel inboxes (Telegram, Slack, WhatsApp, Signal, SMS, email) fanning into one agent process. Per-session writers serializing concurrent messages. Persistent memory across weeks. Dynamic skills the agent edits over time.
OpenClaw (Peter Steinberger). Open-source, self-hosted, reachable over WhatsApp, Telegram, Slack, Discord, Signal, iMessage, WebChat through one Gateway process. The architecture, in its own words: “queue-based serialization, single-writer pattern per session, sequence numbers, append-only JSONL session logs, lazy loading and memory cache.”
Hermes Agent (Nous Research, the agent product, not their model family). Long-lived self-hosted daemon. 15+ messaging platforms fanning into one gateway. Curated long-term memory files (MEMORY.md, USER.md, SOUL.md), 47 tools across 19 toolsets, FTS5 session search with LLM summarization, dynamic skill creation, context compression for long conversations.
“Single-writer pattern per session” is one GenServer per session. “Long-lived gateway process” is a supervised process holding a connection open. “Dynamic skill creation” is a process registry plus hot code reload. “Append-only session logs” is what you do because you don’t trust your in-memory state to survive a redeploy. That’s the actual missing primitive; more on it below.
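To make the first of those concrete: a minimal sketch of “single-writer per session” as one GenServer per session, under the assumption that every channel funnels into the same named process. The module and message shapes are illustrative, not any product’s actual code:

```elixir
defmodule Session do
  use GenServer

  def start_link(session_id),
    do: GenServer.start_link(__MODULE__, session_id, name: {:global, {:session, session_id}})

  def append(session_id, msg),
    do: GenServer.call({:global, {:session, session_id}}, {:append, msg})

  @impl true
  def init(session_id), do: {:ok, %{id: session_id, log: []}}

  @impl true
  def handle_call({:append, msg}, _from, state) do
    # One process, one mailbox: concurrent callers from Telegram, Slack,
    # email all serialize here. No locks, no hand-rolled sequence numbers.
    {:reply, :ok, %{state | log: [msg | state.log]}}
  end
end

{:ok, _} = Session.start_link("s1")
:ok = Session.append("s1", {:telegram, "hi"})
:ok = Session.append("s1", {:slack, "hello"})
```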
Both products are rebuilding OTP in user space. Both are written in TypeScript or Python.
George Guimarães wrote that the actor model Erlang introduced in 1986 is the agent model AI is rediscovering in 2026. The personal-assistant case is the sharper version. It’s not that the abstract model is the same. It’s that the specific operational primitives these teams are advertising as architectural wins are the literal section headings of the OTP design principles documentation. Gateway, queue, single-writer, registry, supervision.
Covering the weak spots with abstraction
Eval and observability tooling is thin. Python has Inspect AI, LangSmith, Braintrust, Weights & Biases Weave, DSPy. Elixir has nothing comparable. You roll your own or call out to a Python eval harness.
Durable execution doesn’t have a great answer. Durable execution is the term of art for “resume from exactly where I crashed, without re-running side effects”: Temporal, Inngest, Restate, AWS Step Functions. Your agent runs for six hours, hits Stripe, sends an email, the node dies; you don’t want to re-charge or re-send on restart. The BEAM survives process crashes via supervision. It does not give you Temporal-style deterministic replay across node death and redeploys. Oban (Postgres-backed jobs, with workflow primitives in Oban Pro) is the closest thing native to Elixir, and you build the rest yourself with checkpoints and idempotency keys.
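The checkpoint-and-idempotency-key part can be sketched in a few lines. This is not Temporal-grade deterministic replay; it is the minimal “don’t re-run a side effect you already ran” pattern. The `Checkpoint` module is hypothetical, and the Agent-backed store here stands in for a genuinely durable table (a Postgres row, an Oban job record) keyed by idempotency key:

```elixir
defmodule Checkpoint do
  # `store` is an in-memory stand-in for durable storage; a real version
  # must persist the key before acknowledging the step.
  def run_step(store, step_key, side_effect) do
    case Agent.get(store, &Map.fetch(&1, step_key)) do
      {:ok, result} ->
        # Already ran before the crash: return the recorded result, skip
        # the side effect entirely.
        result

      :error ->
        result = side_effect.()   # e.g. the Stripe charge, the email send
        Agent.update(store, &Map.put(&1, step_key, result))
        result
    end
  end
end

{:ok, store} = Agent.start_link(fn -> %{} end)
:charged = Checkpoint.run_step(store, "charge:invoice-17", fn -> :charged end)
# Same key after a restart: cached result comes back, no double charge.
:charged = Checkpoint.run_step(store, "charge:invoice-17", fn -> raise "would double-charge" end)
```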
Training-data asymmetry. The model writing your agent code has seen 100x more Python agent code than Elixir agent code. José Valim cites a Tencent benchmark where Elixir hit 97.5% problem-solve rate, the highest of 20 languages tested. Encouraging. Not dispositive for agent code specifically.
“Let it crash” doesn’t help with reasoning failures. The runtime can’t fix the model being wrong.
What falls out of this is the control plane / execution plane split.
The control plane is what the BEAM was designed for: session state, supervision, process registry, routing, lifecycle, durable jobs, multi-channel gateway. Elixir is the right runtime here.
The execution plane belongs in Python or TypeScript: model API calls, tools, eval harnesses, embeddings, anything wrapping the Python ML ecosystem.
The interface plane is split between platforms (Telegram, Slack, WhatsApp) and frontends (TypeScript, usually). Elixir’s role is the gateway processes that broker between them and the control plane.
The interop boundaries are already standard. Tools and skills become MCP servers in any language. The runtime is just an MCP host. One-off Python or TS code runs as a supervised port over stdio JSON-RPC, the way mix already shells out to Node. Model API calls go out over HTTP, streaming responses fan back through PubSub. Eval harnesses subscribe to telemetry the runtime emits.
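The supervised-port part, as a sketch. A real integration frames JSON-RPC; this just round-trips one line through a Python child process, and assumes `python3` is on the PATH:

```elixir
# Spawn a Python one-liner as an OS process wired to this one over stdio.
# If the BEAM process dies, the OS process is torn down with it.
port =
  Port.open(
    {:spawn, "python3 -c 'import sys; [print(l.strip().upper(), flush=True) for l in sys.stdin]'"},
    [:binary, {:line, 4096}]
  )

Port.command(port, "hello from the beam\n")

receive do
  {^port, {:data, {:eol, line}}} -> IO.puts(line)
after
  2_000 -> IO.puts("no response from the tool process")
end
```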
That’s probably what most teams should actually do.
Appendix: References & resources
The BEAM and Elixir
Elixir — the language.
Erlang / BEAM — the runtime underneath.
The BEAM Book — Erik Stenman’s deep dive into the runtime internals. The reference for “what is actually happening in there.”
OTP Design Principles — the canonical writeup of supervision trees, GenServers, applications.
Hauleth — How much memory is needed to run 1M Erlang processes? — source for the 2.6KB-per-process number.
Discord — How Discord Scaled Elixir to 5,000,000 Concurrent Users — the canonical “BEAM at scale” production case.
WhatsApp — 1 million is so 2011 — the original “we run a few million TCP connections on a single Erlang server” announcement.
BEAM primitives I named
Task and Task.Supervisor — bounded fan-out with timeouts.
Registry — local in-memory process registry.
Horde — distributed registry + supervisor for multi-node agent fleets.
Supervisor — supervision tree behaviors.
Hot code reload guide — what it actually looks like to upgrade a running process.
Agents in Elixir
Symphony — OpenAI’s coding-agent orchestration spec, with a primarily-Elixir reference implementation.
José Valim — Why Elixir is the best language for AI — the strongest “the BEAM is the right runtime for AI” argument from inside the language. Source for the live-introspection framing and the Tencent benchmark cite.
George Guimarães — Your Agent Orchestrator Is Just a Bad Clone of Elixir — the 1986/2026 line, plus a primitive-by-primitive table comparing agent-framework reinventions to OTP.
George Guimarães — What the critics got right about Elixir and AI agents — the honest follow-up. The “let it crash doesn’t fix reasoning failures” caveat is sharper here than anywhere else I found.
Personal-assistant agent products
OpenClaw — Peter Steinberger’s open-source, self-hosted personal agent. GitHub. Architecture writeup.
Hermes Agent — Nous Research’s self-hosted personal AI agent. Architecture docs.
Durable execution and the gap
Temporal, Inngest, Restate, AWS Step Functions — the durable-execution category.
Eval and observability tooling I name-checked
Inspect AI, LangSmith, Braintrust, Weights & Biases Weave, DSPy — the Python-side eval and observability stack from the weak-spots section.
Adjacent
Model Context Protocol — the spec referenced throughout.


