The Runtime Under the Harness

A good harness makes an agent smart. A good runtime keeps it alive. Here's everything a production agent needs underneath the loop — and how Herm ships it in one API call.

Alex Liu

TL;DR: The harness — prompts, tools, skills — is what makes an agent good at its job. But the moment that agent serves real customers, a second system appears underneath it: the runtime. Durable execution, memory, isolation, oversight, observability, scheduling. You can build all of it yourself over a couple of quarters, or you can get it with your agent in a single API call. We think that choice decides which agent startups ship and which ones stall.

Two systems, not one

When Rajit wrote about why we stopped building our own agent, the argument was about the harness: the loop, the memory, the skills, the self-learning. Harnesses are commoditizing fast, so don't write one.

But there's a second argument hiding under the first. Even after you adopt a great harness, an agent that runs on your laptop and an agent that runs for paying customers are different machines. The difference isn't intelligence. It's everything that happens when the process dies mid-task, when customer two shows up, when a run needs a human's sign-off, when something breaks at 3am and you need to know why.

That layer is the runtime. Here's what it has to do, and how we built Herm to do it.

Survive the loop

An agent loop is not a web request. A request returns in milliseconds; an agent run can span minutes or hours, burn dozens of model calls, spin up subagents, and then sit idle waiting for a person to make a decision. Treating that as one fragile in-memory process means a deploy, a crash, or a flaky network erases everything the run already paid for.

The runtime's first job is making runs durable. In Herm, every customer's agent runs in its own container with a persistent filesystem, and session state survives restarts. A run that gets interrupted resumes where it stopped instead of replaying from zero. An agent that's waiting on input isn't holding a worker hostage — it's genuinely asleep, and it wakes up with its context intact whether the answer arrives in thirty seconds or three days.
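
To make "resumes where it stopped" concrete, here is a minimal sketch, assuming a run can be modeled as an ordered list of steps and that state is checkpointed to a JSON file on the persistent filesystem. The names run_steps, load_checkpoint, and save_checkpoint are illustrative; this is not a description of how Herm implements durability.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

Step = Callable[[dict], None]  # a step mutates the shared run data


def load_checkpoint(path: Path) -> dict:
    """Return saved run state, or a fresh state if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "data": {}}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write atomically so a crash mid-write cannot corrupt the checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_steps(steps: list[Step], path: Path) -> dict:
    """Run steps in order, skipping any that an earlier process completed."""
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        steps[i](state["data"])
        state["next_step"] = i + 1
        save_checkpoint(path, state)  # checkpoint after every completed step
    return state["data"]
```

If the process dies during step two, calling run_steps again with the same checkpoint path picks up at step two. Step one's model calls are not paid for twice.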

Everything else on this list builds on that property. Scheduled work, human oversight, long research tasks — none of it is possible if the agent forgets who it is every time a process recycles.

Remember across sessions, not just within them

Agents need two memories, and conflating them causes real product bugs.

Within a session, the harness already does the work: Hermes manages per-session context and compaction so a long conversation doesn't drown the model. That memory belongs to the thread and ends with it.

Across sessions is the part teams underestimate. Your customer's preferences, their brand voice, the correction they gave the agent last Tuesday — that context belongs to the customer, not to any conversation. Hermes ships observational memory for exactly this, and Herm persists it per deployment, so every customer's agent accumulates its own understanding without any customer's context bleeding into another's.

This is also the most valuable data your product will generate. Months of accumulated preferences are why a customer can't churn to a competitor without losing something real. It should live somewhere you can see and query — in Herm, it's files on the agent's filesystem, not an encrypted blob inside someone else's black box.
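
As a rough illustration of memory you can see and query, the sketch below stores observations as JSON lines inside one deployment's workspace. The file layout (memory/observations.jsonl) and the remember and recall helpers are assumptions made for this example, not the format Hermes or Herm actually uses.

```python
import json
from pathlib import Path


def remember(workspace: Path, kind: str, text: str) -> None:
    """Append one observation to this deployment's memory file."""
    memory_file = workspace / "memory" / "observations.jsonl"
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"kind": kind, "text": text}) + "\n")


def recall(workspace: Path, kind: str) -> list[str]:
    """Return every stored observation of the given kind, oldest first."""
    memory_file = workspace / "memory" / "observations.jsonl"
    if not memory_file.exists():
        return []
    with memory_file.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r["text"] for r in records if r["kind"] == kind]
```

Because each customer has a separate workspace directory, there is no shared table to filter, and the memory stays readable with nothing fancier than a text editor.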

Keep customers apart

Single-player agents have no tenancy problems. Customer two creates three at once:

  • Data isolation. Customer A's agent must never read customer B's files, memory, or sessions. Herm gives every deployment its own container and filesystem — isolation by construction, not by query filter.
  • Acting on the customer's behalf. The magic moments come from connectors: the agent reads Meta Ads performance, pulls assets from Google Drive, sends an email through Resend. That requires customer credentials, which requires handling them like they're radioactive. Herm's deployments take credential references — "META_ADS_TOKEN": "sec_meta_ads_token" — so raw secrets stay in a vault and never appear in prompts, logs, or the agent's own filesystem.
  • Operating the system. Your engineers need to create, inspect, and tear down deployments without touching customer data. That's what the API key and deployment model are for: POST /v1/deployments is an operator action, scoped away from anything the end customer's agent can do.
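
The credential-reference idea from the list above can be sketched in a few lines. This assumes the vault behaves like a lookup from reference to secret; here a plain dict stands in for it. The resolve_env and redact helpers are hypothetical names for illustration, and only the "META_ADS_TOKEN": "sec_meta_ads_token" pair comes from the deployment format itself.

```python
from typing import Mapping

# What the deployment config holds: variable name -> reference, never the secret.
CREDENTIAL_REFS = {"META_ADS_TOKEN": "sec_meta_ads_token"}


def resolve_env(refs: Mapping[str, str], vault: Mapping[str, str]) -> dict[str, str]:
    """Swap each reference for its secret just before a connector call."""
    missing = [ref for ref in refs.values() if ref not in vault]
    if missing:
        raise KeyError(f"unknown credential references: {missing}")
    return {name: vault[ref] for name, ref in refs.items()}


def redact(text: str, vault: Mapping[str, str]) -> str:
    """Mask any secret value that appears in text headed for logs or prompts."""
    for ref, secret in vault.items():
        text = text.replace(secret, f"<{ref}>")
    return text
```

The point of the design is where resolution happens: at the connector boundary, outside the model's context, so the agent only ever handles the reference.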

Let humans interrupt

Most of an agent's value comes from running unattended. All of its risk concentrates in the few moments it shouldn't. Sending the email, posting the ad, spending the budget — consequential actions deserve a checkpoint where a person sees exactly what's about to happen and can approve, edit, or redirect.

Hermes calls this steering, and it's built into the loop rather than bolted on: the agent pauses, surfaces what it intends to do, and continues with whatever the human decides. Because runs are durable, "pauses" is literal — the agent can wait a weekend for an approval without costing you anything, then proceed as if no time passed.

The same mechanism covers the inverse case: the agent hits a fork it shouldn't resolve by guessing ("two campaigns match that name — which one?") and asks instead. Cheap question, expensive mistake avoided.
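
One way to picture a steering checkpoint is as a small data structure: the agent proposes an action, a human returns a decision, and the loop continues with the result. The sketch below assumes three verdicts (approve, edit, redirect) taken from the description above. The ProposedAction and Decision types are invented for this example and are not Hermes's actual interface.

```python
from dataclasses import dataclass, replace
from typing import Literal, Optional


@dataclass(frozen=True)
class ProposedAction:
    tool: str        # e.g. "send_email"
    arguments: dict  # exactly what the agent intends to send


@dataclass(frozen=True)
class Decision:
    verdict: Literal["approve", "edit", "redirect"]
    edited_arguments: Optional[dict] = None  # used when verdict == "edit"
    instruction: Optional[str] = None        # used when verdict == "redirect"


def apply_decision(
    action: ProposedAction, decision: Decision
) -> tuple[Optional[ProposedAction], Optional[str]]:
    """Return (action to execute, instruction to feed back to the agent)."""
    if decision.verdict == "approve":
        return action, None
    if decision.verdict == "edit":
        if decision.edited_arguments is None:
            raise ValueError("an edit decision must carry edited arguments")
        return replace(action, arguments=decision.edited_arguments), None
    if decision.verdict == "redirect":
        return None, decision.instruction
    raise ValueError(f"unknown verdict: {decision.verdict}")
```

A clarifying question fits the same shape: the agent proposes nothing, and the human's answer comes back as an instruction.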

Stay visible in real time

While the agent works, the user is watching. Two problems follow.

First, progress needs to stream. A spinner for ninety seconds reads as a hang; the same ninety seconds narrated — searching, found four candidates, generating the second variation — reads as work. Every Herm deployment exposes an SSE events endpoint, so your UI renders tokens, tool calls, and status changes as they happen. Wire it to your chat component and you're done.
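
For a sense of what consuming that stream involves, here is a minimal parser for the standard text/event-stream format. The parsing rules follow the SSE specification; the event names in the usage note (such as tool_call) are made up for illustration, since the actual event names a Herm deployment emits are not listed here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {"event", "data"} dict per event in a text/event-stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                  # a blank line dispatches the event
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):      # comment line, often a keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Feed it the lines of an HTTP response body and route each event by name: a hypothetical tool_call event updates the status line, while a token event appends to the message bubble.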

Second, users don't wait their turn. They send a message, reconsider, and send a correction while the first run is still going. The runtime has to pick a policy — queue the new message, or fold it into the run in flight. Because Hermes sessions are durable and steerable, a mid-run message becomes steering input rather than a state-corrupting surprise.
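
The two policies are easy to state in code. This is a toy model, assuming a session tracks whether a run is active; the Session fields and the on_user_message function are hypothetical and say nothing about how Hermes represents sessions internally.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    running: bool = False
    steering: list[str] = field(default_factory=list)  # folded into the active run
    queue: list[str] = field(default_factory=list)     # held for the next run


def on_user_message(session: Session, text: str, policy: str = "steer") -> str:
    """Route a message according to the session's state and the chosen policy."""
    if not session.running:
        session.running = True   # caller starts a new run with this message
        return "started"
    if policy == "steer":
        session.steering.append(text)  # the active run reads this at its next step
        return "steered"
    session.queue.append(text)         # processed after the active run finishes
    return "queued"
```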

Watch what it actually did

You can't reason about an agent from its source code, because the model writes the control flow at runtime. When a customer says "it kept generating the same video over and over," the only useful artifact is the record of what actually happened: every message, every tool call, every result, in order.

Herm keeps that record per session, and the same SSE stream that powers your UI doubles as your audit trail — pipe it to your logging stack and you have traces with zero extra instrumentation. The agent's filesystem adds a second, underrated layer: its notes, drafts, and intermediate outputs persist after the run ends. Debugging often starts by just reading the workspace, the same way you'd look over a colleague's shoulder.
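
Once the record exists, the "same video over and over" complaint becomes a query. The sketch below assumes each recorded event is a dict with a type field, and that tool calls carry name and arguments. That schema is an assumption for the example, not Herm's documented event format.

```python
import json
from collections import Counter


def repeated_tool_calls(events: list[dict], threshold: int = 3) -> dict[str, int]:
    """Count identical tool calls (same name, same arguments) in one session."""
    counts = Counter(
        (e["name"], json.dumps(e["arguments"], sort_keys=True))
        for e in events
        if e.get("type") == "tool_call"
    )
    return {
        f"{name} {args}": n
        for (name, args), n in counts.items()
        if n >= threshold
    }
```

Run it over a session's events and a loop shows up as a single entry with a suspicious count, which tells you where in the trace to start reading.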

Execute code without trusting it

The gap between "chatbot with function calling" and "agent that does things" is arbitrary code execution. It's also the scariest capability to ship. An agent that can run shell commands can also run them maliciously the moment someone poisons its input with a hostile webpage or email.

The answer is the boundary, not the model's good behavior. Herm runs every agent inside its own Docker sandbox: full shell, full filesystem, real network — and nothing on the other side of the container wall. Combined with credential references, a prompt-injected agent has remarkably little to steal: it can't reach another customer's container, and it can't exfiltrate secrets it never possessed.
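
For readers who want to picture the boundary, here is one way a per-deployment container could be launched. The docker run flags are real, but the specific limits, the hardening options, the naming scheme, and the sandbox_command helper are all assumptions chosen for this sketch; none of it is Herm's actual configuration.

```python
import re


def sandbox_command(deployment_id: str, image: str) -> list[str]:
    """Build a `docker run` argv for one deployment's isolated container."""
    # Reject anything that is not a plain slug before it reaches a command line.
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]{0,40}", deployment_id):
        raise ValueError("deployment_id must be a short lowercase slug")
    name = f"agent-{deployment_id}"
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Named volume: the workspace persists across container restarts.
        "--volume", f"{name}-workspace:/workspace",
        # Resource ceilings so one customer's run cannot starve the host.
        "--memory", "2g", "--cpus", "1", "--pids-limit", "512",
        # Drop Linux capabilities and block privilege escalation.
        "--cap-drop", "ALL", "--security-opt", "no-new-privileges",
        image,
    ]
```

Building the command as a list rather than a shell string means the deployment ID is never interpreted by a shell, which matters when IDs originate from API input.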

Plug into everything else

An agent that only knows your hand-wired tools is capped at what you anticipated. Herm deployments take integrations in three forms: your own MCP servers (your product's capabilities — for us, that's generate_video, search_models, and friends), connectors to the services your customers already live in, and skills — markdown playbooks loaded from files, URLs, or inline — that teach the agent your domain without a single line of glue code.

The common thread is open standards. MCP for tools means any MCP-compatible server in the ecosystem works with your agent on day one, and skills stored as plain markdown mean nothing about how your agent works is hidden from you.

Work while nobody's watching

Reactive agents answer messages. The valuable ones also act on a schedule. "Every week, look at our top-performing influencer video and make five variations" is the canonical Herm example for a reason — it's the moment an agent stops being a chat feature and starts being an employee.

Automations in Herm are first-class: customers describe recurring work in plain language, and the runtime handles the scheduling, with each scheduled run getting the same durability, isolation, and steering as an interactive one. Hermes's dreaming fills the gaps between jobs — idle-time processing that consolidates what the agent learned, so tomorrow's runs start smarter than today's. And for goals that outlive any single run, the persistent goal loop keeps the agent pointed at an objective across sessions until it's done.
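
Underneath a plain-language schedule like "every week" sits a small piece of date arithmetic. As an illustration only, assuming a weekly automation pinned to a weekday and hour in UTC, the next run time can be computed like this. The next_weekly_run helper is hypothetical and is not part of Herm's API.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int, hour: int) -> datetime:
    """Next occurrence of weekday (Monday=0) at hour:00, strictly after now."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate
```

The hard part is not this function. It is that the run it triggers has to survive restarts, stay inside its customer's container, and pause for approval like any other run.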

The runtime is the moat you shouldn't build

Add it up: durable execution, two-tier memory, per-customer isolation, credential management, steering, streaming, observability, sandboxing, open integrations, scheduling. None of it differentiates your product. All of it is mandatory. Building it yourself is a couple of quarters of engineering before your agent does anything a customer would pay for. And the treadmill doesn't stop, because the next harness feature everyone wants will assume runtime capabilities you haven't built yet.

Our bet with Herm is that this entire layer should arrive with the agent, in the same API call:

POST /v1/deployments

Bring the things only you can bring — your system prompt, your skills, your tools, your connectors. The runtime is our job.
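
To give a feel for the shape of that call, here is a sketch that builds the request without sending it. Only the POST /v1/deployments path and the credential-reference pair come from this post. The base URL, the bearer-token header, and every field name in the example config are placeholders, so check the API reference for the real schema.

```python
import json
import urllib.request


def build_deployment_request(
    base_url: str, api_key: str, config: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/deployments request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/deployments",
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical field names, for illustration only.
example_config = {
    "system_prompt": "You are a video marketing assistant.",
    "skills": ["skills/brand-voice.md"],
    "mcp_servers": ["https://mcp.example.com"],
    "env": {"META_ADS_TOKEN": "sec_meta_ads_token"},
}

request = build_deployment_request(
    "https://api.example.com", "YOUR_API_KEY", example_config
)
```

Passing the request to urllib.request.urlopen would send it. The sketch stops short of that so it runs without network access or a real key.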

If you're putting an agent in front of customers and would rather not build any of this, email me: alex [at] prismvideos [dot] com. Or book a demo and we'll deploy one together.
