stop babysitting your AI agents


AI was supposed to take work off your plate.

Instead, you got a new job. Babysitting AI.

Babysitting is for 14-year-olds watching the neighbor's kids and eating their snacks. Not founders trying to run a company.

But that's what most AI work feels like right now.

You brief the agent. Paste the context. Correct the assumption. Move the output to another tool. Explain what the last agent did. Decide what should be remembered. Then you do it all again tomorrow.


And the problem isn't that AI sucks or can't do the work. The problem is that you're stuck in the middle of the AI dance.

So I built a simple system around my agents. It's called Offload Kit. File-first. Agent-native. MIT. Open source.

Many agents in. One human glue layer out.

Brief → Task → Response → Synthesis → Memory Approval → Vault Update
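Here's the loop as a minimal Python sketch, in case it helps to see it as data. The stage names come straight from the line above. The `Stage` enum, `next_stage`, and `run_loop` are hypothetical illustrations, not code from the Offload Kit repo. The one rule worth noticing: nothing reaches the vault unless a human approves it.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """The six stages of the loop, in order (illustrative names, not the kit's code)."""
    BRIEF = 1
    TASK = 2
    RESPONSE = 3
    SYNTHESIS = 4
    MEMORY_APPROVAL = 5
    VAULT_UPDATE = 6


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage after `current`, or None once the vault has been updated."""
    if current is Stage.VAULT_UPDATE:
        return None
    return Stage(current.value + 1)


def run_loop(approve: bool) -> List[Stage]:
    """Walk the loop from the brief; stop before the vault update if the human says no."""
    history = [Stage.BRIEF]
    while True:
        nxt = next_stage(history[-1])
        if nxt is None:
            break
        # The vault update is gated on the human's approval decision.
        if nxt is Stage.VAULT_UPDATE and not approve:
            break
        history.append(nxt)
    return history
```

A rejected proposal still leaves the brief, tasks, responses, and synthesis in place. Only the permanent write is skipped.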

Scroll to the bottom for the link.

❝

Quick housekeeping: last week's issue had the wrong Short-Form Kit link. Thank you to everyone who emailed or DM'd me about it. Here's the correct Short-Form Kit link: https://github.com/TheMattBerman/shortform-idea-engine

Sponsored

A $200M+ DTC brand has 44 people messaging Viktor every day.

Their ops team built inventory command centers and reorder dashboards through Viktor. Supply chain gets daily stockout alerts before they happen. Marketing tracks ROAS and runs content calendars. CS has CSAT scores and support tickets triaged and briefed every morning in Slack, before the first support call. No dashboard digging.

48 internal apps, built through conversation. No code. No developer queue. Command centers, inventory dashboards, sales trackers, reorder systems.

That's one company. Across the platform, teams have built 2,000+ apps the same way: message Viktor in Slack, describe what you need, get a working tool deployed. No code. No six-week dev queue.

Your team doesn't wait for a product roadmap. They message a colleague.

5,700+ teams. SOC 2 certified.

"It was almost instantly adopted by the bulk of my team." — Boris Wexler, CEO, Space Dinosaurs

Start free. $100 in credits →

AI gave you a new job (you didn’t apply for)

The pitch was simple. AI agents would do the work. Research. Write. Code. QA. Analyze. Summarize. Ship.

And sometimes they do.

But zoom out. A lot of founders didn't get a worker. They got a junior employee who forgets what's going on every morning.

And I’m not saying that because I see people complain on X.

I talk to founders every day. Some are hiring us to build custom AI flows. Some are consulting with me on how to use these tools inside their business. Some are already neck-deep, using agents all day, trying to turn the messy parts of their company into repeatable workflows.

The pattern is always the same: the people using AI the most aren't necessarily doing less work. A lot of them are drowning in a new job called managing the AI.

So the founder becomes the manager. Every chat starts the same way:

❝

→ "Here's the project."
→ "Here's the client/brand."
→ "Here's what we decided last time."
→ "No, don't touch that file."
→ "Here's what the other agent found."
→ "Summarize this so I can paste it into the next chat."
→ "Remember this for later, but not that part."

That last one's where the whole thing starts to crack.

With one agent, you can brute force it. Keep the thread alive. Scroll up. Paste another wall of context.

Once you've got multiple agents, that falls apart fast. One agent researches. Another writes. Another reviews. Another builds. Another QAs. Each one needs enough context to do the job without dragging you back into the middle.

If you're still personally carrying every handoff, you don't have an agent system.

You have a group chat with homework.


more context isn't the fix

The default answer is "just give the model more context."

That helps. It doesn't solve the actual problem.

Karpathy's been calling this "context engineering," and he's right that the real work is getting the right information into the model before it starts.


But long context makes one thread smarter. It doesn't create an operating system for work.

A long chat remembers what happened inside that chat. The useful parts are still trapped unless they get turned into something durable: a project decision, a source note, a handoff, a playbook, a task record, a memory update you can actually inspect.

Hidden memory has the opposite problem. It can remember too much, too quietly.

If an agent silently saves a half-correct assumption as "company memory," every future agent gets worse. Bad summaries become source of truth. Temporary context turns into policy. The system starts sounding confident because it's confidently reusing old sludge.

The fix is better separation.

Raw work goes in one place. Durable knowledge goes somewhere else. You (Mr. or Mrs. Human) approve what becomes permanent.

my Vault is my secret weapon

I tell people this all the time. My Vault is my secret weapon (I use Obsidian to read it).

It's where my projects, client context, decisions, open loops, source material, playbooks, prompts, examples, and past runs live.

The point isn't that Obsidian is a magic fix. It's not. Pick whatever folder structure you want.

The point is that my agents don't need me to recreate the company/project/workflow knowledge from scratch every time they start working.

They read the project context. They see what was decided. They see what's still open. They see the playbook. They see what previous agents tried.

That changes the job. Instead of me saying "here's everything you need to know," the agent starts by reading the notebook.

But the Vault alone isn't enough. The Vault is durable memory. It can't be full of every messy task, half-baked output, and agent scratchpad. The moment that pollutes the Vault, future agents start trusting bad data.

That's where the second piece comes in.

the bus is where work happens

❝

The Bus is where work happens. The Vault is what the company knows.

That's the whole system.

The Agent Bus is for work in motion. Tasks. Assignments. Status. Responses. Blockers. Handoffs. Syntheses.

The Agent Vault is for durable knowledge. Project context. Decisions. Open loops. Source indexes. Playbooks. Templates. Approved learnings.

The split is the whole game:

Layer	Question it answers	Examples
Agent Bus	What happens next?	tasks, responses, syntheses, blockers
Agent Vault	What should future agents know?	projects, decisions, playbooks, sources
Memory Approval	What's allowed to become permanent?	proposed update, reason, approval

If it answers "what happens next?", it belongs in the bus.

If it answers "what should future agents know?", it belongs in the vault.

Here's the mistake that's easy to make (I learned this one the hard way): you either keep everything in chat (work disappears when the thread dies) or dump everything into memory (the memory becomes garbage).

You need both. Not because folders are exciting (I don't think they are), but because the split gives agents a way to coordinate without making you the dispatcher.

the five-part offload system

Under the loop are five pieces. Each one removes a specific job you're doing by hand right now.

1. Work Queue        → tells agents what to do next
2. Project Context   → tells agents who the company is
3. Handoff Record    → tells the next agent what just happened
4. Digest Routing    → tells the system where output should live
5. Memory Approval   → tells the agent it can't rewrite canon alone

1. Work Queue

This is your task file, not a chat instruction buried 90 minutes up the thread.

It says who the task is for, what the objective is, what context to read, where to put the response, and what "done" looks like. Boring on purpose. Boring's what stops you from re-explaining the same job to four agents.
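Here's roughly what that task file can look like, sketched in Python. The five fields map to the five things above. The field names, the Markdown layout, and the example paths are my hypothetical illustration, not the kit's actual template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """The five things a task file states (field names are hypothetical)."""
    assignee: str          # who the task is for
    objective: str         # what the agent should accomplish
    read_first: List[str]  # context files to read before starting
    response_path: str     # where the response file goes
    done_when: str         # what "done" looks like
    status: str = "queued"


def render_task(task: Task) -> str:
    """Render a task as a plain Markdown file any file-reading agent can parse."""
    lines = [
        f"# Task for {task.assignee}",
        f"status: {task.status}",
        f"objective: {task.objective}",
        "read first:",
        *[f"- {path}" for path in task.read_first],
        f"write response to: {task.response_path}",
        f"done when: {task.done_when}",
    ]
    return "\n".join(lines) + "\n"


def missing_fields(task: Task) -> List[str]:
    """List required fields left empty, so a vague task gets caught before it's queued."""
    required = ["assignee", "objective", "read_first", "response_path", "done_when"]
    return [name for name in required if not getattr(task, name)]
```

`missing_fields` is the boring part doing its job: a task with no "done when" shouldn't make it into the queue.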

2. Project Context

Project context is what the agent reads before starting.

For me, that means the project brief, decisions, open loops, recent run records, and any local agent instructions.

The goal is simple: the agent should arrive briefed.
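A minimal sketch of "arrive briefed," assuming each project folder holds a few context files plus a `runs/` folder of date-named run records. The file names (`brief.md`, `decisions.md`, `open-loops.md`) and `build_briefing` are hypothetical, so swap in whatever your vault actually uses. Missing files are skipped instead of raising an error, so a brand-new project still gets a briefing.

```python
from pathlib import Path
from typing import List

# Hypothetical file names for the pieces listed above; adjust to your own vault layout.
CONTEXT_FILES = ["brief.md", "decisions.md", "open-loops.md", "AGENTS.md"]


def build_briefing(project_dir: Path, recent_runs: int = 3) -> str:
    """Concatenate a project's context files into one briefing, in a fixed order."""
    sections: List[str] = []
    for name in CONTEXT_FILES:
        path = project_dir / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text(encoding='utf-8').strip()}")
    # Run records are assumed to live in runs/ with sortable (date-prefixed) names.
    runs_dir = project_dir / "runs"
    if runs_dir.is_dir():
        runs = sorted(runs_dir.glob("*.md"))
        # Keep only the newest `recent_runs` records (none if recent_runs is 0).
        for run in runs[max(len(runs) - recent_runs, 0):]:
            sections.append(f"## runs/{run.name}\n{run.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(sections)
```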

3. Handoff Record

Every agent should leave receipts.

What did it do? What did it find? What changed? What's blocked? What should the next agent know?

That’s what keeps one agent from making the next agent start cold. No receipt, no handoff, no system.
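A receipt can be as small as five fields, one per question above. This Python sketch is hypothetical (the `Handoff` class and field names are mine, not the kit's template). `unanswered` is the "no receipt, no handoff" check: it flags any question the agent left blank.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Handoff:
    """One receipt per agent run; each field answers one question from the list above."""
    did: str         # What did it do?
    found: str       # What did it find?
    changed: str     # What changed?
    blocked: str     # What's blocked? ("none" is a valid answer; blank is not)
    next_agent: str  # What should the next agent know?


def unanswered(receipt: Handoff) -> List[str]:
    """Return the questions left blank. An empty list means the receipt is complete."""
    return [f.name for f in fields(receipt) if not getattr(receipt, f.name).strip()]


def render(receipt: Handoff, agent: str) -> str:
    """Write the receipt as Markdown for the next agent to read."""
    body = "\n".join(f"- {f.name}: {getattr(receipt, f.name)}" for f in fields(receipt))
    return f"# Handoff from {agent}\n{body}\n"
```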

4. Digest Routing

After a run, useful output needs a place to go.

Some things are tasks. Some are links. Some are project facts. Some are decisions. Some are playbook improvements. Some are just raw notes that should stay in the bus and never become memory.

Digest routing's the rule that keeps your system from turning into one giant junk drawer.
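A minimal routing table in Python, assuming one destination per kind of output. The folder paths and the `ROUTES` mapping are hypothetical. What matters is the split: bus destinations hold work in motion, vault destinations only ever receive proposed changes, and anything unrecognized defaults to raw notes in the bus instead of sneaking into memory.

```python
from typing import Dict, List, Tuple

# Hypothetical destinations; bus paths hold work in motion, vault paths only get proposals.
ROUTES: Dict[str, str] = {
    "task": "agent-bus/tasks/",
    "link": "vault/sources/",
    "project_fact": "vault/projects/",
    "decision": "vault/decisions/",
    "playbook": "vault/playbooks/",
    "raw_note": "agent-bus/notes/",
}


def route(items: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (kind, text) items by destination; unknown kinds fall back to bus notes."""
    routed: Dict[str, List[str]] = {}
    for kind, text in items:
        dest = ROUTES.get(kind, ROUTES["raw_note"])
        routed.setdefault(dest, []).append(text)
    return routed


def needs_approval(dest: str) -> bool:
    """Anything headed for the vault goes through memory approval first."""
    return dest.startswith("vault/")
```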

5. Memory Approval

Don't let agents silently rewrite what your company remembers.

If an agent thinks something should become permanent, it proposes the change. You review it. Then it gets accepted, edited, or rejected.

Technically it's just a before-and-after view of the files the agent wants to change. In plain English:

The agent has to show what it wants to remember before it becomes canon.

If you would not let a junior employee silently rewrite company process, don’t let an agent silently rewrite company memory.
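The before-and-after view is just a diff. Here's a minimal sketch using Python's standard `difflib` module. The `Packet` shape, the status values, and the `review` and `promotable` helpers are hypothetical, not the kit's actual packet format. Accept, edit, or reject are the only ways out, and only the first two produce text for the vault.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    """A proposed change to one vault file (hypothetical shape)."""
    path: str
    before: str
    after: str
    reason: str
    status: str = "proposed"  # becomes "accepted", "edited", or "rejected"


def show_diff(packet: Packet) -> str:
    """The before-and-after view a human reviews before anything becomes canon."""
    return "".join(
        difflib.unified_diff(
            packet.before.splitlines(keepends=True),
            packet.after.splitlines(keepends=True),
            fromfile=f"a/{packet.path}",
            tofile=f"b/{packet.path}",
        )
    )


def review(packet: Packet, decision: str, edited_text: Optional[str] = None) -> Packet:
    """Record the human's call. An edit replaces the proposed text before it is accepted."""
    if decision not in ("accepted", "edited", "rejected"):
        raise ValueError(f"unknown decision: {decision}")
    if decision == "edited":
        if edited_text is None:
            raise ValueError("an edit needs the corrected text")
        packet.after = edited_text
    packet.status = decision
    return packet


def promotable(packet: Packet) -> Optional[str]:
    """Return the text to write into the vault, or None if the human didn't approve it."""
    return packet.after if packet.status in ("accepted", "edited") else None
```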

what offload actually ships with

This is where the idea becomes useful.

It ships with the pieces I wish every agent workflow had on day one:

offload/
  00-start-here.md
  README.md
  AGENTS.md
  install.sh

  .claude-plugin/
  .codex-plugin/

  skills/
    agent-manager/      → walks the full loop
    new-task/           → creates a queued task
    respond/            → reads task + context, writes response
    synthesize/         → combines responses into one handoff
    propose-memory/     → writes a memory approval packet
    promote/            → applies an approved packet to the vault

  agent-bus/            → where work happens
  vault/                → what the company knows
  docs/                 → first-30-minutes, bus-vs-vault, memory-approval
  examples/             → newsletter planning, SaaS homepage improvement

The skills are intentionally thin. They don't schedule tasks. They don't run a queue server. They don't auto-write permanent memory. They teach the agent how to use the files.

Drop the folder into any workspace your agent can read. Codex, Claude Code, OpenClaw, Hermes, Cursor agents: any file-reading agent picks up the same contract.

Two examples ship in the repo:

❝

→ newsletter planning (research, draft, review, synthesis, memory approval)
→ SaaS homepage improvement (research a homepage, write copy, review claims, approve what becomes project memory)

The same loop runs for code, ads, design, ops. Any place you've got more than one agent in the chain.

The honest limit: Offload doesn't make agents autonomous by itself. It gives them rails. Agents still need something to tell them a task exists. If you need millisecond coordination between hundreds of workers, use a real queue. This is the starter pattern for people who want the file workflow to work before they build the machinery around it.
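The simplest "something" is a check at the start of each session, run by a script, a cron job, or the agent itself, that scans the task folder. Here's a minimal sketch of that check, assuming task files are Markdown with `assignee:` and `status:` lines. That format and the `queued_for` helper are hypothetical, not the kit's actual schema.

```python
from pathlib import Path
from typing import Dict, List


def read_header(path: Path) -> Dict[str, str]:
    """Parse simple one-word `key: value` lines from a task file (hypothetical format)."""
    header: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Only single-word keys count, so prose lines with colons are ignored.
        if sep and key and " " not in key:
            header[key.lower()] = value.strip()
    return header


def queued_for(agent: str, tasks_dir: Path) -> List[Path]:
    """List queued tasks assigned to `agent`, in file-name order."""
    found: List[Path] = []
    for path in sorted(tasks_dir.glob("*.md")):
        header = read_header(path)
        if header.get("assignee") == agent and header.get("status") == "queued":
            found.append(path)
    return found
```

Numbered file names give you first-in, first-out ordering for free. Past that, you're building the real queue this kit deliberately isn't.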

this newsletter was planned with the offload kit

This newsletter was planned using the system it describes. That part matters.

I asked Codex to find the angle. It wrote a shared brief into the Agent Bus. Then it assigned separate tasks:

❝

→ Hermes: pressure-test the hook against X and find tweet embeds.
→ OpenClaw: review the kit structure and the install / publish path.
→ Codex: synthesize the responses and build the public-safe kit.
→ Claude: the writing pass after the spine and artifact were locked.

OpenClaw came back with the line I keep using:

❝

The Bus is where work happens. The Vault is what the company knows.

That went into the synthesis. Then the synthesis went into the newsletter spine. Then the kit got built and shipped to GitHub. Then Codex QA'd the repo: plugin JSON, install flow, skill files, public links, license, privacy scan, no-em-dash check.

That's the point.

The article was made with the agent coordination system.

Jason Liu at OpenAI had a related point about Codex becoming a broader system for getting computer work done: durable threads, tools, artifacts, queues, browser use, connectors, automation. That shift's why this matters. Once agents move from "help me code" to "help me operate," coordination becomes the product.


what offload kit refuses to do

These five rules are baked into every stage. Same energy as the rules I put in every other kit, because the same principles apply.

→ Never auto-write memory. Every promotion to the vault passes through a human-reviewed packet. No exceptions.
→ Never trust raw logs as canon. Logs are useful. They're not the company notebook.
→ Never let every agent write to the vault. That's how you get memory pollution. If everything's memory, nothing is.
→ Never make this an "AI memory" project. That framing's too narrow. The real pain is that you're still managing the agents by hand.
→ Never overbuild the bus. You don't need Kubernetes for agents. You need a queue, context, response files, handoffs, and a review step.
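The first and third rules are easy to state and easy to break. If your setup lets you intercept file writes (a git pre-commit hook or a wrapper around the agent's write tool, for example), a check like this can enforce them. It's a minimal sketch: the `agent-bus/` and `vault/` folder names match the kit's top-level folders, but the `promote` actor name and the `may_write` function are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical actor name for the one step allowed to touch the vault.
PROMOTER = "promote"


def top_folder(path: str) -> Optional[str]:
    """Return the first folder of a relative path, or None for suspicious paths."""
    p = PurePosixPath(path)
    # Reject absolute paths and ".." segments so "agent-bus/../vault" can't sneak through.
    if p.is_absolute() or ".." in p.parts or len(p.parts) < 2:
        return None
    return p.parts[0]


def may_write(actor: str, path: str, packet_approved: bool = False) -> bool:
    """Agents write to the bus freely; only an approved promotion touches the vault."""
    folder = top_folder(path)
    if folder == "agent-bus":
        return True
    if folder == "vault":
        return actor == PROMOTER and packet_approved
    # Everything else is out of scope for agents in this sketch.
    return False
```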

If you've ever watched an AI workflow rot because someone let the agent quietly rewrite a doc nobody approved, these rules are the fix.

who this is for

Run Offload Kit if:

→ You use more than one AI agent or chat
→ You keep pasting the same project context into every new thread
→ Agents finish work but the next agent has no idea what happened
→ Useful lessons disappear into transcripts you'll never reread
→ You want memory you can inspect before it becomes permanent
→ You want a starter pattern before building custom automation

Not for you if:

→ You already run one heavyweight agent with deep persistent memory and zero context drift. Keep it
→ You want a SaaS dashboard with a billing meter. That's not this
→ You want hidden, automatic memory the agent rewrites silently. That's the thing Offload exists to prevent

get offload

The first version of Offload Kit is open source. MIT. Free.

Get the OFFLOAD KIT (free)

It includes:

→ a file-first Agent Bus and Agent Vault
→ 00-start-here.md and a first-30-minutes walkthrough
→ AGENTS.md rules for any file-reading agent
→ task and response templates
→ project context, digest, and memory promotion playbooks
→ the memory approval packet (the diff-before-canon pattern)
→ a newsletter planning example
→ a SaaS homepage improvement example
→ Claude Code plugin metadata
→ Codex plugin metadata
→ six thin SKILL.md workflows
→ install.sh for Claude, Codex, or any agent that reads SKILL.md
→ MIT license

The goal isn't to make you use my exact folders.

The goal isn't to give agents infinite memory, either. It's to stop being the memory, router, and handoff layer yourself.

Give them a place to get work. Give them a place to read context. Make them leave receipts. And make them ask before rewriting the company notebook.

That's when agents stop feeling like another tab to manage and start feeling like workers that can actually get shit done.

Go big,

Matt

P.S. Offload is the coordination lane that sits underneath the rest of the kit stack. If you're already running Slideshow Kit, Short-Form Kit, Brand Shoot Kit, Outcome Kit, or any combo, this is the shared filesystem contract that keeps the human from becoming the system every time two of those kits need to talk.

P.P.S. If you build something useful on top of Offload, the kit got out of your way. That's the whole bar.

P.P.P.S. Star the repo if this helps. It tells me to keep building.
