People ask why I let ops agents run without approval loops when most coding agents still need a human in the session.
It's not about trust in the model.
It's about the trust radius of the action.
A coding agent that has shell access can do a lot of damage in a lot of directions. You give it broad trust (file system, bash, npm) and then you manage the blast radius with walls: sandbox, containment, egress proxies. You're present because the action space is too large to hand off.
An ops agent that reads a memory file and calls one API has a trust radius of one hop. The action space is defined at setup time. The worst it can do is call the wrong endpoint with the wrong payload. That's recoverable.
Autonomy scales with how bounded the action space is. Not with how much you trust the model.
If you're in an approval loop with your ops agent, you haven't constrained the action space - you've just deferred the decision of where the walls go.
Someone on HN this week: "I pay $200/month for Claude but I'd be spending $10k at API rates."
That's coding agents.
Here's ops agent math:
My daily cron runs 11 jobs. Each job calls an API 3-7 times. Average context: 2k tokens in, 500 out.
Monthly bill: $38.
Coding agents are unpredictable because the task scope is open-ended. The model keeps going until it's done or the context runs out.
Ops agents are predictable because the task scope is defined at setup time. The cron fires, the agent does its one thing, it stops.
The harness is what makes the cost bounded.
Not the model. Not the pricing plan. The harness.
When Anthropic IPOs and reprices, my ops bill might go from $38 to $55. Not from $38 to $5,500.
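A rough sketch of why the bill is bounded, using the cron figures above (11 jobs, 3 to 7 calls each, about 2k tokens in and 500 out). The per-token prices here are placeholder assumptions, not any provider's real rates, so the totals won't reproduce the $38 exactly. The point is that a fixed schedule gives you a floor and a ceiling.

```python
# Hypothetical prices in dollars per million tokens. Swap in your provider's real rates.
PRICE_IN_PER_MTOK = 3.00
PRICE_OUT_PER_MTOK = 15.00


def monthly_cost(jobs, calls_per_job, tokens_in, tokens_out, days=30):
    """Dollar cost of a fixed daily schedule over `days` days."""
    calls = jobs * calls_per_job * days
    cost_in = calls * tokens_in / 1_000_000 * PRICE_IN_PER_MTOK
    cost_out = calls * tokens_out / 1_000_000 * PRICE_OUT_PER_MTOK
    return cost_in + cost_out


# Floor: every job makes 3 calls. Ceiling: every job makes 7.
low = monthly_cost(jobs=11, calls_per_job=3, tokens_in=2_000, tokens_out=500)
high = monthly_cost(jobs=11, calls_per_job=7, tokens_in=2_000, tokens_out=500)
print(f"floor ${low:.2f}, ceiling ${high:.2f}")
```

A price change scales both numbers by the same factor. Nothing in the schedule lets the call count run away.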
A June 2026 paper tracked 200 consecutive agent sessions.
Rule adherence in the first 10 sessions: 94%.
Rule adherence in sessions 180-200: 61%.
No model changes. No prompt changes. Same system.
The rules just... decayed.
For coding agents this shows up as a test that starts failing.
For ops agents it shows up as the ICP description that slowly drifts from what you actually sell.
Your memory file says 'solo founders, 1-15 people.'
The agent's working understanding, after 200 runs, may be something blurrier.
You won't notice until a sequence of emails goes out to the wrong segment.
The fix isn't a better model. It's a weekly memory audit cron that compares the running ICP definition to the stated one and flags drift.
Deterministic check. Physical gate. Not a memory note to itself.
That's the whole thing.
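A minimal sketch of what that audit could look like, assuming the ICP is stored as structured fields rather than free text. The field names and the drifted values are made up for illustration.

```python
def audit_drift(stated: dict, running: dict) -> list[str]:
    """Return one flag per field where the running definition differs from the stated one."""
    flags = []
    for key, expected in stated.items():
        actual = running.get(key)
        if actual != expected:
            flags.append(f"{key}: stated={expected!r} running={actual!r}")
    # Fields the agent added on its own also count as drift.
    for key in sorted(running.keys() - stated.keys()):
        flags.append(f"{key}: not in stated definition, running={running[key]!r}")
    return flags


# Hypothetical example: the stated ICP vs. what the agent is working from.
stated = {"segment": "solo founders", "team_size_min": 1, "team_size_max": 15}
running = {"segment": "solo founders", "team_size_min": 1, "team_size_max": 50, "industry": "any"}

flags = audit_drift(stated, running)
for flag in flags:
    print("DRIFT", flag)
```

No model call anywhere in the check. It either matches or it flags.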
Anthropic just published a 4,000-word post on how they contain Claude across products.
46 comments on HN. All about blast radius, sandboxing, and egress proxies.
Here's what my ops agent setup looks like:
No egress proxy. No sandbox. No containment policy.
Not because containment doesn't matter. Because the harness makes containment a non-problem.
My cron job reads a memory file and calls an API. It writes to one domain folder. It can't SSH into anything. It has no credentials it didn't start with.
The blast radius is defined at setup time, not managed at runtime.
Containment engineering is a coding agent problem. You give a coding agent shell access and file system permissions because it needs them to do its job. Then you build walls to limit what it can do with them.
Ops agents don't have the access in the first place. The wall is the architecture.
The 4,000 words Anthropic wrote are genuinely useful. They're solving a real problem. But they're solving the wrong problem for the ops agent use case.
If you need an egress proxy, you have a coding agent, not an ops stack.
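One way "writes to one domain folder" can be fixed at setup time, sketched under assumptions: the folder path and function name are hypothetical, and this is a harness-level guard, not a sandbox.

```python
from pathlib import Path

# Hypothetical folder. The only place this job is ever allowed to write.
ALLOWED_ROOT = Path("/srv/ops/crm").resolve()


def safe_target(relative_name: str) -> Path:
    """Resolve a write target and refuse anything outside ALLOWED_ROOT."""
    target = (ALLOWED_ROOT / relative_name).resolve()
    if not target.is_relative_to(ALLOWED_ROOT):  # Python 3.9+
        raise PermissionError(f"{target} is outside {ALLOWED_ROOT}")
    return target


print(safe_target("leads/today.json"))
```

The agent never picks a root. It only ever supplies a name inside the one it was given.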
At 3:47am, while I was asleep, the ops agent flagged a CRM contact that had visited the pricing page three times in 48 hours.
I didn't see it until 7am.
By then it had already queued a personalized follow-up draft.
I approved it in 40 seconds. It sent.
No pipeline meeting. No 'did anyone catch that?' Slack message. No lead rotting for 3 days in someone's inbox.
The agent runs while I sleep. The review is the whole job.
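The signal itself is a plain rule. A sketch of the "three pricing-page visits in 48 hours" check, assuming visits arrive as timestamps. The names and the example date are arbitrary.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)
MIN_VISITS = 3


def is_warm(visits: list[datetime], now: datetime) -> bool:
    """True when at least MIN_VISITS pricing-page visits fall within WINDOW before `now`."""
    recent = [v for v in visits if timedelta(0) <= now - v <= WINDOW]
    return len(recent) >= MIN_VISITS


# Hypothetical contact: visits 2, 20 and 41 hours before the 3:47am run.
now = datetime(2025, 1, 10, 3, 47)
visits = [now - timedelta(hours=h) for h in (2, 20, 41)]
print(is_warm(visits, now))
```

The model drafts the follow-up. The rule decides who gets flagged.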
Coding agents fail loud.
Tests break. Code won't compile. Output is obviously wrong.
Ops agents fail quietly. Here's what ops agent failure looks like:
The email copy is clean but the target ICP is 6 weeks stale.
The lead score logic looks correct but was never updated after the pivot.
The outreach went to the right list, but it was filtered through the suppression list from before you changed verticals.
The memory file compacted correctly into an older version of your positioning.
None of these break anything. The agent keeps running. You find out weeks later when the numbers don't move.
Ops agent smell test: when did you last read your agent's memory files?
If you can't remember, that's the smell.
A founder who spent 5 years in mortgage broking knows exactly which signals matter in their CRM.
An agent that reads those signals every night and flags the warm leads isn't replacing that expertise.
It's running it at scale.
This is the compounding effect nobody has a name for.
The generalist with AI is better than the expert without it. Everyone is saying this.
But the expert with ops agents running their domain knowledge continuously? That's a different order of magnitude.
The expertise took 5 years to build. The agent costs nothing to run overnight.
Most founders are using AI as a generalist assistant. They're prompting their way through tasks they already know.
The actual leverage is the other direction: take what you know deeply, wire it into a system that runs without you, and let it compound.
Your knowledge is the moat. The agent is the distribution mechanism.
What part of your domain expertise are you still running manually?
My Monday morning review takes 22 minutes.
Not because I'm fast.
Because the agent doesn't ask me to approve commands.
It runs from Sunday night. I see what it decided. I check if those decisions were right.
That's a completely different cognitive load than coding agents, which ask you to approve or reject individual shell commands in real time. Papers, Please, but for your codebase.
Input approval vs output review.
With coding agents: you stay alert the whole session. You're the gatekeeper on every action. The agent doesn't run without you.
With ops agents: you define what the agent can do (the blast radius), then you're not involved until the run is complete.
Monday morning I reviewed 6 days of agent work. No approve/reject loop. Just: did the right things happen?
Permission fatigue is a real problem. It's just a coding agent problem.
If your ops stack still needs you present to run, the architecture isn't there yet.
GitHub Copilot switched from per-request to per-token pricing yesterday.
446 people on HN doing the math.
Here's what didn't change on my end:
Nothing.
Not because I'm not paying attention to pricing. Because my ops stack doesn't have a model baked into it.
Every cron job calls an API endpoint. The endpoint is configurable. When pricing moves, I swap a config value.
I've swapped models twice in 6 months. Zero cron jobs broke.
The harness is the product. The model is the CPU.
When GitHub Copilot changes pricing, Cursor users panic. When Anthropic changes pricing, Claude Code users panic. When OpenRouter changes pricing, everyone using raw API keys panics.
If your ops stack is tied to a specific model at a specific price point, the harness is the product that doesn't exist yet.
The question isn't which model wins. It's whether your ops stack survives the next pricing announcement.
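A sketch of what "swap a config value" can mean in practice. The endpoint URL, model names, and request shape are placeholders, not a real provider's API.

```python
import json

# Hypothetical defaults. In a real setup these would live in a config file.
DEFAULT_CONFIG = {"endpoint": "https://api.example.com/v1/complete", "model": "model-a"}


def load_config(text: str) -> dict:
    """Merge JSON overrides onto the defaults."""
    config = dict(DEFAULT_CONFIG)
    config.update(json.loads(text))
    return config


def build_request(config: dict, prompt: str) -> dict:
    """Build the request a job would send. No job names a model directly."""
    return {"url": config["endpoint"], "json": {"model": config["model"], "prompt": prompt}}


before = build_request(load_config("{}"), "Summarize new signups")
after = build_request(load_config('{"model": "model-b"}'), "Summarize new signups")
print(before["json"]["model"], "->", after["json"]["model"])
```

The job code is identical on both sides of the swap. Only the config moved.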
At 3am last night, while I was asleep, my AI agent:
- Scanned 47 Reddit threads across r/SaaS, r/IndieHackers, and r/startups
- Found 3 founder questions worth replying to
- Drafted replies, flagged 1 for my review
- Identified 2 new leads and added them to CRM
My job was just:
approve, reject, give feedback.
It monitors the internet, analyzes traffic, finds opportunities, drafts actions, and executes growth tasks 24/7.
Comment your company website below and I'll have @crewlet_ run an audit on it.
Vicki Boykis nailed it: after an agentic coding session, you have the outputs but none of the understanding.
The model did the work. You're the reviewer. You slowly lose the mental model of your own codebase.
This is a real problem. But it's a coding agent problem.
Ops agents invert it.
I'm never tired from running the agent. The agent runs at 3am while I sleep. I review what it did at 7am.
The 22-minute Monday review isn't exhausting. It's the clearest 22 minutes of my day.
With coding agents, you stay in the loop the whole time - prompting, reviewing, re-prompting, staying alert. The agent demands your presence.
With ops agents, you don't stay in the loop. You check in on a schedule. The agent doesn't need you running.
The fatigue model is completely different.
You should be more tired than your ops agent. Every time, by design.
The question isn't 'am I too tired after the session?' It's 'was I even awake for it?'
157 people on HN debating whether MCP is dead.
Every comment is about coding agents.
Ops agents never had this problem.
MCP is a protocol for connecting a model to tools at runtime. Coding agents need this because each session is fresh - the model needs to rediscover what tools exist every time.
Ops agents don't have sessions. They have memory.
My ops stack connects to Supabase, Resend, PostHog, and Slack. Not via MCP. Via scheduled jobs that read persisted context and call APIs directly.
The model doesn't rediscover the integration every run. It reads the same memory files it updated last run.
MCP solves the stateless tool-discovery problem. Ops agents never had a stateless problem.
The right integration layer for ops agents is: persistent memory + cron + direct API calls.
Not a protocol. A harness.
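The shape of one run, as a minimal sketch: load the memory file, make a direct call, write the memory file back. The file layout and `fake_api` are stand-ins. A real job would call Supabase, Resend, or whatever it was wired to at setup.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_job(memory_path: Path, call_api: Callable[[dict], dict]) -> dict:
    """One scheduled run: read persisted context, call the API, persist the result."""
    if memory_path.exists():
        memory = json.loads(memory_path.read_text())
    else:
        memory = {"runs": 0}  # first run starts from an empty memory
    result = call_api(memory)
    memory["runs"] += 1
    memory["last_result"] = result
    memory_path.write_text(json.dumps(memory, indent=2))
    return memory


def fake_api(memory: dict) -> dict:
    """Stand-in for a direct API call. Reports how many runs it could see."""
    return {"seen_runs": memory["runs"]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    first = run_job(path, fake_api)
    second = run_job(path, fake_api)  # picks up where the first run left off
    print(second["runs"], second["last_result"])
```

Cron supplies the schedule. The file supplies the continuity.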
929 people on HN asking: 'Can we have the day off?'
The argument: AI made us 10x more productive. Why are we still working the same hours?
Here's the founder side of that question:
I took last Sunday off. The agent sent emails, monitored signals, and drafted content. Monday morning I reviewed what it did.
22 minutes.
This isn't the 4-day work week debate. It's a different ownership structure.
When your ops run without you, you get the day off by default. You just need a system that keeps running when you're not looking.
The employees asking for the day off are right. The founders building on top of agents already took it.
What's the last thing your business did while you were asleep?
HN is debating whether frontier model prices will stay flat or spike.
For coding agents: pricing volatility is annoying. Your CI gets more expensive.
For ops agents: pricing volatility breaks the whole system.
Ops agents run 24/7. They have memory files that assume a specific model's output format. They have review queues calibrated to that model's error rate. They have cron jobs that depend on consistent output length.
If you bake a specific model into your ops stack and prices triple overnight, you don't just pay more. You rebuild the harness.
This is why the model should be the least important part of an ops agent setup.
The harness is the product. The model is the CPU.
Six months in, I've swapped models twice. Neither time did the cron jobs break. Because the specs are written to behavior, not to syntax.
Pick a model. Build the harness so it doesn't care which one.
My Monday ops agent review: 22 minutes.
What I check:
- Which emails went to spam (0 this week - domain warm)
- Which content ideas the agent flagged as duplicates
- What the agent changed in its own memory files
- Which leads entered the pipeline vs which it skipped
What I don't check:
- That it ran at all (it has a healthcheck)
- That suppression lists are current (it updates them nightly)
- That email copy follows brand voice (it reads the style guide)
The review got shorter every week. Not because I trust the agent blindly.
Because I wrote down every failure and made it a rule. The rule runs itself.
Week 1: 2 hours. Week 6: 22 minutes.
What part of your ops review are you still doing manually that a rule could handle?
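A sketch of "every failure becomes a rule", assuming each run produces a small summary dict. The rule names, field names, and thresholds are illustrative, not the actual checks.

```python
from typing import Callable

Rule = Callable[[dict], bool]
RULES: dict[str, Rule] = {}


def rule(name: str):
    """Register a deterministic check under a human-readable name."""
    def register(fn: Rule) -> Rule:
        RULES[name] = fn
        return fn
    return register


@rule("suppression list refreshed within 24h")
def suppression_fresh(run: dict) -> bool:
    return run["suppression_age_hours"] <= 24


@rule("no emails landed in spam")
def no_spam(run: dict) -> bool:
    return run["spam_count"] == 0


def failed_rules(run: dict) -> list[str]:
    """Names of every rule this run broke, in registration order."""
    return [name for name, check in RULES.items() if not check(run)]


# Hypothetical run summary: stale suppression list, clean deliverability.
report = failed_rules({"suppression_age_hours": 30, "spam_count": 0})
print(report)
```

Each new failure adds one decorated function. The review only has to cover what the rules don't.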
The most useful thing about running ops agents solo:
There's no one between me and the data.
No sales person who says the deck was the problem.
No marketer who says the targeting was the problem.
No ops manager who says the timing was the problem.
335 emails. Two different ICPs. The channel wasn't working.
I don't have a team to diffuse that signal. So I can't ignore it.
I changed the channel in 5 days.
A 5-person team would have spent 6 weeks in the blame cycle.
The agent doesn't care who's responsible. It just returns the number. You decide what it means.
That speed-of-conclusion gap is underrated. It's not about the automation. It's about removing the layer that softens bad news.
1,890 people upvoted 'I'm Tired of Talking to AI' on HN today.
I get it. Chatbots are exhausting.
But there's a version of AI you never talk to.
You don't prompt it. You don't debug its outputs in a chat window. You don't paste your business context into a box every morning.
It runs on a cron. It reads your memory files. It queues things for your approval. You review the diff.
That's not a conversation. It's an operator.
I haven't opened a chat UI for a business task in 6 weeks. The agents just run.
The exhaustion people are describing is real. It's chat-first AI design. The alternative exists.
Geohot is right about coding agents: the slop is getting harder to detect.
But he's describing the hard version of a problem ops agents have always had.
Coding agent failure: broken code, wrong logic. Loud.
Ops agent failure: stale suppression list, memory drift, wrong ICP. Silent.
You don't find out ops agents failed by running tests. You find out 6 weeks and 335 emails later.
The common thread: the output mimics correctness. That's what makes both dangerous at scale.
Has your agent ever looked right but been wrong the whole time?