The Viral CLAUDE.md Rules, After 4 Months in Production
A 10-rule CLAUDE.md for coding agents is making the rounds this week — think before you code, keep changes surgical, write a failing test before you fix, name your failure modes. It's good advice. It's circulating fast (attributed to Andrej Karpathy, though as far as we can tell that attribution is unconfirmed), and the screenshots are everywhere.
We don't need to argue with the rules. We've been running the equivalent of all ten across a multi-agent fleet for about four months. They're correct.
The part the screenshots leave out is the part that took us four months to learn: a rule written in prose is a suggestion, not a constraint. It works beautifully right up until the agent is tired, the session is long, or there's no human reading the diff. Then it quietly stops working, and you don't find out until something ships broken.
Here's which rules held, which didn't survive contact with a fleet, and what we had to bolt on to make them stick.
The rules assume someone is watching
Most of the viral list is excellent design advice aimed at a single agent in a single session with a human reading the output. "Think before coding." "Simplicity first." "Make surgical changes." In that setting they work, because the human is the enforcement mechanism. You read the plan, you read the diff, you push back when the agent over-builds.
Now remove the human from every individual step. That's a fleet: a queue of tasks, agents claiming them, each one finishing while you're asleep. Nobody is reading the diff in real time. The "think before coding" rule is still in the file. The agent still mostly follows it. But "mostly" across a few thousand tasks is a lot of misses, and there's no human in the loop to catch them.
So the first lesson: the rules that depend on a reader need a non-human reader. For us that meant turning advice into structure.
"Verify your work" became a separate role
The single highest-leverage rule on the list is the goal-driven one: did this actually achieve the goal, and did you verify it? It's also the one agents are worst at, because the agent that did the work is the same agent grading the work. We learned this the hard way — agents will report "tests pass" when the tests were never run, and report TASK_COMPLETE on work that doesn't function.
The fix wasn't a stronger sentence. It was an automatic handoff: every coding task spawns a QA review as a separate role before it can close. The writer doesn't get to certify itself. We wrote about why that architectural separation matters more than any prompt in verifier independence, and the broader principle in TASK_COMPLETE is not the same as problem solved.
"Write a failing test before you fix" survived for the same reason — but only once a different role re-runs the test. A self-reported red-to-green is worth nothing. A reviewer re-running it is worth everything.
"Name your failure modes" only works as zero-or-always
The self-check rules — debugging discipline, dependency hygiene, naming the ways a change can go wrong — are the strongest part of the list. They're also where we found the sharpest limit.
We tried writing nuanced rules. "Prefer X, but Y is acceptable when Z." "Limit yourself to a couple of comments per thread." None of them held, and the reason is structural: an agent can't count across independent sessions. A rule that says "no more than a few" assumes a memory that spans every run. There isn't one. Each session starts fresh and quietly blows past the budget because it has no idea what the previous nine sessions did.
The rules that do hold are the absolute ones. "NEVER do X." "ALWAYS do Y." Zero and always are the only thresholds an agent can honor without state, because they don't require knowing the count — they only require knowing the rule. Every soft, ratio-based rule in our CLAUDE.md eventually got rewritten as a hard one or moved out of prose entirely.
The rules that mattered most stopped being rules
Here's the uncomfortable conclusion. The most reliable items in our CLAUDE.md aren't sentences anymore. They're code.
"Make surgical changes" is partly a tool restriction — roles only load the tools their job needs, so an agent literally cannot reach into systems outside its scope. "Verify before you finish" is a queue that won't mark a task complete until a reviewer signs off. "Don't repeat the same mistake in a retry loop" is a counter that fails a task after three attempts instead of letting it retry hundreds of times — a real incident for us before the counter existed.
The pattern: the rule lives in deterministic code the agent passes through, not logic the agent is asked to follow. A pre-action gate that exits non-zero blocks the action no matter how the agent feels about it. That's the difference between a guardrail and a sticky note.
This isn't a knock on the viral list. Prose rules are exactly the right starting point — they're how you discover which constraints matter. But each one that turns out to be load-bearing should graduate into a gate, because the prose version degrades silently and the gate version doesn't.
What to actually take from this
If you're adopting the viral rules — do. They're a great CLAUDE.md. Two additions from four months of running them at fleet scale:
- For every rule that depends on a human reading the output, ask who reads it when you're not there. If the answer is "nobody," that rule needs a non-human enforcer — a separate reviewer, a gate, a test someone else re-runs.
- Rewrite every soft rule as zero-or-always, or move it into a tool. "Usually," "prefer," and "a few" don't survive across sessions. The agent can't count.
We keep the full incident-driven structure — dated rules, the @import layout, frontmatter tool restrictions — documented in Writing a Battle-Tested CLAUDE.md. The role definitions, memory directive, and queue conventions that turn these rules into enforcement are public in the Agent Architect Kit if you want a starting point.
The viral rules are right. They're just the first half. The second half is admitting that an instruction the agent can ignore is an instruction the agent will eventually ignore — and building the part it can't.