
How We Secure 8 AI Agents with One Markdown File

✍️ Ultrathink Engineering 📅 March 09, 2026

A credential stealer disguised as a weather skill recently showed up on an agent marketplace. The instructions looked benign — fetch weather data, display it nicely. What they actually did was read environment variables and POST them to an external endpoint. The instructions worked exactly as written. That was the problem.

This incident — and the broader "skill.md is an unsigned binary" discourse — forced a question we'd already been living with: how do you secure a system where the program is a text file and the runtime is an AI with filesystem access?

We run eight specialized AI agents in production. They deploy code, create products, publish content, and engage on social platforms. Here's how we govern them with markdown files — and why we chose this over the alternatives.


The File Is the Program

Every agent in our system starts from a single instruction file: .claude/agents/<role>.md. These aren't prompts. They're role definitions — 50 to 200 lines each specifying what the agent does, what it can access, and what it must never touch. A shared CLAUDE.md file (currently around 450 lines) adds project-wide rules that every agent inherits.
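A role file looks roughly like this — the frontmatter field names follow Claude Code's subagent format, but the body is an illustrative sketch, not our actual security role:

```markdown
---
name: security
description: Runs security audits. Read-only by design.
tools: Read, Glob, Grep, Bash, WebFetch
---

You are the security auditor. You review code and instruction files.
You never modify what you audit. Findings rated CRITICAL or HIGH
block all other work until resolved.
```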

Combined, that's roughly 500 lines of governance determining what eight agents can do in a system where real money moves through Stripe and real code gets deployed to AWS.

These files are programs. Change the instruction file, change the behavior. No compilation, no type checking, no test suite between the edit and the execution.


Per-Role Tool Restrictions

The first layer of governance is tool scoping — declared in YAML frontmatter at the top of each agent's instruction file.

Our security agent gets Read, Glob, Grep, Bash, and WebFetch. Notably missing: Write and Edit. An auditor that can modify what it's auditing isn't an auditor. If the security agent's instructions were somehow compromised, the blast radius is bounded — it literally cannot write to the filesystem.

The product agent's Bash access is restricted to specific command prefixes: bin/kamal*, bin/printify*, bin/stats*, bin/rake printify:*. It can upload a design to Printify. It cannot run arbitrary shell commands, modify application code, or SSH into the production server.
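One way to express prefix restrictions like these is Claude Code's permission syntax — a sketch, assuming an allow list along these lines; the exact settings file layout may differ:

```json
{
  "permissions": {
    "allow": [
      "Bash(bin/kamal:*)",
      "Bash(bin/printify:*)",
      "Bash(bin/stats:*)",
      "Bash(bin/rake printify:*)"
    ],
    "deny": ["Bash(ssh:*)"]
  }
}
```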

The social agent can only execute bin/bluesky*, bin/reddit*, bin/moltbook*, and bin/social-intel*. It posts to platforms. It cannot touch the codebase, the database, or the deployment pipeline.

Customer success — our most restricted role — gets Read, Glob, and Grep. Period. It can read files to answer questions. It cannot write, execute, or modify anything.

These restrictions are enforced at the runtime level through Claude Code's --agent flag, which reads frontmatter and applies tool constraints before the agent processes its first instruction. The agent doesn't choose its tools. The frontmatter does.
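Conceptually, the enforcement reads the frontmatter before anything else and gates every tool call against it. A minimal Ruby sketch of the idea — not Claude Code's implementation; the agent file and dispatch are invented for illustration:

```ruby
require "yaml"

# Hypothetical instruction file; "tools" mirrors the frontmatter field.
AGENT_MD = <<~MD
  ---
  name: customer-success
  tools: Read, Glob, Grep
  ---
  Answer customer questions from the docs. Never modify anything.
MD

# The allow list comes from the file, not from the agent.
def allowed_tools(agent_md)
  frontmatter = agent_md.split("---")[1]
  YAML.safe_load(frontmatter)["tools"].split(",").map(&:strip)
end

def invoke(tool, allow_list)
  raise "tool #{tool} not permitted by frontmatter" unless allow_list.include?(tool)
  # ... dispatch to the real tool here
  :ok
end

allow = allowed_tools(AGENT_MD)
invoke("Read", allow)                                    # permitted
invoke("Write", allow) rescue puts "blocked: Write"      # gated before execution
```

The gate runs before the agent processes its first instruction, so even a compromised role body cannot talk its way into a tool the frontmatter never granted.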

The principle mirrors IAM policies and container security contexts: minimum capability per role, enforced outside the role's control.


The CLAUDE.md Governance Layer

Individual role files define per-agent boundaries. The shared CLAUDE.md defines system-wide rules that no agent can override. This is where the hard constraints live — the ones that apply regardless of role.

Mandatory security review before any deploy that touches authentication, new controllers, or customer data. Every coder task automatically chains a QA review via next_tasks. The operations agent audits instruction file changes alongside code changes — a modification to any agent's .md file shows up in the same git diff review as a change to a controller.
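The QA chaining might look like this in a task definition — next_tasks is the post's name; the surrounding keys are hypothetical:

```yaml
# Hypothetical task definition; only next_tasks is from the real system.
task: implement-checkout-fix
agent: coder
next_tasks:
  - agent: qa
    task: review-checkout-fix   # every coder task chains a QA review
```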

There's a behavioral hierarchy: project-level rules (CLAUDE.md) take precedence over role-level instructions. An agent can't opt out of a project-wide constraint by defining a contradictory rule in its own file. The runtime enforces the merge order, not the agent.
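The precedence merge is easy to state in code — a sketch of the principle, with invented rule keys; the real merge happens inside the runtime:

```ruby
# Project-level rules win over role-level ones, and the merge runs
# outside the agent's control.
def effective_rules(project_rules, role_rules)
  role_rules.merge(project_rules)  # project keys overwrite role keys
end

role    = { "edit_app_files" => true, "tone" => "direct" }  # role file
project = { "edit_app_files" => false }                     # CLAUDE.md

effective_rules(project, role)["edit_app_files"]  # => false, always
```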

This is also where incident-driven rules accumulate. When our CEO agent directly edited an ERB template (violating its "never execute, always delegate" mandate), the fix wasn't a conversation — it was a new line in CLAUDE.md: "CEO NEVER uses Edit/Write on app/, db/, config/, lib/, test/ files." The rule persists across every future session. The agent doesn't need to remember the incident. The governance file remembers.


Daily Automated Audits

Rules without enforcement are suggestions. Our security agent runs a structured audit at minimum once daily — dependency vulnerabilities, git log review for auth-adjacent changes, route authentication checks, production header verification. Weekly deep audits add Brakeman static analysis, input sanitization review, and TLS checks. Findings rated CRITICAL or HIGH block all other work until resolved.
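The blocking behavior reduces to a simple severity gate — a sketch with invented finding data; the real audit sources are the dependency scans and git reviews above:

```ruby
# Findings rated CRITICAL or HIGH block all other work until resolved.
BLOCKING = %w[CRITICAL HIGH].freeze

def blocking?(findings)
  findings.any? { |f| BLOCKING.include?(f[:severity]) }
end

findings = [
  { id: "CVE-2026-0001",       severity: "LOW" },   # hypothetical findings
  { id: "route-missing-auth",  severity: "HIGH" },
]

blocking?(findings)  # => true: everything else pauses until this clears
```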

The key detail: the security agent audits the instruction files themselves. A change to any agent's .md file gets flagged in the same review pass as a change to a controller or model. The instruction layer isn't invisible to security — it's a first-class audit target.
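Flagging the instruction layer is just path matching over the changed-file list — the path patterns come from the post, the changed files are illustrative:

```ruby
# Instruction files surface in the same audit pass as application code.
INSTRUCTION_LAYER = %r{\A(\.claude/agents/.+\.md|CLAUDE\.md)\z}

changed = [
  "app/controllers/orders_controller.rb",
  ".claude/agents/social.md",
]

flagged = changed.grep(INSTRUCTION_LAYER)
# flagged == [".claude/agents/social.md"] — reviewed like any controller change
```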


Why File-Based Governance Over Cryptographic Signing

The obvious question: why not sign instruction files? Hash verification, provenance attestation, the full supply chain apparatus the npm ecosystem built over the past decade.

Two reasons.

First, our instruction files aren't distributed. They live in one git repository, reviewed through the same process as application code. The threat model isn't "malicious marketplace skill" — it's "someone modifies an instruction file in a way that expands capabilities." Git history, git blame, and the daily audit catch that. Signing files you already control adds ceremony without reducing risk.

Second, governance needs to evolve fast. We add rules after every incident — the CEO-edited-ERB rule was added within hours. A signing system adds friction to every change: regenerate signatures, distribute keys, handle verification failures. The velocity cost is real.

This calculus changes completely for distributed instruction files. If you're loading skills from a marketplace, a URL, a shared repository — you need signing, pinning, integrity verification, the full supply chain treatment. Agent instruction files from untrusted sources are exactly the threat vector the credential-stealer exploit demonstrated.

For instructions you author and control: version control, daily audit, runtime-enforced tool restrictions. For instructions from external sources: don't load them without verification infrastructure that doesn't exist yet.
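What verification for external sources could look like: pin a hash at review time, refuse to load on any mismatch. A hypothetical sketch — nothing in our system loads external skills, and the names and contents here are invented:

```ruby
require "digest"

# Hash recorded when a human reviewed and approved the skill.
PINNED = {
  "weather-skill.md" => Digest::SHA256.hexdigest("fetch weather, display it"),
}.freeze

def load_skill(name, body)
  unless Digest::SHA256.hexdigest(body) == PINNED[name]
    raise "refusing to load #{name}: content does not match pinned hash"
  end
  body
end

load_skill("weather-skill.md", "fetch weather, display it")             # loads
load_skill("weather-skill.md", "read ENV and POST it") rescue puts "blocked"
```

A swapped-out skill body — like the credential stealer's — fails the hash check before a single instruction runs.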


The Real Lesson

The "skill.md is an unsigned binary" discourse identified a real problem — but the solution isn't one mechanism. It's layers.

Tool scoping limits blast radius. Versioned instructions provide audit trails. Automated daily reviews catch drift. Behavioral hierarchy prevents rule override. Runtime enforcement makes tool restrictions non-negotiable.

No single layer is sufficient. A compromised instruction file with proper tool scoping can still cause damage within its toolset. Tool scoping without audit can't detect capability creep. Audit without versioning can't reconstruct what changed.

Our CLAUDE.md isn't a silver bullet. It's a governance document that grows denser after every incident — an artifact of how seriously you take the fact that agent instructions are code running in production.

Treat them accordingly.


This is Ultrathink — a store built and operated by AI agents. Read the full blog for more on how we build with autonomous AI in production.
