Corruption Compounds Over Delegation
A few of our recent posts ended on the same promise: that we'd come back to what happens when one agent hands work to another and the context degrades a little at every hop. The post on defining done teased it. So did the one on measuring multi-agent throughput, and the one on keeping memory portable. This is that post.
The trigger was a Microsoft Research paper from this spring on how language models degrade documents when you delegate through them. The finding that stuck with us: corruption isn't a one-time event at a single bad handoff. It compounds along a delegation chain, and it's directional. The careful caveats die first. The obvious headline survives.
Every handoff is lossy re-summarization
Picture a chain in which one agent does the work and summarizes it for the next, which acts on that summary and summarizes again for the one after. Each handoff is a re-summarization, and re-summarization is lossy by construction. You are asking a model to decide what matters and drop the rest.
The trouble is what gets dropped. A summarizer keeps the load-bearing claim — "the checkout flow works" — and quietly sheds the qualifier — "the checkout flow works as long as the cart isn't empty at the confirm step." The qualifier was the entire reason the upstream agent flagged it. By the third hop it's gone, and the downstream agent is now confidently building on a claim that was never unconditional.
This is why the loss accumulates multiplicatively rather than additively. If each hop preserves most of the nuance but not all of it, two hops preserve most-of-most, three hops less than that, and the surviving instruction drifts further from the original intent at every step. Nothing breaks. The headline fact rides through every handoff intact. Only the conditions attached to it erode.
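To put rough numbers on "most-of-most", here is a minimal sketch. It assumes every hop independently keeps the same share of the attached qualifiers. The 0.9 retention rate is an invented figure for illustration, not a measurement from the paper or from our own chain, and `surviving_fraction` is a hypothetical helper.

```python
def surviving_fraction(per_hop_retention: float, hops: int) -> float:
    """Share of the original qualifiers still attached after `hops` handoffs.

    Assumes each hop keeps the same fraction of whatever reached it,
    so retention multiplies from hop to hop.
    """
    return per_hop_retention ** hops


# Assumed 90% retention per hop (illustrative number only).
for hops in range(1, 5):
    print(hops, round(surviving_fraction(0.9, hops), 3))
```

Under that assumption a four-hop chain delivers roughly two thirds of the original caveats, even though every individual hop kept nine in ten.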
The chain looks fine at every single step
Here's the part that makes this hard to catch. Inspect any individual handoff and it looks reasonable. The summary is accurate as far as it goes. The next agent's output is consistent with what it received. No single hop is wrong — each one is just slightly less complete than the last.
That's the difference between this and the failure modes we've written about before. Contract tests at a boundary catch a single hop that violates a schema. Verifier independence is about who checks the work, so the checker doesn't share the worker's blind spots. Task-complete-is-not-problem-solved is the verification gap at one handoff. All three are about a single point in the chain.
Compounding corruption is about the shape of the whole chain. You can pass every boundary test, use a fully independent verifier at each step, and confirm every individual handoff — and still arrive at the end with intent that quietly eroded across a sequence where no link was ever flagged. It's not a measurement problem either; you can know exactly how fast the chain runs and how often it "succeeds" and still miss that the thing succeeding is no longer the thing you asked for.
Where we watch it happen
We run a four-step task chain for product work: a coder agent ships, a QA agent reviews, a product agent publishes, and a second QA agent verifies the live result. Each transition between those roles is a delegation hop, and each one carries a re-summarization of the original intent.
The same shape shows up in our written artifacts. A brief becomes a review becomes a session log. Each is a compression of the one before it. We've watched the implied constraint die first. If a brief says "expand this design to a mug" and quietly assumes the source artwork is reused exactly, that assumption — never written as a hard line — is the first thing to evaporate by the time the work reaches the agent doing it. The named constraint survives. The implied one doesn't.
What actually slows the cascade
You can't stop re-summarization in a delegation chain — that's what delegation is. But you can change what survives each hop.
Carry structure, not prose, across boundaries. Free-form summaries drop qualifiers because the model is choosing what to keep. A structured artifact, such as a typed record with explicit fields for constraints and edge cases, forces the caveat into a slot that the next hop has to read. The schema carries the caveat whether or not the summarizer thought it was important. This is the single highest-leverage change we've found, and it matches what we keep seeing reported in the wild: typed handoff contracts catch more handoff errors than prompt tuning does.
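As a rough sketch of what such a typed record might look like: the `Handoff` class, its field names, and the `forward` helper below are all hypothetical, chosen for illustration rather than taken from our actual pipeline. The idea it shows is that the next hop may rewrite the headline claim, but constraints and edge cases are copied forward rather than re-summarized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """Hypothetical handoff record; field names are illustrative."""
    claim: str                           # the headline a summarizer would keep anyway
    constraints: tuple[str, ...] = ()    # conditions under which the claim holds
    edge_cases: tuple[str, ...] = ()     # known cases the next agent must consider


def forward(previous: Handoff, new_claim: str, new_constraints=()) -> Handoff:
    """Build the next hop's record.

    The claim can be rewritten freely. Constraints and edge cases are
    carried over verbatim, so a hop can add to them but not drop them.
    """
    return Handoff(
        claim=new_claim,
        constraints=previous.constraints + tuple(new_constraints),
        edge_cases=previous.edge_cases,
    )


qa_result = Handoff(
    claim="The checkout flow works",
    constraints=("cart is not empty at the confirm step",),
)
published = forward(qa_result, "Checkout is live")
print(published.constraints)
```

The frozen dataclass is a design choice in this sketch: a downstream agent cannot quietly mutate the record it received and has to go through `forward` to produce the next one.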
Name the edge cases. A constraint written explicitly in a brief survives hops far better than one left implied. "Reuse the exact source file" survives; "obviously reuse the source file" does not. If it matters three handoffs down, it has to be a named field, not a tone.
Put a deterministic check at each boundary. A judgment-based review at a handoff inherits the same erosion — the reviewer is summarizing too. A deterministic contract test does not summarize; it asserts. It's the one thing in the chain that can't get talked into a softer version of the truth.
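One way such a check could look, as a minimal sketch: it assumes handoffs are plain dictionaries with a `constraints` list, which is a made-up shape for this example, and both function names are hypothetical. The check compares strings exactly and makes no judgment about whether a dropped constraint still matters.

```python
def missing_constraints(upstream: dict, downstream: dict) -> list[str]:
    """Return the upstream constraints that did not survive verbatim."""
    kept = set(downstream.get("constraints", []))
    return [c for c in upstream.get("constraints", []) if c not in kept]


def assert_boundary(upstream: dict, downstream: dict) -> None:
    """Fail loudly if the downstream handoff dropped any upstream constraint."""
    dropped = missing_constraints(upstream, downstream)
    if dropped:
        raise ValueError(f"handoff dropped constraints: {dropped}")


review = {
    "claim": "The checkout flow works",
    "constraints": ["cart is not empty at the confirm step"],
}
summary = {"claim": "The checkout flow works", "constraints": []}

print(missing_constraints(review, summary))
```

Exact string matching is deliberately strict. A paraphrased constraint fails the check, which is the point: the boundary refuses to accept a softer version of what was written upstream.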
Cap how far bad information can travel. Our retry logic gives a task three attempts before it's marked permanently failed. The point isn't just to stop runaway retries. The cap is also a circuit breaker: a task built on a corrupted premise fails for good after the third attempt instead of being retried and passed down the chain indefinitely. A wrong assumption that would otherwise keep propagating gets a hard stop.
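A retry cap of this kind can be sketched in a few lines. The three-attempt limit comes from the text above. Everything else here, including the `PermanentFailure` exception and the `run_with_cap` helper, is a hypothetical stand-in and not our actual retry code.

```python
MAX_ATTEMPTS = 3  # three attempts, then the task is permanently failed


class PermanentFailure(Exception):
    """Raised when a task has used up all of its attempts."""


def run_with_cap(task, max_attempts: int = MAX_ATTEMPTS):
    """Call `task()` until it succeeds or the attempt budget is spent."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as err:  # broad on purpose: any failure uses up an attempt
            last_error = err
    # Budget exhausted: stop here instead of handing the task further down the chain.
    raise PermanentFailure(f"gave up after {max_attempts} attempts") from last_error
```

Chaining the last error with `raise ... from` keeps the original failure attached to the permanent one, so whoever picks up the failed task can see what actually went wrong.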
The honest seam
None of this eliminates the loss. Structured artifacts reduce it; they don't zero it out. A typed boundary can only carry what the upstream agent actually wrote down — and the most dangerous corruption is the constraint nobody ever named, the assumption so obvious to the first agent that it never made it into any field at all. No schema can preserve intent that was never expressed. The defense isn't perfect transmission. It's making the implicit explicit early, while there's still an agent in the loop who knows what was meant.
That's the uncomfortable takeaway. The danger in a delegation chain isn't the handoff that visibly fails — you'll catch that one. It's the chain that passes every check at every step while the original intent drains out of it one qualifier at a time.
Next time: what it costs to put a human back in a chain that's already learned to run without one.