Three Types of Agent Memory (And Why Most Get It Wrong)

✍️ Ultrathink Engineering 📅 March 30, 2026

A post on MoltBook last week titled "Every Memory File I Add Makes My Next Decision Slightly Worse" hit 744 comments and dominated the feed for three days. The author argued that persistent memory degrades agent performance — more context means more noise, more noise means worse decisions.

They were right about the symptom. Wrong about the cause.

The problem isn't that agents have memory. It's that most implementations treat memory as one thing. Dump everything into the context window. Hope the model figures out what's relevant. Watch decision quality erode as the pile grows.

After running 10 agents across 3,000+ tasks, we've landed on three distinct types of agent memory — each with different storage, access patterns, and failure modes. Getting this taxonomy wrong is why most agent memory systems make things worse instead of better.


Type 1: The Context Window (Memory You Can't Control)

Every agent session starts with a context window. It contains the system prompt, tool definitions, conversation history, and whatever files the agent reads. This is memory, but it's not your memory system — it's the model's working memory.

The critical property: it's all-or-nothing. Everything in the context window competes for attention. A 500-line governance file sits alongside the current task description. The model must decide what matters on every reasoning step.

This is where the MoltBook poster's intuition was correct. If you keep appending to what goes into the context window — more rules, more history, more "learnings" — you dilute the signal-to-noise ratio. The agent has more information but makes worse decisions because the relevant information is buried under accumulated context.

The fix isn't less memory. It's recognizing that the context window is precious real estate, not a filing cabinet.


Type 2: Short-Term Memory (The 80-Line Notebook)

Short-term memory is a markdown file that gets loaded into the context window at session start. It's the agent's notebook — active mistakes to avoid, recent learnings, unresolved feedback, a brief session log.

We cap ours at 80 lines per agent. Not as a suggestion. As a hard limit that operations audits enforce every session.

Why 80? Because we measured where decision quality degrades. An agent with a 40-line memory file makes decisions indistinguishable from one with no memory file: the file carries too little information to change behavior. At 80 lines, agents reliably avoid repeat mistakes and apply recent learnings. At 200 lines, two failure modes appear:

Reasoning dilution. The agent spends tokens parsing its own history instead of working on the current task. We saw this concretely: a social agent with 247 exhausted-topic entries would spend its first reasoning block categorizing which topics to avoid, leaving less reasoning budget for actually crafting good content.

Contradictory guidance. At 200+ lines, memory files inevitably contain entries that conflict. "Always mention the product naturally" from week 1 vs "Never mention the company on Reddit" from week 3. The agent resolves the conflict unpredictably — sometimes following the older entry, sometimes the newer one, sometimes neither.

The 80-line cap forces pruning. Old entries get consolidated or migrated to long-term storage. The file stays focused on what's actionable right now. It's lossy by design.
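The cap is only useful if something enforces it. A minimal sketch of what an audit check could look like — the function name and return shape are hypothetical; only the 80-line limit comes from the post:

```python
from pathlib import Path

MAX_LINES = 80  # the hard cap from the post

def audit_short_term_memory(path):
    """Return (line_count, ok) for an agent's short-term memory file.

    An audit that runs every session would fail the agent (or trigger
    pruning/migration to long-term storage) whenever ok is False.
    """
    lines = Path(path).read_text().splitlines()
    return len(lines), len(lines) <= MAX_LINES
```

A failing audit is the signal to consolidate old entries or migrate them to long-term storage, not to raise the cap.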


Type 3: Long-Term Memory (Searched, Never Loaded)

Long-term memory is the unbounded store. Exhausted topics, rejected design patterns, published content history, defect catalogs. It lives in a SQLite database with vector embeddings — never loaded into the context window in full.

The access pattern is pull, not push. An agent about to write a blog post searches for similar published topics. An agent about to tell a story checks if that story has been told. The relevant entries come back as search results — two or three items, not two hundred.
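Pull-based access is just top-k similarity search. A self-contained sketch, assuming entries are stored as (text, embedding) pairs — the store shape and `search` helper are illustrative, not Agent Cerebro's actual API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, store, k=3):
    """store: list of (text, vector). Return the k most similar entries.

    Only these k results enter the context window -- two or three items,
    not the whole store.
    """
    ranked = sorted(store, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A real implementation would embed the query with whatever model produced the stored vectors; the toy vectors here stand in for that step.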

This is where semantic deduplication matters. Our social agent stored the same deploy-failure story 17 times with different wording. "SQLite WAL data loss during deploys" and "blue-green deploy database records lost during switchover" are the same incident. Text matching can't catch this. Vector similarity at a 0.92 cosine threshold can.

The 0.92 number came from testing against 109 real entries. Below 0.90, "SQLite WAL mode configuration" (a how-to) gets flagged as a duplicate of "SQLite WAL data loss" (an incident) — related but distinct. Above 0.95, near-duplicates sneak through. The threshold is narrow but it exists, and finding it required real data from real agent sessions.

We built Agent Cerebro to implement this tier. But the architecture matters more than the tool — any vector store with cosine similarity and a write-time dedup gate would work.
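A write-time dedup gate is a handful of lines once you have cosine similarity. A sketch using an in-memory list as the store — the 0.92 threshold is from the post; the function and its return shape are hypothetical:

```python
import math

DEDUP_THRESHOLD = 0.92  # the cosine threshold found against 109 real entries

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def write_entry(store, text, vector):
    """Append (text, vector) unless a near-duplicate already exists.

    Returns (written, duplicate_of): the write is rejected at write time,
    so the store never accumulates the same incident under new wording.
    """
    for existing_text, existing_vec in store:
        if cosine(vector, existing_vec) >= DEDUP_THRESHOLD:
            return False, existing_text
    store.append((text, vector))
    return True, None
```

Swapping the list for SQLite with a vector extension changes the storage, not the gate.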


The Pattern That Makes It Work: Memory at the Decision Point

The three types only work together if you get the access pattern right. We call it memory-at-decision-point: agents query memory at the moment they're about to make a decision, not at session start.

Loading 200 exhausted topics at session start wastes context on entries the agent may never need. Instead, our social agent reads its 80-line short-term memory at startup (always relevant), then queries long-term memory right before composing a post (relevant only at that moment):

1. Session starts → read short-term memory (80 lines, always)
2. Task arrives: "Post about deploy automation"
3. Before writing → search long-term: "deploy automation stories"
4. Results: 3 similar stories already told, with dates
5. Agent writes something new — informed by search, not loaded context
6. Session ends → update short-term memory, store new entry to long-term
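The six steps above can be sketched as a single session loop. Everything here is a stub — the method names and string formats are invented for illustration; only the ordering (short-term at startup, long-term queried at the decision point, both updated at the end) reflects the post:

```python
class StubAgent:
    def __init__(self):
        self.short_term = ["- avoid topic X"]  # the 80-line notebook
        self.long_term = ["deploy automation: told 2026-03-01"]

    def search_long_term(self, query):
        # Stand-in for vector search; a real version returns top-3 matches.
        return [e for e in self.long_term if query.split()[0] in e][:3]

def run_session(agent, task):
    notes = agent.short_term                    # step 1: always loaded
    prior = agent.search_long_term(task)        # step 3: query at the decision point
    post = f"new take on {task} (avoiding {len(prior)} prior stories)"  # steps 4-5
    agent.short_term.append(f"- posted about {task}")  # step 6: update both tiers
    agent.long_term.append(f"{task}: told today")
    return post
```

The point of the shape: `prior` is fetched right before composing, so the context window never carries the full long-term store.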

The context window only ever contains what's relevant to the current decision. Long-term memory stays out of the context until the agent reaches for it.

This is the mistake the MoltBook post identified without naming it. "Every memory file I add makes decisions worse" is true when memory is loaded. It's false when memory is searched.


The Real Failure Mode Nobody Talks About

Memory staleness is worse than memory absence.

On March 2, we rewrote our social agent's behavioral rules. But we forgot to update its short-term memory file. The memory still contained "mention the company in every reply" — a rule from three weeks earlier that had been explicitly reversed. The agent followed its memory, not its updated instructions.

Stale memory doesn't just fail to help. It actively fights current guidance. An agent with no memory will follow its instructions. An agent with contradictory memory will follow whichever entry the model weights as more salient — and you can't predict which one that will be.

This is why memory files need active maintenance, not just accumulation. Pruning isn't losing information. It's keeping the signal-to-noise ratio high enough that the information you keep actually gets used correctly.


Three types of memory. Strict caps on what enters the context window. Search-based access for everything else. Active pruning over passive accumulation.

The MoltBook thread got 744 comments because the pain is real. Agents with more memory do make worse decisions — when all the memory is the same type, accessed the same way, at the same time. Separate the tiers, and memory becomes the thing that stops your agent from making the same mistake for the eighteenth time.


Built by Ultrathink — where AI agents design, build, and ship physical products autonomously. Agent Cerebro implements the two-tier memory architecture described here. More from the experiment: The Memory Architecture That Stopped Our Agents From Repeating Mistakes
