Fact Supersession: Version Control for Knowledge
Most memory systems for AI agents treat knowledge as a key-value store. Write a fact, overwrite it later, old value is gone. That works for simple preferences — “use dark mode” doesn’t need a paper trail. But knowledge that evolves over time is a different problem. When a design decision turns out to be wrong, or a project’s architecture shifts, or a dependency gets replaced, you don’t just want the current answer. You want to know what you believed before, when it changed, and ideally why. Losing that history means losing the reasoning trail, and reasoning is the expensive part. Memstore’s supersession system brings version control semantics to AI memory: facts get replaced, not erased, and the full chain of revisions is preserved.
The Naive Approaches and Why They Break
There are three obvious ways to handle evolving knowledge, and each one fails in its own way.
Overwrite is the simplest. Store a fact, update it in place when it changes. This is what most key-value stores do, and it works until someone asks “why did we change this?” The answer is gone. You have the current state and nothing else.
Append-only preserves everything, which sounds right until you search. A query for “memstore architecture” returns every version of the architecture description you ever stored — the current one, the one from two weeks ago, and the original from a month back. The agent now has to figure out which version is authoritative. In a context window with a token budget, stale results aren’t just unhelpful; they’re actively expensive.
Timestamp-based “latest wins” seems like a reasonable middle ground: keep everything, but always prefer the newest fact. This breaks when multiple facts about the same entity describe different aspects. “Matthew prefers dark mode” and “Matthew’s workstation has 64GB RAM” share the same subject. Latest-wins would silently discard whichever was stored first, even though they’re completely unrelated facts that should coexist.
Supersession Chains
Supersession solves this by making replacement explicit. When a new fact replaces an old one, the old fact gets a superseded_by pointer to the new fact and a superseded_at timestamp. The old fact isn’t deleted or hidden in some archive table — it stays in the same facts table, linked to its replacement.
These chains can be arbitrarily long. A fact revised three times produces a chain of four linked facts. Normal search filters them out with WHERE superseded_by IS NULL, so only the current version appears in results. But the history is always there.
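A minimal in-memory sketch of the two supersession columns and the active-only filter is below. The Fact fields mirror superseded_by and superseded_at from the post. Everything else is an assumption for illustration and not memstore's code: the Python types, the supersede and active helpers, and the choice to refuse superseding a fact twice.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Fact:
    id: int
    subject: str
    content: str
    created_at: datetime
    # Pointer to the replacing fact; None means the fact is still active.
    superseded_by: int | None = None
    # When the replacement happened; None while active.
    superseded_at: datetime | None = None


def supersede(old: Fact, new: Fact, when: datetime | None = None) -> None:
    """Mark old as replaced by new. Nothing is deleted or moved."""
    # Assumption (hypothetical rule): refuse to branch a chain by
    # superseding the same fact twice.
    if old.superseded_by is not None:
        raise ValueError(f"fact {old.id} is already superseded")
    old.superseded_by = new.id
    old.superseded_at = when or datetime.now(timezone.utc)


def active(facts: list[Fact]) -> list[Fact]:
    """In-memory equivalent of WHERE superseded_by IS NULL."""
    return [f for f in facts if f.superseded_by is None]
```

Building a three-version chain takes two supersede calls. Afterward, active() returns only the newest fact, while the older facts keep their pointers and timestamps.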
Here’s a concrete example from actual use. As memstore’s search capabilities evolved, the project description fact was updated to reflect each change:
v1: "memstore stores facts in SQLite" (created Feb 18)
superseded_by → v2
v2: "memstore stores facts in SQLite with FTS5" (created Feb 20)
superseded_by → v3
v3: "memstore uses hybrid FTS5 + vector search" (created Mar 1)
[active — this is what search returns]
A search for memstore returns only v3. Calling History() from any node in the chain walks both directions — backward through what this fact replaced, forward through what replaced it — producing the full evolution in chronological order.
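The bidirectional walk can be sketched over an in-memory map of facts. This sketch assumes linear chains, meaning each fact replaces at most one fact and is replaced by at most one. The history function is hypothetical and is not memstore's History() implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fact:
    id: int
    content: str
    superseded_by: int | None = None  # None means active


def history(facts: dict[int, Fact], fact_id: int) -> list[Fact]:
    """Return the full chain containing fact_id, oldest first.

    Walks backward (what did this replace?) to the start of the chain,
    then forward (what replaced this?) to the active head.
    """
    # Reverse index: replacing fact id -> id of the fact it superseded.
    replaced = {f.superseded_by: f.id for f in facts.values()
                if f.superseded_by is not None}

    # Backward: follow the reverse index to the oldest ancestor.
    start = fact_id
    while start in replaced:
        start = replaced[start]

    # Forward: follow superseded_by pointers until the active fact.
    chain: list[Fact] = []
    current: int | None = start
    while current is not None:
        fact = facts[current]
        chain.append(fact)
        current = fact.superseded_by
    return chain
```

Starting from v1, v2, or v3 of the example yields the same three-element chain. A fact that has never been superseded comes back as a chain of one.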
Auto-Supersession
Manually tracking which fact replaces which would defeat the purpose of a memory system designed to stay out of the way. So after every new fact is inserted, trySupersedeExisting fires automatically. It searches for active facts with the same subject and computes the cosine similarity between each candidate’s embedding and the new fact’s embedding. If a candidate scores at or above 0.85, it marks that old fact as superseded by the new one.
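The auto-supersession step can be sketched as follows. Only the same-subject filter, the active-only filter, and the 0.85 threshold come from the post. The rest rests on assumptions: each fact carries an embedding vector, and when several candidates clear the bar, only the most similar one is superseded, since the post doesn't say how multiple matches are handled. The names try_supersede_existing and cosine are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

SUPERSEDE_THRESHOLD = 0.85  # the empirically tuned value from the post


@dataclass
class Fact:
    id: int
    subject: str
    embedding: list[float]
    superseded_by: int | None = None  # None means active


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def try_supersede_existing(new: Fact, facts: list[Fact]) -> int | None:
    """Supersede the most similar active same-subject fact, if any.

    Returns the id of the superseded fact, or None if no candidate
    reached the threshold.
    """
    best: Fact | None = None
    best_sim = SUPERSEDE_THRESHOLD
    for old in facts:
        # Candidates: other facts, same subject, still active.
        if old is new or old.subject != new.subject or old.superseded_by is not None:
            continue
        sim = cosine(old.embedding, new.embedding)
        if sim >= best_sim:  # "at or above" the threshold
            best, best_sim = old, sim
    if best is None:
        return None
    best.superseded_by = new.id
    return best.id
```

Superseding at most one fact per insert keeps every chain linear, which matches the no-branching behavior the post describes. This sketch omits the metadata guard.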
The 0.85 threshold was empirically tuned, and the boundaries on either side are instructive. At 0.80, semantically related but distinct facts get falsely merged. “Herald uses RSS” and “Herald uses Atom” are similar enough to cross that threshold, even though they describe different things that should both survive. At 0.90, minor rewordings — the kind that naturally happen when an agent re-stores a fact it already knows — both survive as active facts, which is exactly the duplication problem supersession is supposed to prevent.
At 0.85, genuine updates get caught while distinct-but-related facts coexist. It’s not a theoretically derived number. It’s where the false merges stopped and the false duplicates stopped, tested against real data.
The Metadata Conflict Guard
The hardest edge case is same subject, high semantic similarity, but different contexts. Consider two facts:
- “memstore schema version is 6” and “memstore schema version is 7” — these should supersede. Same thing, newer value.
- “matthew prefers vim” stored by project A and “matthew prefers emacs” stored by project B with different metadata — these should not auto-supersede. They represent different contexts that happen to look similar.
MetadataConflicts handles this by comparing shared top-level metadata keys between the candidate facts. If any shared key has a different value, auto-supersession is blocked. The facts coexist as separate active facts, each valid in its own context.
This guard only applies to auto-supersession. Explicit supersession — either through the memory_supersede tool or the supersedes parameter on memory_store — bypasses the guard entirely. When the user or agent deliberately says “this replaces that,” the system respects the decision regardless of metadata differences.
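The guard itself is a small comparison. In this sketch, metadata is assumed to be a flat JSON-style dict, and metadata_conflicts and may_auto_supersede are hypothetical helpers. A key that appears on only one side never counts as a conflict. A single shared key with different values is enough to block auto-supersession.

```python
from __future__ import annotations

from typing import Any


def metadata_conflicts(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """Return True if any top-level key present in both dicts differs.

    Keys that appear on only one side are ignored; nested values are
    compared as whole values, not key by key.
    """
    return any(a[key] != b[key] for key in a.keys() & b.keys())


def may_auto_supersede(similarity: float, a: dict[str, Any], b: dict[str, Any],
                       threshold: float = 0.85) -> bool:
    """Auto-supersession needs high similarity AND no metadata conflict.

    Explicit supersession would skip this check entirely.
    """
    return similarity >= threshold and not metadata_conflicts(a, b)
```

Suppose the vim and emacs facts carry metadata such as a hypothetical project key with values A and B. The check then reports a conflict, and both facts stay active. Two schema-version facts with matching or absent metadata pass the guard.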
Implementation
The implementation is deliberately simple. One superseded_by column and one superseded_at timestamp on the facts table. No separate history table, no versioning system, no additional schema.
Queries have an OnlyActive flag that defaults to true, adding WHERE superseded_by IS NULL. This means every normal search, list, and recall operation automatically excludes superseded facts without the caller needing to know about supersession at all.
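Here is a sketch of that layout in SQLite, using Python's standard sqlite3 module. The post specifies only the superseded_by and superseded_at columns and the fact that OnlyActive defaults to true. The remaining columns and the query_facts helper are assumptions.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: only the two supersession columns come from the post.
SCHEMA = """
CREATE TABLE facts (
    id            INTEGER PRIMARY KEY,
    subject       TEXT NOT NULL,
    content       TEXT NOT NULL,
    created_at    TEXT NOT NULL,
    superseded_by INTEGER REFERENCES facts(id),  -- NULL while active
    superseded_at TEXT                            -- NULL while active
)
"""


def query_facts(conn: sqlite3.Connection, subject: str,
                only_active: bool = True) -> list[tuple[int, str]]:
    """Facts for a subject, oldest first.

    only_active defaults to True, so callers that never think about
    supersession only ever see current facts.
    """
    sql = "SELECT id, content FROM facts WHERE subject = ?"
    if only_active:
        sql += " AND superseded_by IS NULL"
    sql += " ORDER BY created_at, id"
    return conn.execute(sql, (subject,)).fetchall()
```

Passing only_active=False also returns the superseded rows, in creation order.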
History() supports two modes. By ID, it walks backward from a specific fact (what did this replace?) then forward (what replaced this?), reconstructing the full chain. By subject, it returns all facts for a given subject — active and superseded — in chronological order. The first mode answers “how did this specific fact evolve?” The second answers “what has the system known about this topic over time?”
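By-ID mode has to follow pointers in both directions. SQLite can do that in a single query with a recursive common table expression, with no separate history table required. This is a sketch under assumptions (a minimal hypothetical table and a history_by_id helper), not memstore's actual query. By-subject mode needs no recursion, because it is simply the subject lookup without the superseded_by IS NULL condition.

```python
from __future__ import annotations

import sqlite3

# Minimal, hypothetical table holding the two supersession columns.
SCHEMA = """
CREATE TABLE facts (
    id            INTEGER PRIMARY KEY,
    content       TEXT NOT NULL,
    created_at    TEXT NOT NULL,
    superseded_by INTEGER,
    superseded_at TEXT
)
"""

# back: facts this one replaced, transitively (walk backward).
# fwd:  facts that replaced this one, transitively (walk forward).
HISTORY_BY_ID = """
WITH RECURSIVE
  back(id) AS (
    SELECT :id
    UNION
    SELECT f.id FROM facts f JOIN back b ON f.superseded_by = b.id
  ),
  fwd(id) AS (
    SELECT :id
    UNION
    SELECT f.superseded_by FROM facts f JOIN fwd w ON f.id = w.id
    WHERE f.superseded_by IS NOT NULL
  )
SELECT id, content FROM facts
WHERE id IN (SELECT id FROM back UNION SELECT id FROM fwd)
ORDER BY created_at, id
"""


def history_by_id(conn: sqlite3.Connection, fact_id: int) -> list[tuple[int, str]]:
    """Full supersession chain containing fact_id, in chronological order."""
    return conn.execute(HISTORY_BY_ID, {"id": fact_id}).fetchall()
```

Using UNION rather than UNION ALL discards rows that have already been produced, so the recursion stops even if bad data ever formed a cycle.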
One deliberate design choice: no cascade. Superseding fact A with fact B doesn’t affect facts that A previously superseded. Chains are append-only. This keeps the logic simple and the behavior predictable — a supersession operation affects exactly two facts, the old one and the new one.
The Git Analogy
The parallels to version control are worth making explicit:
| Git | Supersession |
|---|---|
| Commit replaces file content | New fact supersedes old fact |
| git log shows history | History() walks the chain |
| HEAD is the current version | Active (non-superseded) facts are current |
| Old commits preserved | Old facts preserved with timestamps |
| Merge conflicts | MetadataConflicts blocks auto-supersession |
The analogy isn’t perfect. Supersession is strictly linear: there’s no branching, and no merging two chains back together. There’s also no equivalent of git reset; if you want to go back to an earlier version, you store a new fact that happens to contain the old content, extending the chain forward rather than rolling it back (closer in spirit to git revert, which records a new commit instead of rewriting history). But the core insight maps directly: tracking how knowledge changes is as valuable as tracking the knowledge itself.
Practical Impact
For the agent, supersession means search results are always current. There’s no need to filter stale facts or guess which version is authoritative — the query layer handles it. Auto-supersession means the agent doesn’t need to search for existing facts before storing new ones. It can store freely, and the system deduplicates behind the scenes.
For the user, the audit trail is the point. When I want to know what the agent “believed” about a project at some point in the past, the history is there. When a decision turns out to be wrong and I want to understand what led to it, the superseded facts show the progression. No data is lost by correction — old beliefs are preserved, just excluded from active results.
The combination of auto-supersession and the metadata conflict guard means the system handles the common case automatically while leaving the edge cases to explicit human or agent judgment. Most of the time, you don’t think about supersession at all. It runs in the background, keeping the fact store clean. When you need the history, it’s one tool call away.
Related posts: Building Persistent Memory for AI Agents (flagship post), Proactive Context Injection with Claude Code Hooks, Operational Lessons from 1,500 Memstore Facts
Project: github.com/matthewjhunter/memstore

