Claude Code sessions start with amnesia. Every conversation begins cold – no memory of what you worked on yesterday, what invariants matter in the file you’re about to edit, or what tasks are still pending from last week. CLAUDE.md helps by injecting static project context, and MCP tools let the model search for facts on demand. But both of those require something to go right first: either the static file happens to cover the relevant topic, or the model decides to search before acting. In practice, the model often doesn’t search. It plows ahead with what it has, and the most valuable context – the constraint you documented last Tuesday, the task you left half-finished – stays in the database unsurfaced. This post is about closing that gap with hooks: small scripts that fire at specific points in the Claude Code lifecycle and inject relevant context automatically, before the model even knows it needs it.

This is part of a series on memstore, a persistent memory system for AI agents. The flagship post covers the architecture and rationale. This one covers the delivery mechanism – how stored knowledge gets from the database into the conversation at the right moment.

The Problem: Context Is Expensive and Manual

LLM context windows are large but not free. Irrelevant context dilutes attention and burns tokens. The practical constraint isn’t “can I fit this” but “should I inject this” – every fact you add competes with the actual work for the model’s attention budget.

CLAUDE.md is what the Codified Context paper calls the hot tier: always loaded, always present. It’s the right place for project build commands, coding conventions, and the kind of standing instructions that apply to every session. But it can’t cover everything. A project with a hundred invariants, fifty design decisions, and a thousand symbol-level descriptions would need a CLAUDE.md the size of a novella. The model would spend half its context window reading instructions before doing any work.

The paper describes a three-tier model: hot context that’s always loaded, warm context triggered by specific files or tasks, and cold context searched on demand. CLAUDE.md handles hot. MCP tools handle cold. What was missing was warm – context that loads automatically based on what you’re doing, without anyone asking for it.

Claude Code’s hook system fills that gap.

Claude Code’s Hook System

Hooks are small scripts registered in .claude/settings.json that fire at specific points in a session’s lifecycle. memstore wires up six events: SessionStart, UserPromptSubmit, PreToolUse (matched separately for Read and Edit), PostToolUse (Write and Bash), Stop, and SessionEnd. Each hook receives a JSON payload on stdin describing the event – the prompt text, the tool name and arguments, the session metadata – and returns JSON on stdout. The key field in the response is additionalContext, nested under hookSpecificOutput: any text you put there gets injected into the conversation as a system reminder, visible to the model but not to the user.
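As a rough sketch of that contract, the core of a hook can be a pure function from the event payload to a response object, with a thin wrapper for stdin and stdout. The payload field hook_event_name and the lookupContext callback are assumptions for illustration, not memstore's actual code.

```javascript
// Sketch of a hook's core: event payload in, response object out.
// The payload field name and lookupContext are assumptions for illustration.
function buildHookResponse(event, lookupContext) {
  const context = lookupContext(event);
  // An empty response injects nothing into the conversation.
  if (!context) return {};
  return {
    hookSpecificOutput: {
      hookEventName: event.hook_event_name,
      additionalContext: context,
    },
  };
}

// Wiring (defined but not invoked here): read stdin, parse, answer on stdout.
async function runHook(lookupContext) {
  process.stdin.setEncoding("utf8");
  let raw = "";
  for await (const chunk of process.stdin) raw += chunk;
  const response = buildHookResponse(JSON.parse(raw), lookupContext);
  process.stdout.write(JSON.stringify(response));
}
```

Keeping the decision logic in a pure function makes it testable without spawning a process, which matters when every hook has to stay well inside its time budget.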

The constraint that shapes everything is the five-second timeout most of these hooks run under. Hooks must be fast. They can’t run a full RAG pipeline or call an external API with retry logic. They need to do one thing quickly and get out of the way. This pushes the design toward pre-computed results and local HTTP calls rather than complex processing at hook time.

The Hooks

SessionStart: Load Pending Tasks and Recent Context

The first thing that happens when a session opens is a pair of CLI calls:

memstore tasks --surface startup
memstore search -query "homelab hosts" -limit 1

The task surfacing solves the “where was I?” problem. Memstore tracks tasks with status (pending, in_progress, completed, cancelled) and scope (user TODOs, agent follow-ups, collaborative work). The startup hook queries for anything pending or in-progress and injects it inside a <session-restore> block. If you left three tasks half-finished yesterday, the new session knows about them immediately. No manual search, no re-explaining.

The second call pulls a narrow slice of standing reference – the homelab host inventory, in my setup – so the model starts already knowing the infrastructure it’s most often asked about. It’s deliberately -limit 1: the startup budget is tight, and most context is better surfaced on demand by the prompt hook than dumped at session open.

(An earlier version also ran a drift check here, comparing fact timestamps against the source files they referenced. The whole drift-detection surface was removed along with the codebase-ingestion subsystem it depended on, so it’s no longer part of the startup path.)

UserPromptSubmit: Automatic Memory Recall on Every Prompt

This is the highest-traffic hook. Every qualifying prompt fires two calls to memstored in parallel – /v1/recall for facts and /v1/context/hints for any pending hints left by the previous session. The recall call runs the full hybrid-plus-rerank pipeline on CPU, so it isn’t instant: the hook gives it a 4.5-second budget (under Claude Code’s 5-second hook timeout), and the reranked path can peak near 3.8 seconds on a heavy prompt. That budget is the whole reason recall truncates as aggressively as it does – the cross-encoder post covers the latency math.
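A minimal sketch of the two parallel calls under one budget might look like the following. Only the endpoint paths and the 4.5-second figure come from the text above; the POST method, the request body, the base URL, and the function names are assumptions, and fetchImpl is passed in so the logic can be exercised without a running memstored.

```javascript
// Sketch: fire both memstored calls in parallel under one time budget.
// The POST method, request body, and base URL are assumptions for illustration;
// only the two endpoint paths and the 4.5 s budget come from the post.
const RECALL_BUDGET_MS = 4500;

async function postJson(fetchImpl, url, body, budgetMs) {
  try {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(budgetMs), // abort rather than overrun the hook
    });
    return res.ok ? await res.json() : null;
  } catch {
    return null; // timeout or network error: this call contributes nothing
  }
}

async function recallAndHints(fetchImpl, baseUrl, event) {
  const body = { prompt: event.prompt, session_id: event.session_id };
  const [recall, hints] = await Promise.all([
    postJson(fetchImpl, `${baseUrl}/v1/recall`, body, RECALL_BUDGET_MS),
    postJson(fetchImpl, `${baseUrl}/v1/context/hints`, body, RECALL_BUDGET_MS),
  ]);
  return { recall, hints };
}
```

Each call fails independently to null, so a slow rerank costs the recall results but does not take the hints down with it.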

The server-side processing is where the work happens. Rather than using the raw prompt as a search query, memstored runs IDF (inverse document frequency) keyword extraction to pick the most distinctive words. The word “function” appears in hundreds of facts and tells you nothing. The word “supersession” appears in a handful and strongly signals what the user cares about. The pipeline extracts the highest-IDF terms, runs per-keyword full-text search queries, and merges the results with a multi-keyword boost: facts matching several of your distinctive terms score higher than facts matching just one.
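The idea can be illustrated over a small in-memory corpus. This is a sketch only: the real pipeline runs full-text search queries, and the tokenizer, the smoothed IDF formula, and the 0.5 boost factor below are assumptions chosen to keep the example short.

```javascript
// Sketch of IDF keyword selection and multi-keyword merging over an in-memory
// corpus. The tokenizer, IDF variant, and 0.5 boost factor are assumptions.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9_]+/g) || [];
}

function topIdfTerms(prompt, facts, k) {
  // Document frequency: in how many facts does each term appear?
  const docFreq = new Map();
  for (const fact of facts) {
    for (const term of new Set(tokenize(fact.text))) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  // Smoothed IDF: rarer terms score higher; terms absent from the corpus are dropped.
  return [...new Set(tokenize(prompt))]
    .filter((t) => docFreq.has(t))
    .map((t) => ({ term: t, idf: Math.log((facts.length + 1) / (docFreq.get(t) + 1)) + 1 }))
    .sort((a, b) => b.idf - a.idf)
    .slice(0, k);
}

function searchByKeywords(prompt, facts, k = 3) {
  const keywords = topIdfTerms(prompt, facts, k);
  const scored = [];
  for (const fact of facts) {
    const terms = new Set(tokenize(fact.text));
    const hits = keywords.filter((kw) => terms.has(kw.term));
    if (hits.length === 0) continue;
    const base = hits.reduce((sum, kw) => sum + kw.idf, 0);
    // Multi-keyword boost: each extra matching keyword adds 50% (assumed factor).
    scored.push({ id: fact.id, score: base * (1 + 0.5 * (hits.length - 1)) });
  }
  return scored.sort((a, b) => b.score - a.score);
}
```

With a corpus where "function" appears in most facts and "supersession" in one, the common word ranks last among the extracted terms and the fact matching several rare terms rises to the top.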

On top of keyword scoring, there’s a project boost for facts belonging to the current project, a file-context boost when the active file matches a fact’s source, and a feedback multiplier from the rating system – a confidence-weighted exponential, 2 ^ (avg * (0.4 + 0.6 * conf)), where avg is the fact’s mean rating and conf ramps with how many ratings it has accumulated. A single rating moves the score by roughly ±40%; a fact with five or more consistent ratings reaches the full 2.0x when helpful or 0.5x when not.
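The multiplier is easy to check numerically. The sketch below assumes avg lies in [-1, 1] and that conf ramps linearly to 1 at five ratings; the text only says that conf grows with the rating count, so the exact ramp is a guess.

```javascript
// Sketch of the feedback multiplier 2 ^ (avg * (0.4 + 0.6 * conf)).
// Assumptions: avg lies in [-1, 1]; conf ramps linearly to 1 at five ratings.
function feedbackMultiplier(avgRating, ratingCount) {
  if (ratingCount <= 0) return 1; // unrated facts are left untouched
  const conf = Math.min(ratingCount, 5) / 5;
  return Math.pow(2, avgRating * (0.4 + 0.6 * conf));
}
```

Under these assumptions, five consistent ratings give 2.0x when helpful and 0.5x when not, while a single rating gives about 1.43x when helpful and about 0.70x when not.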

Session-level deduplication prevents the same fact from being injected twice. Memstored tracks which fact IDs have been included in recall responses for the current session. On a fifty-prompt session about authentication, the foundational auth design decision gets injected on the first relevant prompt. Subsequent prompts get increasingly specific facts that the earlier injections didn’t cover. The context window stays clean.
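The bookkeeping behind this can be very small. A sketch, assuming an in-memory map keyed by session ID (how memstored actually stores this state is not described here, and the names are hypothetical):

```javascript
// Sketch of per-session dedup: remember which fact IDs each session has seen.
// The in-memory Map and the function name are assumptions for illustration.
const seenBySession = new Map();

function filterUnseen(sessionId, facts) {
  let seen = seenBySession.get(sessionId);
  if (!seen) {
    seen = new Set();
    seenBySession.set(sessionId, seen);
  }
  const fresh = [];
  for (const fact of facts) {
    if (seen.has(fact.id)) continue; // already injected in this session
    seen.add(fact.id); // mark as injected from now on
    fresh.push(fact);
  }
  return fresh;
}
```

Because the seen set is per session, the same foundational fact is free to surface again in the next session, where the context window starts empty.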

The hook skips short prompts under five words and slash commands, which are unlikely to benefit from memory recall. Output is budget-capped at 2000 characters by default, with individual facts truncated to 300 characters to keep the injection compact.
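A sketch of that gate and budget follows. The five-word, 2000-character, and 300-character figures are the ones stated above; the "..." truncation marker, the one-fact-per-line layout, and counting newline separators against the budget are assumptions.

```javascript
// Sketch of the prompt gate and output budget. The truncation marker and the
// one-fact-per-line layout are assumptions; the three limits come from the post.
const MIN_WORDS = 5;
const MAX_TOTAL_CHARS = 2000;
const MAX_FACT_CHARS = 300;

function shouldRecall(prompt) {
  const trimmed = prompt.trim();
  if (trimmed.startsWith("/")) return false; // slash command
  return trimmed.split(/\s+/).filter(Boolean).length >= MIN_WORDS;
}

function renderWithinBudget(factTexts) {
  const lines = [];
  let used = 0;
  for (const text of factTexts) {
    // Truncate each fact individually before checking the overall budget.
    const line = text.length > MAX_FACT_CHARS
      ? text.slice(0, MAX_FACT_CHARS - 3) + "..."
      : text;
    const cost = line.length + (lines.length > 0 ? 1 : 0); // +1 for the newline
    if (used + cost > MAX_TOTAL_CHARS) break;
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}
```

Stopping at the first fact that would overflow, rather than skipping it and trying smaller ones, keeps the highest-ranked facts in order at the cost of occasionally leaving some budget unused.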

PreToolUse:Read – Surface Facts Before You Read a File

When Claude is about to read a file, the hook does two things. First, it touches the file in memstore (POST /v1/context/touch), recording that this path was active in the session – the same signal the file-context boost uses during recall. Second, it runs memstore eval-triggers --file <path>, which checks the path against trigger patterns and loads matching subsystem context as a <memstore-file-context> block.

The trigger system is where the highest-value context lives. A trigger fact encodes a mapping: “when touching files matching this pattern, inject these constraints.” For example, reading sqlite.go in the memstore codebase auto-loads the invariant “scanFact and factColumns must stay in sync” – a cross-cutting constraint that isn’t visible in any single file but matters every time you edit the schema handling code. More on triggers below.

PreToolUse:Edit – Same Treatment Before a Change

The Edit hook is the Read hook. It touches the file and runs the same trigger evaluation on the file_path, so the invariants and conventions for whatever subsystem the file belongs to land in context before the edit, not after. There’s no symbol-level narrowing: the hook keys on the file path, not on what’s being changed inside it. An earlier design parsed the edit’s old_string to load facts about the specific function being touched, but it leaned on the symbol-level facts the codebase learner produced – and when that subsystem came out, the symbol targeting went with it. The file-path triggers do that work now.

PostToolUse:Write and Bash – Nudge to Store What Just Happened

The other direction: after certain actions, the model should write to memory, not just read from it. A PostToolUse hook on Write and Bash (store-nudge.mjs) pattern-matches the tool’s input and emits a reminder when something memory-worthy just happened – a git init (update the repo inventory), a gh pr create (record the decisions behind the PR), a new CLAUDE.md or go mod init (store a project fact). It’s a nudge, not an action: the hook injects a <store-nudge> suggestion and leaves the actual memory_store call to the model. Its budget is tighter (2 seconds) because all it does is match a command string.
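The matching itself can be sketched as a small rule table. The commands mirror the examples above; the regexes, the tool_name and tool_input field names, and the message wording are assumptions, not the contents of store-nudge.mjs.

```javascript
// Sketch of the pattern matching a store nudge might do. The regexes, payload
// field names, and messages are assumptions for illustration.
const NUDGE_RULES = [
  { pattern: /\bgit\s+init\b/, message: "New repository: update the repo inventory." },
  { pattern: /\bgh\s+pr\s+create\b/, message: "PR opened: record the decisions behind it." },
  { pattern: /\bgo\s+mod\s+init\b/, message: "New Go module: store a project fact." },
];

function storeNudge(event) {
  const input = event.tool_input || {};
  let message = null;
  if (event.tool_name === "Bash") {
    // First matching rule wins; most commands match nothing.
    const rule = NUDGE_RULES.find((r) => r.pattern.test(input.command || ""));
    if (rule) message = rule.message;
  } else if (event.tool_name === "Write" && /(^|\/)CLAUDE\.md$/.test(input.file_path || "")) {
    message = "New CLAUDE.md: store a project fact.";
  }
  // The hook only suggests; the memory_store call is left to the model.
  return message ? `<store-nudge>${message}</store-nudge>` : null;
}
```

Returning null for the common case keeps the hook silent on ordinary commands, so the nudge stays rare enough to be worth reading.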

SessionEnd: Record Activity

When a session ends, the SessionEnd hook does two small things. It stores a lightweight activity fact – “last active in project X at time Y” – so future sessions know when you were last working here. And it prints any still-open startup tasks to stderr as a departing reminder, visible in the terminal but not injected into the conversation.

It does not upload the transcript, though that was the original plan and the place I burned the most time for the least result. The first cut tried to capture the transcript here, on SessionEnd, and an early version of the capture path could re-enter itself – the upload triggered work that triggered the upload – and fan out into a runaway loop of calls against a paid API before I caught it. I pulled transcript handling out of SessionEnd entirely.

Stop: Capture the Transcript

Transcript capture lives in the Stop hook instead, which fires when the model finishes responding. The hook is a thin shim (stop-hook.mjs) that forwards to memstore-mcp --hook; all the real work – the upload state machine, per-session tracking, and draining of any pending transcript – lives in the Go binary, where it can be made idempotent and given a longer (10-second) budget than the other hooks. Keeping that state machine in one place, in Go, is what finally made transcript capture reliable after the SessionEnd misadventure. The captured transcript is what the feedback loop later rates.

Triggers: The Warm Tier

Triggers are the mechanism that makes warm context work. A trigger is a fact with kind=trigger that encodes a file-pattern-to-context mapping: “when the agent touches a file matching **/sql*.go, load the SQLite subsystem’s invariants and conventions.”

The eval-triggers CLI evaluates glob patterns against file paths, supporting *, ?, and ** with suffix matching against absolute paths. When a pattern matches, it loads the associated subsystem’s facts, filtered by kind – invariants, conventions, and failure modes. These are the facts that represent cross-cutting constraints: things that aren’t visible in any single file but that you need to know before making changes.
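A sketch of that kind of matcher, translating a glob into a regular expression. Treating a pattern as a suffix match that must start at a path-segment boundary is one reading of "suffix matching"; the eval-triggers implementation may differ, and the function names are hypothetical.

```javascript
// Sketch of trigger glob matching with *, ?, and **. The segment-boundary
// suffix rule is an assumed interpretation, not the eval-triggers source.
function globToRegExp(pattern) {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "*" && pattern[i + 1] === "*") {
      i++;
      if (pattern[i + 1] === "/") {
        i++;
        re += "(?:.*/)?"; // "**/" spans zero or more directories
      } else {
        re += ".*"; // bare "**" matches across separators
      }
    } else if (ch === "*") {
      re += "[^/]*"; // "*" stays within one path segment
    } else if (ch === "?") {
      re += "[^/]"; // "?" is exactly one non-separator character
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  // Suffix match: the pattern may begin at any segment boundary of the path.
  return new RegExp(`(?:^|/)${re}$`);
}

function matchesTrigger(pattern, absolutePath) {
  return globToRegExp(pattern).test(absolutePath);
}
```

With this reading, the pattern **/sql*.go matches a path ending in /memstore/sqlite.go but not one ending in /memstore/nosql.go, because the match has to start at the beginning of a path segment.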

The distinction from file-level facts, the ones the file-context boost favors during recall, is scope. File-level facts describe what’s in the file. Trigger-loaded facts describe what the file participates in – the broader system constraints that govern how it should be changed. Both are useful. They serve different purposes.

What Works and What Doesn’t

Works Well

Startup task surfacing genuinely solves the session-continuity problem. Before this, every new session required either a manual status update or hoping the model would search for pending work. Now it’s automatic. I open a session, and it already knows what’s in flight.

Server-side recall with IDF extraction produces noticeably better results than raw prompt search. The combination of distinctive keyword selection, multi-keyword boosting, and session dedup means the context that gets injected is relevant and non-repetitive. The operational lessons post covers the scoring details.

Trigger-based invariant loading is the single highest-value hook behavior. The invariants that triggers surface are exactly the kind of knowledge that prevents bugs – constraints that span multiple files and aren’t documented in any one place. Loading them automatically before an edit is worth more than any amount of post-hoc review.

Doesn’t Work Yet

The prompt hook is the slow part of a turn. Recall runs a CPU cross-encoder rerank, so on a heavy prompt there are a few seconds of wall-clock before the model even starts. It fits inside the 5-second budget today, but it’s the link in the chain most likely to blow that budget as the corpus grows. The truncation budgets are holding the line, not solving the problem.

Triggers are hand-authored. Every file-pattern-to-constraints mapping is one I wrote by hand. There’s no mechanism to propose them from the codebase, so a subsystem gets warm-tier coverage only once I’ve sat down and encoded it. The highest-value hook behavior is also the one with the most manual setup.

There’s no per-file or per-session suppression. If a hook is injecting context you don’t need for a particular task, you can’t tell it to stop for just this session. The options are global – disable the hook entirely or adjust the scoring – which is too coarse for “I know about the SQLite invariants, stop telling me.”

Setup

The hooks are registered in .claude/settings.json as an array of hook definitions. Each entry specifies the event type, an optional tool-name matcher for PreToolUse/PostToolUse events, and the command to run. memstore setup writes these entries for you – and note that a user-scoped settings.local.json is not read by Claude Code, so the entries belong in settings.json.
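For orientation, here is a sketch of what the hooks section could look like, built as a JavaScript object and printed as JSON. It assumes the nested event, matcher, and command shape from Claude Code's hooks documentation, with timeouts in seconds. The hooks/ directory and every script name other than store-nudge.mjs and stop-hook.mjs are hypothetical, and the output of memstore setup may differ.

```javascript
// Sketch of a hooks section for .claude/settings.json, built as an object.
// Script paths are hypothetical except store-nudge.mjs and stop-hook.mjs;
// timeouts (seconds) follow the budgets described in the post.
function commandHook(command, timeout) {
  return { type: "command", command, timeout };
}

const hookSettings = {
  hooks: {
    SessionStart: [{ hooks: [commandHook("node hooks/session-start.mjs", 5)] }],
    UserPromptSubmit: [{ hooks: [commandHook("node hooks/prompt-recall.mjs", 5)] }],
    PreToolUse: [
      // Read and Edit are matched separately but run the same script.
      { matcher: "Read", hooks: [commandHook("node hooks/file-context.mjs", 5)] },
      { matcher: "Edit", hooks: [commandHook("node hooks/file-context.mjs", 5)] },
    ],
    PostToolUse: [
      { matcher: "Write|Bash", hooks: [commandHook("node hooks/store-nudge.mjs", 2)] },
    ],
    Stop: [{ hooks: [commandHook("node hooks/stop-hook.mjs", 10)] }],
    SessionEnd: [{ hooks: [commandHook("node hooks/session-end.mjs", 5)] }],
  },
};

console.log(JSON.stringify(hookSettings, null, 2));
```

Events without a tool, such as SessionStart and Stop, carry no matcher; only the PreToolUse and PostToolUse entries need one.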

The hook scripts ship in the memstore repository, and they’re small Node (.mjs) programs, not shell scripts. Each reads the event JSON from stdin, calls the memstore CLI or the memstored HTTP API with fetch, and writes JSON to stdout with the context string in hookSpecificOutput.additionalContext.

What’s Next

Curator integration at injection time. A small local model running as a curator could filter and rank the recall candidate set before injection – deciding which facts are actually relevant to the current task rather than relying on the relevance threshold alone. The MCP side already has a curator tool; wiring one into the prompt hook’s hot path, under the latency budget, is the open problem.

Trigger auto-generation. Triggers are hand-authored today. Something that analyzed file dependencies and cross-references could propose trigger patterns – “these five files share a database dependency, here’s a trigger that loads the schema constraints when any of them are touched” – instead of me encoding each one by hand.

A pre-write hook for new files. The PostToolUse nudge fires after a file is created. A PreToolUse:Write hook could fire before, loading project conventions and templates so a new file follows established patterns from the first line rather than getting corrected afterward.

In-session suppression. The feedback multiplier ramps with confidence, so a single negative rating only dampens a fact’s score rather than removing it – there’s no “suppress this one now” that takes effect before several ratings accumulate. A per-session override would let the agent drop a fact it knows is irrelevant to the current task without waiting for the gradual scoring adjustment.


Related posts: Building Persistent Memory for AI Agents (flagship post), Operational Lessons from 1,500 Memstore Facts, Fact Supersession: Version Control for Knowledge

Project: github.com/matthewjhunter/memstore