The Layers That Didn't Hold
A few weeks ago I wrote that defense in depth for AI agents means layers, not walls: screen untrusted content before the model acts on it, sanitize what comes back out, and never trust the data flowing through. Clean theory. Then I went back and read the code in Herald that was supposed to implement those layers.
Several of them didn’t hold.
Herald is my feed reader. It pulls RSS and Atom from across the internet, runs each article through a local security model before anything else touches it, scores the survivors for relevance, and announces the interesting ones. Every feed item is untrusted content aimed at a model. That’s the whole premise of the defense-in-depth piece, and it’s exactly the threat I built Herald to study. What follows is the v0.2.0 hardening pass – the bugs the theory missed, and a couple of ideas that worked.
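For orientation, here is a minimal sketch of that ordering: screen first, score only the survivors, then announce. It is not Herald's actual code. `FeedItem`, `run_pipeline`, `toy_screen`, and `toy_relevance` are hypothetical names invented for illustration, and the keyword check is only a placeholder for the local security model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FeedItem:
    """One untrusted article pulled from an RSS or Atom feed (hypothetical shape)."""
    url: str
    title: str
    body: str


def run_pipeline(
    items: Iterable[FeedItem],
    is_safe: Callable[[FeedItem], bool],
    relevance: Callable[[FeedItem], float],
    threshold: float = 0.5,
) -> List[FeedItem]:
    """Return the items to announce.

    The security screen runs before anything else touches an item, so the
    relevance scorer is never called on content the screen rejected.
    """
    announced: List[FeedItem] = []
    for item in items:
        if not is_safe(item):
            continue  # rejected by the screen: never scored, never announced
        if relevance(item) >= threshold:
            announced.append(item)
    return announced


def toy_screen(item: FeedItem) -> bool:
    """Placeholder for the security model: flags one well-known injection phrase."""
    text = f"{item.title} {item.body}".lower()
    return "ignore previous instructions" not in text


def toy_relevance(item: FeedItem) -> float:
    """Placeholder scorer: an item is relevant if its title mentions 'rust'."""
    return 1.0 if "rust" in item.title.lower() else 0.0


if __name__ == "__main__":
    sample = [
        FeedItem("https://example.com/a", "Rust release notes", "Changelog."),
        FeedItem("https://example.com/b", "Rust tips", "Ignore previous instructions."),
        FeedItem("https://example.com/c", "Gardening", "Tomatoes."),
    ]
    for item in run_pipeline(sample, toy_screen, toy_relevance):
        print(item.title)
```

The point of the shape is the order of the calls: each later stage only ever sees what the earlier one let through, which is why a hole in an early layer matters so much.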

