Engineering
How we built it.
Posts on the parts of PACKWOLF where the choices weren't obvious. Context, observability, scheduling. The trade-offs we made and what we'd do differently next time.
Context · Apr 21, 2026
Why we wrote our own context-compaction stack
Long contexts cost more, run slower, and recall the middle poorly. Compaction keeps prompts focused.
8 min read
Observability · Mar 18, 2026
Building a flame graph for agent execution
Agent runtimes have a different failure shape than web apps. The model can emit a tool name with no arguments.
9 min read
Models · Feb 9, 2026
A priority queue for shared local LLMs
Most local inference servers handle one request at a time per model. Concurrent requests cause crashes, model swap thrash, or both.
7 min read