Engineering
How we built it.
Posts on the parts of PACKWOLF where the choices weren't obvious. Context, observability, scheduling. The trade-offs we made and what we'd do differently next time.
Context · Apr 21, 2026
Why we wrote our own context-compaction stack
Long contexts cost more, run slower, and recall the middle poorly. Compaction keeps prompts focused.
8 min read
Observability · Mar 18, 2026
Building a flame graph for agent execution
Agent runtimes have a different failure shape than web apps. The model can emit a tool name with no arguments.
9 min read
Models · Feb 9, 2026
A priority queue for shared local LLMs
Most local inference servers handle one request at a time per model. Concurrent requests cause crashes, model swap thrash, or both.
7 min read