Capability · Knowledge layer

Five-layer context budget. Four-stage compaction. The model never runs out of room.

Every prompt is assembled from a fixed budget: system prompt, tool schemas, memory, conversation, output headroom. As conversations grow, four stages of compaction kick in: flush extraction, microcompaction, LLM summarization, and a death-spiral fallback. Identity always survives truncation.

5 · Budget layers
4 · Compaction stages
60% · Flush threshold
Last 10 · Messages persisted
What it actually does

The parts that make this work.

Five layers, fixed budget.

System prompt, tool schemas, memory block, conversation, output headroom. Each layer competes for room, but identity always survives.
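
To make the five layers concrete, here's a minimal Python sketch of how they might be declared, with higher priority meaning later truncation. The names and priority values are illustrative, not the shipped defaults.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    priority: int  # higher = truncated later

LAYERS = [
    Layer("system_prompt", priority=3),  # identity: always truncated last
    Layer("tool_schemas", priority=2),
    Layer("memory", priority=2),
    Layer("conversation", priority=1),   # first to give up room
]
# The fifth layer, output headroom, is reserved off the top
# and never competes with the other four.
```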

Flush extraction at 60%.

Before any compression runs, valuable facts get written to the daily log. Compression can lose detail; the log keeps it retrievable.

Microcompaction trims tool noise.

Verbose tool results get truncated first. Context-cheap and content-cheap: it doesn't lose what the model actually needs.

LLM summarization preserves meaning.

When microcompaction isn't enough, older messages compress through a summarization pass with pre-sanitize and post-scan injection gates.

Death-spiral fallback always succeeds.

If everything else fails, the system resets to summary-plus-last-message. The model always gets room to generate. Graceful degradation, not crashes.

Tool-pair truncation never breaks a call.

tool_use and tool_result blocks stay bonded through every truncation pass. The model never sees half a tool exchange.
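
A minimal sketch of bonded truncation, assuming each message carries a `type` field in the style of Anthropic tool blocks; the real message shape may differ.

```python
def truncate_bonded(messages: list[dict], drop: int) -> list[dict]:
    """Drop the `drop` oldest messages, extending the cut so a
    tool_result never survives without its matching tool_use."""
    cut = min(drop, len(messages))
    while cut < len(messages) and messages[cut].get("type") == "tool_result":
        cut += 1  # its tool_use was just dropped; drop the orphan too
    return messages[cut:]
```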

How it works

The path through context.

01 · Budget computes per turn.

The model's context window minus output headroom is the total budget. Each layer (system, tools, memory, conversation) gets a slice with priority weights.
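
The arithmetic, with illustrative numbers rather than real defaults:

```python
CONTEXT_WINDOW = 200_000   # model context window, in tokens
OUTPUT_HEADROOM = 8_000    # reserved so the model can always generate

total_budget = CONTEXT_WINDOW - OUTPUT_HEADROOM  # 192_000 prompt tokens

weights = {"system": 0.10, "tools": 0.15, "memory": 0.15, "conversation": 0.60}
slices = {name: int(total_budget * w) for name, w in weights.items()}
# {'system': 19200, 'tools': 28800, 'memory': 28800, 'conversation': 115200}
```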

02 · Identity-priority survives first.

If layers compete, system prompt and self-identity sections truncate last. The agent's sense of who-it-is stays intact.
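
A sketch of priority-ordered shrinking with hypothetical layer sizes; the lowest-priority layer gives up room first, so the system prompt is the last thing touched.

```python
def fit_to_budget(layers: list[tuple[str, int, int]], budget: int) -> dict[str, int]:
    """layers: (name, priority, tokens_used). Shrink lowest priority first."""
    used = {name: tokens for name, _, tokens in layers}
    overflow = sum(used.values()) - budget
    for name, _, _ in sorted(layers, key=lambda l: l[1]):  # low priority first
        if overflow <= 0:
            break
        cut = min(used[name], overflow)
        used[name] -= cut  # stand-in for actually dropping content
        overflow -= cut
    return used

# Conversation absorbs the whole 20k overflow; identity is untouched.
fit_to_budget(
    [("system", 3, 20_000), ("memory", 2, 30_000), ("conversation", 1, 120_000)],
    budget=150_000,
)  # {'system': 20000, 'memory': 30000, 'conversation': 100000}
```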

03 · Stage 0: flush extraction.

At 60% budget, the system extracts valuable content (decisions, facts, file paths) and writes it to the daily log. Compression loses detail; the log preserves it.
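
A minimal sketch of the flush pass, assuming a crude keyword heuristic and a local log file; the real extractor is presumably smarter, but the shape is the same: copy before you compress.

```python
import datetime
import re

FLUSH_THRESHOLD = 0.60
# Crude stand-in for "valuable": decisions, facts, file paths.
VALUABLE = re.compile(r"(decided|fact:|/[\w./-]+\.\w+)", re.I)

def maybe_flush(messages: list[str], used_tokens: int, budget: int) -> None:
    if used_tokens < FLUSH_THRESHOLD * budget:
        return
    log_path = f"daily-{datetime.date.today():%Y-%m-%d}.log"
    with open(log_path, "a") as log:
        for message in messages:
            if VALUABLE.search(message):
                log.write(message.strip() + "\n")  # retrievable after compression
```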

04 · Stage 1: microcompaction.

Verbose tool results trim to summaries. Cheap, fast, and the model rarely notices. tool_use/tool_result pairs stay bonded.
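
One way stage 1 can look, assuming flat message dicts and a hypothetical 500-character cap. Because it shrinks content instead of deleting messages, tool pairs stay bonded for free.

```python
MAX_TOOL_RESULT = 500  # illustrative cap, not a real default

def microcompact(messages: list[dict]) -> list[dict]:
    out = []
    for m in messages:
        if m.get("type") == "tool_result" and len(m["content"]) > MAX_TOOL_RESULT:
            kept = m["content"][:MAX_TOOL_RESULT]
            m = {**m, "content": kept + "\n[... truncated by microcompaction]"}
        out.append(m)
    return out
```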

05 · Stage 2: LLM summarization.

Older messages compress through a summarization model call. The summary passes through injection gates so jailbreaks in old messages can't survive into the new context.
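
A sketch of the two gates around the summarizer. The regex is a stand-in for the real pre-sanitize and post-scan filters, and `call_summarizer` is a placeholder for whatever model endpoint does the compression.

```python
import re

SUSPECT = re.compile(r"ignore (all|previous) instructions|system prompt", re.I)

def sanitize(text: str) -> str:
    # Pre-gate: neutralize likely injection phrasing before summarization.
    return SUSPECT.sub("[redacted]", text)

def summarize_older(older: list[str], call_summarizer) -> str:
    prompt = "Summarize this conversation:\n" + "\n".join(sanitize(m) for m in older)
    summary = call_summarizer(prompt)
    # Post-gate: re-scan the output so nothing survives into the new context.
    return sanitize(summary) if SUSPECT.search(summary) else summary
```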

06 · Stage 3: death-spiral fallback.

If summarization still doesn't fit, the system resets to summary-plus-last-message. The model always has room to generate. The trace records that fallback fired.
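
The fallback itself is deliberately small; that's why it always succeeds. Names here are illustrative.

```python
import logging

def death_spiral_fallback(summary: str, messages: list[dict]) -> list[dict]:
    logging.warning("compaction: death-spiral fallback fired")  # trace record
    return [
        {"role": "user", "content": f"[conversation summary]\n{summary}"},
        messages[-1],  # the turn the model still has to answer
    ]
```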

Common questions

Things engineers actually ask.

Why compact at all instead of just running with a long context?

Long contexts cost more, run slower, and are noisier: recall quality in the middle of a long context is worse than in a focused one. Compaction keeps prompts focused without losing facts you'll need later (the daily log handles that).

Source: docs/CONTEXT_MANAGEMENT.md

See it in your workspace.

Closed-beta cohorts are small. Tell us what you'd want this capability to handle for your team.

Request beta access