Capability · Knowledge layer

Four memory layers, retrieved selectively, consolidated nightly.

Useful recall is selective; durable learning is accountable. PACKWOLF separates the two and gives each one its own retrieval rules, write gates, and audit trail.

4 memory layers
Vector-first retrieval
Nightly consolidation
Atomic per-target writes
The memory browser. Inspect every chunk, see its source turn, prune what's stale, restore from history.
What it actually does

The parts that make this work.

Four explicit banks.

Procedural (always-on guidance), working (current session), recent episodic, durable semantic. Each bank has its own budget, its own retrieval rules, its own freshness contract.

Recall is gated before search.

A cheap classifier labels each turn as trivial / followup / recall / substantive. If recall isn't needed, no vector search runs. Most turns skip retrieval entirely.
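As a sketch, the pre-search gate can be as cheap as a heuristic classifier. The cue regexes, length threshold, and `classifyTurn` signature below are illustrative assumptions, not PACKWOLF's actual model:

```typescript
// Hypothetical sketch of the pre-search gate: a cheap classifier that
// decides whether a turn needs memory retrieval at all.
type TurnClass = "trivial" | "followup" | "recall" | "substantive";

const RECALL_CUES = /\b(remember|last time|previously|you said|we discussed)\b/i;
const FOLLOWUP_CUES = /^(and|also|what about|ok but)\b/i;

function classifyTurn(message: string): TurnClass {
  const trimmed = message.trim();
  if (trimmed.length < 12 && !RECALL_CUES.test(trimmed)) return "trivial"; // "thanks", "ok"
  if (RECALL_CUES.test(trimmed)) return "recall";       // explicit memory cue
  if (FOLLOWUP_CUES.test(trimmed)) return "followup";   // continues current thread
  return "substantive";                                 // default: full budget
}
```

Only `recall` and `substantive` turns would pay for a vector search; `trivial` and `followup` return early with little or no retrieval.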

Vector-first, FTS5 fallback.

Semantic search via sqlite-vec runs first because it benchmarks best in this codebase. FTS5 fills gaps. A temporal rerank handles freshness without losing relevance.
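The merge step can be sketched as follows, assuming vector hits and FTS5 hits arrive as scored lists; the `Hit` shape and the 30-day half-life are hypothetical, and an exponential decay stands in for the temporal rerank:

```typescript
// Sketch: vector hits come first, FTS5 hits fill the gaps, then an
// exponential temporal decay nudges fresher chunks up without
// discarding relevance. Names and constants are illustrative.
interface Hit { id: string; score: number; ageDays: number }

function mergeAndRerank(vectorHits: Hit[], ftsHits: Hit[], budget: number): Hit[] {
  const seen = new Set(vectorHits.map(h => h.id));
  const merged = [...vectorHits, ...ftsHits.filter(h => !seen.has(h.id))];
  const halfLifeDays = 30; // assumed decay constant
  return merged
    .map(h => ({ ...h, score: h.score * Math.pow(0.5, h.ageDays / halfLifeDays) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, budget);
}
```

A strong-but-stale chunk decays toward zero, while a fresh FTS5-only hit can outrank it, which is the "freshness without losing relevance" trade described above.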

Write gates protect the bank.

Novelty, quality, dedup, and contradiction handling run at write time, not read time. New chunks compete for room, they don't accumulate forever.
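A minimal sketch of such a write-time gate chain; the `Chunk` shape, the quality threshold, and exact-match dedup are stand-in assumptions (the real gates would also handle near-duplicates and contradictions):

```typescript
// Sketch of write-time gating: a candidate chunk must pass a quality
// floor and a dedup check before it competes for room in the bank.
interface Chunk { text: string; quality: number }

function shouldWrite(candidate: Chunk, bank: Chunk[]): boolean {
  if (candidate.quality < 0.5) return false;                    // quality gate
  const norm = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
  if (bank.some(c => norm(c.text) === norm(candidate.text))) return false; // dedup
  return true;                                                  // novel enough to write
}
```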

Boundary markers stop injection.

Every retrieved memory chunk arrives with provenance markers. Content scanning + summarization gates prevent prompt-injection attacks via recalled text.
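A sketch of provenance framing for recalled text; the marker format, the `untrusted` attribute, and the scan regex are illustrative assumptions, not the shipped scanner:

```typescript
// Sketch: wrap each recalled chunk in boundary markers carrying its
// source, and withhold chunks whose content looks instruction-like.
function frameChunk(text: string, source: string): string {
  const suspicious = /\b(ignore (all |previous )?instructions|system prompt)\b/i;
  const body = suspicious.test(text)
    ? "[chunk withheld: possible prompt injection]"
    : text;
  return `<memory source="${source}" untrusted="true">\n${body}\n</memory>`;
}
```

The markers let the model treat recalled text as data rather than instructions, and the scan drops the obvious injection attempts before they reach the prompt at all.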

Atomic write queue per target.

Memory writes are queued atomically per target so concurrent agents can't corrupt the bank. Failure modes recover instead of dropping turns.
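One common pattern for this kind of per-target serialization is a promise chain keyed by target; this is a minimal sketch of the idea, not PACKWOLF's implementation:

```typescript
// Sketch: writes to the same target serialize through a promise chain,
// so concurrent agents never interleave writes on one bank.
const queues = new Map<string, Promise<unknown>>();

function enqueueWrite<T>(target: string, write: () => Promise<T>): Promise<T> {
  const prev = queues.get(target) ?? Promise.resolve();
  const next = prev.then(write, write);     // run even if the previous write failed
  queues.set(target, next.catch(() => {})); // recover instead of poisoning the chain
  return next;                              // caller still sees this write's failure
}
```

Because a failed write is swallowed only inside the stored chain, the queue keeps draining after an error while each caller still observes its own result.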

How it works

The path through memory.

  1. A turn arrives.

    The pipeline classifies it as trivial, active follow-up, recall, or substantive default. Each class has a different memory budget.

  2. Procedural loads first.

    Stable guidance (operator preferences, agent directives, manager feedback) loads from curated files without ranking. Always small, always present.

  3. Working memory builds.

    Session state plus the checkpoint summary from prior compaction. This is the in-flight conversation's compact form.

  4. Episodic + durable get queried.

    The turn queries the episodic bank for recent chunks; durable semantic memory uses vector-first retrieval, then reranks by recency, entity intent, purpose, and quality.

  5. The model gets a bounded block.

    Selected sections inject into the prompt under a hard budget. A per-turn memory trace records what was retrieved and why, attached to the debug snapshot.

  6. Nightly, consolidation runs.

    Episodic chunks distill into durable knowledge. The bank gets cleaner over time, not noisier.
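The nightly pass above can be sketched as a group-and-distill step; the topic key, the 7-day cutoff, and the join-as-summary stand-in (where a real pipeline would call an LLM) are assumptions for illustration:

```typescript
// Sketch: episodic chunks past a cutoff are grouped by topic, distilled
// into durable entries, and pruned from the episodic bank.
interface Episode { topic: string; text: string; ageDays: number }

function consolidate(episodic: Episode[], cutoffDays = 7) {
  const stale = episodic.filter(e => e.ageDays > cutoffDays);
  const byTopic = new Map<string, string[]>();
  for (const e of stale) {
    byTopic.set(e.topic, [...(byTopic.get(e.topic) ?? []), e.text]);
  }
  const durable = [...byTopic].map(([topic, texts]) => ({
    topic,
    summary: texts.join(" "), // stand-in for an LLM distillation step
  }));
  const remaining = episodic.filter(e => e.ageDays <= cutoffDays);
  return { durable, remaining };
}
```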

Memory retrieval at runtime, selective by design
// lib/memory-retrieval.ts (sketch)
const turnClass = classifyTurn(message);   // trivial | followup | recall | substantive
if (turnClass === "trivial") return null;  // skip retrieval entirely

const plan = buildRetrievalPlan(turnClass, agent);
const procedural = await loadProcedural(agent);
const working    = await buildWorkingMemory(session);
const episodic   = await searchEpisodic(plan.episodicBudget, message);
const durable    = await vectorSearch(plan.durableBudget, message)
                     .then(rerankByRecency)
                     .then(rerankByEntityIntent);

return assembleBlock({ procedural, working, episodic, durable });
Common questions

Things engineers actually ask.

Why four separate banks instead of one store?

Different recall needs have different freshness contracts. Procedural guidance should always load, working memory should reflect the current session, episodic should remember last week, durable should remember last quarter. Mashing them together makes selective retrieval impossible.

Source: docs/AGENT_MEMORY.md
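Those freshness contracts can be made concrete as per-bank retrieval policies; the field names and day horizons below are illustrative, not PACKWOLF's schema:

```typescript
// Sketch: each bank gets its own horizon and ranking behavior.
const bankPolicy = {
  procedural: { horizonDays: Infinity, ranked: false }, // always loads, never ranked
  working:    { horizonDays: 1,        ranked: false }, // current session
  episodic:   { horizonDays: 7,        ranked: true  }, // roughly last week
  durable:    { horizonDays: 90,       ranked: true  }, // roughly last quarter
} as const;

function inHorizon(bank: keyof typeof bankPolicy, ageDays: number): boolean {
  return ageDays <= bankPolicy[bank].horizonDays;
}
```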

See it in your workspace.

Closed-beta cohorts are small. Tell us what you'd want this capability to handle for your team.

Request beta access