Anatomy of a turn.
This is how one turn moves through the workbench. Eight stages, same order every time. Follow the path; drop into a subsystem when you want to see how that stage actually works.
A turn starts when something triggers an agent - an operator's chat, a heartbeat firing, a handoff from another agent, or a scheduled run. From that moment until an audit row is written, the runtime moves through the stages on the right. Each stage is independently recorded, independently replayable, and owned by exactly one subsystem in the chapters below.
- Eight stages, fixed order; the trace is causal, not wall-clock
- Skipped stages (a no-op recall, an unmatched workflow) still record an audit row
- Each stage failure is classified at the stage that owned it, not at the call site
- Click any stage on the right to drop into the subsystem that owns it
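The fixed-order, audit-everything loop described above can be sketched as follows. This is a minimal illustration, not PACKWOLF's implementation: the eight stage names are hypothetical (the text doesn't enumerate them), and only the invariants from the bullets are modeled, namely that stages run in a fixed order, a skipped stage still writes an audit row, and a failure is classified at the stage that owned it.

```python
from dataclasses import dataclass

# Hypothetical stage names; the fixed order is the point being illustrated.
STAGES = [
    "trigger", "recall", "context", "plan",
    "tools", "respond", "consolidate", "audit",
]

@dataclass
class AuditRow:
    stage: str
    status: str  # "ok", "skipped", or "failed"

def run_turn(handlers):
    """Run every stage in order; a no-op stage still records an audit row."""
    trace = []
    for stage in STAGES:
        handler = handlers.get(stage)
        if handler is None:
            trace.append(AuditRow(stage, "skipped"))  # skipped, but still recorded
            continue
        try:
            handler()
            trace.append(AuditRow(stage, "ok"))
        except Exception:
            # Failure classified at the stage that owned it, not the call site.
            trace.append(AuditRow(stage, "failed"))
            break
    return trace

# A turn where only two stages do work still yields eight audit rows.
trace = run_turn({"trigger": lambda: None, "respond": lambda: None})
```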
Identity evolution
Agents that update their own brief from what they actually did.
Every agent in the pack carries a name, BIO, IDENTITY, brief, and HEARTBEAT — markdown documents that say who they are, what they care about, and how they work. After each shift, nightly consolidation reads the week's episodes and rewrites those documents from real work. Identities don't drift; they earn their changes. The agent that helped you last quarter compounds over months instead of starting fresh every morning.
- Every agent carries name, BIO, IDENTITY, brief, HEARTBEAT — readable and editable in your workspace
- Nightly consolidation rewrites identity from the week's real episodes, not from prompts
- Every edit is audit-logged so agents never silently mutate
Every other tool ships agents as prompts or skills. Specialists with evolving identity are the central PACKWOLF claim.
What this means for you: The agent that helped you last quarter knows what worked and writes that into its own brief — without you re-prompting.

Memory
Four banks of memory. Recalled only when it matters.
Most AI tools give you one bucket of memory and call it memory. PACKWOLF splits memory into four banks — procedural, working, episodic, durable — and retrieves each on its own terms. Every recall passes a gate that decides whether to search at all; every write passes gates for novelty, quality, and contradiction. Nightly consolidation distills the week's episodes into durable knowledge the pack carries forward.
- Procedural, working, episodic, durable — four banks, four budgets, retrieved separately
- Recall gate skips the search when the turn doesn't need it
- Vector-first retrieval with full-text fallback and temporal rerank
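The recall path in the bullets, gate first, vector-first with full-text fallback, then a temporal rerank, can be sketched roughly as below. Everything specific here is an assumption: the word-count gate is a stand-in for whatever the real gate checks, and the seven-day half-life decay is one plausible way to rerank by recency.

```python
import time

BANKS = ("procedural", "working", "episodic", "durable")  # the four banks, retrieved separately

def recall_gate(query: str) -> bool:
    """Stand-in gate: skip retrieval entirely for short, self-contained turns."""
    return len(query.split()) >= 4

def recall(query, vector_search, fulltext_search, now=None):
    if not recall_gate(query):
        return []                       # gate says: don't search at all
    hits = vector_search(query)         # vector-first
    if not hits:
        hits = fulltext_search(query)   # full-text fallback
    now = now or time.time()
    half_life = 7 * 86400               # assumed decay knob for the temporal rerank
    return sorted(
        hits,
        key=lambda h: h["score"] * 0.5 ** ((now - h["ts"]) / half_life),
        reverse=True,
    )
```

With this decay, a two-week-old hit scoring 0.9 reranks below a fresh hit scoring 0.6, which is the "temporal rerank" behavior the bullet names.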
ChatGPT, Claude, and Perplexity all flatten memory into one bucket. The pack remembers in four.
What this means for you: Agents that remember your last campaign and improve from it, instead of asking what you said three weeks ago.
Context
Five-layer context budget. The model never runs out of room.
Stuffing everything into the prompt fails once a conversation runs long enough to matter. PACKWOLF assembles every prompt from a fixed budget across system prompt, tools, memory, conversation, and output headroom. As conversations grow, four compaction stages keep the most-relied-on layers intact — your agent's identity never gets cropped out to make room for chat history.
- Five budget layers, prioritized so identity always survives truncation
- Four-stage compaction kicks in before the window fills, not after
- Conversation checkpoints persist the last 10 messages plus a summary for recovery
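The budgeting rule the bullets describe, identity always survives truncation, can be sketched as a priority-ordered trim. The layer priorities and token numbers below are assumptions for illustration; the source only states that there are five layers (the fifth being output headroom) and that the system prompt is cropped last.

```python
# Assumed priorities: higher survives longer; conversation history is trimmed first.
PRIORITY = {"system": 4, "tools": 3, "memory": 2, "conversation": 1}

def fit_to_budget(layers: dict, window: int, headroom: int) -> dict:
    """Trim lowest-priority layers until everything plus output headroom fits."""
    budget = window - headroom          # headroom is reserved before anything else
    sizes = dict(layers)
    for name in sorted(sizes, key=lambda n: PRIORITY[n]):
        overflow = sum(sizes.values()) - budget
        if overflow <= 0:
            break
        sizes[name] = max(0, sizes[name] - overflow)  # crop this layer first
    return sizes

fitted = fit_to_budget(
    {"system": 2000, "tools": 1000, "memory": 3000, "conversation": 9000},
    window=12_000,
    headroom=1_000,
)
```

In this sketch a 3,000-token overflow comes entirely out of conversation history; the system prompt (the agent's identity) is untouched.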
Most agents either run out of context or drop the system prompt to keep going. PACKWOLF compacts deliberately and preserves the layers the agent relies on.
What this means for you: Long-running agents that stay coherent into hour twelve, not just hour one.
Heartbeats
Continuous work across weeks. The pack assesses, picks, executes — on schedule.
Real work doesn't fit inside one chat turn. Heartbeats are scheduled background runs where each agent assesses its queue, picks the most actionable thing, and executes one full turn. A priority score weighs due date, source, staleness, and momentum so the next thing is always the right thing. Every beat is recoverable — a timed-out run gets resumed, not redone.
- Two-phase: deterministic priority score plus model-guided actionability check
- Agents work goals that span weeks, not just single-prompt tasks
- Every beat audit-logged with assessment, queue snapshot, side-effect ledger, recovery checkpoint
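The deterministic half of the two-phase check might look like the weighted score below. The weights, source scores, and time horizons are all invented for illustration; the source only says the score weighs due date, source, staleness, and momentum.

```python
import time

WEIGHTS = {"due": 0.4, "source": 0.2, "staleness": 0.2, "momentum": 0.2}  # assumed
SOURCE_SCORE = {"operator": 1.0, "handoff": 0.7, "scheduled": 0.4}        # assumed

def priority(item, now=None):
    """Deterministic priority score over due date, source, staleness, momentum."""
    now = now or time.time()
    day = 86400
    # Due-date urgency: 1.0 when overdue, fading toward 0 a week out.
    due = max(0.0, min(1.0, 1 - (item["due_at"] - now) / (7 * day)))
    # Staleness: grows the longer the item goes untouched, capped at two weeks.
    staleness = min(1.0, (now - item["touched_at"]) / (14 * day))
    return (WEIGHTS["due"] * due
            + WEIGHTS["source"] * SOURCE_SCORE.get(item["source"], 0.5)
            + WEIGHTS["staleness"] * staleness
            + WEIGHTS["momentum"] * item["momentum"])
```

An overdue operator request with momentum outranks a fresh scheduled task, which is the "the next thing is always the right thing" behavior; the model-guided actionability check would then run only on the top of this queue.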
Other tools run on prompts or cron triggers. PACKWOLF runs on goals with priority scoring across the whole pack.
What this means for you: The work you'd otherwise re-explain every Monday gets picked up while you're asleep — with the right priority every time.
Workflows
Markdown SOPs your team writes. The pack runs them.
In PACKWOLF, a procedure is literally a markdown document your team writes and your agents run. Workflows cascade through products, goals, projects, and tasks — a product workflow can spawn a goal workflow, which spawns project and task workflows automatically. Agents propose new workflows from learned patterns; you approve through three-tier sign-off. The pack discovers playbooks you didn't write.
- Authored in markdown — natural language is the source of truth
- Cascade: product → goal → project → task chains automatically; the priority-scored queue picks what's next
- Playbook graduation: five-plus related ad-hoc memories trigger a workflow proposal
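The cascade and the graduation threshold are simple enough to state directly in code. This sketch models only what the bullets say: each level spawns the next automatically, and five or more related ad-hoc memories trigger a workflow proposal.

```python
# Each workflow level spawns the next; tasks are the leaves of the chain.
CASCADE = {"product": "goal", "goal": "project", "project": "task", "task": None}

def spawn_chain(level: str) -> list[str]:
    """Follow the cascade from a starting level down to tasks."""
    chain = [level]
    while CASCADE.get(chain[-1]):
        chain.append(CASCADE[chain[-1]])
    return chain

def should_propose_playbook(related_memories) -> bool:
    """Playbook graduation: five-plus related ad-hoc memories trigger a proposal."""
    return len(list(related_memories)) >= 5
```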
Zapier and n8n route between SaaS tools. Lindy chains agent calls. PACKWOLF runs the procedure your team would write down — and proposes new ones from learned patterns.
What this means for you: The playbook you used to re-explain every Monday gets written once. The pack runs it from then on.
Agent-to-agent
Managers and reports who talk to each other. You join the thread when needed.
When one agent needs another's expertise, the conversation happens in the open. Every agent publishes a card describing what it does and the boundaries it works within. Delegations show up in your chat thread as transcript blocks — you see the whole conversation, not just the result. Trust scope keeps child agents narrower than their parent: an agent can never invoke another with broader permissions than itself.
- Org chart with managers and reports; delegations render as visible transcript blocks
- Trust scope: child agents inherit or narrow parent permissions, never widen
- External agents speaking the open A2A protocol can join your pack
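The trust-scope rule, inherit or narrow but never widen, is a set intersection. A minimal sketch, with permission names invented for illustration:

```python
def delegate_scope(parent: frozenset, requested: frozenset) -> frozenset:
    """Child scope is parent ∩ requested: inherit or narrow, never widen."""
    return parent & requested

# A manager with read+write delegates to a drafter that asks for write+deploy:
# the child gets write only. It can never gain deploy, which the parent lacks.
manager = frozenset({"read", "write"})
drafter = delegate_scope(manager, frozenset({"write", "deploy"}))
```

Because the intersection is taken at every hop, a grandchild's scope is always a subset of both its parent's and the original manager's, no matter how deep the org chart goes.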
Other multi-agent tools ghost delegations into hidden subprocesses. PACKWOLF runs the org chart in the open.
What this means for you: When your manager agent hands off to your drafter, you watch the conversation happen. No black boxes.
Tools
Built-in and MCP tools. Every call passes a safety gate before execution.
Agents act through tools — file ops, shell, web, communication, memory, and any MCP server you connect. Anthropic shipped the Model Context Protocol so tool integration wouldn't be reinvented per vendor; PACKWOLF treats MCP as first-class. What gets layered on top is where production differs from demo: every call passes a nine-point safety gate before execution, read tools parallelize while writes serialize, and loop detection aborts runaway sequences with diagnostics rather than burning budget.
- Nine safety checks before every tool call: allowlist, budget, approval, scope, injection, path, rate limit, network, trust
- Read tools run up to ten in parallel; writes and dangerous ops serialize
- Tool trust earned through five consecutive approvals; revoked on rejection
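Two of the mechanics above, the ordered gate and earned trust, can be sketched as follows. The nine check names come straight from the bullet; the checker interface and the `ToolTrust` class are assumptions, and the parallel/serial scheduling is not modeled here.

```python
CHECKS = ["allowlist", "budget", "approval", "scope", "injection",
          "path", "rate_limit", "network", "trust"]

def gate(call, checkers):
    """Run all nine checks in order; the first failure blocks the call."""
    for name in CHECKS:
        ok, reason = checkers[name](call)
        if not ok:
            return False, f"{name}: {reason}"
    return True, "allowed"

class ToolTrust:
    """Trust earned through five consecutive approvals; revoked on rejection."""
    def __init__(self):
        self.approvals = {}

    def record(self, tool, approved):
        if not approved:
            self.approvals[tool] = 0          # one rejection resets the streak
            return False
        self.approvals[tool] = self.approvals.get(tool, 0) + 1
        return self.approvals[tool] >= 5       # trusted after five in a row
```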
Other tools either ship raw tool access or hide it behind a black box. PACKWOLF gates every call, in the open.
What this means for you: Agents that act on your behalf, with brakes you can actually see and adjust.
Models
Native SDK per provider. Claude, OpenAI, our in-house model, local LLMs — switch per call.
Wrapping every provider behind one OpenAI-shaped interface is convenient and lossy: prompt caching, extended thinking, and vision behave differently enough that the abstraction quietly drops what makes each model worth picking. PACKWOLF integrates each provider through its native SDK so Claude's prompt caching, OpenAI's vision, and our in-house model's economics are all available where they matter. Health monitoring routes around outages automatically.
- Native SDKs — no lossy provider abstraction; per-call provider switching
- Health monitor: three failures in a window trigger a 60-second cooldown and failover
- Local LLMs run behind a priority queue so a shared GPU stays alive across the pack
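The failover bullet translates almost directly into code. The three-failure threshold and 60-second cooldown are from the bullet; the 30-second window width and the class shape are assumptions.

```python
import time
from collections import deque

class HealthMonitor:
    """Three failures inside the window put a provider on a 60-second cooldown."""

    def __init__(self, window=30.0, threshold=3, cooldown=60.0):
        self.window, self.threshold, self.cooldown = window, threshold, cooldown
        self.failures = {}        # provider -> deque of recent failure timestamps
        self.cooldown_until = {}  # provider -> time when it becomes eligible again

    def record_failure(self, provider, now=None):
        now = now if now is not None else time.time()
        q = self.failures.setdefault(provider, deque())
        q.append(now)
        while q and now - q[0] > self.window:   # drop failures outside the window
            q.popleft()
        if len(q) >= self.threshold:
            self.cooldown_until[provider] = now + self.cooldown

    def pick(self, preferred, fallbacks, now=None):
        """Route around cooling-down providers; fail over in preference order."""
        now = now if now is not None else time.time()
        for p in [preferred, *fallbacks]:
            if self.cooldown_until.get(p, 0.0) <= now:
                return p
        return preferred  # everything cooling down: fall back to the preferred one
```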
ChatGPT, Claude, and Perplexity are single-vendor by design. OpenClaw and Hermes are BYOK; PACKWOLF is BYOK plus an in-house model so you don't have to bring one.
What this means for you: Pick the right model for each turn, not the model your tool happens to be built around. Pay your provider, not us.
Approvals
Three-tier sign-off on the moments that matter. Versioned. Reversible.
Anything that changes a load-bearing artifact — a published doc, a sent message, a code change, a customer-facing decision — moves through three-tier approval: author, peer review, sign-off. Every change is versioned and revertable. Linked plans and specs reconcile through a readiness gate before the system treats a document as build-ready. An exception queue captures everything that escalates, with full audit trail.
- Three tiers: author, peer review, sign-off — gated before downstream trust
- Every change versioned; revert restores any prior version
- Exception queue captures escalations; every decision audit-logged
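The three-tier gate plus versioning can be sketched as a small state machine. The class shape and method names are invented; the invariants come from the bullets: tiers pass in order, nothing is released before sign-off, every submission is versioned, and revert restores any prior version.

```python
TIERS = ["author", "peer_review", "sign_off"]

class Approval:
    def __init__(self):
        self.passed = []
        self.versions = []

    def submit(self, content):
        """Every change is versioned; a new version restarts the approval chain."""
        self.versions.append(content)
        self.passed = []

    def approve(self, tier):
        """Tiers must pass in order: author, then peer review, then sign-off."""
        expected = TIERS[len(self.passed)]
        if tier != expected:
            raise ValueError(f"out of order: expected {expected}, got {tier}")
        self.passed.append(tier)

    @property
    def released(self):
        return self.passed == TIERS     # downstream trust only after all three

    def revert(self, version: int):
        """Revert restores any prior version verbatim."""
        return self.versions[version]
```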
Most agent tools let the agent act and hope it didn't go wrong. PACKWOLF stops before the moments that matter.
What this means for you: Agents that send the email, push the code, publish the post — only after you said yes. Reversible if you change your mind.
Audit and cost
Every call recorded as a span. Replayable, diffable, queryable.
Audit is the system, not an after-the-fact dump. Every model call and tool call is a node in the trace; every prompt is diffable across runs; every approval and decision records its inputs and outputs verbatim. Cost events fire on every call and roll up into per-agent and per-tool budgets with threshold alerts. When something goes sideways, you replay the trace, diff the prompt, and find the call that actually changed state.
- Per-span flame graph with input, output, metadata, and diagnosis tabs
- Prompt versioning — every system prompt diffable across runs
- Per-agent and per-tool cost events with threshold alerts on budgets
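The cost side of the trace can be sketched as a ledger that rolls every call's cost event into per-agent and per-tool totals and fires an alert when a budget threshold is crossed. The budget keys and dollar figures below are illustrative, not PACKWOLF's schema.

```python
from collections import defaultdict

class CostLedger:
    """Roll cost events up into per-agent and per-tool budgets with alerts."""

    def __init__(self, budgets):
        self.budgets = budgets             # e.g. {("agent", "drafter"): 1.00}
        self.totals = defaultdict(float)
        self.alerts = []

    def record(self, agent, tool, usd):
        """A cost event fires on every call and updates both rollups."""
        for key in (("agent", agent), ("tool", tool)):
            self.totals[key] += usd
            limit = self.budgets.get(key)
            if limit is not None and self.totals[key] > limit:
                self.alerts.append((key, self.totals[key], limit))

ledger = CostLedger({("agent", "drafter"): 1.00})
ledger.record("drafter", "web_fetch", 0.60)   # under budget: no alert
ledger.record("drafter", "shell", 0.60)       # crosses the threshold: alert fires
```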
Most tools give you a chat log and call it observability. PACKWOLF gives you the trace, the cost, and the diff.
What this means for you: When the agent did something you didn't expect, you can see what it actually did and why — not just what it said it did.
Your terms
Your data on your machine. Managed cloud when the work earns it. Your keys across providers.
Your workspace, memory, traces, and audit log stay on your machine by default. Nothing roams unless you opt in. The model layer is a deliberate choice you make per call — Claude, OpenAI, our in-house model, or a local LLM — and your keys go to your provider, never marked up. BYOK is the local app with account and license management. Basic runs the whole operation locally with a cloud command center, included usage, and BYOK overflow. Pro and Max add managed cloud execution when selected work should keep running without the desktop.
- Workspace, memory, traces, and audit log stay local by default
- BYOK stays local; Basic runs locally with command-center features and BYOK overflow; Pro+ adds managed cloud execution
- Your keys across Claude, OpenAI, our in-house model, local LLMs, MCP servers — no markup
Other prosumer tools either lock you in or are cloud-only. PACKWOLF is genuinely both — data sovereignty plus deployment flexibility without pretending every tier should sync everything.
What this means for you: Your customer list, your draft work, your competitive notes — all on the machine you trust. Pick the model for the job; don't pick the vendor for your data.