Anatomy of a turn.
This is how one turn moves through the workbench. Eight stages, same order every time. Follow the path; drop into a subsystem when you want to see how that stage actually works.
A turn starts when something triggers an agent - an operator's chat, a heartbeat firing, a handoff from another agent, or a scheduled run. From that moment until an audit row is written, the runtime moves through the stages on the right. Each stage is independently recorded, independently replayable, and owned by exactly one subsystem in the chapters below.
- Eight stages, fixed order; the trace is causal, not wall-clock
- Skipped stages (a no-op recall, an unmatched workflow) still record an audit row
- Each stage failure is classified at the stage that owned it, not at the call site
- Click any stage on the right to drop into the subsystem that owns it
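The fixed-order, audit-everything loop described above can be sketched as follows. This is a minimal illustration, not PACKWOLF's implementation: the eight stage names are hypothetical (the text doesn't enumerate them), and only the invariants from the bullets are modeled, namely that stages run in a fixed order, a skipped stage still writes an audit row, and a failure is classified at the stage that owned it.

```python
from dataclasses import dataclass

# Hypothetical stage names; the fixed order is the point being illustrated.
STAGES = [
    "trigger", "recall", "context", "plan",
    "tools", "respond", "consolidate", "audit",
]

@dataclass
class AuditRow:
    stage: str
    status: str  # "ok", "skipped", or "failed"

def run_turn(handlers):
    """Run every stage in order; a no-op stage still records an audit row."""
    trace = []
    for stage in STAGES:
        handler = handlers.get(stage)
        if handler is None:
            trace.append(AuditRow(stage, "skipped"))  # skipped, but still recorded
            continue
        try:
            handler()
            trace.append(AuditRow(stage, "ok"))
        except Exception:
            # Failure classified at the stage that owned it, not the call site.
            trace.append(AuditRow(stage, "failed"))
            break
    return trace

# A turn where only two stages do work still yields eight audit rows.
trace = run_turn({"trigger": lambda: None, "respond": lambda: None})
```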
Identity evolution
Agents that update their own brief from what they actually did.
Every agent in the pack carries a name, BIO, IDENTITY, brief, and HEARTBEAT — markdown documents that say who they are, what they care about, and how they work. After each shift, nightly consolidation reads the week's episodes and rewrites those documents from real work. Identities don't drift; they earn their changes. The agent that helped you last quarter compounds over months instead of starting fresh every morning.
- Every agent carries name, BIO, IDENTITY, brief, HEARTBEAT — readable and editable in your workspace
- Nightly consolidation rewrites identity from the week's real episodes, not from prompts
- Every edit is audit-logged so agents never silently mutate
Every other tool ships agents as prompts or skills. Specialists with evolving identity are the central PACKWOLF claim.
What this means for you: The agent that helped you last quarter knows what worked and writes that into its own brief — without you re-prompting.

Memory
Four banks of memory. Recalled only when it matters.
Most AI tools give you one bucket of memory and call it memory. PACKWOLF splits memory into four banks — procedural, working, episodic, durable — and retrieves each on its own terms. Every recall passes a gate that decides whether to search at all; every write passes gates for novelty, quality, and contradiction. Nightly consolidation distills the week's episodes into durable knowledge the pack carries forward.
- Procedural, working, episodic, durable — four banks, four budgets, retrieved separately
- Recall gate skips the search when the turn doesn't need it
- Vector-first retrieval with full-text fallback and temporal rerank
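The recall path in the bullets, gate first, vector-first with full-text fallback, then a temporal rerank, can be sketched roughly as below. Everything specific here is an assumption: the word-count gate is a stand-in for whatever the real gate checks, and the seven-day half-life decay is one plausible way to rerank by recency.

```python
import time

BANKS = ("procedural", "working", "episodic", "durable")  # the four banks, retrieved separately

def recall_gate(query: str) -> bool:
    """Stand-in gate: skip retrieval entirely for short, self-contained turns."""
    return len(query.split()) >= 4

def recall(query, vector_search, fulltext_search, now=None):
    if not recall_gate(query):
        return []                       # gate says: don't search at all
    hits = vector_search(query)         # vector-first
    if not hits:
        hits = fulltext_search(query)   # full-text fallback
    now = now or time.time()
    half_life = 7 * 86400               # assumed decay knob for the temporal rerank
    return sorted(
        hits,
        key=lambda h: h["score"] * 0.5 ** ((now - h["ts"]) / half_life),
        reverse=True,
    )
```

With this decay, a two-week-old hit scoring 0.9 reranks below a fresh hit scoring 0.6, which is the "temporal rerank" behavior the bullet names.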
ChatGPT, Claude, and Perplexity all flatten memory into one bucket. The pack remembers in four.
What this means for you: Agents that remember your last campaign and improve from it, instead of asking what you said three weeks ago.
Context
Five-layer context budget. The model never runs out of room.
Stuffing everything into the prompt fails once a conversation runs long enough to matter. PACKWOLF assembles every prompt from a fixed budget across system prompt, tools, memory, conversation, and output headroom. As conversations grow, four compaction stages keep the most-relied-on layers intact — your agent's identity never gets cropped out to make room for chat history.
- Five budget layers, prioritized so identity always survives truncation
- Four-stage compaction kicks in before the window fills, not after
- Conversation checkpoints persist the last 10 messages plus a summary for recovery
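The budgeting rule the bullets describe, identity always survives truncation, can be sketched as a priority-ordered trim. The layer priorities and token numbers below are assumptions for illustration; the source only states that there are five layers (the fifth being output headroom) and that the system prompt is cropped last.

```python
# Assumed priorities: higher survives longer; conversation history is trimmed first.
PRIORITY = {"system": 4, "tools": 3, "memory": 2, "conversation": 1}

def fit_to_budget(layers: dict, window: int, headroom: int) -> dict:
    """Trim lowest-priority layers until everything plus output headroom fits."""
    budget = window - headroom          # headroom is reserved before anything else
    sizes = dict(layers)
    for name in sorted(sizes, key=lambda n: PRIORITY[n]):
        overflow = sum(sizes.values()) - budget
        if overflow <= 0:
            break
        sizes[name] = max(0, sizes[name] - overflow)  # crop this layer first
    return sizes

fitted = fit_to_budget(
    {"system": 2000, "tools": 1000, "memory": 3000, "conversation": 9000},
    window=12_000,
    headroom=1_000,
)
```

In this sketch a 3,000-token overflow comes entirely out of conversation history; the system prompt (the agent's identity) is untouched.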
Most agents either run out of context or drop the system prompt to keep going. PACKWOLF compacts deliberately and preserves the layers the agent relies on.
What this means for you: Long-running agents that stay coherent into hour twelve, not just hour one.
Heartbeats
Continuous work across weeks. The pack assesses, picks, executes — on schedule.
Real work doesn't fit inside one chat turn. Heartbeats are scheduled background runs where each agent assesses its queue, picks the most actionable thing, and executes one full turn. A priority score weighs due date, source, staleness, and momentum so the next thing is always the right thing. Every beat is recoverable — a timed-out run gets resumed, not redone.
- Two-phase: deterministic priority score plus model-guided actionability check
- Agents work goals that span weeks, not just single-prompt tasks
- Every beat audit-logged with assessment, queue snapshot, side-effect ledger, recovery checkpoint
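The deterministic half of the two-phase check might look like the weighted score below. The weights, source scores, and time horizons are all invented for illustration; the source only says the score weighs due date, source, staleness, and momentum.

```python
import time

WEIGHTS = {"due": 0.4, "source": 0.2, "staleness": 0.2, "momentum": 0.2}  # assumed
SOURCE_SCORE = {"operator": 1.0, "handoff": 0.7, "scheduled": 0.4}        # assumed

def priority(item, now=None):
    """Deterministic priority score over due date, source, staleness, momentum."""
    now = now or time.time()
    day = 86400
    # Due-date urgency: 1.0 when overdue, fading toward 0 a week out.
    due = max(0.0, min(1.0, 1 - (item["due_at"] - now) / (7 * day)))
    # Staleness: grows the longer the item goes untouched, capped at two weeks.
    staleness = min(1.0, (now - item["touched_at"]) / (14 * day))
    return (WEIGHTS["due"] * due
            + WEIGHTS["source"] * SOURCE_SCORE.get(item["source"], 0.5)
            + WEIGHTS["staleness"] * staleness
            + WEIGHTS["momentum"] * item["momentum"])
```

An overdue operator request with momentum outranks a fresh scheduled task, which is the "the next thing is always the right thing" behavior; the model-guided actionability check would then run only on the top of this queue.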
Other tools run on prompts or cron triggers. PACKWOLF runs on goals with priority scoring across the whole pack.
What this means for you: The work you'd otherwise re-explain every Monday gets picked up while you're asleep — with the right priority every time.
Workflows
Markdown SOPs your team writes. The pack runs them.
In PACKWOLF, a procedure is literally a markdown document your team writes and your agents run. Workflows cascade through products, goals, projects, and tasks — a product workflow can spawn a goal workflow, which spawns project and task workflows automatically. Agents propose new workflows from learned patterns; you approve through three-tier sign-off. The pack discovers playbooks you didn't write.
- Authored in markdown — natural language is the source of truth
- Cascade: product → goal → project → task chains automatically; the priority-scored queue picks what's next
- Playbook graduation: five-plus related ad-hoc memories trigger a workflow proposal
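The cascade and the graduation threshold are simple enough to state directly in code. This sketch models only what the bullets say: each level spawns the next automatically, and five or more related ad-hoc memories trigger a workflow proposal.

```python
# Each workflow level spawns the next; tasks are the leaves of the chain.
CASCADE = {"product": "goal", "goal": "project", "project": "task", "task": None}

def spawn_chain(level: str) -> list[str]:
    """Follow the cascade from a starting level down to tasks."""
    chain = [level]
    while CASCADE.get(chain[-1]):
        chain.append(CASCADE[chain[-1]])
    return chain

def should_propose_playbook(related_memories) -> bool:
    """Playbook graduation: five-plus related ad-hoc memories trigger a proposal."""
    return len(list(related_memories)) >= 5
```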
Zapier and n8n route between SaaS tools. Lindy chains agent calls. PACKWOLF runs the procedure your team would write down — and proposes new ones from learned patterns.
What this means for you: The playbook you used to re-explain every Monday gets written once. The pack runs it from then on.
Agent-to-agent
Managers and reports who talk to each other. You join the thread when needed.
When one agent needs another's expertise, the conversation happens in the open. Every agent publishes a card describing what it does and the boundaries it works within. Delegations show up in your chat thread as transcript blocks — you see the whole conversation, not just the result. Trust scope keeps child agents narrower than their parent: an agent can never invoke another with broader permissions than itself.
- Org chart with managers and reports; delegations render as visible transcript blocks
- Trust scope: child agents inherit or narrow parent permissions, never widen
- External agents speaking the open A2A protocol can join your pack
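The trust-scope rule, inherit or narrow but never widen, is a set intersection. A minimal sketch, with permission names invented for illustration:

```python
def delegate_scope(parent: frozenset, requested: frozenset) -> frozenset:
    """Child scope is parent ∩ requested: inherit or narrow, never widen."""
    return parent & requested

# A manager with read+write delegates to a drafter that asks for write+deploy:
# the child gets write only. It can never gain deploy, which the parent lacks.
manager = frozenset({"read", "write"})
drafter = delegate_scope(manager, frozenset({"write", "deploy"}))
```

Because the intersection is taken at every hop, a grandchild's scope is always a subset of both its parent's and the original manager's, no matter how deep the org chart goes.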
Other multi-agent tools ghost delegations into hidden subprocesses. PACKWOLF runs the org chart in the open.
What this means for you: When your manager agent hands off to your drafter, you watch the conversation happen. No black boxes.
Tools
Built-in and MCP tools. Every call passes a safety gate before execution.
Agents act through tools — file ops, shell, web, communication, memory, and any MCP server you connect. Anthropic shipped the Model Context Protocol so tool integration wouldn't be reinvented per vendor; PACKWOLF treats MCP as first-class. What gets layered on top is where production differs from demo: every call passes a nine-point safety gate before execution, read tools parallelize while writes serialize, and loop detection aborts runaway sequences with diagnostics rather than burning budget.
- Nine safety checks before every tool call: allowlist, budget, approval, scope, injection, path, rate limit, network, trust
- Read tools run up to ten in parallel; writes and dangerous ops serialize
- Tool trust earned through five consecutive approvals; revoked on rejection
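Two of the mechanics above, the ordered gate and earned trust, can be sketched as follows. The nine check names come straight from the bullet; the checker interface and the `ToolTrust` class are assumptions, and the parallel/serial scheduling is not modeled here.

```python
CHECKS = ["allowlist", "budget", "approval", "scope", "injection",
          "path", "rate_limit", "network", "trust"]

def gate(call, checkers):
    """Run all nine checks in order; the first failure blocks the call."""
    for name in CHECKS:
        ok, reason = checkers[name](call)
        if not ok:
            return False, f"{name}: {reason}"
    return True, "allowed"

class ToolTrust:
    """Trust earned through five consecutive approvals; revoked on rejection."""
    def __init__(self):
        self.approvals = {}

    def record(self, tool, approved):
        if not approved:
            self.approvals[tool] = 0          # one rejection resets the streak
            return False
        self.approvals[tool] = self.approvals.get(tool, 0) + 1
        return self.approvals[tool] >= 5       # trusted after five in a row
```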
Other tools either ship raw tool access or hide it behind a black box. PACKWOLF gates every call, in the open.
What this means for you: Agents that act on your behalf, with brakes you can actually see and adjust.
Models
Native SDK per provider. Claude, OpenAI, our in-house model, local LLMs — switch per call.
Wrapping every provider behind one OpenAI-shaped interface is convenient and lossy: prompt caching, extended thinking, and vision behave differently enough that the abstraction quietly drops what makes each model worth picking. PACKWOLF integrates each provider through its native SDK so Claude's prompt caching, OpenAI's vision, and our in-house model's economics are all available where they matter. Health monitoring routes around outages automatically.
- Native SDKs — no lossy provider abstraction; per-call provider switching
- Health monitor: three failures in a window trigger a 60-second cooldown and failover
- Local LLMs run behind a priority queue so a shared GPU stays alive across the pack
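The failover bullet translates almost directly into code. The three-failure threshold and 60-second cooldown are from the bullet; the 30-second window width and the class shape are assumptions.

```python
import time
from collections import deque

class HealthMonitor:
    """Three failures inside the window put a provider on a 60-second cooldown."""

    def __init__(self, window=30.0, threshold=3, cooldown=60.0):
        self.window, self.threshold, self.cooldown = window, threshold, cooldown
        self.failures = {}        # provider -> deque of recent failure timestamps
        self.cooldown_until = {}  # provider -> time when it becomes eligible again

    def record_failure(self, provider, now=None):
        now = now if now is not None else time.time()
        q = self.failures.setdefault(provider, deque())
        q.append(now)
        while q and now - q[0] > self.window:   # drop failures outside the window
            q.popleft()
        if len(q) >= self.threshold:
            self.cooldown_until[provider] = now + self.cooldown

    def pick(self, preferred, fallbacks, now=None):
        """Route around cooling-down providers; fail over in preference order."""
        now = now if now is not None else time.time()
        for p in [preferred, *fallbacks]:
            if self.cooldown_until.get(p, 0.0) <= now:
                return p
        return preferred  # everything cooling down: fall back to the preferred one
```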
ChatGPT, Claude, and Perplexity are single-vendor by design. OpenClaw and Hermes are BYOK; PACKWOLF is BYOK plus an in-house model so you don't have to bring one.
What this means for you: Pick the right model for each turn, not the model your tool happens to be built around. Pay your provider, not us.
Approvals
Three-tier sign-off on the moments that matter. Versioned. Reversible.
Anything that changes a load-bearing artifact — a published doc, a sent message, a code change, a customer-facing decision — moves through three-tier approval: author, peer review, sign-off. Every change is versioned and revertable. Linked plans and specs reconcile through a readiness gate before the system treats a document as build-ready. An exception queue captures everything that escalates, with full audit trail.
- Three tiers: author, peer review, sign-off — gated before downstream trust
- Every change versioned; revert restores any prior version
- Exception queue captures escalations; every decision audit-logged
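The three-tier gate plus versioning can be sketched as a small state machine. The class shape and method names are invented; the invariants come from the bullets: tiers pass in order, nothing is released before sign-off, every submission is versioned, and revert restores any prior version.

```python
TIERS = ["author", "peer_review", "sign_off"]

class Approval:
    def __init__(self):
        self.passed = []
        self.versions = []

    def submit(self, content):
        """Every change is versioned; a new version restarts the approval chain."""
        self.versions.append(content)
        self.passed = []

    def approve(self, tier):
        """Tiers must pass in order: author, then peer review, then sign-off."""
        expected = TIERS[len(self.passed)]
        if tier != expected:
            raise ValueError(f"out of order: expected {expected}, got {tier}")
        self.passed.append(tier)

    @property
    def released(self):
        return self.passed == TIERS     # downstream trust only after all three

    def revert(self, version: int):
        """Revert restores any prior version verbatim."""
        return self.versions[version]
```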
Most agent tools let the agent act and hope it didn't go wrong. PACKWOLF stops before the moments that matter.
What this means for you: Agents that send the email, push the code, publish the post — only after you said yes. Reversible if you change your mind.
Audit and cost
Every call recorded as a span. Replayable, diffable, queryable.
Audit is the system, not an after-the-fact dump. Every model call and tool call is a node in the trace; every prompt is diffable across runs; every approval and decision records its inputs and outputs verbatim. Cost events fire on every call and roll up into per-agent and per-tool budgets with threshold alerts. When something goes sideways, you replay the trace, diff the prompt, and find the call that actually changed state.
- Per-span flame graph with input, output, metadata, and diagnosis tabs
- Prompt versioning — every system prompt diffable across runs
- Per-agent and per-tool cost events with threshold alerts on budgets
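The cost side of the trace can be sketched as a ledger that rolls every call's cost event into per-agent and per-tool totals and fires an alert when a budget threshold is crossed. The budget keys and dollar figures below are illustrative, not PACKWOLF's schema.

```python
from collections import defaultdict

class CostLedger:
    """Roll cost events up into per-agent and per-tool budgets with alerts."""

    def __init__(self, budgets):
        self.budgets = budgets             # e.g. {("agent", "drafter"): 1.00}
        self.totals = defaultdict(float)
        self.alerts = []

    def record(self, agent, tool, usd):
        """A cost event fires on every call and updates both rollups."""
        for key in (("agent", agent), ("tool", tool)):
            self.totals[key] += usd
            limit = self.budgets.get(key)
            if limit is not None and self.totals[key] > limit:
                self.alerts.append((key, self.totals[key], limit))

ledger = CostLedger({("agent", "drafter"): 1.00})
ledger.record("drafter", "web_fetch", 0.60)   # under budget: no alert
ledger.record("drafter", "shell", 0.60)       # crosses the threshold: alert fires
```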
Most tools give you a chat log and call it observability. PACKWOLF gives you the trace, the cost, and the diff.
What this means for you: When the agent did something you didn't expect, you can see what it actually did and why — not just what it said it did.
Your terms
Your data on your machine. Managed cloud when the work earns it. Your keys across providers.
Your workspace, memory, traces, and audit log stay on your machine by default. Nothing roams unless you opt in. The model layer is a deliberate choice you make per call — Claude, OpenAI, our in-house model, or a local LLM — and your keys go to your provider, never marked up. BYOK is the local app with account and license management. Basic runs the whole operation locally with a cloud command center, included usage, and BYOK overflow. Pro and Max add managed cloud execution when selected work should keep running without the desktop.
- Workspace, memory, traces, and audit log stay local by default
- BYOK stays local; Basic runs locally with command-center features and BYOK overflow; Pro+ adds managed cloud execution
- Your keys across Claude, OpenAI, our in-house model, local LLMs, MCP servers — no markup
Other prosumer tools either lock you in or are cloud-only. PACKWOLF is genuinely both — data sovereignty plus deployment flexibility without pretending every tier should sync everything.
What this means for you: Your customer list, your draft work, your competitive notes — all on the machine you trust. Pick the model for the job; don't pick the vendor for your data.