Architecture

Inside the workbench.

Named teammates with BIO, IDENTITY, brief, and HEARTBEAT - running on memory that compounds, workflows your team writes, and a safety scaffold that holds. Three things change for you: the pack knows your work, the pack does the work, and you stay in control. Below is how each one is built.

Runtime path

Anatomy of a turn.

This is how one turn moves through the workbench. Eight stages, same order every time. Follow the path; drop into a subsystem when you want to see how that stage actually works.

A turn starts when something triggers an agent - an operator's chat, a heartbeat firing, a handoff from another agent, or a scheduled run. From that moment until an audit row is written, the runtime moves through the stages on the right. Each stage is independently recorded, independently replayable, and owned by exactly one subsystem in the chapters below.

  • Eight stages, fixed order; the trace is causal, not wall-clock
  • Skipped stages (a no-op recall, an unmatched workflow) still record an audit row
  • Each stage failure is classified at the stage that owned it, not at the call site
  • Click any stage on the right to drop into the subsystem that owns it
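The contract above (fixed order, an audit row for every stage even when skipped, failures classified where they happened) can be sketched in a few lines. The stage names here are illustrative stand-ins, not the runtime's actual identifiers:

```python
from dataclasses import dataclass

# Hypothetical stage names -- the real runtime defines its own eight.
STAGES = ["trigger", "recall", "assemble", "generate",
          "act", "review", "approve", "audit"]

@dataclass
class AuditRow:
    stage: str
    status: str  # "ok", "skipped", or "failed"

def run_turn(handlers):
    """Run every stage in fixed order. A stage with no work still writes
    an audit row, and a failure is classified at the stage that owned it."""
    trace = []
    for stage in STAGES:
        handler = handlers.get(stage)
        if handler is None:
            trace.append(AuditRow(stage, "skipped"))  # no-op still audited
            continue
        try:
            handler()
            trace.append(AuditRow(stage, "ok"))
        except Exception:
            trace.append(AuditRow(stage, "failed"))  # owned here, not at call site
    return trace

trace = run_turn({"trigger": lambda: None, "generate": lambda: None})
```

The point of the shape: the trace always has exactly one row per stage, so replay and diffing work even on turns that did almost nothing.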
01
Layer · Knows you

The pack remembers your work.

Identity evolves from real episodes. Memory splits into four banks. Context preserves the layers the agent relies on. Together: agents that compound across months instead of starting fresh every morning.

Identity evolution · Memory · Context

Identity evolution

Agents that update their own brief from what they actually did.

Every agent in the pack carries a name, BIO, IDENTITY, brief, and HEARTBEAT — markdown documents that say who they are, what they care about, and how they work. After each shift, nightly consolidation reads the week's episodes and rewrites those documents from real work. Identities don't drift; they earn their changes. The agent that helped you last quarter compounds over months instead of starting fresh every morning.

  • Every agent carries name, BIO, IDENTITY, brief, HEARTBEAT — readable and editable in your workspace
  • Nightly consolidation rewrites identity from the week's real episodes, not from prompts
  • Every edit is audit-logged so agents never silently mutate
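A rough sketch of that consolidation loop, hedged: the function, field names, and version handling below are illustrative, not PACKWOLF's actual implementation.

```python
def consolidate(identity_lines, version, episodes, audit_log):
    """Nightly consolidation sketch: append lessons earned from real
    episodes, bump the version, and log every edit so the identity
    never silently mutates. Field names are invented for illustration."""
    new_lines = list(identity_lines)
    for ep in episodes:
        lesson = ep.get("lesson")
        if lesson and lesson not in new_lines:  # novel, earned changes only
            new_lines.append(lesson)
            audit_log.append({"version": version + 1, "added": lesson})
    return new_lines, version + 1

log = []
identity, v = consolidate(
    ["Long-form writer. Editorial voice."], 12,
    [{"lesson": "Lead Monday briefs with the takeaway."}], log)
```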
vs the rest

Every other tool ships agents as prompts or skills. Specialists with evolving identity are the central PACKWOLF claim.

What this means for you: The agent that helped you last quarter knows what worked and writes that into its own brief — without you re-prompting.

Full spec for identity evolution · Source: docs/AGENT_MEMORY.md
agents / drafter / IDENTITY.md
v12 → v13
Last week
# Drafter
· Long-form writer. Editorial voice.
· Default to short sentences.
· Cite sources inline, never as a footer dump.
· If unsure, ask. Don't pad.
Nightly consolidation
After consolidation
# Drafter
· Long-form writer. Editorial voice.
· Default to short sentences.
· Cite sources inline, never as a footer dump.
· If unsure, ask. Don't pad.
· Lead Monday briefs with the takeaway - audience reads on mobile.
· Run mid-draft past Voice Critic before sending to Editor in Chief.
· Avoid the word "unlock." The Editor cuts it every time.
Earned from
Episodic · this week · 8 published briefs with measurable CTR delta on mobile-first leads
Approvals · this week · Voice Critic catches dropped from 4 → 0 after pre-review handoff
Audit · this week · Editor in Chief revisions removed "unlock" 6 / 6 times

Memory

Four banks of memory. Recalled only when it matters.

Most AI tools give you one bucket of memory and call it memory. PACKWOLF splits memory into four banks — procedural, working, episodic, durable — and retrieves each on its own terms. Every recall passes a gate that decides whether to search at all; every write passes gates for novelty, quality, and contradiction. Nightly consolidation distills the week's episodes into durable knowledge the pack carries forward.

  • Procedural, working, episodic, durable — four banks, four budgets, retrieved separately
  • Recall gate skips the search when the turn doesn't need it
  • Vector-first retrieval with full-text fallback and temporal rerank
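A minimal sketch of the gate-then-retrieve shape, assuming toy per-bank budgets and a stand-in gate; the real gate and scoring are model- and index-driven:

```python
# Toy per-bank hit budgets; the real banks carry their own budgets and indexes.
BANK_BUDGETS = {"procedural": 2, "working": 5, "episodic": 3, "durable": 2}

def recall(query, banks, gate):
    """The recall gate decides whether to search at all; each bank is
    then retrieved on its own terms, under its own budget."""
    if not gate(query):  # e.g. small talk: skip the search entirely
        return {}
    hits = {}
    for name, entries in banks.items():
        ranked = sorted(entries, key=lambda e: e["score"], reverse=True)
        hits[name] = ranked[:BANK_BUDGETS[name]]
    return hits

substantive = lambda q: len(q.split()) > 3  # toy stand-in for the real gate
banks = {
    "procedural": [{"text": "brief structure", "score": 0.91}],
    "working": [],
    "episodic": [{"text": "April refresh CTR +22%", "score": 0.84}],
    "durable": [{"text": "mobile-first leads", "score": 0.66}],
}
hits = recall("What worked in the last April refresh?", banks, substantive)
```

The design choice the sketch makes visible: a turn that doesn't need memory pays zero retrieval cost, and no single bank can crowd out the others.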
vs the rest

ChatGPT, Claude, and Perplexity all flatten memory into one bucket. The pack remembers in four.

What this means for you: Agents that remember your last campaign and improve from it, instead of asking what you said three weeks ago.

packwolf.app · Memory
Live demo
Recall · vector + FTS5 + temporal rerank · Gate: substantive · 4 hits across 3 layers
Query: “What worked the last time we ran an April content refresh?”
Procedural · How the agent operates · 12 / 64
Brief structure: angle, sources, target keyword, voice notes, do-not-say list.
Recall hit · BRAND_VOICE.md · v4 · 0.91
Headlines avoid em dashes and superlatives.
Recall hit · BRAND_VOICE.md · v4 · 0.78
Working · This turn's recent context · 5 messages
Operator asked: "Which Q1 piece should we revive for the April refresh?"
Turn 1 · just now
Episodic · Recent events worth remembering · 183 events
Last April refresh used the "buyer journey" series; CTR up 22%.
Recall hit · Editor in Chief · 2 weeks ago · 0.84
Drafts agent flagged sentence-length drift on the March recap.
Voice Critic · 6 days ago · 0.41
Durable · Compacted, long-term knowledge · 1,612 chunks
Audience reads on mobile, ~40 sec average dwell. Lead with the takeaway.
Recall hit · audience-insights · daily · 0.66

Context

Five-layer context budget. The model never runs out of room.

Stuffing everything into the prompt fails once a conversation runs long enough to matter. PACKWOLF assembles every prompt from a fixed budget across system prompt, tools, memory, conversation, and output headroom. As conversations grow, four compaction stages keep the most-relied-on layers intact — your agent's identity never gets cropped out to make room for chat history.

  • Five budget layers, prioritized so identity always survives truncation
  • Four-stage compaction kicks in before the window fills, not after
  • Conversation checkpoints persist last 10 messages plus summary for recovery
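The survival-order idea can be illustrated with the budget numbers from the diagram. The threshold math and two-layer cut order below are a simplification of the four real compaction stages:

```python
# Cut order is a simplification: conversation shrinks first, then memory
# recall (it is re-fetchable); the system prompt is never touched.
CUT_ORDER = ["conversation", "memory"]

def fit(used, window=200_000, pressure=0.60):
    """Once usage crosses the pressure threshold, shrink layers in cut
    order until the prompt fits back under it."""
    used = dict(used)
    target = int(window * pressure)
    if sum(used.values()) <= target:
        return used
    for layer in CUT_ORDER:
        excess = sum(used.values()) - target
        if excess <= 0:
            break
        used[layer] = max(0, used[layer] - excess)
    return used

trimmed = fit({"system": 10_000, "tools": 20_000, "headroom": 24_000,
               "memory": 16_000, "conversation": 128_000})
```

Here a 198K prompt is pulled back to the 120K threshold entirely out of conversation history; identity, tools, and headroom come through untouched.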
vs the rest

Most agents either run out of context or drop the system prompt to keep going. PACKWOLF compacts deliberately and preserves the layers the agent relies on.

What this means for you: Long-running agents that stay coherent into hour twelve, not just hour one.

Context budget · per call
~200K tokens
1 · System prompt · ~10K
2 · Tool schemas · ~20K
3 · Output headroom · ~24K
4 · Memory recall · ~16K
5 · Conversation · ~130K
Survival order, when budget tightens
  1. System prompt · Identity, never truncated
  2. Tool schemas · Required for action
  3. Output headroom · Reserved for response
  4. Memory recall · Re-fetchable per turn
  5. Conversation · Compacted first under pressure

When pressure exceeds 60%, four stages fire in order
Stage 01 · Flush extraction · At 60%
Stage 02 · Microcompact · Tool results
Stage 03 · LLM summarize · History
Stage 04 · Summary + last · Last resort
Seen enough?

If a pack that knows your work is what you've been looking for, request beta now. The rest of the page is evidence.

Request beta access
02
Layer · Works while you sleep

The pack does the work.

Heartbeats run on goals, not just prompts. Workflows are markdown your team writes. Agents delegate in the open. Models, tools, and the safety scaffold are first-class - not abstractions that hide what each one is good at.

Heartbeats · Workflows · Agent-to-agent · Tools · Models

Heartbeats

Continuous work across weeks. The pack assesses, picks, executes — on schedule.

Real work doesn't fit inside one chat turn. Heartbeats are scheduled background runs where each agent assesses its queue, picks the most actionable thing, and executes one full turn. A priority score weighs due date, source, staleness, and momentum so the next thing is always the right thing. Every beat is recoverable — a timed-out run gets resumed, not redone.

  • Two-phase: deterministic priority score plus model-guided actionability check
  • Agents work goals that span weeks, not just single-prompt tasks
  • Every beat audit-logged with assessment, queue snapshot, side-effect ledger, recovery checkpoint
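A sketch of the deterministic first phase, with made-up weights and an assumed actionability floor; the real score and the model-guided second phase are PACKWOLF's own:

```python
# Illustrative weights and floor -- not the production scoring function.
WEIGHTS = {"due": 0.4, "source": 0.2, "staleness": 0.2, "momentum": 0.2}
FLOOR = 0.5  # below this, the beat logs a no-op span instead of executing

def priority(task):
    """Phase one of a beat: a deterministic score over the queue."""
    return sum(WEIGHTS[k] * task[k] for k in WEIGHTS)

def pick(queue):
    """Take the highest-priority task, or skip if nothing clears the floor."""
    best = max(queue, key=priority, default=None)
    if best is None or priority(best) < FLOOR:
        return None
    return best

queue = [
    {"name": "approve brief", "due": 0.9, "source": 0.8, "staleness": 0.7, "momentum": 0.9},
    {"name": "tidy backlog", "due": 0.2, "source": 0.3, "staleness": 0.4, "momentum": 0.1},
]
```

This is why the 09:00 beat in the demo below logs a skip: a floor on the score means an idle agent records a no-op rather than inventing work.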
vs the rest

Other tools run on prompts or cron triggers. PACKWOLF runs on goals with priority scoring across the whole pack.

What this means for you: The work you'd otherwise re-explain every Monday gets picked up while you're asleep — with the right priority every time.

packwolf.app · Heartbeats
Live demo
Schedule · every 30 min, business hours · 6 beats today · 4 executed · 1 recovered · 1 skipped
08:00
Approve Mon brief: SaaS retention · Editor in Chief
Approved with 2 voice notes; sent to Drafter.
0.86 · Executed
08:30
Long-form: Q1 retention recap · Drafter
Wrote 1,420 words. Brand Voice flagged 1 line; resolved.
0.74 · Executed
09:00
(no actionable work above floor) · Researcher
No actionable task above the floor. Logged a no-op span.
Skipped
09:30
Schedule social pack for Wed piece · Distribution
Queued 4 channels, deferred LinkedIn for editor sign-off.
0.69 · Executed
10:00
Pull weekly traffic vs. brief targets · Analyst
Timed out at 9.5 min; resumed from checkpoint, finished cleanly.
0.71 · Recovered
10:30
Triage publish queue · Editor in Chief
Promoted 2 pieces, sent 1 back with notes.
0.78 · Executed

Workflows

Markdown SOPs your team writes. The pack runs them.

PACKWOLF makes procedure literally a markdown document your team writes and your agents run. Workflows cascade through products, goals, projects, and tasks — a product workflow can spawn a goal workflow, which spawns project and task workflows automatically. Agents propose new workflows from learned patterns; you approve through three-tier sign-off. The pack discovers playbooks you didn't write.

  • Authored in markdown — natural language is the source of truth
  • Cascade: product → goal → project → task chains automatically; the priority-scored queue picks what's next
  • Playbook graduation: five-plus related ad-hoc memories trigger a workflow proposal
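How label matching and the cascade might look in miniature. The file names echo the demo below; the matching and spawning logic is an assumption, not the actual engine:

```python
def matches(workflow, product, available_tools):
    """A workflow binds when its match label and required tools line up."""
    return (workflow["match_label"] in product["labels"]
            and set(workflow["required_tools"]) <= set(available_tools))

def cascade(workflow, registry):
    """product -> goal -> project -> task: each level names the next."""
    chain, current = [], workflow
    while current:
        chain.append(current["name"])
        current = registry.get(current.get("spawns"))
    return chain

registry = {
    "campaign-cycle.md": {"name": "campaign-cycle.md", "spawns": "longform-piece.md"},
    "longform-piece.md": {"name": "longform-piece.md", "spawns": "brief-authoring.md"},
    "brief-authoring.md": {"name": "brief-authoring.md"},
}
wcc = {"name": "weekly-content-cycle.md", "match_label": "marketing",
       "required_tools": ["web_search", "file_write"],
       "spawns": "campaign-cycle.md"}
chain = cascade(wcc, registry)
```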
vs the rest

Zapier and n8n route between SaaS tools. Lindy chains agent calls. PACKWOLF runs the procedure your team would write down — and proposes new ones from learned patterns.

What this means for you: The playbook you used to re-explain every Monday gets written once. The pack runs it from then on.

packwolf.app · Workflows
Live demo
weekly-content-cycle.md · v7 · edited 2d ago
# Weekly content cycle

Match: products labelled "marketing"
Required tools: web_search, file_write
Required skills: deep-research

## Step 1 — Editorial planning (Mon)
Editor in Chief reviews last week's analytics
and proposes 3-5 pieces.

## Step 2 — Brief authoring (Mon)
SEO + Brand Voice author one brief per piece.
Briefs cite at least three sources.

## Step 3 — Draft (Tue–Wed)
Drafts agent writes long-form. Brand Voice
flags drift before review.

## Step 4 — Review (Thu)
Editor in Chief approves or sends back.

Approvals: Editor signs off before publish.
Cascade · product → task · 3 of 4 active
Marketing · Product
weekly-content-cycle.md
April refresh push · Goal
campaign-cycle.md
Buyer-journey series v2 · Project
longform-piece.md
Brief: B2B SaaS retention · Task
brief-authoring.md

Agent-to-agent

Managers and reports who talk to each other. You join the thread when needed.

When one agent needs another's expertise, the conversation happens in the open. Every agent publishes a card describing what it does and the boundaries it works within. Delegations show up in your chat thread as transcript blocks — you see the whole conversation, not just the result. Trust scope keeps child agents narrower than their parent: an agent can never invoke another with broader permissions than itself.

  • Org chart with managers and reports; delegations render as visible transcript blocks
  • Trust scope: child agents inherit or narrow parent permissions, never widen
  • External agents speaking the open A2A protocol can join your pack
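The scope-reduce rule is essentially set intersection; a sketch, with made-up permission names:

```python
def narrow(parent_scope, requested):
    """Scope-reduce only: the child gets the intersection of what it
    asked for and what the parent itself holds, so a delegation can
    narrow permissions but never widen them."""
    return set(parent_scope) & set(requested)

# Permission names are invented for illustration.
child = narrow({"read", "web_fetch"}, {"read", "shell"})  # "shell" is refused
```

Because the intersection is taken at every hop, a chain of delegations can only get narrower, never wider, no matter how deep the org chart goes.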
vs the rest

Other multi-agent tools ghost delegations into hidden subprocesses. PACKWOLF runs the org chart in the open.

What this means for you: When your manager agent hands off to your drafter, you watch the conversation happen. No black boxes.

packwolf.app · Agent-to-agent
Live demo
Pack hierarchy · A2A spec compatible · 1 active delegation
TIER 1 · Orchestrator (OR)
TIER 2 · Engineering Lead (EL) · Marketing Lead (ML) · Editor (ED)
TIER 3 · Tech Lead (TL) · Implementer (IM) · QA (QA) · Drafter (DR) · Voice Critic (VC)
A2A transcript · turn-7c84 · 3 nested spans · Trust scope · scope-reduce only
Orchestrator · 09:14 · Q1 planning sprint kicks off Monday. Pull a tech scope from your team for the migrate-auth track.
Engineering Lead · 09:14 · Need a rough estimate on migrate-auth - endpoints, dependencies, blast radius. End of day Wednesday is fine.
Tech Lead · 09:15 · On it. Spiking now, will paste numbers in this thread within 30 min.

Tools

Built-in and MCP tools. Every call passes a safety gate before execution.

Agents act through tools — file ops, shell, web, communication, memory, and any MCP server you connect. Anthropic shipped the Model Context Protocol so tool integration wouldn't be reinvented per vendor; PACKWOLF treats MCP as first-class. What gets layered on top is where production differs from demo: every call passes a nine-point safety gate before execution, read tools parallelize while writes serialize, and loop detection aborts runaway sequences with diagnostics rather than burning budget.

  • Nine safety checks before every tool call: allowlist, budget, approval, scope, injection, path, rate limit, network, trust
  • Read tools run ten in parallel; writes and dangerous ops serialize
  • Tool trust earned through five consecutive approvals; revoked on rejection
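The nine-check gate is, structurally, a short-circuiting chain; a sketch with two toy checkers wired in (the real checks are far richer):

```python
# The nine gates, in order. Unwired checks default to passing here,
# purely so the sketch stays short.
CHECKS = ["allowlist", "budget", "approval", "scope", "injection",
          "path", "rate_limit", "network", "trust"]

def gate(call, checkers):
    """Run all nine checks before execution. The first failure blocks
    the call and names the check that owned it."""
    for name in CHECKS:
        ok = checkers.get(name, lambda c: True)(call)
        if not ok:
            return (False, name)
    return (True, None)

checkers = {
    "allowlist": lambda c: c["tool"] in c["allowed"],
    "budget": lambda c: c["cost"] <= c["turn_cap"],
}
verdict = gate({"tool": "web_search", "cost": 0.04, "turn_cap": 5.00,
                "allowed": {"web_search", "file_write"}}, checkers)
```

Naming the failing check matters for the audit trail: a blocked call records which gate stopped it, not just that it was stopped.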
vs the rest

Other tools either ship raw tool access or hide it behind a black box. PACKWOLF gates every call, in the open.

What this means for you: Agents that act on your behalf, with brakes you can actually see and adjust.

packwolf.app · Tool call
Live demo
Tool call · web_search
researcher.web_search({
  query: "Q1 SaaS retention benchmarks 2025",
  freshness: "month",
  limit: 8
})
Safety gate · 9 checks · All passed in 8ms
Allowlist
web_search · researcher pack
Budget
$0.04 used / $5.00 turn cap
Approval
Read-tier · auto-approved
Scope
researcher · /workspace only
Injection
No suspicious markers in args
Path
n/a · network tool
Rate limit
3 of 10 in this window
SSRF
Resolved IP not internal
Trust
12 consecutive approvals · earned
Result · 8 sources · 1.2s · cached for 24h
Returned to Researcher. Span recorded with input + output. Loop detector clear. Trust state increments to 13.

Models

Native SDK per provider. Claude, OpenAI, our in-house model, local LLMs — switch per call.

Wrapping every provider behind one OpenAI-shaped interface is convenient and lossy: prompt caching, extended thinking, and vision behave differently enough that the abstraction quietly drops what makes each model worth picking. PACKWOLF integrates each provider through its native SDK so Claude's prompt caching, OpenAI's vision, and our in-house model's economics are all available where they matter. Health monitoring routes around outages automatically.

  • Native SDKs — no lossy provider abstraction; per-call provider switching
  • Health monitor: three failures in a window triggers a 60-second cooldown and failover
  • Local LLMs run behind a priority queue so a shared GPU stays alive across the pack
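A circuit-breaker sketch of that health rule: three failures inside a window open a 60-second cooldown, and routing falls through to the next provider. The window size and API shape here are assumptions:

```python
import time

class HealthMonitor:
    """Three failures inside the window open a cooldown for a provider;
    pick() then falls through to the next healthy one in preference order."""

    def __init__(self, window=30.0, cooldown=60.0, threshold=3, clock=time.monotonic):
        self.window, self.cooldown, self.threshold = window, cooldown, threshold
        self.clock = clock
        self.failures = {}       # provider -> recent failure timestamps
        self.cooling_until = {}  # provider -> when its cooldown ends

    def record_failure(self, provider):
        now = self.clock()
        recent = [t for t in self.failures.get(provider, []) if now - t < self.window]
        recent.append(now)
        self.failures[provider] = recent
        if len(recent) >= self.threshold:
            self.cooling_until[provider] = now + self.cooldown

    def pick(self, preference):
        """First provider in preference order that is not cooling down."""
        now = self.clock()
        for p in preference:
            if self.cooling_until.get(p, 0.0) <= now:
                return p
        return None
```

Injecting the clock keeps the breaker testable; in production it would just use the monotonic default.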
vs the rest

ChatGPT, Claude, and Perplexity are single-vendor by design. OpenClaw and Hermes are BYOK; PACKWOLF is BYOK plus an in-house model so you don't have to bring one.

What this means for you: Pick the right model for each turn, not the model your tool happens to be built around. Pay your provider, not us.

Provider roster · native SDK each
Pick per call
  • Claude · Long reasoning, prompt caching, safety
  • OpenAI · Function calling, vision, breadth
  • Grok · Real-time, witty, X-native
  • Qwen · Open weights, multilingual, multimodal
  • Kimi · Million-token context
  • MiniMax · Voice, video, generative media
  • DeepSeek · Deep reasoning, MoE, open weights
Plus your own keys (BYOK) and any local LLM behind LM Studio or Ollama
03
Layer · You stay in control

You sign off on the moments that matter.

Three-tier approval on anything that publishes, sends, or commits. Every call recorded as a span you can replay and diff. Your data on your machine; your keys across providers. Basic runs locally with a cloud command center and BYOK overflow; Pro and Max add managed cloud execution.

Approvals · Audit and cost · Your terms

Approvals

Three-tier sign-off on the moments that matter. Versioned. Reversible.

Anything that changes a load-bearing artifact — a published doc, a sent message, a code change, a customer-facing decision — moves through three-tier approval: author, peer review, sign-off. Every change is versioned and revertable. Linked plans and specs reconcile through a readiness gate before the system treats a document as build-ready. An exception queue captures everything that escalates, with full audit trail.

  • Three tiers: author, peer review, sign-off — gated before downstream trust
  • Every change versioned; revert restores any prior version
  • Exception queue captures escalations; every decision audit-logged
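The three-tier chain behaves like a small state machine over a versioned artifact; a sketch in which edits and reverts reset the approval chain and any prior version can be restored (class and method names are illustrative):

```python
TIERS = ["author", "peer_review", "sign_off"]

class Artifact:
    """A versioned artifact moving through three-tier approval."""

    def __init__(self, body):
        self.versions = [body]
        self.approvals = []

    def edit(self, body):
        self.versions.append(body)
        self.approvals = []  # changed content must be re-approved

    def approve(self, tier):
        expected = TIERS[len(self.approvals)]
        if tier != expected:
            raise ValueError(f"out of order: expected {expected!r}")
        self.approvals.append(tier)

    @property
    def ready(self):
        return self.approvals == TIERS

    def revert(self, version):
        """Restore a prior version as the new head; approvals start over."""
        self.versions.append(self.versions[version])
        self.approvals = []

spec = Artifact("v1 spec")
spec.edit("v2 spec")
spec.approve("author")
spec.approve("peer_review")
```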
vs the rest

Most agent tools let the agent act and hope it didn't go wrong. PACKWOLF stops before the moments that matter.

What this means for you: Agents that send the email, push the code, publish the post — only after you said yes. Reversible if you change your mind.

packwolf.app · Approvals
Live demo
Artifact · spec
Migrate auth flow to MCP token exchange
Three-tier approval · 2 of 3 complete · diff vs v1: +312 / −188
Tier 01 · Author
Drafter agent · v3 ready
Approved
Tier 02 · Peer review
Architect agent · 2 comments resolved
Approved
Tier 03 · Sign-off
Operator (CEO) · awaiting decision
Awaiting
Version trail · revertable · Audit row written for every transition
v3 · Drafter rewrite, addressed scope feedback · Current
v2 · Architect added migration steps · Revert
v1 · Initial spec by Drafter · Revert

Audit and cost

Every call recorded as a span. Replayable, diffable, queryable.

Audit is the system, not an after-the-fact dump. Every model call and tool call is a node in the trace; every prompt is diffable across runs; every approval and decision records its inputs and outputs verbatim. Cost events fire on every call and roll up into per-agent and per-tool budgets with threshold alerts. When something goes sideways, you replay the trace, diff the prompt, and find the call that actually changed state.

  • Per-span flame graph with input, output, metadata, and diagnosis tabs
  • Prompt versioning — every system prompt diffable across runs
  • Per-agent and per-tool cost events with threshold alerts on budgets
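Cost rollup with threshold alerts is a fold over per-call events; a sketch with invented event and budget shapes:

```python
from collections import defaultdict

def rollup(events, budgets):
    """Fold per-call cost events into per-agent totals; crossing an
    agent's budget emits one alert for that agent."""
    totals, alerts = defaultdict(float), []
    for e in events:
        totals[e["agent"]] += e["cost"]
        if e["agent"] not in alerts and totals[e["agent"]] > budgets.get(e["agent"], float("inf")):
            alerts.append(e["agent"])
    return dict(totals), alerts

events = [
    {"agent": "drafter", "cost": 0.03},
    {"agent": "drafter", "cost": 0.013},
    {"agent": "researcher", "cost": 0.02},
]
totals, alerts = rollup(events, {"drafter": 0.04})
```

Firing the alert inside the fold, per event, is the point: you hear about the overage on the call that caused it, not at the end of the day.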
vs the rest

Most tools give you a chat log and call it observability. PACKWOLF gives you the trace, the cost, and the diff.

What this means for you: When the agent did something you didn't expect, you can see what it actually did and why — not just what it said it did.

packwolf.app · Activity trace
Live demo
Trace · turn-7c84 · 9 spans · 14.3s · Replayable · diffable · queryable
operator turn · weekly content cycle
memory.recall
context.assemble
model.generation
tool.web_search
tool.file_write
voice.drift_check
approvals.submit
audit.span_write
0s · 3.6s · 7.2s · 10.7s · 14.3s
Span · model.generation · Claude Sonnet 4.6 · 14:02:18.412 · 3.4s · prompt-cache: 87% hit
Input

System: weekly-content-cycle.md (cached). Memory block: 4 chunks. User: "What worked the last time we ran an April refresh?"

Output

Plan ready. Three pieces proposed, briefs drafted, voice review queued. Tool calls: web_search, file_write.

Cost rollup
Input tokens · 12.4K
Output tokens · 1.8K
Cache reads · 10.7K
Tool calls · 5
Turn cost · $0.043

Your terms

Your data on your machine. Managed cloud when the work earns it. Your keys across providers.

Your workspace, memory, traces, and audit log stay on your machine by default. Nothing roams unless you opt in. The model layer is a deliberate choice you make per call — Claude, OpenAI, our in-house model, or a local LLM — and your keys go to your provider, never marked up. The BYOK tier is the local app with account and license management. Basic runs the whole operation locally with a cloud command center, included usage, and BYOK overflow. Pro and Max add managed cloud execution when selected work should keep running without the desktop.

  • Workspace, memory, traces, and audit log stay local by default
  • BYOK stays local; Basic runs locally with command-center features and BYOK overflow; Pro+ adds managed cloud execution
  • Your keys across Claude, OpenAI, our in-house model, local LLMs, MCP servers — no markup
vs the rest

Other prosumer tools either lock you in or are cloud-only. PACKWOLF is genuinely both — data sovereignty plus deployment flexibility without pretending every tier should sync everything.

What this means for you: Your customer list, your draft work, your competitive notes — all on the machine you trust. Pick the model for the job; don't pick the vendor for your data.

Full spec for your terms · Source: docs/local_model_opt.md
01 · Data · Stays on your machine
Workspace · Memory · Traces · Audit log
02 · Model · Switch per call
Claude · Anthropic · BYOK
OpenAI · BYOK
In-house · Included on Basic / Pro / Max
Local LLM · Ollama / LM Studio · BYOK

Your keys. No markup. Pay your provider, not us.

03 · Continuity · Cloud coordinates, desktop keeps control
Desktop · your laptop · local context
sync
Cloud · status · approvals · Pro+ sync

Entry plans coordinate. Pro and Max add selected context continuity.

Stop driving. Start leading.

The pack runs the work. Your data stays on your machine. You pick the model and you sign off on the moments that matter.

Request beta access · See how it compares

From $10/mo at launch · Managed cloud execution on Pro+

Built by ideius. Reach the founders any time.