Capability · Observability layer

Per-agent and per-tool spend. Threshold alerts. No surprises.

Every model call and every tool call fires a cost event. Events roll up into per-agent and per-tool budgets with configurable thresholds. Hit warning, hit hard cap, and the budget gate denies further calls until an operator clears it.

Every call
Tracked
Per-agent
Budget scope
Per-tool
Budget scope
Hard cap
Enforced
packwolf.app · Costs & budgets
Live screenshot
Costs & budgets screenshot
The cost dashboard. Spend by agent, spend by tool, daily/weekly/monthly trend, threshold alerts in queue.
What it actually does

The parts that make this work.

Cost events on every call.

Model calls fire token-estimate events with provider billing. Tool calls fire duration + handler-cost events. The trail is real-time, not end-of-month.

Per-agent budgets.

Set a monthly cap per agent. Alerts at warning threshold, denial at hard cap. Operators see the alert in the queue before the cap hits.

Per-tool budgets.

Some tools have real costs (web search APIs, MCP servers with their own pricing). Per-tool budgets cap exposure on the expensive ones without limiting the cheap ones.

Threshold alerts.

Warning, hard cap, overrun. The first two are advisory; the third (which only fires after a cap-bypass) flags audit.

Cost trail attaches to traces.

Every span in the activity trace has its cost event. Find the call that cost too much. Diff against a cheaper run to see what changed.

Token estimates are real, not approximate.

tiktoken for OpenAI, @anthropic-ai/tokenizer for Claude. Token counts are exact for those providers; locals use a tested heuristic with measured accuracy.

How it works

The path through costs & budgets.

  1. 01

    Cost event fires per call.

    Model call → token-estimate event with the provider's per-token rate. Tool call → duration + handler cost (zero for built-ins, configurable for MCP).

  2. 02

    Events roll into budgets.

    Per-agent and per-tool aggregations update in real time. Daily, weekly, monthly windows are queryable directly from the cost dashboard.

  3. 03

    Threshold alerts surface.

    When an aggregation crosses a warning threshold, an alert lands in the operator queue. Hard caps are not silent.

  4. 04

    Hard cap denies further calls.

    Once an agent hits its hard cap, the budget gate (point three of the safety gate) starts denying calls. The agent reports the budget block in the trace.

  5. 05

    Operator clears the cap.

    Operator can raise the cap, allow specific calls, or pause the agent until the next billing window. Each action records to audit.

  6. 06

    Cost trail merges with audit.

    Compliance reviews can ask 'what did we spend on agent X last quarter?' and get an answer from the audit + cost log together.

Common questions

Things engineers actually ask.

Token counts are exact for Claude (Anthropic tokenizer) and OpenAI (tiktoken). For local LLMs, we use a calibrated heuristic measured against the tokenizer where one is available. Provider billing reconciliation happens against the actual bill, the dashboard shows estimate vs. invoiced for the prior month.

Source: docs/lib/costs.ts

See it in your workspace.

Closed-beta cohorts are small. Tell us what you'd want this capability to handle for your team.

Request beta access