Trace, not just log.
Each agent run is a trace: a coherent set of observations grouped by runId / conversationId / sessionId. You see the chain of decisions as one timeline.
Every agent turn writes events to the run_events table, model:generation observations, tool:* observations, errors, denials. Append-only, ordered, replayable.
Tool calls also fire audit events: who, what, input, output, duration, approval state. The audit log is the source of truth for compliance.
Each model call and each tool call fires a cost event with token estimates and provider billing. Rolls up per-agent and per-tool.
The browser groups observations by trace ID into a flame graph timeline. Failed agent behavior becomes inspectable, not buried.
Click a span: input, output, metadata, diagnosis. Find the exact moment the model emitted a malformed tool call, or the gate denied a request, or a handler timed out.
Two trace IDs, side by side. The prompt diff shows what changed, system prompt, tool schemas, memory block, conversation history. Find the regression.