Audit & Lineage
One learner action becomes a soft_calls_entry row, stitched back through group_id into the session timeline and the per-group runs view.
Glow records every mutation that crosses the wire as an append-only
ledger row, then stitches those rows back through group_id into
the Session timeline and the Group detail tree.
This is the write-side discipline that powers post-hoc
“what actually happened” inspection, FERPA review, and replay of
failed invocations.
Audit is not a public endpoint — it is a framework wrapper that every mutation route runs through. Drilling the trail is done via the existing system endpoints documented on Activity, Session, and Group.
What gets audited
Every artifact mutation routed through
run_artifact_operation_with_audit participates in audit. The wrapper
does three things in order, regardless of whether the operation has a
registered tool:
- Emits a {artifact}.{operation}.started event onto the internal socket bus (forwarded to clients by ws/output/).
- Invokes the runner, the actual mutation closure.
- Emits {artifact}.{operation}.completed on success, or {artifact}.{operation}.failed on error, with the same call_id as the .started event.
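
The three steps above can be sketched roughly as follows. This is an illustration only, not the real run_artifact_operation_with_audit: the `emit` helper, the `emitted` list, and the function name `run_with_audit_sketch` are hypothetical stand-ins for the internal socket bus and the wrapper, and the payload fields beyond call_id are assumptions.

```python
import uuid
from typing import Any, Callable

# Hypothetical stand-in for the internal socket bus: collects (event, payload) pairs.
emitted: list[tuple[str, dict[str, Any]]] = []


def emit(event: str, payload: dict[str, Any]) -> None:
    emitted.append((event, payload))


def run_with_audit_sketch(artifact: str, operation: str, runner: Callable[[], Any]) -> Any:
    """Illustrative flow: started -> runner -> completed | failed, one call_id throughout."""
    call_id = str(uuid.uuid4())  # minted once, reused on every event of this invocation
    prefix = f"{artifact}.{operation}"
    emit(f"{prefix}.started", {"call_id": call_id})
    try:
        result = runner()  # the actual mutation closure
    except Exception as exc:
        emit(f"{prefix}.failed", {"call_id": call_id, "error": str(exc)})
        raise
    emit(f"{prefix}.completed", {"call_id": call_id})
    return result


run_with_audit_sketch("persona", "update", lambda: {"ok": True})
print([name for name, _ in emitted])
```

The point of the sketch is the shared call_id: a consumer can pair any .completed or .failed event with its .started event without additional state.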
When the tool graph resolves a tool for the (artifact, operation)
pair, the wrapper additionally writes a tool-call audit row and may
append a soft_calls_entry ledger row from inside the runner.
Reads are not audited. Cache hits, websocket replays, and the SSE relay path do not produce ledger rows.
The soft_calls_entry shape
Each soft_calls_entry carries the wire-level call_id plus an artifact/operation pair; its status moves pending → accepted | rejected.
The ledger lives in public.soft_calls_entry and is insert-only —
state transitions are recorded as new rows, never updates. A companion
materialized view soft_calls_mv collapses to the latest row per
call_id for fast lookups.
| Column | Type | Notes |
|---|---|---|
| id | uuid | Primary key, defaulted to uuidv7() so rows sort by insertion time. |
| call_id | uuid | The wire-level identity carried on .started / .completed / .failed. |
| artifact | text | Mirrors permissions_resource.artifact (e.g. persona, agent). |
| operation | text | The operation key (e.g. create, update, delete). |
| status | text | One of pending, accepted, rejected (enforced by check constraint). |
| artifact_id | uuid | The row this call mutates or proposes to mutate. |
| patch | jsonb | Proposed change payload (nullable). |
| active | boolean | false hides the row from soft_calls_mv without deleting it. |
| mcp | boolean | true when the call originated from an MCP tool surface. |
| generated | boolean | true when written by the framework, false for hand-seeded rows. |
| created_at | timestamptz | Defaulted to now(). |
Vocabulary mirrors permissions_resource: the (artifact, operation)
pair is the canonical identifier across the tool graph, the audit
ledger, and the permissions table.
Append-only state transitions
A soft tool call lifecycle looks like:

    INSERT … status='pending'    ← LLM proposes a change
    INSERT … status='accepted'   ← user confirms (or the runner auto-accepts)

or:

    INSERT … status='pending'
    INSERT … status='rejected'   ← user declines, no further writes

soft_calls_mv returns only the latest row per call_id, so
consumers see one logical state. Historic transitions are recoverable
by reading the base table directly.
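
To make the append-only pattern concrete, here is a minimal sketch that emulates the ledger with an in-memory SQLite table. It is not the production schema: the real ledger lives in Postgres with a uuidv7() key and a materialized view, whereas this sketch uses an autoincrement integer, a subset of the columns, and a plain query in place of soft_calls_mv. The helper names `record`, `latest_state`, and `history` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE soft_calls_entry (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for the uuidv7() key
        call_id   TEXT NOT NULL,
        artifact  TEXT NOT NULL,
        operation TEXT NOT NULL,
        status    TEXT NOT NULL
                  CHECK (status IN ('pending', 'accepted', 'rejected')),
        active    INTEGER NOT NULL DEFAULT 1
    )
    """
)


def record(call_id: str, artifact: str, operation: str, status: str) -> None:
    """Append a new row; existing rows are never updated."""
    conn.execute(
        "INSERT INTO soft_calls_entry (call_id, artifact, operation, status) "
        "VALUES (?, ?, ?, ?)",
        (call_id, artifact, operation, status),
    )


def latest_state(call_id: str):
    """Emulates the view: the newest active row for a call_id."""
    row = conn.execute(
        "SELECT status FROM soft_calls_entry "
        "WHERE call_id = ? AND active = 1 ORDER BY id DESC LIMIT 1",
        (call_id,),
    ).fetchone()
    return row[0] if row else None


def history(call_id: str) -> list[str]:
    """Reads the base table directly to recover every transition."""
    rows = conn.execute(
        "SELECT status FROM soft_calls_entry WHERE call_id = ? ORDER BY id",
        (call_id,),
    ).fetchall()
    return [r[0] for r in rows]


record("c1", "persona", "update", "pending")
record("c1", "persona", "update", "accepted")
print(latest_state("c1"))  # accepted
print(history("c1"))       # ['pending', 'accepted']
```

Consumers that only need the current state call the view-like query; anything forensic reads the full history.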
group_id stitches audit, session, and group
A single group_id resolves three ways — a timeline event, the group-detail tree, and the call_id family in the ledger.
Every audit row, every run, and every timeline event carries a
group_id. That single id is the join key across the three views:

                        session_id
                             │
                             ▼
                  ┌─────────────────────┐
                  │ Session (timeline)  │
                  └──────────┬──────────┘
                             │ groups[]
                             ▼
                          group_id ──────────────┐
                             │                   │
              ┌──────────────┴───────────┐       │
              ▼                          ▼       ▼
         ┌──────────┐             ┌──────────────────────┐
         │  Group   │             │ soft_calls_entry     │
         │ (runs +  │             │ (one row per         │
         │ messages)│             │  mutation call_id)   │
         └──────────┘             └──────────────────────┘

The framework guarantees stitching by:
- Minting group_id early: routes that create the group set mint_group_id_if_missing=True, which consults the active-group Redis cache (so /context, /group, and /search firing in parallel converge on the same id via SET NX EX) and materializes the groups_entry row before the first event fires. Without this, the .started event would carry group_id=None and SSE would drop it.
- Reusing call_id across .started / .completed / .failed: one wire-level id per invocation, threaded through both the event payloads and the audit row’s primary key via pre_minted_call_id.
- Carrying operation_key: a stable per-step identifier (see Idempotency + replay safety below).
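
The convergence behaviour behind early minting can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `FakeCache` stands in for Redis (its `set_nx` mimics SET with NX and ignores the EX expiry), and the key layout and the `mint_group_id` function are hypothetical, not the framework's actual code.

```python
import threading
import uuid


class FakeCache:
    """In-memory stand-in for Redis; set_nx mimics SET key value NX (expiry ignored)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self._lock = threading.Lock()

    def set_nx(self, key: str, value: str) -> bool:
        with self._lock:
            if key in self._data:
                return False  # someone else already claimed the key
            self._data[key] = value
            return True

    def get(self, key: str):
        with self._lock:
            return self._data.get(key)


def mint_group_id(cache: FakeCache, session_id: str) -> str:
    key = f"active_group:{session_id}"  # hypothetical key layout
    candidate = str(uuid.uuid4())
    if cache.set_nx(key, candidate):
        return candidate       # this caller won the race and minted the id
    return cache.get(key)      # losers adopt the winner's id


cache = FakeCache()
results: list[str] = []
# Three routes firing in parallel for the same session.
threads = [
    threading.Thread(target=lambda: results.append(mint_group_id(cache, "s1")))
    for _ in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(results)))  # 1
```

Because only one writer can succeed under NX, every parallel request ends up with the same group_id regardless of arrival order.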
Reading the audit trail
The audit trail is not exposed as its own endpoint — the existing system endpoints already surface the relevant slices. The canonical drill chain is:
| Step | Surface | Endpoint | Carries |
|---|---|---|---|
| 1. Engagement | Activity | POST /system/activity | sessions_count, per-profile breakdown |
| 2. Session pick | Activity | POST /system/sessions | session_id per row |
| 3. Session detail | Session | POST /system/session | groups[] + timeline[] |
| 4. Group detail | Group | POST /system/group (include_detail: true) | runs[] + messages[] with calls[] |
| 5. Receipt | filesystem | call JSON receipt (call_upload_id) | full started / completed / failed event log |
Each step narrows the time window and the entity set. The Group
endpoint’s lean mode (include_detail: false) is the cheap path used
by the audit-linking middleware — it returns identity only, enough to
hyperlink a group_id back to its session without paying for the full
runs + messages tree.
The Group page already documents the runs and messages shape; see Group → Understanding the detail response.
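
Steps 3 and 4 of the drill chain could be scripted along the following lines. This is a sketch under assumptions: the endpoint paths and the include_detail flag come from the table above, but the request body shapes, the group_id field on each groups[] item, and the `post` callable are hypothetical. A fake `post` is used here so the example runs without a server.

```python
from typing import Any, Callable

# A caller-supplied function that POSTs a JSON body to a path and returns parsed JSON.
Post = Callable[[str, dict[str, Any]], dict[str, Any]]


def calls_for_session(post: Post, session_id: str) -> dict[str, list[Any]]:
    """Walk session -> groups -> group detail and collect calls[] per group_id."""
    session = post("/system/session", {"session_id": session_id})
    collected: dict[str, list[Any]] = {}
    for group in session.get("groups", []):
        group_id = group["group_id"]  # assumed field name on each groups[] item
        detail = post("/system/group", {"group_id": group_id, "include_detail": True})
        collected[group_id] = [
            call
            for message in detail.get("messages", [])
            for call in message.get("calls", [])
        ]
    return collected


def fake_post(path: str, body: dict[str, Any]) -> dict[str, Any]:
    """Canned responses standing in for the real system endpoints."""
    if path == "/system/session":
        return {"groups": [{"group_id": "g1"}], "timeline": []}
    if path == "/system/group":
        return {"runs": [], "messages": [{"calls": ["call-a", "call-b"]}]}
    raise ValueError(path)


print(calls_for_session(fake_post, "s1"))  # {'g1': ['call-a', 'call-b']}
```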
Idempotency + replay safety
Two ids work together to make audited calls safe to re-fire:
- call_id: minted per invocation. Identifies one specific attempt. Different attempts of the same logical operation get different call_ids.
- operation_key: stable across attempts of the same logical step. Carried on the wire alongside call_id, and persisted on the audit row.
calls_entry.operation_key is the canonical replay key. When a
runner fails partway (network drop, model timeout, a downstream 500),
re-firing the same operation_key lets the framework detect the
duplicate, short-circuit any already-written soft-call rows, and
resume from the failure point without producing a second ledger
entry. The test picker fan-out uses this mechanism to safely retry
individual calls inside a larger replay without re-running the calls
that already succeeded.
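
The interplay between the two ids can be sketched in a few lines. This is an in-memory illustration, not the framework's detection logic: the `fire` function, the `completed_steps` map, and the `ledger` list are hypothetical, and the sketch covers only the short-circuit on a completed step, not resumption from a partial failure.

```python
import uuid
from typing import Any, Callable

completed_steps: dict[str, str] = {}  # operation_key -> call_id of the successful attempt
ledger: list[dict[str, str]] = []     # stand-in for ledger rows


def fire(operation_key: str, runner: Callable[[], Any]) -> dict[str, Any]:
    call_id = str(uuid.uuid4())  # a fresh call_id for every attempt
    if operation_key in completed_steps:
        # Duplicate of an already-completed step: skip the runner and the ledger write.
        return {"call_id": call_id, "skipped": True}
    runner()  # may raise; nothing is recorded for a failed attempt in this sketch
    ledger.append({"call_id": call_id, "operation_key": operation_key})
    completed_steps[operation_key] = call_id
    return {"call_id": call_id, "skipped": False}


attempts = {"n": 0}


def flaky() -> None:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise TimeoutError("model timeout")


try:
    fire("step-1", flaky)       # first attempt fails
except TimeoutError:
    pass
second = fire("step-1", flaky)  # retry succeeds and writes one row
third = fire("step-1", flaky)   # replay is short-circuited
print(len(ledger), second["skipped"], third["skipped"])  # 1 False True
```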
idempotency_key is the wider deduplication contract — passed in by
callers that need exactly-once semantics across long retry windows.
The framework’s earlier brittle assumption that idempotency_key == artifact_id is gone; the soft_calls ledger is now the source of
truth.
What audit does NOT cover
The audit wrapper is the canonical entry point, but it is not the only path that emits events. Be precise about its limits:
- Direct internal_sio.emit(...) calls: handlers that bypass run_artifact_operation_with_audit and emit events themselves produce wire traffic without a ledger row. The catch-all forwarder in ws/output.py still relays them to clients, but no soft_calls_entry is written.
- Read endpoints: POST /<artifact>/get, /<artifact>/search, and the various *_download routes never participate in audit. There is no “who read this” trail.
- Cache hits: when bypass_cache=False and the resolver returns a cached result, no runner runs, so no events fire and no ledger row is written. The original ledger row from the call that populated the cache is the only audit trail.
- MV reads: querying soft_calls_mv (e.g. through the search endpoint) is itself a read, not an audited write.
- Failed pre-flight: when run_artifact_operation_with_audit raises in resolve_common_context (profile not found, permission denied), the wrapper exits before any event fires. The failure is observable in application logs but not in the ledger.
If a write needs auditing, the answer is always to route it through
run_artifact_operation_with_audit — adding ad-hoc ledger inserts
elsewhere fragments the source of truth.
Related
- Activity — aggregate engagement counts, the entry point for the drill chain.
- Session — per-session timeline and groups list.
- Group: per-group runs, messages, and calls[] references.
- API Reference: full system endpoint schemas.