
Interaction Capture

Wilson records every LLM interaction to SQLite — the full content, not just metadata. This goes beyond the lightweight trace system (which only records model, tokens, and duration) to capture the actual prompts, responses, and tool call sequences.

The capture layer intercepts at callLlm() — the single function every LLM call in Wilson passes through. This means agent loop calls, chat summarization, relevance selection, and categorization are all captured automatically.

User query → Agent loop → callLlm() ──→ LLM Provider
                              ├──→ traceStore (metadata)
                              └──→ interactionStore (full content)
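Here is a minimal sketch of interception at a single choke point. It wraps a provider call and writes one metadata record and one full-content record per call, including failed calls. Several pieces are hypothetical: the types (`LlmRequest`, `LlmResponse`), the `Provider` function shape, the `capture` helper, and the in-memory arrays standing in for traceStore and interactionStore. Wilson's actual callLlm() signature is not shown on this page.

```typescript
// All types and stores here are hypothetical stand-ins, not Wilson's code.
type CallType = "agent" | "summarize" | "relevance" | "categorize" | "standalone";

interface LlmRequest {
  callType: CallType;
  model: string;
  provider: string;
  systemPrompt?: string;
  userPrompt: string;
}

interface LlmResponse {
  content: string;
  inputTokens: number;
  outputTokens: number;
}

// What the lightweight trace keeps: metadata only.
interface TraceRecord {
  model: string;
  totalTokens: number;
  durationMs: number;
}

// What interaction capture keeps: metadata plus full prompts and response.
interface InteractionRecord extends TraceRecord {
  callType: CallType;
  provider: string;
  systemPrompt?: string;
  userPrompt: string;
  responseContent?: string;
  status: "ok" | "error";
  error?: string;
}

// In-memory stand-ins for traceStore and interactionStore.
const traces: TraceRecord[] = [];
const interactions: InteractionRecord[] = [];

type Provider = (req: LlmRequest) => Promise<LlmResponse>;

// Writes to both stores; res is undefined when the provider failed.
function capture(req: LlmRequest, start: number, res: LlmResponse | undefined, error: string | undefined): void {
  const durationMs = Date.now() - start;
  const totalTokens = res ? res.inputTokens + res.outputTokens : 0;
  traces.push({ model: req.model, totalTokens, durationMs });
  interactions.push({
    model: req.model,
    totalTokens,
    durationMs,
    callType: req.callType,
    provider: req.provider,
    systemPrompt: req.systemPrompt,
    userPrompt: req.userPrompt,
    responseContent: res?.content,
    status: error === undefined ? "ok" : "error",
    error,
  });
}

// Single choke point: agent loop, summarizer, relevance selector and
// categorizer all call this, so every one of them is captured.
async function callLlm(req: LlmRequest, provider: Provider): Promise<LlmResponse> {
  const start = Date.now();
  const res = await provider(req).catch((err: unknown) => {
    capture(req, start, undefined, err instanceof Error ? err.message : String(err));
    throw err; // the caller still sees the provider failure
  });
  capture(req, start, res, undefined);
  return res;
}
```

In this sketch, recording on the error path is what produces `status = 'error'` rows. Rethrowing afterwards leaves the caller's behavior unchanged.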

Three tables store interaction data:

llm_interactions: one row per LLM call, containing the full prompts and response.

CREATE TABLE llm_interactions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  run_id TEXT NOT NULL,           -- groups calls within one agent.run()
  sequence_num INTEGER NOT NULL,  -- order within the run (1, 2, 3...)
  call_type TEXT NOT NULL,        -- 'agent' | 'summarize' | 'relevance' | 'categorize' | 'standalone'
  model TEXT NOT NULL,
  provider TEXT NOT NULL,
  system_prompt TEXT,
  user_prompt TEXT NOT NULL,
  response_content TEXT,
  tool_calls_json TEXT,           -- JSON array of {id, name, args}
  tool_defs_json TEXT,            -- JSON array of tool names available
  input_tokens INTEGER,
  output_tokens INTEGER,
  total_tokens INTEGER,
  duration_ms INTEGER,
  status TEXT NOT NULL,           -- 'ok' or 'error'
  error TEXT,
  created_at TEXT
);
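The two JSON columns hold different shapes. tool_calls_json keeps each requested call as {id, name, args}, while tool_defs_json keeps only the names of the tools that were offered. Below is a small sketch of serializing them and reading them back. It uses hypothetical `ToolCall`/`ToolDef` types and assumes an empty list is stored as NULL. The schema allows NULL but does not say which convention is used.

```typescript
// Hypothetical in-memory shapes; only the JSON layout comes from the schema comments.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolDef {
  name: string;
  description: string;
  parameters?: unknown; // full schema: not stored, only the name is kept
}

interface ToolColumns {
  tool_calls_json: string | null;
  tool_defs_json: string | null;
}

// Assumption: empty lists become NULL rather than "[]".
function toToolColumns(calls: ToolCall[], defs: ToolDef[]): ToolColumns {
  return {
    tool_calls_json: calls.length
      ? JSON.stringify(calls.map(({ id, name, args }) => ({ id, name, args })))
      : null,
    tool_defs_json: defs.length ? JSON.stringify(defs.map((d) => d.name)) : null,
  };
}

// Reading back: NULL means the interaction requested no tools.
function parseToolCalls(json: string | null): ToolCall[] {
  return json === null ? [] : (JSON.parse(json) as ToolCall[]);
}
```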

llm_tool_results: one row per tool execution, linked to the LLM call that requested it.

CREATE TABLE llm_tool_results (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  interaction_id INTEGER NOT NULL,  -- FK to llm_interactions
  tool_call_id TEXT NOT NULL,
  tool_name TEXT NOT NULL,
  tool_args_json TEXT,
  tool_result TEXT,
  duration_ms INTEGER,
  created_at TEXT
);
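Each llm_tool_results row points at its parent interaction through interaction_id. Its tool_call_id presumably matches an id inside that interaction's tool_calls_json. If that assumption holds (the schema does not state it), pairing requested calls with their recorded executions is a lookup by id. The types and `pairToolResults` below are hypothetical.

```typescript
// Hypothetical shapes. Assumption: tool_call_id equals ToolCall.id from tool_calls_json.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolResultRow {
  interaction_id: number;
  tool_call_id: string;
  tool_name: string;
  tool_result: string | null;
  duration_ms: number | null;
}

interface PairedCall {
  call: ToolCall;
  result: ToolResultRow | null; // null: requested, but no execution recorded
}

function pairToolResults(interactionId: number, calls: ToolCall[], rows: ToolResultRow[]): PairedCall[] {
  // Index only this interaction's results by tool_call_id.
  const byCallId = new Map<string, ToolResultRow>();
  for (const row of rows) {
    if (row.interaction_id === interactionId) byCallId.set(row.tool_call_id, row);
  }
  // Preserve the order in which the model requested the calls.
  return calls.map((call) => ({ call, result: byCallId.get(call.id) ?? null }));
}
```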

interaction_annotations: user ratings and preference labels. One annotation per interaction (upserted on save).

CREATE TABLE interaction_annotations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  interaction_id INTEGER NOT NULL,  -- FK to llm_interactions
  rating INTEGER CHECK(rating BETWEEN 1 AND 5),
  preference TEXT CHECK(preference IN ('chosen', 'rejected', 'neutral')),
  pair_id TEXT,                     -- links chosen+rejected pair for DPO
  tags TEXT,                        -- JSON array of string tags
  notes TEXT,
  annotated_at TEXT
);
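The pair_id column links a chosen and a rejected response for DPO. The sketch below shows one way such pairs could be turned into training examples. Several parts are illustrative assumptions: `buildDpoPairs`, the `{prompt, chosen, rejected}` output shape (a common convention for DPO datasets), and taking the prompt from the chosen side, which presumes both responses answer the same prompt.

```typescript
// Hypothetical row shapes, reduced to the columns this sketch needs.
interface Interaction {
  id: number;
  user_prompt: string;
  response_content: string | null;
}

interface Annotation {
  interaction_id: number;
  preference: "chosen" | "rejected" | "neutral" | null;
  pair_id: string | null;
}

interface DpoExample {
  prompt: string;
  chosen: string;
  rejected: string;
}

function buildDpoPairs(interactions: Interaction[], annotations: Annotation[]): DpoExample[] {
  const byId = new Map(interactions.map((i) => [i.id, i] as const));
  const pairs = new Map<string, { chosen?: Interaction; rejected?: Interaction }>();

  for (const a of annotations) {
    // Only chosen/rejected labels that belong to a pair are relevant.
    if (!a.pair_id || (a.preference !== "chosen" && a.preference !== "rejected")) continue;
    const interaction = byId.get(a.interaction_id);
    if (!interaction) continue;
    const slot = pairs.get(a.pair_id) ?? {};
    slot[a.preference] = interaction;
    pairs.set(a.pair_id, slot);
  }

  const out: DpoExample[] = [];
  for (const { chosen, rejected } of pairs.values()) {
    if (!chosen || !rejected) continue; // incomplete pair
    if (chosen.response_content === null || rejected.response_content === null) continue;
    out.push({ prompt: chosen.user_prompt, chosen: chosen.response_content, rejected: rejected.response_content });
  }
  return out;
}
```

Incomplete pairs and 'neutral' labels are skipped rather than guessed at.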

Each interaction is tagged with a call_type that identifies why the LLM was called:

| Type | Source | Description |
|------|--------|-------------|
| agent | Agent loop | Primary reasoning calls: user queries and tool-use iterations |
| summarize | Chat history | Generating brief summaries of conversation answers |
| relevance | Chat history | Selecting which past messages are relevant to the current query |
| categorize | Categorize tool | Classifying transactions into spending categories |
| standalone | Direct callLlm() | Any call made outside the agent loop |
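call_type is a plain TEXT column, so the schema itself does not restrict it to the five values in the table. The sketch below narrows the stored string back to a typed union when reading rows, and uses that to total tokens per call type. `CALL_TYPES`, `isCallType`, and `tokensByCallType` are hypothetical names. Unknown values are simply skipped.

```typescript
// The five call types from the table, as a const tuple and derived union.
const CALL_TYPES = ["agent", "summarize", "relevance", "categorize", "standalone"] as const;
type CallType = (typeof CALL_TYPES)[number];

// Type guard for strings read back from the call_type column.
function isCallType(value: string): value is CallType {
  return (CALL_TYPES as readonly string[]).includes(value);
}

interface UsageRow {
  call_type: string;
  total_tokens: number | null; // nullable in the schema
}

function tokensByCallType(rows: UsageRow[]): Record<CallType, number> {
  // Start every known type at zero so the result always has all five keys.
  const totals = Object.fromEntries(CALL_TYPES.map((t) => [t, 0])) as Record<CallType, number>;
  for (const row of rows) {
    if (isCallType(row.call_type)) totals[row.call_type] += row.total_tokens ?? 0;
  }
  return totals;
}
```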

All LLM calls within a single agent.run() invocation share a run_id (UUID). The sequence_num field tracks the order of calls within that run. This lets you reconstruct the full conversation flow:

Run: abc-123
Seq 1: Agent call → "What did I spend on groceries?"
Seq 2: Agent call (with tool results) → spending_summary result
Seq 3: Agent call (final answer) → "You spent $342 on groceries..."
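One way to produce a shared run_id and increasing sequence numbers is a small per-run context object created at the start of each agent.run(). This sketch assumes the context is passed explicitly; the real code may thread it differently. It uses Node's `crypto.randomUUID()` for the UUID. `RunContext`, `simulateRun`, and `reconstructRun` are hypothetical names.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical per-run context: one instance per agent.run() invocation.
class RunContext {
  readonly runId = randomUUID();
  private seq = 0;

  // Returns 1, 2, 3, ... for successive LLM calls in this run.
  nextSequence(): number {
    this.seq += 1;
    return this.seq;
  }
}

interface CapturedCall {
  run_id: string;
  sequence_num: number;
  user_prompt: string;
}

// Stands in for an agent run that makes one LLM call per prompt.
function simulateRun(prompts: string[]): CapturedCall[] {
  const ctx = new RunContext();
  return prompts.map((p) => ({ run_id: ctx.runId, sequence_num: ctx.nextSequence(), user_prompt: p }));
}

// In-memory equivalent of filtering by run_id and ordering by sequence_num.
function reconstructRun(rows: CapturedCall[], runId: string): CapturedCall[] {
  return rows.filter((r) => r.run_id === runId).sort((a, b) => a.sequence_num - b.sequence_num);
}
```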

Interactions are stored in the same SQLite database as your transactions: ~/.openaccountant/data.db. Use the Dashboard Training tab or query the database directly:

# Count total interactions
sqlite3 ~/.openaccountant/data.db "SELECT COUNT(*) FROM llm_interactions"
# Recent agent calls
sqlite3 ~/.openaccountant/data.db \
"SELECT id, model, call_type, total_tokens, created_at
FROM llm_interactions
WHERE call_type = 'agent'
ORDER BY id DESC LIMIT 10"
# Full conversation for a run
sqlite3 ~/.openaccountant/data.db \
"SELECT sequence_num, substr(user_prompt, 1, 80), substr(response_content, 1, 80)
FROM llm_interactions
WHERE run_id = 'your-run-id'
ORDER BY sequence_num"

The capture layer follows the same singleton pattern as the existing traceStore:

import { interactionStore } from './utils/interaction-store.js';
// Wired during startup in cli.ts and headless.ts
interactionStore.setDatabase(db);

The store uses prepared statements for performance and silently catches database errors so capture never interferes with normal operation.
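Here is a minimal sketch of the singleton described above. It is a module-level instance that does nothing until setDatabase() is called, prepares its INSERT once, and swallows write errors. The `Database`/`Statement` interfaces are declared locally and modeled on synchronous drivers such as better-sqlite3 (`prepare(sql).run(...params)`). `CaptureRow` covers only the NOT NULL columns plus created_at. Returning a boolean from record() is an illustrative choice, not Wilson's API.

```typescript
// Minimal driver surface, declared locally (assumption, not Wilson's types).
interface Statement {
  run(...params: unknown[]): unknown;
}
interface Database {
  prepare(sql: string): Statement;
}

// Hypothetical subset of llm_interactions: the NOT NULL columns.
interface CaptureRow {
  run_id: string;
  sequence_num: number;
  call_type: string;
  model: string;
  provider: string;
  user_prompt: string;
  status: "ok" | "error";
}

const INSERT_SQL = `INSERT INTO llm_interactions
  (run_id, sequence_num, call_type, model, provider, user_prompt, status, created_at)
  VALUES (?, ?, ?, ?, ?, ?, ?, ?)`;

class InteractionStore {
  private insert: Statement | null = null;

  // Called once at startup; the statement is prepared a single time.
  setDatabase(db: Database): void {
    try {
      this.insert = db.prepare(INSERT_SQL);
    } catch {
      this.insert = null; // e.g. table missing: capture stays disabled
    }
  }

  // Returns whether the row was written; never throws.
  record(row: CaptureRow): boolean {
    if (this.insert === null) return false; // not wired yet: no-op
    try {
      this.insert.run(row.run_id, row.sequence_num, row.call_type, row.model,
        row.provider, row.user_prompt, row.status, new Date().toISOString());
      return true;
    } catch {
      return false; // swallow DB errors so the LLM call is unaffected
    }
  }
}

// Module-level singleton, mirroring traceStore; a real module would export it.
const interactionStore = new InteractionStore();
```

Returning false instead of throwing keeps the capture path invisible to callers. The cost is that rows are silently lost when the database is unavailable.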