Annotating Interactions

The Dashboard Training tab lets you browse every LLM interaction Wilson has made, inspect the full prompts and responses, rate quality, and create preference pairs for DPO training.

Opening the Training Tab

wilson           # Start Wilson; the dashboard is served at http://localhost:3141
/dashboard       # Run inside the Wilson session to open the dashboard in your browser

Click the Training tab in the navigation bar.

Training Stats

The stats panel at the top shows:

Stat	Description
Total	Total LLM interactions recorded
Annotated	How many have been rated or labeled
SFT Ready	Interactions rated 4+ stars (eligible for SFT export)
DPO Pairs	Number of chosen/rejected preference pairs

Interaction Browser

The main table lists all recorded interactions with:

ID — Database row ID
Run — First 8 characters of the run UUID
Type — agent, summarize, relevance, etc.
Model — Which LLM was used
Tokens — Total token count
Rating — Star rating if annotated
Time — When the call was made

Filters

Use the dropdowns above the table to filter by:

Type — Show only agent calls, summarize calls, etc.
Annotated — Show only annotated or unannotated interactions
Rating — Filter by minimum star rating

Interaction Detail

Click any row to expand its full detail panel:

System Prompt — The complete system prompt sent to the model
User Prompt — The user’s query or iteration prompt with tool results
Response — The model’s full text response
Tool Calls — JSON of any tool calls the model requested
Tool Results — Output of each tool execution

This gives you complete visibility into what the model saw and what it produced.

Rating Interactions

Rate each interaction from 1 to 5 stars:

Rating	Meaning	Training Use
1 star	Bad — wrong answer, hallucination, off-topic	Excluded from SFT, candidate for DPO “rejected”
2 stars	Poor — partially correct but significant issues	Excluded from SFT
3 stars	Acceptable — correct but could be better	Excluded from SFT by default
4 stars	Good — correct and well-structured	Included in SFT export
5 stars	Excellent — ideal response to learn from	Included in SFT export

Click the star icons in the annotation panel to set the rating. The default SFT export threshold is 4 stars, so only interactions rated 4 or 5 stars become training data.
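The export rule in the table comes down to one comparison against the threshold. A minimal bash sketch, assuming you want to apply the same rule to ratings you have pulled from the API yourself; the `sft_eligible` function and the `SFT_THRESHOLD` variable are hypothetical names, not part of Wilson:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented SFT rule:
# an interaction is exported only if its rating meets the threshold (default 4).
SFT_THRESHOLD="${SFT_THRESHOLD:-4}"

# sft_eligible RATING [THRESHOLD]
# Exit status: 0 if RATING >= THRESHOLD, 1 if below it, 2 if RATING is not 1-5.
sft_eligible() {
  local rating="$1"
  local threshold="${2:-$SFT_THRESHOLD}"
  case "$rating" in
    [1-5]) ;;          # valid star rating
    *) return 2 ;;     # missing or out-of-range rating
  esac
  [ "$rating" -ge "$threshold" ]
}

# Print how each star rating is treated under the current threshold.
for r in 1 2 3 4 5; do
  if sft_eligible "$r"; then
    echo "$r stars: included in SFT export"
  else
    echo "$r stars: excluded from SFT export"
  fi
done
```

Passing a second argument (for example `sft_eligible 3 3`) shows what a lower threshold would admit without changing the default.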

Preference Pairs (DPO)

Direct Preference Optimization (DPO) training requires pairs: a chosen response and a rejected response for the same prompt.

Creating a Pair

1. Find two interactions with the same or similar user prompt but different responses.
2. Open the better response, set Preference to Chosen, enter a Pair ID (any string, e.g., groceries-1), and click Save.
3. Open the worse response, set Preference to Rejected, enter the same Pair ID, and click Save.

The pair is now linked. When you export DPO training data, both interactions are combined into a single training example with the prompt, chosen response, and rejected response.
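The same pair can be created without the UI by posting two annotations that share a pair ID to the annotate endpoint listed under API Endpoints. A minimal sketch, assuming the dashboard is on port 3141 and that the request body accepts the `rating`, `preference`, and `pairId` fields shown in the Save Annotation example. The `WILSON_URL` variable and the `annotate` and `make_pair` functions are hypothetical names for illustration. Because saving is an upsert, each request sends the rating along with the preference, in case fields left out of a replacement are cleared.

```bash
#!/usr/bin/env bash
# Sketch: link two interactions as a DPO preference pair via the dashboard API.
# WILSON_URL, annotate and make_pair are hypothetical names, not Wilson commands.
WILSON_URL="${WILSON_URL:-http://localhost:3141}"

# annotate ID RATING PREFERENCE PAIR_ID
# POSTs one annotation; the JSON keys match the documented Save Annotation example.
# PAIR_ID is inserted unescaped, so stick to letters, digits and hyphens.
annotate() {
  local id="$1" rating="$2" preference="$3" pair_id="$4"
  curl -sS -X POST "$WILSON_URL/api/interactions/$id/annotate" \
    -H "Content-Type: application/json" \
    -d "{\"rating\": $rating, \"preference\": \"$preference\", \"pairId\": \"$pair_id\"}"
}

# make_pair CHOSEN_ID CHOSEN_RATING REJECTED_ID REJECTED_RATING PAIR_ID
# Marks one interaction as chosen and one as rejected under the same pair ID.
# The rejected annotation is only sent if the chosen one succeeded.
make_pair() {
  annotate "$1" "$2" chosen "$5" &&
    annotate "$3" "$4" rejected "$5"
}

# Example (commented out so sourcing this file does not call the API):
# make_pair 42 5 57 1 categorize-groceries-1
```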

Pair ID Tips

Use descriptive pair IDs: categorize-groceries-1, spending-review-2
Each pair needs exactly one chosen and one rejected interaction with the same pair ID
You can create pairs across different models — useful for comparing a cloud model’s response against a local model’s response
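The exactly-one-chosen, exactly-one-rejected rule is easy to check before exporting. A minimal sketch, assuming you have already extracted each annotation's pair ID and preference into tab-separated lines (for example from the annotations returned by the interaction detail endpoint); the `check_pairs` function and its input format are hypothetical:

```bash
#!/usr/bin/env bash
# check_pairs: read "PAIR_ID<TAB>PREFERENCE" lines on stdin and report every
# pair ID that does not have exactly one chosen and one rejected entry.
# Prints one problem per line and exits 1 if any pair is incomplete.
check_pairs() {
  awk -F '\t' '
    $1 != "" {
      seen[$1] = 1
      if ($2 == "chosen")   chosen[$1]++
      if ($2 == "rejected") rejected[$1]++
    }
    END {
      bad = 0
      for (id in seen) {
        c = chosen[id] + 0
        r = rejected[id] + 0
        if (c != 1 || r != 1) {
          printf "%s: %d chosen, %d rejected\n", id, c, r
          bad = 1
        }
      }
      exit bad
    }'
}

# spending-review-2 is missing its rejected half, so it is reported.
printf 'groceries-1\tchosen\ngroceries-1\trejected\nspending-review-2\tchosen\n' |
  check_pairs || echo "fix the pairs above before exporting"
```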

Saving Annotations

Click Save to persist the annotation. Annotations are stored in the interaction_annotations table and survive across sessions.

Saving is an upsert — if an annotation already exists for the interaction, it’s replaced with the new values.

API Endpoints

You can also browse and annotate interactions programmatically through the dashboard API, served at http://localhost:3141:

List Interactions

# Quote the URLs so the shell does not treat ? and & specially
# All interactions (paginated)
curl "http://localhost:3141/api/interactions?limit=50&offset=0"

# Filter by call type
curl "http://localhost:3141/api/interactions?callType=agent"

# Only unannotated
curl "http://localhost:3141/api/interactions?annotated=false"
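When scripting against the list endpoint, it helps to build the query string in one place. A minimal sketch, assuming the documented parameters (`limit`, `offset`, `callType`, `annotated`) can be combined in a single request; the examples above use them only separately, and no query parameter for the Rating filter is documented here, so it is left out. The `list_url` function and `WILSON_URL` variable are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a /api/interactions URL from optional filters.
# Usage: list_url [CALL_TYPE] [ANNOTATED] [LIMIT] [OFFSET]
# Pass "" to skip a filter. Only parameters shown in this guide are used.
WILSON_URL="${WILSON_URL:-http://localhost:3141}"

list_url() {
  local call_type="$1" annotated="$2" limit="${3:-50}" offset="${4:-0}"
  local query="limit=$limit&offset=$offset"
  if [ -n "$call_type" ]; then query="$query&callType=$call_type"; fi
  if [ -n "$annotated" ]; then query="$query&annotated=$annotated"; fi
  printf '%s/api/interactions?%s\n' "$WILSON_URL" "$query"
}

# Example: second page of unannotated agent calls (quoted, because of & and ?):
# curl "$(list_url agent false 50 50)"
```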

Get Interaction Detail

curl http://localhost:3141/api/interactions/42

Returns the interaction with its tool results and annotations.

Get All Interactions in a Run

curl http://localhost:3141/api/runs/abc-123-uuid

Save Annotation

curl -X POST http://localhost:3141/api/interactions/42/annotate \
  -H "Content-Type: application/json" \
  -d '{"rating": 5, "preference": "chosen", "pairId": "pair-1", "notes": "Great categorization"}'
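To rate many interactions at once, you can loop over the annotate endpoint. A minimal sketch, assuming a tab-separated file of `ID<TAB>RATING<TAB>NOTES` lines that you prepare yourself, and assuming `rating` and `notes` may be sent without `preference` or `pairId` (the example above sends all four). The `json_escape` and `rate_from_tsv` functions are hypothetical names. Since saving is an upsert, re-running the script replaces earlier annotations for the same IDs, so avoid running it over interactions that already belong to a preference pair.

```bash
#!/usr/bin/env bash
# Sketch: bulk-rate interactions through POST /api/interactions/<id>/annotate.
# Input (hypothetical format): lines of "ID<TAB>RATING<TAB>NOTES" on stdin.
WILSON_URL="${WILSON_URL:-http://localhost:3141}"

# json_escape TEXT: escape backslashes, double quotes and tabs for a JSON string.
# Newlines cannot occur because each note comes from a single input line.
json_escape() {
  local s
  s=${1//\\/\\\\}      # backslash -> \\
  s=${s//\"/\\\"}      # quote     -> \"
  s=${s//$'\t'/\\t}    # tab       -> \t
  printf '%s' "$s"
}

# rate_from_tsv: POST one annotation per input line; skip invalid ratings.
rate_from_tsv() {
  local id rating notes
  while IFS=$'\t' read -r id rating notes; do
    case "$rating" in
      [1-5]) ;;
      *) echo "skipping $id: rating must be 1-5" >&2; continue ;;
    esac
    curl -sS -X POST "$WILSON_URL/api/interactions/$id/annotate" \
      -H "Content-Type: application/json" \
      -d "{\"rating\": $rating, \"notes\": \"$(json_escape "$notes")\"}"
    echo   # newline between responses
  done
}

# Example: rate_from_tsv < ratings.tsv
```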

Annotation Stats

curl http://localhost:3141/api/annotations/stats

Returns total interactions, annotated count, rating distribution, DPO pair count, and SFT-ready count.