Exporting Training Data

Wilson exports training data in HuggingFace TRL-compatible JSONL formats. Two export modes are available: SFT (Supervised Fine-Tuning) for high-quality examples, and DPO (Direct Preference Optimization) for preference pairs.

Open the Training tab at http://localhost:3141 and click:

  • Export SFT JSONL — Downloads wilson-sft.jsonl
  • Export DPO JSONL — Downloads wilson-dpo.jsonl
```shell
# Default: rating >= 4, agent calls only
curl -o wilson-sft.jsonl http://localhost:3141/api/export/training/sft

# Custom filters
curl -o wilson-sft.jsonl \
  "http://localhost:3141/api/export/training/sft?minRating=3&callTypes=agent,categorize&model=gpt-5.2"
```

Parameters:

| Param | Default | Description |
| --- | --- | --- |
| `minRating` | `4` | Minimum star rating to include |
| `callTypes` | `agent` | Comma-separated call types to include |
| `model` | (all) | Filter by specific model |
```shell
curl -o wilson-dpo.jsonl http://localhost:3141/api/export/training/dpo
```

No parameters — exports all valid preference pairs (interactions with matching pair_id where one is chosen and one is rejected).

```shell
curl http://localhost:3141/api/export/training/stats
```

```json
{
  "totalInteractions": 847,
  "annotated": 124,
  "sftReady": 89,
  "dpoPairs": 15
}
```

Each line is a complete conversation in OpenAI messages format:

```json
{
  "messages": [
    {"role": "system", "content": "You are Wilson, an AI bookkeeper..."},
    {"role": "user", "content": "What did I spend on groceries last month?"},
    {"role": "assistant", "content": "", "tool_calls": [
      {"id": "tc_1", "type": "function", "function": {"name": "spending_summary", "arguments": "{\"category\":\"groceries\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "tc_1", "content": "{\"total\": 342.50, \"count\": 12}"},
    {"role": "assistant", "content": "You spent $342.50 on groceries last month across 12 transactions."}
  ]
}
```
Each SFT export line includes:

  • Full multi-turn conversations (all interactions in a run)
  • System prompt from the first call in the run
  • Tool calls with function name and arguments
  • Tool results linked by tool_call_id
  • Only runs containing at least one interaction rated >= minRating

The exporter groups interactions by run_id and orders by sequence_num to reconstruct the full conversation. A single run with 3 LLM calls becomes one JSONL line with the complete multi-turn sequence.
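The grouping step can be sketched as follows. This is an illustration, not the exporter's actual code, and the row fields (`run_id`, `sequence_num`, `messages`) are assumptions about Wilson's internal schema:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical interaction rows; sequence_num is deliberately out of order
# to show that sorting restores the conversation order.
interactions = [
    {"run_id": "r1", "sequence_num": 2, "messages": [{"role": "assistant", "content": "Done."}]},
    {"run_id": "r1", "sequence_num": 1, "messages": [{"role": "user", "content": "Categorize this."}]},
    {"run_id": "r2", "sequence_num": 1, "messages": [{"role": "user", "content": "Hi"}]},
]

def reconstruct_runs(rows):
    """Group rows by run_id, order by sequence_num, and concatenate messages."""
    rows = sorted(rows, key=itemgetter("run_id", "sequence_num"))
    runs = {}
    for run_id, group in groupby(rows, key=itemgetter("run_id")):
        msgs = []
        for row in group:
            msgs.extend(row["messages"])
        runs[run_id] = {"messages": msgs}
    return runs

runs = reconstruct_runs(interactions)
# Run r1's two calls collapse into one multi-turn conversation: user, then assistant
print([m["role"] for m in runs["r1"]["messages"]])
```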

Each line contains a prompt with chosen and rejected responses:

```json
{
  "prompt": "Categorize my Amazon transactions",
  "chosen": [
    {"role": "system", "content": "You are Wilson..."},
    {"role": "assistant", "content": "I've categorized your 8 Amazon transactions..."}
  ],
  "rejected": [
    {"role": "system", "content": "You are Wilson..."},
    {"role": "assistant", "content": "Here are your Amazon purchases..."}
  ]
}
```

A DPO pair requires:

  1. Two interactions linked by the same pair_id
  2. One marked with preference: "chosen"
  3. One marked with preference: "rejected"

If a pair_id has only a chosen or only a rejected interaction, it’s skipped.
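The pairing rules above can be sketched in a few lines. The row shape is hypothetical (a real interaction record has more fields), but the chosen/rejected matching and the skip-incomplete-pairs behavior mirror the rules as stated:

```python
from collections import defaultdict

# Hypothetical annotated interactions; p2 has no rejected partner
interactions = [
    {"pair_id": "p1", "preference": "chosen",
     "prompt": "Categorize my Amazon transactions", "response": "I've categorized..."},
    {"pair_id": "p1", "preference": "rejected",
     "prompt": "Categorize my Amazon transactions", "response": "Here are..."},
    {"pair_id": "p2", "preference": "chosen",
     "prompt": "Summarize spending", "response": "..."},
]

def build_dpo_pairs(rows):
    """Match chosen/rejected interactions by pair_id; skip incomplete pairs."""
    by_pair = defaultdict(dict)
    for row in rows:
        by_pair[row["pair_id"]][row["preference"]] = row
    pairs = []
    for sides in by_pair.values():
        if "chosen" in sides and "rejected" in sides:
            pairs.append({
                "prompt": sides["chosen"]["prompt"],
                "chosen": sides["chosen"]["response"],
                "rejected": sides["rejected"]["response"],
            })
    return pairs

pairs = build_dpo_pairs(interactions)
print(len(pairs))  # p2 is skipped: it has only a chosen side
```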

```shell
# Each line should be valid JSON
cat wilson-sft.jsonl | python3 -c "
import sys, json
for i, line in enumerate(sys.stdin, 1):
    try:
        data = json.loads(line)
        msgs = data.get('messages', [])
        print(f'Line {i}: {len(msgs)} messages, roles: {[m[\"role\"] for m in msgs]}')
    except json.JSONDecodeError as e:
        print(f'Line {i}: INVALID JSON - {e}')
"
```
You can also confirm the file loads with the HuggingFace `datasets` library:

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
print(f"Examples: {len(ds)}")
print(f"First example roles: {[m['role'] for m in ds[0]['messages']]}")
```
| Training Method | Minimum Examples | Recommended | Notes |
| --- | --- | --- | --- |
| SFT | 50 | 200-500 | More is better, but quality matters more than quantity |
| DPO | 20 pairs | 50-100 pairs | Pairs should cover diverse prompt types |

For financial bookkeeping tasks, 200-300 high-quality SFT examples typically produce noticeable improvements in a 7B parameter model.
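As a convenience, the thresholds above can be turned into a quick pre-training sanity check. This helper is hypothetical (not part of Wilson); the numbers are taken directly from the table:

```python
# Thresholds from the dataset-size table above
MINIMUMS = {"sft": 50, "dpo": 20}            # minimum examples / pairs
RECOMMENDED = {"sft": (200, 500), "dpo": (50, 100)}

def dataset_verdict(method, count):
    """Classify a dataset size against the minimum and recommended ranges."""
    rec_lo, _rec_hi = RECOMMENDED[method]
    if count < MINIMUMS[method]:
        return "too small"
    if count < rec_lo:
        return "usable, but below the recommended range"
    return "within or above the recommended range"

# 89 matches the sftReady count from the stats example earlier
print(dataset_verdict("sft", 89))
```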