Exporting Training Data

Wilson exports training data in HuggingFace TRL-compatible JSONL formats. Two export modes are available: SFT (Supervised Fine-Tuning) for high-quality examples, and DPO (Direct Preference Optimization) for preference pairs.

Open the Training tab at http://localhost:3141 and click:

  • Export SFT JSONL — Downloads wilson-sft.jsonl
  • Export DPO JSONL — Downloads wilson-dpo.jsonl
```shell
# Default: rating >= 4, agent calls only
curl -o wilson-sft.jsonl http://localhost:3141/api/export/training/sft

# Custom filters
curl -o wilson-sft.jsonl \
  "http://localhost:3141/api/export/training/sft?minRating=3&callTypes=agent,categorize&model=gpt-5.2"
```

Parameters:

| Param | Default | Description |
| --- | --- | --- |
| `minRating` | `4` | Minimum star rating to include |
| `callTypes` | `agent` | Comma-separated call types to include |
| `model` | (all) | Filter by specific model |
```shell
curl -o wilson-dpo.jsonl http://localhost:3141/api/export/training/dpo
```

No parameters — exports all valid preference pairs (interactions with matching pair_id where one is chosen and one is rejected).

```shell
curl http://localhost:3141/api/export/training/stats
```

```json
{
  "totalInteractions": 847,
  "annotated": 124,
  "sftReady": 89,
  "dpoPairs": 15
}
```

Each line is a complete conversation in OpenAI messages format:

```json
{
  "messages": [
    {"role": "system", "content": "You are Wilson, an AI bookkeeper..."},
    {"role": "user", "content": "What did I spend on groceries last month?"},
    {"role": "assistant", "content": "", "tool_calls": [
      {"id": "tc_1", "type": "function", "function": {"name": "spending_summary", "arguments": "{\"category\":\"groceries\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "tc_1", "content": "{\"total\": 342.50, \"count\": 12}"},
    {"role": "assistant", "content": "You spent $342.50 on groceries last month across 12 transactions."}
  ]
}
```
Each SFT export line includes:

  • Full multi-turn conversations (all interactions in a run)
  • System prompt from the first call in the run
  • Tool calls with function name and arguments
  • Tool results linked by tool_call_id
  • Only runs containing at least one interaction rated >= minRating

The exporter groups interactions by run_id and orders by sequence_num to reconstruct the full conversation. A single run with 3 LLM calls becomes one JSONL line with the complete multi-turn sequence.
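The grouping step can be sketched as follows. This is an illustration, not the exporter's actual code, and the row fields (`run_id`, `sequence_num`, `messages`) are assumptions about Wilson's internal schema:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical interaction rows; sequence_num is deliberately out of order
# to show that sorting restores the conversation order.
interactions = [
    {"run_id": "r1", "sequence_num": 2, "messages": [{"role": "assistant", "content": "Done."}]},
    {"run_id": "r1", "sequence_num": 1, "messages": [{"role": "user", "content": "Categorize this."}]},
    {"run_id": "r2", "sequence_num": 1, "messages": [{"role": "user", "content": "Hi"}]},
]

def reconstruct_runs(rows):
    """Group rows by run_id, order by sequence_num, and concatenate messages."""
    rows = sorted(rows, key=itemgetter("run_id", "sequence_num"))
    runs = {}
    for run_id, group in groupby(rows, key=itemgetter("run_id")):
        msgs = []
        for row in group:
            msgs.extend(row["messages"])
        runs[run_id] = {"messages": msgs}
    return runs

runs = reconstruct_runs(interactions)
# Run r1's two calls collapse into one multi-turn conversation: user, then assistant
print([m["role"] for m in runs["r1"]["messages"]])
```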

Each line contains a prompt with chosen and rejected responses:

```json
{
  "prompt": "Categorize my Amazon transactions",
  "chosen": [
    {"role": "system", "content": "You are Wilson..."},
    {"role": "assistant", "content": "I've categorized your 8 Amazon transactions..."}
  ],
  "rejected": [
    {"role": "system", "content": "You are Wilson..."},
    {"role": "assistant", "content": "Here are your Amazon purchases..."}
  ]
}
```

A DPO pair requires:

  1. Two interactions linked by the same pair_id
  2. One marked with preference: "chosen"
  3. One marked with preference: "rejected"

If a pair_id has only a chosen or only a rejected interaction, it’s skipped.
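The pairing rules above can be sketched in a few lines. The row shape is hypothetical (a real interaction record has more fields), but the chosen/rejected matching and the skip-incomplete-pairs behavior mirror the rules as stated:

```python
from collections import defaultdict

# Hypothetical annotated interactions; p2 has no rejected partner
interactions = [
    {"pair_id": "p1", "preference": "chosen",
     "prompt": "Categorize my Amazon transactions", "response": "I've categorized..."},
    {"pair_id": "p1", "preference": "rejected",
     "prompt": "Categorize my Amazon transactions", "response": "Here are..."},
    {"pair_id": "p2", "preference": "chosen",
     "prompt": "Summarize spending", "response": "..."},
]

def build_dpo_pairs(rows):
    """Match chosen/rejected interactions by pair_id; skip incomplete pairs."""
    by_pair = defaultdict(dict)
    for row in rows:
        by_pair[row["pair_id"]][row["preference"]] = row
    pairs = []
    for sides in by_pair.values():
        if "chosen" in sides and "rejected" in sides:
            pairs.append({
                "prompt": sides["chosen"]["prompt"],
                "chosen": sides["chosen"]["response"],
                "rejected": sides["rejected"]["response"],
            })
    return pairs

pairs = build_dpo_pairs(interactions)
print(len(pairs))  # p2 is skipped: it has only a chosen side
```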

```shell
# Each line should be valid JSON
cat wilson-sft.jsonl | python3 -c "
import sys, json
for i, line in enumerate(sys.stdin, 1):
    try:
        data = json.loads(line)
        msgs = data.get('messages', [])
        print(f'Line {i}: {len(msgs)} messages, roles: {[m[\"role\"] for m in msgs]}')
    except json.JSONDecodeError as e:
        print(f'Line {i}: INVALID JSON - {e}')
"
```
You can also confirm the file loads with the HuggingFace `datasets` library:

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
print(f"Examples: {len(ds)}")
print(f"First example roles: {[m['role'] for m in ds[0]['messages']]}")
```
| Training Method | Minimum Examples | Recommended | Notes |
| --- | --- | --- | --- |
| SFT | 50 | 200-500 | More is better, but quality matters more than quantity |
| DPO | 20 pairs | 50-100 pairs | Pairs should cover diverse prompt types |

For financial bookkeeping tasks, 200-300 high-quality SFT examples typically produce noticeable improvements in a 7B parameter model.
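As a convenience, the thresholds above can be turned into a quick pre-training sanity check. This helper is hypothetical (not part of Wilson); the numbers are taken directly from the table:

```python
# Thresholds from the dataset-size table above
MINIMUMS = {"sft": 50, "dpo": 20}            # minimum examples / pairs
RECOMMENDED = {"sft": (200, 500), "dpo": (50, 100)}

def dataset_verdict(method, count):
    """Classify a dataset size against the minimum and recommended ranges."""
    rec_lo, _rec_hi = RECOMMENDED[method]
    if count < MINIMUMS[method]:
        return "too small"
    if count < rec_lo:
        return "usable, but below the recommended range"
    return "within or above the recommended range"

# 89 matches the sftReady count from the stats example earlier
print(dataset_verdict("sft", 89))
```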