Exporting Training Data
Wilson exports training data in HuggingFace TRL-compatible JSONL formats. Two export modes are available: SFT (Supervised Fine-Tuning) for high-quality examples, and DPO (Direct Preference Optimization) for preference pairs.
Export from the Dashboard
Open the Training tab at http://localhost:3141 and click:
- Export SFT JSONL: downloads `wilson-sft.jsonl`
- Export DPO JSONL: downloads `wilson-dpo.jsonl`
Export via API
SFT Export

```bash
# Default: rating >= 4, agent calls only
curl -o wilson-sft.jsonl http://localhost:3141/api/export/training/sft

# Custom filters
curl -o wilson-sft.jsonl \
  "http://localhost:3141/api/export/training/sft?minRating=3&callTypes=agent,categorize&model=gpt-5.2"
```

Parameters:
| Param | Default | Description |
|---|---|---|
| `minRating` | 4 | Minimum star rating to include |
| `callTypes` | agent | Comma-separated call types to include |
| `model` | (all) | Filter by specific model |
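
The same query string can be assembled programmatically. The following is a minimal sketch, not part of Wilson itself: `build_sft_export_url` is a hypothetical helper, and only the endpoint path and the three parameter names come from the table above.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples above.
BASE_URL = "http://localhost:3141/api/export/training/sft"


def build_sft_export_url(min_rating=None, call_types=None, model=None):
    """Hypothetical helper: return the SFT export URL with only the filters that were set."""
    params = {}
    if min_rating is not None:
        params["minRating"] = min_rating
    if call_types:
        # callTypes is documented as a comma-separated list.
        params["callTypes"] = ",".join(call_types)
    if model:
        params["model"] = model
    # safe="," keeps the comma separator readable instead of encoding it as %2C.
    query = urlencode(params, safe=",")
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(build_sft_export_url(min_rating=3, call_types=["agent", "categorize"], model="gpt-5.2"))
```

Omitting every argument returns the bare endpoint, which leaves the server-side defaults in effect.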
DPO Export

```bash
curl -o wilson-dpo.jsonl http://localhost:3141/api/export/training/dpo
```

This endpoint takes no parameters. It exports all valid preference pairs (interactions with matching `pair_id` where one is chosen and one is rejected).
Training Stats

```bash
curl http://localhost:3141/api/export/training/stats
```

Example response:

```json
{
  "totalInteractions": 847,
  "annotated": 124,
  "sftReady": 89,
  "dpoPairs": 15
}
```

SFT JSONL Format
Each line is a complete conversation in OpenAI messages format:

```json
{
  "messages": [
    {"role": "system", "content": "You are Wilson, an AI bookkeeper..."},
    {"role": "user", "content": "What did I spend on groceries last month?"},
    {"role": "assistant", "content": "", "tool_calls": [
      {"id": "tc_1", "type": "function", "function": {"name": "spending_summary", "arguments": "{\"category\":\"groceries\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "tc_1", "content": "{\"total\": 342.50, \"count\": 12}"},
    {"role": "assistant", "content": "You spent $342.50 on groceries last month across 12 transactions."}
  ]
}
```

What’s Included
- Full multi-turn conversations (all interactions in a run)
- System prompt from the first call in the run
- Tool calls with function name and arguments
- Tool results linked by `tool_call_id`
- Only runs containing at least one interaction rated >= `minRating`
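
One way to spot-check the `tool_call_id` linkage in an exported conversation is sketched below. `unlinked_tool_results` is a hypothetical helper written for this page; it assumes only the message shape shown in the example above.

```python
def unlinked_tool_results(messages):
    """Return the tool_call_id of every tool message that has no earlier
    assistant tool call with the same id."""
    seen_call_ids = set()
    missing = []
    for msg in messages:
        if msg["role"] == "assistant":
            # Record every tool call the assistant has issued so far.
            for call in msg.get("tool_calls") or []:
                seen_call_ids.add(call["id"])
        elif msg["role"] == "tool":
            if msg.get("tool_call_id") not in seen_call_ids:
                missing.append(msg.get("tool_call_id"))
    return missing


conversation = [
    {"role": "user", "content": "What did I spend on groceries last month?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "tc_1", "type": "function",
         "function": {"name": "spending_summary", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "tc_1", "content": "{\"total\": 342.50}"},
]
print(unlinked_tool_results(conversation))  # prints []
```

An empty list means every tool result in the conversation points back to a tool call that precedes it.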
Conversation Reconstruction
The exporter groups interactions by `run_id` and orders by `sequence_num` to reconstruct the full conversation. A single run with 3 LLM calls becomes one JSONL line with the complete multi-turn sequence.
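
The grouping and ordering step might look roughly like the sketch below. This is an illustration under stated assumptions, not Wilson's exporter: interactions are modeled as plain dicts, `rating` is a hypothetical field name, and only `run_id` and `sequence_num` are taken from the description above.

```python
from collections import defaultdict


def group_runs(interactions, min_rating=4):
    """Group interactions by run_id, keep runs with at least one interaction
    rated >= min_rating, and order each kept run by sequence_num."""
    runs = defaultdict(list)
    for row in interactions:
        runs[row["run_id"]].append(row)

    selected = {}
    for run_id, rows in runs.items():
        # "rating" is an assumed field name; unrated interactions count as 0.
        if any((r.get("rating") or 0) >= min_rating for r in rows):
            selected[run_id] = sorted(rows, key=lambda r: r["sequence_num"])
    return selected


rows = [
    {"run_id": "a", "sequence_num": 2, "rating": 5},
    {"run_id": "a", "sequence_num": 1, "rating": None},
    {"run_id": "b", "sequence_num": 1, "rating": 2},
]
for run_id, ordered in group_runs(rows).items():
    print(run_id, [r["sequence_num"] for r in ordered])  # prints: a [1, 2]
```

Each entry in the returned dict corresponds to one line in the exported file; turning the ordered interactions into a single `messages` list is left out because it depends on how Wilson stores each call.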
DPO JSONL Format
Each line contains a prompt with chosen and rejected responses:

```json
{
  "prompt": "Categorize my Amazon transactions",
  "chosen": [
    {"role": "system", "content": "You are Wilson..."},
    {"role": "assistant", "content": "I've categorized your 8 Amazon transactions..."}
  ],
  "rejected": [
    {"role": "system", "content": "You are Wilson..."},
    {"role": "assistant", "content": "Here are your Amazon purchases..."}
  ]
}
```

What Makes a Valid Pair
A DPO pair requires:
- Two interactions linked by the same `pair_id`
- One marked with `preference: "chosen"`
- One marked with `preference: "rejected"`
If a pair_id has only a chosen or only a rejected interaction, it’s skipped.
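
The pairing rule can be expressed as a small filter. The sketch below is a hypothetical illustration rather than the exporter's actual code; it assumes interactions are dicts carrying the `pair_id` and `preference` fields named above, and that a repeated preference within one `pair_id` simply overwrites the earlier one.

```python
from collections import defaultdict


def valid_pairs(interactions):
    """Return {pair_id: {"chosen": row, "rejected": row}} for complete pairs only."""
    by_pair = defaultdict(dict)
    for row in interactions:
        pair_id = row.get("pair_id")
        preference = row.get("preference")
        if pair_id is not None and preference in ("chosen", "rejected"):
            by_pair[pair_id][preference] = row
    # A pair is valid only when both sides are present.
    return {pid: sides for pid, sides in by_pair.items() if len(sides) == 2}


rows = [
    {"id": 1, "pair_id": "p1", "preference": "chosen"},
    {"id": 2, "pair_id": "p1", "preference": "rejected"},
    {"id": 3, "pair_id": "p2", "preference": "chosen"},  # no rejected side, skipped
    {"id": 4, "pair_id": None, "preference": None},      # not part of any pair
]
print(sorted(valid_pairs(rows)))  # prints ['p1']
```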
Verifying Export Quality
Check JSONL Validity

```bash
# Each line should be valid JSON
cat wilson-sft.jsonl | python3 -c "
import sys, json
for i, line in enumerate(sys.stdin, 1):
    try:
        data = json.loads(line)
        msgs = data.get('messages', [])
        print(f'Line {i}: {len(msgs)} messages, roles: {[m[\"role\"] for m in msgs]}')
    except json.JSONDecodeError as e:
        print(f'Line {i}: INVALID JSON - {e}')
"
```

Inspect with HuggingFace Datasets

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
print(f"Examples: {len(ds)}")
print(f"First example roles: {[m['role'] for m in ds[0]['messages']]}")
```

Data Volume Guidelines
| Training Method | Minimum Examples | Recommended | Notes |
|---|---|---|---|
| SFT | 50 | 200-500 | More is better, but quality matters more than quantity |
| DPO | 20 pairs | 50-100 pairs | Pairs should cover diverse prompt types |
For financial bookkeeping tasks, 200-300 high-quality SFT examples typically produce noticeable improvements in a 7B-parameter model.
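
The stats endpoint output can be compared against the table above before starting a training run. The sketch below is a hypothetical helper: it assumes `sftReady` counts SFT examples and `dpoPairs` counts DPO pairs, and it uses the lower end of each recommended range as the threshold.

```python
# (minimum, lower end of recommended range), taken from the guidelines table.
GUIDELINES = {"sftReady": (50, 200), "dpoPairs": (20, 50)}


def readiness(stats):
    """Classify each count in a stats response against the volume guidelines."""
    report = {}
    for key, (minimum, recommended) in GUIDELINES.items():
        count = stats.get(key, 0)
        if count < minimum:
            report[key] = "below minimum"
        elif count < recommended:
            report[key] = "usable, below recommended"
        else:
            report[key] = "recommended range or above"
    return report


# Sample values from the Training Stats response shown earlier.
stats = {"totalInteractions": 847, "annotated": 124, "sftReady": 89, "dpoPairs": 15}
print(readiness(stats))
# prints {'sftReady': 'usable, below recommended', 'dpoPairs': 'below minimum'}
```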