Training Data Pipeline
Wilson captures every LLM interaction — prompts, responses, tool calls, and results — to a local SQLite database. You can annotate interactions for quality, then export training data in HuggingFace TRL-compatible formats for fine-tuning your own models.
Why Fine-Tune
Cloud LLMs work out of the box. But if you want to:
- Run fully offline with a local model that knows your financial vocabulary
- Reduce latency and cost by using a smaller model trained on your usage patterns
- Improve accuracy for your specific categorization and spending analysis tasks
- Keep everything private — train on your data without sending it to a third party
Then the training pipeline lets you close the loop: use Wilson with a cloud model, annotate the best interactions, export training data, fine-tune a local model, and switch to it.
Pipeline Overview
1. Use Wilson normally (interactions are auto-captured)
2. Review interactions in the Dashboard Training tab
3. Annotate: rate quality (1-5 stars), mark chosen/rejected pairs
4. Export: SFT JSONL (supervised) or DPO JSONL (preference)
5. Fine-tune: LoRA adapter via HuggingFace
6. Deploy: convert to GGUF, load into Ollama
What Gets Captured
Every call to Wilson’s LLM facade is recorded automatically:
| Field | Description |
|---|---|
| `run_id` | Groups all LLM calls within a single agent run |
| `sequence_num` | Order of calls within the run (1, 2, 3…) |
| `call_type` | `agent`, `summarize`, `relevance`, or `categorize` |
| `system_prompt` | Full system prompt sent to the model |
| `user_prompt` | The user’s query or iteration prompt |
| `response_content` | The model’s text response |
| `tool_calls_json` | JSON array of tool calls (name, args, id) |
| `tool_defs_json` | Names of all tools available to the model |
| Token counts | Input, output, and total tokens |
| `duration_ms` | Wall-clock time for the API call |
Tool execution results are captured separately and linked to the LLM call that triggered them.
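Because everything lands in plain SQLite, you can inspect a captured run with a few lines of Python. The sketch below is a minimal, hypothetical example: the table name (`llm_calls`) and DDL are assumptions based on the fields listed above, and it uses an in-memory database with sample rows so it runs standalone — against a real install you would open `~/.openaccountant/data.db` and consult the Interaction Capture page for the actual schema.

```python
import sqlite3

# Hypothetical schema mirroring the captured fields listed above;
# Wilson's real table name and columns may differ.
conn = sqlite3.connect(":memory:")  # swap in ~/.openaccountant/data.db
conn.execute("""
    CREATE TABLE llm_calls (
        run_id TEXT, sequence_num INTEGER, call_type TEXT,
        system_prompt TEXT, user_prompt TEXT, response_content TEXT,
        tool_calls_json TEXT, duration_ms INTEGER
    )
""")
conn.executemany(
    "INSERT INTO llm_calls VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("run-1", 1, "agent", "You are Wilson...", "Categorize my groceries",
         "Calling search_transactions", '[{"name": "search_transactions"}]', 840),
        ("run-1", 2, "agent", "You are Wilson...", "(tool results)",
         "Groceries: $412 this month", "[]", 610),
    ],
)

# Reconstruct a single agent run in call order via sequence_num.
rows = conn.execute(
    "SELECT sequence_num, call_type, response_content "
    "FROM llm_calls WHERE run_id = ? ORDER BY sequence_num",
    ("run-1",),
).fetchall()
for seq, call_type, response in rows:
    print(f"{seq}. [{call_type}] {response}")
```

Grouping by `run_id` and ordering by `sequence_num` is what lets the exporter later stitch multi-turn agent runs back into coherent training conversations.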
No Configuration Required
Interaction capture is always on. There’s nothing to enable — every LLM call is recorded to ~/.openaccountant/data.db alongside your transaction data.
The capture is lightweight: it uses the same SQLite database and prepared-statement pattern as the existing trace system, adding negligible overhead to each LLM call.
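Once interactions are annotated, the export step produces JSONL that HuggingFace TRL can consume directly. As a rough sketch — the field names below follow TRL’s documented conversational SFT and DPO preference formats, while the example content is invented; see the Exporting Training Data page for Wilson’s exact output:

```python
import json

# SFT: TRL's conversational format, one JSON object per line.
sft_record = {
    "messages": [
        {"role": "system", "content": "You are Wilson, a personal finance agent."},
        {"role": "user", "content": "Categorize: WHOLEFDS #123 $84.10"},
        {"role": "assistant", "content": "Category: Groceries"},
    ]
}

# DPO: a prompt plus a chosen/rejected preference pair,
# built from interactions you marked during annotation.
dpo_record = {
    "prompt": "Categorize: WHOLEFDS #123 $84.10",
    "chosen": "Category: Groceries",
    "rejected": "Category: Shopping",
}

# Each record is serialized as one line of the exported .jsonl file.
print(json.dumps(sft_record))
print(json.dumps(dpo_record))
```

The star ratings drive which interactions qualify for SFT export, while chosen/rejected pairs map directly onto the DPO records.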
Pages in This Section
- Interaction Capture — Schema, storage, and what’s recorded
- Annotating Interactions — Rating, preference pairing, and the Dashboard Training tab
- Exporting Training Data — SFT and DPO JSONL formats for HuggingFace TRL
- Fine-Tuning with LoRA — End-to-end guide to training and deploying a custom model
- Deploying Trained Models — Deploy your adapter to Ollama or Transformers.js
- End-to-End Tutorial — Complete walkthrough from captured interactions to deployed model