
End-to-End Training Tutorial

This tutorial walks through every step of training and deploying a custom model, start to finish. Each step links to detailed documentation if you want to go deeper.

Before starting, make sure you have:

  • Wilson installed and working (wilson --status shows transactions)
  • 50+ captured interactions (use Wilson with a cloud model for a week or two)
  • Ollama installed (brew install ollama)
  • Python 3.10+ with pip or uv
  • A HuggingFace account with a write token

Check how many interactions you have, either from the Dashboard or directly via the API:

curl http://localhost:3141/api/export/training/stats

You should see at least 50 interactions. More is better — 200+ produces noticeably stronger models.

Field                   Minimum   Recommended
Total interactions      50        200+
Annotated (4-5 stars)   30        100+
DPO pairs               10        50+

If you don’t have enough data, keep using Wilson with a cloud model and come back later.
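
If you prefer to script this check, the stats endpoint is easy to poll. A minimal sketch; the response schema isn't spelled out here, so it simply prints whatever comes back for you to compare against the table above:

import json
import urllib.request

# Fetch training-data stats from the local Wilson API and pretty-print them.
with urllib.request.urlopen("http://localhost:3141/api/export/training/stats") as resp:
    stats = json.load(resp)
print(json.dumps(stats, indent=2))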

Open the Dashboard:

/dashboard

Navigate to the Training section. For each interaction:

  1. Read the conversation — was the response helpful and accurate?
  2. Rate it 1-5 stars — 4-5 stars means “good enough to teach a model”
  3. Create DPO pairs — for interactions where the model could have done better, mark the cloud response as “chosen” and the weaker response as “rejected”

Aim for at least 30 interactions rated 4-5 stars and 10 DPO pairs before moving on.
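
Concretely, each pair you create exports as one record holding the prompt plus both responses. The values here are invented purely for illustration; the real field names appear in the verification step below:

# An illustrative DPO pair: the stronger (cloud) response is "chosen",
# the weaker one is "rejected".
pair = {
    "prompt": "What did I spend on dining out this month?",
    "chosen": "You spent $214.37 across 9 transactions this month...",
    "rejected": "I'm not able to see your transactions.",
}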

Export from the Dashboard or via the API:

# SFT format (supervised fine-tuning)
curl -o wilson-sft.jsonl http://localhost:3141/api/export/training/sft
# DPO format (preference pairs)
curl -o wilson-dpo.jsonl http://localhost:3141/api/export/training/dpo

See Exporting Training Data for format details and filtering options.

Inspect your data before training:

import json

# Check SFT data
with open("wilson-sft.jsonl") as f:
    lines = [json.loads(line) for line in f if line.strip()]
print(f"SFT examples: {len(lines)}")
print(f"First example messages: {len(lines[0]['messages'])}")
print(f"Roles: {[m['role'] for m in lines[0]['messages']]}")

# Check DPO data
with open("wilson-dpo.jsonl") as f:
    pairs = [json.loads(line) for line in f if line.strip()]
print(f"\nDPO pairs: {len(pairs)}")
if pairs:
    print(f"Keys: {list(pairs[0].keys())}")

Expected output: SFT lines have messages arrays with system, user, and assistant roles. DPO lines have prompt, chosen, and rejected fields.
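
To automate that check, a small validator can assert exactly those fields before you commit to a training run (a sketch based on the format described above):

import json

def check_jsonl(path, required_keys):
    # Assert that every record in a JSONL file contains the required keys.
    with open(path) as f:
        for n, line in enumerate(f, 1):
            if not line.strip():
                continue
            missing = required_keys - json.loads(line).keys()
            assert not missing, f"{path} line {n}: missing {missing}"

check_jsonl("wilson-sft.jsonl", {"messages"})
check_jsonl("wilson-dpo.jsonl", {"prompt", "chosen", "rejected"})
print("Both files look well-formed.")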

Next, push both datasets to the HuggingFace Hub so your training run can pull them:

from datasets import load_dataset
from huggingface_hub import login

login(token="hf_your_token")

# Upload SFT dataset
sft_ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
sft_ds.push_to_hub("your-username/wilson-bookkeeper-sft", private=True)

# Upload DPO dataset (if you have pairs)
dpo_ds = load_dataset("json", data_files="wilson-dpo.jsonl", split="train")
dpo_ds.push_to_hub("your-username/wilson-bookkeeper-dpo", private=True)

Install dependencies:

pip install transformers trl peft datasets accelerate bitsandbytes
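
A quick import check confirms the stack is usable before you launch a long run (torch is installed as a dependency of accelerate):

import torch, transformers, trl, peft, datasets  # noqa: F401

# True if a CUDA or Apple Silicon (MPS) device is visible to PyTorch.
print(torch.cuda.is_available() or torch.backends.mps.is_available())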

Run SFT training:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

dataset = load_dataset("your-username/wilson-bookkeeper-sft", split="train")
model_name = "Qwen/Qwen2.5-7B-Instruct"

# Load the base model in 4-bit so it fits on a single consumer GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
)
training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    warmup_steps=10,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
)
trainer = SFTTrainer(
    model=model, train_dataset=dataset,
    peft_config=lora_config, args=training_config,
    tokenizer=tokenizer,  # TRL >= 0.12 renamed this parameter to processing_class
)
trainer.train()
trainer.save_model("./wilson-sft-adapter")

Training time depends on hardware and dataset size. Expect 10-30 minutes on a GPU, 1-2 hours on CPU.

See Fine-Tuning with LoRA for DPO training, parameter tuning, and tips on avoiding overfitting.
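
If you exported DPO pairs, a follow-up preference pass with TRL's DPOTrainer looks roughly like the sketch below. It is not the full recipe from that guide: it assumes the SFT adapter saved above (transformers can load a PEFT adapter directory directly, pulling the base model named in the adapter config), and the tokenizer argument follows the older TRL naming used here:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

dataset = load_dataset("your-username/wilson-bookkeeper-dpo", split="train")
model = AutoModelForCausalLM.from_pretrained("./wilson-sft-adapter", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

dpo_config = DPOConfig(
    output_dir="./wilson-dpo-adapter",
    num_train_epochs=1,              # preference tuning needs far fewer passes than SFT
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    beta=0.1,                        # how strongly to stay close to the SFT model
)
trainer = DPOTrainer(
    model=model, train_dataset=dataset,
    args=dpo_config, tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("./wilson-dpo-adapter")

Whichever adapter you end up with, SFT or DPO, is the one you convert and deploy in the next step.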

Convert the LoRA adapter to GGUF so Ollama can load it. Note that convert_lora_to_gguf.py takes the adapter directory as a positional argument, and --base-model-id tells it which HuggingFace base model the adapter was trained from:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
python convert_lora_to_gguf.py ../wilson-sft-adapter \
  --base-model-id Qwen/Qwen2.5-7B-Instruct \
  --outfile wilson-adapter.gguf

Create a Modelfile. The FROM line must reference the base model by its Ollama library name, and it must match the base you trained the adapter against:

FROM qwen2.5:7b-instruct
ADAPTER ./wilson-adapter.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
SYSTEM """You are Wilson, an AI bookkeeper from Open Accountant. You help users manage their personal finances with precision and care. You have access to tools for importing transactions, categorizing spending, analyzing budgets, and generating reports."""

Build the model:

ollama create wilson-7b -f Modelfile

Then switch to it from the Wilson chat:

/model ollama:wilson-7b

Or set as default:

echo 'DEFAULT_MODEL=ollama:wilson-7b' >> .env

Test your model with representative queries:

wilson --run "Categorize my last 10 transactions"
wilson --run "What did I spend on dining out this month?"
wilson --run "Am I on track with my grocery budget?"

Compare responses against a cloud model. If quality is noticeably worse in specific areas, those interactions become your next batch of DPO training pairs.
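
One low-effort way to run that comparison is to script the prompts and capture each model's output, switching the active model between runs. This sketch uses only the wilson --run command shown above:

import subprocess

# Run a fixed prompt set through whatever model Wilson currently uses and
# save the answers. Run once with the trained model active and once with a
# cloud model, then diff the two files.
prompts = [
    "Categorize my last 10 transactions",
    "What did I spend on dining out this month?",
    "Am I on track with my grocery budget?",
]
with open("responses.txt", "w") as out:
    for p in prompts:
        result = subprocess.run(["wilson", "--run", p], capture_output=True, text=True)
        out.write(f"### {p}\n{result.stdout}\n\n")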

Common issues and how to fix them:

Symptom: Model outputs generic or repetitive responses. Fix: Collect more interactions. Use Wilson with a cloud model for another week, annotate interactions, and re-train. Aim for 200+ SFT examples.

Symptom: Model repeats training data verbatim or performs well on familiar queries but poorly on new ones. Fix: Reduce epochs (try 2 instead of 3), lower learning rate to 1e-4, use r=8 instead of r=16, or hold out 10-20% of data for validation.
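
For the held-out validation data, the datasets library does the splitting (a sketch; pass the eval split to SFTTrainer's eval_dataset and set eval_strategy="epoch" in SFTConfig so eval_loss is reported between epochs):

from datasets import load_dataset

# Hold out 10% of the SFT data so overfitting shows up as rising eval_loss.
ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
split = ds.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]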

Symptom: Model responds with plain text instead of calling tools, or calls the wrong tool. Fix: Ensure your SFT data includes interactions with tool calls. The model needs to see examples of correct tool usage in the training data. Consider creating DPO pairs specifically for tool-calling interactions.

Symptom: convert_lora_to_gguf.py crashes or produces an invalid file. Fix: Make sure you’re using the latest llama.cpp. Some model architectures require specific conversion scripts — check the llama.cpp repository for model-specific instructions.

Symptom: Responses take 10+ seconds. Fix: Use a smaller base model (3B instead of 7B). On Apple Silicon, make sure Ollama is using Metal acceleration (ollama ps shows GPU layers). Check that you’re not running other memory-intensive apps.

After deploying v1, the iterative improvement cycle begins:

  1. Use your trained model daily
  2. Wilson captures all interactions automatically
  3. Annotate — rate quality, create DPO pairs comparing trained vs cloud
  4. Export new data, combine with the original dataset (see the sketch at the end of this section)
  5. Train v2
  6. Deploy, compare, repeat

Each cycle makes your model better at your specific financial tasks. See Deploying Trained Models for more on the improvement loop.
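
For step 4 above, merging the old and new exports is a few lines with datasets (a sketch; wilson-sft-v2.jsonl is a placeholder name for your new export):

from datasets import concatenate_datasets, load_dataset

# Combine the v1 export with the new one before training v2, so the model
# keeps what it already learned while absorbing the new examples.
v1 = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
v2 = load_dataset("json", data_files="wilson-sft-v2.jsonl", split="train")  # placeholder
combined = concatenate_datasets([v1, v2])
combined.push_to_hub("your-username/wilson-bookkeeper-sft", private=True)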