
End-to-End Training Tutorial

This tutorial walks through every step of training and deploying a custom model, start to finish. Each step links to detailed documentation if you want to go deeper.

Before starting, make sure you have:

  • Wilson installed and working (wilson --status shows transactions)
  • 50+ captured interactions (use Wilson with a cloud model for a week or two)
  • Ollama installed (brew install ollama)
  • Python 3.10+ with pip or uv
  • A HuggingFace account with a write token

Check how many interactions you have, either from the Dashboard or directly via the API:

curl http://localhost:3141/api/export/training/stats

You should see at least 50 interactions. More is better — 200+ produces noticeably stronger models.

Field                   Minimum   Recommended
Total interactions      50        200+
Annotated (4-5 stars)   30        100+
DPO pairs               10        50+

If you don’t have enough data, keep using Wilson with a cloud model and come back later.
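
If you prefer to script this check, the stats endpoint is easy to poll. A minimal sketch; the response schema isn't spelled out here, so it simply prints whatever comes back for you to compare against the table above:

import json
import urllib.request

# Fetch training-data stats from the local Wilson API and pretty-print them.
with urllib.request.urlopen("http://localhost:3141/api/export/training/stats") as resp:
    stats = json.load(resp)
print(json.dumps(stats, indent=2))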

Open the Dashboard:

/dashboard

Navigate to the Training section. For each interaction:

  1. Read the conversation — was the response helpful and accurate?
  2. Rate it 1-5 stars — 4-5 stars means “good enough to teach a model”
  3. Create DPO pairs — for interactions where the model could have done better, mark the cloud response as “chosen” and the weaker response as “rejected”

Aim for at least 30 interactions rated 4-5 stars and 10 DPO pairs before moving on.
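
Concretely, each pair you create exports as one record holding the prompt plus both responses. The values here are invented purely for illustration; the real field names appear in the verification step below:

# An illustrative DPO pair: the stronger (cloud) response is "chosen",
# the weaker one is "rejected".
pair = {
    "prompt": "What did I spend on dining out this month?",
    "chosen": "You spent $214.37 across 9 transactions this month...",
    "rejected": "I'm not able to see your transactions.",
}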

Export from the Dashboard or via the API:

# SFT format (supervised fine-tuning)
curl -o wilson-sft.jsonl http://localhost:3141/api/export/training/sft
# DPO format (preference pairs)
curl -o wilson-dpo.jsonl http://localhost:3141/api/export/training/dpo

See Exporting Training Data for format details and filtering options.

Inspect your data before training:

import json

# Check SFT data
with open("wilson-sft.jsonl") as f:
    lines = [json.loads(line) for line in f if line.strip()]
print(f"SFT examples: {len(lines)}")
print(f"First example messages: {len(lines[0]['messages'])}")
print(f"Roles: {[m['role'] for m in lines[0]['messages']]}")

# Check DPO data
with open("wilson-dpo.jsonl") as f:
    pairs = [json.loads(line) for line in f if line.strip()]
print(f"\nDPO pairs: {len(pairs)}")
if pairs:
    print(f"Keys: {list(pairs[0].keys())}")

Expected output: SFT lines have messages arrays with system, user, and assistant roles. DPO lines have prompt, chosen, and rejected fields.
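
To automate that check, a small validator can assert exactly those fields before you commit to a training run (a sketch based on the format described above):

import json

def check_jsonl(path, required_keys):
    # Assert that every record in a JSONL file contains the required keys.
    with open(path) as f:
        for n, line in enumerate(f, 1):
            if not line.strip():
                continue
            missing = required_keys - json.loads(line).keys()
            assert not missing, f"{path} line {n}: missing {missing}"

check_jsonl("wilson-sft.jsonl", {"messages"})
check_jsonl("wilson-dpo.jsonl", {"prompt", "chosen", "rejected"})
print("Both files look well-formed.")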

Next, push both datasets to the HuggingFace Hub so your training run can pull them:

from datasets import load_dataset
from huggingface_hub import login

login(token="hf_your_token")

# Upload SFT dataset
sft_ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
sft_ds.push_to_hub("your-username/wilson-bookkeeper-sft", private=True)

# Upload DPO dataset (if you have pairs)
dpo_ds = load_dataset("json", data_files="wilson-dpo.jsonl", split="train")
dpo_ds.push_to_hub("your-username/wilson-bookkeeper-dpo", private=True)

Install dependencies:

pip install transformers trl peft datasets accelerate bitsandbytes
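
A quick import check confirms the stack is usable before you launch a long run (torch is installed as a dependency of accelerate):

import torch, transformers, trl, peft, datasets  # noqa: F401

# True if a CUDA or Apple Silicon (MPS) device is visible to PyTorch.
print(torch.cuda.is_available() or torch.backends.mps.is_available())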

Run SFT training:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

dataset = load_dataset("your-username/wilson-bookkeeper-sft", split="train")
model_name = "Qwen/Qwen2.5-7B-Instruct"

# Load the base model in 4-bit so it fits on a single consumer GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
)
training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    warmup_steps=10,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
)
trainer = SFTTrainer(
    model=model, train_dataset=dataset,
    peft_config=lora_config, args=training_config,
    tokenizer=tokenizer,  # TRL >= 0.12 renamed this parameter to processing_class
)
trainer.train()
trainer.save_model("./wilson-sft-adapter")

Training time depends on hardware and dataset size. Expect 10-30 minutes on a GPU, 1-2 hours on CPU.

See Fine-Tuning with LoRA for DPO training, parameter tuning, and tips on avoiding overfitting.
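
If you exported DPO pairs, a follow-up preference pass with TRL's DPOTrainer looks roughly like the sketch below. It is not the full recipe from that guide: it assumes the SFT adapter saved above (transformers can load a PEFT adapter directory directly, pulling the base model named in the adapter config), and the tokenizer argument follows the older TRL naming used here:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

dataset = load_dataset("your-username/wilson-bookkeeper-dpo", split="train")
model = AutoModelForCausalLM.from_pretrained("./wilson-sft-adapter", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

dpo_config = DPOConfig(
    output_dir="./wilson-dpo-adapter",
    num_train_epochs=1,              # preference tuning needs far fewer passes than SFT
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    beta=0.1,                        # how strongly to stay close to the SFT model
)
trainer = DPOTrainer(
    model=model, train_dataset=dataset,
    args=dpo_config, tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("./wilson-dpo-adapter")

Whichever adapter you end up with, SFT or DPO, is the one you convert and deploy in the next step.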

Convert the LoRA adapter to GGUF so Ollama can load it. Note that convert_lora_to_gguf.py takes the adapter directory as a positional argument, and --base-model-id tells it which HuggingFace base model the adapter was trained from:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
python convert_lora_to_gguf.py ../wilson-sft-adapter \
  --base-model-id Qwen/Qwen2.5-7B-Instruct \
  --outfile wilson-adapter.gguf

Create a Modelfile. The FROM line must reference the base model by its Ollama library name, and it must match the base you trained the adapter against:

FROM qwen2.5:7b-instruct
ADAPTER ./wilson-adapter.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
SYSTEM """You are Wilson, an AI bookkeeper from Open Accountant. You help users manage their personal finances with precision and care. You have access to tools for importing transactions, categorizing spending, analyzing budgets, and generating reports."""

Build the model:

ollama create wilson-7b -f Modelfile

Then switch to it from the Wilson chat:

/model ollama:wilson-7b

Or set as default:

echo 'DEFAULT_MODEL=ollama:wilson-7b' >> .env

Test your model with representative queries:

wilson --run "Categorize my last 10 transactions"
wilson --run "What did I spend on dining out this month?"
wilson --run "Am I on track with my grocery budget?"

Compare responses against a cloud model. If quality is noticeably worse in specific areas, those interactions become your next batch of DPO training pairs.
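
One low-effort way to run that comparison is to script the prompts and capture each model's output, switching the active model between runs. This sketch uses only the wilson --run command shown above:

import subprocess

# Run a fixed prompt set through whatever model Wilson currently uses and
# save the answers. Run once with the trained model active and once with a
# cloud model, then diff the two files.
prompts = [
    "Categorize my last 10 transactions",
    "What did I spend on dining out this month?",
    "Am I on track with my grocery budget?",
]
with open("responses.txt", "w") as out:
    for p in prompts:
        result = subprocess.run(["wilson", "--run", p], capture_output=True, text=True)
        out.write(f"### {p}\n{result.stdout}\n\n")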

Common issues and how to fix them:

Symptom: Model outputs generic or repetitive responses. Fix: Collect more interactions. Use Wilson with a cloud model for another week, annotate interactions, and re-train. Aim for 200+ SFT examples.

Symptom: Model repeats training data verbatim or performs well on familiar queries but poorly on new ones. Fix: Reduce epochs (try 2 instead of 3), lower learning rate to 1e-4, use r=8 instead of r=16, or hold out 10-20% of data for validation.
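
For the held-out validation data, the datasets library does the splitting (a sketch; pass the eval split to SFTTrainer's eval_dataset and set eval_strategy="epoch" in SFTConfig so eval_loss is reported between epochs):

from datasets import load_dataset

# Hold out 10% of the SFT data so overfitting shows up as rising eval_loss.
ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
split = ds.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]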

Symptom: Model responds with plain text instead of calling tools, or calls the wrong tool. Fix: Ensure your SFT data includes interactions with tool calls. The model needs to see examples of correct tool usage in the training data. Consider creating DPO pairs specifically for tool-calling interactions.

Symptom: convert_lora_to_gguf.py crashes or produces an invalid file. Fix: Make sure you’re using the latest llama.cpp. Some model architectures require specific conversion scripts — check the llama.cpp repository for model-specific instructions.

Symptom: Responses take 10+ seconds. Fix: Use a smaller base model (3B instead of 7B). On Apple Silicon, make sure Ollama is using Metal acceleration (ollama ps shows GPU layers). Check that you’re not running other memory-intensive apps.

After deploying v1, the iterative improvement cycle begins:

  1. Use your trained model daily
  2. Wilson captures all interactions automatically
  3. Annotate — rate quality, create DPO pairs comparing trained vs cloud
  4. Export new data, combine with the original dataset (see the sketch at the end of this section)
  5. Train v2
  6. Deploy, compare, repeat

Each cycle makes your model better at your specific financial tasks. See Deploying Trained Models for more on the improvement loop.
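
For step 4 above, merging the old and new exports is a few lines with datasets (a sketch; wilson-sft-v2.jsonl is a placeholder name for your new export):

from datasets import concatenate_datasets, load_dataset

# Combine the v1 export with the new one before training v2, so the model
# keeps what it already learned while absorbing the new examples.
v1 = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
v2 = load_dataset("json", data_files="wilson-sft-v2.jsonl", split="train")  # placeholder
combined = concatenate_datasets([v1, v2])
combined.push_to_hub("your-username/wilson-bookkeeper-sft", private=True)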