# End-to-End Training Tutorial
This tutorial walks through every step of training and deploying a custom model, start to finish. Each step links to detailed documentation if you want to go deeper.
## Prerequisites

Before starting, make sure you have:
- Wilson installed and working (`wilson --status` shows transactions)
- 50+ captured interactions (use Wilson with a cloud model for a week or two)
- Ollama installed (`brew install ollama`)
- Python 3.10+ with `pip` or `uv`
- A HuggingFace account with a write token
## Step 1: Check Your Data

Open the Dashboard and check how many interactions you have:

```shell
curl http://localhost:3141/api/export/training/stats
```

You should see at least 50 interactions. More is better — 200+ produces noticeably stronger models.
| Field | Minimum | Recommended |
|---|---|---|
| Total interactions | 50 | 200+ |
| Annotated (4-5 stars) | 30 | 100+ |
| DPO pairs | 10 | 50+ |
If you don’t have enough data, keep using Wilson with a cloud model and come back later.
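If you prefer checking readiness from a script, the minimums in the table above can be encoded as a small helper. This is an illustrative sketch — the field names in the stats response (`total`, `annotated`, `dpo_pairs`) are assumptions, not part of the documented API.

```python
# Hypothetical readiness check against the minimums from the table above.
# Stats field names are illustrative assumptions, not Wilson's documented schema.
MINIMUMS = {"total": 50, "annotated": 30, "dpo_pairs": 10}

def training_readiness(stats: dict) -> list[str]:
    """Return a list of shortfall messages; an empty list means ready to train."""
    problems = []
    for field, minimum in MINIMUMS.items():
        count = stats.get(field, 0)
        if count < minimum:
            problems.append(f"{field}: {count} < {minimum}")
    return problems

print(training_readiness({"total": 120, "annotated": 25, "dpo_pairs": 12}))
# → ["annotated: 25 < 30"]
```

Feed it the JSON returned by the stats endpoint and train only when the list comes back empty.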
## Step 2: Annotate Interactions

Open the Dashboard Training tab:

```
/dashboard
```

Navigate to the Training section. For each interaction:
- Read the conversation — was the response helpful and accurate?
- Rate it 1-5 stars — 4-5 stars means “good enough to teach a model”
- Create DPO pairs — for interactions where the model could have done better, mark the cloud response as “chosen” and the weaker response as “rejected”
Aim for at least 30 interactions rated 4-5 stars and 10 DPO pairs before moving on.
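Conceptually, a DPO pair is just a prompt with a preferred and a dispreferred completion. A minimal sketch of serializing one record in the `prompt`/`chosen`/`rejected` shape the export produces (the helper itself is illustrative):

```python
import json

def make_dpo_pair(prompt: str, chosen: str, rejected: str) -> str:
    """Serialize one preference pair as a JSONL line with prompt/chosen/rejected."""
    return json.dumps({"prompt": prompt, "chosen": chosen, "rejected": rejected})

# Hypothetical example content, for illustration only
line = make_dpo_pair(
    "What did I spend on groceries in March?",
    "You spent $412.38 on groceries in March, across 9 transactions.",
    "I don't have access to your transactions.",
)
print(line)
```

In practice the Dashboard builds these for you when you mark responses as chosen/rejected; the sketch just shows what ends up in the export file.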
## Step 3: Export Training Data

Export from the Dashboard or via the API:

```shell
# SFT format (supervised fine-tuning)
curl -o wilson-sft.jsonl http://localhost:3141/api/export/training/sft

# DPO format (preference pairs)
curl -o wilson-dpo.jsonl http://localhost:3141/api/export/training/dpo
```

See Exporting Training Data for format details and filtering options.
## Step 4: Validate the Export

Inspect your data before training:

```python
import json

# Check SFT data
with open("wilson-sft.jsonl") as f:
    lines = [json.loads(line) for line in f if line.strip()]
print(f"SFT examples: {len(lines)}")
print(f"First example messages: {len(lines[0]['messages'])}")
print(f"Roles: {[m['role'] for m in lines[0]['messages']]}")

# Check DPO data
with open("wilson-dpo.jsonl") as f:
    pairs = [json.loads(line) for line in f if line.strip()]
print(f"\nDPO pairs: {len(pairs)}")
if pairs:
    print(f"Keys: {list(pairs[0].keys())}")
```

Expected output: SFT lines have `messages` arrays with system, user, and assistant roles. DPO lines have `prompt`, `chosen`, and `rejected` fields.
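Beyond counting records, it can be worth checking that each SFT record is structurally sound before uploading. A sketch, assuming the `messages` format described above (the role set and rules here are reasonable defaults, not a documented schema):

```python
def validate_sft_record(record: dict) -> list[str]:
    """Check one SFT record: non-empty messages, known roles, an assistant reply present."""
    errors = []
    messages = record.get("messages", [])
    if not messages:
        errors.append("no messages")
    for m in messages:
        if m.get("role") not in ("system", "user", "assistant", "tool"):
            errors.append(f"unknown role: {m.get('role')}")
        # Assistant turns may legitimately be empty when they carry only tool calls
        if not m.get("content") and m.get("role") != "assistant":
            errors.append(f"empty content for role {m.get('role')}")
    if not any(m.get("role") == "assistant" for m in messages):
        errors.append("no assistant message to learn from")
    return errors

record = {"messages": [
    {"role": "system", "content": "You are Wilson."},
    {"role": "user", "content": "Categorize my transactions."},
    {"role": "assistant", "content": "Done: 10 categorized."},
]}
print(validate_sft_record(record))  # → []
```

Run it over every parsed line and drop or fix records that return errors.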
## Step 5: Upload to HuggingFace

```python
from datasets import load_dataset
from huggingface_hub import login

login(token="hf_your_token")

# Upload SFT dataset
sft_ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
sft_ds.push_to_hub("your-username/wilson-bookkeeper-sft", private=True)

# Upload DPO dataset (if you have pairs)
dpo_ds = load_dataset("json", data_files="wilson-dpo.jsonl", split="train")
dpo_ds.push_to_hub("your-username/wilson-bookkeeper-dpo", private=True)
```

## Step 6: Train

Install dependencies:

```shell
pip install transformers trl peft datasets accelerate bitsandbytes
```

Run SFT training:
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

dataset = load_dataset("your-username/wilson-bookkeeper-sft", split="train")

model_name = "Qwen/Qwen2.5-7B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    warmup_steps=10,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    args=training_config,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("./wilson-sft-adapter")
```

Training time depends on hardware and dataset size. Expect 10-30 minutes on a GPU, 1-2 hours on CPU.
See Fine-Tuning with LoRA for DPO training, parameter tuning, and tips on avoiding overfitting.
## Step 7: Convert to GGUF

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
python convert_lora_to_gguf.py \
  --base Qwen/Qwen2.5-7B-Instruct \
  --lora ../wilson-sft-adapter \
  --outfile wilson-adapter.gguf
```

## Step 8: Deploy to Ollama

Create a Modelfile:
```
FROM Qwen/Qwen2.5-7B-Instruct
ADAPTER ./wilson-adapter.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"

SYSTEM """You are Wilson, an AI bookkeeper from Open Accountant. You help users manage their personal finances with precision and care. You have access to tools for importing transactions, categorizing spending, analyzing budgets, and generating reports."""
```

Build the model:

```shell
ollama create wilson-7b -f Modelfile
```

## Step 9: Switch Wilson

```
/model ollama:wilson-7b
```

Or set as default:

```shell
echo 'DEFAULT_MODEL=ollama:wilson-7b' >> .env
```

## Step 10: Verify

Test your model with representative queries:
```shell
wilson --run "Categorize my last 10 transactions"
wilson --run "What did I spend on dining out this month?"
wilson --run "Am I on track with my grocery budget?"
```

Compare responses against a cloud model. If quality is noticeably worse in specific areas, those interactions become your next batch of DPO training pairs.
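For a more systematic comparison, you can hit Ollama's REST API directly and time each query. The request body below matches Ollama's `/api/generate` endpoint with streaming disabled; the query list and timing loop are an illustrative sketch, not part of Wilson.

```python
import json
import time
import urllib.request

# Illustrative test queries; substitute your own representative prompts
QUERIES = [
    "Categorize my last 10 transactions",
    "What did I spend on dining out this month?",
]

def generate_payload(model: str, prompt: str) -> bytes:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def time_query(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Send one prompt to a local Ollama model and return wall-clock seconds."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        json.load(resp)
    return time.monotonic() - start

# With Ollama running:
# for q in QUERIES:
#     print(q, f"{time_query('wilson-7b', q):.1f}s")
```

Running the same list against both your trained model and the base model gives you a rough latency and quality baseline for each query.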
## Troubleshooting

### Too Few Examples

**Symptom:** Model outputs generic or repetitive responses.

**Fix:** Collect more interactions. Use Wilson with a cloud model for another week, annotate interactions, and re-train. Aim for 200+ SFT examples.
### Overfitting

**Symptom:** Model repeats training data verbatim or performs well on familiar queries but poorly on new ones.

**Fix:** Reduce epochs (try 2 instead of 3), lower the learning rate to 1e-4, use r=8 instead of r=16, or hold out 10-20% of data for validation.
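Holding out a validation split is straightforward with the JSONL export. A minimal sketch, assuming the filenames from the earlier steps (the 90/10 split and fixed seed are one reasonable choice, not a requirement):

```python
import random

def split_jsonl(lines: list[str], holdout: float = 0.1, seed: int = 42):
    """Shuffle deterministically and split JSONL lines into (train, validation)."""
    lines = [line for line in lines if line.strip()]
    random.Random(seed).shuffle(lines)
    n_val = max(1, int(len(lines) * holdout))
    return lines[n_val:], lines[:n_val]

# with open("wilson-sft.jsonl") as f:
#     train, val = split_jsonl(f.readlines())
```

Load the validation lines as a dataset and pass it to `SFTTrainer` via `eval_dataset` so you can watch eval loss start to diverge from training loss — the usual sign of overfitting.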
### Tool Calling Breaks

**Symptom:** Model responds with plain text instead of calling tools, or calls the wrong tool.

**Fix:** Ensure your SFT data includes interactions with tool calls. The model needs to see examples of correct tool usage in the training data. Consider creating DPO pairs specifically for tool-calling interactions.
### GGUF Conversion Fails

**Symptom:** `convert_lora_to_gguf.py` crashes or produces an invalid file.

**Fix:** Make sure you’re using the latest llama.cpp. Some model architectures require specific conversion scripts — check the llama.cpp repository for model-specific instructions.
### Model is Slow

**Symptom:** Responses take 10+ seconds.

**Fix:** Use a smaller base model (3B instead of 7B). On Apple Silicon, make sure Ollama is using Metal acceleration (`ollama ps` shows GPU layers). Check that you’re not running other memory-intensive apps.
## What’s Next

After deploying v1, the iterative improvement cycle begins:
- Use your trained model daily
- Wilson captures all interactions automatically
- Annotate — rate quality, create DPO pairs comparing trained vs cloud
- Export new data, combine with original dataset
- Train v2
- Deploy, compare, repeat
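Combining the original and new exports for a v2 training run can be as simple as concatenating the JSONL files and dropping exact duplicate lines. A minimal sketch — the v2 and combined filenames are hypothetical examples:

```python
def merge_jsonl(paths: list[str], out_path: str) -> int:
    """Concatenate JSONL files into out_path, skipping exact duplicate lines.

    Returns the number of lines kept.
    """
    seen: set[str] = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if line and line not in seen:
                        seen.add(line)
                        out.write(line + "\n")
                        kept += 1
    return kept

# Hypothetical filenames for a second export and the combined dataset
# merge_jsonl(["wilson-sft.jsonl", "wilson-sft-v2.jsonl"], "wilson-sft-combined.jsonl")
```

Push the combined file to HuggingFace as in Step 5 and train v2 against it.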
Each cycle makes your model better at your specific financial tasks. See Deploying Trained Models for more on the improvement loop.