Fine-Tuning with LoRA

This guide walks through the full fine-tuning workflow: from exported JSONL to a LoRA adapter running locally in Ollama.

You'll need:

  • Exported training data (see Exporting Training Data)
  • A HuggingFace account with a write token (HF_TOKEN)
  • Python 3.10+ with pip or uv
  • Ollama installed locally

Use the hugging-face-dataset-creator skill to upload your JSONL:

# Ask Wilson to upload (if using a cloud model)
"Upload wilson-sft.jsonl as a dataset called my-username/wilson-bookkeeper-sft"

Or upload manually:

from datasets import load_dataset
from huggingface_hub import login

# Authenticate with your write token (or run `huggingface-cli login` once)
login(token="hf_your_token")

# Load the exported JSONL and push it as a private dataset
ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
ds.push_to_hub("my-username/wilson-bookkeeper-sft", private=True)
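
To sanity-check the upload, load the dataset back and inspect a record. A minimal sketch; it assumes your export produced chat-style `messages` records as described in the export guide:

from datasets import load_dataset

ds = load_dataset("my-username/wilson-bookkeeper-sft", split="train")
print(len(ds))   # number of training examples
print(ds[0])     # first record -- should contain a "messages" list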

Recommended base models for financial bookkeeping:

| Model | Size | Why |
| --- | --- | --- |
| Qwen/Qwen2.5-7B-Instruct | 7B | Strong instruction following, good with structured data |
| mistralai/Mistral-7B-Instruct-v0.3 | 7B | Solid general-purpose, fast inference |
| meta-llama/Llama-3.1-8B-Instruct | 8B | High quality, well-supported tooling |
| google/gemma-2-9b-it | 9B | Compact and efficient |

For machines with limited RAM (8 GB), consider 3B models:

| Model | Size | Why |
| --- | --- | --- |
| Qwen/Qwen2.5-3B-Instruct | 3B | Best small model for structured tasks |
| meta-llama/Llama-3.2-3B-Instruct | 3B | Compact with strong categorization |

Ask Wilson to generate a training configuration:

"Train a LoRA adapter on my-username/wilson-bookkeeper-sft using Qwen2.5-7B-Instruct"

Wilson uses the model-trainer skill to configure and launch training on HuggingFace.

Install dependencies:

pip install transformers trl peft datasets accelerate bitsandbytes

Training script:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

# Load dataset
dataset = load_dataset("my-username/wilson-bookkeeper-sft", split="train")

# Base model with 4-bit quantization (QLoRA-style)
model_name = "Qwen/Qwen2.5-7B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",  # match bf16 training below; use "float16" on pre-Ampere GPUs
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA config: adapt the attention projection layers
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Training config
training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    warmup_steps=10,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,  # set fp16=True instead on GPUs without bfloat16 support
)

# Train
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    args=training_config,
    processing_class=tokenizer,  # named `tokenizer` in trl releases before 0.12
)
trainer.train()
trainer.save_model("./wilson-sft-adapter")
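
Before converting anything, it's worth smoke-testing the saved adapter. A minimal sketch that loads it in 4-bit and generates one reply; the prompt is illustrative, so substitute something from your own data:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

# Load base model + adapter in 4-bit, same as during training
model = AutoPeftModelForCausalLM.from_pretrained(
    "./wilson-sft-adapter",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Illustrative prompt
messages = [{"role": "user", "content": "Categorize: STARBUCKS #1234 $6.40"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))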

For preference-based training, replace SFTTrainer with DPOTrainer:

from trl import DPOTrainer, DPOConfig

# Preference pairs exported as prompt/chosen/rejected records
dpo_dataset = load_dataset("json", data_files="wilson-dpo.jsonl", split="train")

dpo_config = DPOConfig(
    output_dir="./wilson-dpo-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=5e-5,  # lower than SFT; DPO is sensitive to high rates
    beta=0.1,            # strength of the KL penalty against the reference model
    bf16=True,
)
trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a LoRA adapter, the frozen base acts as the implicit reference
    train_dataset=dpo_dataset,
    peft_config=lora_config,
    args=dpo_config,
    processing_class=tokenizer,  # named `tokenizer` in trl releases before 0.12
)
trainer.train()
trainer.save_model("./wilson-dpo-adapter")
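
DPOTrainer expects each record to carry a prompt plus a preferred and a rejected completion. A wilson-dpo.jsonl line looks like this (values illustrative):

{"prompt": "Categorize: STARBUCKS #1234 $6.40", "chosen": "Coffee Shops (matches your custom category)", "rejected": "Dining Out"}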

Ollama needs GGUF format. Use llama.cpp to convert:

# Clone llama.cpp and install the conversion script's dependencies
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the LoRA adapter to GGUF (exact flags vary between llama.cpp
# versions -- check `python convert_lora_to_gguf.py --help` if this fails)
python convert_lora_to_gguf.py ../wilson-sft-adapter \
  --base Qwen/Qwen2.5-7B-Instruct \
  --outfile wilson-adapter.gguf

For a full merge + quantization (produces a standalone model, no ADAPTER line needed), first merge the adapter into the base weights with peft, then convert and quantize with llama.cpp. A minimal merge sketch, assuming enough RAM to load the base model in bf16:
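
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load base + adapter in full precision, fold the LoRA weights in, and save
model = AutoPeftModelForCausalLM.from_pretrained(
    "./wilson-sft-adapter", torch_dtype=torch.bfloat16
)
merged = model.merge_and_unload()
merged.save_pretrained("../wilson-merged")

# Ship the tokenizer alongside the merged weights
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct").save_pretrained("../wilson-merged")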

Then, from the llama.cpp directory:

# Convert the merged model to GGUF at f16
python convert_hf_to_gguf.py ../wilson-merged --outfile wilson-merged-f16.gguf --outtype f16
# Quantize to Q4_K_M (the llama-quantize binary is built when you compile llama.cpp)
./llama-quantize wilson-merged-f16.gguf wilson-merged-q4_k_m.gguf Q4_K_M

Create a Modelfile:

# Base must be the same model the adapter was trained on;
# qwen2.5:7b-instruct is the Ollama library tag for Qwen2.5-7B-Instruct
FROM qwen2.5:7b-instruct
ADAPTER ./wilson-adapter.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
SYSTEM """You are Wilson, an AI bookkeeper from Open Accountant. You help users manage their personal finances with precision and care. You have access to tools for importing transactions, categorizing spending, analyzing budgets, and generating reports."""

Build and run:

ollama create wilson-7b -f Modelfile
ollama run wilson-7b

Then point Wilson at the new model:

# Set as default
echo 'DEFAULT_MODEL=ollama:wilson-7b' >> .env
# Or switch interactively
/model ollama:wilson-7b

Wilson now uses your fine-tuned model for all interactions. The training pipeline continues capturing interactions with this model, so you can iteratively improve it.

A small dataset of 200 excellent interactions outperforms 2000 mediocre ones. Focus on:

  • Interactions where Wilson correctly used tools in the right order
  • Responses that were well-structured and actionable
  • Categorization calls that matched your preferred categories

A practical improvement loop:

  1. Train v1 on cloud model interactions
  2. Switch to v1 and use it for a week
  3. Annotate v1's interactions, comparing them against cloud model responses
  4. Create DPO pairs (v1 response = rejected, cloud model = chosen; see the sketch after this list)
  5. Train v2 with SFT + DPO
  6. Repeat
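
A minimal sketch of step 4, assuming the two models' responses live in JSONL files keyed by prompt (all file and field names here are hypothetical):

import json

# Hypothetical inputs: one record per prompt, with each model's response
v1 = {r["prompt"]: r["response"] for r in map(json.loads, open("v1-responses.jsonl"))}
cloud = {r["prompt"]: r["response"] for r in map(json.loads, open("cloud-responses.jsonl"))}

with open("wilson-dpo.jsonl", "w") as out:
    for prompt, chosen in cloud.items():
        rejected = v1.get(prompt)
        # Only keep prompts where the two models actually disagreed
        if rejected and rejected != chosen:
            out.write(json.dumps(
                {"prompt": prompt, "chosen": chosen, "rejected": rejected}
            ) + "\n")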

Key LoRA hyperparameters:

| Parameter | Default | Notes |
| --- | --- | --- |
| r (rank) | 16 | Higher = more capacity, more VRAM. 8-32 is typical. |
| alpha | 32 | Scaling factor. Usually 2 * r. |
| target_modules | q_proj, k_proj, v_proj, o_proj | Attention projection layers. Adding gate_proj, up_proj, down_proj increases capacity. |
| epochs | 3 | 2-5 is typical. Watch for overfitting with small datasets. |
| learning_rate | 2e-4 | Standard for LoRA. Reduce to 5e-5 for DPO. |

To guard against overfitting:

  • Hold out 10-20% of your data for validation (see the sketch after this list)
  • Monitor training loss: if it plateaus or increases, stop early
  • Prefer a lower learning rate (1e-4) with more epochs over a high rate with fewer
  • With < 100 examples, use r=8 and epochs=2
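
A minimal sketch of the validation split, reusing the names from the training script above (the eval_strategy field is called evaluation_strategy in older transformers releases):

# Split 10% off for validation
splits = dataset.train_test_split(test_size=0.1, seed=42)

training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    eval_strategy="epoch",  # evaluate on the held-out split each epoch
    save_strategy="epoch",
    bf16=True,
)
trainer = SFTTrainer(
    model=model,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=lora_config,
    args=training_config,
)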