Fine-Tuning with LoRA

This guide walks through the full fine-tuning workflow: from exported JSONL to a LoRA adapter running locally in Ollama.

You'll need:

  • Exported training data (see Exporting Training Data)
  • A HuggingFace account with a write token (HF_TOKEN)
  • Python 3.10+ with pip or uv
  • Ollama installed locally

Use the hugging-face-dataset-creator skill to upload your JSONL:

# Ask Wilson to upload (if using a cloud model)
"Upload wilson-sft.jsonl as a dataset called my-username/wilson-bookkeeper-sft"

Or upload manually:

from datasets import load_dataset
from huggingface_hub import login

# Authenticate with your write token (or run `huggingface-cli login` once)
login(token="hf_your_token")

# Load the exported JSONL and push it as a private dataset
ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
ds.push_to_hub("my-username/wilson-bookkeeper-sft", private=True)
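
To sanity-check the upload, load the dataset back and inspect a record. A minimal sketch; it assumes your export produced chat-style `messages` records as described in the export guide:

from datasets import load_dataset

ds = load_dataset("my-username/wilson-bookkeeper-sft", split="train")
print(len(ds))   # number of training examples
print(ds[0])     # first record -- should contain a "messages" list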

Recommended base models for financial bookkeeping:

| Model | Size | Why |
| --- | --- | --- |
| Qwen/Qwen2.5-7B-Instruct | 7B | Strong instruction following, good with structured data |
| mistralai/Mistral-7B-Instruct-v0.3 | 7B | Solid general-purpose, fast inference |
| meta-llama/Llama-3.1-8B-Instruct | 8B | High quality, well-supported tooling |
| google/gemma-2-9b-it | 9B | Compact and efficient |

For machines with limited RAM (8 GB), consider 3B models:

| Model | Size | Why |
| --- | --- | --- |
| Qwen/Qwen2.5-3B-Instruct | 3B | Best small model for structured tasks |
| meta-llama/Llama-3.2-3B-Instruct | 3B | Compact with strong categorization |

Ask Wilson to generate a training configuration:

"Train a LoRA adapter on my-username/wilson-bookkeeper-sft using Qwen2.5-7B-Instruct"

Wilson uses the model-trainer skill to configure and launch training on HuggingFace.

Install dependencies:

pip install transformers trl peft datasets accelerate bitsandbytes

Training script:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

# Load dataset
dataset = load_dataset("my-username/wilson-bookkeeper-sft", split="train")

# Base model with 4-bit quantization (QLoRA-style)
model_name = "Qwen/Qwen2.5-7B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",  # match bf16 training below; use "float16" on pre-Ampere GPUs
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA config: adapt the attention projection layers
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Training config
training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    warmup_steps=10,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,  # set fp16=True instead on GPUs without bfloat16 support
)

# Train
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    args=training_config,
    processing_class=tokenizer,  # named `tokenizer` in trl releases before 0.12
)
trainer.train()
trainer.save_model("./wilson-sft-adapter")
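
Before converting anything, it's worth smoke-testing the saved adapter. A minimal sketch that loads it in 4-bit and generates one reply; the prompt is illustrative, so substitute something from your own data:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

# Load base model + adapter in 4-bit, same as during training
model = AutoPeftModelForCausalLM.from_pretrained(
    "./wilson-sft-adapter",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Illustrative prompt
messages = [{"role": "user", "content": "Categorize: STARBUCKS #1234 $6.40"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))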

For preference-based training, replace SFTTrainer with DPOTrainer:

from trl import DPOTrainer, DPOConfig

# Preference pairs exported as prompt/chosen/rejected records
dpo_dataset = load_dataset("json", data_files="wilson-dpo.jsonl", split="train")

dpo_config = DPOConfig(
    output_dir="./wilson-dpo-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=5e-5,  # lower than SFT; DPO is sensitive to high rates
    beta=0.1,            # strength of the KL penalty against the reference model
    bf16=True,
)
trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a LoRA adapter, the frozen base acts as the implicit reference
    train_dataset=dpo_dataset,
    peft_config=lora_config,
    args=dpo_config,
    processing_class=tokenizer,  # named `tokenizer` in trl releases before 0.12
)
trainer.train()
trainer.save_model("./wilson-dpo-adapter")
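
DPOTrainer expects each record to carry a prompt plus a preferred and a rejected completion. A wilson-dpo.jsonl line looks like this (values illustrative):

{"prompt": "Categorize: STARBUCKS #1234 $6.40", "chosen": "Coffee Shops (matches your custom category)", "rejected": "Dining Out"}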

Ollama needs GGUF format. Use llama.cpp to convert:

# Clone llama.cpp and install the conversion script's dependencies
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the LoRA adapter to GGUF (exact flags vary between llama.cpp
# versions -- check `python convert_lora_to_gguf.py --help` if this fails)
python convert_lora_to_gguf.py ../wilson-sft-adapter \
  --base Qwen/Qwen2.5-7B-Instruct \
  --outfile wilson-adapter.gguf

For a full merge + quantization (produces a standalone model, no ADAPTER line needed), first merge the adapter into the base weights with peft, then convert and quantize with llama.cpp. A minimal merge sketch, assuming enough RAM to load the base model in bf16:
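
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load base + adapter in full precision, fold the LoRA weights in, and save
model = AutoPeftModelForCausalLM.from_pretrained(
    "./wilson-sft-adapter", torch_dtype=torch.bfloat16
)
merged = model.merge_and_unload()
merged.save_pretrained("../wilson-merged")

# Ship the tokenizer alongside the merged weights
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct").save_pretrained("../wilson-merged")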

Then, from the llama.cpp directory:

# Convert the merged model to GGUF at f16
python convert_hf_to_gguf.py ../wilson-merged --outfile wilson-merged-f16.gguf --outtype f16
# Quantize to Q4_K_M (the llama-quantize binary is built when you compile llama.cpp)
./llama-quantize wilson-merged-f16.gguf wilson-merged-q4_k_m.gguf Q4_K_M

Create a Modelfile:

# Base must be the same model the adapter was trained on;
# qwen2.5:7b-instruct is the Ollama library tag for Qwen2.5-7B-Instruct
FROM qwen2.5:7b-instruct
ADAPTER ./wilson-adapter.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
SYSTEM """You are Wilson, an AI bookkeeper from Open Accountant. You help users manage their personal finances with precision and care. You have access to tools for importing transactions, categorizing spending, analyzing budgets, and generating reports."""

Build and run:

ollama create wilson-7b -f Modelfile
ollama run wilson-7b

Then point Wilson at the new model:

# Set as default
echo 'DEFAULT_MODEL=ollama:wilson-7b' >> .env
# Or switch interactively
/model ollama:wilson-7b

Wilson now uses your fine-tuned model for all interactions. The training pipeline continues capturing interactions with this model, so you can iteratively improve it.

A small dataset of 200 excellent interactions outperforms 2000 mediocre ones. Focus on:

  • Interactions where Wilson correctly used tools in the right order
  • Responses that were well-structured and actionable
  • Categorization calls that matched your preferred categories

A practical improvement loop:

  1. Train v1 on cloud model interactions
  2. Switch to v1 and use it for a week
  3. Annotate v1's interactions, comparing them against cloud model responses
  4. Create DPO pairs (v1 response = rejected, cloud model = chosen; see the sketch after this list)
  5. Train v2 with SFT + DPO
  6. Repeat
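
A minimal sketch of step 4, assuming the two models' responses live in JSONL files keyed by prompt (all file and field names here are hypothetical):

import json

# Hypothetical inputs: one record per prompt, with each model's response
v1 = {r["prompt"]: r["response"] for r in map(json.loads, open("v1-responses.jsonl"))}
cloud = {r["prompt"]: r["response"] for r in map(json.loads, open("cloud-responses.jsonl"))}

with open("wilson-dpo.jsonl", "w") as out:
    for prompt, chosen in cloud.items():
        rejected = v1.get(prompt)
        # Only keep prompts where the two models actually disagreed
        if rejected and rejected != chosen:
            out.write(json.dumps(
                {"prompt": prompt, "chosen": chosen, "rejected": rejected}
            ) + "\n")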

Key LoRA hyperparameters:

| Parameter | Default | Notes |
| --- | --- | --- |
| r (rank) | 16 | Higher = more capacity, more VRAM. 8-32 is typical. |
| alpha | 32 | Scaling factor. Usually 2 * r. |
| target_modules | q_proj, k_proj, v_proj, o_proj | Attention projection layers. Adding gate_proj, up_proj, down_proj increases capacity. |
| epochs | 3 | 2-5 is typical. Watch for overfitting with small datasets. |
| learning_rate | 2e-4 | Standard for LoRA. Reduce to 5e-5 for DPO. |

To guard against overfitting:

  • Hold out 10-20% of your data for validation (see the sketch after this list)
  • Monitor training loss: if it plateaus or increases, stop early
  • Prefer a lower learning rate (1e-4) with more epochs over a high rate with fewer
  • With < 100 examples, use r=8 and epochs=2
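
A minimal sketch of the validation split, reusing the names from the training script above (the eval_strategy field is called evaluation_strategy in older transformers releases):

# Split 10% off for validation
splits = dataset.train_test_split(test_size=0.1, seed=42)

training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    eval_strategy="epoch",  # evaluate on the held-out split each epoch
    save_strategy="epoch",
    bf16=True,
)
trainer = SFTTrainer(
    model=model,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=lora_config,
    args=training_config,
)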