# Fine-Tuning with LoRA
This guide walks through the full fine-tuning workflow: from exported JSONL to a LoRA adapter running locally in Ollama.
## Prerequisites

- Exported training data (see Exporting Training Data)
- A HuggingFace account with a write token (`HF_TOKEN`)
- Python 3.10+ with `pip` or `uv`
- Ollama installed locally
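A quick way to verify the environment before starting (assumes the `huggingface_hub` package is installed, which provides `huggingface-cli`):

```bash
python --version        # expect 3.10 or newer
huggingface-cli whoami  # should print your username if HF_TOKEN/login is set up
ollama --version        # confirms Ollama is on PATH
```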
## Step 1: Upload Dataset to HuggingFace

Use the `hugging-face-dataset-creator` skill to upload your JSONL:
```
# Ask Wilson to upload (if using a cloud model)
"Upload wilson-sft.jsonl as a dataset called my-username/wilson-bookkeeper-sft"
```

Or upload manually:

```python
from datasets import load_dataset
from huggingface_hub import login

login(token="hf_your_token")

ds = load_dataset("json", data_files="wilson-sft.jsonl", split="train")
ds.push_to_hub("my-username/wilson-bookkeeper-sft", private=True)
```
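To confirm the upload worked, you can reload the dataset straight from the Hub:

```python
from datasets import load_dataset

# Spot-check: reload from the Hub and inspect the first record
ds = load_dataset("my-username/wilson-bookkeeper-sft", split="train")
print(len(ds), ds[0])
```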
## Step 2: Choose a Base Model

Recommended base models for financial bookkeeping:
| Model | Size | Why |
|---|---|---|
| `Qwen/Qwen2.5-7B-Instruct` | 7B | Strong instruction following, good with structured data |
| `mistralai/Mistral-7B-Instruct-v0.3` | 7B | Solid general-purpose, fast inference |
| `meta-llama/Llama-3.1-8B-Instruct` | 8B | High quality, well-supported tooling |
| `google/gemma-2-9b-it` | 9B | Compact and efficient |
For machines with limited RAM (8 GB or less), consider 3B models:
| Model | Size | Why |
|---|---|---|
| `Qwen/Qwen2.5-3B-Instruct` | 3B | Best small model for structured tasks |
| `meta-llama/Llama-3.2-3B-Instruct` | 3B | Compact with strong categorization |
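Whichever base you pick, it's worth confirming your exported records render cleanly through its chat template. A minimal sanity check, assuming the export uses the standard `messages` schema:

```python
from transformers import AutoTokenizer

# Render a sample record through the base model's chat template.
# The record shape here assumes the standard {"messages": [...]} export format.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
sample = {
    "messages": [
        {"role": "user", "content": "Categorize: STARBUCKS #1234 -4.85"},
        {"role": "assistant", "content": "Category: Dining Out"},
    ]
}
print(tokenizer.apply_chat_template(sample["messages"], tokenize=False))
```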
## Step 3: Train with LoRA

### Using the model-trainer Skill

Ask Wilson to generate a training configuration:

```
"Train a LoRA adapter on my-username/wilson-bookkeeper-sft using Qwen2.5-7B-Instruct"
```

Wilson uses the model-trainer skill to configure and launch training on HuggingFace.
### Manual Training

Install dependencies:

```bash
pip install transformers trl peft datasets accelerate bitsandbytes
```

Training script:
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

# Load dataset
dataset = load_dataset("my-username/wilson-bookkeeper-sft", split="train")

# Base model with 4-bit quantization
model_name = "Qwen/Qwen2.5-7B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Training config
training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    warmup_steps=10,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
)

# Train
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    args=training_config,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("./wilson-sft-adapter")
```
### DPO Training

For preference-based training, replace `SFTTrainer` with `DPOTrainer`.
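`DPOTrainer` expects each record to carry `prompt`, `chosen`, and `rejected` fields. Assuming your `wilson-dpo.jsonl` export follows that standard TRL schema, a record looks like:

```json
{"prompt": "Categorize: STARBUCKS #1234 -4.85", "chosen": "Category: Dining Out", "rejected": "Category: Groceries"}
```

The training script: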
```python
from trl import DPOTrainer, DPOConfig

dpo_dataset = load_dataset("json", data_files="wilson-dpo.jsonl", split="train")

dpo_config = DPOConfig(
    output_dir="./wilson-dpo-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    beta=0.1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # uses implicit reference with LoRA
    train_dataset=dpo_dataset,
    peft_config=lora_config,
    args=dpo_config,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("./wilson-dpo-adapter")
```
## Step 4: Convert to GGUF

Ollama needs the GGUF format. Use llama.cpp to convert:
```bash
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```

```bash
# Convert LoRA adapter to GGUF
python convert_lora_to_gguf.py \
    --base Qwen/Qwen2.5-7B-Instruct \
    --lora ../wilson-sft-adapter \
    --outfile wilson-adapter.gguf
```

For a full model merge + quantization (produces a standalone model):
```bash
# Merge LoRA into base model
python merge_lora.py \
    --base Qwen/Qwen2.5-7B-Instruct \
    --lora ../wilson-sft-adapter \
    --output ../wilson-merged
```
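If your llama.cpp checkout doesn't ship a merge script, the merge can also be done with PEFT directly; a minimal sketch:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B-Instruct"

# Load the base model, apply the adapter, then fold the LoRA
# weights into the base weights.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, "./wilson-sft-adapter")
merged = model.merge_and_unload()

merged.save_pretrained("./wilson-merged")
AutoTokenizer.from_pretrained(base).save_pretrained("./wilson-merged")
```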
```bash
# Convert to GGUF at f16, then quantize to Q4_K_M
python convert_hf_to_gguf.py ../wilson-merged --outfile wilson-f16.gguf --outtype f16
./llama-quantize wilson-f16.gguf wilson-q4_k_m.gguf Q4_K_M
```

## Step 5: Deploy to Ollama

Create a Modelfile:
```
# Base must match the model the adapter was trained on
# (qwen2.5:7b-instruct is the Ollama-library build of Qwen2.5-7B-Instruct)
FROM qwen2.5:7b-instruct
ADAPTER ./wilson-adapter.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"

SYSTEM """You are Wilson, an AI bookkeeper from Open Accountant. You help users manage their personal finances with precision and care. You have access to tools for importing transactions, categorizing spending, analyzing budgets, and generating reports."""
```

Build and run:
```bash
ollama create wilson-7b -f Modelfile
ollama run wilson-7b
```
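Before switching Wilson over, a quick smoke test helps confirm the adapter actually loaded (the prompt here is just an illustration):

```bash
# One-shot prompt to check the adapter and chat template
ollama run wilson-7b "Categorize this transaction: STARBUCKS #1234 -4.85 USD"
```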
## Step 6: Switch Wilson to Your Model

```bash
# Set as default
echo 'DEFAULT_MODEL=ollama:wilson-7b' >> .env
```
```
# Or switch interactively
/model ollama:wilson-7b
```

Wilson now uses your fine-tuned model for all interactions. The training pipeline continues capturing interactions with this model, so you can iteratively improve it.
## Training Tips

### Quality Over Quantity

A small dataset of 200 excellent interactions outperforms 2,000 mediocre ones (a filtering sketch follows the list below). Focus on:
- Interactions where Wilson correctly used tools in the right order
- Responses that were well-structured and actionable
- Categorization calls that matched your preferred categories
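One way to enforce this is to filter the export before uploading. This sketch assumes a hypothetical `rating` field added during annotation; adapt the predicate to whatever your export actually records:

```python
import json

# Keep only records annotated as high quality before pushing to the Hub.
# The "rating" field is hypothetical; substitute your own annotation key.
with open("wilson-sft.jsonl") as src, open("wilson-sft-filtered.jsonl", "w") as dst:
    for line in src:
        if json.loads(line).get("rating") == "good":
            dst.write(line)
```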
### Iterative Improvement

1. Train v1 on cloud model interactions
2. Switch to v1 and use it for a week
3. Annotate v1's interactions, comparing them against cloud model responses
4. Create DPO pairs (v1 response = rejected, cloud model = chosen); see the sketch after this list
5. Train v2 with SFT + DPO
6. Repeat
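A sketch of step 4, assuming two hypothetical response logs aligned by prompt order, each line a JSON object with `prompt` and `response` fields (adapt the names to your actual exports):

```python
import json

# Pair cloud (chosen) and v1 (rejected) responses into DPO records.
def build_dpo_pairs(cloud_path: str, v1_path: str, out_path: str) -> None:
    with open(cloud_path) as cloud, open(v1_path) as v1, open(out_path, "w") as out:
        for cloud_line, v1_line in zip(cloud, v1):
            chosen = json.loads(cloud_line)
            rejected = json.loads(v1_line)
            out.write(json.dumps({
                "prompt": chosen["prompt"],
                "chosen": chosen["response"],      # cloud model = chosen
                "rejected": rejected["response"],  # v1 = rejected
            }) + "\n")

build_dpo_pairs("cloud-responses.jsonl", "v1-responses.jsonl", "wilson-dpo.jsonl")
```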
### LoRA Parameters

| Parameter | Default | Notes |
|---|---|---|
| `r` (rank) | 16 | Higher = more capacity, more VRAM. 8-32 is typical. |
| `alpha` | 32 | Scaling factor. Usually `2 * r`. |
| `target_modules` | `q_proj`, `k_proj`, `v_proj`, `o_proj` | Attention projection layers. Adding `gate_proj`, `up_proj`, `down_proj` increases capacity. |
| `epochs` | 3 | 2-5 is typical. Watch for overfitting with small datasets. |
| `learning_rate` | 2e-4 | Standard for LoRA. Reduce to 5e-5 for DPO. |
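For example, a higher-capacity variant of the earlier `LoraConfig` that also adapts the MLP layers (expect noticeably more VRAM use):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,           # more capacity, more VRAM
    lora_alpha=64,  # keep alpha = 2 * r
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP layers, extra capacity
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```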
### Avoiding Overfitting

- Hold out 10-20% of your data for validation (see the sketch below)
- Monitor validation loss; if it plateaus or starts rising, stop early
- Prefer a lower learning rate (1e-4) with more epochs over a high rate with fewer
- With fewer than 100 examples, use `r=8` and `epochs=2`
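A sketch of the validation split, extending the SFT setup from Step 3 (`model`, `dataset`, `lora_config`, and `tokenizer` as defined there; on older transformers versions the argument is `evaluation_strategy` rather than `eval_strategy`):

```python
from trl import SFTTrainer, SFTConfig

# Hold out 10% of the Step 3 dataset for validation.
split = dataset.train_test_split(test_size=0.1, seed=42)

training_config = SFTConfig(
    output_dir="./wilson-sft-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=1e-4,     # lower rate, per the tips above
    eval_strategy="epoch",  # run validation at every epoch boundary
    save_strategy="epoch",
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    peft_config=lora_config,
    args=training_config,
    tokenizer=tokenizer,
)
```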