# Deploying Trained Models
After fine-tuning a LoRA adapter (see Fine-Tuning with LoRA), you need to deploy it so Wilson can use it. Two paths are available: Ollama (recommended) and Transformers.js (experimental).
## Deploy to Ollama

Ollama is the primary deployment target. It supports GGUF models with native tool calling, GPU acceleration, and the full model ecosystem.
### Convert to GGUF

If you haven’t already converted your adapter, use llama.cpp:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

python convert_lora_to_gguf.py \
  --base Qwen/Qwen2.5-7B-Instruct \
  --lora ../wilson-sft-adapter \
  --outfile wilson-adapter.gguf
```

See Fine-Tuning with LoRA — Step 4 for full conversion details, including merging the adapter into a standalone model.
### Create a Modelfile

Wilson’s `generateOllamaModelfile()` function produces a Modelfile with the correct system prompt and parameters. The format:

```
FROM Qwen/Qwen2.5-7B-Instruct
ADAPTER ./wilson-adapter.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"

SYSTEM """You are Wilson, an AI bookkeeper from Open Accountant. You help users manage their personal finances with precision and care. You have access to tools for importing transactions, categorizing spending, analyzing budgets, and generating reports."""
```

Save this as `Modelfile` in the same directory as your GGUF adapter.
### Build and Register

```bash
ollama create wilson-7b -f Modelfile
```

Verify it works:

```bash
ollama run wilson-7b 'Categorize: Whole Foods $45.23'
```

### Set as Default

Switch Wilson to your fine-tuned model:

```
# Set as default for all sessions
echo 'DEFAULT_MODEL=ollama:wilson-7b' >> .env

# Or switch interactively
/model ollama:wilson-7b
```

## Deploy to Transformers.js (Experimental)
Transformers.js supports ONNX-format models running in-process. This path is experimental and has limitations.
### Requirements

- Your adapter and base model must be in ONNX format
- The model must be compatible with `@huggingface/transformers` v3
- Tool calling uses prompt injection (less reliable than Ollama’s native support)
Use the `transformers:` prefix to select your model:

```
/model transformers:your-username/wilson-onnx
```

See Transformers.js Local Inference for supported models and configuration.
## Verify the Deployment

### Smoke Test

Run a few representative queries to confirm your model works:

```bash
# Test basic categorization
wilson --run "Categorize my last 5 transactions"

# Test tool calling
wilson --run "What did I spend on groceries this month?"

# Test financial reasoning
wilson --run "Am I on track with my budget?"
```
### Compare Against Cloud

Run the same queries with a cloud model and your fine-tuned model side by side:

```bash
# Cloud model baseline
DEFAULT_MODEL=claude-sonnet-4-6 wilson --run "Summarize my spending this week"

# Your fine-tuned model
DEFAULT_MODEL=ollama:wilson-7b wilson --run "Summarize my spending this week"
```

Look for correct tool usage, accurate categorization, and well-structured responses.
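Eyeballing transcripts works, but the most telling checks can be partially automated. A minimal sketch of a comparison helper, assuming a hypothetical transcript shape (Wilson's actual log format may differ):

```typescript
// Hypothetical comparison helper: flags the signals worth reviewing
// when comparing a local run against a cloud baseline. The transcript
// shape is illustrative, not Wilson's actual log format.
interface Transcript {
  model: string;
  toolCalls: string[]; // names of tools the model invoked
  response: string;
}

function compareRuns(cloud: Transcript, local: Transcript): string[] {
  const findings: string[] = [];
  // Flag tools the cloud model used that the local model skipped —
  // missed tool calls are the most common small-model failure.
  for (const tool of cloud.toolCalls) {
    if (!local.toolCalls.includes(tool)) {
      findings.push(`local run missed tool call: ${tool}`);
    }
  }
  // A drastically shorter response often indicates a degenerate answer.
  if (local.response.length < cloud.response.length / 4) {
    findings.push('local response is much shorter than cloud baseline');
  }
  return findings;
}
```

An empty findings list doesn't prove the local model is as good as the cloud baseline, but a non-empty one pinpoints interactions worth annotating.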
## The Iterative Improvement Loop

Deployment isn’t the end — it’s the start of a feedback loop:

1. Use your trained model (`wilson-7b`)
2. Wilson captures all interactions automatically
3. Review interactions in the Dashboard Training tab
4. Create DPO pairs: trained model = rejected, cloud model = chosen
5. Export new training data (SFT + DPO)
6. Fine-tune v2
7. Deploy v2, repeat

Each iteration improves your model. The DPO pairs teach it specifically where it falls short compared to cloud models, while SFT on good interactions reinforces correct behavior.
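The pairing rule in step 4 can be sketched in code. Assuming a hypothetical interaction record (field names are illustrative, not Wilson's schema), building pairs looks like this:

```typescript
// Sketch of building DPO preference pairs from captured interactions:
// a low-rated local response is "rejected", a high-rated cloud response
// to the same prompt is "chosen". Field names are illustrative.
interface Interaction {
  prompt: string;
  model: string;   // e.g. "ollama:wilson-7b" or "claude-sonnet-4-6"
  response: string;
  rating: number;  // 1-5 stars from the Dashboard Training tab
}

interface DpoPair {
  prompt: string;
  chosen: string;   // cloud model's response
  rejected: string; // trained model's response
}

function buildDpoPairs(interactions: Interaction[]): DpoPair[] {
  const pairs: DpoPair[] = [];
  for (const local of interactions) {
    // Only low-rated responses from the local model become "rejected".
    if (!local.model.startsWith('ollama:') || local.rating > 2) continue;
    // Find a well-rated cloud response to the same prompt for "chosen".
    const cloud = interactions.find(
      (i) =>
        i.prompt === local.prompt &&
        !i.model.startsWith('ollama:') &&
        i.rating >= 4,
    );
    if (cloud) {
      pairs.push({
        prompt: local.prompt,
        chosen: cloud.response,
        rejected: local.response,
      });
    }
  }
  return pairs;
}
```

The rating thresholds (≤2 for rejected, ≥4 for chosen) are assumptions chosen to match the rating advice below.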
### Tips for Iteration

- **Rate aggressively.** Give 1-2 stars to responses that miss the mark and 4-5 stars to good ones.
- **Focus DPO pairs on tool calling.** The biggest gap between small and large models is tool usage.
- **Re-train every 100+ new annotations.** Smaller batches risk overfitting.
- **Version your models.** Name them `wilson-7b-v1`, `wilson-7b-v2`, etc. Keep old versions until the new one proves better.
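The version-naming convention in the last tip can be encoded in a trivial helper. This function is hypothetical — Wilson has no such utility — but it makes the scheme concrete:

```typescript
// Hypothetical helper encoding the wilson-7b-v1, wilson-7b-v2, ...
// naming scheme. Not part of Wilson; shown only to illustrate it.
function nextVersion(name: string): string {
  const match = name.match(/^(.*-v)(\d+)$/);
  if (!match) return `${name}-v1`; // unversioned name: start at v1
  return `${match[1]}${Number(match[2]) + 1}`;
}
```

To snapshot an existing model under a versioned name before replacing it, `ollama cp wilson-7b wilson-7b-v1` copies it in place.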
## See Also

- Fine-Tuning with LoRA — Training the adapter
- Exporting Training Data — SFT and DPO export formats
- Model Management — `/model` and `/pull` commands