Foundation models
Fine-tuning, LoRA adapters, RLHF.
We adapt open-weights and frontier models to specific domains without burning a foundation-model budget. The point is rarely a bigger model — it's the right model, post-trained on the right data, evaluated against the right benchmark.
What we use, and when
Closed frontier models (Anthropic, OpenAI, Google) when latency matters less than reasoning depth and the use case is general. Open-weights models (Llama, Qwen, Mistral, DeepSeek) when we need full control of inference cost, data residency, or post-training. The decision is rarely religious — it follows the eval and the unit economics.
Techniques in the toolbox
Full fine-tune
When the domain shift is real and the data volume justifies the GPU bill.
LoRA / QLoRA
Parameter-efficient adapters. Cheap to train, cheap to swap, cheap to ship.
RLHF / DPO
Preference optimization for style, safety, and judgment-call decisions.
Distillation
Compress a frontier-model behavior into a smaller, faster, deployable model.
Constrained decoding
JSON-mode, regex grammars, schema-constrained generation when structure matters.
Tool-augmented prompting
Sometimes the right answer is a smaller model with better tools, not a bigger model.