Switched to local models after quality dropped off a cliff and token consumption seemed to double. Having some success with Qwen+Crush and have been more productive.
Would love some more info on how you got any local model working with Crush. Love charmbracelet but the docs are all over the place on linking into arbitrary APIs.
Obviously the context window settings will depend on what you've set on the llama-server/llama-swap side. Defining multiple models on the same server, like I have in the config snippet above, is mostly only relevant if you're using llama-swap.
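A rough sketch of keeping those two sides in sync, assuming llama-swap fronts several llama-server instances, each launched with its own `--ctx-size`. The model IDs and sizes below are made up, and the entry keys (`id`, `name`, `context_window`, `default_max_tokens`) are my reading of Crush's config schema, so double-check them against your Crush version.

```python
import json

# Hypothetical: the --ctx-size each model was started with in llama-swap's config.
SERVER_CTX = {
    "qwen3-coder-30b": 65536,
    "qwen3-4b": 16384,
}


def model_entries(server_ctx, max_tokens=4096):
    """Build one Crush model entry per llama-swap model (key names assumed).

    context_window mirrors the server-side --ctx-size so Crush never assumes
    more context than llama-server actually allocated, and default_max_tokens
    is clamped so it can't exceed that window.
    """
    entries = []
    for model_id, ctx in server_ctx.items():
        entries.append({
            "id": model_id,  # must match the model name llama-swap routes on
            "name": model_id,
            "context_window": ctx,
            "default_max_tokens": min(max_tokens, ctx),
        })
    return entries


if __name__ == "__main__":
    print(json.dumps(model_entries(SERVER_CTX), indent=2))
```

With plain llama-server (no llama-swap) you'd normally only have one entry here, since one server process serves one model.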
TL;DR: set up a provider for your local LLM server, define at least one model on that provider, then point the large and small models (the ones Crush actually uses to respond to prompts) at that provider/model combo. Pretty straightforward, but I agree their docs could be better, particularly for local LLM setups.
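Here's a minimal sketch of those three steps as a Python script that emits a crush.json. Treat the key names (`providers`, `type: "openai-compat"`, `base_url`, top-level `models` with `large`/`small`) as assumptions from my memory of Crush's config format, not gospel. The provider ID `llamaswap`, the model ID, and `http://localhost:8080/v1` are all placeholders.

```python
import json


def build_crush_config(provider_id, base_url, models, large, small):
    """Return a crush.json-shaped dict wiring Crush to one local provider.

    Key names are assumptions based on Crush's documented config format;
    check the schema for your Crush version.
    """
    known_ids = {m["id"] for m in models}
    # Fail early if large/small point at a model the provider doesn't define.
    for role, model_id in (("large", large), ("small", small)):
        if model_id not in known_ids:
            raise ValueError(
                f"{role} model {model_id!r} is not defined on provider {provider_id!r}"
            )
    return {
        # Step 1: a provider for the local OpenAI-compatible server.
        "providers": {
            provider_id: {
                "name": "Local llama-swap",
                "type": "openai-compat",
                "base_url": base_url,
                # Step 2: at least one model served by that provider.
                "models": models,
            }
        },
        # Step 3: the large and small models Crush actually uses for prompts.
        "models": {
            "large": {"provider": provider_id, "model": large},
            "small": {"provider": provider_id, "model": small},
        },
    }


if __name__ == "__main__":
    # Hypothetical model ID and address; use whatever your server exposes.
    qwen = {
        "id": "qwen3-coder-30b",
        "name": "Qwen3 Coder 30B",
        "context_window": 65536,
        "default_max_tokens": 4096,
    }
    cfg = build_crush_config(
        "llamaswap", "http://localhost:8080/v1", [qwen],
        large="qwen3-coder-30b", small="qwen3-coder-30b",
    )
    print(json.dumps(cfg, indent=2))
```

Run it with stdout redirected into crush.json. The check on `large`/`small` is there because pointing them at a model ID the provider doesn't list is the easiest way to end up with a config that silently doesn't work.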
For me, I've got llama-swap running on my tailnet as a [Tailscale Service](https://tailscale.com/docs/features/tailscale-services), so I can use my local LLMs anywhere I would use a cloud-hosted one. I just set the provider base URL in crush.json to my Tailscale Service URL and it works great.
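Before pointing Crush at a remote base URL, it can help to confirm the model IDs you put in crush.json are what the server actually advertises. A small sketch, assuming llama-swap answers the standard OpenAI-style `GET /v1/models` with a `{"data": [{"id": ...}]}` list. The tailnet hostname in the usage note is a hypothetical placeholder.

```python
import json
import urllib.request


def models_url(base_url):
    """Append /models to an OpenAI-style base URL such as http://host/v1."""
    return base_url.rstrip("/") + "/models"


def model_ids(payload):
    """Extract model IDs from an OpenAI-style list response."""
    return [m["id"] for m in payload.get("data", [])]


def list_remote_models(base_url, timeout=5.0):
    """Fetch the model list from the server behind base_url."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

Something like `list_remote_models("http://llama-swap.your-tailnet.ts.net/v1")` (placeholder hostname) should return the same IDs you listed under the provider's models; if it times out, the problem is reachability over the tailnet rather than Crush.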