Switched to local models after quality dropped off a cliff and token consumption...

tfrancisl · 2026-04-24T17:13:39 1777050819

Would love some more info on how you got any local model working with Crush. Love charmbracelet but the docs are all over the place on linking into arbitrary APIs.

porkloin · 2026-04-24T18:11:48 1777054308

Assuming you have a locally running llama-server or llama-swap, just drop this into your crush.json, swapping in your own setup details and local addresses:

Edit: I forgot HN doesn't do code fences. See https://pastebin.com/2rQg0r2L

Obviously the context window settings are going to depend on what you've got set on the llama-server/llama-swap side. Having multiple models on the same server, like I do in the linked config snippet, is mostly only relevant if you're using llama-swap.

TL;DR: you need to set up a provider for your local LLM server, define at least one model on that provider, then point the large and small models that Crush actually uses to respond to prompts at that provider/model combo. Pretty straightforward, but I agree that their docs could be better for local LLM setups in particular.
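For anyone who can't open the pastebin, here is a rough Python sketch of those three steps. It is not the linked config. The key names (`providers`, `type`, `base_url`, `models`, `context_window`, `default_max_tokens`, `large`, `small`) are written from memory of the Crush README and should be checked against the current docs. The provider id, model ids, address and window sizes are all made-up placeholders.

```python
import json

# Assumed crush.json key names; verify them against the Crush docs before use.
def build_config(base_url, models, large_id, small_id, provider_id="llama-swap"):
    """Return a crush.json-style dict for one local OpenAI-compatible server."""
    return {
        # Step 1: a provider entry for the local server.
        "providers": {
            provider_id: {
                "name": "Local llama-swap",
                "type": "openai-compat",
                "base_url": base_url,
                # Step 2: at least one model served by that provider.
                "models": models,
            }
        },
        # Step 3: point the large and small slots at a provider/model combo.
        "models": {
            "large": {"provider": provider_id, "model": large_id},
            "small": {"provider": provider_id, "model": small_id},
        },
    }


def undefined_selections(config):
    """List the large/small slots whose provider/model is not defined above."""
    problems = []
    for slot, choice in config["models"].items():
        provider = config["providers"].get(choice["provider"])
        known = {m["id"] for m in provider["models"]} if provider else set()
        if choice["model"] not in known:
            problems.append(slot)
    return problems


if __name__ == "__main__":
    cfg = build_config(
        base_url="http://localhost:8080/v1",  # placeholder address
        models=[
            # Placeholder ids and sizes; match them to your server-side settings.
            {"id": "big-model", "name": "Big model",
             "context_window": 32768, "default_max_tokens": 4096},
            {"id": "small-model", "name": "Small model",
             "context_window": 8192, "default_max_tokens": 2048},
        ],
        large_id="big-model",
        small_id="small-model",
    )
    assert undefined_selections(cfg) == []
    print(json.dumps(cfg, indent=2))
```

The printed JSON is only a starting point for crush.json. The `undefined_selections` check is there because the easy mistake is a large or small slot that names a model id the provider block never defines.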

For me, I've got llama-swap running and set up on my tailnet as a [tailscale service](https://tailscale.com/docs/features/tailscale-services), so I'm able to use my local LLMs anywhere I would use a cloud-hosted one. I just set the provider base URL in crush.json to my Tailscale service URL and it works great.
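Before pointing Crush at a base URL, whether local or on a tailnet, it can help to confirm that the server answers there and to see which model ids it reports. A minimal sketch, assuming the server exposes the OpenAI-style model listing at `<base_url>/models` (llama-server and llama-swap both serve an OpenAI-compatible API, but check your version). The function names are illustrative only.

```python
import json
import urllib.request


def models_url(base_url):
    """Append the OpenAI-style model listing path to a provider base URL."""
    return base_url.rstrip("/") + "/models"


def model_ids(payload):
    """Pull the model ids out of an OpenAI-style {"data": [{"id": ...}]} listing."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url, timeout=5):
    """Ask the server which models it serves; raises if the URL is unreachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

Calling `fetch_model_ids` with the same base URL you put in crush.json should return the ids to use for the provider's models. An exception means the address or path is wrong before Crush is even involved.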