am17an's comments | Hacker News

You can still run larger MoE models by offloading the expert weights to the CPU for token generation. They are by and large usable: I get ~50 tok/s on a Kimi Linear 48B (3B active) model on a potato PC + a 3090.
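
For the curious, this is roughly what that looks like with a recent llama.cpp build (a sketch: the model file name is illustrative, and the -ot/--override-tensor regex assumes the usual MoE expert tensor naming):

    # keep attention and shared weights on the GPU; push the MoE
    # expert tensors to system RAM, where the CPU reads them
    ./llama-server -m kimi-linear-48b-a3b-q4_k_m.gguf \
        -ngl 99 -ot ".ffn_.*_exps.=CPU"

Token generation is memory-bandwidth bound, and with only ~3B parameters active per token the numbers stay usable.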

Sure. “Tell me a joke”

Damn I’m jealous that they figured out how to pay their contributors. I’ve been toiling away for free

They already have with Qwen3.5.

I agree with the previous post: there's hope for a convergence point in the not-too-distant future where consumer hardware can run powerful models.

At the moment, the 397B Qwen3.5 model (which I assume is what you're referring to) is still out of reach for most consumers to run locally: the only relatively straightforward path (i.e. discounting custom Threadripper builds) to running it would be a 512GB Mac Studio.

However, in a generation or two (of hardware and models), maybe we'll see convergence: more hardware available with 300-400GB of memory for more approachable money (a tough sell right now, I accept, with memory prices as they are), and models offering great performance in this size range.


I was referring to the 35B version. It is surprisingly good for its size. You can use it for implementation tasks without it going off the rails

What do you use for sub-50ms inference?


Could be bank statement line-item classification.
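
One way to hit that latency budget (a sketch only: the model file and category labels here are hypothetical) is a small quantized model in llama.cpp with a GBNF grammar, so the output is constrained to one of the labels:

    ./llama-cli -m qwen2.5-0.5b-instruct-q4_k_m.gguf -ngl 99 -n 4 \
        --grammar 'root ::= "groceries" | "rent" | "salary" | "transfer"' \
        -p "Classify this bank statement line: POS DEBIT 4417 GROCERY MART"

With a sub-1B model resident on the GPU and only a few output tokens, tens of milliseconds per line item seems plausible.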

Honestly you can run this on a 16GB VRAM GPU with llama.cpp. Just try it!


One often-overlooked aspect of this is that ggml, the tensor library that runs llama.cpp, is not based on PyTorch; it's just plain C/C++. In a world where PyTorch dominates, it shows that alternatives are possible and worth pursuing.
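
For a flavor of what that looks like, here is a minimal sketch against the ggml C API (names match recent versions as I remember them, and they have shifted over time, so treat it as illustrative):

    #include <stdio.h>
    #include "ggml.h"

    int main(void) {
        // one arena-style context holds the tensors and the graph
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16 * 1024 * 1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        // c = a + b on two float32 vectors
        struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
        struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
        ggml_set_f32(a, 1.0f);
        ggml_set_f32(b, 2.0f);
        struct ggml_tensor * c = ggml_add(ctx, a, b);

        // build the compute graph and execute it on the CPU
        struct ggml_cgraph * gf = ggml_new_graph(ctx);
        ggml_build_forward_expand(gf, c);
        ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

        printf("c[0] = %f\n", ggml_get_f32_1d(c, 0));  // expect 3.0
        ggml_free(ctx);
        return 0;
    }

No Python, no framework: a single translation unit you compile and link against libggml.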


Holy smokes we're cooked.


I immediately flagged it. But it doesn't matter much. No one has skin in the game of commenting on HN anyway.


Yeah, that's an LLM, isn't it? Commenting on outsourcing judgement. The dead internet is real.


Maintainers' time is a scarcer resource than free tokens. I would much rather have back the time I spent reading those PRs.


> 1) Python is unreadable.

Would you prefer C or C++?

> Unironically, yes. Unless I never plan to look at that code again

