Hacker News

The site literally has a quick visual comparison near the top, which shows that theirs is the closest to 16bit performance compared to the others. I don't get what more you'd want.

https://cdn.prod.website-files.com/64f4e81394e25710d22d042e/...



These are comparisons to other quantizing methods. That is fine.

What I want to see is comparisons to NON-quantized models, all with around the same VRAM, along with the associated inference latencies.

Also, we'd want to see the same quantizing scheme applied to other base models, because perhaps the paper's proposed scheme only beats the others on a particular base model.


They tested the quantisation on 3 different models.

They also show it has little to no effect relative to fp16 on these models.

IMO that's enough. Comparison against smaller models is much less useful because you can't use the same random seeds. So you end up with a very subjective "this is worse" based purely on the aesthetic preferences of one person vs. another. You already see this with Flux Schnell vs. the larger Flux models.


I disagree.

They report that their method produces a 6.5 GB model from Flux (22.7 GB). Why wouldn't you want to know how their 6.5 GB model compares to other 6.5 GB models?
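A quick back-of-envelope check on those reported sizes (the assumption that the original weights are fp16, i.e. 16 bits each, is mine, not from the thread):

```python
fp16_gb = 22.7  # reported full-precision model size (GB)
quant_gb = 6.5  # reported quantized model size (GB)

# If the original is fp16 (16 bits/weight), the implied average
# footprint of the quantized weights is:
bits_per_weight = quant_gb / fp16_gb * 16
print(round(bits_per_weight, 1))  # → 4.6
```

So the quantized model lands at roughly 4.6 bits per weight on average, which is the size class it would be fair to benchmark against.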

Regarding aesthetic prefs: it's an open problem what an appropriate metric is for GenAI. LLM arena is widely regarded as a good way to measure LLMs, and that's user preferences.

In any case, the authors report LPIPS etc. They could do the same for other small models.


LPIPS and similar don't work if the scene is different, as happens if the random seed doesn't match. This is why they can use it to compare the quantised network, but not against networks with reduced numbers of weights.


I'm really confused; this looks like concern trolling, because there's a live demo for exactly this A/B testing. IIRC it was near the top of the article, close enough that it was the first link I clicked.

But you're quite insistent that they need to address this, so it seems much more likely that either they silently added it after your original post or you didn't click through; concern trolling would stay more vague.


The demo is not what they're asking for. It compares original versus quantized. They want quantized versus a different model of similar size in GB.


>What I want to see is comparisons to NON-quantized models

Isn't that the first image in the diagram, the 22 GB model that took 111 seconds?


The next six words you didn't quote make all the difference.



