Hacker News | xaskasdf's comments

It didn't have any quality loss; PSNT quantization is mainly there to fit the model to the console's constraints (you can convert any model you want, even though I trained a model specifically for this hardware). It's Q8 quantization, so quality loss is negligible at these sizes. As for the speed, I'll fix the tok/sec counter, since right now it always shows 0.

PS: Thank you! And I forgot to mention PSNT also supports BitNet models; they work like crap though.


That's super helpful, thanks for the details. It makes sense now that PSNT is more of a transport/runtime format for the PS2 constraints than a quality hack.

Very cool that it supports BitNet too, even if results are rough right now; feels like there's a lot of room to tune there over time. When you do fix tok/sec, are you planning to post per-stage timings too (tokenizer, weight stream, matmul, sampling)? Would be awesome to see where the biggest bottleneck is on real hardware.


Ya know, there are a bunch of Optanes floating around the local market here; I'll try to get hold of one to check if there's any improvement.


Optanes will be good for latency, but not so much for bandwidth, which seems to be your major bottleneck, if I'm not mistaken?


Yeah, the mobo upgrade is something I gotta do anyway, so that'll cover it more or less; the Optane is something I hadn't thought about.


Actually it's purely bandwidth-bound. The major bottleneck of the whole process, in my case, is the B450 mobo I've got: it only does PCIe 3.0 and gives the GPU x8 lanes instead of x16, so I'm capped until I get an X570, maybe. I should get around double or triple the token speed from that upgrade alone.


Actually I can't go full TDP with a 650 W PSU; I've got to upgrade it ASAP.


I updated the documentation to provide more info on the patching process, added the patches themselves, and included some risk info about them.


I did, but with different quantization compressions it ran into quality issues; I'll try to rerun with the same quants to see if that fixes it. Also, most of what looks unused is actually being used by rotating layers that the CPU swaps in from RAM; that keeps layers warm and ready to use during inference while already-used ones are discarded.


Did you even read anything? hahaha


Actually I'm thinking about buying an AMD BC-250, which is basically a PS5 in a PCIe card form factor, and it's Linux-capable by default. Maybe next month.


This was the experiment itself https://github.com/xaskasdf/ps2-llm

The idea was basically to run an LLM on a PS2. I then ran into problems like the 32 MB RAM cap and the 4 MB VRAM cap, so I had to figure out a way to stream layers during the forward pass. Given that the PS2 can issue instructions directly against VRAM with 32-bit addressing, it gave an insane amount of tok/s, and then I wondered if I could do the same on my computer.


I've got an M3; I'll test it on Metal and see how it goes.

