
Yeah, those numbers are correct as of their testing (in June), although anyone who's really interested should check out the linked repo and do their own runs, since software and optimizations are still changing fast and the RDNA3 side has a lot of untapped potential. E.g., the 7900 XTX has a huge theoretical FLOPS advantage over the 3090, but the results don't reflect that at all. One example of this hobbling: RDNA3 only recently got backward-pass FlashAttention via a still-under-optimized aotriton implementation: https://github.com/ROCm/aotriton/pull/39
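
If you want to sanity-check what your own setup actually dispatches, here's a minimal probe (a sketch assuming PyTorch 2.3+ on a ROCm or CUDA build; it pins scaled_dot_product_attention to the FlashAttention backend and runs backward() so the backward-pass kernel gets exercised too):

    # Probe: can this GPU/build run FlashAttention for forward AND backward?
    import torch
    from torch.nn.attention import SDPBackend, sdpa_kernel

    # (batch, heads, seq_len, head_dim) in fp16; requires_grad so backward runs
    q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16,
                           requires_grad=True) for _ in range(3))
    try:
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
            out.sum().backward()  # exercises the backward-pass kernel
        print("FlashAttention fwd+bwd OK")
    except RuntimeError as e:
        # PyTorch raises if the forced backend can't handle the op on this GPU
        print("FlashAttention backend unavailable:", e)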

There are still ongoing optimizations on the Nvidia side as well. At the beginning of the year the 7900 XTX and 3090 were pretty close on llama.cpp inference performance, but a few months ago llama.cpp gained CUDA graph and FlashAttention support, which boosted perf significantly for both my 3090 and 4090.
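
If you want to see the FA delta on your own card, something like this works (a rough sketch via the llama-cpp-python bindings; flash_attn is the binding's flag for llama.cpp's FA path, "model.gguf" is a placeholder, and this lumps prompt processing into the timing, so llama.cpp's own llama-bench tool gives cleaner numbers):

    # Rough tokens/sec comparison with FlashAttention off vs. on
    import time
    from llama_cpp import Llama

    def tok_per_sec(flash_attn: bool, model="model.gguf", n=128) -> float:
        llm = Llama(model_path=model, n_gpu_layers=-1,
                    flash_attn=flash_attn, verbose=False)
        t0 = time.perf_counter()
        llm("Once upon a time", max_tokens=n)
        return n / (time.perf_counter() - t0)

    print("FA off:", tok_per_sec(False))
    print("FA on: ", tok_per_sec(True))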

(For AI/ML, a used 3090 remains, I think, the best bang/buck for both inference and small training runs. You can pay twice as much for the twice-as-fast 4090, but at the end of the day you'll still wish you had more VRAM, so it's hard to really recommend unless you're going to use mixed precision. The RDNA3 cards are not as bad to work with as the Internet would have you believe, but they'd have to be a lot cheaper to recommend if your main use case is AI/ML, both for the PITA factor and for pure real-world performance.)
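
To put the VRAM point in numbers, a quick back-of-envelope (assuming ~4.85 bits/weight, roughly llama.cpp's Q4_K_M, and ignoring KV cache and activation overhead, which eat a few more GB on top):

    # Rough weight footprint per model size at a given quantization
    def weight_gb(params_b: float, bits_per_weight: float = 4.85) -> float:
        return params_b * bits_per_weight / 8  # 1e9 params * (bits/8) bytes ~= GB

    for p in (7, 13, 34, 70):
        fits = "fits" if weight_gb(p) < 24 else "needs >24 GB"
        print(f"{p:>3}B ~= {weight_gb(p):.1f} GB -> {fits}")

A 34B quant just squeezes onto 24 GB before you account for the KV cache; 70B doesn't come close, whether the card is a 3090 or a 4090.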


