
Various machines reboot randomly all the time. Given the amount of direct outdoor airflow that we push through the machines (we don't have fans on the GPUs), as soon as the GPUs stop running, they cool down very very quickly. That is the 'shock' you're looking for.

Why do they reboot? We run on the edge of peak OC tuning performance by default and I've built an automated tuner which downclocks individual cards. This way, they get more stable over time, while maintaining their best possible performance.

Occasionally, we would reset the tunings and then let them auto tune back... this accounted for the seasonal variances because hotter cards are more prone to crashing.
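A minimal sketch of how a back-off tuner like the one described above might work. This is an illustration only, not the poster's actual code: `CardTuning`, `tune_until_stable`, the `is_stable` callback, and all clock values are hypothetical, and nothing here talks to real hardware.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CardTuning:
    """Hypothetical per-card tuning state (all names and values are assumptions)."""
    clock_mhz: int      # current core clock target
    start_mhz: int      # aggressive starting point restored on reset
    floor_mhz: int      # never tune below this clock
    step_mhz: int = 5   # small step, so tuning is fine-grained

    def on_crash(self) -> None:
        # Back off one step after a crash attributed to this card.
        self.clock_mhz = max(self.floor_mhz, self.clock_mhz - self.step_mhz)

    def reset(self) -> None:
        # Return to the aggressive starting clock and let tuning run again.
        self.clock_mhz = self.start_mhz


def tune_until_stable(card: CardTuning,
                      is_stable: Callable[[int], bool],
                      max_reboots: int = 100) -> int:
    """Step the clock down until is_stable(clock) holds; return the crash count."""
    reboots = 0
    while reboots < max_reboots and not is_stable(card.clock_mhz):
        card.on_crash()
        reboots += 1
        if card.clock_mhz == card.floor_mhz:
            break  # cannot back off any further
    return reboots


# Simulated card that only runs stably at or below 1150 MHz.
card = CardTuning(clock_mhz=1200, start_mhz=1200, floor_mhz=1000, step_mhz=5)
reboots = tune_until_stable(card, lambda mhz: mhz <= 1150)
print(reboots, card.clock_mhz)
```

In this toy model, a smaller `step_mhz` lands closer to the card's true limit but costs more crash/reboot cycles after every reset.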



How often does the average machine reboot? If it's less often than once every 24 hours, you're still putting the card under less thermal stress than someone who games for half an hour every evening. I'd buy your used GPU over a gamer's used GPU.


Sometimes it can reboot 50+ times in a row. Each box has 12 GPUs, so if I reset the tuning for the box, it can take a while to find the optimal settings because the voltage/clock tuning steps are very granular.

Again, this isn't an actual issue and I have the data to prove it.


Fair enough, I'll defer to your experience. Although if you're power cycling that much, I take it back, maybe I won't buy your cards :)


No, you want my cards because I've proven that reboots/thermal changes don't make any difference. =)

You wouldn't want my cards, because they don't have fans. Most people don't have adequate cooling for something like that.


> That is the 'shock' you're looking for.

It may still be a lot less 'shock' than normal use, where players have a 15 minute round, then low use for a couple of minutes, and so on for hours, and then turn the card off.

Thermal cycling is known to be bad for electronics; this is well studied and documented. Sustained high temperatures are also bad, but only really bad when the temperatures are really high.


I'm pretty sure my cards have gone through every extreme load situation that you could possibly make up in your head.

Certainly, thermal cycling can be an issue for electronics in general, but my experience with these specific cards says that it isn't an issue at all. Certainly not enough of one that it should dictate whether or not to buy 'miner' cards.


Since you would know about every possible failure mode…

Do you know what causes NVIDIA cards to have their output turn off (black screen) and the fan to go 100%?

Been happening to my 2000-series recently but I don’t know what to try to fix: cooling, PSU, or capacitors…


My primary experience is with AMD cards.

My guess is a VBIOS or driver bug. You could also be running into a tuning issue. GPUs are amazingly complex beasts.



