Long story but the tldr is that it happened in two stages. First someone figured out a way to reliably reproduce the problem. And then I spent a very long time single stepping through machine instructions until I had a eureka moment.
Actually, I remember reporting the bug to the compiler authors and being stunned when they told me that they were not going to issue a new version with the bug fix because the project was no longer being funded. (This was the T dialect of Lisp in case you're wondering.)
Fortunately, it only happened when the robot's arm was moving, and we were mostly doing mobility research so we were able to be productive simply by not using the arm.