There’s about a 0% chance that kind of emergent, secret reasoning is going on.

Far more likely: 1) they are mistaken or lying about the published system prompt, 2) they are being disingenuous about the definition of "system prompt" and consider this a "grounding prompt" or something similar, or 3) the model's reasoning was fine-tuned to do this, so the behavior doesn't need to appear in the system prompt at all.

This finding is revealing a lack of transparency from Twitxaigroksla, not the model.
