One of Anthropic's ostensive ethical goals is to produce AI that is "understanda...

rimliu · 2026-04-24T11:07:05 1777028825

broken cache does not breakthrough make.

pxc · 2026-04-26T17:20:50 1777224050

Huh? I never claimed they made a breakthrough. My point is that if they really have an alignment or interpretability breakthrough, they won't have to tell us by virtue signaling about how safe it is. Users will just be able to tell because it will eliminate or drastically reduce problems like #3 in the OP. The outcomes of prompt changes remaining unpredictable tells us it's still a black box.