Hacker News

I am using Opus 4.7 with the default extra-high thinking setting, and it is clearly very stupid. It's worse than the old Sonnet 4.5.

I had it suggest some parameters for BCFtools and it suggested parameters that would do the opposite of what I wanted to do. I pointed out the error and it apologized.

It also takes no initiative to check things itself; it wants me to check them (e.g., file contents).

And it claims that things are "too complex" or "too difficult" when they are super easy. For instance, when refreshing an AWS token, it somehow couldn't figure out that you could do that in a cron task.

A really really bad downgrade. I will be using Codex more now, sadly.



You can’t make up your mind about a model by using it on one task. Calling it such a bad downgrade after that is ludicrous. I’ve had great experiences with it this morning.


That was more than one task. It was 3.

I also had Opus 4.7 and Opus 4.6 do audits of a very long document using identical prompts. I then had Codex 5.4 compare the audits. Codex found that 4.6 did a far better job and 4.7 had missed things and added spurious information.

I then asked a new session of Opus 4.7 if it agreed or disagreed with the Codex audit and it agreed with it.

I also agreed with it.


It's been dramatically better than any model I have ever used before on my tasks.



