BeetleB · on May 25, 2025

> It's a failure that it created and when told to fix it did this. It’s beyond bad.

No one's disputing this was bad. People are merely claiming it can also be good. I've dealt with plenty of humans this bad - it's not an argument that humans can't program.

callc · on May 25, 2025

It seems like the underlying issue is trust. We can hand a task to a competent programmer - even a junior - and trust them to finish the task correctly. It might take multiple tries, and they may ask for clarification, but since they’re human, we trust they are intelligent.

There are some people who fall into the bucket where we can’t trust them to finish the task correctly, or to finish it within a time frame or level of effort on our part that makes offloading the task a net benefit.

If we view LLMs in the same light, IMO they currently fall into the “not trust” category: we can’t really give them a task and trust them to finish it correctly, confident that we don’t need to understand their implementation.

If one day LLMs or some other solution reaches that point, then it definitely won’t look like a bubble, but a real revolution.

BeetleB · on May 25, 2025

Very well put. The trick is to do either of the following:

1. Find simpler tasks for which the trust in LLMs is high.

2. Give tasks to the LLMs that have a very low cost to verify (even when the task is not simple) - particularly one-off scripts.

I once had a colleague who was in the "not trust" bucket for the work we were doing. So we found something he was good at that was a pain for me to do, and re-assigned him to do those things and take that burden off of us.

In the last few months I've had the LLM solve (simple) problems via code that had been in my head for years. At any point I could have done them, but they were a chore. If the LLM failed for one of these tasks - it's not a big deal - not much time was lost. But they tend to succeed fairly often, because they are simple tasks.

I almost never let the LLM write production code, because of the extra burden that you and others allude to. But I do let it write code I rely on in my personal life, because frankly I tend to write pretty poor code for my personal use - I can't justify the time it would take to write things well - life is too busy. I welcome the code quality I get from Sonnet or Gemini 2.5 Pro.

That's my point in this thread. Writing code is a pretty diverse discipline, and many are dismissing LLMs simply because they don't do one particular use case (high quality production code) well.

I didn't take LLM coding seriously until I found well respected, well known SW engineers speaking positively about it. Then I tried it and ... oh wow. People dismissing LLMs are dismissing not only a lot of average developers' reality, but also a lot of experts' daily reality.

Just look at the other submission:

https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...

He used an LLM to find a security vulnerability in the kernel. To quote him:

> Before I get into the technical details, the main takeaway from this post is this: with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you’re an expert-level vulnerability researcher or exploit developer the machines aren’t about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective. If you have a problem that can be represented in fewer than 10k lines of code there is a reasonable chance o3 can either solve it, or help you solve it.