Hacker News | nickandbro's comments

If anybody wants to check out my site to learn the basics of Vim, here it is:

https://vimgolf.ai

I proxy to neovim instances for each level. Still working out some kinks, but it should be complete soon.


Seems like you need an account just to try it.

Yeah, I'm working on a smart way to rate-limit stale requests from those who don't have accounts. But the final version will allow anybody who isn't a bot to get into a Vim instance without logging in. Thanks for the feedback.
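(For what it's worth, one common approach to rate-limiting unauthenticated users is a per-client token bucket keyed by IP. A minimal sketch, purely illustrative and not the site's actual implementation:)

```python
import time

class AnonRateLimiter:
    """Token bucket per client key (e.g. IP) for unauthenticated users.
    Hypothetical sketch, not vimgolf.ai's actual implementation."""

    def __init__(self, rate=1.0, burst=5):
        self.rate = rate      # tokens refilled per second
        self.burst = burst    # maximum stored tokens
        self.buckets = {}     # key -> (tokens, last_timestamp)

    def allow(self, key):
        now = time.monotonic()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # refill proportionally to elapsed time, capped at the burst size
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[key] = (tokens, now)
            return False
        self.buckets[key] = (tokens - 1, now)
        return True
```

Each anonymous client gets `burst` requests up front, then `rate` requests per second; authenticated users would simply bypass the limiter.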

I feel like we are inching closer and closer to a world where rapid iteration of software is the default. For example: a trusted user gives feedback -> the feedback gets curated into a ticket by an AI agent, then turned into a PR by an agent, then reviewed by an agent, before being deployed by an agent. We are maybe one or two steps from the flywheel being complete. Or maybe we are already there.

I just don't see it happening. I was fully in that camp three months ago, but I've realized every step introduces more mistakes. It leads to a deadlock once no human holds the mental model anymore.

Don't you have hard business problems that AI just can't solve, or solves only very slowly, presenting you 17 ideas until it finds the right one? I'm using the most expensive models.

I think the nature of AI might block that progress, and I think some companies have woken up to this and others will wake up later.

The mistake rate is just too high. And every system you implement to reduce that rate has a mistake rate of its own, while increasing complexity and the necessary exploration time.

I think the bulk of people are now where the early adopters were in December: AI can implement working functionality on a well-maintained codebase.

But it can't write maintainable code itself. It actually makes you slower compared to assisted-writing the code, because when assisting you are much more in the loop and can stop a lot of small issues right away. And you iterate quickly on everything.

I hadn't opened my project in a month, and at some point it became hell. I've now deleted 30k lines, and the amount of issues I'm seeing has been an eye-opening experience.

Unscalable performance issues, verbosity, straight-up bugs, escape hatches against my verification layers, quadrupled types.

Now, I could monitor the AI output more closely, but then again I'm faster writing it myself, because it's a single task. AI-assisted typing isn't slower than my brain is.

Also, thinking more about it: FAANG pays ~$300 per line of production code, so what are we really trying to achieve here? Speed was never the issue. A great coder writes ten production lines per day.

Accuracy, architecture, etc. are the issue. You get there by building good, solid fundamental blocks that make feature additions easier over time, not slower.


Wow so many replies.

I think it breaks down into two camps: AI is improving on these issues, and people countering that.

I don't know for sure, but to me it seems the last two years weren't necessarily 'intelligence' improvements but post-training improvements and tool connections, plus reduced censorship.

I'm now using less AI than ever, and I was burning $1,000/month before Claude Code. I have a couple of really fundamental functions built that help me solve a big chunk of specific problems, and I can build a lot on that. Adding functionality became easier, not more complicated.

I would say that for the business problems I'm facing, AI is right less than 30% of the time. For example, deciding how to set up databases for maximum efficiency, or how to write efficient queries: everything that in the end is your real moat compared to your vibe-coded competitors.

From my personal experience, I've seen a lot of vibe-coded companies get stuck, barely adding necessary functionality or features, and my guess is that they don't trust changes anymore.

So even if AI were as good as a really good coder, one thing would still be missing: a person who knows exactly what is happening.

And I mean, okay, it might write a form real quick. But a modern form needs to do a lot of things, and if you have established patterns for all kinds of inputs, the implementation is mundane.

It's like when you learn coding: type it yourself to learn. So if you can't scale the AI-only codebase, at some point you have to learn it, and I'd argue that right now the most efficient way is to write in it.

And I'm also arguing that it's really tough to get software so good that it's actually an asset on the market when it's vibe-coded only. It seems more like a drug for wannapreneurs than a way of actually building an asset.

Like, it builds you a Netflix clone, but what you see is barely the code you'd need to build a Netflix competitor.


I know it’s not your main point, but I’m curious where $300/line comes from. I don’t think I’ve ever seen a dollar amount attached to a line of production code before.

I think this sounds like a true yet short-sighted take. Keep in mind these features are immature, but they exist to create a flywheel and corner the market. I don't know why, but people seem to consistently miss two points and their implications:

- performance is continuing to increase incredibly quickly, even if you rightfully don't trust any particular evaluation. Scaling laws like Chinchilla and the RL scaling laws (both training-time and test-time) hold.

- coding is a verifiable domain

The second one is most important. Agent quality is NOT limited by human code in the training set, this code is simply used for efficiency: it gets you to a good starting point for RL.

Claiming that things will not reach superhuman performance, including on all end-to-end tasks (understanding a vague, poorly articulated business objective; architecting a system; building it out; testing it; maintaining it; fixing bugs; adding features; refactoring; etc.), is what carries the burden of proof, because we can literally predict performance (albeit performance has a complicated relationship with benchmarks and real-world results).

Yes, definitely: error rates are still too high for this to be trusted fully end to end, but they are improving consistently, and that is what the METR time-horizon benchmark captures.


Scaling laws vs. combinatorial explosion: who wins? In my personal experience, Claude does exceedingly well on mundane code (do a migration, add a field, wire up this UI) and quite poorly on code that has likely never been written before (even if it is logically simple for a human). The question is whether this is a quantitative or a qualitative barrier.

Of course it's still valuable. A real app has plenty of mundane code despite our field's best efforts.


Combinatorial explosion? What do you mean? Again, your experiences are real, but things are improving with each release. The error rate on tasks continues to go down, even on novel tasks (as far as we can measure them). Again, this is where verifiable domains come in: whatever problems you can specify, the model will improve on, and this improvement results in better generalization and improvements on unseen tasks. This is what I mean by taking your observations of today, ignoring the rate of progress that got us here and the known scaling laws, and then just asserting there will be some fundamental limitation. That idea may be common, but it is not at all supported by the literature or the mathematics.

The space of programs is incomprehensibly massive. Searching for a program that does what you need is a particularly difficult search problem. In the general case you can't solve search, there's no free lunch. Even scaling laws must bow to NFL. But depending on the type of search problem some heuristics can do well. We know human brains have a heuristic that can program (maybe not particularly well, but passably). To evaluate these agents we can only look at it experimentally, there is no sense in which they are mathematically destined to eventually program well.

How good are these kinds of algorithms at generalization? Are they learning how to code, or are they learning how to code migrations, then how to code caches, then how to code a command-line arg parser, etc.?

Verifiable domains are interesting. They are unquestionably why agents have come first for coding. But if you've played with Claude, you may have experienced it short-circuiting failing tests, cheating on tests with code that does not generalize, writing meaningless tests, and at long last, if you steer it away from all of these, it may say something like "honest answer - this feature is really difficult and we should consider a compromise."


So what do you think the difference is between humans and an agent in this respect? What makes you think this has any relevance to the problem? Everything is combinatorially explosive: the combinations of words we can string into sentences and essays are also combinatorially explosive, and yet LLMs and humans have no problem with them. It's just the wrong frame for what's going on. These systems are obtaining higher and higher levels of abstraction, because that is the most efficient way for them to gain performance. That's what reasoning looks like: composition of higher-level abstractions. What you say may be true, but I don't see how it's relevant.

"There is no sense in which they are mathematically destined to eventually program well"

- Yes there is, and this betrays an ignorance of the literature and of how these things work

- Again: RL has been around forever. Scaling laws have held empirically up to the largest scales we've tested, and there are known RL scaling laws for both training and test time. It's ludicrous to say there is "no sense" in this; on the contrary, the burden of proof is squarely on you, because this has already been studied, and it is the primary reason labs are able to secure such eye-popping funding: contrary to popular HN belief, a trillion dollars of CapEx is based on rational, evidence-based decision making.
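For reference, the pretraining scaling law mentioned above is the Chinchilla parametric fit (Hoffmann et al., 2022), which models loss as a function of parameter count N and training tokens D; the fitted constants below are approximate:

```latex
% Chinchilla parametric loss fit (Hoffmann et al., 2022); constants are approximate
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\; \alpha \approx 0.34,\; \beta \approx 0.28
```

Minimizing this under a compute budget $C \approx 6ND$ gives compute-optimal $N$ and $D$ that each grow roughly as $\sqrt{C}$, i.e. you scale parameters and data together.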

> "How good are these types of algorithms at generalization"

There is a tremendously large literature and history of this. ULMFiT, BERT ==> NLP task generalization; https://arxiv.org/abs/2206.07682 ==> emergent capabilities, https://transformer-circuits.pub/2022/in-context-learning-an... ==> demonstrated circuits for in context learning as a mechanism for generalization, https://arxiv.org/abs/2408.10914 + https://arxiv.org/html/2409.04556v1 ==> code training produces downstream performance improvements on other tasks

> Verifiable domains are interesting. It is unquestionably why agents have come first for coding. But if you've played with claude you may have experienced it short-circuiting failing tests, cheating tests with code that does not generalize, writing meaningless tests, and at long last if you turn it away from all of these it may say something like "honest answer - this feature is really difficult and we should consider a compromise."

You say this and ignore my entire argument: you are right about all of your observations, yet

- Opus 4.6 compared to Sonnet 3.x clearly generalizes better and is less prone to these mistakes

- Verifiable domain performance SCALES, and we have no reason to expect that this scaling will stop and the recursive-improvement loop will die off. Verifiable domains mean we are in AlphaGo land: learning by doing, not by mimicking human data or memorizing a training set.


Hey man, it sounds like you're getting frustrated. I'm not ignoring anything; let's have a reasonable discussion without calling each other ignorant. I don't dispute the value of these tools, nor that they're improving. But the no-free-lunch theorem is inexorable, so the question is where this improvement breaks down: before or beyond human performance on programming problems specifically.

What difference do I think there is between humans and an agent? They use different heuristics, clearly. Different heuristics are valuable on different search problems. It's really that simple.

To be clear, I'm not calling either superior. I use agents every day. But I have noticed that Claude, a SOTA model, makes basic logic errors. Isn't that interesting? It has access to the complete compendium of human knowledge and can, in seconds, code all sorts of things that would require my trawling through endless documentation. But sometimes it forgets that to do dirty tracking on a pure function's output, it needs to dirty-track the function's inputs.

It's interesting that you mention AlphaGo. I was also very fascinated with it. There was recent research showing that the same algorithm cannot learn Nim: https://arstechnica.com/ai/2026/03/figuring-out-why-ais-get-.... Isn't that food for thought?


What is unreasonable? I am saying the claims you are making are completely contradicted by the literature. I am calling you ignorant in the technical sense, not dumb or unintelligent, and I don't mean this as an insult. I am completely ignorant of many things, we all are.

I am saying you are absolutely right that Opus 4.6 is both SOTA and colossally terrible in even surprisingly mundane contexts. But that is just not relevant to the argument you are making, which is that there is some fundamental limitation. There is of course always a fundamental limitation to everything, but what we're getting at is where that limitation lies, and we are not yet even beginning to see it. Combinatorics is the wrong lens here, because the model is not searching over the full combinatoric space, any more than we are. There are plenty of efficient search "heuristics", as you call them.

> They use different heuristics, clearly.

what is the evidence for this? I don't see that as true, take for instance: https://www.nature.com/articles/s42256-025-01072-0

> It's interesting that you mention AlphaGo. I was also very fascinated with it. There was recent research that the same algorithm cannot learn Nim: https://arstechnica.com/ai/2026/03/figuring-out-why-ais-get-.... Isn't that food for thought?

It's a long-known problem with RL in a particular regime and isn't relevant to coding agents. Things like Nim are a small, adversarially structured task family, not representative of language, coding, or real-world tasks. Nim is almost the worst possible case: the optimal policy is a brittle, discontinuous function.
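Concretely, Nim's known-optimal policy is the classic nim-sum (XOR) rule, which illustrates why the optimal policy is such a discontinuous function of the position: change one pile by one stone and the correct move can jump anywhere. A minimal sketch:

```python
from functools import reduce

def nim_sum(piles):
    """XOR of all pile sizes. A position is losing for the player
    to move iff the nim-sum is zero (the Sprague-Grundy result)."""
    return reduce(lambda a, b: a ^ b, piles, 0)

def winning_move(piles):
    """Return (pile_index, new_size) for a winning move,
    or None if the position is already lost with perfect play."""
    s = nim_sum(piles)
    if s == 0:
        return None
    for i, p in enumerate(piles):
        target = p ^ s          # reduce this pile so the nim-sum becomes 0
        if target < p:
            return (i, target)
```

The value function is a bitwise parity over the whole position rather than anything locally smooth, which is exactly the kind of structure that value-network-based RL struggles to approximate.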

AlphaGo is pure RL from scratch, which is challenging, inefficient, and unstable, and is why we don't do that with LLMs: we pretrain them first. In coding agents, RL is not used to discover invariants (aspects of the problem that don't change when surface details change) from scratch, as it is in this example. Pretraining takes care of that, and RL is used for refinement: a completely different scenario, and one where RL is well suited.


I didn't make any claims contradicted by the literature. The only thing I cited as bedrock fact, NFL, is a mathematical theorem. I'm not sure why Nim shouldn't be relevant; it's an exercise in logic.

> “AlphaZero excels at learning through association,” Zhou and Riis argue, “but fails when a problem requires a form of symbolic reasoning that cannot be implicitly learned from the correlation between game states and outcomes.”

Seems relevant.


> So what do you think the difference is between humans and an agent in this respect?

Humans learn.

Agents regurgitate training data (and quality training data is increasingly hard to come by).

Moreover, humans learn (somewhat) intangible aspects: human expectations, contracts, business requirements, laws, user case studies etc.

> Verifiable domain performance SCALES, we have no reason to expect that this scaling will stop.

Yes, we do have reasons to expect that. And even if growth continues, nearly flat logarithmic growth is about as useless as no growth at all.

For a year now, all the amazing "breakthrough" models have shown comparatively little progress, to the point that all the providers have been mercilessly cheating with their graphs and benchmarks.


> Where did I say that? I didn’t even mention money, just the broader resource term. A lot of business are mostly running experiments if the current set of tooling can match the marketing (or the hype). They’re not building datacenters or running AI labs. Such experiments can’t run forever.

I'm just going to ask that you read any of my other comments. This is not at all how coding agents work, and it seems to be the most common misunderstanding among HN users generally. It's tiring to refute it. RL in verifiable domains does not work like this.

> Humans learn.

Sigh, so do LLMs, in context.

> Moreover, humans learn (somewhat) intangible aspects: human expectations, contracts, business requirements, laws, user case studies etc.

There are literally benchmarks on this all over the place; I'm sure you follow them.

> Yes, yes we have reasons to expect that. And even if growth continues, a nearly flat logarithmic scale is just as useless as no growth at all.

And yet it's not logarithmic? Consider the data flywheel, consistent algorithmic improvements, and synthetic data (basically: rejection sampling from a teacher model with a lot of test-time compute and high temperature).

> For a year now all the amazing "breakthrough" models have been showing little progress (comparatively). To the point that all providers have been mercilessly cheating with their graphs and benchmarks.

Benchmaxxing is for sure a real thing, and even honest benchmarking is very difficult to do, but treating "all of the AI companies are just faking the performance data" as the story is tremendously wrong. Consider performance on AIME 2025 (uncontaminated data), and the fact that companies have a _deep incentive_ to genuinely improve their models (and then, of course, market them as hard as possible; that's a given). People will experiment with different models, and no benchmaxxing is going to fool them for very long.

If you think Opus 4.6 compared to Sonnet 3.x is "little progress" I think we're beyond the point of logical argument.


Are you aware that LLMs are still the same autocomplete, just with different token decisions, more data, better pre- and post-training, and tuned settings?

We have all the data now.

I don't see where the huge leap should come from when, as someone said earlier, they still make basic errors.

Models got better through a bunch of soft tuning. Language and abstraction are not really the same thing; there are a lot of very good speakers who are terrible at logic and abstraction.

Thinking abstractly sometimes makes it necessary to leave language and draw, and some people even code in another programming language to get it.

We've seen it with the compiler project: it looks nice, but if you wanted to make a competitive compiler, you'd be about as well off starting fresh.


But the issue isn't coding, it's doing the right thing. I don't see anywhere in your plan a way of staying aligned with core business strategy, forethought, etc.

The number of devs will shrink, but there will still be large bodies of work that can't be farmed out without an overall strategy.


Why do you think this is a problem? Reasoning is constantly improving, the agent has ample access to humans to gather more business context, it has access to the same industry data and other signals that humans do, and it can get any data it needs. It has Zoom meeting notes. I mean, why do people think there's somehow a fundamental limit beyond coding?

The other thing you're missing here is generalizability. Better coding performance (which is verifiable and not limited by human data quality) generalizes to better performance on other benchmarks. This is a long-known phenomenon.


> Why do you think this is a problem?

Because it cannot do it?

Every investment has a date where there should be a return on that investment. If there’s no date, it’s a donation of resources (or a waste depending on perspective).

You may be OK with continuing to try to make things work. But others aren’t and have decided to invest their finite resources somewhere else.


> Because it cannot do it?

Ah, OK, so you didn't really read my comment. What is your counterargument? That models are just fundamentally incapable of understanding business context? They are already demonstrably capable of this to a large extent.

> Every investment has a date where there should be a return on that investment. If there’s no date, it’s a donation of resources (or a waste depending on perspective).

What are you implying here? That this convo now turns into the "AI is not profitable and this is a house of cards" theme? That's OK; we can ignore every other business model, like, say, Uber running at a loss to capture what is ultimately an absolutely insane TAM. Little ol' Uber accumulated ~$33B in losses over 14 years, and you're right, they tanked and collapsed like a dying star... oh wait... hmm, interesting, I just looked at their market cap and it's $141 billion.

> You may be OK with continuing to try to make things work. But others aren’t and have decided to invest their finite resources somewhere else.

I truly love that. If you want to code as a hobby, that is fantastic, and we can go ahead and see in two years how your comment ages.


> They are demonstrably already capable of this to a large extent.

I'd very much like to see such a demonstration: someone handing a department over to an agent and letting it make decisions.

> This convo now turns into the "AI is not profitable and this is a house of cards" theme?

Where did I say that? I didn't even mention money, just the broader term "resources". A lot of businesses are mostly running experiments to see if the current set of tooling can match the marketing (or the hype). They're not building datacenters or running AI labs. Such experiments can't run forever.


@skydhash I think aspenmartin is ragebaiting; he can't be for real.

> I’d very like to see such demonstration. Where someone hands over a department to an agent and let it makes decisions.

That's your bar for understanding business context? I thought we were talking about what you actually said, which is understanding business context. If I brainstorm about a feature, it can pull the compendium of knowledge for the business: reports, previous launches, infrastructure, an understanding of the problem space, the industry, company strategy. That's business context.

> Where did I say that? I didn’t even mention money, just the broader resource term. A lot of business are mostly running experiments if the current set of tooling can match the marketing (or the hype). They’re not building datacenters or running AI labs. Such experiments can’t run forever.

I misunderstood you then; I wasn't sure what point you were trying to make. Is your point "companies are trying to cajole Claude to do X, it doesn't work and hasn't for the last year, so they are giving up"? If so, I think that is a wonderful opportunity for people who understand the nuance of these systems and the concept of timing.


You make the mistake of disregarding tacit knowledge: the stuff that isn't in reports, docs, etc., because it's just "how we do things" and is picked up on the job.

Unless the AI is inserted into every conversation, it won't discover this, or how it changes.

Even if it had all of this documented, it wouldn't then be able to account for politics, where Barry, who runs analytics, is secretly trying to sabotage the project so it ends up run by his team, etc.


> - coding is a verifiable domain

You're missing the point, though. "1 + 1" vs "one.add(1)" might both be "passable" and correct, but stopping there misses the forest for the trees. How do you know which one is the right choice long-term, given what we know? That's the engineering part of building software, as opposed to the "coding", which tends to be the easy part.

How do you evaluate, score, and/or benchmark something like that? Currently, I don't think we have any methodologies for this, probably because it's pretty subjective in the end. That's where the "creative" part of software engineering becomes more important, and it's also much harder to verify.


While I agree we don't have any methodologies for this, it's also true that we can just "fail" more often.

Code is effectively becoming cheap, which means even bad design decisions can be reversed without prohibitive cost.

I wouldn't be surprised if in a couple of years we see several projects approach the problem of tech debt like this:

1. Instruct an AI to write tens of thousands of tests using the available information: documentation, requirements, meeting transcripts, etc. These tests MUST include performance AND availability tests (along with other "quality attribute" concerns).

2. Have humans verify (to the best of their ability) that the tests are correct (this step is likely optional).

3. Ask another AI to re-implement the project while matching the tests.

It sounds insane, but...not so insane if you think we will soon have models better than Opus 4.6. And given the things I've personally done with it, I find it less insane as the days go by.

I do agree with the original poster who said that software is moving in this direction, where super-fast iteration happens and non-developers can at least get a demo of a feature in front of them quickly. I think it clearly is, and I am working internally to make this a reality. You submit a feature request, and eventually a live demo is ready for you: deployed in isolation on some internal server, proxied appropriately if you need a URL, ready for you to give feedback and have the AI iterate on it. It works for the kinds of projects we have, and though I get that it might be trickier for much larger systems, I'm sure everyone will find a way.

For now, we still need engineers to help drive many decisions, and I think that'll remain the case. These days, all I do when "coding" is talk (via TTS) with Opus 4.6 and iterate on several plans until we get the right one, and I can't wait to see how much better this workflow will be with smarter and faster models.

I'm personally trying to adapt everything in our company to have agents work with our code in the most frictionless way we can think of.

Nonetheless, I do think engineers with a product inclination are better off than those who are mostly all about coding and building systems. To me, it has never felt so magical to build a product, and I'm loving it.


> Code is effectively becoming cheap, which means even bad design decisions can be overturned without prohibitive costs.

I'm sorry, but only someone who has never maintained software long-term would say something like this. The further along you are in development, the more the cost of change increases, maybe even exponentially.

Correcting the design before you've even written code might be 100x cheaper (or even 1000x) than changing that design two years later, after you've stored TBs of data in some format because of that decision, and lots of other parts of the company/product/project depend on the choices you made earlier.

You can't just pile on code on top of code, say "code is cheap" and hope for the best, it's just not feasible to run a project long-term that way, and I think if you had the experience of maintaining something long-term, you'd realize how this sounds.

The easiest part of "software engineering" is "writing code", and today writing code is even easier. But the hardest parts, actually designing, thinking, and maintaining, remain much the same as before; some parts are easier, others are harder.

Don't get me wrong, I'm on the "agentic coding" train as much as everyone else, and probably haven't written or edited code by myself in a year at this point, but it's important to be realistic about what it actually takes to produce worthwhile software, not just slop out patchy, hacky code.


I've never maintained software long-term, so I could be wrong, but I interpret "code is cheap" to mean that you can have coding agents refactor or rewrite the project from scratch around the design correction. I don't think "code is cheap" should ever be interpreted to mean ship hacky code.

I think using agents to prototype code and design will be a big thing. Have the agent write out what you want, come back with what works and what doesn't, write a new spec, toss out the old code, and have a fresh agent start again. Spec-driven development is the new hotness, but we know that the best spec is code: have the agent write the spec in code, rewrite the spec in natural language, then iterate.


Because it has business context and better reasoning, and it can ask humans for clarification and take direction.

You don't need to benchmark this, although benchmarks matter. We have clear scaling laws on true statistical performance that are monotonically related to any notion of what performance means.

I do benchmarks for a living and can attest: benchmarks are bad, but that doesn't matter for the point I'm trying to make.


I feel like you're missing the initial context of this conversation (no pun intended):

> Like for example a trusted user makes feedback -> feedback gets curated into a ticket by an AI agent, then turned into a PR by an Agent, then reviewed by an Agent, before being deployed by an Agent.

Once you add "humans for clarification and direction", then yeah, things can be useful, but that's far from the no-human-in-the-loop pipeline described earlier in this thread, which is what people are pushing back against.

Of course involving people makes things better; that's the entire point here: remove the human and you won't get results as good. And going back to benchmarks, obviously involving humans isn't possible there, so again we're back to being unable to score these processes at all.


I'm confused about the scenario here. There is a human in the loop: the feedback part. And there is business context: it is seeded or maintained by the human and expanded by the agent. The agent can make inferences about the world, especially once embodiment and better multimodal interaction roll out (embodiment will take longer).

Benchmarks ==> it's absolutely not a given that humans can't be involved in the loop of performance measurement. Why would that be the case?


> because it has business context

It doesn't, because it doesn't learn. Every time you run it, it's a new dawn, with no knowledge of your business or your business context.

> better reasoning

It doesn't have better reasoning beyond very localized decisions.

> and can ask humans for clarification and take direction.

And yet it doesn't, no matter how many .md files you throw at it, at crucial places in the code.

> We have clear scaling laws on true statistical performance that is monotonically related to any notion of what performance means.

This is just a bunch of words strung together, isn't it?


Almost every task people are throwing agents at is either not worth doing, can be done better with scripts and software, or requires human oversight (which negates all the advantages).

I assume this is a troll, because it's so far removed from reality that there's not much to say. "Almost every task": I'm sure you have great data to back this up. "Not worth doing": sure, if you want to put your head in the sand and ignore even what today's systems can do, let alone the improvement trajectory. "Can be done better with scripts and software": not sure if you realize this, but agents write scripts and software. "Requires human oversight (which negates all the advantages)": it certainly does not; human oversight versus humans actually implementing the code is dramatically more efficient and productive.

> It doesn't because it doesn't learn. Every time you run it, it's a new dawn with no knowledge of your business or your business context

It does learn, in context. And the lack of continuous learning is temporary; it's a quirk of the current stack, so expect it to change rather quickly. It's also still not relevant: consider that agentic systems can be hierarchical, and that they have no trouble grokking codebases or doing internal searches effectively, and this will only improve.

> It doesn't have better reasoning beyond very localized decisions.

Do you have any basis for this claim? It contradicts a large amount of direct evidence and measurement and theory.

> This is just a bunch of words stringed together, isn't it?

Maybe to yourself? Chinchilla scaling laws and RL scaling laws are measured very accurately based on next-token test loss (Chinchilla). This scales very predictably. It is related to downstream performance; that relationship is noisy, but clearly monotonic.


> It does learn in context

It quite literally doesn't.

It also doesn't help that every new context is a new dawn with no knowledge of things past.

> Also still not relevant, consider that agentic systems can be hierarchical and that they have no trouble being able

A bunch of Memento guys directing a bunch of other Memento guys don't make a robust system, or a system that learns, or a system that maintains and retains things like business context.

> and this will only improve.

We've heard this mantra for quite some time now.

> Do you have any basis for this claim?

Oh. Just the fact that in every single coding session, even on a small 20kloc codebase, I need to spend time cleaning up large amounts of duplicated code, undoing quite a few wrong assumptions, and correcting the agent when it goes on wild tangents and wild goose chases.

> Maybe to yourself? Chinchilla scaling laws a

yap yap yap. The result is anything but your rosy description of these amazing reasoning learning systems that handle business context.


> It quite literally doesn't.

Awesome, you've backed this up with real literature. Let's include this for now, which easily refutes your argument (I don't know where it comes from): https://transformer-circuits.pub/2022/in-context-learning-an...

> It also doesn't help that every new context is a new dawn with no knowledge if things past.

Absolutely true that it doesn't help, but: agents like Claude have access to older sessions, they can grok impressive amounts of data via tool use, and they can be composed into hierarchical systems that effectively have much larger context lengths, at the expense of cost and coordination, which need improvement. Again, this is a temporary and already partially solved limitation.

> A bunch of Memento guys directing a bunch of other Memento guys don't make a robust system, or a system that learns, or a system that maintains and retains things like business context.

I think you are not understanding: hierarchical agents have long-term memory maintained by higher-level agents in the hierarchy; it's the whole point. It's annoying to reset model context, but you still have a knowledge base of the business context persisted, and the agent can grok it...

> We've heard this mantra for quite some time now.

yes you have, and it has held true and will continue to hold true. Have you read the literature on scaling laws? Do you follow benchmark progression? Do you know how RL works? If you do I don't think you will have this opinion.

> yap yap yap. The result is anything but your rosy description of these amazing reasoning learning systems that handle business context.

Well, it's fine to call an entire body of literature "yap," but don't pretend like you have some intelligible argument; I don't see you backing up any argument you have here with any evidence, unlike the multitude of sources I have provided to you.

Do you argue things have not improved in the last year with reasoning systems? If so I would really love to hear the evidence for this.


> Let's just include this for now to easily refute your argument which I don't know where it comes from: https://transformer-circuits.pub/2022/in-context-learning-an...

I love it when people include links to papers that refute their words.

So, Anthropic (which is heavily reliant on hype and on making models appear more than they are) authors a paper which clearly states: "tokens later in context are easier to predict and there's less loss of tokens. For no reason at all we decided to give this a new name, in-context learning".

> agents like Claude have access to older sessions, they can grok impressive amounts of data via tool use

That is, they rebuild the world from scratch for every new session, and can't build on what was learned or built in the last one.

Hence the continuously repeating failure modes.

10 years ago I worked in a team implementing royalties for a streaming service. I can still give you a bunch of details, including references to multiple national laws, about that. Agents would exhaust their context window just re-"learning" it from scratch, every time. And they would miss a huge amount of important context and business implications.

> Have you read the literature on scaling laws?

You keep referencing this literature as if it were the Holy Bible. Meanwhile the one you keep referring to, Chinchilla, clearly shows the very hard limits of those laws.

> Do you argue things have not improved in the last year with reasoning systems?

I don't.

Frankly, I find your aggressiveness quite tiring


> Frankly, I find your aggressiveness quite tiring

Having to answer for opinions with no basis in the literature is, I'm sure, very tiring for you. Your aggression being met is, I'm sure, uncomfortable.

> I love it when people include links to papers that refute their words.

> So, Anthropic (which is heavily reliant on hype and making models appear more than they are) authors a paper which clearly states: "tokens later in context are easier to predict and there's less loss of tokens. For no reason at all we decided to give this a new name, in-context learning".

well I don't really love it when people just totally misread a paper because they have an agenda to push and can't seem to accept that their opinions are contradicted by real evidence.

In-context learning is not "later tokens are easier"; it's task adaptation from examples in the prompt. I'm sure you realize this. Models can learn a mapping (e.g. word --> translation) from a few examples in the prompt and apply it to new inputs within the same forward pass. That is function learning at inference time, not just "predicting later tokens better".
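To make the distinction concrete, here is a minimal sketch of the few-shot setup (the translation pairs and prompt format are made up for illustration): the mapping is specified only by examples in the prompt, and the model has to infer it and apply it to a new input in a single forward pass, with no weight update.

```python
# Minimal sketch of an in-context learning prompt: the English -> French
# mapping is defined only by the example pairs below, never by training.
# The pairs and formatting are illustrative, not from any benchmark.

examples = [("sea", "mer"), ("dog", "chien"), ("bread", "pain")]

def few_shot_prompt(query: str) -> str:
    """Build a prompt whose only task specification is the examples."""
    shots = "\n".join(f"{en} -> {fr}" for en, fr in examples)
    return f"{shots}\n{query} -> "

prompt = few_shot_prompt("cat")
print(prompt)
# A model that completes this with "chat" has learned the mapping at
# inference time from three demonstrations.
```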

I'm sure also you're happy to chalk up any contradicting evidence to a grand conspiracy of all AI companies just gaming benchmarks and that this gaming somehow completely explains progress.

> That is they rebuild the world from scratch for every new session, and can't build on what was learned or built in the last one.

That they rebuild the world from scratch (wrong, they have priors from pretraining, but I accept your point here) does not mean they can't build on what was learned or built in the last one. They have access to the full transcript, the full codebase, the diff history, and whatever knowledge base is available. It's just disingenuous to say this, and it also assumes (1) there is no mitigation for this, which I have presented twice before and you don't seem to understand, and (2) that this is anything but a temporary limitation; continual learning is one of the most important and best-funded problems right now.

> 10 years ago I worked in a team implementing royalties for a streaming service. I can still give you a bunch of details, including references to multiple national laws, about that. Agents would exhaust their context window just re-"learning" it from scratch, every time. And they would miss a huge amount of important context and business implications.

Also not an accurate understanding of how agents and their context work; you can use multiple sessions to digest and distill information useful in other sessions, and in fact Claude does this automatically with subagents. It's a problem we have _already sort of solved today_ and that will continue to improve.
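A crude sketch of the distillation pattern being described, with the file name and the keyword-based `distill` helper invented for illustration (in practice the summarization would itself be an LLM call): one session's transcript gets compressed into a persistent notes file that later sessions load first, instead of re-deriving everything.

```python
from pathlib import Path

# Persistent knowledge base that survives across sessions (invented name).
NOTES = Path("project_notes.md")

def distill(transcript: str, max_lines: int = 20) -> str:
    """Stand-in for an LLM summarization call: keep only lines that
    record decisions, so later sessions don't re-derive them."""
    keep = [ln for ln in transcript.splitlines() if ln.startswith("DECISION:")]
    return "\n".join(keep[:max_lines])

def end_session(transcript: str) -> None:
    # Append this session's distilled decisions to the knowledge base.
    with NOTES.open("a") as f:
        f.write(distill(transcript) + "\n")

def start_session() -> str:
    # New sessions begin from the distilled notes, not from scratch.
    return NOTES.read_text() if NOTES.exists() else ""
```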

> You keep referencing this literature as it was Holy Bible. Meanwhile the one you keep referring to, Chinchilla, clearly shows the very hard limits of those laws.

You keep dismissing this literature as if you have understood it and as if your opinion somehow holds more weight... Can you elaborate on why you think Chinchilla shows the hard limits of the scaling laws? Perhaps you're referring to the term capturing the irreducible loss? Is that what you're saying?
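For reference, the parametric loss the Chinchilla paper (Hoffmann et al., 2022) fits, with N model parameters and D training tokens, is:

```latex
% Chinchilla parametric scaling law (Hoffmann et al., 2022)
% N: model parameters, D: training tokens, E: irreducible loss
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% Reported fits: E \approx 1.69,\; A \approx 406.4,\; B \approx 410.7,
%                \alpha \approx 0.34,\; \beta \approx 0.28
```

The power-law terms shrink as parameters and data scale, but the constant E does not; that irreducible-loss term is what both sides here are gesturing at.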

> Do you argue things have not improved in the last year with reasoning systems?

> I don't

Then are you arguing this progress will stop? I'm just not sure I understand, you seem to contradict yourself


> having to answer for opinions with no basis in the literatu

Having only literature on your side must feel nice.

> They have access to the full transcript, and they have access to the full codebase, the diff history, whatever knowledge base is available.

Yes. And it means that they don't learn, and they always miss important details when rebuilding the world.

That's why even the tiniest codebases are immediately filled with duplications, architecturally unsound decisions, invalid assumptions etc.

> also not an accurate understanding of how agents and their context work; you can use multiple session to digest and distill information useful in other sessions and in fact

I say: agents don't learn and have to rebuild the world from scratch

You: not an accurate understanding of how agents and their context work.... they rebuild the world from scratch every time they run.

> You keep dismissing this literature as if you have understood it

No. I'm dismissing your flawed interpretation of purely theoretical constructs.

Chinchilla doesn't project unlimited amazing scalability. If anything, it shows a very real end of scalability.

Anthropic's paper adopts a nice marketable term for a process that has little to do with learning.

Etc.

Meanwhile you do keep rejecting actual real-world behaviour of these systems.

> Then are you arguing this progress will stop? I'm just not sure I understand, you seem to contradict yourself

I didn't say that either. Your opponents don't contradict themselves if you stop putting words in their mouths.

Your unsubstantiated belief is that improvements are on a steep linear or even exponential progression. Because "literature" or something.

Looking past all the marketing bullshit, it could be argued that growth is at best logarithmic, and most improvements come from the tooling around the models (harnesses, subagents, etc.). Meanwhile all the failure modes from a year ago are still there: misunderstanding context, inability to maintain cohesion between sessions, context pollution, etc.

And providers are running into the issue of getting non-polluted training data.

---

At this point we're going around in circles, and I'm not interested in arguing with theorists.

Adieu


Today a great video came out about it: https://www.youtube.com/watch?v=vFUjcHhOpgA

I love everything about this direction except for the insane inference costs. I don’t mind the training costs, since models are commoditized as soon as they’re released. Although I do worry that if inference costs drop, the companies training the models will have no incentive to publish their weights because inference revenue is where they recuperate the training cost.

Either way… we badly need more innovation in inference price per performance, on both the software and hardware side. It would be great if software innovation unlocked inference on commodity hardware. That’s unlikely to happen, but today’s bleeding edge hardware is tomorrow’s commodity hardware so maybe it will happen in some sense.

If Taalas can pull off burning models into hardware with a two-month lead time, that will be huge progress, but still wasteful, because then we've just shifted the problem to a hardware bottleneck. I expect we'll see something akin to Game Boy cartridges that are cheap to produce and can plug into base models to augment specialization.

But I also wonder if anyone is pursuing some more insanely radical ideas, like reverting back to analog computing and leveraging voltage differentials in clever ways. It’s too big brain for me, but intuitively it feels like wasting entropy to reduce a voltage spike to 0 or 1.


> I love everything about this direction except for the insane inference costs.

If this direction holds true, it's cheaper on an ROI basis.

Instead of employing 4 people (customer support, PM, engineering, marketing), you will have 3-5 agents, and the whole ticket flow might cost you ~$20.

But I hope we won't go this far, because when things fail, every customer will be impacted, and there will be no one who understands the system well enough to fix it.


I worry about the costs from an energy and environmental impact perspective. I love that AI tools make me more productive, but I don't like the side effects.

The environmental impact of AI is greatly overstated. The average person will make a bigger positive impact on the environment by reducing their meat intake by 25% than by giving up flying and AI use combined.

Is this before or after you account for the initial training impact? Because that would need to be factored in for a good faith calculation here, much as the companies would rather we didn't.

Inference costs at least seem like the thing that is easiest to bring down, and there's plenty of demand to drive innovation. There's a lot less uncertainty here than with architectural/capability scaling. To your point, tomorrow's commodity hardware will solve this for the demands of today at some point in the future (though we'll probably have even more inference demand then).

This is the wrong way to see it. If a technology gets cheaper, people will use more and more and more of it. If inference costs drop, you can throw in way more reasoning tokens and combinations of many, many agents to increase accuracy or creativity and such.

> throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.

But this is just not true, otherwise companies that can already afford such high prices would have already outpaced their competitors.


No company at the moment has enough money to operate with 10x the reasoning tokens of their competitors, because they're bottlenecked by GPU capacity (or other physical constraints). Maybe in lab experiments, but not for generally available products.

And I sense you would have to throw orders of magnitude more tokens to get meaningfully better results (If anyone has access to experiments with GPT 5 class models geared up to use marginally more tokens with good results please call me out though).


Well, how many more dogs would you need to help you write your university thesis? It's a logical fallacy to assume that more tokens would somehow help, especially since even with cursory use you would see that once LLMs go off the road they are pretty much lost, and the best thing you can do with them is give them a clear context.

I mean, theoretically, if there are many competitors, the cost of the product should generally drop because of competition.

Sadly enough I have not seen this happening in a long time.


I think that as a user I'm so far removed from the actual (human) creation of software that if I think about it, I don't really care either way. Take for example this article on Hacker News: I am reading it in a custom app someone programmed, which pulls articles hosted on Hacker News which themselves are on some server somewhere and everything gets transported across wires according to a specification. For me, this isn't some impressionist painting or heartbreaking poem - the entity that created those things is so far removed from me that it might be artificial already. And that's coming from a kid of the 90s with some knowledge in cyber security, so potentially I could look up the documentation and maybe even the source code for the things I mentioned; if I were interested.

Art is and has always been about the creator.

I don't want software that is built to be art. I want software that is built to provide facilities.

Cool, but it's actually not all about you (the consumer) at all.

Take a walk in any museum, I'm pretty sure you'll react to some of the art displayed there and find it cool before you read the name of the artist.

Dive into a forest, you'll find a couple of cool trees.

Art isn't about being cool. Art is about context.

When I tell people that art cannot be apolitical, they react strongly, because they think about the left/right divide and how divided people are, whereas art is supposed to be unifying.

But art is like movement, you need an origin and a destination. Without that context, it will be just another... thing. Context makes it something.


It's not that you know the artist first and then say "this art is cool because I like the artist". The art is the means by which you know the artist. The more of their works you encounter, the closer you get to understanding the artist and what they are trying to communicate.

Of course. And yet, people still read the name and backstories anyways.

I don't mean this as shade, but people who are not coders now seem to think "coding is now solved" and are pushing absurd ideas like shipping software with Slack messages. These people are often high up in the chain and have never done serious coding.

Stripe is apparently pushing a gazillion PRs now from Slack, but their feature velocity has not changed. So what gives?

How is it that number of PRs is now the primary metric of productivity, and no one cares about what is being shipped or whether we are shipping product faster? It's total madness right now. Everyone has lost their collective minds.


I ask myself the same question.

I'm not seeing the apps, SaaS, and other tools I use getting better, with either more features or fewer bugs.

Whatever is being shipped, as an end user, I'm just not seeing it.


CTOs and CEOs are now feeling insane pressure to show how they are using AI, but it's not evident in the output. So now they've resorted to blabbering publicly about PRs, lines of code, etc. to save face. And of course the people giving them a voice and a platform have their own agendas that prevent them from asking "so what exactly has Stripe shipped from a million PRs/day?"

It's baffling to see these comments on Hacker News, though. I guess you have to prove that you are not a Luddite by making "AI forward" predictions and showing that you "get it".


I think a lot of SWE roles are really bullshit jobs (1) and these have been particularly susceptible to getting sniped with AI tools.

(1) https://en.wikipedia.org/wiki/Bullshit_Jobs


We haven’t been inching closer to users writing a half-decent ticket in decades though.

Solutions like https://bugherd.com/ might make the issue context capture part more accurate.

Maybe the agent can ask the user clarifying questions. Even better if it could do it at the point of submission.

What kind of software are people building where AI can just one shot tickets? Opus 4.6 and GPT 5.4 regularly fail when dealing with complicated issues for me.

Not just complicated, but even simple ones if the current software is too “new” of a pattern they’ve never seen before or trained on.

I dunno if Rust async or native platform APIs, which have existed for years, count as new patterns, but if you throw even a small wrench in the works they really struggle. That's expected, though, when you look at what the technology is; it's kind of insane we've even gotten to this point with what amounts to fancy autocomplete.

I don't see anyone sane trusting AI to this degree any time soon, outside of web dev. The chances of this strategy failing are still well above acceptable margins for most software, and in safety-critical instances it will be decades before standards allow such adoption. Anyway, we are paying pennies on the dollar for compute at the moment; as soon as the gravy train stops rolling, all this intelligence will be out of reach for most humans, unless some more efficient generalizable architecture is identified.

> as soon as the gravy train stops rolling, all this intelligence will be out of access for most humans. unless some more efficient generalizable architecture is identified.

All Chinese labs have to do to tank the US economy is to release open-weight models that can run on relatively cheap hardware before AI companies see returns.

Maybe that's why AI companies are looking to IPO so soon, gotta cash out and leave retail investors and retirement funds holding the bag.


They could still eliminate relatively cheap hardware.

I was under the impression that we were approaching performance bottlenecks both with consumer GPU architecture and with this application of the transformer architecture. If my impression is incorrect, then I agree it is feasible for China to tank the US economy that way (unless something else does it first).

I think it just needs to be efficient or small enough for companies to deploy their own models on their hardware or cloud, for more inference providers to come out of the woodwork and compete on price, and/or for optimized models to run locally for users.

Regarding the latter, smaller models are really good for what they are (free) now, they'll run on a laptop's iGPU with LPDDR5/DDR5, and NPUs are getting there.

Even models that can fit in unified 64GB+ memory between CPU & iGPU aren't bad. Offloading to a real GPU is faster, but with the iGPU route you can buy cheaper SODIMM memory in larger quantities, still use it as unified memory, eventually use it with NPUs, all without using too much power or buying cards with expensive GDDR.

Qwen-3.5 locally is "good enough" for more than I expected, if that trend continues, I can see small deployable models eventually being viable & worthy competition, or at least being good enough that companies can run their own instead of exfiltrating their trade secrets to the worst people on the planet in real-time.


I mean, they have been doing that for at least a year, and I haven't seen signs of US economy tanking?... You need to find some better arguments

There aren't any released open-weight models that are "good enough" yet, but Qwen-3.5 is getting really damn close to the point where more than half of my LLM usage gets routed to it.

I suspect, but don't know, that some fields of inquiry will prove fruitful when it comes to "good enough" small models. Especially when it comes to constrained tasks like software development. Software development models don't have to generalize to anything a chatbot can be asked or tasked with; the space they're required to generalize over is pretty small compared to literally the whole world.

If I was a betting man, I'd put my money where my mouth is, but I'm not. I am betting with my time and focus that smaller local models are worth it, and will be worth it, though.


Even in webdev it rots your codebase unchecked. Although it's incredibly useful for generating UI components, which makes me a very happy webslopper indeed.

I'm grateful to have never bothered learning web dev properly; it was enlightening witnessing ChatGPT transform my ten-second MS Paint job into a functional user interface.

Several fintechs like Block and Stripe are boasting thousands of AI-generated PRs with little to no human reviews.

Of course it's in the areas where it doesn't matter as much, like experiments, internal tooling, etc, but the CTOs will get greedy.


I don't think anybody is doubting its ability to generate thousands of PR's though. And yes, it's usually in the stuff that should have been automated already regardless of AI or not.

Depends on your circle. On HN I would argue that there are still a fair number of people that would be surprised to see what heavy organizational usage of AI actually looks like. On a non programming online group, of which I am a member of several, people still think that AI agents are the same as they were in mid 2025 and they can't answer "how many R's are in the following word:". Same thing even when chatting with my business owner friends. The majority of the public has no clue of the scale of recent advancement.

Not arguing, but I just prompted Opus with a made-up word and it responded with this:

“There are 4 Rs in the word “burberrorrly.” Here they are highlighted: burberrorrly (positions 3, 6, 7, 9)”

Obviously not a real word, but perhaps the fundamental concept remains
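For what it's worth, the ground truth is trivial to check outside the model; counting letters is exactly the kind of character-level operation that subword tokenization makes awkward for LLMs:

```python
word = "burberrorrly"  # the made-up word from the comment above

# Character-level count, which the model above got wrong (it reported 4).
r_count = word.count("r")
positions = [i + 1 for i, ch in enumerate(word) if ch == "r"]

print(r_count)    # 5
print(positions)  # [3, 6, 7, 9, 10]
```

So the reply above missed the "r" at position 10; the usual explanation for this failure class is that models see subword tokens, not individual characters.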


These companies contribute to swathes of the West's financial infrastructure, not quite safety-critical but critical enough. Insane to involve automation here to this degree.

GPT 5.4 straight up just dies with broken API responses sometimes, let alone how it struggles with even a moderately complex task.

I still can't get a good mental model for when these things will work well and when they won't. Really does feel like gambling...


Of course not all tickets are complex. Last week I had to fix a ticket which was to display the update date on a blog post next to the publish date. Perfect use case for AI to one shot.

I'm using Opus in Claude Code, and even on easy tasks, if you don't review the changes properly, it creates tech debt. One of the most common issues is replicating the same logic in multiple places with differently named variables (which makes grep harder to use for future changes), along with a failure to follow project patterns, even with a lot of .md files instructing the opposite. I still haven't found a workflow without human interaction that is that efficient and reliable.

I suppose at that point I'm wondering if it would have just been faster for… you (I'm assuming the developer) to make that change and deploy it? Is the AI really faster on small changes like that, if you understand the platform/code/CI/CD well enough?

Maybe for a non-dev it would be nice to submit a ticket and have it auto-fixed by an agent. But in the devs case, it feels like it would be faster to just do it manually.


Trusted user, like Jia Tan.

Feedback loops like that would be an exercise in raising garbage-in->garbage-out to exponential terms.

It's the "robots will just build/repair themselves" trope but the robots are agents


Yes. Next they'll want nanobots that build/repair themselves.

Oh wait. That's already here and is working fine.


Turns out we were the nanobots all along

I am already there with a project/startup with a friend. He writes up an issue in GitHub and there is a job that automatically triggers Claude to take a crack at it and throw up a PR. He can see the change in an ephemeral environment. He hasn't merged one yet, but it will get there one day for smaller items.

I am already at the point where because it is just the two of us, the limiting factor is his own needs, not my ability to ship features.


Must be nice working on simple stuff.

Why doesn’t he merge them?

He is not technical but a product guy, so he still wants me to check it over.

Ha I just SPECed out a version of this. I have a simple static website that I want a few people to be able to update.

So, we will give these 3 or 4 trusted users access to an on-site chat interface to request updates.

Next, a dev environment is spun up, the agent makes the changes, creates a PR, and sends a branch preview link back to the user.

Sort of an agent driven CMS for non-technical stakeholders.

Let’s see if it works.


I think Anthropic will launch backend hosting off the back of their Bun acquisition very soon. It makes sense to basically run your entire business out of Claude, and share bespoke apps built by Claude code for whatever your software needs are.

100% it's going to happen. OpenAI will do the same; there were already rumors about them building an internal "GitHub," which is a stepping stone for that. It's also a requirement for completing the lock-in, the dream for these companies.

Users are often incorrect about what the software should actually be doing and don’t see the bigger picture.

I think some types of tickets can be done like this, but your trusted-user assumption does a lot of work here. I don't see this getting better than that with the current architecture of LLMs; you can add all sorts of feedback mechanisms, which help, but since LLMs are not conscious, drift is unavoidable unless there is a human in the loop who understands and steers what's going on.

But I do think even now with certain types of crud apps, things can be largely automated. And that's a fairly large part of our profession.


In the past three weeks, a couple of projects I follow have implemented AI tools with their own GitHub accounts which have been doing exactly this. And they appear to be doing good work! Dozens of open issues iterated, tested, and closed. At one point I had almost 50 notifications for one project's backlog being eradicated in 24 hours. The maintainer reviewed all of it, and some were not merged.

The missing piece for me is post-hoc review.

A PR tells me what changed, but not how an AI coding session got there: which prompts changed direction, which files churned repeatedly, where context started bloating, what tools were used, and where the human intervened.

I ended up building a local replay/inspection tool for Claude Code / Cursor sessions mostly because I wanted something more reviewable than screenshots or raw logs.


We do feedback to ticket automatically

We dont have product managers or technical ticket writers of any sort

But us devs are still choosing how to tackle the ticket. We don't strictly have to, as I'm solving the tickets with AI anyway; I could automate my job away if I wanted, but I wouldn't trust the result, since I give a degree of input and steering, and there are bigger-picture considerations it's not good at juggling, for now.


I don't know if this is the future, but if it is, why bother building one version of the software for everyone? We can have agents build the website for each user exactly the way they want. That would be the most exciting possibility to come out of AI-generated software.

"why bother building one version of the software for everyone?"

So one user's experience is relevant to another, so they can learn from one another?


Haha sure, let's just let every user add their feedback to the software.

I know a company already operating like this in the fintech space. I foresee a front page headline about their demise in their future.

Then it sets up telemetry and experiments with the change. Then, if the data looks good, an agent ramps it up to more users or removes it.

Or perhaps we end up where all software is self-evolving via agents, adjusting dynamically to meet the users' needs.

The "user" being the one that's in charge of the AI, not the person on the receiving end.

Instead of having a trusted user, you can also do statistics on many users.

(That's basically what A/B testing is about.)
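A sketch of what that statistics step could look like: a standard two-proportion z-test over conversion counts for variants A and B (the function name and all the numbers below are invented for illustration).

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: variant B converts at 6.5% vs. A's 5.0%
z, p = two_proportion_ztest(120, 2400, 156, 2400)
print(round(z, 2), round(p, 3))
```

Anything fancier (sequential testing, automated ramp-up policies) builds on this same primitive.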


What you're describing is absolutely where we're headed.

But the entire SWE apparatus can be handled.

Automated A/B testing of the feature. Progressive exposure deployment of changes, you name it.


"Trusted user" also can be an Agent.

I think the AI agent will directly make a PR; tickets are for humans with limited mental capacity.

At least in my company we are close to that flywheel.


Tickets need to exist purely from a governance perspective.

Tickets may well not look like they do now, but some semblance of them will exist. I'm sure someone is building that right now.

No. It's not Jira.


Yes, so my point is that PRs act as that governance layer - with preview environments, you can see the complexity and risk of the change etc.

The agents have even more limited capacity

At the moment, maybe. But it's growing.

Even so they would probably still benefit from intermediate organisational steps.

For a while, sure.

> I feel like we are just inching closer and closer to a world where rapid iteration of software will be by default.

There's lots of experimentation right now, but one thing that's guaranteed is that the data gatekeepers will slam the door shut[1], or install a toll-booth when there's less money sloshing about and the winners and losers are clear. At some point in the future, Atlassian and GitHub may not grant Anthropic access to your tickets unless you're on the relevant tier with the appropriate "NIH AI" surcharge.

1. AI does not suspend or supplant good old capitalism and the cult of profit maximization.


Um, we are already there...

Not sure why the downvote, I'm seeing this happening...

Without getting into some of the other things mentioned in the article,

I don't think Vim is going away. Even with all the AI code written, engineers navigate through Claude Code / Codex using Vim (ex: Vim mode in Claude Code).

I really like Vim so much that I've built a gamified way to learn it at https://vimgolf.ai that I am working on completing.


This looks amazing man, seriously good job with this.


very cool


First thing is going outside. Staying inside is only going to compound depression and cause days gone by to become a blur. Just doing that is a step in the right direction, bonus points if it involves moving the body.

Second, start writing. Writing can help prevent circular thoughts and force the brain to plan how to change your lifestyle and get you to a life worth living. If you don’t write, it’s very easy to repeat patterns over and over again.

Third, don’t lose hope. Having a positive outlook and a growth mindset is going to help you navigate setbacks and push through obstacles. It’s easy to resort to a fixed mindset, but the saints are the sinners who keep on trying, as they say.

Best of luck!


Beat Simon Willison ;)

https://www.svgviewer.dev/s/gAa69yQd

Not the best pelican compared to gemini 3.1 pro, but I am sure it does remarkably better at coding or Excel, given those are part of its measured benchmarks.


This pelican is actually bad, did you use xhigh?


yep, just double checked: used gpt-5.4 xhigh. Though I had to select it in codex, as I don't have access to it on the chatgpt app or web version yet. It's possible that whatever code harness codex uses messed with it.


this is proof they are not benchmaxxing the pelicans :-)


Wonder when 5.3 thinking will be released?


It won't. They will go straight to 5.4 for thinking: https://x.com/OpenAI/status/2028909019977703752


Is this long Covid or depression or Major Depressive Disorder (MDD)? Because in her earlier videos she talks about becoming bed-bound again due to her emotional state after finding news her friend who had a similar condition died.


As someone who suffers from a complex autoimmune disorder which has caused dysautonomia and suspected mitochondrial dysfunction, stress flares and exacerbates symptoms. This has a physiological basis in the complex way the HPA axis/cortisol affects us at the cellular level. My primary diagnosis is sarcoidosis with small fiber neuropathy, but they don’t fully understand all the mechanisms of auto-immune fatigue and dysregulation.


Sorry to hear. Thanks for explaining.


It was essentially long Covid: ME/CFS (chronic fatigue syndrome) induced by a Covid infection.


Wow, did not know that. Thanks


Dunno, personally I don't think it's that much of my business. Sure I'm curious, but that doesn't mean I have a right, or that it's a nice thing™ to publicly speculate.


She’s been pretty open about it. She even touched on it at the end of this video.


She has, and I commend her choice, but that doesn't mean it's good form to further speculate.


Long Covid and ME/CFS can make you fragile. At least it has for me. I never used to get emotionally overwhelmed, now I do. I don't have it as bad as she does, but I've been bedbound because of this. And being sick for months to years, already leaves you feeling on the brink of depression.


I'm quite sure being that terribly sick could cause depression yeah, but that's not the reason


These image gen models are getting so advanced and lifelike that increasingly the general public is being duped into believing AI images are real (e.g. Facebook food images or fake OF models). Don't get me wrong, I will enjoy the benefits of using this model to express myself better than ever before, but I can't help feeling there's something very insidious about these models too.


It's more likely than not that every single person who uses the internet has viewed an AI image and taken it as real by now.

The obvious ones stand out, but there are so many that are indiscernible without spending lots of time digging through it. Even then there are ones that you can at best guess it's maybe AI gen.


People will continue to retreat into walled, trusted networks where they can have more confidence in the content they see. I can’t even be sure I’m responding to a real person right now.


As long as HackerNews community keeps the quality of the conversation high (with or without AI), I don’t think many of us will question this too much


Just the other day, I saw a comment on HN accuse another comment of being AI for no good reason. I personally thought the comment was fine.

I know it's an unpopular opinion, but I don't really read too deeply into whether text is AI generated or not. On social platforms like HN I tend to just skim many comments anyway so it's not like the concept of "they spent no time writing so you shouldn't spend time reading" really applies.

I know some people use apps like Grammarly to improve their language and stuff, which I can respect. But at what point do we draw the line between AI assisted text and AI generated text?

I sometimes use AI to do research into the nuance of some topics to help me formulate a response and synthesize ideas, but if I ever get to the point where I'd be asking AI generate a response to the comment then I find it better to just not respond at all.


I think it depends on the purpose of the comment. I can see why someone may get frustrated with AI text if they were say looking for advice on xyz, as you'd usually want someone with personal or senior experience. If I want career advice, AI will give me a predictable response similar to LinkedIn, but something from a successful person in the industry will have a lot more trust and substance in it.


Maybe not an actual argument for anything, but even before these image models everyone that used the internet had seen a doctored image they believed to be real. There was a reason that 'i can tell by the pixels' was a meme.


At the point now where basically any photo that isn't shared by someone I trust or a reputable news organisation is essentially unverifiable as being real or not

The positive aspect of this advance is that I've basically stopped using social media because of the creeping sense that everything is slop


At least some of the comments here are likely AI-generated


people only notice when they are prompted to look for AI or scrutinize AI

a lot of these accounts mix old clips with new AI clips

or tag onto something emotional, like a fake Epstein file image with your favorite politician, and pointing out it's AI has people thinking you’re deflecting because you support the politician

Meanwhile the engagement farmer is completely exempt from scrutiny

It's fascinating how fast and unexpectedly the direction shifts


I actually think this was a good thing. Manipulating images incredibly convincingly was already possible but the cost was high (many hours of highly skilled work). So many people assumed that most images they were seeing were "authentic" without much consideration. By making these fake images ubiquitous we are forcing people to quickly learn that they can't believe what they see on the internet and tracking down sources and deciding who you trust is critically important. People have always said that you can't believe what you see on the internet, but unfortunately many people have managed without major issue ignoring this advice. This wave will force them to take that advice to heart by default.


I remember telling my parents at a young age that I couldn't be sure Ronald Reagan was real, because I'd only ever seen him on TV and never in real life, and I knew things on TV could be fake.

That was the beginning of my journey into understanding what proper verification/vetting of a source is. It's been going on for a long time and there are always new things to learn. This should be taught to every child, starting early on.


I agree. Too many adults are fooled by fake news and propaganda and false contexts. And CNN and Fox are more than happy to take advantage of this.

My personal rule of thumb is if it generates outrage, it's probably fake, or at least a fake interpretation. I know that outrageous stuff actually happens pretty often, so I'll dig into things I find interesting. But most of the time it's all just garbage for clicks.


I used to also have this optimistic take, but over time I think the reality is that most people will instead just distrust unknown online sources and fall into the mental shortcuts of confirmation bias and social proof. Net effect will be even more polarization and groupthink.


> By making these fake images ubiquitous we are forcing people to quickly learn that they can't believe what they see on the internet and tracking down sources and deciding who you trust is critically important.

Has this thought process ever worked in real life? I know plenty of seniors who still believe everything that comes out of Facebook, be AI or not, and before that it was the TV, radio, newspapers, etc.

Most people choose to believe, which is why they have a hard time confronting facts.


> I know plenty of seniors

And not just seniors. I see people of all ages who are perfectly happy to accept artificially generated images and video so long as it plays to their existing biases. My impression is that the majority of humanity is not very skeptical by default, and unwilling to learn.


Yes. People willingly accept made up text (stories) if it fits their world view, and for words we always knew that they could be untrue. Why should it be different for images/audio/video?


As they say, people have accepted made up religions for thousands of years.


> By making these fake images ubiquitous we are forcing people to quickly learn

That's quite the high opinion on the self-improvement ability of your Average Joe. This kind of behavior only comes with an awareness, previously learned, and an alertness of mind. You need the population at large to be able to do this. How if not, say, teaching this at schools and waiting for the next generation to reach adulthood, would you expect this to happen?


I agree that improvement for the Average Joe will be very hard. I also think that taking more attention to teach the younger generation is vitally important. But mostly I don't see an alternative. I don't think we can protect people from fake information without giving up our freedom, and that isn't a viable alternative in my mind. So what is left but trying our hardest to teach people to think critically?


Our institutions have been trying to get our kids to think critically for a while. At least when I was in school, we didn't focus a lot on memorization (sometimes we did, like memorizing the times tables or periodic table). My teachers tried to instill in us an understanding of the concepts, something I took for granted. Many of my classmates have gone on to become lawyers, doctors, other prestigious careers.

But I feel like we live in a different time now. I hear teachers tell stories about school admin siding with parents instead of teachers, and the kids aren't learning anything. Anecdotally of course.

I think our teachers really want the kids to think critically. But parents and schools don't seem to value that anymore.


When it comes to graphic content on the internet, I usually consume it for entertainment purposes. I didn't care where it came from before and don't care today either. Low quality content exists in both categories; it's a bit easier to spot when AI generated, so that's actually a bonus.


I feel like there are one or two generations of people who are tech savvy and not 100% gullible when it comes to online things. Older and younger generations are both completely lost imho; in a blind test you wouldn't discern a monkey from a human scrolling tiktok & co.


How so? This "tech savvy and not 100% gullible" generation, gave birth to a political landscape dominated by online ragebait.


Boomers used to tell us to never trust anything online and now they send their life savings to "Brad Pitt"

New generations get unlimited brain rot delivered through infinite scroll, don't know what a folder is, think everything is "an app", and keep falling for the "technology will free us from work and cure cancer" line

There was a sweet spot during which you could grow alongside the internet at a pace that was still manageable, when companies and scammers weren't trying so hard to rob you of your time, money, and attention


And if they don't?

Your post seems a little naive to me. A lot of people are just not interested in putting in the work or confronting their own confirmation bias, and there's an oversupply of bad actors who will deliberately generate fake imagery for either deception or exhaustion. Many people are just not on a quest for truth and are more interested in the activation potential of images or allegations than in their factual reliability.


In reality: millions of boomers are scrolling FB this very minute reacting to the most obviously fake rage/surprise/love bait AI slop you've ever seen.


They were scrolling through fake bait long before generative AI


but now it is even harder to distinguish


>fake OF models

Soon many real OF models will be out of a job, when everyone can produce content to their personal taste from a few prompts.


People already have access to every form of niche pornography they could dare to imagine (for absolutely free!), I really doubt that 'personal taste' is the part that makes OF models their money. They'll be fine.


I think you're under-estimating how much personal taste applies in that industry. Yes, there's a lot of free content but it's often low quality and/or difficult to find for a particular niche. The OF pages, and other paid sites, are curated collections of high quality stuff that can satisfy particular cravings repeatedly with minimal effort.

A big part of it also the feeling of "connection" with the creator via messages and what not, but that too can be replicated (arguably better) by AI. In fact, a lot of those messages are already being generated haha.


I was mostly hinting towards the 'connection' part of it, yes - I think that's really where the money is made more than anything else. That's the part that'll start killing the industry once some company tunes it in.


This is the dystopia of that pacified moon from "Mold of Yancy" by PKD but taken to the next level.

What's astonishing about the present is that even PKD did not foresee the possibility of an artificial being not only being constructed from whole cloth but actually tailored to each individual.


We looked forward to the future, but it turns out the future smashed into our blind spot from the side.


For a podcast on this topic (niche pornography and how it was affected by the advent of pornhub and the likes) check out "the butterfly effect"


I don't think so. Talking to people in this space, I've found out about broad camps. There are probably more:

-They simply aren't into real women/men (so you couldn't even pay a model to do what they're looking for).

-They want to play out fantasies that would be hard to coordinate even if you could pay models (I guess this is more on the video side of things, but a string of photos can be put together into a comic)

-They want to generate imagery that would be illegal

Based on this, I would guess fetish artists (as in illustrators) are more at risk than OF models. However, AI isn't free. Depending on what you're looking for, commissions might be cheaper still for quite a while...


Lily Allen Says Her OnlyFans Feet Pictures Make More Money Than Spotify Streams: ‘Don’t Hate the Player, Hate the Game’ : https://variety.com/2024/music/news/lily-allen-onlyfans-feet...


Even ignoring the model censorship making high quality sexual imagery/videos not possible, this is a crazy take. You think OF models are making money because it's the only way to see a nude man/woman with particular characteristics on the internet?

You're completely misunderstanding what the product being sold is.


If you don't think that OF models are using AI to reply to incoming chats from users, well I've got a bridge to sell ya.


No, I don't think OF models aren't using AI to respond to chat. Where did I say I thought that?


Then please explain what you're talking about.


Their point is that the point of OF is that there is (supposed to be) a real human. It's a (para)social relationship that no image generator model is going to give you.


If you can make X money running one client at a time you can make ( X × N ) money if you work with N clients at a time. You have to give just enough human to keep'em hooked.


Yes, and my point is that the (supposedly) real human is also AI. You're chatting with a bot.


They often contract that work out; I wouldn't be surprised if some of it is already AI. Cheaper than hiring if you get it right.


> Soon many real OF models will be out of job when everyone will be able to produce content to their personal taste from a few prompts.

net positive to society


In what way? Certainly not for the models, who lose their income/job. Probably not better for the consumer, either.


or the taxpayer

the high end probably pay the same sort of tax as professional footballers


Sex work shouldn't be shunned, but it's not a normal profession either. Mental health, addiction and abuse is just as much of a problem online and in countries where prostitution is legal and normalized.


lose the income, but likely they will live a more fulfilling life.


More fulfilling life starving on the streets with beginner programmers looking for a job?


And this can't come soon enough.


Coming soon... YOU!


You can’t really because these powerful models are censored. You can create lewd pictures with open models but they aren’t nearly as good or easy to use.


I’ve seen some very high quality NSFW AI video in the last few months. Those models are not far behind, and the search and training space for porn is smaller than being able to generate anything.


> I’ve seen some very high quality NSFW AI video in the last few months. Those models are not far behind and the search and training space for porn is smaller than being able to generate anything.

Agreed. In my opinion, the primary limitation of the porn models is actually poor labeling of the training set. The company that manages to produce a well-labeled, porn-tuned AI image model is going to absolutely clean up.

The extractive dark patterns that will emerge from a parasocial chat "AI relationship" that can generate porn images relevant to the chat on the fly will be staggering. Once that proceeds to being able to generate relevant video, all holy hell is going to break loose.


> The company that manages to produce a well-labeled, porn-tuned AI image model is going to absolutely clean up.

For anime/non-photographic content that essentially exists (Pony, then Illustrious, then probably some new-fangled thing by now that I don't even know about), thanks to the meticulously tagged booru image corpus. However, as strong as these models are on matters of anatomy and kinks, they're limited in other ways due to the hugely biased dataset and dependence on tag soup prompts rather than natural language (many find the latter a plus, not a minus, though).

I haven't heard of any proprietary/cloud-based NSFW model that would be massively better than what's available for free. There are many NSFW-friendly services, but by and large they're just frontends to models trained by other people.


Because models can be used to alter existing images, you can use open and commercial models together in content creation workflows (and the available fine-tunes of open models, and the ability to further tune them for very specific uses, are quite powerful on their own), so the censorship on the commercial models has a lot less effect on what motivated people can produce than you might think.

I still think, even with that, that like most predictions of AI taking over any content industries, the short-term predictions are overblown.


Doesn't Grok allow users to create lewd content or did they roll that back?

Also, I suspect that we'll soon see the same pattern of open weights models following several months behind frontier in every modality not just text.

It's just too easy for other labs to produce synthetic training data from the frontier models and then mimic their behavior. They'll never be as good, but they will certainly be good enough.


Just a matter of time and open models will get there. Not once have we seen a moat across the model spectrums.


And they might have to gasp! get an honest job!


I don't know much about that side of things, but I presume that's hard work! Maybe not always so honest though.


That's a pretty wide brush you are painting with there


That OF is not a honest job?

That's the narrowest of brushes. Anybody thinking otherwise is the one painting with an overly broad mind, and not in a good way.


Don’t think the demand for real OF is going anywhere


How do you know they’re real right now?


A lot of escorts have OF profiles.


> Facebook food images or fake OF models

What in the world is a fake OF model?

Does "OF" stand for "of food"?


It stands for "OnlyFans" a website originally for creators to engage directly with their audiences but quickly became a website where women sold explicit pictures of themselves to subscribers.


TIL it wasn't created to be a porn site


They still run ads trying to push the narrative that it's for comedians and musicians.

But at this point, OnlyFans is so synonymous with egirls that suggesting someone has an account is used as a way to insinuate they sell pictures of themselves.


Jaded, but if I knew there was a possibility of a bunch of incriminating footage of me (images, video, etc.) out there in the pre-AI days, I would do my absolute best to flood the internet with as many related deepfakes (including of myself) as possible.


Oh we’ve seen nothing yet of the chaos that generative ai will unleash on the world, looking at Meta platforms it’s already a multi million dollar industry of selling something or someone that doesn’t exist. And that’s just the benign stuff.


This has been true for a while with digital art, photoshop, etc. Over time, people's BS detectors get tuned. I mean, scrolling by quickly in a feed, yeah, you might miss if an image is "real" or not, but if you see a series of photos side by side of the same subject (like an OF model), you'll figure it out.

Also, using AI will not allow you to better express yourself. To use an analogy, it will not put your self-expression into any better focus, but just apply one of the stock IG filters to it.


> a series of photos side by side of the same subject

Cameras are now "enhancing" photos with AI automatically. The contents of a 'real' photo are increasingly generated. The line is blurring and it's only going to get worse.


Surely this is a problem that we will never be able to solve.


It's shitty, but I think it's almost as bad that people are calling everything AI. And I can't even blame them, despite how infuriating it is. It's just as insidious that even mundane things literally ARE AI now. I've seen at least twice now (that I'm aware of) where some cute, harmless, otherwise non-outrageous animal video was hiding a Sora watermark. So the crazy shit is AI. The mundane shit is AI. You wonder why everyone is calling everything AI now. :P


It seems like a low level paranoia - now I find myself double checking that the youtube video I'm watching isn't some AI slop. All the creators use Getty b-rolls and increasingly AI generated stuff so much that it's not a far stretch to have the voice and script all be auto generated too.

I suppose if the AI was able to tell me a true and compelling story, I might not even mind so much. I just don't want to be spoon fed drivel for 15 minutes to find it was all complete made up BS.

