
jgraettinger1 · 2026-05-28T21:44:04 1780004644

At Estuary, we have an in-house Rust crate [1] for building scale-out durable actors / FSMs in Postgres. It powers all async activity in our control plane -- slews of fine-grain scheduled actions, complex change propagation through data-flow topologies, reliable alert and email delivery, and more -- at hundreds to thousands of state transitions per second (today). It's been a wonderful pattern to build on, and is all of three source files.

Here's an example computing a Fibonacci sequence (very inefficiently, with lots of spawned sub-tasks and message passing) [2]

[1] https://github.com/estuary/flow/tree/master/crates/automatio...
[2] https://github.com/estuary/flow/blob/master/crates/automatio...

jgraettinger1 · 2026-05-16T05:32:45 1778909565

The right metaphor isn't painting, though, it's molding clay. That first pass is slop, but it's raw clay that the agent is very good at molding given a modicum of direction and "not this, do that" comments. The combined first-pass and reshaping time is still far less than writing by hand from scratch. And increasingly, that first pass is ... not bad?

zarzavat · 2026-05-16T06:12:57 1778911977

Not all code is fixable. Sometimes the best thing to do with code is to throw it away.

Without any human code to grab on to, AI has a habit of writing code that is pervasively low quality and rife with misunderstandings such that it always needs to be thrown out.

And yes with considerable prompting effort you can improve this picture. But it's easier, faster and cheaper to just write the code yourself. Code is the best specification language we have.

jgraettinger1 · 2026-03-10T00:54:54 1773104094

I ask Claude or codex to review staged work regularly, as part of my workflow. This is often after I’ve reviewed myself, so I’m asking it to catch issues I missed.

It will _always_ find about 8 issues. The number doesn’t change, but it gets a bit … weird if it can’t really find a defect. Part of the art of using the tool is recognizing this is happening, and understanding it’s scraping the bottom of its barrel.

However, if there _are_ defects, it’s quite good at finding and surfacing them prominently.

jgraettinger1 · 2026-03-04T20:24:39 1772655879

Maintaining a high-quality requirements / specification document for large features prior to implementation, and then referencing it in "plan mode" prompts, feels like consensus best practice at this stage.

However a thing I'm finding quite valuable in my own workflows, but haven't seen much discussion of, is spending meaningful time with AI doing meta-planning of that document. For example, I'll spend many sessions partnered with AI just iterating on the draft document, asking it to think through details, play contrarian, surface alternatives, poke holes, identify points of confusion, etc. It's been so helpful for rapidly exploring a design space, and I frequently find it makes suggestions that are genuinely surprising or change my perspective about what we should build.

I feel like I know we're "done" when I thoroughly understand it, a fresh AI instance seems to really understand it (as evaluated by interrogating it), and neither of us can find anything meaningful to improve. At that point we move to implementation, and the actual code writing falls out pretty seamlessly. Plus, there's a high quality requirements document as a long-lived artifact.

Obviously this is a heavyweight process, but is suited for my domain and work.

ETA one additional practice: if the agent gets confused during implementation or otherwise, I find it's almost always due to a latent confusion about the requirements. Ask the agent why it did a thing, figure out how to clarify in the requirements, and try again from the top rather than putting effort into steering the current session.

ramoz · 2026-03-04T20:36:31 1772656591

> consensus best practice

I'm not sure I agree with this. I don't think there needs to be a whole spec & documentation process before plan mode.

There is alternative thought leadership that the waterfall approach for building out projects is not the right Agentic pattern[1].

Planning itself can be such an intensive process where you're designing and figuring out the specs on the fly in a focused manner for the thing the agent will actually develop next. Not sure how useful it is to go beyond this in terms of specs that live outside of the Agentic loop for what should be developed now and next.

I've evolved my own process, originally from plain Claude Code to Claude Code with heavy spec integrated capabilities. However, that became a burden for me: a lot of contextual drift in those documents and then self managing & orchestrating of Claude Code over those documents. I've since reoriented myself to base Claude Code with a fairly high-level effort specific to ad-hoc planning sessions. Sometimes the plans will revolve around specific GitHub issues or feature requests in the ticketing system, but that's about it.

[1] https://boristane.com/blog/the-software-development-lifecycl...

jgraettinger1 · 2026-03-04T21:38:28 1772660308

I think there's a Danger Zone when planning is light-weight and iterative, and code is cheap, but reviewing code is expensive: it leads to a kind of local hill-climbing.

Suppose you iterate through many sessions of lightweight planning, implementation, and code review. It _feels_ high velocity, you're cranking through the feature, but you've also invested a lot of your time and energy (planning isn't free, and code review and fit-for-purpose checks, in particular, are expensive). As often happens -- with or without AI -- you get towards the end and realize: there might have been a fundamentally better approach to take.

The tradeoff of that apparent velocity is that _now_ course correction is much more challenging. Those ephemeral plans are now gone. The effort you put into providing context within those plans is gone. You have a locally optimal solution, but you don't have a great way of expressing how to start over from scratch pointed in a slightly different direction.

I think that part can be really valuable, because given a sufficiently specific arrow, the AI can just rip.

Whether it's worth the effort, I suppose, depends on how high-conviction you are on your original chosen approach.

ramoz · 2026-03-04T22:11:57 1772662317

nice thoughts

jgraettinger1 · 2025-10-22T13:24:10 1761139450

As someone with workloads that can benefit from these techniques, but limited resources to put them into practice, my working thesis has been:

* Use a multi-threaded tokio runtime that's allocated a thread-per-core
* Focus on application development, so that tasks are well scoped / skewed and don't _need_ stealing in the typical case
* Over time, the smart people working on Tokio will apply research to minimize the cost of work-stealing that's not actually needed.
* At the limit, where long-lived tasks can be distributed across cores and all cores are busy, the performance will be near-optimal as compared with a true thread-per-core model.

What's your hot take? Are there fundamental optimizations to a modern thread-per-core architecture which seem _impossible_ to capture in a work-stealing architecture like Tokio's?

jandrewrogers · 2025-10-23T04:18:26 1761193106

A core assumption underlying thread-per-core architecture is that you will be designing a custom I/O and execution scheduler that is purpose-built for your software and workload at a very granular level. Most expectations of large performance benefits follow from this assumption.

At some point, people started using thread-per-core style while delegating scheduling to a third-party runtime, which almost completely defeats the purpose. If you let tokio et al do that for you, you are leaving a lot of performance and scale on the table. This is an NP-Hard problem; the point of solving it at compile-time is that it is computationally intractable for generic code to create a good schedule at runtime unless it is a trivial case. We need schedulers to consistently make excellent decisions extremely efficiently. I think this point is often lost in discussions of thread-per-core. In the old days we didn’t have runtimes, it was just assumed you would be designing an exotic scheduler. The lack of discussion around this may have led people to believe it wasn’t a critical aspect.

The reality is that designing excellent workload-optimized I/O and execution schedulers is an esoteric, high-skill endeavor. It requires enormous amounts of patience and craft, and it doesn’t lend itself to quick-and-dirty prototypes. If you aren’t willing to spend months designing the many touch points for the scheduler throughout your software, the algorithms for how events across those touch points interact, and analyzing the scheduler at a systems level for equilibria and boundary conditions, then thread-per-core might not be worth the effort.

That said, it isn’t rocket science to design a reasonable schedule for software that is e.g. just taking data off the wire and doing something with it. Most systems are not nearly as complex as e.g. a full-featured database kernel.

jgraettinger1 · 2025-09-03T02:59:44 1756868384

It still doesn't make sense. Cursor undoubtedly has smart engineers who could implement the Anthropic text editing tool interface in their IDE. Why not just do that for one of your most important LLM integrations?

mritchie712 · 2025-09-03T16:55:30 1756918530

I agree it doesn't make sense. I'd think they could alias their own tools to match Anthropic's, but my guess is they don't want to customize too heavily on any given model.

jgraettinger1 · 2025-07-11T17:09:58 1752253798

> You can't do that for LLM output.

That's true if you're just evaluating the final answer. However, wouldn't you evaluate the context -- including internal tokens -- built by the LLM under test ?

In essence, the evaluator's job isn't to do separate fact-finding, but to evaluate whether the under-test LLM made good decisions given the facts at hand.

majormajor · 2025-07-11T18:19:04 1752257944

I would if I was the developer, but if I'm the user being sold the product, or a third-party benchmarker, I don't think I'd have full access to that if most of that is happening in the vendor's internal services.

jgraettinger1 · 2025-07-11T02:14:55 1752200095

But that’s not good. You don’t want Bob to be the gate keeper for why a process is the way it is.

In my experience working with agents helps eliminate that crap, because you have to bring the agent along as it reads your code (or process or whatever) for it to be effective. Just like human co-workers need to be brought along, so it’s not all on poor Bob.

jgraettinger1 · 2025-06-17T12:59:39 1750165179

Hi, I’m a cofounder / CTO of estuary.dev. Our whole mission is democratizing and enabling use of data within orgs.

Open to a conversation about your work here? Reach me at johnny at estuary dot dev.

jgraettinger1 · on June 1, 2025

I would recommend the `anyhow` crate and use of anyhow::Context to annotate errors on the return path within applications, like:

  fallible_func().context("failed to frob the peanut")?

Combine that with the `thiserror` crate for implementing errors within a library context. `thiserror` makes it easy to implement structured errors which embed other errors, and plays well with `anyhow`.
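
To make the split concrete, here is a std-only sketch of the pattern. It does not use either crate: `ParseConfigError` is a hand-written stand-in for what `thiserror` would derive, and the `Context` trait and `Annotated` wrapper are simplified stand-ins for what `anyhow` provides. Every name here (`parse_port`, `load`, `render_chain`, and so on) is hypothetical and exists only for illustration.

```rust
use std::error::Error;
use std::fmt;

// Library side: a structured error that embeds its underlying cause.
// (Hand-written here; `thiserror` would generate these impls.)
#[derive(Debug)]
enum ParseConfigError {
    MissingKey(String),
    BadNumber { key: String, source: std::num::ParseIntError },
}

impl fmt::Display for ParseConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingKey(key) => write!(f, "missing key `{key}`"),
            Self::BadNumber { key, .. } => write!(f, "key `{key}` is not a number"),
        }
    }
}

impl Error for ParseConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::MissingKey(_) => None,
            Self::BadNumber { source, .. } => Some(source),
        }
    }
}

// Application side: a message plus the error it annotates.
// (A simplified stand-in for an `anyhow` error with context.)
#[derive(Debug)]
struct Annotated {
    message: String,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for Annotated {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for Annotated {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let inner: &(dyn Error + 'static) = &*self.source;
        Some(inner)
    }
}

// Extension trait so any `Result` can be annotated on the return path.
trait Context<T> {
    fn context(self, message: &str) -> Result<T, Annotated>;
    // Lazy variant: the message is only built if there is an error.
    fn with_context<F: FnOnce() -> String>(self, f: F) -> Result<T, Annotated>;
}

impl<T, E: Error + Send + Sync + 'static> Context<T> for Result<T, E> {
    fn context(self, message: &str) -> Result<T, Annotated> {
        self.map_err(|e| Annotated { message: message.to_string(), source: Box::new(e) })
    }

    fn with_context<F: FnOnce() -> String>(self, f: F) -> Result<T, Annotated> {
        self.map_err(|e| Annotated { message: f(), source: Box::new(e) })
    }
}

// Hypothetical library function returning the structured error.
fn parse_port(input: &str) -> Result<u16, ParseConfigError> {
    let value = input
        .strip_prefix("port=")
        .ok_or_else(|| ParseConfigError::MissingKey("port".to_string()))?;
    value.parse::<u16>().map_err(|source| ParseConfigError::BadNumber {
        key: "port".to_string(),
        source,
    })
}

// Hypothetical application function annotating the library error.
fn load(input: &str) -> Result<u16, Annotated> {
    parse_port(input).context("failed to load listener config")
}

// Walk the `source()` chain and join every message with ": ".
fn render_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}
```

Calling `load("port=abc")` and passing the error to `render_chain` yields the application's annotation first, then the library's structured message, then the standard library's parse error. With the real crates you would get the same layering with far less boilerplate.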

kaathewise · on June 1, 2025

Yeah, I found `anyhow`'s `Context` to be a great way of annotating bubbled up errors. The only problem is that using the lazy `with_context` can get somewhat unwieldy. For all the grief people give to Go's `if err != nil`, Rust's method chaining can get out of hand too. One particular offender I wrote:

   match operator.propose(py).with_context(|| {
    anyhow!(
   "Operator {} failed while generating a proposal",
   operator.repr(py).unwrap()
  )
   })? {

Which is a combination of `rustfmt` giving up on long lines and also not formatting macros as well as functions.