Hacker News

Is this a joke? If it's not trainable / differentiable, then why do it in the first place? It's about as inefficient and inflexible as it gets compared to tool calling — you have to statically bake programs into the weights, the model cannot introspect or modify them, it has very limited IO capabilities, bad performance, bad everything. It's like a weird brainfuck-esque VM — cool that you can do it, but for what, except some lulz?

But maybe it's just too genius and I don't understand it.




I'd tend to agree; the only good points I've seen were made by @hedgehog [1] here in this thread:

    I'm not sure about the rest but a significant problem with high frequency tool calling (especially in training) is that it breaks batching.
and then later by @ACCount37 [2]:

    I'm less interested in turning programs into transformers and more interested in turning programs into subnetworks within large language models.
In theory, if you can create a very efficient sub-net that replicates certain tool calls (even with the weights manually compiled and frozen during any training steps), this might make inference much more efficient at scale. I have no idea why, in general, you would want to do this through the clunky transformer architecture, though. Just implement a non-trainable, GPU-accelerated layer that does the compute and avoids the tool call.
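As a toy illustration of the "program as frozen subnetwork" idea (my own sketch, not anything from the linked post): the weights of a small ReLU MLP can be hand-compiled so the forward pass computes an exact function, here max(a, b) via the identity max(a, b) = relu(a − b) + relu(b) − relu(−b). In a real model this block would simply be excluded from gradient updates.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hand-"compiled" weights: a 2-layer ReLU MLP computing max(a, b) exactly.
# Hidden units: h1 = relu(a - b), h2 = relu(b), h3 = relu(-b);
# output = h1 + h2 - h3 = max(a, b) for all real a, b.
W1 = np.array([[1.0, -1.0],
               [0.0,  1.0],
               [0.0, -1.0]])
w2 = np.array([1.0, 1.0, -1.0])

def compiled_max(x):
    """Forward pass through the frozen, non-trainable subnetwork."""
    return float(w2 @ relu(W1 @ x))

print(compiled_max(np.array([3.0, 5.0])))    # 5.0
print(compiled_max(np.array([-2.0, -7.0])))  # -2.0
```

The point is that nothing here needs to be differentiable or learned; it runs in the same matmul-plus-ReLU primitives the rest of the network uses, so no tool-call round trip is needed.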

[1] https://news.ycombinator.com/item?id=47367986

[2] https://news.ycombinator.com/item?id=47363909



