
> Regardless of the mechanism, the foundational 'conceit' of LLMs is that by dumping enough syntax (and only syntax) into a sufficiently complex system, the semantics can be induced to emerge.

Syntax has a dual aspect: it is both content and behavior (code and execution, data and rules, form and dynamics). This means syntax-as-behavior can process syntax-as-data, and that is exactly how neural-net training works. Syntax as execution (the model weights and the algorithm) processes syntax as data (activations and gradients). In the forward pass the model processes data, producing outputs; in the backward pass it is the weights of the model themselves that become the data to be processed.
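A minimal sketch of that role reversal, assuming nothing more than a toy one-layer linear model trained by gradient descent (all names here are illustrative, not any specific framework):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))       # weights: syntax as "behavior"
x = np.array([1.0, -1.0, 0.5])    # input:   syntax as "data"
y_target = np.array([0.0, 1.0])

# Forward pass: the weights act as rules that process the data.
y = x @ W
loss_before = np.sum((y - y_target) ** 2)

# Backward pass: now the weights themselves are the data being
# processed -- the gradient rewrites them.
grad_y = 2 * (y - y_target)       # d(loss)/dy for squared error
grad_W = np.outer(x, grad_y)      # d(loss)/dW
W = W - 0.1 * grad_W              # the weights are updated as data

loss_after = np.sum((x @ W - y_target) ** 2)
```

The same array `W` plays rule in the first half and raw material in the second, which is the duality the paragraph above is pointing at.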

When such a self-generative syntactic system is in contact with an environment, in our case the training set, it can encode semantics. Inside the model data is relationally encoded in the latent space. Any new input stands in relation to all past inputs. So data creates its own semantic space with no direct access to the thing in itself. The meaning of a data point is how it stands in relation to all other data points.
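One way to make "meaning as relation" concrete is the standard cosine-similarity view of a latent space: a point's "semantics" is just its vector of similarities to every other encoded point, with no reference to anything outside the space. A toy sketch (the latent vectors here are random placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(5, 4))   # 5 previously encoded items in a 4-dim latent space

def relational_profile(v, space):
    """Cosine similarity of v to every vector in the space."""
    norms = np.linalg.norm(space, axis=1) * np.linalg.norm(v)
    return (space @ v) / norms

new_point = rng.normal(size=4)
profile = relational_profile(new_point, latent)
# 'profile' is everything this space "knows" about new_point:
# its stance toward every past input, never the thing in itself.
```

Each new input is positioned purely by how it stands relative to all prior inputs, which is the sense in which the data creates its own semantic space.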

Another important aspect is that this process is recursive, and a recursive process can't be fully understood from outside. Gödel, Turing, and Chaitin showed that recursion produces blind spots: you have to walk the recursive path to know it, you have to be it to know it. Training and running inference on models is such a process.
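The Turing-style blind spot can be shown in a few lines with the classic diagonal move (a toy construction, with plain Python callables standing in for "programs" and "predictors"): no fixed prediction strategy can forecast a program built to consult that strategy and do the opposite.

```python
def make_contrarian(predictor):
    """Return a function that does the opposite of whatever
    'predictor' claims it will do."""
    def contrarian():
        return not predictor(contrarian)
    return contrarian

def naive_predictor(fn):
    # Any fixed prediction strategy; here: always predict True.
    return True

c = make_contrarian(naive_predictor)
# The predictor says True, but actually running c yields False:
# the only way to know what c does is to walk the recursion.
result = c()
```

This is the diagonal structure behind halting-problem undecidability: the system's behavior is only available by executing it, not by external inspection.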

The water carves its banks

The banks channel the water

Which is the true river?

Here, banks = model weights and water = language


