What OpenAI did was train increasingly large transformer model instances. which ...

mattalex · 2026-03-11T18:29:03 1773253743

There were plenty of models the size of gpt3 in industry.

The core insight necessary for chatgpt was not scaling (that was already widely accepted): the insight was that instead of finetuning for each individual task, you can finetune once for the meta-task of instruction following, which brings a problem specification directly into the data stream.

joquarky · 2026-03-11T22:17:21 1773267441

I miss having the completion models like davinci-003 since it gained in performance where it lacked simplicity to get what you want out.

It was fun to come up with creative ways to get it to answer your question or generate data by setting up a completion scenario.

I guess "chat" became the universal completion scenario. But I still feel like it could be "smarter" without the RLHF layer of distortion.