I recently came across an interesting analysis in my feed of how code training was added to a foundation model (GPT-3) as a fine-tune (Codex): https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tr...

I do wonder whether anyone is considering mixing larger and larger percentages of The Stack (https://huggingface.co/datasets/bigcode/the-stack) into this or the Pile to get more code into the training data, and then seeing what happens.

(Likely beyond mere mortals' budgets though.)
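For anyone who wants to picture the experiment, here is a minimal sketch of sweeping the code fraction in a two-source mix. It assumes document-level weighted sampling with toy in-memory lists. The function name `sample_mixture` and the sample corpora are hypothetical stand-ins, not the real Pile or Stack. Real pretraining mixes are usually specified in tokens and streamed rather than held in memory. The Hugging Face `datasets` library's `interleave_datasets` (with its `probabilities` argument) does something similar at scale.

```python
import random
from collections import Counter


def sample_mixture(sources, weights, num_samples, seed=0):
    """Draw (source_name, document) pairs, picking each source with probability
    proportional to its weight. Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    names = list(sources)
    probs = [weights[name] for name in names]
    picks = []
    for _ in range(num_samples):
        # Choose which corpus this sample comes from, then a document within it.
        name = rng.choices(names, weights=probs, k=1)[0]
        picks.append((name, rng.choice(sources[name])))
    return picks


# Toy placeholders standing in for the Pile (text) and The Stack (code).
pile = ["web text ...", "book text ...", "paper text ..."]
stack = ["def f(x): return x", "int main() { return 0; }"]

# Sweep "larger and larger percentages" of code and check the realized share.
for code_frac in (0.1, 0.3, 0.5):
    mix = sample_mixture(
        {"pile": pile, "stack": stack},
        {"pile": 1 - code_frac, "stack": code_frac},
        num_samples=10_000,
    )
    counts = Counter(name for name, _ in mix)
    print(code_frac, counts["stack"] / len(mix))
```

Each printed share should land close to the requested fraction. The expensive part is obviously training a model on each mix, not the sampling.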