I do wonder if anyone is considering mixing in larger and larger percentages of The Stack https://huggingface.co/datasets/bigcode/the-stack with this or the Pile to get more code and see what happens.
(Likely beyond mere mortals' budgets though.)