
> What case law, if any, did you rely on in Microsoft & GitHub's public claim, stated by GitHub's (then) CEO, that: “(1) training ML systems on public data is fair use, (2) the output belongs to the operator, just like with a compiler”? In the interest of transparency and respect to the FOSS community, please also provide the community with your full legal analysis on why you believe that these statements are true.

> If it is, as you claim, permissible to train the model (and allow users to generate code based on that model) on any code whatsoever and not be bound by any licensing terms, why did you choose to only train Copilot's model on FOSS? For example, why are your Microsoft Windows and Office codebases not in your training set?

I'm not sure I buy this argument. In the first point, the authors state that Copilot was trained on public data. In the very next point, they slightly tweak it by saying training was done on "any" code, which loses the distinction between public and private code. Obviously, Windows and Office are not public code.

I also interpreted "public data" to mean they trained on codebases that explicitly specified, say, MIT licenses or other permissive licenses. That seems like fair use to me. Those licenses don't explicitly restrict training AI models on their codebases, do they? It would be ironic if these licenses started banning AI training now, though. That would effectively mean Copilot would be the sole trained AI model.

I'm happy to be proven wrong, though. In general, I distrust Copilot. I fear that, in exchange for productivity, it will make individuals worse programmers in the end.



Copilot has been demonstrated to lift code verbatim (including swearing comments): https://twitter.com/mitsuhiko/status/1410886329924194309

So even if it is legal to create a commercial product which outputs GPL code as its main value-add, it still seems like it could put the user in the awkward position of auto-completing big chunks of GPL-licensed code into their project.


Good point, but even among FOSS licenses you can have incompatibilities if you take code distributed under one license and use it in a project distributed under a different license. So I still think there is a double standard here.



