Rewriting the code verbatim and distributing it would be copyright infringement, yes; you do not have a right to distribute code written by other people.
That's completely different from reading and learning from code, which is what grondo described.
Clean room design relies on this: in a clean room design, one party reads and describes the protected work, and another party implements it. That first party reading the protected work is learning from closed-source IP.
If an AI infringes on copyright then it infringes on copyright, that's unfortunate for the distributors of that code.
Humans accidentally infringe on copyright sometimes too; it's not a problem unique to machine learning. The potential to infringe on copyright has not made observing, learning from, watching, or reading copyrighted material prohibited for humans, nor should it or (likely) will it become prohibited for machine learning algorithms.
Grondo said that AI should be given access to all code, including private and unlicensed code.
He was given a link to clean room design, which demonstrates the problem with a single entity (the AI) both reading and learning from existing code and then writing new code: the risk of regurgitation.
He goes on to say that's what he does, which doesn't change that fact.
> Humans accidentally infringe on copyright sometimes too.
Indeed we do, and it's almost entirely unnoticed, even by the author.
> nor should it or (likely) will it become prohibited for machine learning algorithms.
If those machine learning algorithms take in unlicensed material and later output unlicensed and/or copyrighted material, then they are a liability. Why would you want that when you can train them otherwise and be sure they NEVER infringe on others' IP? It's a no-brainer, surely. Or are you assuming there is some magic inherent in other people's private code?
> If those machine learning algorithms take in unlicensed material and later output unlicensed and/or copyrighted material, then they are a liability. Why would you want that when you can train them otherwise and be sure they NEVER infringe on others' IP?
Because it could produce a better model that produces better code.
You're now arguing a heavily reduced point. That a model trained on proprietary code is at higher risk of reproducing infringing code is not a point under contention. The clean room serves the same purpose: it is a risk-mitigation strategy.
Risk mitigation is a choice, left up to individuals. Maybe you use a clean room design, maybe you don't. Maybe you use a model trained on closed-source IP, maybe you don't. There are risks associated with these choices, but those choices are for individuals to make.
The choice to observe closed-source IP and learn from it shouldn't be prohibited just because some won't want to assume that risk.