The difference is that JPEG does store the real image, at least to within a given tolerance (determined by the compression factor). That image is as real as, say, an image on film (also not exact, nor in "original" form).
With Stable Diffusion it's storing the style, but it can't reproduce any single input image: there aren't enough bits [0] (except by luck, but that's really true for any storage).
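To put a rough number on the "not enough bits" argument, here is a back-of-the-envelope sketch. The figures are assumptions for illustration and do not come from this thread: a checkpoint of about 4 GB, about 2 billion training images, and 512x512 RGB images. `bits_per_image` is a hypothetical helper name.

```python
def bits_per_image(model_bytes: int, num_images: int) -> float:
    """Average number of weight bits available per training image."""
    return model_bytes * 8 / num_images

# Assumed, illustrative figures (not taken from the thread).
MODEL_BYTES = 4 * 10**9   # roughly 4 GB of weights
NUM_IMAGES = 2 * 10**9    # roughly 2 billion training images

budget = bits_per_image(MODEL_BYTES, NUM_IMAGES)
raw_bits = 512 * 512 * 3 * 8  # one uncompressed 512x512 RGB image

print(budget)             # 16.0
print(raw_bits / budget)  # 393216.0
```

Under those assumptions the average budget is about two bytes per image. That is only an average: nothing in the arithmetic stops a model from spending far more of its capacity on a few heavily repeated images.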
The weights of a NN are just a compressed representation of the training data; think lossy zip.
Rank all generated images by similarity to the training data (etc.) and you can see what's stored.
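A minimal sketch of that ranking, assuming images are already reduced to small feature vectors and that plain Euclidean distance is a reasonable similarity measure (both are simplifications; real checks would likely use perceptual embeddings). The function names are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_training_distance(sample, training):
    """Distance from one generated sample to its closest training item."""
    return min(distance(sample, t) for t in training)

def rank_by_similarity(generated, training):
    """Sort generated samples so the most training-like ones come first."""
    return sorted(generated, key=lambda g: nearest_training_distance(g, training))

# Toy stand-ins for image features.
training = [(0.0, 0.0), (1.0, 1.0)]
generated = [(0.5, 0.5), (0.98, 1.01), (3.0, 3.0)]

ranked = rank_by_similarity(generated, training)
print(ranked[0])   # (0.98, 1.01): nearly a copy of a training item
print(ranked[-1])  # (3.0, 3.0): far from anything in the training set
```

The samples at the top of the list are the candidates for "stored" training data; the ones at the bottom are not close to any single input.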
The Shannon-Hartley theorem isn't relevant. A 4GB zip of 100TB text data can exactly reproduce the initial 100TB for some distributions of that initial dataset.
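A quick way to see that the achievable ratio depends on the distribution of the data, using Python's standard `zlib`. This is only a small-scale illustration: deflate tops out at roughly 1000:1, so it does not reach the 100TB-to-4GB ratio itself, and the sample data is made up.

```python
import os
import zlib

# Highly redundant "text" versus incompressible random bytes of the same size.
redundant = b"the same sentence over and over. " * 10_000
random_like = os.urandom(len(redundant))

packed = zlib.compress(redundant, 9)

# Lossless: the original comes back bit for bit.
assert zlib.decompress(packed) == redundant

ratio_redundant = len(redundant) / len(packed)
ratio_random = len(random_like) / len(zlib.compress(random_like, 9))

print(f"redundant input: {ratio_redundant:.0f}:1")
print(f"random input:    {ratio_random:.2f}:1")
```

The redundant input shrinks by a large factor and still decompresses exactly, while the random input does not shrink at all.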
If you reproduced an exact image (to the same lossy degree as JPEG) using the NN, then you are violating copyright.
But if you reproduced an image whose style matches another copyrighted image (e.g., blah in the style of Starry Night), then how does that new image (which didn't exist before) violate existing copyright? You cannot copyright a style.
The NN containing information which _could_ be used to reconstruct an exact image doesn't itself constitute copyright violation - because the right to use information for training NN is not an exclusive right that the original holder of the training set has.
So either a new law has to come into existence, vis a vis the right to use copyrighted works to train a NN, or the current copyright laws should apply (which implies that NN generated images which are not "exact" copies of existing works don't violate copyright).
If a given model can consistently reproduce an exact image given the same input prompt, why shouldn't the model itself be considered a compressed form of that image?
Right, but underneath your premise is a scam, right?
A NN has not learnt to paint: it doesn't coordinate its sensory-motor system with its environment through play, it hasn't developed any taste, it does not discern the aesthetic good from the bad, it has no judgement, and so on ad infinitum.
A NN is just a kNN with an extra compression step. The way all gradient-based "learners" work is to compute distances to pregiven training data. In the case of kNN that data is used exactly; in a NN it's compressed.
There is no intelligence here, there is no learning: it's a trick. It turns out that interpolating a point between prior examples can often look novel and often fool a human observer.
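For anyone unfamiliar with kNN, here is a minimal one-dimensional sketch of what "interpolating between prior examples" means. It only shows kNN itself; it does not demonstrate the claimed equivalence with neural networks. `knn_predict` and the toy data are hypothetical.

```python
def knn_predict(query, examples, k=2):
    """Average the labels of the k stored examples closest to `query`."""
    nearest = sorted(examples, key=lambda ex: abs(ex[0] - query))[:k]
    return sum(label for _, label in nearest) / k

# Stored (input, label) pairs; the "model" is nothing but this data.
examples = [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]

print(knn_predict(1.5, examples))   # 15.0, a value never seen in the data
print(knn_predict(10.0, examples))  # 25.0, stuck inside the range of stored labels
```

The first query yields an output that looks new but is just a blend of its two neighbours. The second query is far outside the stored data, and the prediction cannot leave the range of labels it has already seen.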
This is largely due to how incredibly tolerant of flaws we are in the cases where NNs are used to perform this trick. We go to great lengths to impart intention, fix communicative flaws, etc., and this is exploited by "AI" to make simple crap seem great by having the observer fill in the details, perceptually. I see it as a kind of proto-schizophrenia that all people have, which usually works if we're dealing with a human, but on everything else produces religions.
In any case, a NN is just a case of a kNN, which is capable of fooling people in exactly the same way, clearly violates copyright, and is a case of theft of intellectual work to make a product you can sell. Adding compression seems irrelevant.
I don't think this interpretation of NNs is correct. There have been a few papers purporting to show this, but as far as I recall they used a very tortured definition of "interpolation".
Stable Diffusion is certainly capable of differentiating good from bad. That's why you can tell it to draw good or draw bad.
Not that this point is relevant to my comment. "Play", "taste" and "judgment" can be just as deterministic as a sequence of large matrix operations interspersed with nonlinear layers.
Sure, but then who is torturing matrices to turn them into organic bodies which adapt their musculature to their environment?
Interpolation is forced in the case of NNs; it's a training condition.
And the "kNN interpretation" isn't an interpretation: kNNs define what "ideal learning" is in the case of statistical learning, and hence show that it doesn't count as actual learning.
In actual learning we're not interested in whether you can solve prespecified problems but how well you cope when you can't. This is, by definition, not a problem which can be formulated in statistical learning terms and the particular "learning" algorithm here is irrelevant.
In other words, accuracy isn't a test of learning. Accuracy is a "non-modal condition" in being fit to a history that actually took place. Learning "in the usual sense" is strictly a modal, "what if" phenomenon, and is assessed by the quality of failure under adverse conditions, not of success.
If one gave any AI/ML system in existence adverse conditions, posed relevant "what ifs" and observed the results, they'd be exposed as the catastrophe they are. None survive even a basic test of "coping well" in these cases.
This is why all breathless AI public relations, i.e., academic papers published in the last decade, do not perform any such tests.
Because said set of numbers is produced via a training process that has the original as an input, and a different input would produce a different set of numbers.
You're correct that merely containing the information would not violate copyright - it's all about how that information was produced.
I'm actually seeing plenty of new images that are in the same style but are different from any of the images in the training set, like Wonder Woman in front of a mountain that looks like the setting of "Frozen".
What this analogy is saying is that if an image is generic and derivative enough (or massively overrepresented in the training data) it may be possible to reconstruct a very close approximation from the model. If the training data is unbiased, I question the validity of copyright claims on an image that is sufficiently derivative that it can be reproduced in this manner.
[0] https://en.wikipedia.org/wiki/Shannon–Hartley_theorem