> What will save it is that, no matter how picky you are as a creator, your audience will never know what exactly was that you dreamed up, so any half-decent approximation will work.
Part of the problem is that the "half decent approximations" tend towards a clichéd average. The audience won't know that the cool cyberpunk cityscape you generated isn't exactly what you had in mind, but they will know that it looks like every other AI generated cyberpunk cityscape and mentally file your creation in the slop folder.
I think the pursuit of fidelity has made the models less creative over time, they make fewer glaring mistakes like giving people six fingers but their output is ever more homogenized and interchangeable.
In other words, someone willing to tweak the prompt and press the button enough times to say "yeah, that one, that's really good" is going to have a result which cannot in fact be reliably binned as AI-generated.
I mean, no? None of the AI-generated images managed to be indistinguishable. Some people were much better than others at spotting the differences. He even quotes, at length, an artist giving a detailed breakdown of what's wrong with one of the images he thought was good.
Did you read the article? Respondents performed barely better than chance. Sure, no one was actually 100% wrong[0]. Just almost always wrong, with a noticeable bias towards liking AI art more.
The detailed breakdown you mention? Maybe it's accurate to that artist's thought process, maybe it's more of a rationalization; either way, it's not a general rule they, or anyone, could apply to any of the other AI images. Most of those in the article don't exhibit those "telltale signs", and the one that does - the Victorian Megaship - was actually made by a human artist with no AI in the mix.
EDIT:
Another image that stands out to me is Riverside Cafe. I, like apparently a lot of other people, going by the article's comments, assumed it was a human-made one, because we vaguely remembered that Van Gogh painted something like it. He did, it's called Café Terrace at Night - and yet, despite immediately evoking the association, Riverside Cafe was made by AI, and is actually nothing like Café Terrace at Night at any level.
(I find it fascinating how this work looks like a copy of Van Gogh at first glance, for no obvious reason, but nothing alike once you pause to look closer. It's like... they have similar low-frequency spectra or something?)
EDIT2:
Played around with the two images in https://ejectamenta.com/imaging-experiments/fourifier/. There are some similarities in the spectra, I can't put my finger on them exactly. But it's probably not the whole answer. I'll try to do some more detailed experimentation later.
--
[0] - Nor should you expect it - it would mean either perfect calibration, or the equivalent of flipping a coin and getting heads 30 times in a row; it's not impossible, but you shouldn't expect to see it when you're interviewing fewer people than literally the entire population of the planet.
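For a sense of scale on that coin-flip comparison, here's a quick back-of-the-envelope sketch. The 8 billion population figure is my own round-number assumption, not something from the article.

```python
# Rough check of the "30 heads in a row" comparison.
# WORLD_POPULATION is an assumed round figure, not a number from the article.
FLIPS = 30
WORLD_POPULATION = 8_000_000_000

# Chance that one person flips 30 heads in a row with a fair coin.
p_all_heads = 0.5 ** FLIPS

# Expected number of people who would manage it if everyone alive tried once.
expected_people = WORLD_POPULATION * p_all_heads

print(f"P(30 heads in a row) = {p_all_heads:.3e}")
print(f"Expected people out of {WORLD_POPULATION:,}: {expected_people:.2f}")
```

So even with everyone on the planet taking part, you'd only expect a handful of such streaks.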
> The average participant scored 60%, but people who hated AI art scored 64%, professional artists scored 66%, and people who were both professional artists and hated AI art scored 68%.
> The highest score was 98% (49/50), which 5 out of 11,000 people achieved. Even with 11,000 people, getting scores this high by luck alone is near-impossible.
This accurately boils down to "cannot reliably be binned as AI-generated". Your objection amounts to vanishingly few people, who were told that this is a test, being able to do a pretty good job at it.
If roughly 0.05% of people (5 out of 11,000) who are specifically judging art as AI or not AI, in a test which presumably attracts people who would like to be able to do that thing, can do a 98% accurate job, and the average is around 60%: that isn't reliable.
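If you want to check the numbers yourself, here's a minimal sketch. It assumes each of the 50 images is an independent guess with the same per-image accuracy, which is my simplification and not how the article analysed the results.

```python
from math import comb

def tail_probability(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Figures quoted from the article.
N_IMAGES = 50
N_PARTICIPANTS = 11_000
TOP_SCORE = 49
TOP_SCORERS = 5

# Chance of scoring 49/50 or better for a pure coin-flip guesser...
p_guessing = tail_probability(N_IMAGES, TOP_SCORE, 0.5)
# ...and for a hypothetical participant who is right 60% of the time per image.
p_average = tail_probability(N_IMAGES, TOP_SCORE, 0.6)

# Share of participants who actually reached the top score.
share_top = TOP_SCORERS / N_PARTICIPANTS

print(f"P(>= 49/50 | coin flip)    = {p_guessing:.2e}")
print(f"P(>= 49/50 | 60% accuracy) = {p_average:.2e}")
print(f"Observed share of top scorers = {share_top:.4%}")
```

Under those assumptions the top scores aren't luck, but the people getting them are a tiny slice of the participants.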
If that doesn't work for you, I encourage you to take the test. Obviously since you've read the article there are some spoilers, but there's still plenty of chances to get it right or wrong. I think you'll discover that you, too, cannot do this reliably. Let us know what happens.
I can't do it reliably and I don't want to - I learnt to spot certain popular video compression artifacts in my youth, and that has not enhanced my life. But any distinction that random people taking a casual internet survey get right 60% of the time is absolutely one that you can make reliably if you put in the effort. Look at something like chicken sexing.
A somewhat counterintuitive argument is this: AI models will make the overall creative landscape more diverse and interesting, i.e., less "average"!
Imagine the space of ideas as a circle, with stuff in the middle being easier to reach (the "cliched average"). Previously, traversing the circle was incredibly hard - we had to use tools like DeviantArt, Instagram, etc. to agglomerate the diverse tastes of artists, hoping to find or create the style we were looking for. Getting a particular art style meant hiring the artist. As a result, on average, what you see is the result of huge amounts of human curation, effort, and branding teams.
Now reduce the effort 1000x, and all of a sudden, it's incredibly easy to reach the edge of the circle (or closer to it). Sure, we might still miss some things at the very outer edge, but it's equivalent to building roads. Motorists appear, people with no time to sit down and spend 10000 hours to learn and master a particular style can simply remix art and create things wildly beyond their manual capabilities. As a result, the amount of content in the infosphere skyrockets, the tastemaking velocity accelerates, and you end up with a more interesting infosphere than you're used to.
To extend the analogy, imagine the circle as a probability distribution; for simplicity, imagine it's a bivariate normal distribution (a.k.a. a 2D Gaussian, whose density looks like a bell-shaped hill in 3D) + some noise, and you're above it and looking down.
When you're commissioning an artist to make you some art, you're basically sampling from the entire distribution. Stuff in the middle is, as you say, easiest to reach, so that's what you'll most likely get. Generative models let more people do art, meaning there's more sampling happening, so the stuff further from the centre will be visited more often, too.
However, AI tools also make another thing easier: moving and narrowing the sampling area. Much like with a very good human artist, you can find some work that's "out there", and ask for variations of it. However, there are only so many good artists to go around. AI making this process much easier and more accessible means more exploration of the circle's edges will happen. Not just "more like this weird thing", but also combinations of 2, 3, 4, N distinct weird things. So in a way, I feel that AI tools will surface creative art disproportionately more than they'll boost the common case.
Well, except for the fly in the ointment that's the advertising industry (a.k.a. the cancer on modern society). Unfortunately, by far most of the creative output of humanity today is done for advertising purposes, and that goal favors the common, as it maximizes the audience (and is least off-putting). A deluge of AI slop is unavoidable, because slop is how the digital world makes money, and generative AI models make it cheaper than the generative protein models that have done it so far. Don't blame AI research for that, blame advertising.
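To make the sampling picture above a bit more concrete, here's a toy sketch. Everything in it is an assumption for illustration: "art" is a point drawn from a standard 2D Gaussian, "far out" means more than 3 units from the centre, and "more like this weird thing" is modelled by moving the centre to a hypothetical point (3.5, 0) and shrinking the spread.

```python
import random
from math import hypot

def sample_point(rng, center=(0.0, 0.0), spread=1.0):
    """Draw one 'artwork' as a point from an isotropic 2D Gaussian."""
    return (rng.gauss(center[0], spread), rng.gauss(center[1], spread))

def count_far_from_origin(points, radius=3.0):
    """Count points lying further than `radius` from the cliched centre (0, 0)."""
    return sum(1 for x, y in points if hypot(x, y) > radius)

rng = random.Random(0)  # fixed seed so the run is repeatable

# 1) More sampling: same distribution, 100x more draws.
few = [sample_point(rng) for _ in range(100)]
many = [sample_point(rng) for _ in range(10_000)]

# 2) Moved and narrowed sampling area around a hypothetical "weird" work.
steered = [sample_point(rng, center=(3.5, 0.0), spread=0.3) for _ in range(100)]

print("far-out works in 100 broad draws:    ", count_far_from_origin(few))
print("far-out works in 10,000 broad draws: ", count_far_from_origin(many))
print("far-out works in 100 steered draws:  ", count_far_from_origin(steered))
```

The share of far-out draws stays the same in the broad case, there are just more of them in absolute terms. Steering is what changes the share.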
Tastes are almost never normally distributed along a spectrum, but multi-modal. So the more dimensions you explore in, the more you end up with “islands of taste” on the surface of a hypersphere and nothing like the normal distribution at all. This phenomenon is deeply tied to why “design by committee” (e.g., in movies) always makes financial estimates happy but flops with audiences: there is almost no customer for average anything.
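A one-dimensional toy version of this, as a rough sketch: two equally sized "islands" of taste on a single made-up axis (the positions and widths are hypothetical), and a committee that aims at the average of the two.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

# Hypothetical taste axis with two equally sized audiences ("islands").
ISLANDS = [(-2.0, 0.5), (2.0, 0.5)]  # (mean, standard deviation) per island

def audience_density(x):
    """Equal-weight mixture of the islands: how much audience sits at taste x."""
    return sum(normal_pdf(x, m, s) for m, s in ISLANDS) / len(ISLANDS)

# The committee targets the average of the island centres.
committee_pick = sum(m for m, _ in ISLANDS) / len(ISLANDS)

density_at_average = audience_density(committee_pick)
density_at_island = audience_density(ISLANDS[0][0])

print(f"audience density at the average taste: {density_at_average:.6f}")
print(f"audience density on an island:         {density_at_island:.6f}")
```

With these made-up numbers the average lands in the valley between the islands, where hardly any audience sits.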
An example of a hit movie or song that was created by committee?
Inside Out 2 had the largest box office of any movie in 2024. Check out the "Research and writing" section in its Wikipedia article https://en.wikipedia.org/wiki/Inside_Out_2#Research_and_writ... ... psychological consultants, a feedback loop with a group of teenagers, test screenings.
Or how about "Die with a smile" - currently number 1 in the global top 50 on Spotify. 5 songwriters
Or "APT." - currently number 2 in the global top 50 on Spotify. 11 songwriters
Inside Out 2 has a single writer, who also worked on the first.
Consulting with SMEs, testing with audiences, etc isn’t “design by committee”.
Similarly, “Die With a Smile” seems to have been the work of two people with developed styles with support — again, not a committee:
> The collaboration was a result of Mars inviting Gaga to his studio where he had been working on new music. He presented the track in progress to her and the duo finished writing and recording the song the same day.
“APT.” seems to have started with a single person goofing around; it was then pitched as a collaboration, and the expanded team entered at that point.
I like the picture, but I'd be more impressed with the exploration argument if we were collectively actually doing a good job giving recognition to original and substantial works that already exist.
It'd be of greater service in that regard to create a high-quality artificial stand-in for that limited-quantity "attention" and "engagement" all the bloodsuckers seem so keen on harvesting.
(And I do blame the advertisers, but frankly anyone handing them new amplifiers, with entirely predictable consequences, is also not blameless.)
I read this argument/analogy and the "AI slop will win" idea reminds me of the idea that "fake news will win".
That is based on the perception that it is easier than ever to create fake content, but fails to account for the fact that creating real content (for example, simply taking a video) has become easier still. So while there is more fake content, there is also a lot more real content, and so manipulation of reality (for example, denying a genocide) is much harder today than ever.
Anyway, "the AI slop will win" is based on a similar misconception, that total creative output will not increase. But like with fake news, it probably will not be the case, and so the actual amount of good art will increase, too.
I think we are OK as long as normal humans prefer to create real news rather than fake news, and create innovative art rather than cliched art.
> I think we are OK as long as normal humans prefer to create real news rather than fake news, and create innovative art rather than cliched art.
So we're not OK.
I think I need to state my assumptions/beliefs here more explicitly.
First of all, "AI slop" is just the newest iteration on human-produced slop, which we're already drowning in. Not because people prefer to create slop, but because they're paid to do it, because most content is created by marketers and advertisers to sell you shit, and they don't want it to be better than strictly necessary for purpose.
It's the same with fake news, really. Fake news isn't new. Almost all news is fake news; what we call "fake news" is a particular flavor of bullshit that got popular as it got easier for random humans to publish stories competing with established media operations.
In both cases, AI is exacerbating the problem, but it did not create it - we were already drowning in slop.
Which leads me to a related point:
> Anyway, "the AI slop will win" is based on a similar misconception, that total creative output will not increase.
It will. But don't forget Sturgeon's law - "ninety percent of everything is crap"[0]. Again, for the past couple decades, we've been drowning in "creative output". It's not a new problem, it's just increasingly noticeable in the past years, because the Web makes it very easy for everyone to create more "creative output" (most of which is, again, advertising), and it finally started overwhelming our ability to filter out the crap and curate the gems.
Adding AI to the mix means more output, which per Sturgeon's law, means disproportionately more crap. That's not AI's fault, that's ours; it's still the same problem we had before.
And as AI oversaturates the cliched average, creators will have to get further and further away from the average to differentiate themselves. If you pour a lot of work into your creation you want to make it clear that it isn't some cliched AI drivel.
> I think the pursuit of fidelity has made the models less creative over time, they make fewer glaring mistakes like giving people six fingers but their output is ever more homogenized and interchangeable.
That may be true of any one model (though I don’t think it really is, either, I think newer image gen models are individually capable of a much wider array of styles than earlier models), but it is pretty clearly not true of the whole range of available models, even if you look at a single model “family” like “SDXL derivatives”.
> I think the pursuit of fidelity has made the models less creative over time (...) their output is ever more homogenized and interchangeable.
Ironically, we're long past that point with human creators, at least when it comes to movies and games.
Take sci-fi movies, compare modern ones to the ones from the tail end of the 20th century. Year by year, VFX gets more and more detailed (and expensive) - more and better lights, finer details on every material, more stuff moving and emitting lights, etc. But all that effort arguably killed immersion and believability, by making scenes incomprehensible. There's way too much visual noise in action scenes in particular - bullets and lightning bolts zip around, and all that detail just blurs together. Contrast the 20th century productions - textures weren't as refined, but you could at least tell who's shooting who and when.
Or take video games, where all that graphics work makes everything look the same. Especially games that go for a realistic style, they're all homogeneous these days, and it's all cheap plastic.
(Seriously, what the fuck went wrong here? All that talk, and research, and work into "physically based rendering", yet in the end, all PBR materials end up looking like painted plastic. Raytracing seems to help a bit when it comes to liquids, but it still can't seem to make metals look like metals and not Fisher-Price toys repainted to gray.)
So I guess in this way, more precision just makes the audience give up entirely.
> they will know that it looks like every other AI generated cyberpunk cityscape and mentally file your creation in the slop folder.
The answer here is the same as with human-produced slop: don't. People are good at spotting patterns, so keep adding those low-order bits until it's no longer obvious you're doing the same thing everyone else is.
EDIT: Also, obligatory reminder that generative models don't give you the average of the training data with some noise mixed in; they sample from a learned distribution. The law of large numbers applies, but it just means that to get more creative output, you need to bias the sampling.
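A toy sketch of what "bias the sampling" can mean, using temperature scaling on a small categorical distribution. The style names and scores are made up, and this is only an illustration of the idea, not how any particular image model is actually steered (in practice you'd mostly be doing it through prompts, reference images and the like).

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical learned scores: one very common style and three rarer ones.
STYLES = ["neon cyberpunk skyline", "woodcut", "risograph collage", "glitch tapestry"]
LOGITS = [4.0, 2.0, 1.0, 0.5]

default_probs = softmax_with_temperature(LOGITS, temperature=1.0)
biased_probs = softmax_with_temperature(LOGITS, temperature=3.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
default_draws = rng.choices(STYLES, weights=default_probs, k=1000)
biased_draws = rng.choices(STYLES, weights=biased_probs, k=1000)

for name, draws in (("default", default_draws), ("biased", biased_draws)):
    counts = {style: draws.count(style) for style in STYLES}
    print(name, counts)
```

Sampling more from the default distribution just gives you more of the common style; changing the distribution you sample from is what brings the rarer ones up.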
Video games (the much larger industry of the two, by revenue) seem to be closer to understanding this. AAA games dominate advertising and news cycles, but on any best-seller list AAA games are on par with indie and B games (I think they call them AA now?). For every successful $60M PBR-rendered Unreal 5 title there is an equally successful game with low-fidelity graphics but exceptional art direction, story or gameplay.
Western movie studios may discover the same thing soon, with the number of high-budget productions tanking lately.
I agree. The one shining hope I have is the incredible art and animation style of Fortiche[0]'s Arcane[1] series. Watch that, and then watch any recent (and identikit) Pixar movie, and they are just streets ahead. It's just brilliant.