The winning image entry for "The Yarrctic Circle" by OpenAI 4o doesn't actually wield a cutlass. It's very aesthetically pleasing, even though it's wrong in all fundamental aspects (the perspective is nonsensical and the anatomy is messed up, with one leg 150% longer than the other, ...).
It's a very interesting resource to map some of the limits of existing models.
In my own testing of the two, this is what I’ve noticed: Imagen will follow the instructions, while 4o often will not but produces more aesthetically pleasing images.
I don’t know which is more important, but I would say that people mostly won’t pay for fun but disposable images, and I think people will pay for art, but with an increased emphasis on the human artist. However, users might pay for reliable tools that can generate images for a purpose, such as educational illustrations, and those need to be able to follow the spec very well.
People pay for digital sticker packs so their memoji in iMessage are customized. How much money they make on sticker packs is unknown to me, but image generation platform Midjourney seems to be doing alright.
Midjourney got in REALLY early in the GenAI game despite only allowing image generation through Discord for at least a year. I heard that it was one of the largest Discord channels ever, with something absurd like 20+ million members.
I'd love to see some financials but I'd tend to agree they're probably doing pretty well.
Google Flow is remarkable as video editing UX, but Imagen 4 doesn't really stand out amongst its image gen peers.
I want to interrupt all of this hype over Imagen 4 to talk about the totally slept on Tencent Hunyuan Image 2.0 that stealthily launched last Friday. It's absolutely remarkable and features:
- millisecond generation times
- real time image-to-image drawing capabilities
- visual instructivity (e.g., you can circle regions, draw arrows, and write prompts addressing them)
- incredible prompt adherence and quality
Nothing else on the market has these properties in quite this combination, so it's rather unique.
Tencent Hunyuan had a bunch of model releases all wrapped up in a product that they call "Hunyuan Game", but the Hunyuan Image 2.0 real time drawing canvas is the real star of it all. It's basically a faster, higher quality Krea: https://x.com/TencentHunyuan/status/1924713242150273424
You can see how this is an incredible illustration tool. If they were to open source this, this would immediately become the top image generation model over Flux, Imagen 4, etc. At this point, really only gpt-image-1 stands apart as having godlike instructivity, but it's on the other end of the [real time <--> instructive] spectrum.
A total creative image tool kit might just be gpt-image-1 and Hunyuan Image 2.0. The other models are degenerate cases.
> but Imagen 4 doesn't really stand out amongst its image gen peers.
In this AI rat race, whenever one model gets ahead, they all tend to reach parity within 3-6 months. If you can wait 6 months to create your video I'm sure Imagen 5 will be more than good enough.
It's honestly kind of ridiculous the pace things are moving at these days. 10 years ago, waiting a year for something was very normal. Nowadays people are judging the model-of-the-week against last week's model-of-the-week, but last week's org will probably not sleep, and they'll release another one next week.
I've given this some more thought. Even if Imagen 4 isn't that great on its own, all of Google's models and UX products in conjunction (Veo 3, Flow, etc.) are orders of magnitude above the rest of the playing field.
If Tencent wants to keep Google from winning the game, they should open source their models. From my perspective right now, it looks like Google is going to win this entire game, and open source AI might be the only way to stop that from being a runaway victory.