Jin Daily Trivia: Why is ChatGPT’s Image Generation So Good?

Ever wondered why OpenAI’s new image generation model is so much faster than its predecessor, DALL-E? 🤔

Do you know how older image models work? If you watch one generate, you might notice it starts with a noisy image that gradually gets refined into the final picture.

This process is called diffusion (which is where “Stable Diffusion” gets its name), and it is used by most text-to-image models, like Midjourney, Google Imagen, and DALL-E.

But this time, OpenAI decided to try something different: an autoregressive (AR) model.

This approach is actually closer to how ChatGPT itself works. Instead of refining the whole image through 30+ steps of “denoising,” AR models generate the image sequentially, one piece at a time. This makes the process much faster! Fine details are added last, but the overall structure of the image stays consistent throughout.
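Want to see the difference in code? Here is a tiny toy sketch in Python. It is not how any real image model is implemented: the function names, the simple "nudge toward a target" rule, and the `next_piece` callback are all made up for illustration. It only shows the two loop shapes: the diffusion-style loop revisits the whole image at every step, while the autoregressive loop adds one piece at a time based on what came before.

```python
import random


def diffusion_sketch(target, steps=30, seed=0):
    """Toy 'denoising' loop (illustrative only, not a real diffusion model).

    Starts from random noise and nudges every value toward the target
    a little at each step, so each step touches the whole image.
    """
    rng = random.Random(seed)
    image = [rng.random() for _ in target]  # start from pure noise
    for step in range(steps):
        remaining = steps - step
        # Close 1/remaining of the gap to the target on every value.
        image = [x + (t - x) / remaining for x, t in zip(image, target)]
    return image


def autoregressive_sketch(length, next_piece):
    """Toy autoregressive loop (illustrative only).

    Builds the output one piece at a time; `next_piece` is a hypothetical
    callback that picks the next piece from the pieces generated so far.
    """
    pieces = []
    for _ in range(length):
        pieces.append(next_piece(pieces))
    return pieces


# The "image" here is just a short list of numbers.
print(diffusion_sketch([0.0, 0.5, 1.0, 0.5]))
# Each new piece depends on how many pieces already exist.
print(autoregressive_sketch(4, lambda prev: len(prev)))  # [0, 1, 2, 3]
```

In a real model, a trained neural network would do the job of both the nudging rule and `next_piece`; the toy versions just stand in for it.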

Fun fact: Another image model that uses AR is Grok Aurora! 🎨 That’s why both models excel at creating photorealistic images and following text instructions with precision.

Pretty cool, right?

Hope you learned something new today—see you next time! ✌️

