Jin Daily Trivia: Why is ChatGPT’s Image Generation So Good?
Ever wondered why OpenAI’s new image generation model is so much faster than its predecessor, DALL-E? 🤔
Do you know how older image models work? If you watch one generate, you might notice it starts with a noisy canvas that is gradually refined into the final image.
This technique is called diffusion (hence the name “Stable Diffusion”), and it is used by most text-to-image models, such as Midjourney, Google Imagen, and DALL-E.
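For the curious, here is a toy Python sketch of that "start noisy, refine step by step" idea. It is only an illustration, not how any real model is implemented: a real diffusion model uses a trained neural network to predict what noise to remove, while the hypothetical `toy_denoise` function below cheats by already knowing the target image. The function name, the 30-step default, and the tiny four-pixel "image" are all assumptions made up for this example.

```python
import random


def toy_denoise(target, steps=30, seed=0):
    """Toy stand-in for diffusion sampling (NOT a real diffusion model).

    Starts from random noise and nudges EVERY pixel toward the target
    at EVERY step. A real model would predict the noise with a neural
    network instead of peeking at the answer.
    """
    rng = random.Random(seed)
    image = [rng.uniform(-1.0, 1.0) for _ in target]  # pure noise
    history = [list(image)]  # keep a snapshot of each step

    for step in range(steps):
        remaining = steps - step
        # Close 1/remaining of the gap, so the last step lands on the target.
        image = [x + (t - x) / remaining for x, t in zip(image, target)]
        history.append(list(image))

    return image, history


if __name__ == "__main__":
    target = [0.0, 0.5, 1.0, 0.25]  # a made-up four-pixel "image"
    final, history = toy_denoise(target)
    print("start:", [round(x, 2) for x in history[0]])
    print("end:  ", [round(x, 2) for x in final])
```

Notice that the whole image is touched on every pass, which is why the picture seems to slowly emerge from static.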
But this time, OpenAI decided to try something different: a method called autoregressive (AR) modeling.
This approach is actually closer to how ChatGPT itself works. Instead of refining a whole image through 30+ “denoising” steps, AR models generate images sequentially, one piece at a time, with each new piece conditioned on the pieces that came before it. This can make the process much faster! Fine details are added at the last step, but the overall structure of the image remains consistent throughout.
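Here is a minimal Python sketch of "one piece at a time" generation, under some loud assumptions: `toy_next_token` is a made-up stand-in for a trained model, and the 16-value vocabulary and the left-to-right, top-to-bottom order are chosen purely for illustration. None of this reflects OpenAI's actual implementation, which has not been published in detail.

```python
import random


def toy_next_token(prefix, vocab_size, rng):
    """Hypothetical stand-in for a trained model.

    Picks the next image token given everything generated so far.
    Here it just stays within one step of the previous token.
    """
    if not prefix:
        return rng.randrange(vocab_size)
    step = rng.choice([-1, 0, 1])
    return (prefix[-1] + step) % vocab_size


def generate_autoregressive(width, height, vocab_size=16, seed=0):
    """Build an image token by token, each conditioned on the prefix."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(width * height):
        # Each token is produced once and never revisited afterwards.
        tokens.append(toy_next_token(tokens, vocab_size, rng))
    # Reshape the flat sequence into rows (assumed raster order).
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    for row in generate_autoregressive(4, 3):
        print(row)
```

The loop is the same shape as text generation in a chatbot: predict the next token, append it, repeat.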
Fun fact: Another image model that uses AR is Grok Aurora! 🎨 That’s why both models excel at creating photorealistic images and following text instructions with precision.
Pretty cool, right?
Hope you learned something new today—see you next time! ✌️
