DALL·E 2

OpenAI unveils DALL·E 2, producing photorealistic images from text at 1024×1024 resolution using diffusion models and CLIP guidance.

Model Release

What Happened

OpenAI released DALL·E 2, a vastly improved successor to the original DALL·E. Unlike its predecessor, DALL·E 2 used a diffusion-based approach guided by CLIP embeddings to generate photorealistic images at 1024×1024 resolution. It could also edit existing images, extend images beyond their borders (outpainting), and create variations of input images.

Why It Matters

DALL·E 2 represented a quantum leap in text-to-image quality, producing results that were often indistinguishable from photographs or professional illustrations. It catalyzed a wave of interest in generative AI art tools, prompted urgent conversations about AI's impact on creative industries, and set off an arms race in image generation that soon included Midjourney, Stable Diffusion, and others.

Technical Details

DALL·E 2 generates images in two stages:

1. A prior that generates a CLIP image embedding from the text caption
2. A decoder (a modified GLIDE diffusion model) that generates the image conditioned on that embedding
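The two-stage pipeline above can be sketched in code. This is a toy illustration only: the dimensions, the random-projection "prior", and the placeholder denoising loop are all stand-ins invented here, not the real learned models from the DALL·E 2 paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration (1024 matches the
# output resolution; the embedding sizes are stand-ins).
TEXT_DIM, CLIP_DIM, IMG_SIZE = 512, 768, 1024

def prior(text_embedding: np.ndarray) -> np.ndarray:
    """Stage 1 (sketch): map a CLIP text embedding to a CLIP image
    embedding. The real prior is a learned model (diffusion or
    autoregressive); a fixed random projection stands in here."""
    W = rng.standard_normal((CLIP_DIM, text_embedding.shape[0]))
    return W @ text_embedding

def decoder(image_embedding: np.ndarray, steps: int = 4) -> np.ndarray:
    """Stage 2 (sketch): start from noise and iteratively refine it,
    conditioned on the image embedding, mimicking the structure of the
    modified-GLIDE diffusion decoder (the update rule is a placeholder)."""
    img = rng.standard_normal((IMG_SIZE, IMG_SIZE))
    cond = image_embedding.mean()            # stand-in for real conditioning
    for _ in range(steps):
        noise_estimate = img - cond          # placeholder "predicted noise"
        img = img - 0.5 * noise_estimate     # step toward the conditioned target
    return img

text_emb = rng.standard_normal(TEXT_DIM)     # stand-in CLIP text embedding
img_emb = prior(text_emb)                    # stage 1: text -> image embedding
image = decoder(img_emb)                     # stage 2: embedding -> pixels
print(image.shape)                           # (1024, 1024)
```

The key design point the sketch preserves is the decoupling: the prior decides *what* to depict (an image embedding), and the decoder decides *how* it looks (pixels), which is also what makes image variations possible, since the decoder can be re-run on the same embedding.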