DALL·E 2

OpenAI unveils DALL·E 2, producing photorealistic images from text at 1024×1024 resolution using diffusion models and CLIP guidance.

Model Release

What Happened

OpenAI released DALL·E 2, a vastly improved successor to the original DALL·E. Unlike its predecessor, DALL·E 2 used a diffusion-based approach guided by CLIP embeddings to generate photorealistic images at 1024×1024 resolution. It could also edit existing images, extend images beyond their borders (outpainting), and create variations of input images.

Why It Matters

DALL·E 2 represented a quantum leap in text-to-image quality, producing results that were often indistinguishable from photographs or professional illustrations. It catalyzed a wave of interest in generative AI art tools, prompted urgent conversations about AI's impact on creative industries, and set off an arms race in image generation that soon included Midjourney, Stable Diffusion, and others.

Technical Details

DALL·E 2 generates images in two stages:

1. A prior that generates a CLIP image embedding from the text caption
2. A decoder (a modified GLIDE diffusion model) that generates the image conditioned on that embedding
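The two-stage pipeline above can be sketched in code. This is a toy illustration only: the dimensions, the random-projection "prior", and the placeholder denoising loop are all stand-ins invented here, not the real learned models from the DALL·E 2 paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration (1024 matches the
# output resolution; the embedding sizes are stand-ins).
TEXT_DIM, CLIP_DIM, IMG_SIZE = 512, 768, 1024

def prior(text_embedding: np.ndarray) -> np.ndarray:
    """Stage 1 (sketch): map a CLIP text embedding to a CLIP image
    embedding. The real prior is a learned model (diffusion or
    autoregressive); a fixed random projection stands in here."""
    W = rng.standard_normal((CLIP_DIM, text_embedding.shape[0]))
    return W @ text_embedding

def decoder(image_embedding: np.ndarray, steps: int = 4) -> np.ndarray:
    """Stage 2 (sketch): start from noise and iteratively refine it,
    conditioned on the image embedding, mimicking the structure of the
    modified-GLIDE diffusion decoder (the update rule is a placeholder)."""
    img = rng.standard_normal((IMG_SIZE, IMG_SIZE))
    cond = image_embedding.mean()            # stand-in for real conditioning
    for _ in range(steps):
        noise_estimate = img - cond          # placeholder "predicted noise"
        img = img - 0.5 * noise_estimate     # step toward the conditioned target
    return img

text_emb = rng.standard_normal(TEXT_DIM)     # stand-in CLIP text embedding
img_emb = prior(text_emb)                    # stage 1: text -> image embedding
image = decoder(img_emb)                     # stage 2: embedding -> pixels
print(image.shape)                           # (1024, 1024)
```

The key design point the sketch preserves is the decoupling: the prior decides *what* to depict (an image embedding), and the decoder decides *how* it looks (pixels), which is also what makes image variations possible, since the decoder can be re-run on the same embedding.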