Back to timeline

Gemini (Google)

Google DeepMind launches Gemini, a natively multimodal AI model family that processes text, images, audio, and video, competing directly with GPT-4.

Model Release

What Happened

Google DeepMind launched Gemini, a family of natively multimodal AI models available in three sizes: Ultra, Pro, and Nano. Unlike GPT-4, which was primarily a text model with image understanding added, Gemini was designed from the ground up to process and reason across text, images, audio, video, and code simultaneously. Gemini Ultra was the first model to achieve human-expert performance on the MMLU benchmark.

Why It Matters

Gemini represented Google's most serious response to OpenAI's GPT-4 and the ChatGPT phenomenon. It signaled that the AI race was intensifying among major tech companies. The model's native multimodality — being trained on multiple modalities from the start rather than bolting them together — pointed toward the future of AI systems that could seamlessly understand and generate across different types of content.

Technical Details