What Happened
Google DeepMind released Gemini 1.5 Pro, featuring a context window of up to 1 million tokens, a massive leap from the typical 32K–128K context lengths available at the time. The model could process approximately 1 hour of video, 11 hours of audio, over 30,000 lines of code, or 700,000 words in a single prompt while maintaining strong performance across the entire context.
Why It Matters
The 1 million token context window was a paradigm shift for how AI models could be used. Instead of chunking and summarizing documents, users could feed entire books, codebases, or video recordings directly into the model. This enabled new use cases like full-repository code analysis, long-form video understanding, and comprehensive document QA without retrieval-augmented generation. The window, later expanded to 2 million tokens in a research preview, set a new standard for long-context handling.
Technical Details
- Architecture: Mixture-of-Experts (MoE) Transformer, which uses sparse activation so only a subset of parameters is active for each token, improving efficiency
- Context window: 1 million tokens standard, 2 million tokens in research preview
- "Needle in a haystack" performance: >99% recall across the full 1M token context for retrieving embedded facts
- Efficiency: Despite the massive context, maintained near-perfect retrieval and strong reasoning performance
- Multimodal long-context: Could process long video and audio natively, not just text
- Deployment: Available via Google AI Studio and Vertex AI
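The sparse activation idea behind the MoE architecture can be illustrated with a minimal routing sketch. This is not Gemini's actual architecture (which is unpublished); the layer sizes, router, and top-k choice here are illustrative assumptions showing only the core mechanism: a router scores all experts, but just the top-k expert weight matrices are ever multiplied for a given token.

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Route a token to its top-k experts; only those experts run.

    x       : (d,) token representation
    gate_w  : (n_experts, d) router weights
    experts : list of n_experts weight matrices, each (d, d)
    """
    logits = gate_w @ x                       # router score for every expert
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Sparse activation: only k of the n_experts matrices are used for this token.
    out = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
    return out, top

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(n_experts, d))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

y, active = moe_layer(x, gate_w, experts, k=2)
print(len(active), n_experts)  # 2 of 16 experts were activated for this token
```

The efficiency win is that compute per token scales with k, not with the total number of experts, so total parameter count can grow without a matching growth in inference cost.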
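The "needle in a haystack" result above comes from a simple style of evaluation: a single target fact is buried at a controlled depth inside long filler context, and the model is asked to retrieve it. A minimal sketch of how such a prompt is constructed and scored follows; the filler text, needle, and substring-match scoring rule are illustrative assumptions, not the exact published benchmark.

```python
# Sketch of a needle-in-a-haystack probe: hide one fact at a chosen depth
# in long filler context, then check whether the model's answer recovers it.

FILLER = "The grass is green. The sky is blue. The sun is bright. "
NEEDLE = "The secret passphrase is 'mango-42'."
QUESTION = "What is the secret passphrase?"

def build_haystack(n_filler_sentences, depth_fraction):
    """Insert NEEDLE at depth_fraction (0.0 = start, 1.0 = end) of the context."""
    sentences = [FILLER] * n_filler_sentences
    pos = int(depth_fraction * n_filler_sentences)
    sentences.insert(pos, NEEDLE + " ")
    return "".join(sentences) + "\n" + QUESTION

def score(model_answer):
    """Recall check: did the answer contain the needle's payload?"""
    return "mango-42" in model_answer

# Sweep context length and needle depth to map recall across the window.
prompt = build_haystack(n_filler_sentences=1000, depth_fraction=0.5)
print(len(prompt))  # one retrievable fact buried mid-context
```

In practice the harness sweeps many (context length, depth) pairs and reports recall per cell; Gemini 1.5 Pro's reported >99% recall means the needle was found at nearly every depth up to the full 1M-token window.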