What Happened
Meta released Llama 3.1, featuring models at 8B, 70B, and 405B parameter scales under an updated open license. The 405B model was the largest openly available language model at the time and was competitive with leading closed models such as GPT-4o and Claude 3.5 Sonnet. Meta also published a detailed technical report documenting the training process.
Why It Matters
Llama 3.1 405B demonstrated that open-weight models could match frontier closed models on major benchmarks. By releasing a model of this caliber openly, Meta:
- Challenged the closed-model paradigm — enterprises could now self-host frontier-class models
- Lowered barriers for AI research and innovation globally
- Enabled sovereign AI — countries and organizations could run powerful models without depending on US cloud providers
- Established a reference architecture that the open-source community could build upon
Technical Details
- Architecture: Dense Transformer decoder with grouped-query attention
- Sizes: 8B, 70B, 405B parameters
- Training data: 15 trillion tokens from publicly available sources
- Context window: 128K tokens
- Training infrastructure: 16,000 H100 GPUs
- Key features:
  - Support for tool use and function calling
  - Multilingual capabilities (8 languages)
  - Improved coding and mathematical reasoning
- Benchmark performance (405B): Competitive with GPT-4 (April 2024), GPT-4o, and Claude 3.5 Sonnet across MMLU, HumanEval, GSM8K, and MATH
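The grouped-query attention (GQA) mentioned in the architecture notes reduces KV-cache memory by letting several query heads share one key/value head. Below is a minimal pure-Python sketch of the idea for a single query position; the head counts, dimensions, and function name are toy illustrations, not Meta's implementation.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grouped_query_attention(queries, keys, values):
    """Grouped-query attention for one query position (toy sketch).

    queries: n_q query-head vectors, each of dim d
    keys, values: n_kv heads, each a list of seq vectors of dim d
    Query heads are split into n_q // n_kv groups; every head in a
    group attends with the same shared key/value head.
    """
    n_q, n_kv = len(queries), len(keys)
    assert n_q % n_kv == 0, "query heads must divide evenly into KV groups"
    group = n_q // n_kv
    d = len(queries[0])
    outs = []
    for h, q in enumerate(queries):
        kv = h // group  # index of the shared KV head for this group
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys[kv]])
        ctx = [sum(w * v[i] for w, v in zip(weights, values[kv]))
               for i in range(d)]
        outs.append(ctx)
    return outs
```

With `n_kv == n_q` this reduces to standard multi-head attention, and with `n_kv == 1` to multi-query attention; GQA sits in between, which is why it cuts KV-cache size at long context lengths (relevant at 128K tokens) with little quality loss.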