What Happened
Meta AI released LLaMA (Large Language Model Meta AI), a collection of foundation language models ranging from 7 billion to 65 billion parameters. The models were initially released to researchers under a non-commercial license. Within a week, the model weights were leaked online, making them broadly accessible to the open-source community.
Why It Matters
LLaMA demonstrated that smaller, well-trained models could match or exceed the performance of much larger models like GPT-3. The 13B LLaMA model outperformed GPT-3 (175B) on most benchmarks, while the 65B model was competitive with Chinchilla-70B and PaLM-540B. The leak of the weights, while unintended, ignited an explosion of open-source LLM development — Alpaca, Vicuna, and hundreds of fine-tuned variants appeared within weeks, democratizing access to powerful language models.
Technical Details
- Architecture: Transformer decoder-only, incorporating several improvements:
  - Pre-normalization of each sub-layer input (placement as in GPT-3), using RMSNorm instead of LayerNorm
  - SwiGLU activation function in the feed-forward layers (as used in PaLM)
  - Rotary positional embeddings (RoPE, introduced in RoFormer and popularized by GPT-Neo), replacing absolute positional embeddings
- Sizes: 7B, 13B, 33B, 65B parameters
- Training data: 1.0T tokens (7B and 13B) or 1.4T tokens (33B and 65B), drawn entirely from publicly available datasets (CommonCrawl, C4, GitHub, Wikipedia, ArXiv, Books, StackExchange)
- Key finding: for a fixed inference budget, a smaller model trained on more tokens is preferable to a larger model trained on fewer. LLaMA deliberately trained well past the Chinchilla compute-optimal point, since performance kept improving even beyond 1T tokens and inference cost, not training cost, dominates at deployment.
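The three architectural components above are small, self-contained operations. The following is a minimal NumPy sketch of each, written for clarity rather than performance; the function names and shapes are illustrative, not taken from any LLaMA codebase:

```python
import numpy as np

def rms_norm(x, g, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the activations.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * g  # g is a learned per-dimension gain

def swiglu(x, W, V):
    # SwiGLU feed-forward gate: SiLU(x @ W) elementwise-times (x @ V).
    a = x @ W
    silu = a / (1.0 + np.exp(-a))  # SiLU(a) = a * sigmoid(a)
    return silu * (x @ V)

def rope(x, pos, base=10000.0):
    # Rotary positional embedding: rotate consecutive dimension pairs
    # (2i, 2i+1) by position-dependent angles theta_i = pos * base^(-2i/d).
    d = x.shape[-1]
    i = np.arange(d // 2)
    theta = pos * base ** (-2.0 * i / d)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * np.cos(theta) - x2 * np.sin(theta)
    out[..., 1::2] = x1 * np.sin(theta) + x2 * np.cos(theta)
    return out
```

Because RoPE is a pure rotation, it preserves vector norms; in attention it is applied to queries and keys so that their dot product depends only on relative position.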
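The scale of the over-training is easy to quantify. Chinchilla's compute-optimal rule of thumb is roughly 20 training tokens per parameter; a quick back-of-the-envelope check shows how far past that LLaMA-7B went:

```python
# Tokens-per-parameter ratio for LLaMA-7B versus the Chinchilla
# compute-optimal rule of thumb (~20 tokens per parameter).
params = 7e9        # LLaMA-7B parameter count
tokens = 1.0e12     # 1.0T training tokens
ratio = tokens / params
print(f"{ratio:.0f} tokens per parameter")  # ~143, vs ~20 for compute-optimal
```

Training roughly 7x past the compute-optimal point costs more compute up front, but yields a model that is far cheaper to serve than a compute-optimal model of equal quality.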