What Happened
Meta AI released LLaMA (Large Language Model Meta AI), a collection of foundation language models ranging from 7 billion to 65 billion parameters. The models were initially released to researchers under a non-commercial license. Within a week, the model weights were leaked online, making them broadly accessible to the open-source community.
Why It Matters
LLaMA demonstrated that smaller, well-trained models could match or exceed the performance of much larger models like GPT-3. The 13B LLaMA model outperformed GPT-3 (175B) on most benchmarks, while the 65B model was competitive with Chinchilla-70B and PaLM-540B. The leak of the weights, while unintended, ignited an explosion of open-source LLM development — Alpaca, Vicuna, and hundreds of fine-tuned variants appeared within weeks, democratizing access to powerful language models.
Technical Details
- Architecture: Transformer decoder-only, incorporating several improvements:
  - Pre-normalization of each sub-layer input (placement as in GPT-3), using RMSNorm instead of LayerNorm
  - SwiGLU activation function in the feed-forward layers (as used in PaLM)
  - Rotary positional embeddings (RoPE, introduced in RoFormer and popularized by GPT-Neo), replacing absolute positional embeddings
- Sizes: 7B, 13B, 33B, 65B parameters
- Training data: 1.0T tokens (7B and 13B) or 1.4T tokens (33B and 65B), drawn entirely from publicly available datasets (CommonCrawl, C4, GitHub, Wikipedia, ArXiv, Books, StackExchange)
- Key finding: for a fixed inference budget, a smaller model trained on more tokens is preferable to a larger model trained on fewer. LLaMA deliberately trained well past the Chinchilla compute-optimal point, since performance kept improving even beyond 1T tokens and inference cost, not training cost, dominates at deployment.
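The three architectural components above are small, self-contained operations. The following is a minimal NumPy sketch of each, written for clarity rather than performance; the function names and shapes are illustrative, not taken from any LLaMA codebase:

```python
import numpy as np

def rms_norm(x, g, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the activations.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * g  # g is a learned per-dimension gain

def swiglu(x, W, V):
    # SwiGLU feed-forward gate: SiLU(x @ W) elementwise-times (x @ V).
    a = x @ W
    silu = a / (1.0 + np.exp(-a))  # SiLU(a) = a * sigmoid(a)
    return silu * (x @ V)

def rope(x, pos, base=10000.0):
    # Rotary positional embedding: rotate consecutive dimension pairs
    # (2i, 2i+1) by position-dependent angles theta_i = pos * base^(-2i/d).
    d = x.shape[-1]
    i = np.arange(d // 2)
    theta = pos * base ** (-2.0 * i / d)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * np.cos(theta) - x2 * np.sin(theta)
    out[..., 1::2] = x1 * np.sin(theta) + x2 * np.cos(theta)
    return out
```

Because RoPE is a pure rotation, it preserves vector norms; in attention it is applied to queries and keys so that their dot product depends only on relative position.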
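The scale of the over-training is easy to quantify. Chinchilla's compute-optimal rule of thumb is roughly 20 training tokens per parameter; a quick back-of-the-envelope check shows how far past that LLaMA-7B went:

```python
# Tokens-per-parameter ratio for LLaMA-7B versus the Chinchilla
# compute-optimal rule of thumb (~20 tokens per parameter).
params = 7e9        # LLaMA-7B parameter count
tokens = 1.0e12     # 1.0T training tokens
ratio = tokens / params
print(f"{ratio:.0f} tokens per parameter")  # ~143, vs ~20 for compute-optimal
```

Training roughly 7x past the compute-optimal point costs more compute up front, but yields a model that is far cheaper to serve than a compute-optimal model of equal quality.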