What Happened
In May 2023, the QLoRA paper proposed a fine-tuning approach that substantially reduces memory requirements by training small low-rank adapters on top of a frozen, 4-bit quantized base model.
Why It Matters
QLoRA lowered the barrier for community fine-tuning and experimentation on large open models, enabling more development on consumer and mid-range GPUs.
Technical Details
QLoRA combines frozen, quantized base weights (stored in the 4-bit NF4 format, with double quantization of the quantization constants) with trainable low-rank adapters in higher precision. The paper pairs this with paged optimizers to absorb memory spikes during training, so that only the small adapter matrices receive gradients and optimizer state.
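The core arithmetic can be sketched in a few lines. This is a deliberately simplified illustration, not the paper's exact NF4 scheme: the base weight is crudely absmax-quantized to 4-bit signed integers and frozen, while a low-rank adapter pair (A, B) holds the only trainable parameters. All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 8

# Frozen base weight, crudely quantized to a 4-bit signed range (-8..7)
# via absmax scaling. Real QLoRA uses the NF4 data type instead.
W = rng.standard_normal((d_in, d_out)).astype(np.float32)
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

def dequantize(W_q, scale):
    # Dequantize on the fly for the forward pass; the int8 storage
    # (holding 4-bit values) is what saves memory.
    return W_q.astype(np.float32) * scale

# Trainable low-rank adapter. B starts at zero so the adapted model
# initially reproduces the quantized base model exactly.
A = rng.standard_normal((d_in, rank)).astype(np.float32) * 0.01
B = np.zeros((rank, d_out), dtype=np.float32)
alpha = 16.0
scaling = alpha / rank  # standard LoRA scaling convention

def forward(x):
    # Frozen quantized base path plus full-precision adapter path.
    return x @ dequantize(W_q, scale) + (x @ A @ B) * scaling

x = rng.standard_normal((4, d_in)).astype(np.float32)
y = forward(x)
print(y.shape)  # → (4, 64)
```

Because B is initialized to zero, the adapter contributes nothing at step 0 and training perturbs the base model smoothly; only A and B (rank × (d_in + d_out) values) need gradients and optimizer state, rather than the full d_in × d_out weight matrix.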