QLoRA

QLoRA shows how to fine-tune large models efficiently by combining 4-bit quantization with LoRA-style adapters.
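To make "efficiently" concrete, here is a back-of-envelope memory comparison for the base weights of a 7B-parameter model at 16-bit versus 4-bit precision. These are illustrative storage numbers only, not measurements from the QLoRA paper, and they ignore activations, optimizer state, and quantization overhead.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store n_params weights at the given precision."""
    return n_params * bits_per_param / 8 / 1e9

n = 7e9  # illustrative 7B-parameter model
fp16 = weight_memory_gb(n, 16)  # 16-bit baseline
nf4 = weight_memory_gb(n, 4)    # 4-bit quantized base weights

print(f"16-bit base weights: {fp16:.1f} GB")  # 14.0 GB
print(f"4-bit base weights:  {nf4:.1f} GB")   # 3.5 GB
```

The roughly 4x reduction in base-weight storage is what moves large-model fine-tuning into the range of a single consumer GPU.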

What Happened

In May 2023, the QLoRA paper (Dettmers et al.) introduced a fine-tuning method that sharply reduces memory requirements by training low-rank adapters on top of a frozen, 4-bit quantized base model.

Why It Matters

QLoRA lowered the barrier for community fine-tuning and experimentation on large open models, enabling more development on consumer and mid-range GPUs.

Technical Details

QLoRA keeps the base model's weights frozen and quantized to 4 bits (the paper introduces the NF4 data type for this) while training small low-rank LoRA adapters in higher precision. It pairs this with double quantization of the quantization constants and paged optimizers to absorb memory spikes during training.
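The core combination can be sketched in a few lines of NumPy: a frozen base weight stored as 4-bit integer codes plus a scale, dequantized on the fly, with a trainable low-rank update added on top. This is a minimal illustration, not the paper's method in full; the quantizer below is a toy uniform 4-bit scheme with a single per-tensor scale, not NF4 with block-wise constants.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 8  # toy dimensions for illustration

# Pretrained base weight (frozen during fine-tuning).
W = rng.normal(size=(d_out, d_in)).astype(np.float32)

# Toy 4-bit quantization: round to 16 signed levels with one scale.
scale = np.abs(W).max() / 7
q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)  # stored 4-bit codes
W_deq = q.astype(np.float32) * scale                     # dequantized for compute

# LoRA adapters: B starts at zero, so training begins exactly at the base model.
A = rng.normal(scale=0.01, size=(rank, d_in)).astype(np.float32)
B = np.zeros((d_out, rank), dtype=np.float32)

def forward(x):
    # Base path uses the dequantized frozen weights;
    # only A and B would receive gradients during fine-tuning.
    return x @ W_deq.T + (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in)).astype(np.float32)
y = forward(x)
print(y.shape)  # (4, 64)
```

In practice the base weights stay in their 4-bit form in GPU memory and are dequantized per layer during each forward pass, so only the small adapter matrices (and their optimizer state) occupy full-precision training memory.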