[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage?

patricky168 · 1 year ago

Thanks - I was wondering though, for QLoRA what does the LoRA bit really do?

I feel like there has been some success(?) with just quantizing the model and doing full fine-tuning, which still reduces memory consumption. So does the LoRA part mainly help to “recover” the lost precision? Or does the LoRA part in QLoRA still significantly reduce memory vs., say, just 4-bit quantization + full fine-tuning?

patricky168 · 1 year ago

Yeah, what I mean is that even though LoRA only updates the adapters on the attention weights, we still need to calculate gradients through the downstream layers that aren’t being updated, and that takes GPU memory. So the only memory saved is from the optimizer states, if I am not mistaken.
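To put rough numbers on that, here is a back-of-the-envelope sketch of the memory that scales with parameter count. Everything in it is an assumption for illustration: a 7B-parameter model, 20M adapter parameters, 2-byte weights and gradients, and an Adam-style optimizer keeping two 4-byte states per trainable parameter. Activation memory is deliberately left out, since that is the part that still has to flow through the frozen layers. Under these assumptions the stored weight gradients for frozen parameters also drop out, not only the optimizer states.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class MemoryEstimate:
    """Parameter-proportional memory only; activations are NOT modeled."""
    weights_gib: float
    grads_gib: float
    optimizer_gib: float

    @property
    def total_gib(self) -> float:
        return self.weights_gib + self.grads_gib + self.optimizer_gib


def estimate(total_params: int,
             trainable_params: int,
             weight_bytes: int = 2,            # assumed 16-bit weights
             grad_bytes: int = 2,              # assumed 16-bit gradients
             optim_bytes_per_param: int = 8):  # assumed Adam: two 4-byte states
    """Hypothetical helper: rough memory split for one training setup."""
    return MemoryEstimate(
        weights_gib=total_params * weight_bytes / GIB,
        # Gradients and optimizer states are kept only for trainable parameters.
        grads_gib=trainable_params * grad_bytes / GIB,
        optimizer_gib=trainable_params * optim_bytes_per_param / GIB,
    )


BASE_PARAMS = 7_000_000_000   # assumed model size
ADAPTER_PARAMS = 20_000_000   # assumed LoRA adapter size

# Full fine-tuning: every parameter is trainable.
full = estimate(BASE_PARAMS, BASE_PARAMS)
# LoRA: base weights are frozen, only the added adapter parameters train.
lora = estimate(BASE_PARAMS + ADAPTER_PARAMS, ADAPTER_PARAMS)

for name, est in (("full fine-tuning", full), ("LoRA", lora)):
    print(f"{name:>16}: weights={est.weights_gib:.2f} GiB, "
          f"grads={est.grads_gib:.2f} GiB, "
          f"optimizer={est.optimizer_gib:.2f} GiB, "
          f"total={est.total_gib:.2f} GiB")
```

How much this matters in practice depends on how large the activation memory is relative to these terms, which varies with batch size, sequence length, and whether gradient checkpointing is used.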
