patricky168 (OP) to Machine Learning@academy.garden • [D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage?
1 year ago
Yeah, what I mean is that even though LoRA only updates the adapters on the attention weights, we still need to calculate gradients for downstream layers that aren't being updated, and that takes GPU memory. So the only memory saved is from the optimizer states, if I am not mistaken.
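To put rough numbers on that accounting, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement: a hypothetical 7B-parameter model with 32 layers and hidden size 4096, rank-8 LoRA on two attention projections per layer, 2-byte weights and gradients, and an Adam-style optimizer keeping two fp32 states per trainable parameter. Activation memory is not modeled at all. The `grads_for_all_params` flag switches between the accounting described above (gradient memory for every layer) and the alternative where gradient buffers exist only for trainable parameters; which one applies depends on the framework.

```python
def training_state_bytes(total_params, trainable_params, grads_for_all_params,
                         weight_bytes=2, grad_bytes=2, adam_bytes_per_param=8):
    """Rough memory for weights, gradients and optimizer states (no activations)."""
    weights = total_params * weight_bytes
    # Either every parameter gets a gradient buffer, or only the trainable ones.
    grad_count = total_params if grads_for_all_params else trainable_params
    grads = grad_count * grad_bytes
    # Adam-style optimizer: two fp32 states (8 bytes) per trainable parameter.
    optimizer = trainable_params * adam_bytes_per_param
    return {"weights": weights, "grads": grads, "optimizer": optimizer,
            "total": weights + grads + optimizer}

# Hypothetical sizes, chosen only for illustration.
TOTAL = 7_000_000_000
# 2 adapted matrices per layer, 32 layers, rank 8, each adapter has r * (d_in + d_out) params.
LORA = 2 * 32 * 8 * (4096 + 4096)
# The small adapter weights are ignored in the "weights" term for simplicity.

full = training_state_bytes(TOTAL, TOTAL, grads_for_all_params=True)
lora_all_grads = training_state_bytes(TOTAL, LORA, grads_for_all_params=True)
lora_trainable_grads = training_state_bytes(TOTAL, LORA, grads_for_all_params=False)

GIB = 1024 ** 3
for name, est in [("full fine-tuning", full),
                  ("LoRA, grads for all params", lora_all_grads),
                  ("LoRA, grads for adapters only", lora_trainable_grads)]:
    print(name + ": " + ", ".join(f"{k}={v / GIB:.1f} GiB" for k, v in est.items()))
```

Under the first LoRA accounting, the weights and gradient terms match full fine-tuning and only the optimizer term shrinks; under the second, the gradient term shrinks as well.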
Thanks - I was wondering though, for QLoRA what does the LoRA bit really do?
I feel like there has been some success(?) in just quantizing the model and doing full fine-tuning, which still reduces memory consumption. So does the LoRA part mainly help to "recover" the lost precision? Or does the LoRA part in QLoRA still significantly reduce memory vs., say, just 4-bit quantization + full fine-tuning?