[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage?

patricky168 · 1 year ago

Thanks. I was wondering, though: for QLoRA, what does the LoRA part actually do?

I feel like there has been some success(?) with just quantizing the model and doing full fine-tuning, and that still reduces memory consumption. So does the LoRA mainly help “recover” the lost precision? Or does the LoRA part of QLoRA still significantly reduce memory compared with, say, plain 4-bit quantization + full fine-tuning?
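To make the memory side of the question concrete, here is a rough back-of-envelope sketch. It counts only weights, gradients and Adam optimizer states, and every number in it is an assumption chosen for illustration: a 7B-parameter model with a Llama-7B-like shape (d_model 4096, 32 layers), 4-bit base weights (0.5 bytes/param), 2-byte gradients, 8 bytes/param for Adam's two fp32 moments, and rank-16 LoRA adapters on 4 square projection matrices per layer. It ignores activations, quantization constants, and whether updating 4-bit weights directly is practical at all.

```python
# Back-of-envelope training memory: weights + gradients + Adam states only.
# ALL sizes and shapes here are illustrative assumptions, not measurements.

GB = 1024 ** 3


def full_finetune_bytes(n_params, weight_bytes=0.5, grad_bytes=2, adam_bytes=8):
    """Every parameter is trainable, so each needs a gradient and two Adam moments."""
    return n_params * (weight_bytes + grad_bytes + adam_bytes)


def lora_bytes(n_params, n_lora_params, weight_bytes=0.5,
               lora_weight_bytes=2, grad_bytes=2, adam_bytes=8):
    """Base weights stay frozen; only the adapter matrices get gradients and Adam moments."""
    frozen = n_params * weight_bytes
    trainable = n_lora_params * (lora_weight_bytes + grad_bytes + adam_bytes)
    return frozen + trainable


def lora_param_count(d_model, rank, n_layers, matrices_per_layer):
    """Each adapted d x d matrix gets A (d x r) and B (r x d): 2 * d * r extra params."""
    return n_layers * matrices_per_layer * 2 * d_model * rank


n_params = 7_000_000_000  # assumed model size
n_lora = lora_param_count(d_model=4096, rank=16, n_layers=32, matrices_per_layer=4)

full = full_finetune_bytes(n_params)
lora = lora_bytes(n_params, n_lora)

print(f"LoRA trainable params:     {n_lora:,}")
print(f"4-bit + full fine-tuning:  {full / GB:.1f} GB")
print(f"4-bit + LoRA:              {lora / GB:.1f} GB")
```

Under these assumptions the script reports roughly 68 GB versus roughly 3.4 GB. The gap comes almost entirely from the gradients and optimizer states, which scale with the number of *trainable* parameters rather than with the precision of the frozen base weights.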