We propose Tied-LoRA, a simple paradigm utilizes weight tying and selective training to further increase parameter efficiency of the Low-rank adaptation (LoRA) method. Our investigations include all feasible combinations parameter training/freezing in conjunction with weight tying to identify the optimal balance between performance and the number of trainable parameters. Through experiments covering a variety of tasks and two base language models, we provide analysis revealing trade-offs between efficiency and performance. Our experiments uncovered a particular Tied-LoRA configuration that stands out by demonstrating comparable performance across several tasks while employing only 13~% percent of parameters utilized by the standard LoRA method.
hmm, one of the really interesting details here - normal lora in rank 8 tested better than in rank 128 - genuine question - how is it possible? medicore data used for lora? I have done few finetunes recently and see a similar situation between rank 128 and 256
There are tests in the original lora paper where the boost is very small once the rank is greater than 8.
https://preview.redd.it/ii53qcx8031c1.png?width=1080&format=png&auto=webp&s=821bac1232255bf791120afde7d9e9f3506a89f5