- 1 Post
- 6 Comments
lightSpeedBrickBtoMachine Learning@academy.garden•[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage?English
1·2 years ago My understanding is that with LoRA you reduce the number of trainable parameters and therefore the memory needed to track optimizer states (e.g., Adam tracks 2 state values for each trainable parameter). This means that you need far less memory to fine-tune the model. Imagine 70B parameters * 4 bytes for fp32 training plus 70B * 8 bytes for Adam. LoRA reduces that second part to, say, 1% of 70B * 8 bytes.
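To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function name and its arguments are made up for illustration, and it only counts fp32 weights plus Adam state; gradients, activations and the adapter weights themselves are ignored.

```python
GB = 1e9  # decimal gigabytes


def finetune_memory_gb(n_params, n_trainable, weight_bytes=4, adam_bytes=8):
    """Rough memory for weights plus Adam state, in GB.

    Hypothetical helper: ignores gradients, activations and adapter weights.
    """
    weights = n_params * weight_bytes / GB
    adam_state = n_trainable * adam_bytes / GB
    return weights, adam_state


n_params = 70_000_000_000

# Full fine-tuning: every parameter is trainable.
full_weights, full_adam = finetune_memory_gb(n_params, n_trainable=n_params)

# LoRA-style setup, assuming about 1% as many trainable parameters.
lora_weights, lora_adam = finetune_memory_gb(n_params, n_trainable=n_params // 100)

print(f"full fine-tune: {full_weights:.0f} GB weights + {full_adam:.0f} GB Adam state")
print(f"LoRA at 1%:     {lora_weights:.0f} GB weights + {lora_adam:.1f} GB Adam state")
```

Under these assumptions the Adam state drops from about 560 GB to about 5.6 GB, while the 280 GB of fp32 weights stays the same.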
You can also use gradient checkpointing, which isn’t specific to LoRA, to reduce memory consumption at the expense of training time. Here you keep only some of the intermediate activations from the forward pass and recompute the rest during back-prop.
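A toy sketch of the checkpointing idea in plain Python, assuming a chain of scalar layers `tanh(w * x)`. All names here are hypothetical and this is not how any particular framework implements it. The forward pass stores only the input to every `every`-th layer, and the backward pass recomputes each segment from its checkpoint before applying the chain rule.

```python
import math


def forward_full(x, weights):
    """Run the chain and keep every activation (what plain backprop stores)."""
    acts = [x]
    for w in weights:
        acts.append(math.tanh(w * acts[-1]))
    return acts


def backward_segment(acts, weights, grad_out):
    """Chain rule through the layers whose activations are in `acts`."""
    grad = grad_out
    for w, a_out in zip(reversed(weights), reversed(acts[1:])):
        grad = grad * (1.0 - a_out * a_out) * w  # d tanh(w * a) / da
    return grad


def grad_checkpointed(x, weights, every):
    """Gradient of the output w.r.t. x, storing one activation per segment."""
    checkpoints = {}
    a = x
    for i, w in enumerate(weights):
        if i % every == 0:
            checkpoints[i] = a  # keep only the input to each segment
        a = math.tanh(w * a)

    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        segment = weights[start:start + every]
        acts = forward_full(checkpoints[start], segment)  # recompute activations
        grad = backward_segment(acts, segment, grad)
    return grad, len(checkpoints)


weights = [0.5, -1.2, 0.8, 1.1, -0.7, 0.9]
x = 0.3

full_acts = forward_full(x, weights)
full_grad = backward_segment(full_acts, weights, 1.0)
ckpt_grad, n_stored = grad_checkpointed(x, weights, every=2)

print(f"stored activations: {len(full_acts)} (full) vs {n_stored} (checkpointed)")
print(f"gradients match: {math.isclose(full_grad, ckpt_grad)}")
```

The gradient is the same either way; the checkpointed version just pays for an extra forward pass per segment in exchange for holding fewer activations at once.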
Can you explain what you mean by “caching intermediate gradients during backprop”? I’m not familiar with what that is.
lightSpeedBrickBtoMachine Learning@academy.garden•[D] For those interested, Please, help build a new and small subreddit community centered on positive and enthusiastic AI discourse.English
1·2 years ago Oh, don’t get me wrong, the dominant sentiment on r/singularity is not for me and I am no fan of the reverence certain public figures get from members of that community. I was going for polite understatement with my comment, but perhaps failed 😅
lightSpeedBrickBtoMachine Learning@academy.garden•[D] For those interested, Please, help build a new and small subreddit community centered on positive and enthusiastic AI discourse.English
1·2 years ago What’s wrong with r/singularity? Folks over there are optimistic, perhaps a little too eager. In fact, most opinions that aren’t optimistic get downvoted pretty quickly.
lightSpeedBrickBtoLocalLLaMA@poweruser.forum•667 of OpenAI's 770 employees have threaten to quit. Microsoft says they all have jobs at Microsoft if they want them.English
1·2 years ago I threaten to quit too. I don’t work at OpenAI, but I’ll quit my job and happily accept Microsoft’s offer in solidarity.
lightSpeedBrickBtoMachine Learning@academy.garden•[R] Seeking input: transformer modification with 25-30% improvement in validation loss across 3 datasetsEnglish
1·3 years ago Largely unrelated, but this has a similar vibe. I wonder what happened to that high school kid who invented the transformer even before Vaswani et al., and to the other guy who, a year later, claimed to have invented a brand new neural network architecture that was supposed to break the internet.
Ah, I hadn’t thought of that. I’ll look into it. Thank you for the suggestion!