Train Smarter, Not Harder? - MiniSymposium 7b

kindacognizant · 1 year ago

Train Smarter, Not Harder? - MiniSymposium 7b

kindacognizant · 1 year ago

> Multiple passes at lower learning rates isn’t supposed to produce different results.

Oh, I was wrong on this, then, my bad.

So would my interpretation be correct that this is essentially causing the overfitting to still happen, just significantly slower, and that a higher LR would work? The problem is at first the average loss tanked in the span of like a single epoch to near zero which overfit, but this LR didn’t have the same effect.