Train Smarter, Not Harder? - MiniSymposium 7b

kindacognizant · 2 years ago

Given that there's already an implementation of the cosine scheduler with warmup steps, is there an implementation of a scheduler that starts slow, then rapidly accelerates, and finally stabilizes to learn the subtle features (something like a sigmoidal function)? The idea would be to avoid starting too high in the first place.

https://preview.redd.it/qb1z0n7oci2c1.png?width=1200&format=png&auto=webp&s=15dbab7b3a18ab918defbbbe2ab6816aaa46b489
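
For what it's worth, here is a rough sketch of what a sigmoid-shaped schedule like that could look like. It is only one reading of the question (the learning rate ramps from near zero up to a plateau at the base rate), not an existing library scheduler. The function name `sigmoid_lr_factor` and the parameters `steepness`, `midpoint`, and `min_factor` are made up for illustration.

```python
import math


def sigmoid_lr_factor(step, total_steps, steepness=10.0, midpoint=0.5, min_factor=0.0):
    """Return a multiplier in [min_factor, 1] for the base learning rate.

    Hypothetical helper: the multiplier follows a logistic curve over
    training progress, so it starts low, rises fastest around `midpoint`,
    and then flattens out near 1.
    """
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / max(1, total_steps), 0.0), 1.0)
    # Logistic curve centred on `midpoint`; larger `steepness` = sharper rise.
    s = 1.0 / (1.0 + math.exp(-steepness * (progress - midpoint)))
    # Rescale so the multiplier never drops below `min_factor`.
    return min_factor + (1.0 - min_factor) * s


if __name__ == "__main__":
    total = 1000
    for step in (0, 250, 500, 750, 1000):
        print(step, round(sigmoid_lr_factor(step, total), 4))
```

If you are on PyTorch, a multiplier like this can be plugged into `torch.optim.lr_scheduler.LambdaLR` via its `lr_lambda` argument, e.g. `LambdaLR(optimizer, lr_lambda=lambda s: sigmoid_lr_factor(s, total_steps))`, which scales the optimizer's initial learning rate by the returned value at each step. Whether this actually trains better than cosine with warmup is something you'd have to test.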