MADGRAD: A high performance deep learning optimizer

I've just open-sourced an implementation of the MADGRAD optimizer, which I developed together with Samy Jelassi. It outperforms Adam on every problem I've tried it on, and its generalization performance is comparable to SGD's, avoiding the overfitting problems of adaptive methods entirely! Check it out here:
