Ugh, did they really have to continue on with that ridiculous title meme? For those that don't know, it's inspired by this paper: https://arxiv.org/abs/1912.02178
This may all seem like good fun, but it makes a real difference when you have to introduce this paper to students and other academics getting into the discipline. It's just embarrassing, and always garners a reaction somewhere between a facepalm and disgust. This is especially true now that Rowling is a controversial figure.
As for the paper itself, it provides a good source to cite, but the conclusions it draws seem to be fairly common knowledge in the folklore. I think we're finally starting to see a healthy and meaningful shift in tone from the optimization community, which has been obsessed with early convergence rates for years. It's good to have options in optimizers, but the decision of which optimizer to use is rarely so straightforward and usually comes from prior experience. Most will stick with AdamW.
Haven’t read the entire paper, but the abstract seems to be focused on speed only. That is one component, but if alternative optimizers get similar results while using 2x less memory, that’s huge.
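To make the memory point concrete, here's a rough back-of-envelope sketch (mine, not from the paper) of how much state AdamW carries around, using stock PyTorch:

    import torch

    # AdamW keeps two extra fp32 buffers (exp_avg, exp_avg_sq) per parameter,
    # so optimizer state alone is roughly 2x the model's parameter memory.
    model = torch.nn.Linear(4096, 4096)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # One step so the state buffers actually get allocated.
    model(torch.randn(8, 4096)).sum().backward()
    opt.step()

    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    state_bytes = sum(
        t.numel() * t.element_size()
        for s in opt.state.values()
        for t in s.values()
        if torch.is_tensor(t)
    )
    print(f"params: {param_bytes / 1e6:.1f} MB, AdamW state: {state_bytes / 1e6:.1f} MB")

A momentum-only optimizer (SGD with momentum, Lion, etc.) carries roughly half that state, which is where the 2x framing comes from.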
What’s it referencing? I don’t get the reference lol
https://en.m.wikipedia.org/wiki/Fantastic_Beasts_and_Where_t...