Ugh, did they really have to continue on with that ridiculous title meme? For those that don't know, it's inspired by this paper: https://arxiv.org/abs/1912.02178
This may all seem like good fun, but it makes a real difference when you have to introduce this paper to students and other academics getting into the discipline. It's just embarrassing, and always garners a reaction somewhere between a facepalm and disgust. This is especially true now that Rowling is a controversial figure.
As for the paper itself, it provides a good source to cite, but the conclusions it draws seem to be fairly common knowledge in the folklore. I think we're finally starting to see a healthy and meaningful shift in tone from the optimization community, which has been obsessed with early convergence rates for years. It's good to have options in optimizers, but the decision of which optimizer to use is rarely so straightforward and usually comes from prior experience. Most will stick with AdamW.
Haven’t read the entire paper, but the abstract seems to be focused on speed only. That is one component, but if alternative optimizers get similar results while using 2x less memory, that’s huge.
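To make the memory point concrete, here's a rough back-of-envelope sketch (mine, not from the paper) of how much state AdamW carries around, using stock PyTorch:

    import torch

    # AdamW keeps two extra fp32 buffers (exp_avg, exp_avg_sq) per parameter,
    # so optimizer state alone is roughly 2x the model's parameter memory.
    model = torch.nn.Linear(4096, 4096)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # One step so the state buffers actually get allocated.
    model(torch.randn(8, 4096)).sum().backward()
    opt.step()

    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    state_bytes = sum(
        t.numel() * t.element_size()
        for s in opt.state.values()
        for t in s.values()
        if torch.is_tensor(t)
    )
    print(f"params: {param_bytes / 1e6:.1f} MB, AdamW state: {state_bytes / 1e6:.1f} MB")

A momentum-only optimizer (SGD with momentum, Lion, etc.) carries roughly half that state, which is where the 2x framing comes from.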
What’s it referencing? I don’t get the reference lol
https://en.m.wikipedia.org/wiki/Fantastic_Beasts_and_Where_t...