Based on experiments, GELU was found to have a significantly smoother gradient transition. It is not abrupt or sharp like ReLU, whose gradient jumps from 0 to 1 at x = 0. If you look at both functions, you will see the difference.
Moreover, look at the GPT-2 code: it uses GELU, and many other models I have encountered also use GELU, so I went with it.
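To make the "smoother gradient" point concrete, here is a minimal sketch in plain Python. It assumes the exact (erf-based) definition GELU(x) = x·Φ(x), where Φ is the standard normal CDF. The function names `relu_grad`, `gelu` and `gelu_grad` are hypothetical helpers for illustration, not taken from any particular codebase.

```python
import math

def relu_grad(x: float) -> float:
    # ReLU derivative: 0 for negative inputs, 1 for positive inputs.
    # It is undefined at exactly 0; we return 0 there by convention.
    return 1.0 if x > 0 else 0.0

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_grad(x: float) -> float:
    # d/dx [x * Phi(x)] = Phi(x) + x * phi(x), with phi the standard normal PDF.
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

# Compare the two gradients just left and right of zero.
for x in [-2.0, -0.5, -0.01, 0.0, 0.01, 0.5, 2.0]:
    print(f"x={x:+.2f}  relu'={relu_grad(x):.3f}  gelu'={gelu_grad(x):.3f}")
```

Across the tiny step from x = -0.01 to x = +0.01 the ReLU gradient jumps by a full 1.0, while the GELU gradient only moves by roughly 0.016.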
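For reference, GPT-2's original code computes GELU with a tanh-based approximation rather than the exact erf form. The sketch below compares the two on [-5, 5]. It is an illustrative check under that assumption; the names `gelu_exact` and `gelu_tanh` are hypothetical.

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU using the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation of GELU (the form used in GPT-2-style code).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# Largest absolute difference on a grid of step 0.01 over [-5, 5].
max_err = max(abs(gelu_exact(i / 100) - gelu_tanh(i / 100)) for i in range(-500, 501))
print(f"max |exact - tanh| on [-5, 5]: {max_err:.2e}")
```

The two curves are close enough that the choice between them mostly comes down to matching a reference implementation.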
Going through these papers,
I feel the order should be this: