Introducing GoLU: a new activation function for neural networks


Gompertz Linear Unit (GoLU): a new era in activation functions

Activation functions are the heartbeat of neural networks: they decide how neurons “fire” and what patterns models learn. From ReLU and GELU to Swish and Mish, each innovation has refined how information flows through deep architectures. This year, Indrashis Das, Mahmoud Safari, Steven Adriaensen, and Frank Hutter of the University of Freiburg and Prior Labs introduced GoLU.

Definition
GoLU(x) = x · e^(-e^(-x))
A smooth, asymmetric activation that stabilizes learning while maintaining gradient flow.

Why it matters
• Right-leaning asymmetry reduces activation variance and smooths training
• Flatter loss landscapes and more stable optimization
• Broader weight distributions that capture richer features
• Strong results across vision, language, and diffusion benchmarks, often outperforming ReLU, GELU, and Swish

Code: https://lnkd.in/dzkDgyNx
Paper: https://lnkd.in/dsmX_8pE

#DeepLearning #NeuralNetworks #ActivationFunctions #GoLU #MachineLearning #AIResearch
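To make the definition concrete, here is a minimal pure-Python sketch of GoLU(x) = x · e^(-e^(-x)) and its derivative. It is not the authors' released implementation: the function names gompertz_gate, golu, and golu_grad are hypothetical, and the overflow guard for very negative inputs is an assumption added for numerical safety. The derivative follows from the product rule, since the gate g(x) = e^(-e^(-x)) satisfies g'(x) = g(x) · e^(-x).

```python
import math


def gompertz_gate(x: float) -> float:
    """Gompertz gate g(x) = e^(-e^(-x)), with values in [0, 1)."""
    try:
        return math.exp(-math.exp(-x))
    except OverflowError:
        # For very negative x, e^(-x) overflows a float; the gate is effectively 0.
        return 0.0


def golu(x: float) -> float:
    """GoLU(x) = x * g(x), following the definition in the post."""
    return x * gompertz_gate(x)


def golu_grad(x: float) -> float:
    """Derivative of GoLU: g(x) * (1 + x * e^(-x)), via the product rule."""
    try:
        decay = math.exp(-x)
    except OverflowError:
        # The gate vanishes far faster than x * e^(-x) grows, so the gradient is 0.
        return 0.0
    return gompertz_gate(x) * (1.0 + x * decay)


if __name__ == "__main__":
    # Print the activation and its slope at a few sample points.
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={value:+.1f}  GoLU={golu(value):+.6f}  grad={golu_grad(value):+.6f}")
```

For large positive x the gate approaches 1, so GoLU behaves like the identity; for large negative x the gate collapses to 0, which is the asymmetry the post describes.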


I was just reading about it the other day. It’s fascinating how the Gompertz growth model translates to neural optimization: the e^(-e^(-x)) term creates a soft saturating gate that’s gentler than sigmoid but more expressive than ReLU. I’m curious whether the stability gains hold in very deep architectures and whether the broader weight distributions benefit transfer learning scenarios.
