
Re-Examining LayerNorm
This post is part of the work done at Conjecture. Special thanks to Sid Black, Dan Braun, Carlos Ramón Guevara, Beren Millidge, Chris Scammell, Lee Sharkey, and Lucas Teixeira for feedback on early drafts.

There are a lot of non-linearities floating around in neural networks these days, but one that often