- Zero Initialisation
All weights start at zero, so every neuron in a layer computes the same output and receives the same gradient; the symmetry is never broken and the layer cannot learn distinct features.
- Random Initialisation
Weights are drawn from a small random distribution (e.g. Gaussian) to break symmetry, but the scale is chosen arbitrarily and can cause vanishing or exploding activations in deeper networks.
- Xavier Initialisation (Glorot Initialisation)
Weights are drawn from a zero-mean distribution whose variance, 2 / (fan_in + fan_out), is chosen to keep the scale of activations and gradients roughly constant across layers (minimal sketches of this and the other schemes are given at the end of this section).
➕ good for tanh and sigmoid activations
➖ not well suited to ReLU
- He Initialisation (Kaiming Initialisation)
Similar to Xavier initialisation but with a higher variance, 2 / fan_in, to compensate for ReLU activations zeroing out roughly half of their inputs.
➕ well suited to ReLU activations
➖ can lead to exploding gradients in very deep networks
- LeCun Initialisation
Weights are drawn from a zero-mean distribution with variance 1 / fan_in, designed for activation functions like sigmoid and tanh.
➕ suitable for sigmoid/tanh activations
➖ not good for ReLU
- Orthogonal Initialisation
Weights are initialised as orthogonal matrices, typically obtained from the QR decomposition of a random matrix.
➕ preserves the variance of the activations
➖ computationally expensive for large weight matrices
- Variational Initialisation
Weights are sampled from a probability distribution, often Gaussian, and the parameters of this distribution are learned during training.
➕ can adapt to the data during training.
➖ more complex and requires tuning.
A few other techniques include uniform initialisation, scaled initialisation, and Layer-sequential Unit-variance (LSUV) initialisation.
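Below are minimal, self-contained NumPy sketches of the schemes above; the layer sizes `fan_in` and `fan_out` are hypothetical placeholders, and in practice a framework's built-in initialisers would normally be used. The first sketch contrasts zero and plain random initialisation:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer sizes

# Zero initialisation: every neuron computes the same output and receives the
# same gradient, so the symmetry between units is never broken.
W_zero = np.zeros((fan_in, fan_out))

# Plain random initialisation: small Gaussian noise breaks the symmetry, but the
# scale (0.01 here) is arbitrary rather than derived from the layer sizes.
W_random = rng.normal(loc=0.0, scale=0.01, size=(fan_in, fan_out))
```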
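A minimal sketch of Xavier/Glorot initialisation, showing both the normal and the uniform variant of the 2 / (fan_in + fan_out) variance rule:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer sizes

# Normal variant: zero mean, variance 2 / (fan_in + fan_out).
std = np.sqrt(2.0 / (fan_in + fan_out))
W_xavier_normal = rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

# Uniform variant: the limit sqrt(6 / (fan_in + fan_out)) yields the same variance.
limit = np.sqrt(6.0 / (fan_in + fan_out))
W_xavier_uniform = rng.uniform(low=-limit, high=limit, size=(fan_in, fan_out))
```

A quick sanity check is that `W_xavier_normal.var()` comes out close to 2 / (fan_in + fan_out).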
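A minimal sketch of He/Kaiming initialisation under the same assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer sizes

# He/Kaiming normal variant: variance 2 / fan_in compensates for ReLU zeroing
# out roughly half of the pre-activations.
std = np.sqrt(2.0 / fan_in)
W_he = rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))
```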
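A minimal sketch of LeCun initialisation under the same assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer sizes

# LeCun normal variant: variance 1 / fan_in keeps each unit's input scale
# roughly constant for saturating activations such as sigmoid and tanh.
std = np.sqrt(1.0 / fan_in)
W_lecun = rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))
```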
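A minimal sketch of orthogonal initialisation, implemented here via a QR decomposition of a random Gaussian matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer sizes (assumes fan_in >= fan_out)

# QR-decompose a random Gaussian matrix and keep Q, whose columns are orthonormal.
A = rng.normal(size=(fan_in, fan_out))
Q, R = np.linalg.qr(A)                    # Q: (fan_in, fan_out), R: (fan_out, fan_out)
W_orthogonal = Q * np.sign(np.diag(R))    # sign fix for a more uniform distribution
```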
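Finally, a minimal sketch of the variational idea described above, using the reparameterisation trick; in a real setup `mu` and `rho` would be framework parameters updated by the optimiser, whereas here they are plain NumPy arrays chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer sizes

# Each weight gets a trainable mean (mu) and a trainable parameter (rho) that is
# mapped through a softplus to a positive standard deviation (sigma).
mu = rng.normal(loc=0.0, scale=0.1, size=(fan_in, fan_out))
rho = np.full((fan_in, fan_out), -3.0)
sigma = np.log1p(np.exp(rho))             # softplus keeps sigma > 0

# A concrete weight matrix is sampled with the reparameterisation trick; during
# training, gradients flow to mu and rho, so the distribution adapts to the data.
eps = rng.normal(size=(fan_in, fan_out))
W_sampled = mu + sigma * eps
```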