Neural Networks - Regularization
Regularization is a technique used to address overfitting. It artificially discourages complex explanations, even when they fit the observed data well. This allows a statistical model to generalize better in the presence of measurement noise and uninformative features: you keep all the features but reduce the magnitude of some parameters.
Regularization works well when you have many features. The more carefully selected and engineered your features are, the less you will probably gain from regularization. But it rarely hurts to try it out.
Overly heavy regularization will cause the loss to climb after a while, once training starts to suppress important weights.
Commonly used regularizations:
- L₁ aka. Lasso: Shrinks some weights to exactly zero, effectively removing irrelevant features.
- L₂ aka. Ridge: Shrinks weights toward small values, minimizing the impact of irrelevant features.
- L₁/L₂ aka. Elastic: Applies both the Lasso and Ridge penalties.
- Dropout: Freezes and ignores a random selection of the neurons in a network layer for a single training step, but brings them all back for evaluation.
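The dropout item above can be sketched in a few lines of NumPy. This is a minimal illustration (the function name `dropout`, the seeded generator, and the inverted-dropout rescaling are my assumptions, not from the original notes); real frameworks implement this inside their layer abstractions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)

def dropout(activations, rate, training):
    # During evaluation, dropout is a no-op: every neuron is kept.
    if not training:
        return activations
    # Inverted dropout: zero each neuron with probability `rate`,
    # then rescale the survivors so the expected activation is unchanged.
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

a = np.ones(1000)
train_out = dropout(a, rate=0.5, training=True)   # roughly half zeroed
eval_out = dropout(a, rate=0.5, training=False)   # identical to input
```

Because the kept activations are scaled by 1/(1 − rate) during training, no separate rescaling is needed at evaluation time.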
L₁ and L₂ are more commonly used with linear regression. Dropout is more commonly used with neural networks, typically at rates between 20% and 50%. Because L₁ drives some weights to zero, it also performs variable selection and helps create sparse models.
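To make the L₁/L₂ distinction concrete, here is a sketch of the penalty terms that get added to the ordinary training loss. The function names and the `lam` strength values are illustrative assumptions for this example.

```python
import numpy as np

def l1_penalty(w, lam):
    # Lasso penalty: lam * sum(|w_i|). Its gradient has constant
    # magnitude, so it can push weights to exactly zero.
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    # Ridge penalty: lam * sum(w_i^2). Its gradient shrinks in
    # proportion to the weight, so weights get small but rarely zero.
    return lam * np.sum(w ** 2)

def elastic_penalty(w, lam1, lam2):
    # Elastic net: the two penalties combined.
    return l1_penalty(w, lam1) + l2_penalty(w, lam2)

# The penalty is simply added to the data loss before taking gradients,
# e.g. total_loss = mse + l2_penalty(w, lam).
w = np.array([0.0, -2.0, 3.0])
print(l1_penalty(w, 0.1))  # 0.1 * (0 + 2 + 3) -> 0.5
```

Note that the weight already at zero contributes nothing to either penalty, which is why L₁ solutions tend to stay sparse once weights hit zero.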