Neural Networks - Cost/Loss Functions
The cost C tells how well the neural network is performing. The aim is to minimize C(W, B), the cost function's output for the given weights and biases. In other words, we want to find a set of weights and biases that makes the cost as small as possible.
The terms loss function and cost function can be confusing, and people frequently use them as synonyms. When a distinction is drawn, it is usually this:
- The loss function is computed for a single training example (sample + prediction + label).
- The cost function aggregates the loss (typically as an average) over the entire training set or the batch used in a gradient descent step, and frequently includes a regularization term.
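The distinction can be made concrete with a minimal Python sketch. It assumes a squared-error loss and an L2 penalty purely for illustration; the names `loss`, `cost`, and `l2_lambda` are hypothetical and not taken from any library.

```python
def loss(prediction, label):
    """Loss for a single training example (squared error, chosen for illustration)."""
    return 0.5 * (prediction - label) ** 2


def cost(predictions, labels, weights, l2_lambda=0.0):
    """Cost for a whole batch: mean per-example loss plus an optional L2 penalty."""
    n = len(predictions)
    mean_loss = sum(loss(p, y) for p, y in zip(predictions, labels)) / n
    # Assumed regularization term: 0.5 * lambda * sum of squared weights.
    l2_penalty = 0.5 * l2_lambda * sum(w * w for w in weights)
    return mean_loss + l2_penalty


# Made-up example values.
predictions = [0.9, 0.2, 0.7]
labels = [1.0, 0.0, 1.0]
weights = [0.5, -1.0]

per_example = [loss(p, y) for p, y in zip(predictions, labels)]
batch_cost = cost(predictions, labels, weights, l2_lambda=0.1)
print(per_example, batch_cost)
```

Here `loss` returns one number per example, while `cost` returns a single number for the batch. With perfect predictions and no regularization, `cost` returns zero.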
Common features of cost functions:
- The result should be non-negative (positive or zero).
- The result should be close to zero when the weights and biases are performing well, and exactly zero if the predictions are perfect on the given input set.
Common cost functions:
- Quadratic cost function (also known as mean squared error).
- Cross-entropy cost function: avoids the learning slowdown caused by saturated activations. Almost always a better choice than the quadratic cost for sigmoid neurons.
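The learning slowdown can be illustrated with a minimal sketch for a single sigmoid neuron with one input. The starting values (w = 2, b = 2, x = 1, y = 0) are assumptions chosen so that the neuron starts out saturated and badly wrong; the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def quadratic_grad_w(w, b, x, y):
    """dC/dw for C = (a - y)^2 / 2 with a = sigmoid(w*x + b)."""
    a = sigmoid(w * x + b)
    return (a - y) * a * (1.0 - a) * x  # contains sigmoid'(z) = a * (1 - a)


def cross_entropy_grad_w(w, b, x, y):
    """dC/dw for C = -[y ln a + (1 - y) ln(1 - a)] with a = sigmoid(w*x + b)."""
    a = sigmoid(w * x + b)
    return (a - y) * x  # the sigmoid'(z) factor cancels out


# Assumed starting point: output is close to 1 while the target is 0.
w, b, x, y = 2.0, 2.0, 1.0, 0.0
grad_quadratic = quadratic_grad_w(w, b, x, y)
grad_cross_entropy = cross_entropy_grad_w(w, b, x, y)
print(f"quadratic:     {grad_quadratic:.4f}")
print(f"cross-entropy: {grad_cross_entropy:.4f}")
```

With the quadratic cost the gradient is scaled by a(1 - a), which approaches zero as the output saturates near 0 or 1, so a confidently wrong neuron learns slowly. With cross-entropy that factor cancels and the gradient is proportional to the error a - y.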
A good learning rate depends heavily on which cost function is in use, so you should retune the learning rate whenever you change the cost function.
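As a rough illustration of why the learning rate needs retuning, the sketch below trains a single sigmoid neuron with plain gradient descent under both costs, using one shared learning rate. The starting point, learning rate, and stopping threshold are assumed values, and `steps_to_fit` is a hypothetical helper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def steps_to_fit(cost_name, learning_rate, max_steps=100_000):
    """Count gradient descent steps until the output for x=1 is within 0.1 of y=0."""
    w, b, x, y = 2.0, 2.0, 1.0, 0.0  # assumed saturated starting point
    for step in range(max_steps):
        a = sigmoid(w * x + b)
        if abs(a - y) < 0.1:
            return step
        # dC/dz: the quadratic cost keeps the sigmoid'(z) factor, cross-entropy does not.
        delta = (a - y) * a * (1.0 - a) if cost_name == "quadratic" else (a - y)
        w -= learning_rate * delta * x
        b -= learning_rate * delta
    return max_steps


shared_rate = 0.15  # assumed value, identical for both runs
steps_quadratic = steps_to_fit("quadratic", shared_rate)
steps_cross_entropy = steps_to_fit("cross_entropy", shared_rate)
print(steps_quadratic, steps_cross_entropy)
```

The same rate produces very different progress per step under the two costs, so a rate that suits one cost function may be too slow or too aggressive for the other.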