🧠 Neural Networks - Cost/Loss Functions

Updated at 2019-02-01 01:35

A cost function C measures how well the neural network is performing. The aim is to minimize C(W, B), the cost for the given weights W and biases B, so training searches for a set of weights and biases that makes the cost as small as possible.

The terms loss function and cost function are easy to confuse, and people frequently use them as synonyms.

A loss function is evaluated on a single training example: sample + prediction + label.
A cost function is evaluated over the entire batch used in a gradient descent step, and it frequently includes a regularization term.
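
The distinction above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the names squared_loss, cost and l2_strength are hypothetical, and the squared-error loss and L2 penalty are just one common choice of loss and regularization.

```python
def squared_loss(prediction, label):
    """Loss for a single training example (squared error, chosen for illustration)."""
    return 0.5 * (prediction - label) ** 2


def cost(predictions, labels, weights, l2_strength=0.0):
    """Mean per-example loss over a batch, plus an optional L2 penalty on the weights."""
    data_term = sum(squared_loss(p, y) for p, y in zip(predictions, labels)) / len(labels)
    penalty = 0.5 * l2_strength * sum(w * w for w in weights)
    return data_term + penalty


# One example gives a loss; a whole batch (plus regularization) gives a cost.
single_loss = squared_loss(1.0, 0.0)
batch_cost = cost([1.0, 0.0], [0.0, 0.0], weights=[2.0], l2_strength=0.1)
print(single_loss, batch_cost)
```

With l2_strength=0.0 the cost reduces to the plain average of the per-example losses.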

Common features of cost functions:

The result should be non-negative (positive or zero).
The result should be close to zero when the weights and biases perform well, and exactly zero when the predictions are perfect on the given input set.
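
Both properties can be checked on a toy case. The sketch below assumes a single linear neuron y = w*x + b and a quadratic cost; the function name quadratic_cost and the data (generated from w = 2, b = 1) are made up for the illustration.

```python
def quadratic_cost(w, b, xs, ys):
    """Quadratic cost of a single linear neuron w*x + b over a data set."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Toy data produced by the "true" parameters w = 2, b = 1 (an assumption for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]

perfect = quadratic_cost(2.0, 1.0, xs, ys)   # parameters that fit the data exactly
close = quadratic_cost(2.1, 0.9, xs, ys)     # slightly off
far = quadratic_cost(-1.0, 5.0, xs, ys)      # badly off
print(perfect, close, far)
```

The cost is zero for the exact parameters, small for the nearby ones, and larger the worse the fit becomes. It never goes negative because it is a sum of squares.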

Common cost functions:

Quadratic cost function (QCF): works well with linear neurons, but not with sigmoid neurons, because learning is slow when the neurons are saturated.
Cross-entropy cost function: avoids the learning slowdown caused by saturated activations, which makes it a better fit for sigmoid neurons than the quadratic cost.
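
The slowdown can be seen directly in the gradients. For a single sigmoid neuron with output a = sigmoid(w*x + b), the quadratic cost (a - y)^2 / 2 has gradient (a - y) * a * (1 - a) * x with respect to w, while the cross-entropy cost -(y*ln(a) + (1 - y)*ln(1 - a)) has gradient (a - y) * x. The factor a * (1 - a) is the sigmoid derivative, which is close to zero when the neuron is saturated. The sketch below evaluates both on a saturated, badly wrong neuron; the function names and the starting values are assumptions picked for the illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def quadratic_grad_w(w, b, x, y):
    """dC/dw for the quadratic cost: includes the sigmoid derivative a * (1 - a)."""
    a = sigmoid(w * x + b)
    return (a - y) * a * (1.0 - a) * x


def cross_entropy_grad_w(w, b, x, y):
    """dC/dw for the cross-entropy cost: the sigmoid derivative cancels out."""
    a = sigmoid(w * x + b)
    return (a - y) * x


# Saturated neuron: the output is near 1 while the label is 0.
w, b, x, y = 2.0, 2.0, 1.0, 0.0
q_grad = quadratic_grad_w(w, b, x, y)
ce_grad = cross_entropy_grad_w(w, b, x, y)
print(q_grad, ce_grad)
```

The neuron is equally wrong in both cases, but the quadratic gradient is many times smaller, so each gradient descent step moves the weight far less.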

A good learning rate depends heavily on which cost function is in use, so retune the learning rate whenever you change the cost function.
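
One way to see why: with the same learning rate, different cost functions produce very different step sizes. The sketch below trains a single sigmoid neuron on one example with plain gradient descent, once per cost function. The train function, the starting point and the learning rate of 0.15 are all assumptions chosen for the illustration, not recommended settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(cost_name, learning_rate, steps, w=2.0, b=2.0, x=1.0, y=0.0):
    """Gradient descent on one sigmoid neuron; returns the output after training."""
    for _ in range(steps):
        a = sigmoid(w * x + b)
        delta = a - y                  # dC/dz for the cross-entropy cost
        if cost_name == "quadratic":
            delta *= a * (1.0 - a)     # the quadratic cost adds the sigmoid derivative
        w -= learning_rate * delta * x
        b -= learning_rate * delta
    return sigmoid(w * x + b)


# Same learning rate, same starting point, same number of steps.
out_quadratic = train("quadratic", learning_rate=0.15, steps=20)
out_cross_entropy = train("cross-entropy", learning_rate=0.15, steps=20)
print(out_quadratic, out_cross_entropy)
```

After the same number of steps the cross-entropy neuron has moved much closer to the label than the quadratic one, so a learning rate that suits one cost function says little about what suits the other.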

Sources

Using neural nets to recognize handwritten digits
Role of Bias in Neural Networks
Activation Functions in Neural Networks
The Master Algorithm, Pedro Domingos
