Doing Classification Correctly

The Old Way

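When there are more than two classes, we have so far suggested doing the following: assign one output node to each class; set the target of a node to 1 if it is the correct class and 0 otherwise; train a linear network with a mean squared error (MSE) function; and predict the class whose output node has the largest value.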

There are problems with this method. First, there is a disconnect between the definition of the error function and the determination of the class. A minimum error does not necessarily produce the network with the largest number of correct predictions.
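As a small illustration of this disconnect, the sketch below compares two hypothetical sets of network outputs on the same two examples. The numbers are made up for illustration, and the function names `mse` and `accuracy` are not from the text; the class is assumed to be chosen by picking the largest output. The first set of outputs has the lower mean squared error but classifies only one example correctly, while the second classifies both correctly at a higher error.

```python
def mse(outputs, targets):
    """Mean squared error over every output node of every example."""
    total, count = 0.0, 0
    for y_row, t_row in zip(outputs, targets):
        for y, t in zip(y_row, t_row):
            total += (t - y) ** 2
            count += 1
    return total / count


def accuracy(outputs, targets):
    """Fraction of examples whose largest output sits at the target class."""
    correct = 0
    for y_row, t_row in zip(outputs, targets):
        if y_row.index(max(y_row)) == t_row.index(1):
            correct += 1
    return correct / len(targets)


# Two examples, three classes, targets of 1 for the correct class and 0 otherwise.
targets = [[1, 0, 0], [0, 1, 0]]

# Hypothetical outputs of two different networks (illustrative values only).
net_a = [[0.45, 0.55, 0.0], [0.0, 1.0, 0.0]]  # close to the targets, one mistake
net_b = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]    # far from the targets, no mistakes

print("A: mse =", mse(net_a, targets), "accuracy =", accuracy(net_a, targets))
print("B: mse =", mse(net_b, targets), "accuracy =", accuracy(net_b, targets))
```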

By varying the above method a little bit we can remove this inconsistency. Let us start by changing the interpretation of the output:

The New Way

New Interpretation: The output y_i is interpreted as the probability that i is the correct class. This means that each output must lie between 0 and 1, and the outputs over all nodes must sum to 1.

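How do we achieve this? There are two things we can vary: the activation function (for example, a sigmoid, whose output lies between 0 and 1) and the error function (we need not use MSE).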

To decide, let's start by thinking about what makes sense intuitively. With a linear network using gradient descent on an MSE function, we found that the weight updates were proportional to the error (t-y). This seems to make sense. If we use a sigmoid activation function, we obtain a more complicated formula:

\[ \Delta w \propto (t - y)\, y\,(1 - y)\, x \]

where x is the input on that weight and the factor y(1-y) is the derivative of the sigmoid.

See derivatives of activation functions to see where this comes from.

This is not quite what we want, since the update is no longer simply proportional to the error (t-y). It turns out that there is a better combination of error function and activation function that restores this property.
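With an MSE function and a sigmoid output y, the weight update picks up a factor y(1-y), the derivative of the sigmoid, in addition to the error (t-y). The sketch below evaluates both update magnitudes for a single weight, assuming a target t = 1 and an input x = 1; the function names are illustrative only. When the output is badly wrong (y near 0), the sigmoid update is tiny even though the error is large.

```python
def linear_update(t, y, x):
    """Update direction for a linear output with MSE: (t - y) * x."""
    return (t - y) * x


def sigmoid_update(t, y, x):
    """Update direction for a sigmoid output with MSE: (t - y) * y * (1 - y) * x."""
    return (t - y) * y * (1.0 - y) * x


t, x = 1.0, 1.0
for y in (0.01, 0.5, 0.99):
    print("y =", y,
          "linear:", linear_update(t, y, x),
          "sigmoid:", sigmoid_update(t, y, x))
```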

Error Function:

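Cross Entropy is defined as

\[ E = -\sum_{i=1}^{c} t_i \log(y_i) \]

where c is the number of classes (i.e. the number of output nodes), t_i is the target for output node i, and y_i is its output.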

This equation comes from information theory and is often applied when the outputs (y) are interpreted as probabilities. We won't worry about where it comes from but let's see if it makes sense for certain special cases.
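A few special cases can be checked numerically. The sketch below assumes the usual form of cross entropy, E = -sum_i t_i log(y_i), with natural logarithms and a target of 1 for the correct class and 0 otherwise. Terms with t_i = 0 are skipped, following the convention that 0 log 0 = 0. The function name is illustrative.

```python
import math


def cross_entropy(targets, outputs):
    """E = -sum_i t_i * log(y_i), skipping terms whose target is 0."""
    return -sum(t * math.log(y) for t, y in zip(targets, outputs) if t != 0)


target = [0, 1, 0]  # class 1 is the correct class

perfect = cross_entropy(target, [0.0, 1.0, 0.0])     # all probability on the correct class
uniform = cross_entropy(target, [1 / 3, 1 / 3, 1 / 3])  # no preference between classes
wrong = cross_entropy(target, [0.98, 0.01, 0.01])    # confident in the wrong class

print("perfect:", perfect)
print("uniform:", uniform, "log(3) =", math.log(3))
print("wrong:  ", wrong)
```

The error is zero when the correct class is given probability 1, equals log(c) for a uniform guess over c classes, and grows without bound as the probability assigned to the correct class approaches 0.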

Activation function:

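Softmax is defined as

\[ f_i = \frac{e^{\mathrm{net}_i}}{\sum_{k=1}^{c} e^{\mathrm{net}_k}} \]

where f_i is the activation function of the ith output node, net_i is the net input to that node, and c is the number of classes.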
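Note that softmax has the following good properties: each output y_i = f_i lies between 0 and 1, the outputs sum to 1, and, when combined with the cross entropy error function E, it gives weight updates proportional to (t-y). To see the last point, write net_r = \sum_s w_{rs} x_s for the net input to output node r. Then

\[ -\frac{\partial E}{\partial w_{rs}} = \sum_{i=1}^{c} \frac{t_i}{y_i} \frac{\partial y_i}{\partial \mathrm{net}_r}\, x_s = \sum_{i=1}^{c} t_i\, (\delta_{ir} - y_r)\, x_s \]

where \delta_{ij} = 1 if i=j and zero otherwise. Note that if r is the correct class then t_r = 1 and the RHS of the above equation reduces to (t_r - y_r) x_s. If q \neq r is the correct class then t_r = 0 and the above also reduces to (t_r - y_r) x_s. Thus we have

\[ \Delta w_{rs} \propto (t_r - y_r)\, x_s \]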

Look familiar?
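The result can be checked numerically. The sketch below is a minimal check under stated assumptions, not part of the original derivation: it assumes a single layer with net input net_r = sum_s w_rs x_s, softmax outputs, and cross entropy error with one target equal to 1 and the rest 0. It compares a finite-difference estimate of -dE/dw_rs with (t_r - y_r) x_s. The weights, inputs, and function names are arbitrary choices for illustration.

```python
import math


def softmax(net):
    """Softmax of a list of net inputs; subtracting the max avoids overflow
    and leaves the result unchanged."""
    m = max(net)
    exps = [math.exp(v - m) for v in net]
    total = sum(exps)
    return [e / total for e in exps]


def forward(w, x):
    """Outputs y_r = softmax(net)_r with net_r = sum_s w[r][s] * x[s]."""
    net = [sum(w_rs * x_s for w_rs, x_s in zip(row, x)) for row in w]
    return softmax(net)


def cross_entropy(t, y):
    """E = -sum_i t_i * log(y_i), skipping terms whose target is 0."""
    return -sum(t_i * math.log(y_i) for t_i, y_i in zip(t, y) if t_i != 0)


# Arbitrary illustrative values: 3 classes, 2 inputs, class 2 is correct.
w = [[0.2, -0.5], [0.1, 0.4], [-0.3, 0.8]]
x = [1.0, 2.0]
t = [0, 0, 1]

y = forward(w, x)
eps = 1e-6
max_diff = 0.0
for r in range(len(w)):
    for s in range(len(x)):
        analytic = (t[r] - y[r]) * x[s]
        # Central finite difference of E with respect to w[r][s].
        w_plus = [row[:] for row in w]
        w_minus = [row[:] for row in w]
        w_plus[r][s] += eps
        w_minus[r][s] -= eps
        e_plus = cross_entropy(t, forward(w_plus, x))
        e_minus = cross_entropy(t, forward(w_minus, x))
        numeric = -(e_plus - e_minus) / (2 * eps)
        max_diff = max(max_diff, abs(analytic - numeric))

print("outputs:", y, "sum:", sum(y))
print("largest difference between the two gradients:", max_diff)
```

The outputs lie between 0 and 1 and sum to 1, and the two gradient estimates agree up to finite-difference error.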
