Also known by the names:
- Adaline Rule
- Widrow-Hoff Rule
- Least Mean Squares (LMS) Rule
Change from Perceptron:
- Replace the step function in the perceptron with a continuous (differentiable)
activation function, e.g. linear (see the sketch after this list).
- For classification problems, use the step function only to determine
the class and not to update the weights.
- Note: this is the same algorithm we saw for regression. All that
really differs is how the classes are determined.
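As a minimal sketch of this change (assuming a numpy weight vector `w` with the bias folded in as a constant input; the function names are illustrative):

```python
import numpy as np

def net_output(w, x):
    # Linear (differentiable) activation: this is what the weight
    # updates are computed from.
    return np.dot(w, x)

def classify(w, x):
    # Step function: applied only to assign the class, never to
    # update the weights.
    return 1 if net_output(w, x) >= 0 else -1
```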
Delta Rule: Training by Gradient Descent Revisited
Construct a cost function E that measures how well the network has
learned. For example, the sum-of-squares error

E = \frac{1}{2} \sum_{i=1}^{n} (t_i - y_i)^2

where
n = number of examples
t_i = desired target value associated with the i-th example
y_i = output of the network when the i-th input pattern is presented
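As a direct numpy translation of this cost (assuming `t` and `y` are arrays holding the n targets and n outputs):

```python
import numpy as np

def cost(t, y):
    # Sum-of-squares error E; the factor of 1/2 cancels when the
    # gradient is taken.
    return 0.5 * np.sum((t - y) ** 2)
```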
- To train the network, we adjust the weights so as to decrease the
cost (this is where we require differentiability). This is called gradient descent.
- Initialize the weights to small random values
- Until E is within the desired tolerance, update the weights according to

W(new) = W(old) - \mu \nabla E

where \nabla E is evaluated at W(old) and \mu is the learning rate,
and the gradient is

\nabla E = \frac{\partial E}{\partial W}
         = -\sum_{i=1}^{n} (t_i - y_i) \frac{\partial y_i}{\partial W},

which for a linear activation y_i = W \cdot x_i reduces to
-\sum_{i=1}^{n} (t_i - y_i) x_i. A sketch of this loop follows.
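Here is a sketch of the whole loop for a single linear unit, under the assumptions above (the learning rate, tolerance, and epoch cap are illustrative values, not from the notes):

```python
import numpy as np

def train_delta_rule(X, t, mu=0.01, tolerance=1e-4, max_epochs=10_000):
    # X: (n, d) input patterns (include a constant 1 column for the bias).
    # t: (n,) desired target values.
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])  # small random initial weights

    for _ in range(max_epochs):
        y = X @ w                      # linear outputs for all n examples
        error = t - y
        E = 0.5 * np.sum(error ** 2)   # the cost from above
        if E < tolerance:              # stop once E is within tolerance
            break
        grad = -X.T @ error            # dE/dW = -sum_i (t_i - y_i) x_i
        w = w - mu * grad              # W(new) = W(old) - mu * grad E
    return w
```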
More than Two Classes
If there are more than 2 classes we could still use the same network,
but instead of having a binary target, we can let the target take on
discrete values. For example, if there are 5 classes, we could have
t = 1, 2, 3, 4, 5 or t = -2, -1, 0, 1, 2. It turns out, however, that the
network has a much easier time if we have one output per class. We can
think of each output node as trying to solve a binary problem (it is
either in the given class or it isn't). A sketch of this encoding follows.
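As a sketch of this encoding (assuming +1/-1 binary targets, one illustrative choice among several):

```python
import numpy as np

def one_per_class_targets(labels, num_classes):
    # Each output node gets its own binary target: +1 if the example
    # belongs to that node's class, -1 otherwise, so each node solves
    # a two-class problem.
    T = -np.ones((len(labels), num_classes))
    T[np.arange(len(labels)), labels] = 1.0
    return T

# Example: 4 examples drawn from 5 classes.
print(one_per_class_targets(np.array([0, 2, 4, 2]), 5))
```

At classification time one would typically assign the class whose output node produces the largest value.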