The Perceptron 

Assumptions:

Geometric Interpretation:

With this binary function f, the problem reduces to finding weights such that

sign(W · X) = t

That is, the weights must be chosen so that the projection of pattern X onto W has the same sign as the target t. But the boundary between positive and negative projections is just the plane W · X = 0, i.e. the same decision boundary we saw before.
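For concreteness, here is a minimal NumPy sketch of this classification rule; the weight and pattern values are invented for the example, and the bias is folded into the last weight component as in the notes below:

    import numpy as np

    # Bias folded into the weights: W = (w1, w2, w3) and X = (x1, x2, 1),
    # so W . X = w1*x1 + w2*x2 + w3.
    W = np.array([1.0, -2.0, 0.5])   # example weights (arbitrary)
    X = np.array([0.3, 0.8, 1.0])    # pattern augmented with a constant 1
    t = -1                           # target in {-1, +1}

    y = np.sign(W @ X)               # sign of the projection of X onto W
    print(y == t)                    # True: this pattern is correctly classified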

The Perceptron Algorithm

  1. initialize the weights (either to zero or to a small random value)
  2. pick a learning rate m (a number between 0 and 1)
  3. until the stopping condition is satisfied (e.g., the weights stop changing):

For each training pattern (x, t):

  • compute the output activation y = f(w · x)
  • if y = t, leave the weights unchanged
  • if y != t, update the weights:

    w(new) = w(old) + 2 m t x

    or, equivalently,

    w(new) = w(old) + m (t - y) x,  applied to every pattern

    (the two forms agree: when y != t we have t - y = 2t, and when y = t the
    second form changes nothing; a runnable sketch follows the figure
    discussion below)


    Consider what happens below when the training pattern p1 or p2 is chosen. Before updating the weight W, we note that both p1 and p2 are incorrectly classified (the red dashed line is the decision boundary). Suppose we choose p1 to update the weights, as in the picture below on the left. p1 has target value t = 1, so the weight is moved a small amount in the direction of p1. Suppose instead we choose p2 to update the weights. p2 has target value t = -1, so the weight is moved a small amount in the direction of -p2. In either case, the new boundary (blue dashed line) is better than before.
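Putting the steps together, here is a minimal NumPy sketch of the training loop; the toy data, learning rate, and epoch cap are invented for the example:

    import numpy as np

    def train_perceptron(X, T, m=0.1, max_epochs=100):
        # X: (n_patterns, n_features) array with a trailing column of ones,
        # so the last weight acts as the bias; T: targets in {-1, +1}.
        w = np.zeros(X.shape[1])              # step 1: initialize weights to zero
        for _ in range(max_epochs):           # step 3: loop until weights stop changing
            changed = False
            for x, t in zip(X, T):
                y = np.sign(w @ x)            # output activation y = f(w . x)
                if y != t:                    # update only on misclassification
                    w = w + m * (t - y) * x   # equivalent to w + 2*m*t*x here
                    changed = True
            if not changed:                   # stopping condition: no weight changed
                break
        return w                              # may not converge if not separable

    # Example: a linearly separable toy problem (hypothetical data).
    X = np.array([[2.0, 1.0, 1.0], [1.0, 3.0, 1.0],
                  [-1.0, -1.0, 1.0], [-2.0, 0.5, 1.0]])
    T = np.array([1, 1, -1, -1])
    print(train_perceptron(X, T))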

Comments on Perceptron

In two dimensions, the decision boundary W · X = w1 x1 + w2 x2 + w3 = 0 can be rewritten as the line

x2 = - (w1/w2) x1 - w3/w2    (assuming w2 != 0)

where we have defined w3 to be the bias: W = (w1, w2, b) = (w1, w2, w3).
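For instance, the slope and intercept of the boundary line can be read directly off the weights (a small sketch with arbitrary weight values):

    # Decision boundary in 2-D: w1*x1 + w2*x2 + w3 = 0, rearranged to
    # x2 = -(w1/w2)*x1 - w3/w2 (requires w2 != 0).
    w1, w2, w3 = 1.0, -2.0, 0.5     # arbitrary example weights
    slope = -w1 / w2
    intercept = -w3 / w2
    print(slope, intercept)         # 0.5, 0.25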

The perceptron is guaranteed to converge in a finite number of steps if the problem is linearly separable. It may be unstable (the weights can keep changing without settling) if the problem is not separable.

Come to class for proof!!

Outline: Find a lower bound L(k) for |w|^2 as a function of the iteration count k. Then find an upper bound U(k) for |w|^2. Then show that the lower bound grows at a faster rate than the upper bound. Since the lower bound can't be larger than the upper bound, there must be a finite k after which the weight is no longer updated. However, this can only happen once all patterns are correctly classified.
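In symbols, a standard sketch of these bounds (assuming zero initial weights, learning rate 1, targets in {-1, +1}, a unit separating vector $w^*$ with margin $\gamma$, and $\|x\| \le R$ for every pattern):

    After $k$ weight updates (each $w \leftarrow w + t\,x$ on a misclassified pattern):

    Lower bound: each update adds $t\,(w^* \cdot x) \ge \gamma$ to $w \cdot w^*$,
    so $w \cdot w^* \ge k\gamma$ and hence $\|w\|^2 \ge k^2 \gamma^2$  (grows like $k^2$).

    Upper bound: $\|w + t\,x\|^2 = \|w\|^2 + 2\,t\,(w \cdot x) + \|x\|^2 \le \|w\|^2 + R^2$,
    since $t\,(w \cdot x) \le 0$ on a misclassified pattern, so $\|w\|^2 \le k R^2$  (grows like $k$).

    Combining: $k^2 \gamma^2 \le k R^2$, so $k \le R^2 / \gamma^2$ and the updates must stop.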

Perceptron Decision Boundaries

[Figures: decision regions realizable by one-, two-, and three-layer perceptron networks]

Two-Layer Net: The above is not the most general region. Here, we have assumed the top-layer unit computes an AND function of the first-layer outputs.
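For concreteness, here is a hand-wired sketch (all weights chosen by hand for illustration, not learned): two first-layer units each carve out a half-plane, and a top-layer AND unit intersects them:

    import numpy as np

    def perceptron_unit(w, x):
        # One threshold unit: +1 if w . x > 0, else -1 (x includes a trailing 1).
        return 1 if w @ x > 0 else -1

    # First layer: two half-planes, x1 > 0 and x2 > 0 (hand-chosen weights).
    w_a = np.array([1.0, 0.0, 0.0])
    w_b = np.array([0.0, 1.0, 0.0])
    # Top layer: AND of the two hidden outputs (fires only if both are +1).
    w_and = np.array([1.0, 1.0, -1.5])

    def two_layer_net(x1, x2):
        x = np.array([x1, x2, 1.0])
        h = np.array([perceptron_unit(w_a, x), perceptron_unit(w_b, x), 1.0])
        return perceptron_unit(w_and, h)

    print(two_layer_net(1.0, 2.0))    # +1: inside both half-planes
    print(two_layer_net(-1.0, 2.0))   # -1: outside the first half-plane

The top-layer weights (1, 1, -1.5) make the output +1 only when both hidden outputs are +1, which is what restricts the positive region to the intersection of the two half-planes.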

Problem: In general, for the 2- and 3-layer cases, there is no simple way to determine the weights.
