Delta-Bar-Delta (Jacobs)

Since the cost surface for multi-layer networks can be complex, choosing a learning rate can be difficult: a rate that works well in one region of the cost surface may work poorly in another. Delta-Bar-Delta is a heuristic algorithm that gives each weight its own learning rate and modifies it as training progresses:


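Let g_ij(t) be the gradient of E with respect to w_ij at time t:

\[ g_{ij}(t) = \frac{\partial E}{\partial w_{ij}}(t) \]

Then define an exponentially weighted average of the current and past gradients:

\[ \bar{g}_{ij}(t) = (1-\beta)\, g_{ij}(t) + \beta\, \bar{g}_{ij}(t-1) \]

Then the learning rate \mu_{ij} for weight w_{ij} at time t+1 is given by

\[ \mu_{ij}(t+1) =
\begin{cases}
\mu_{ij}(t) + \kappa & \text{if } \bar{g}_{ij}(t-1)\, g_{ij}(t) > 0 \\
(1-\gamma)\, \mu_{ij}(t) & \text{if } \bar{g}_{ij}(t-1)\, g_{ij}(t) < 0 \\
\mu_{ij}(t) & \text{otherwise}
\end{cases} \]

where \beta, \gamma, and \kappa are chosen by hand. The rate grows additively while the current gradient agrees in sign with the averaged gradient, and shrinks multiplicatively when the two disagree.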
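As a rough illustration of the rule described above, here is a minimal Python sketch of one Delta-Bar-Delta step over a flat list of weights. It assumes the usual form of the rule: the averaged gradient is an exponential average weighted by beta, the rate increases additively by kappa when the previous average and the current gradient share a sign, and decreases by the factor (1 - gamma) when they differ. The function name, the default parameter values, and the choice to apply the freshly updated rate in the weight step are assumptions made for this example, not taken from the notes.

```python
def delta_bar_delta_step(w, rate, g_bar, grad, beta=0.7, gamma=0.1, kappa=0.01):
    """Apply one Delta-Bar-Delta update (illustrative sketch).

    w, rate, g_bar, grad are equal-length lists: the weights, their
    per-weight learning rates, the averaged gradients from the previous
    step, and the current gradients. Returns new (w, rate, g_bar) lists.
    """
    new_w, new_rate, new_g_bar = [], [], []
    for w_i, rate_i, bar_i, g_i in zip(w, rate, g_bar, grad):
        agreement = bar_i * g_i  # sign compares previous average with current gradient
        if agreement > 0:
            rate_i = rate_i + kappa           # consistent direction: additive increase
        elif agreement < 0:
            rate_i = rate_i * (1.0 - gamma)   # sign flip: multiplicative decrease
        # agreement == 0: leave the rate unchanged
        new_rate.append(rate_i)
        # Assumption: the weight step uses the rate just computed.
        new_w.append(w_i - rate_i * g_i)
        # Update the exponential average of the gradient for the next step.
        new_g_bar.append((1.0 - beta) * g_i + beta * bar_i)
    return new_w, new_rate, new_g_bar


if __name__ == "__main__":
    # Toy cost E(w) = 0.5 * (w0^2 + 25 * w1^2), chosen only for illustration.
    w, rate, g_bar = [1.0, 1.0], [0.01, 0.01], [0.0, 0.0]
    for _ in range(50):
        grad = [w[0], 25.0 * w[1]]
        w, rate, g_bar = delta_bar_delta_step(w, rate, g_bar, grad)
    print(w, rate)
```

Keeping a separate rate per weight lets a shallow direction of the cost surface take larger steps while a steep, oscillating direction is damped.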

