Since the cost surface for multi-layer networks can be complex,
choosing a learning rate can be difficult. What works in one location of the cost surface may not work well in
another location. Delta-Bar-Delta is a heuristic algorithm for modifying the learning rate as training progresses:
- Each weight has its own learning rate.
- For each weight, the gradient at the current timestep is compared
with an exponentially weighted average of the previous gradients.
- If they have the same sign, the learning rate is increased (additively).
- If they have opposite signs, the learning rate is decreased (multiplicatively).
- Intended for batch training only.
Let gij(t) be the gradient of E with respect to wij at time t, and let the
averaged gradient be

  ḡij(t) = (1 - β) gij(t) + β ḡij(t-1)

Then the learning rate μij for weight wij at time t+1 is given by

  μij(t+1) = μij(t) + κ    if ḡij(t-1) gij(t) > 0
  μij(t+1) = γ μij(t)      if ḡij(t-1) gij(t) < 0
  μij(t+1) = μij(t)        otherwise

where β, γ, and κ are chosen by hand.
- Knowing how to choose the parameters β, γ, and κ is not easy.
- Doesn't work for online (per-example) training.
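The per-weight update described above can be sketched as a single batch step in NumPy. This is a minimal illustration, not code from the source: the function name and the hyperparameter values (β = 0.7, γ = 0.9, κ = 0.01) are assumptions chosen for demonstration.

```python
import numpy as np

def delta_bar_delta_update(w, mu, g, g_bar, beta=0.7, gamma=0.9, kappa=0.01):
    """One Delta-Bar-Delta step for an array of weights.

    w     : weights wij
    mu    : per-weight learning rates (one per weight)
    g     : current batch gradient gij(t)
    g_bar : averaged gradient from the previous step, ḡij(t-1)
    beta, gamma, kappa are hand-chosen (illustrative values, not from the source).
    """
    agree = g_bar * g > 0          # current gradient agrees with the average
    oppose = g_bar * g < 0         # current gradient opposes the average
    mu = np.where(agree, mu + kappa, mu)    # same sign: additive increase
    mu = np.where(oppose, gamma * mu, mu)   # opposite sign: multiplicative decrease
    w = w - mu * g                          # gradient-descent step with per-weight rates
    g_bar = (1.0 - beta) * g + beta * g_bar # update the exponential average
    return w, mu, g_bar
```

Note that the sign test uses ḡij(t-1), the average from before this step, matching the update rule above; the average is only refreshed afterwards.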