# Summary of Nonlinear Networks and Applications

### Backpropagation

- Implementing backprop
- characteristics of cost surfaces
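
The bullets above can be grounded in a minimal sketch: one gradient step of backprop for a one-hidden-layer net, assuming sigmoid activations and MSE cost (the function and variable names here are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, t, W1, W2, lr=0.1):
    """One gradient-descent step for a 1-hidden-layer net with
    sigmoid units and MSE cost (illustrative sketch)."""
    # forward pass
    h = sigmoid(W1 @ x)                       # hidden activations
    y = sigmoid(W2 @ h)                       # output activations
    # backward pass: delta terms from the chain rule
    delta_out = (y - t) * y * (1 - y)         # dE/d(net) at the output
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # weight updates are outer products of deltas and layer inputs
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return 0.5 * np.sum((y - t) ** 2)         # current MSE
```

Running the step repeatedly traces a descent path on the cost surface; with sigmoids that surface is non-convex, which is why initialization and learning rate matter.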

### Activation Functions

- linear
- threshold: binary, bipolar
- sigmoid: binary (logistic), bipolar (symmetric)
- softmax
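
The activation functions listed above, as a quick reference sketch (standard definitions; the bipolar sigmoid shown is `2*sigmoid(x) - 1`, i.e. `tanh(x/2)`):

```python
import numpy as np

def linear(x):            # identity
    return x

def binary_threshold(x):  # outputs in {0, 1}
    return np.where(x >= 0, 1, 0)

def bipolar_threshold(x): # outputs in {-1, +1}
    return np.where(x >= 0, 1, -1)

def sigmoid(x):           # binary (logistic), range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):   # symmetric, range (-1, 1); equals tanh(x/2)
    return 2.0 * sigmoid(x) - 1.0

def softmax(x):           # normalized exponentials; outputs sum to 1
    e = np.exp(x - np.max(x))   # shift by max for numerical stability
    return e / e.sum()
```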

### Cost Functions

- Mean Squared Error (MSE)
- Cross Entropy
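
Both cost functions in a short sketch (standard forms; the clip guard against `log(0)` is a common implementation detail, assumed here):

```python
import numpy as np

def mse(y, t):
    """Mean squared error between outputs y and targets t."""
    return np.mean((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    """Cross entropy for targets t in {0, 1} and outputs y in (0, 1)."""
    y = np.clip(y, eps, 1 - eps)   # guard against log(0)
    return -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))
```

Cross entropy pairs naturally with sigmoid/softmax outputs, where it gives simpler gradients than MSE.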

### Improving Generalization

- using noise to improve learning, annealing
- what does it mean to overtrain?
- early stopping
- weight decay
- pruning (e.g. Optimal Brain Damage)
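
Of the techniques above, early stopping is the easiest to sketch: monitor validation error and stop once it has failed to improve for some number of epochs (the `patience` parameter is an assumed hyperparameter, not from the notes):

```python
def early_stopping(val_errors, patience=3):
    """Return the epoch to roll back to: the last epoch at which
    validation error improved, stopping the scan once `patience`
    epochs pass with no improvement (illustrative sketch)."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch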

### Speed-up Techniques
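
One speed-up technique commonly paired with plain backprop is momentum: accumulate a velocity so steps along a consistently downhill direction grow while oscillating components damp out (a sketch; `lr` and `mu` are assumed hyperparameters):

```python
def momentum_update(w, grad, velocity, lr=0.1, mu=0.9):
    """One gradient step with momentum.  The velocity is a decaying
    running sum of past gradients; mu controls the decay."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity
```

On an elongated cost surface this converges in far fewer steps than plain gradient descent with the same learning rate.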

### Unsupervised Learning

- Dimension Reduction for Compression using Autoassociative Networks
- Principal Component Analysis (PCA) using 3-layer nets
- Nonlinear PCA using 5-layer nets
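
Direct PCA via the covariance matrix makes the autoassociative connection concrete: a linear 3-layer autoassociative net with a k-unit bottleneck, trained with MSE, learns the same subspace as the top-k eigenvectors (sketch; function names are illustrative):

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)               # center the data
    cov = Xc.T @ Xc / (len(X) - 1)        # covariance matrix
    vals, vecs = np.linalg.eigh(cov)      # eigh sorts eigenvalues ascending
    W = vecs[:, ::-1][:, :k]              # top-k eigenvectors as columns
    return Xc @ W                         # k-dimensional codes
```

The 5-layer (nonlinear) version replaces this linear projection with nonlinear encode/decode layers, letting the bottleneck capture curved structure.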

- Clustering for Compression
- Kohonen's Self-Organizing Maps (SOMs)
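
A single Kohonen SOM update, sketched for a 1-D map: find the best-matching unit, then pull it and its grid neighbors toward the input with a Gaussian neighborhood (minimal sketch; in practice `lr` and `sigma` decay over training, which is omitted here):

```python
import numpy as np

def som_update(weights, x, lr=0.5, sigma=1.0):
    """One SOM step: weights is (n_units, dim), x is one input vector."""
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # winning unit
    dist = np.abs(np.arange(len(weights)) - bmu)           # grid distance
    h = np.exp(-dist**2 / (2 * sigma**2))                  # neighborhood
    weights += lr * h[:, None] * (x - weights)             # pull toward x
    return weights
```

Because neighbors move together, nearby units end up coding nearby regions of input space, which is what makes the map useful for clustering-based compression.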

### Misc Terminology

- correlation matrix vs Hessian
- linear separability
- bias
- decision boundary
- clustering
- dimension reduction
- overtraining

### Experimental Design

- What techniques would you use to understand the data? (graphing data, examining the correlation matrix, dimension reduction, ...)
- What type of architecture would you use? (number of layers, number of nodes, activation functions) Why?
- What learning algorithm would you use (speed-up technique)? Why?
- What do you do to ensure the net is trained adequately but not overtrained?