In general, the data does not lie perfectly on a linear subspace, so some information
is lost when the data is compressed. The problem is to find the compression direction that
loses the least information.

### Principal Component Analysis (PCA)

The l_1 direction corresponds to

- the direction of largest variance of the data.
- the eigenvector associated with the largest eigenvalue of the correlation matrix <x x^T>.
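A quick numerical check of this claim, using NumPy on synthetic 2-D data stretched along a known direction (the data and direction are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: zero-mean 2-D points stretched along a known direction.
true_dir = np.array([3.0, 1.0]) / np.hypot(3.0, 1.0)
x = rng.normal(size=(1000, 2)) * [4.0, 0.5]          # long axis along coordinate 0
R = np.array([[true_dir[0], -true_dir[1]],
              [true_dir[1],  true_dir[0]]])           # rotation taking e1 -> true_dir
x = x @ R.T

# Correlation matrix <x x^T> (the data is essentially zero-mean).
C = x.T @ x / len(x)
eigvals, eigvecs = np.linalg.eigh(C)                  # eigenvalues in ascending order
l1 = eigvecs[:, -1]                                   # eigenvector of the largest eigenvalue

# The variance of the projection onto l1 exceeds that onto any other direction.
var_l1 = np.var(x @ l1)
rand_dir = rng.normal(size=2)
rand_dir /= np.linalg.norm(rand_dir)
var_rand = np.var(x @ rand_dir)
```

Here `l1` recovers (up to sign) the direction the data was stretched along, and `var_l1 >= var_rand` for any other unit direction.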

If we have n-dimensional data, we can compress it down to m dimensions
by projecting it onto the space spanned by the eigenvectors associated with the m largest eigenvalues.

The method for finding these directions is called
**Principal Component Analysis (PCA).**
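A minimal sketch of this compression in NumPy, assuming illustrative data in R^8 that lies near a 3-dimensional subspace (n = 8, m = 3 are chosen arbitrarily here):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3

# Illustrative data lying close to an m-dimensional subspace of R^n.
basis = np.linalg.qr(rng.normal(size=(n, m)))[0]       # orthonormal n x m basis
x = rng.normal(size=(500, m)) @ basis.T
x += 0.01 * rng.normal(size=x.shape)                   # small off-subspace noise
x -= x.mean(axis=0)                                    # make the data zero-mean

C = x.T @ x / len(x)                                   # correlation matrix <x x^T>
eigvals, eigvecs = np.linalg.eigh(C)                   # ascending eigenvalues
W = eigvecs[:, -m:]                                    # eigenvectors of the m largest

z = x @ W              # compress: n dimensions -> m dimensions
x_hat = z @ W.T        # reconstruct back into R^n

mse = np.mean((x - x_hat) ** 2)   # small: little information lost
```

Because the data is nearly m-dimensional, the reconstruction error is close to the noise level; compressing below m dimensions would start discarding real structure.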

### Finding the Principal Components using an Autoassociative Network

An autoassociative network is a network whose inputs and targets are the same. That is, the net
must find a mapping from an input to itself.

Why do this? Well, when the number of hidden nodes is smaller than the number of input nodes, the
network is forced to learn an efficient low-dimensional representation of the data.
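The idea can be sketched as a tiny linear autoassociative network trained by gradient descent on the reconstruction error; the data, sizes, and learning rate below are illustrative assumptions, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative zero-mean data lying near a 1-D subspace of R^4.
v = rng.normal(size=(1, 4))
v /= np.linalg.norm(v)
d = rng.normal(size=(200, 1)) @ v + 0.05 * rng.normal(size=(200, 4))
d -= d.mean(axis=0)

n_in, n_hid = 4, 1                          # bottleneck: fewer hidden units than inputs
W1 = 0.1 * rng.normal(size=(n_in, n_hid))   # input -> hidden weights
W2 = 0.1 * rng.normal(size=(n_hid, n_in))   # hidden -> output weights

lr = 0.05
for _ in range(3000):
    h = d @ W1            # hidden activations (linear units)
    y = h @ W2            # outputs; the targets are the inputs themselves
    err = y - d
    # Gradient descent on the mean squared reconstruction error
    W2 -= lr * h.T @ err / len(d)
    W1 -= lr * d.T @ (err @ W2.T) / len(d)

mse = np.mean((d @ W1 @ W2 - d) ** 2)       # reconstruction error after training
```

With linear units, the trained hidden representation spans (up to scaling) the same subspace as the top principal component, which is why such networks can be used to find principal directions.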

See Maple example of the above network.

### Example: Image Compression (Cottrell et al, 87)

- 64 inputs: 8x8 pixel regions of an image specified to 8 bit precision
- 16 hidden units
- 64 outputs: targets = inputs

Trained on randomly selected patches of an image (150,000 training
steps). The network was then tested on the entire image, patch by patch, using the full set of non-overlapping patches. See
"Fundamentals of Artificial Neural Networks", Hassoun, pp. 247-253.

They found that nonlinearity in the hidden units gave no advantage
(this was later confirmed theoretically).