Principal Components Analysis: A Background

Principal Components Analysis, first introduced on page 1-14, is a procedure for transforming a set of correlated variables into a new set of uncorrelated variables. This transformation is a rotation of the original axes to new orientations that are orthogonal to each other, chosen so that there is no correlation between the new variables. The graph below shows a plot of band 2 versus band 1 of the Morro Bay TM scene. As you can see, the value of band 2 for a particular pixel is related to the value for band 1. The correlation is high.

Since the rotation is a linear combination of the original measurements, no information is lost if all of the axes are included in the rotation. "No information is lost" means that the original measurements can be recovered from the principal components. If the original data set is singular, then the principal components with nonzero variance give a new representation that is not singular. There are several ways of viewing this transformation:

1. It can be viewed as a *rotation* of the existing axes to new positions in the space defined by the original variables. In this new rotation, there will be no correlation between the new variables defined by the rotation. The first new variable contains the maximum amount of variation, the second new variable contains the maximum amount of variation unexplained by the first and orthogonal to the first, etc...

2. It can be viewed as finding a *projection* of the observations onto orthogonal axes contained in the space defined by the original variables. The criterion is that the first axis "contains" (or "accounts for") the maximum amount of variation. The second axis contains the maximum amount of the remaining variation, subject to being orthogonal to the first. The third axis contains the maximum amount of remaining variation orthogonal to the first and second axes, and so on until the last new axis, which accounts for whatever variation is left. As you can see, these are really two slightly different ways of saying the same thing!
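Both views can be checked on a small two-band example. The sketch below uses made-up pixel values (not the Morro Bay data) and the closed-form rotation angle for a 2 x 2 covariance matrix; the variable names are illustrative. It shows that the rotated variables are uncorrelated, that the first one carries the larger variance, and that rotating back recovers the original measurements.

```python
import math

# Hypothetical two-band pixel values (made up for illustration).
band1 = [52.0, 60.0, 71.0, 75.0, 83.0, 90.0]
band2 = [20.0, 24.0, 27.0, 31.0, 33.0, 38.0]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length lists (n - 1 divisor)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Center each band so the rotation is about the centroid of the data cloud.
x = [a - mean(band1) for a in band1]
y = [b - mean(band2) for b in band2]
sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)

# Angle of the major axis of the data ellipse (closed form in two dimensions).
theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
c, s = math.cos(theta), math.sin(theta)

# Rigid rotation of the axes: each new variable is a linear combination
# of the two original (centered) bands.
pc1 = [c * a + s * b for a, b in zip(x, y)]
pc2 = [-s * a + c * b for a, b in zip(x, y)]

# The inverse rotation recovers the centered measurements exactly.
x_back = [c * p - s * q for p, q in zip(pc1, pc2)]
y_back = [s * p + c * q for p, q in zip(pc1, pc2)]

print("covariance of new variables:", cov(pc1, pc2))
print("variances:", cov(pc1, pc1), cov(pc2, pc2))
print("total variance before/after:", sxx + syy, cov(pc1, pc1) + cov(pc2, pc2))
```

The total variance is unchanged by the rotation; it is only redistributed so that as much as possible falls on the first axis.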

There are several algorithms for calculating the Principal Components. Given the same starting data they will produce the same results, with one exception (are you surprised?). The exception is that if, at some point, two or more possible rotations contain the same "maximum" variation, then which one is used is indeterminate. In two dimensions, such a data cloud would look like a circle instead of an ellipse. In a circle, any rotation would be equivalent. In an elliptical data cloud, the first component would be parallel to the major axis of the ellipse.
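The circle-versus-ellipse point can be illustrated numerically. This is a minimal sketch with illustrative variances (4 and 4 for the circular cloud, 9 and 1 for the elliptical one); the function name `rotated_cov` is hypothetical. It computes the variances and covariance that two-dimensional data would have after the axes are rotated by a given angle.

```python
import math

def rotated_cov(sxx, syy, sxy, theta):
    """Variances and covariance of 2D data after rotating the axes by theta."""
    c, s = math.cos(theta), math.sin(theta)
    var1 = c * c * sxx + 2 * c * s * sxy + s * s * syy
    var2 = s * s * sxx - 2 * c * s * sxy + c * c * syy
    cov12 = (c * c - s * s) * sxy - c * s * (sxx - syy)
    return var1, var2, cov12

angles = [math.radians(d) for d in (0, 30, 45, 60, 90)]

# Circular cloud: equal variances and no covariance.
circle = [rotated_cov(4.0, 4.0, 0.0, t) for t in angles]

# Elliptical cloud whose major axis lies along the first original axis.
ellipse = [rotated_cov(9.0, 1.0, 0.0, t) for t in angles]

for t, circ, ell in zip(angles, circle, ellipse):
    print(round(math.degrees(t)), circ, ell)
```

For the circular cloud every angle gives the same variances and zero covariance, so no rotation is preferred. For the elliptical cloud the variance on the first axis is largest, and the covariance vanishes, only when that axis is aligned with the major axis.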

To calculate the rotation we can start with either a Variance-covariance Matrix or a Correlation Matrix. If one standardizes the data and calculates a Variance-covariance Matrix, then the result will be the same as the Correlation Matrix of the original data. Those who wish to practice their algebra can prove this by deriving the formula for the Variance-covariance Matrix and the Correlation Matrix calculated on "raw" data and then the Variance-covariance Matrix calculated on standardized data.
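A numerical check is no substitute for the algebra, but it shows what the proof should arrive at. The sketch below uses made-up values for two bands and assumes the sample (n - 1) form of the variance throughout; the helper names are illustrative.

```python
import math

# Hypothetical values for two bands (made up for illustration).
band1 = [52.0, 60.0, 71.0, 75.0, 83.0, 90.0]
band2 = [20.0, 24.0, 27.0, 31.0, 33.0, 38.0]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance (n - 1 divisor); cov(v, v) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def standardize(v):
    """Subtract the mean and divide by the standard deviation."""
    m, sd = mean(v), math.sqrt(cov(v, v))
    return [(a - m) / sd for a in v]

# Correlation computed from the raw data.
corr_raw = cov(band1, band2) / math.sqrt(cov(band1, band1) * cov(band2, band2))

# Covariance computed from the standardized data.
z1, z2 = standardize(band1), standardize(band2)
cov_std = cov(z1, z2)

print(corr_raw, cov_std)           # the two values agree up to rounding
print(cov(z1, z1), cov(z2, z2))    # standardized variances are 1
```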

The histogram of the first Principal Component for the Morro Bay scene is:

The histogram for the second Principal Component of the Morro Bay scene is:

Compare these with the histograms of the original bands, which you can do by flipping back and forth between page 1-3 and this page.

We can plot the second principal component versus the first to get the 2D view that follows.

How do we get this figure? The elliptical cloud that lies parallel to the X axis is what we might expect. But what we need to remember is that we are carrying out our rigid rotation of axes in a 7 dimensional space, one dimension for each band (or variable). We can see here that the original data were not Multivariate Normal, an assumption that would need to be met if one wanted to carry out any parametric statistical tests. This non-normality is indicated by the anomalous cloud of points going diagonally across the graph. If the data were multivariate normal in 7 dimensions, then the plot would only have a cloud like the horizontal one in the above plot.

For the Morro Bay TM scene there are 7 spectral bands. Thus each pixel has 7 values. The pixel in row i, column j of the image is a vector:

x(i,j,1) x(i,j,2) x(i,j,3) x(i,j,4) x(i,j,5) x(i,j,6) x(i,j,7)

x(i,j,1) is the value of band 1 in row i, column j, x(i,j,2) is the value of band 2 in row i, column j, etc.

A linear combination of these values, to calculate the first Principal Component, would look like:
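Writing the seven coefficients of the first component as a_1 through a_7 and the resulting value as y(i,j) (both labels are chosen here for illustration and are assumptions, not notation confirmed by the source), the combination would take a form like:

$$
y(i,j) = a_1\,x(i,j,1) + a_2\,x(i,j,2) + \dots + a_7\,x(i,j,7) = \sum_{k=1}^{7} a_k\,x(i,j,k)
$$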

This multiplication and addition is carried out for each of the picture elements, pixels, in the image. The Principal Components Analysis is the calculation of the values of the set of vectors __a__ and then the multiplication of the image data by them to get the projections of the data points onto the Principal Components.
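The per-pixel multiplication and addition can be sketched as follows. Everything numeric here is hypothetical: the 2 x 2 image, its 7 band values per pixel, and the equal-weight coefficient vector `a`. In an actual analysis `a` would be calculated from the Variance-covariance or Correlation Matrix, and the data are usually mean-centered first.

```python
import math

N_BANDS = 7

# Hypothetical 2 x 2 image; each pixel is a vector of 7 band values (made up).
image = [
    [[60, 22, 18, 40, 35, 120, 12], [62, 23, 20, 44, 38, 121, 14]],
    [[75, 30, 29, 15, 10, 118, 5], [58, 21, 17, 52, 47, 122, 19]],
]

# Hypothetical unit-length coefficient vector for one component.
a = [1.0 / math.sqrt(N_BANDS)] * N_BANDS

def project(pixel, coeffs):
    """Linear combination a1*x1 + ... + a7*x7 for a single pixel vector."""
    return sum(c * v for c, v in zip(coeffs, pixel))

# Apply the same combination to every pixel in the image.
pc1 = [[project(pixel, a) for pixel in row] for row in image]

for row in pc1:
    print(row)
```

The result has one value per pixel, so it can be displayed as a single-band image in the same way as any of the original bands.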

A singular matrix is one in which one or more of the rows or columns can be calculated as a linear combination of the other rows or columns. If one calculates the Variance-Covariance matrix of a singular data matrix, the determinant of that Variance-Covariance matrix will be 0.

For example, consider the "data" matrix below, with 4 variables (rows) and 5 observations (columns).

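| | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Obs. 5 |
|---|---|---|---|---|---|
| Variable 1 | 3 | 9 | 11 | 2 | 5 |
| Variable 2 | 5 | 3 | 4 | 3 | 1 |
| Variable 3 | 2 | 7 | 5 | 5 | 11 |
| Variable 4 | 17 | 42 | 41 | 22 | 44 |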

If we call this matrix x, we can for example generate the fourth row as a linear combination of the other rows like this:

y = a^t * x'

where x' is the data matrix without row 4

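| | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Obs. 5 |
|---|---|---|---|---|---|
| Variable 1 | 3 | 9 | 11 | 2 | 5 |
| Variable 2 | 5 | 3 | 4 | 3 | 1 |
| Variable 3 | 2 | 7 | 5 | 5 | 11 |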

and a is a vector of 3 coefficients, a = (2, 1, 3)^t, whose transpose is used to premultiply x' to produce y, the fourth row. For the first observation, for example, 2·3 + 1·5 + 3·2 = 17. The mean vector (one mean per variable) is:

6 | 3.2 | 6 | 33.2 |

We then subtract the mean vector from each "observation" to shift the mean to zero

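| | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Obs. 5 |
|---|---|---|---|---|---|
| Variable 1 | -3 | 3 | 5 | -4 | -1 |
| Variable 2 | 1.8 | -0.2 | 0.8 | -0.2 | -2.2 |
| Variable 3 | -4 | 1 | -1 | -1 | 5 |
| Variable 4 | -16.2 | 8.8 | 7.8 | -11.2 | 10.8 |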

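before calculating the Variance-Covariance matrix as **vcv = xm · xm^t**, where xm is the mean-centered data matrix.

The Variance-Covariance matrix is shown here as the raw sums of squares and cross products, without dividing by n - 1; that scaling does not change whether the determinant is zero: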

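| | Var. 1 | Var. 2 | Var. 3 | Var. 4 |
|---|---|---|---|---|
| Var. 1 | 60 | 1 | 9 | 148 |
| Var. 2 | 1 | 8.8 | -19 | -46.2 |
| Var. 3 | 9 | -19 | 44 | 131 |
| Var. 4 | 148 | -46.2 | 131 | 642.8 |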

and the determinant is 3.699 × 10^-11, which is within rounding error of 0.

If we delete the 4th variable and recalculate the determinant for the 3 variable data set, we get 473.2, clearly much larger than 0! As an exercise, you can try calculating this value by hand, or with a matrix algebra package. Mathcad 5 plus was used to calculate this example.
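For readers without a matrix algebra package, the example can also be reproduced with exact rational arithmetic. The sketch below uses Python's standard `fractions` module rather than Mathcad, so the determinant of the singular matrix comes out as exactly 0 instead of a small rounding residue. The helper names `cross_products` and `det` are illustrative, and the matrix is computed as xm · xm^t with no division by n - 1, matching the numbers in the example.

```python
from fractions import Fraction

# The "data" matrix: 4 variables (rows) by 5 observations (columns).
data = [
    [3, 9, 11, 2, 5],
    [5, 3, 4, 3, 1],
    [2, 7, 5, 5, 11],
    [17, 42, 41, 22, 44],
]
a = [2, 1, 3]

# Row 4 generated as a linear combination of rows 1 to 3 (y = a^t * x').
y = [sum(a[k] * data[k][j] for k in range(3)) for j in range(5)]

# Use exact fractions so no rounding error enters the calculation.
x = [[Fraction(v) for v in row] for row in data]
means = [sum(row) / len(row) for row in x]
xm = [[v - m for v in row] for row, m in zip(x, means)]

def cross_products(m):
    """Return m * m^t for a matrix stored as a list of rows."""
    return [[sum(p * q for p, q in zip(r1, r2)) for r2 in m] for r1 in m]

def det(m):
    """Determinant by expansion along the first row (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

vcv4 = cross_products(xm)        # all 4 variables
vcv3 = cross_products(xm[:3])    # 4th variable deleted

print(y)                         # [17, 42, 41, 22, 44]
print([float(m) for m in means]) # [6.0, 3.2, 6.0, 33.2]
print(det(vcv4))                 # 0
print(float(det(vcv3)))          # 473.2
```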

For the purposes of our discussion, a Space is defined by the image bands. Each band defines one dimension of the space. Thus if we have two spectral bands they define a two dimensional space. This space can be visualized by plotting the two spectral intensities for each pixel in a 2 dimensional x-y plot such as the figure below which is a plot of bands 1 and 2 from the Morro Bay scene.

If we add another band, then we will have a 3 dimensional space. We can still visualize this and even make a 3D model (as shown below with blue data points on the tips of red pins to show location on the Band 1 - Band 2 plane) but when we add the fourth band, we must resort to arguments by analogy to the lower dimensional 2D and 3D spaces.

With the 7 spectral bands available from Thematic Mapper, a 7 dimensional space is defined by the observations.

Standardizing a set of observations means subtracting the mean from each observation and dividing the result by the standard deviation.

The mean is:

The standard deviation is:

The standardized values of each observation are then:
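In standard notation, and assuming the sample form of the standard deviation with an n - 1 divisor (some texts divide by n instead), these three quantities can be written as follows. The symbols Y_i for the i-th observation, n for the number of observations, and z_i for the standardized value are labels chosen here for illustration.

$$
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad
sd = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}, \qquad
z_i = \frac{Y_i - \bar{Y}}{sd}
$$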

As an exercise, substitute Ybar and sd into the above equation.

Source: http://rst.gsfc.nasa.gov/