In PCA, we want to find a set of orthogonal basis vectors that maximize the variance of the projections of the data onto them. Let $\mathbf{X}$ be the $n \times d$ matrix whose rows are the (mean-centered) data points, and let $\mathbf{w}$ be a $d \times 1$ basis vector. The optimization problem can be formulated as –

$$\max_{\mathbf{w}} \; \lVert \mathbf{X}\mathbf{w} \rVert^2 \quad \text{subject to} \quad \lVert \mathbf{w} \rVert = 1$$

Introducing the Lagrange multiplier $\lambda$, we get –

$$J(\mathbf{w}) = \lVert \mathbf{X}\mathbf{w} \rVert^2 - \lambda \left( \lVert \mathbf{w} \rVert^2 - 1 \right)$$

Rewriting the norms in terms of dot products –

$$J(\mathbf{w}) = \mathbf{w}^\intercal \mathbf{X}^\intercal \mathbf{X}\, \mathbf{w} - \lambda \left( \mathbf{w}^\intercal \mathbf{w} - 1 \right)$$

The Jacobian $\nabla J(\mathbf{w})$ is given by –

$$\nabla J(\mathbf{w}) = 2\, \mathbf{w}^\intercal \mathbf{X}^\intercal \mathbf{X} - 2 \lambda\, \mathbf{w}^\intercal$$

The maxima will occur when $\nabla J(\mathbf{w}) = 0$, which gives –

$$\mathbf{w}^\intercal \mathbf{X}^\intercal \mathbf{X} = \lambda\, \mathbf{w}^\intercal$$

Taking the transpose on both sides, we get –

$$\mathbf{X}^\intercal \mathbf{X}\, \mathbf{w} = \lambda\, \mathbf{w}$$

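As a quick numerical check of this eigenvector condition, here is a minimal sketch in Python/NumPy (not from the original post; the data matrix `X` is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # n = 100 points, d = 5 dimensions
S = X.T @ X                         # scatter matrix

# eigh is appropriate here because the scatter matrix is symmetric
eigval, eigvec = np.linalg.eigh(S)

# Each eigenvector w satisfies (X^T X) w = lambda * w
for lam, w in zip(eigval, eigvec.T):
    assert np.allclose(S @ w, lam * w)
```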
From the last equation, it is easy to see that the orthogonal basis vectors are the eigenvectors of the scatter matrix $\mathbf{X^\intercal X}$. This can be conveniently implemented in MATLAB using `[eigvec, eigval] = eig(X' * X)`, where the columns of `eigvec` are the eigenvectors and the diagonal of `eigval` holds the corresponding eigenvalues.
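An equivalent sketch in Python with NumPy (again illustrative, assuming random data that we center ourselves):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
X = X - X.mean(axis=0)                 # center the data

# Eigendecomposition of the scatter matrix, as in the MATLAB snippet
eigval, eigvec = np.linalg.eigh(X.T @ X)

# eigh returns eigenvalues in ascending order; flip to descending
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]

# Project the data onto the principal components
projections = X @ eigvec
```

A nice sanity check on this decomposition: the projected coordinates are uncorrelated, so `projections.T @ projections` is (numerically) the diagonal matrix of eigenvalues.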

Using PCA, we will get a set of at most $d$ eigenvectors with corresponding non-zero eigenvalues. An eigenvalue represents how much information about the data is captured by its principal component: the greater the eigenvalue, the more information the principal component contains.
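This "amount of information" is often reported as the explained-variance ratio, i.e. each eigenvalue divided by the sum of all eigenvalues. A small sketch of the idea in Python (the eigenvalues here are made-up numbers, and the 95% threshold is just a common convention, not from the original post):

```python
import numpy as np

eigval = np.array([4.2, 1.1, 0.5, 0.2])   # example eigenvalues, descending
ratio = eigval / eigval.sum()             # explained-variance ratio per component
cumulative = np.cumsum(ratio)             # cumulative variance explained

# Smallest number of components explaining at least 95% of the variance
k = int(np.searchsorted(cumulative, 0.95)) + 1
```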

# Resources

LaTeXed notes of the blog post are available at –