Positive Semi-Definite Covariance Matrix

Linear Algebra · Easy · Free problem

Let $X = (X_1, \ldots, X_n)^T$ be a random vector with covariance matrix $\Sigma$, where $\Sigma_{ij} = \text{Cov}(X_i, X_j)$.

  1. Prove that $\Sigma$ is positive semi-definite, i.e., $\mathbf{a}^T \Sigma \mathbf{a} \geq 0$ for every $\mathbf{a} \in \mathbb{R}^n$.
  2. Under what condition is $\Sigma$ strictly positive definite?

Hints

  1. What does the quadratic form $\mathbf{a}^T \Sigma \mathbf{a}$ represent in terms of the random vector $X$?
  2. Write out $\mathbf{a}^T \Sigma \mathbf{a} = \sum_{i,j} a_i a_j \text{Cov}(X_i, X_j)$ and recognize it as $\text{Var}(\mathbf{a}^T X)$.
  3. For strict positive definiteness, ask: when can $\text{Var}(\mathbf{a}^T X) = 0$ for some $\mathbf{a} \neq \mathbf{0}$? This happens exactly when $\Sigma$ is singular.

Worked Solution

How to Think About It: The covariance matrix encodes all the pairwise variances and covariances of a random vector. The key insight is that the quadratic form $\mathbf{a}^T \Sigma \mathbf{a}$ has a direct probabilistic interpretation: it is the variance of a linear combination of the components of $X$. Since variance is always non-negative, the proof is essentially one line once you see this connection. For strict positive definiteness, you need every non-trivial linear combination to have strictly positive variance, meaning no component is an exact affine function of the others (almost surely).

Approach: Express the quadratic form as a variance and use the fact that variance is non-negative.

Formal Solution:

Let $\mathbf{a} \in \mathbb{R}^n$ be arbitrary. Consider the scalar random variable $Y = \mathbf{a}^T X = \sum_{i=1}^n a_i X_i$. Then:

$\mathbf{a}^T \Sigma \mathbf{a} = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \text{Cov}(X_i, X_j) = \text{Var}\left(\sum_{i=1}^n a_i X_i\right) = \text{Var}(Y)$

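The middle equality is bilinearity of covariance applied to $\text{Var}(Y) = \text{Cov}\left(\sum_i a_i X_i, \sum_j a_j X_j\right)$. Because variance is always non-negative, we have $\mathbf{a}^T \Sigma \mathbf{a} \geq 0$ for all $\mathbf{a} \in \mathbb{R}^n$. This is exactly the definition of positive semi-definiteness, so $\Sigma$ is PSD. $\square$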
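The identity can also be checked numerically. The sketch below is only an illustration: the data, the seed, and the names `samples`, `sigma_hat`, and `a` are assumptions made for the example, not part of the problem. It relies on the fact that the same identity holds exactly for the sample covariance and sample variance when both use the same normalization (`ddof=1`).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility

# Hypothetical data: 1000 draws of a 4-dimensional vector with correlated components.
mixing = rng.normal(size=(4, 4))
samples = rng.normal(size=(1000, 4)) @ mixing.T

sigma_hat = np.cov(samples, rowvar=False)   # 4x4 sample covariance (ddof=1 by default)

a = rng.normal(size=4)                      # an arbitrary direction in R^4
quad_form = a @ sigma_hat @ a               # a^T Sigma a
var_of_combo = np.var(samples @ a, ddof=1)  # Var(a^T X), same normalization

print("a^T Sigma a :", quad_form)
print("Var(a^T X)  :", var_of_combo)
print("agree       :", np.isclose(quad_form, var_of_combo))
```

Any other choice of `a` should give the same agreement, since the two quantities are algebraically identical and differ only by floating-point rounding.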

Strict Positive Definiteness:

$\Sigma$ is strictly positive definite if and only if $\mathbf{a}^T \Sigma \mathbf{a} > 0$ for every $\mathbf{a} \neq \mathbf{0}$. From the identity above, this is equivalent to requiring that no non-trivial linear combination $\mathbf{a}^T X$ has zero variance. A random variable has zero variance if and only if it is constant (almost surely), so the condition is:

$\text{No } \mathbf{a} \neq \mathbf{0} \text{ such that } \sum_{i} a_i X_i = c \text{ a.s. for some constant } c$

Equivalently, $\Sigma$ is strictly positive definite if and only if $\text{rank}(\Sigma) = n$, that is, the components of $X$ satisfy no exact affine relation (a linear relation up to an additive constant).
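A small simulation can make the degenerate case concrete. In this sketch the third component is built as an exact affine function of the first two; the coefficients, the seed, and the rank tolerance `1e-8` are all choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed

x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = 2.0 * x1 - x2 + 5.0        # exact affine function of x1 and x2

samples = np.column_stack([x1, x2, x3])
sigma_hat = np.cov(samples, rowvar=False)

# 2*x1 - x2 - x3 equals -5 for every draw, so this direction has zero variance.
a = np.array([2.0, -1.0, -1.0])
zero_variance = a @ sigma_hat @ a

# Explicit tolerance (an assumption) so rounding noise is not counted as rank.
rank = np.linalg.matrix_rank(sigma_hat, tol=1e-8)
smallest_eig = np.linalg.eigvalsh(sigma_hat)[0]  # eigenvalues are returned in ascending order

print("a^T Sigma a        :", zero_variance)
print("rank of Sigma      :", rank)
print("smallest eigenvalue:", smallest_eig)
```

The quadratic form and the smallest eigenvalue come out at the level of floating-point noise, and the rank drops below 3, which is the singular case described above.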

Answer: $\mathbf{a}^T \Sigma \mathbf{a} = \text{Var}(\mathbf{a}^T X) \geq 0$ for all $\mathbf{a}$, so every covariance matrix is PSD. It is strictly positive definite if and only if $\Sigma$ has full rank, which means no component $X_i$ is an exact affine function of the others.
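As a concrete check of the answer (an illustrative special case, not part of the original solution), take $n = 2$ with standard deviations $\sigma_1, \sigma_2 > 0$ and correlation $\rho$, symbols introduced here for the example. Then:

$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$

$\mathbf{a}^T \Sigma \mathbf{a} = a_1^2 \sigma_1^2 + 2 \rho a_1 a_2 \sigma_1 \sigma_2 + a_2^2 \sigma_2^2 = (a_1 \sigma_1 + \rho a_2 \sigma_2)^2 + (1 - \rho^2) a_2^2 \sigma_2^2$

Both terms are non-negative because $|\rho| \leq 1$, so $\Sigma$ is PSD. If $|\rho| < 1$, the form vanishes only at $\mathbf{a} = \mathbf{0}$, so $\Sigma$ is strictly positive definite. If $|\rho| = 1$, the choice $a_2 = 1$, $a_1 = -\rho \sigma_2 / \sigma_1$ gives zero, which matches $\det \Sigma = \sigma_1^2 \sigma_2^2 (1 - \rho^2) = 0$ and corresponds to $X_2$ being an exact affine function of $X_1$.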

Intuition

This result connects two seemingly different mathematical facts: the algebraic property of positive semi-definiteness (a condition on quadratic forms) and the probabilistic fact that variance cannot be negative. The proof is elegant because it bridges linear algebra and probability in a single identity. Once you see that $\mathbf{a}^T \Sigma \mathbf{a}$ is just the variance of a portfolio, the result is obvious -- a portfolio's risk (variance) can be zero but never negative.

In practice, this matters constantly in quantitative finance. Portfolio optimization minimizes a quadratic form $\mathbf{w}^T \Sigma \mathbf{w}$ over weight vectors $\mathbf{w}$, and the PSD property guarantees that this objective is convex and bounded below by zero. When $\Sigma$ is only semi-definite (not strictly PD), which happens for a sample covariance matrix estimated with more assets than observations, or when some assets are exact linear combinations of others, the minimizer is no longer unique and the optimizer can find portfolios with apparently zero variance, often artifacts of the singular estimate. This is why sample covariance matrices in high-dimensional settings are typically regularized (shrinkage, factor models) before being fed into an optimizer.
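The high-dimensional failure mode is easy to reproduce. The sketch below uses made-up Gaussian "returns" with more assets than observations and a simple shrinkage toward a scaled identity. The helper `shrink_to_identity` and the hand-picked weight `delta=0.2` are hypothetical, chosen for illustration only; this is not a data-driven estimator such as Ledoit-Wolf.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary fixed seed

n_assets, n_obs = 10, 5                        # more assets than observations
returns = rng.normal(size=(n_obs, n_assets))   # hypothetical return data

sample_cov = np.cov(returns, rowvar=False)     # rank is at most n_obs - 1

def shrink_to_identity(cov, delta):
    """Blend cov with a scaled identity; delta in (0, 1] is the shrinkage weight."""
    n = cov.shape[0]
    target = (np.trace(cov) / n) * np.eye(n)   # identity scaled to the average variance
    return (1.0 - delta) * cov + delta * target

shrunk_cov = shrink_to_identity(sample_cov, delta=0.2)  # delta picked by hand

rank_before = np.linalg.matrix_rank(sample_cov, tol=1e-8)
min_eig_before = np.linalg.eigvalsh(sample_cov)[0]
min_eig_after = np.linalg.eigvalsh(shrunk_cov)[0]

# Cholesky raises LinAlgError unless the matrix is positive definite.
chol = np.linalg.cholesky(shrunk_cov)

print("rank before shrinkage:", rank_before, "of", n_assets)
print("min eigenvalue before:", min_eig_before)
print("min eigenvalue after :", min_eig_after)
```

Because the shrinkage target is a positive multiple of the identity, every eigenvalue of the blended matrix is lifted by at least `delta * trace(cov) / n`, which is what restores strict positive definiteness here.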
