PCA Factor Hedging and Residual Variance Minimization
You run PCA on the standardized returns of $d$ assets and keep the top $k$ principal components. Let $V = [v_1, \ldots, v_k]$ be the $d \times k$ matrix of factor loadings (eigenvectors).
1. Define the factor returns $f_t = V^\top r_t$ and the residual returns $u_t = r_t - V f_t$. What does $V f_t$ represent geometrically?
2. Given a portfolio with weight vector $w \in \mathbb{R}^d$, express the portfolio's factor exposure vector and its residual variance in terms of $w$, $V$, and the residual covariance matrix $\Sigma_u$.
3. Formulate an optimization problem that finds the portfolio $w$ minimizing residual variance subject to:
   - (a) Unit leverage: $\mathbf{1}^\top w = 1$ (or $\|w\|_1 = 1$ for a long-short version)
   - (b) Zero exposure to the first $m < k$ principal components
   Is this optimization problem convex? Justify your answer.
Hints
- Think about what PCA gives you geometrically -- the factor loadings define a subspace, and projecting returns onto it separates systematic from idiosyncratic risk.
- The portfolio's factor exposure is $V^\top w$, so zeroing out exposure to the first $m$ PCs means requiring $V_m^\top w = 0$, where $V_m$ holds the first $m$ columns of $V$. These are linear constraints.
- The objective $w^\top \Sigma_u w$ is a quadratic form with a PSD matrix, and all constraints are affine equalities -- classify the resulting optimization problem and solve via KKT / Lagrange multipliers.
Worked Solution
How to Think About It: PCA decomposes the covariance of asset returns into systematic factors (the top eigenvectors) and idiosyncratic residuals. When you build a portfolio, your P&L has two pieces: the part driven by common factors (market, sector rotations, etc.) and the part that is stock-specific. A stat-arb or market-neutral desk wants to hedge out the systematic piece and harvest the residual -- that is exactly what this problem sets up. The constraint "zero exposure to the first $m$ PCs" is how you say "I don't want any market or sector risk" in math. Minimizing residual variance on top of that gives you the tightest idiosyncratic portfolio possible.
Part (i): Factor and Residual Returns
The factor returns are the projection of asset returns onto the principal component directions:
$f_t = V^\top r_t \in \mathbb{R}^k$
Each component $f_{t,j} = v_j^\top r_t$ is the return of the $j$-th factor portfolio. The reconstructed systematic return is:
$V f_t = V V^\top r_t$
Geometrically, because the eigenvectors are orthonormal ($V^\top V = I_k$), $V V^\top$ is the orthogonal projection matrix onto the column space of $V$ (the subspace spanned by the top $k$ eigenvectors). So $V f_t$ is the projection of $r_t$ onto the factor subspace. The residual is the component of $r_t$ in the orthogonal complement:
$u_t = r_t - V V^\top r_t = (I - V V^\top) r_t$
This lives in the $(d - k)$-dimensional subspace orthogonal to the factors.
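The projection picture above can be checked numerically. The following is a minimal sketch on synthetic data; the sizes `T`, `d`, `k`, the random seed, and the use of `numpy.linalg.eigh` on the sample covariance are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, k = 500, 6, 2  # illustrative sizes (assumptions)

# Synthetic correlated returns, standardized column by column
raw = rng.standard_normal((T, d)) @ rng.standard_normal((d, d))
R = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# PCA: eigenvectors of the sample covariance, sorted by decreasing eigenvalue
Sigma = np.cov(R, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
V = eigvecs[:, order[:k]]                  # d x k loading matrix

F = R @ V          # row t holds f_t = V^T r_t
U = R - F @ V.T    # row t holds u_t = r_t - V f_t

P = V @ V.T        # orthogonal projector onto span(V)

print("P idempotent:        ", np.allclose(P @ P, P))
print("residuals orthogonal:", np.allclose(U @ V, 0, atol=1e-10))
print("trace(I - P) = d - k:", np.isclose(np.trace(np.eye(d) - P), d - k))
```

The trace of a projector equals the dimension of the subspace it projects onto, so the last check confirms that the residuals live in a $(d - k)$-dimensional subspace.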
Part (ii): Portfolio Factor Exposure and Residual Variance
The portfolio return is $w^\top r_t$. Split it:
$w^\top r_t = w^\top V f_t + w^\top u_t$
The factor exposure vector is:
$\beta = V^\top w \in \mathbb{R}^k$
The $j$-th entry $\beta_j = v_j^\top w$ is the portfolio's loading on the $j$-th PC. This is the number a risk manager looks at -- it tells you how much your portfolio moves when factor $j$ moves by one unit.
The residual return of the portfolio is $w^\top u_t$. Its variance is:
$\sigma_u^2(w) = w^\top \Sigma_u \, w$
where $\Sigma_u = \text{Cov}(u_t)$ is the residual covariance matrix. Since $u_t = (I - V V^\top) r_t$, we have:
$\Sigma_u = (I - V V^\top) \Sigma (I - V V^\top)$
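where $\Sigma$ is the full covariance matrix. Note that this exact $\Sigma_u$ is singular: it has rank at most $d - k$, because it annihilates the factor subspace. In practice, $\Sigma_u$ is often approximated as diagonal (idiosyncratic returns are assumed uncorrelated across stocks), which makes it invertible whenever every idiosyncratic variance is positive and makes the optimization much cheaper.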
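As a rough illustration of Part (ii), the sketch below computes $\beta = V^\top w$ and $w^\top \Sigma_u w$ for a hypothetical equal-weight portfolio on synthetic data, then cross-checks the formula for $\Sigma_u$ against the sample variance of the portfolio's residual return. The data, sizes, and weights are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, k = 1000, 5, 2  # illustrative sizes (assumptions)
R = rng.standard_normal((T, d)) @ rng.standard_normal((d, d))
R = (R - R.mean(axis=0)) / R.std(axis=0)   # standardized returns

Sigma = np.cov(R, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)
V = eigvecs[:, ::-1][:, :k]      # top-k eigenvectors (eigh sorts ascending)

M = np.eye(d) - V @ V.T          # residual projector I - V V^T
Sigma_u = M @ Sigma @ M          # residual covariance from the formula

w = np.full(d, 1.0 / d)          # hypothetical equal-weight portfolio
beta = V.T @ w                   # factor exposure vector, length k
resid_var = w @ Sigma_u @ w      # residual variance w^T Sigma_u w

# Cross-check against the sample variance of the portfolio's residual return
U = R @ M                        # rows are u_t (M is symmetric)
resid_var_sample = np.var(U @ w, ddof=1)

print("beta:", beta)
print("residual variance (formula vs sample):", resid_var, resid_var_sample)
```

The two variance numbers agree up to floating-point error because `np.cov` and `np.var(..., ddof=1)` use the same normalization.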
Part (iii): The Optimization Problem
Let $V_m = [v_1, \ldots, v_m]$ be the $d \times m$ matrix of the first $m$ factor loadings. The problem is:
$\min_{w \in \mathbb{R}^d} \; w^\top \Sigma_u \, w$
subject to:
$\mathbf{1}^\top w = 1 \quad \text{(unit leverage)}$
$V_m^\top w = 0 \quad \text{(zero exposure to first } m \text{ PCs)}$
The zero-exposure constraint $V_m^\top w = 0$ is a system of $m$ linear equations. Together with the leverage constraint, we have $m + 1$ linear equality constraints.
Convexity: Yes, this problem is convex. The argument is straightforward:
- The objective $w^\top \Sigma_u \, w$ is a quadratic form in $w$. Since $\Sigma_u$ is a covariance matrix, it is positive semidefinite, so the objective is convex (in fact, if $\Sigma_u$ is positive definite on the directions allowed by the constraints, the objective is strictly convex there and the minimizer is unique).
- All constraints are affine (linear equalities), and affine sets are convex.
- Minimizing a convex function over a convex set is a convex optimization problem.
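The three points above can be spot-checked numerically. This is only a sanity check on one random instance, not a proof; the matrix `Sigma_u`, the loadings `V_m`, and the mixing weight `t` are all hypothetical values made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 6, 2  # illustrative sizes (assumptions)

# Hypothetical PSD residual covariance and orthonormal loadings V_m
B = rng.standard_normal((d, d))
Sigma_u = B @ B.T                          # PSD by construction
V_m, _ = np.linalg.qr(rng.standard_normal((d, m)))

def f(w):
    """Residual variance w^T Sigma_u w."""
    return w @ Sigma_u @ w

# Stack the constraints as A w = b: first row is 1^T, remaining rows are V_m^T
A = np.vstack([np.ones(d), V_m.T])
b = np.concatenate([[1.0], np.zeros(m)])

# Two feasible points: a particular solution plus random null-space directions
w0 = np.linalg.lstsq(A, b, rcond=None)[0]
_, _, Vt = np.linalg.svd(A)
N = Vt[m + 1:].T                           # basis for the null space of A
w1 = w0 + N @ rng.standard_normal(d - m - 1)
w2 = w0 + N @ rng.standard_normal(d - m - 1)

t = 0.3
w_mix = t * w1 + (1 - t) * w2

print("min eigenvalue of Sigma_u:", np.linalg.eigvalsh(Sigma_u).min())
print("mixture still feasible:   ", np.allclose(A @ w_mix, b))
print("convexity inequality:     ", f(w_mix) <= t * f(w1) + (1 - t) * f(w2))
```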
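This is a quadratic program (QP) with linear equality constraints. When $\Sigma_u$ is invertible (for example, under the diagonal approximation), it has a closed-form solution via the KKT conditions (Lagrange multipliers). Introducing multipliers $\lambda$ for the leverage constraint and $\mu \in \mathbb{R}^m$ for the factor-neutrality constraints, and absorbing the factor of $\tfrac{1}{2}$ from the stationarity condition $2 \Sigma_u w = \lambda \mathbf{1} + V_m \mu$ into the multipliers:
$w^{*} = \Sigma_u^{-1} \left( \lambda \, \mathbf{1} + V_m \mu \right)$
where $\lambda$ and $\mu$ are determined by substituting back into the constraints, which gives an $(m+1) \times (m+1)$ linear system. The exact projected matrix $\Sigma_u = (I - V V^\top) \Sigma (I - V V^\top)$ is singular, so in that case the inverse does not exist and $w$ must be obtained from the full KKT linear system instead. When $\Sigma_u$ is diagonal (uncorrelated idiosyncratic returns), the solution simplifies significantly and can be computed in $O(d m^2)$ time, which is linear in $d$ for fixed $m$.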
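A minimal sketch of the closed-form solve, assuming a diagonal $\Sigma_u$ with strictly positive entries and assuming $\mathbf{1}$ is not in the span of $V_m$ (otherwise the constraints are infeasible). The function name `min_residual_variance` and the random inputs are hypothetical, introduced only for this example.

```python
import numpy as np

def min_residual_variance(diag_var, V_m):
    """Minimize sum_i diag_var[i] * w[i]**2 subject to 1^T w = 1, V_m^T w = 0.

    diag_var: length-d array of idiosyncratic variances (assumed > 0).
    V_m:      d x m matrix of loadings to neutralize.
    """
    d, m = V_m.shape
    C = np.column_stack([np.ones(d), V_m])   # d x (m+1): columns are 1 and V_m
    b = np.zeros(m + 1)
    b[0] = 1.0                               # right-hand side (1, 0, ..., 0)
    Dinv_C = C / diag_var[:, None]           # Sigma_u^{-1} [1, V_m]
    G = C.T @ Dinv_C                         # (m+1) x (m+1) system matrix
    nu = np.linalg.solve(G, b)               # stacked multipliers (lambda, mu)
    return Dinv_C @ nu                       # w* = Sigma_u^{-1}(lambda 1 + V_m mu)

rng = np.random.default_rng(3)
d, m = 8, 2  # illustrative sizes (assumptions)
V_m, _ = np.linalg.qr(rng.standard_normal((d, m)))
diag_var = rng.uniform(0.5, 2.0, size=d)

w_star = min_residual_variance(diag_var, V_m)
print("sum of weights:  ", w_star.sum())
print("factor exposures:", V_m.T @ w_star)
```

Only the small $(m+1) \times (m+1)$ system is solved densely; everything else is elementwise or a thin matrix product, so the cost grows linearly in $d$ for fixed $m$.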
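Long-short version: If the leverage constraint is $\|w\|_1 = 1$ instead of $\mathbf{1}^\top w = 1$, the problem is no longer convex. The set $\{w : \|w\|_1 = 1\}$ is the boundary of the $\ell_1$ ball, not a sublevel set: $e_1$ and $-e_1$ both lie on it, but their midpoint $0$ does not. Relaxing to $\|w\|_1 \le 1$ restores convexity but makes $w = 0$ feasible, which trivially minimizes the objective. The objective itself remains a convex quadratic, so once the sign $s_i$ of each position is fixed, the constraint becomes the linear condition $s^\top w = 1$ with $s_i w_i \ge 0$ and each such subproblem is a convex QP. The problem as a whole has no simple closed-form solution.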
Answer: The factor exposure vector is $\beta = V^\top w$, residual variance is $w^\top \Sigma_u \, w$, and the minimum-residual-variance portfolio subject to unit leverage $\mathbf{1}^\top w = 1$ and zero exposure to the first $m$ PCs is a convex QP with linear equality constraints. Convexity follows from a PSD quadratic objective and affine constraints. When $\Sigma_u$ is invertible, the closed-form solution is $w^{*} = \Sigma_u^{-1}(\lambda \mathbf{1} + V_m \mu)$ with multipliers determined by the constraints.
Intuition
This problem captures the core workflow of statistical arbitrage: decompose risk into systematic (factor) and idiosyncratic (residual) components using PCA, then build portfolios that are neutral to the big systematic drivers. The first few principal components typically capture market beta, sector tilts, and maybe a value/momentum axis. By constraining $V_m^\top w = 0$, you are saying "I don't want my portfolio to move when the market moves or when sectors rotate" -- you only want to collect the idiosyncratic spread between individual names. Minimizing residual variance on top of that keeps the remaining idiosyncratic risk as small as possible per unit of capital deployed.
The convexity result is not just a math fact -- it is practically important. It means every local minimum is a global minimum (and the optimum is unique when $\Sigma_u$ is positive definite on the feasible directions), so there are no spurious local minima to get trapped in and the solution can be computed reliably even with hundreds of assets. In production, this QP gets solved thousands of times a day as the optimizer rebalances in response to new prices. The diagonal-$\Sigma_u$ approximation (which assumes idiosyncratic returns are uncorrelated across stocks) is standard because it makes the objective separable and the solve extremely fast. The key pitfall: if your PCA is estimated on too short a window, the eigenvectors are noisy and your "hedged" portfolio can still carry significant hidden factor exposure.