Constrained Mean-Variance Portfolio Optimization

Optimization · Hard · Free problem

You want to build a long-only portfolio of $d$ assets. Let $w \in \mathbb{R}^d$ be the vector of portfolio weights, $\Sigma$ the $d \times d$ covariance matrix of returns, and $\mu$ the $d \times 1$ vector of expected returns.

Your constraints are: - Fully invested: $\mathbf{1}^T w = 1$ - Long only: $w_i \geq 0$ for all $i$ - Minimum target return: $\mu^T w \geq r_0$ for some given $r_0 > 0$

Your objective is to minimize portfolio variance $w^T \Sigma w$.

  1. Write this as a standard quadratic program (QP). State the full set of KKT conditions.
  1. Propose a numerical method to solve this QP (e.g., projected gradient descent or an interior-point method). State the per-iteration computational complexity in terms of $d$.
  1. In practice, the expected return vector $\mu$ is estimated with error. Explain how estimation error in $\mu$ distorts the effective constraint $\mu^T w \geq r_0$ and the resulting portfolio. Propose a robust alternative -- either shrinkage of $\mu$ or adding an $\ell_2$ penalty on $w$ -- and write out the precise modified optimization problem.

Hints

  1. Start by writing the objective and all constraints in standard QP form $\min \frac{1}{2} x^T Q x + c^T x$ subject to equality and inequality constraints, then recall the general KKT conditions for such problems.
  2. For the numerical method, think about how interior-point methods handle inequality constraints via log-barrier terms. What is the dominant cost per Newton step when the problem has $d$ variables?
  3. For part (iii), consider the magnitude of estimation error in sample means relative to the signal. How does a mean-variance optimizer respond to noise in $\mu$, and what does adding $\frac{\delta}{2}\|w\|_2^2$ to the objective do to the effective covariance matrix?

Worked Solution

How to Think About It: This is the bread-and-butter optimization problem in quantitative portfolio management. The unconstrained Markowitz problem has a clean closed-form solution via Lagrange multipliers, but the moment you add long-only constraints ($w_i \geq 0$), the closed form goes away and you need numerical methods. The real-world twist is part (iii): estimated $\mu$ values are notoriously noisy, and optimizers love to load up on whichever assets have the largest estimation errors. This is why naive mean-variance optimization produces garbage portfolios in practice.

Key Insight: The QP formulation is standard, but the practical value is understanding why the optimizer "breaks" with noisy inputs and how regularization fixes it.

---

Part (i): QP Formulation and KKT Conditions

The quadratic program is:

$\min_w \frac{1}{2} w^T \Sigma w$

subject to: - $\mathbf{1}^T w = 1$ (budget constraint) - $\mu^T w \geq r_0$ (return constraint) - $w_i \geq 0$ for $i = 1, \ldots, d$ (long-only)

The Lagrangian is:

$\mathcal{L}(w, \lambda, \gamma, \nu) = \frac{1}{2} w^T \Sigma w - \lambda (\mathbf{1}^T w - 1) - \gamma (\mu^T w - r_0) - \nu^T w$

where $\lambda \in \mathbb{R}$ is the multiplier for the equality constraint, $\gamma \geq 0$ for the return constraint, and $\nu \in \mathbb{R}^d$ with $\nu_i \geq 0$ for the long-only constraints.

The KKT conditions are:

  1. Stationarity: $\Sigma w - \lambda \mathbf{1} - \gamma \mu - \nu = 0$
  2. Primal feasibility: $\mathbf{1}^T w = 1$, $\mu^T w \geq r_0$, $w_i \geq 0$
  3. Dual feasibility: $\gamma \geq 0$, $\nu_i \geq 0$
  4. Complementary slackness: $\gamma (\mu^T w - r_0) = 0$ and $\nu_i w_i = 0$ for all $i$

Since $\Sigma$ is positive semidefinite, the problem is convex. If $\Sigma$ is positive definite and a feasible point exists, the KKT conditions are necessary and sufficient for a global optimum.

---

Part (ii): Numerical Method

*Interior-point method:*

An interior-point (barrier) method handles the inequality constraints by adding logarithmic barrier terms. Replace the long-only constraints and return constraint with barrier functions:

$\min_w \frac{1}{2} w^T \Sigma w - t^{-1} \left( \sum_{i=1}^d \ln w_i + \ln(\mu^T w - r_0) \right)$

subject to $\mathbf{1}^T w = 1$, where $t > 0$ is a barrier parameter that increases over iterations.

Each iteration requires solving a linear system of the form $(\Sigma + D)\Delta w = b$, where $D$ is a diagonal-plus-rank-one correction from the barrier Hessian. This involves:

  • Forming the Hessian: $O(d^2)$
  • Solving the $d \times d$ linear system: $O(d^3)$ in general (via Cholesky factorization)

Per-iteration complexity: $O(d^3)$.

With an iterative solver (conjugate gradient) and sparse structure, this can be reduced to roughly $O(d^2)$ per iteration, but $O(d^3)$ is the standard dense case.

*Projected gradient alternative:* Each iteration computes a gradient step $w \leftarrow w - \eta \Sigma w$ (cost $O(d^2)$ for the matrix-vector product), then projects onto the constraint set $\{w : \mathbf{1}^T w = 1, w \geq 0, \mu^T w \geq r_0\}$. The projection onto this simplex-like set costs $O(d \log d)$ via sorting. So projected gradient runs at $O(d^2)$ per iteration but converges more slowly than interior-point.

---

Part (iii): Estimation Error and Robust Alternative

The expected return vector $\mu$ is estimated from historical data, typically as a sample mean $\hat{\mu}$. The estimation error in $\hat{\mu}$ is on the order of $\sigma / \sqrt{T}$ per asset, where $T$ is the sample size. This is large -- for daily returns with $T = 250$ and $\sigma \approx 0.20$ annually, the standard error of each mean return is about $0.20/\sqrt{250} \approx 1.3\%$, which is comparable to the mean itself.

The optimizer treats $\hat{\mu}$ as exact and tilts the portfolio aggressively toward assets whose returns were overestimated by chance. The constraint $\hat{\mu}^T w \geq r_0$ effectively becomes a bet on estimation noise rather than true expected returns. The result is a concentrated, unstable portfolio that performs poorly out of sample.

Robust alternative -- ridge penalty on weights:

$\min_w \frac{1}{2} w^T \Sigma w + \frac{\delta}{2} \|w\|_2^2$

subject to $\mathbf{1}^T w = 1$, $\mu^T w \geq r_0$, $w_i \geq 0$.

This is equivalent to replacing $\Sigma$ with $\Sigma + \delta I$, which shrinks the portfolio toward equal weights. The parameter $\delta > 0$ controls the strength of regularization. This has two effects: - It penalizes concentration, so the optimizer cannot load up on a single asset even if its estimated $\hat{\mu}$ is large. - It makes the effective covariance matrix better conditioned, reducing sensitivity to estimation error in both $\Sigma$ and $\mu$.

Shrinkage alternative: Replace $\hat{\mu}$ with a shrinkage estimator $\tilde{\mu} = (1 - \alpha) \hat{\mu} + \alpha \bar{\mu} \mathbf{1}$, where $\bar{\mu}$ is the grand mean of all asset returns and $\alpha \in [0, 1]$ controls shrinkage intensity. The modified problem uses $\tilde{\mu}$ in the return constraint: $\tilde{\mu}^T w \geq r_0$. This pulls extreme return estimates toward the cross-sectional average, reducing the influence of estimation noise.

---

Answer: The long-only mean-variance problem is a convex QP with $d$ nonnegativity constraints, one equality constraint, and one inequality constraint. The KKT conditions require stationarity ($\Sigma w = \lambda \mathbf{1} + \gamma \mu + \nu$), feasibility, and complementary slackness on the active constraints. An interior-point method solves it in $O(d^3)$ per iteration. To handle estimation error in $\mu$, add a ridge penalty $\frac{\delta}{2}\|w\|_2^2$ (equivalently, replace $\Sigma$ with $\Sigma + \delta I$) or shrink $\hat{\mu}$ toward a common mean.

Intuition

Mean-variance optimization is one of the most elegant ideas in finance and one of the most dangerous to use naively. The unconstrained solution is beautiful -- it inverts the covariance matrix and tilts toward high-return assets. But the optimizer has no way of knowing which "high-return" assets are genuinely good and which just got lucky in the estimation window. This is the classic "error maximization" critique of Markowitz: the optimizer finds and exploits every mistake in your inputs. The long-only constraint actually helps somewhat (it prevents extreme shorts), but it is not enough.

The practical lesson is that regularization is not optional in portfolio optimization -- it is load-bearing. Whether you shrink $\mu$ toward a common prior, add a ridge penalty on weights, or use a robust covariance estimator, the goal is the same: acknowledge that your inputs are noisy and prevent the optimizer from acting as if they are perfect. Every serious quant portfolio construction system includes some form of this. The ridge penalty $\frac{\delta}{2}\|w\|_2^2$ is especially popular because it is trivial to implement (just add $\delta I$ to $\Sigma$), it improves conditioning, and it naturally pushes portfolios toward diversification.

Open the full interactive solver →