Problems With L0 Regularization

Optimization · Medium · Free problem

In regression, we sometimes want to penalize the number of non-zero coefficients directly. The $L_0$ "norm" of a coefficient vector $\beta$ counts how many entries are non-zero: $\|\beta\|_0 = \#\{j : \beta_j \neq 0\}$ The $L_0$-regularized regression problem is: $\min_{\beta} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_0$ 1. Why is this problem fundamentally harder to solve than $L_1$ or $L_2$ regularized regression? 2. What specific mathematical properties make $L_0$ regularization intractable for standard optimization algorithms? 3. Why is $L_1$ (Lasso) used as a practical substitute, and in what sense is it the "best" convex substitute for $L_0$?

Open the full interactive solver, hints, and worked solution →