Ridge vs. Lasso Shrinkage in the SVD Basis
You have standardized features $X \in \mathbb{R}^{n \times p}$ and a return vector $y \in \mathbb{R}^n$. Ridge regression solves
$\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$
and Lasso solves
$\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$
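To make the two objectives concrete, here is a minimal Python sketch (using NumPy) that evaluates each one for a candidate $\beta$. The function names `ridge_objective` and `lasso_objective` are illustrative and not part of the problem. As written above, neither loss carries a $1/2$ or $1/n$ factor, so $\lambda$ here is not on the same scale as the penalty parameter in a library that normalizes the loss differently.

```python
import numpy as np

def ridge_objective(X, y, beta, lam):
    """||y - X beta||_2^2 + lam * ||beta||_2^2, exactly as stated above."""
    resid = y - X @ beta
    return float(resid @ resid + lam * (beta @ beta))

def lasso_objective(X, y, beta, lam):
    """||y - X beta||_2^2 + lam * ||beta||_1, exactly as stated above."""
    resid = y - X @ beta
    return float(resid @ resid + lam * np.sum(np.abs(beta)))
```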
(i) Derive the closed-form solution $\hat{\beta}^{\text{ridge}}$ and express it in the SVD basis of $X$. What are the shrinkage factors applied to each singular direction, and how do they depend on $\lambda$?
(ii) Contrast this with Lasso's shrinkage behavior. Why does Lasso produce sparse solutions while Ridge does not? In the orthonormal design case, write down the explicit shrinkage operator for each method.
(iii) Explain how cross-validation should be structured to select $\lambda$ and produce an honest out-of-sample $R^2$ estimate. Why is naive cross-validation (using the same CV loop for both tuning and evaluation) problematic, and how do you fix it?
(iv) In a typical quant setting with $p \gg n$ and many weak, correlated signals, which method would you default to and why?
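Once you have worked parts (i) to (iii) by hand, a numerical check can help confirm the algebra. The Python sketch below is one way to do that with NumPy, under several assumptions: the objectives are taken exactly as written above (no $1/2$ factor, which is why the Lasso threshold comes out as $\lambda/2$), all function names are hypothetical, the nested loop tunes Ridge only, and the folds are random permutations, which presumes exchangeable rows. For time-ordered returns you would likely swap the permutation for a blocked or walk-forward split.

```python
import numpy as np

def ridge_via_svd(X, y, lam):
    """Ridge for ||y - Xb||^2 + lam*||b||^2 via the thin SVD X = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Along direction v_j the least-squares coefficient is shrunk by
    # d_j^2 / (d_j^2 + lam): small-variance directions are shrunk the most.
    factors = d**2 / (d**2 + lam)
    beta = Vt.T @ (d / (d**2 + lam) * (U.T @ y))
    return beta, factors

def ridge_orthonormal(z, lam):
    """Orthonormal design (X^T X = I), z = X^T y: proportional shrinkage."""
    return z / (1.0 + lam)

def lasso_orthonormal(z, lam):
    """Orthonormal design: soft-thresholding at lam/2 (no 1/2 factor in the loss)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam / 2.0, 0.0)

def fit_predict(X_tr, y_tr, X_te, lam):
    # Standardize with training-fold statistics only, so the held-out
    # rows never influence the preprocessing.
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0
    y_bar = y_tr.mean()
    beta, _ = ridge_via_svd((X_tr - mu) / sd, y_tr - y_bar, lam)
    return y_bar + ((X_te - mu) / sd) @ beta

def cv_mse(X, y, lam, folds):
    """Mean held-out squared error of ridge at a fixed lam over the given folds."""
    errs = []
    for i, te in enumerate(folds):
        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errs.append(np.mean((y[te] - fit_predict(X[tr], y[tr], X[te], lam)) ** 2))
    return float(np.mean(errs))

def nested_cv_r2(X, y, lams, k_outer=5, k_inner=5, seed=0):
    """Out-of-sample R^2 where lam is re-selected inside every outer fold."""
    rng = np.random.default_rng(seed)
    outer = np.array_split(rng.permutation(len(y)), k_outer)
    sse = sst = 0.0
    for i, te in enumerate(outer):
        tr = np.concatenate([f for j, f in enumerate(outer) if j != i])
        # Inner loop: choose lam using only the outer-training rows.
        inner = np.array_split(rng.permutation(len(tr)), k_inner)
        best = min(lams, key=lambda lam: cv_mse(X[tr], y[tr], lam, inner))
        pred = fit_predict(X[tr], y[tr], X[te], best)
        sse += np.sum((y[te] - pred) ** 2)
        # Benchmark is the training-fold mean, so R^2 can be negative.
        sst += np.sum((y[te] - y[tr].mean()) ** 2)
    return 1.0 - sse / sst
```

The design point to notice is that the outer test rows are touched only once, for scoring, after $\lambda$ has already been fixed by the inner loop. Reporting the best inner-loop score instead would reuse the same data for selection and evaluation, which is the optimistic bias part (iii) asks about.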