OLS vs Ridge vs Lasso Regression

Regression · Medium · Free problem
You are building a factor model for stock returns. You have $p = 200$ candidate factors and $n = 500$ daily observations. Let $X$ be the $n \times p$ design matrix and $Y$ the $n \times 1$ vector of returns.

1. Derive the OLS estimator $\hat{\beta}_{\text{OLS}}$ in closed form. Under what conditions does it break down, and why is $p = 200$, $n = 500$ already a warning sign?

2. Write down the Ridge regression objective (with penalty parameter $\lambda$) and derive its closed-form solution. Then show that Ridge is equivalent to the MAP estimator under Bayesian linear regression with a specific Gaussian prior on $\beta$. What is that prior?

3. Why might Lasso ($L^1$ penalty) be preferred over Ridge for this factor model? What practical advantage does it give you that Ridge cannot?
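A rough numerical sketch can help sanity-check whatever closed forms you derive. It is not part of the problem and rests on several assumptions: the data are synthetic Gaussian draws rather than real returns, only the first five factors carry signal, the Ridge objective is taken to be $\|Y - X\beta\|^2 + \lambda\|\beta\|^2$ (other scalings of $\lambda$ are common), and the names `ols`, `ridge`, and `beta_true` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 200  # sizes from the problem statement

# Assumption: synthetic data in which only the first 5 factors matter.
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + rng.standard_normal(n)


def ols(X, y):
    """Solve the normal equations X'X b = X'y (needs X'X invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def ridge(X, y, lam):
    """Minimize ||y - Xb||^2 + lam * ||b||^2 (assumed convention)."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)


beta_ols = ols(X, y)
beta_ridge = ridge(X, y, lam=50.0)

# Ridge shrinks the coefficient vector toward zero relative to OLS,
# but it does not set individual coefficients exactly to zero.
print("||beta_ols||   =", np.linalg.norm(beta_ols))
print("||beta_ridge|| =", np.linalg.norm(beta_ridge))
print("nonzero ridge coefficients:", np.count_nonzero(beta_ridge))
print("cond(X'X) =", np.linalg.cond(X.T @ X))
```

Setting `lam=0.0` in `ridge` should reproduce the OLS fit, which is a quick check on the derivation in part 2. Lasso is left out here because it has no closed form for a general design matrix and needs an iterative solver.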
