Regression Interview Questions
Regression is the single most-used tool in quant research: it is how you turn noisy data into estimated relationships, build factor models, and forecast returns. This playlist takes you from the geometric heart of OLS (projection onto a column space) through the assumptions that make estimates trust
How to think about regression questions
Regression looks like calculus — minimize a sum of squared errors — but it's really geometry. You're dropping a perpendicular from your data onto the space your model can reach, and almost every result here falls out of that one picture.
PROJECTION, NOT ALGEBRA
The least-squares fit is the orthogonal projection of y onto the column space of your predictors; the residuals are what's left over, perpendicular to everything you fit. That's why adding a regressor can never increase the in-sample residual sum of squares (the space you project onto only grows), and why the normal equations look the way they do.
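A minimal NumPy sketch of the projection picture, using synthetic data. The design matrix, coefficients, seed, and the helper name `ols_fit` are illustrative assumptions, not taken from any problem in this set. It fits by least squares, checks that the residuals are perpendicular to every fitted column, and confirms that adding a column does not raise the residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 200

# Synthetic design: intercept plus two predictors (illustrative values).
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)


def ols_fit(X, y):
    """Return (beta, residuals) for the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


beta, resid = ols_fit(X, y)

# Normal equations X'(y - X beta) = 0: the residuals are perpendicular
# to every column of X, up to floating-point error.
orthogonality = X.T @ resid

# Add an unrelated noise column: the column space grows, so the
# residual sum of squares cannot go up.
X_bigger = np.column_stack([X, rng.normal(size=n)])
_, resid_bigger = ols_fit(X_bigger, y)

rss_small = float(resid @ resid)
rss_big = float(resid_bigger @ resid_bigger)

print("max |X'e|:", np.abs(orthogonality).max())
print("RSS with 3 columns:", rss_small)
print("RSS with 4 columns:", rss_big)
```

The extra column here is pure noise, so any drop in the residual sum of squares is an in-sample effect only; it says nothing about out-of-sample fit.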
WHEN THE LINE LIES
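The clean picture bends when assumptions break. Classical noise in the predictor attenuates the slope toward zero; correlated errors leave the coefficients unbiased (given exogeneity) but invalidate the usual standard errors; outliers, especially high-leverage ones, pull the fit toward themselves. Recognizing which assumption failed, not memorizing a fix, is what these problems train.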
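The attenuation effect can be checked with a short simulation. This is a sketch under assumed conditions: a single predictor, classical measurement error independent of everything else, and made-up variances and seed. Under those assumptions the slope estimated on the noisy predictor should land near the true slope times the reliability ratio var(x) / (var(x) + var(noise)).

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 100_000

true_slope = 2.0
sigma_x = 1.0  # std of the true predictor (assumed)
sigma_u = 1.0  # std of the measurement noise (assumed)

x_true = rng.normal(scale=sigma_x, size=n)
y = true_slope * x_true + rng.normal(scale=0.5, size=n)

# What we actually observe: the predictor contaminated by noise.
x_obs = x_true + rng.normal(scale=sigma_u, size=n)


def simple_slope(x, y):
    """OLS slope of y on x with an intercept: cov(x, y) / var(x)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


slope_clean = simple_slope(x_true, y)
slope_noisy = simple_slope(x_obs, y)

# Reliability ratio: the factor by which the slope shrinks.
reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)
expected_noisy = true_slope * reliability

print("slope on true x: ", slope_clean)
print("slope on noisy x:", slope_noisy)
print("predicted attenuated slope:", expected_noisy)
```

With equal signal and noise variance the reliability ratio is one half, so the noisy-predictor slope sits near half the true slope no matter how large the sample grows.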
Once you see OLS as a projection, the bias terms, the R², and the multicollinearity headaches all read off the same diagram.
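One piece of that claim, sketched numerically: when the regression includes an intercept, R² equals the squared cosine of the angle between the centered y and the centered fitted values. The data, coefficients, and seed below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 500

# Intercept plus two predictors; coefficients are made up for illustration.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=2.0, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# Center both vectors; with an intercept, y and y_hat share the same mean.
yc = y - y.mean()
fc = y_hat - y_hat.mean()

# Usual definition: 1 - RSS / TSS.
resid = y - y_hat
r2_from_rss = 1.0 - float(resid @ resid) / float(yc @ yc)

# Geometric definition: squared cosine of the angle between yc and fc.
cos_theta = float(yc @ fc) / (np.linalg.norm(yc) * np.linalg.norm(fc))
r2_from_angle = cos_theta**2

print("R^2 from RSS/TSS:", r2_from_rss)
print("R^2 from angle:  ", r2_from_angle)
```

The two numbers agree because the centered y splits into the centered fit plus a residual perpendicular to it. Without an intercept the centered decomposition no longer holds, and the two definitions can differ.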
Regression questions (100)
- OLS Estimator Derivation
- OLS Assumptions: What Is Not Required
- Definition and Range of R-Squared
- Effect of Demeaning Variables in OLS
- Why Use Adjusted R-Squared?
- Interpreting an OLS Regression Plot
- Point Always on the No-Intercept Regression Line
- Multicollinearity Consequences in OLS
- One-Hot Encoding a Categorical Variable
- Sum of Fitted Value Deviations from the Mean
- Deriving the OLS Estimator
- OLS Slopes of Y on X vs X on Y
- Advantages of Median Regression over OLS
- Sequential vs Joint OLS with Orthogonal Regressors
- Limitations of R-Squared
- Range of Reverse Regression Slope Given Forward Slope
- Vectorized Per-Symbol OLS Regression in Pandas
- Detecting Heteroscedasticity in OLS
- Inverse Regression Prediction Bias
- Ridge Regression and Feature Scaling
- Regression with Duplicated Data
- WLS Weights When Observations Are Averages
- Effect of Doubling Sample Size on Regression
- Three Perspectives on OLS
- OLS Assumptions, Violations, and Diagnostics
- Can a Positive Correlation Produce a Negative Coefficient?
- Quantile Regression for Directional Trading
- Ridge vs. Lasso Shrinkage in the SVD Basis
- Detecting Multicollinearity in OLS
- Quantile Regression: Check Loss, KKT Conditions, and Scalable Optimization
- Generalized Linear Models
- R-Squared Invariance Under Transformations
- Omitted Variable Bias in OLS
- OLS Fitted Values with a Redundant Feature
- Leverage Points in Linear Regression
- Lasso, Ridge, and Elastic Net Comparison
- OLS Unbiasedness, Endogeneity, and Instrumental Variables
- Why OLS Requires More Observations Than Parameters
- When Lasso Equals OLS
- Sources of Bias in Regression Coefficients
- Distributed OLS Computation
- Median Regression for Heavy-Tailed Noise
- Omitted Variable Bias in a Return Factor Model
- Logistic Regression: Log-Likelihood, Gradient, and Threshold Selection
- Regression with More Features Than Observations
- OLS with Correlated Errors
- Rolling OLS Slope and T-Statistic in O(T) Time
- Implementing Ridge Regression with a Black-Box OLS Routine
- Omitted Variable Bias and Suppressor Variables
- Multicollinearity and Measurement Error in OLS
- Multicollinearity and the Variance Inflation Factor
- Cross-Sectional Factor Model Estimation and Diagnostics
- Huber Regression via Iteratively Reweighted Least Squares
- R-Squared and Feature Count in Linear Regression
- Variance Blowup Under Severe Multicollinearity
- OLS Slope When the True Relationship Is Non-Linear
- Best Subset Selection for R-Squared
- Geometric Interpretation of LARS for Lasso
- OLS with Equicorrelated Predictors
- Bias-Variance Tradeoff in OLS Model Selection
- Low R-Squared with a Highly Significant Coefficient
- Range of R-Squared in Multiple Regression
- LASSO for Return Prediction with Time-Series Validation
- R-Squared, Adjusted R-Squared, and the F-Test for Linear Restrictions
- R-Squared and Adding Redundant vs. Interaction Variables
- Linear Regression, Ridge, and Lasso
- HAC vs Two-Way Clustered Standard Errors
- Median Regression vs. OLS
- Cross-Sectional Factor Model: OLS, R-squared, and Regularization
- Regression Diagnostics: Outliers, Influence, and Multicollinearity
- Optimal Blend of Two Correlated Alpha Signals
- Ridge Regression Regularization in Time Series
- OLS Coefficient Confidence Intervals
- Stacking Six Weak NLP Signals into One Alpha
- Common Regression Pathologies and Fixes
- OLS vs. Total Least Squares: When to Minimize Perpendicular Distance
- OLS and Ridge Estimators in a Linear Factor Model
- HAR-RV Model for Realized Variance Forecasting
- L1 vs. L2 Regularization
- Ridge Regression Bias-Variance Tradeoff
- Ridge vs. Lasso in Cross-Sectional Factor Models
- Ridge and LASSO: Closed-Form Solutions and Gradient Descent
- Errors-in-Variables Regression Bias
- Lasso Coefficients Under Orthogonal Transformation
- GLS and Cochrane-Orcutt for AR(1) Errors
- Regression Coefficients for Uniform Points in a Triangle
- Closed-Form Lasso Under Orthonormal Design
- Poisson GLM with Exposure: Likelihood, IRLS, and Sandwich Errors
- Online Ridge Regression via Sherman-Morrison
- Weighted Lasso Regression and KKT Conditions
- Robust Regression for Factor Models via IRLS
- Recursive Least Squares Update
- Logistic Regression: Gradient, Hessian, and Convexity
- Stambaugh Bias in Predictive Regression
- Angle Between y and y-hat in Ridge Regression
- Recursive Least Squares with Forgetting Factor
- Loss Function Minimizers and Regression Variants
- Two-Stage Least Squares with Overidentification
- Multicollinearity Effects and Mitigation
- Fama-MacBeth Two-Pass Regression
Regression interview questions FAQ
What kind of regression questions show up in quant interviews?
This page collects 100 regression problems that recur in quant trading and research interviews, each with a full worked solution and the intuition behind it. They range from quick warmups to the harder variants firms use to separate candidates.
How hard are regression interview questions?
The set spans 15 easy, 63 medium and 22 hard problems. Most sit at medium difficulty — a few minutes of clean reasoning — with a harder tail that rewards knowing the canonical approach rather than grinding.
How should I practice regression for quant interviews?
Work through them by difficulty, starting just below your level, and write the solution out before checking. 12 are free to open with the full worked solution, so you can judge the quality first. Focus on the recurring patterns rather than memorizing answers — the same handful of ideas generate most variants.
Are these real quant interview questions?
They are a curated set drawn from our problem bank — the kind of regression question that actually appears in quant interviews, rewritten for clarity with solutions we author ourselves. We don't claim any single wording is verbatim, and every problem carries a full solution.