Optimal Smoothing for Signal-Noise Tradeoff in Portfolio Weights
You manage a portfolio where each asset $i$ has a daily raw score $s_i(t)$. To reduce noise, you apply exponential smoothing:
$\tilde{s}_i(t) = (1-\lambda)\tilde{s}_i(t-1) + \lambda s_i(t), \quad \lambda \in (0, 1]$
Portfolio weights are set proportional to the smoothed scores and normalized so that $\sum_i |w_i(t)| = 1$.
The raw scores follow a signal-plus-noise model: $s_i(t) = u_i(t) + \varepsilon_i(t)$, where $u_i(t)$ is a slowly varying true signal and $\varepsilon_i(t)$ is i.i.d. mean-zero noise with variance $\sigma^2$.
- Derive how $\lambda$ affects expected portfolio turnover $E[\|w(t) - w(t-1)\|_1]$.
- Derive the expected signal lag (bias) of $\tilde{s}_i(t)$ relative to the true signal $u_i(t)$.
- Propose a criterion for selecting $\lambda$ to maximize net Sharpe ratio (after transaction costs).
Hints
- Treat the exponential moving average as a linear filter and write the steady-state variance of the smoothed score and its period-to-period change as functions of $\lambda$ and $\sigma^2$.
- For the bias/lag, model the slowly varying signal $u_i(t)$ locally as a linear trend and compute $E[\tilde{s}_i(t)] - u_i(t)$ using the geometric weight representation $\tilde{s}_i(t) = \lambda \sum_{k=0}^{\infty} (1-\lambda)^k s_i(t-k)$.
- For the optimal $\lambda$ criterion, express net Sharpe as gross Sharpe minus a cost term proportional to $E[\text{Turnover}]$, and note that gross Sharpe is a decreasing function of lag while turnover is an increasing function of $\lambda$.
Worked Solution
How to Think About It: This is the classic bias-variance tradeoff in disguise. A small $\lambda$ (heavy smoothing) gives you a clean, slow-moving signal -- low turnover, low transaction costs -- but you are always chasing yesterday's truth, so you lag behind when the signal moves. A large $\lambda$ (little smoothing, approaching raw scores) is responsive but noisy -- weights jump around, turnover explodes, and transaction costs eat your alpha. The interviewer wants to see you quantify both sides of this and then propose a principled way to balance them.
Key Insight: Exponential smoothing is a low-pass filter. The smoothed score $\tilde{s}_i(t)$ is a geometrically weighted average of past raw scores, so i.i.d. noise averages out (in steady state its variance is scaled by the factor $\lambda / (2 - \lambda)$) while slow signals pass through with a lag.
Part 1 -- Turnover as a function of $\lambda$:
Write the smoothed score update as: $\tilde{s}_i(t) = \tilde{s}_i(t-1) + \lambda (s_i(t) - \tilde{s}_i(t-1))$
So the change in smoothed score per period is: $\Delta \tilde{s}_i(t) = \tilde{s}_i(t) - \tilde{s}_i(t-1) = \lambda(s_i(t) - \tilde{s}_i(t-1))$
Under the stylized model with slowly varying $u_i$ (treat $u_i(t) \approx u_i(t-1)$), the signal terms cancel and $\Delta \tilde{s}_i$ is driven by the noise. Substituting the geometric-weight form of $\tilde{s}_i(t-1)$ gives: $\Delta \tilde{s}_i(t) \approx \lambda \left(\varepsilon_i(t) - \lambda\varepsilon_i(t-1) - \lambda(1-\lambda)\varepsilon_i(t-2) - \lambda(1-\lambda)^2 \varepsilon_i(t-3) - \cdots\right)$
The variance of $\tilde{s}_i(t)$ (in steady state) is: $\text{Var}(\tilde{s}_i(t)) = \sigma^2 \cdot \frac{\lambda}{2-\lambda}$
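Because $\varepsilon_i(t)$ is independent of $\tilde{s}_i(t-1)$, the variance of $\Delta \tilde{s}_i(t)$ is: $\text{Var}(\Delta \tilde{s}_i(t)) = \lambda^2 \left(\sigma^2 + \text{Var}(\tilde{s}_i(t-1))\right) = \lambda^2 \sigma^2 \left(1 + \frac{\lambda}{2-\lambda}\right) = \sigma^2 \cdot \frac{2\lambda^2}{2-\lambda}$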
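Weight changes are proportional to score changes to first order, provided the normalizer $\sum_j |\tilde{s}_j(t)|$ is roughly constant from one period to the next (for example, when it is dominated by the signal level rather than by noise). Each $E|\Delta \tilde{s}_i(t)|$ is proportional to the standard deviation $\sigma \lambda \sqrt{2/(2-\lambda)}$. Summing over $N$ assets gives a numerator that grows like $N$, and the normalizer grows like $N$ as well, so the $N$-dependence cancels to first order: $E[\|w(t) - w(t-1)\|_1] \propto \frac{\sigma}{\bar{s}} \cdot \frac{\lambda}{\sqrt{2-\lambda}}$, where $\bar{s}$ denotes the typical absolute smoothed score $E|\tilde{s}_i(t)|$.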
The key takeaway: turnover is proportional to $\lambda$ for small $\lambda$. Cutting $\lambda$ in half roughly halves turnover.
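As a sanity check on the two variance formulas, here is a minimal sketch that sums the squared filter coefficients directly and compares them with the closed forms. It assumes pure i.i.d. noise input (constant $u_i$); the function names and the truncation length `n_terms` are illustrative choices, not part of the problem.

```python
import math


def ewma_noise_variances(lam: float, sigma: float = 1.0, n_terms: int = 5000):
    """Return (Var(s~), Var(delta s~)) from truncated impulse-response sums.

    Assumes the input is i.i.d. noise with variance sigma**2, so the output
    variance is sigma**2 times the sum of squared filter coefficients.
    """
    # EWMA impulse response: h_k = lam * (1 - lam)**k
    h = [lam * (1.0 - lam) ** k for k in range(n_terms)]
    # First-differenced filter: d_0 = h_0, d_k = h_k - h_{k-1} for k >= 1
    d = [h[0]] + [h[k] - h[k - 1] for k in range(1, n_terms)]
    var_level = sigma**2 * sum(x * x for x in h)
    var_change = sigma**2 * sum(x * x for x in d)
    return var_level, var_change


def closed_forms(lam: float, sigma: float = 1.0):
    """Steady-state closed forms: sigma^2*lam/(2-lam) and 2*sigma^2*lam^2/(2-lam)."""
    return sigma**2 * lam / (2.0 - lam), 2.0 * sigma**2 * lam**2 / (2.0 - lam)


for lam in (0.05, 0.3, 1.0):
    print(lam, ewma_noise_variances(lam), closed_forms(lam))
```

The two tuples printed for each $\lambda$ should agree up to floating-point error, since the geometric sums converge long before `n_terms` for these values.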
Part 2 -- Signal lag (bias):
The EWM filter has an impulse response that decays geometrically. For a slowly varying signal $u_i(t)$, we can model it locally as a trend: $u_i(t) \approx u_i(0) + g \cdot t$ for some drift rate $g$.
The steady-state expected smoothed signal is: $E[\tilde{s}_i(t)] = \sum_{k=0}^{\infty} \lambda(1-\lambda)^k u_i(t-k) \approx u_i(t) - g \cdot \frac{1-\lambda}{\lambda}$
So the expected bias (lag) relative to the true signal is: $\text{Bias}(\lambda) = E[\tilde{s}_i(t)] - u_i(t) \approx -g \cdot \frac{1-\lambda}{\lambda}$
The effective lag in time units is $(1-\lambda)/\lambda$. For $\lambda = 0.1$, the lag is 9 days; for $\lambda = 0.5$, it is 1 day. For small $\lambda$ the bias is approximately inversely proportional to $\lambda$, since $(1-\lambda)/\lambda \approx 1/\lambda$.
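The lag formula can be checked numerically by running the recursion on a noiseless linear trend. The sketch below assumes $u(t) = g \cdot t$ with the filter initialized at $u(0) = 0$; the function name and the default step count are illustrative.

```python
def ewma_lag_on_trend(lam: float, g: float = 1.0, n_steps: int = 5000) -> float:
    """Run the EWMA on the noiseless trend u(t) = g * t and return s~(T) - u(T)."""
    smoothed = 0.0  # start at u(0) = 0
    for t in range(1, n_steps + 1):
        u_t = g * t
        smoothed = (1.0 - lam) * smoothed + lam * u_t
    return smoothed - g * n_steps


for lam in (0.1, 0.5):
    predicted = -(1.0 - lam) / lam  # bias formula with g = 1
    print(lam, ewma_lag_on_trend(lam), predicted)
```

After the start-up transient dies out (at rate $(1-\lambda)^t$), the simulated gap should settle at $-g(1-\lambda)/\lambda$.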
Part 3 -- Optimal $\lambda$ criterion:
Net Sharpe is approximately: $\text{SR}_{\text{net}}(\lambda) = \text{SR}_{\text{gross}}(\lambda) - \frac{c \cdot E[\text{Turnover}(\lambda)]}{\sigma_p}$
where $c$ is one-way transaction cost per unit turnover and $\sigma_p$ is portfolio volatility.
Gross Sharpe degrades as $\lambda$ decreases (due to lag bias). Turnover cost grows with $\lambda$. The optimal $\lambda^{*}$ balances these:
$\lambda^{*} = \arg\max_{\lambda} \left[ \text{SR}_{\text{gross}}(\lambda) - \frac{c \cdot E[\text{Turnover}(\lambda)]}{\sigma_p} \right]$
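To make the criterion concrete, here is a rough grid-search sketch of gross Sharpe minus cost-scaled turnover. Everything numeric in it is an assumption for illustration: `toy_gross_sharpe` is a hypothetical decay model (gross Sharpe falling exponentially in the lag $(1-\lambda)/\lambda$), and `turnover_scale`, the cost, and the volatility are made-up annualized inputs. In practice the gross Sharpe curve would come from backtests, not from a formula.

```python
import math
from typing import Callable, Sequence, Tuple


def relative_turnover(lam: float) -> float:
    """Shape of turnover in lam from the noise-driven analysis: lam / sqrt(2 - lam)."""
    return lam / math.sqrt(2.0 - lam)


def best_lambda(
    gross_sharpe: Callable[[float], float],
    cost_per_turnover: float,
    turnover_scale: float,
    portfolio_vol: float,
    grid: Sequence[float],
) -> Tuple[float, float]:
    """Grid-search lam maximizing gross_sharpe(lam) - c * turnover(lam) / vol."""
    if not grid:
        raise ValueError("grid must contain at least one lambda")
    best_lam, best_net = grid[0], -math.inf
    for lam in grid:
        turnover = turnover_scale * relative_turnover(lam)
        net = gross_sharpe(lam) - cost_per_turnover * turnover / portfolio_vol
        if net > best_net:
            best_lam, best_net = lam, net
    return best_lam, best_net


def toy_gross_sharpe(lam: float, sr0: float = 1.5, decay_days: float = 20.0) -> float:
    """Hypothetical model: gross Sharpe decays exponentially in the lag (1 - lam) / lam."""
    lag_days = (1.0 - lam) / lam
    return sr0 * math.exp(-lag_days / decay_days)


grid = [k / 100.0 for k in range(1, 101)]  # lam = 0.01, 0.02, ..., 1.00
lam_star, net_star = best_lambda(
    toy_gross_sharpe,
    cost_per_turnover=0.001,  # assumed 10 bps per unit turnover
    turnover_scale=50.0,      # assumed annualized turnover at lam = 1
    portfolio_vol=0.10,       # assumed annualized volatility
    grid=grid,
)
print(lam_star, net_star)
```

Passing the gross Sharpe curve in as a callable keeps the search separate from the modeling assumption, so an empirically estimated curve can be swapped in without touching the optimizer.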
In practice, a cleaner heuristic is to set the turnover budget first: given a target annual turnover $\tau$, solve for the $\lambda$ that achieves it, then verify that the implied lag $(1-\lambda)/\lambda$ is acceptable relative to your signal's half-life. Alternatively, start from the signal: if it has a half-life of $H$ days, a good rule of thumb is $\lambda \approx 1 - 2^{-1/H}$, which makes the EWMA weights themselves halve every $H$ days (since $(1-\lambda)^H = 1/2$); then check whether the resulting transaction costs are within budget.
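Both heuristics reduce to one-line or one-loop computations. The sketch below assumes turnover follows the shape $\lambda/\sqrt{2-\lambda}$ times a calibration constant `turnover_scale` (a hypothetical input you would estimate from your own portfolio); the function names are illustrative.

```python
import math


def lambda_from_half_life(half_life_days: float) -> float:
    """Return lam such that (1 - lam)**H = 1/2, i.e. EWMA weights halve every H days."""
    return 1.0 - 2.0 ** (-1.0 / half_life_days)


def lambda_from_turnover_budget(
    budget: float, turnover_scale: float, tol: float = 1e-10
) -> float:
    """Solve turnover_scale * lam / sqrt(2 - lam) = budget for lam in (0, 1] by bisection."""

    def turnover(lam: float) -> float:
        return turnover_scale * lam / math.sqrt(2.0 - lam)

    if budget >= turnover(1.0):
        return 1.0  # budget is not binding even with no smoothing
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if turnover(mid) > budget:
            hi = mid  # too much turnover: smooth more
        else:
            lo = mid
    return 0.5 * (lo + hi)


lam_h = lambda_from_half_life(10.0)
lam_b = lambda_from_turnover_budget(budget=10.0, turnover_scale=50.0)
print(lam_h, (1.0 - lam_h) / lam_h)  # lam and implied lag in days
print(lam_b, (1.0 - lam_b) / lam_b)
```

Bisection works here because $\lambda/\sqrt{2-\lambda}$ is strictly increasing on $(0, 1]$, so the budget equation has at most one root.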
Answer: Turnover $\propto \lambda / \sqrt{2-\lambda}$ (approximately linear in $\lambda$ for small $\lambda$). Signal lag $\approx (1-\lambda)/\lambda$ days. Optimal $\lambda$ is found by maximizing net Sharpe, trading off lag-induced signal decay against turnover costs. A practical approach: set a turnover budget or match $\lambda$ to signal half-life, then verify costs.
Intuition
Exponential smoothing is one of the most widely used tools in systematic trading, and this problem captures its fundamental tension. The smoothing parameter $\lambda$ controls where you sit on the bias-variance frontier: low $\lambda$ reduces noise (good) but makes you stale (bad); high $\lambda$ keeps you current but turns the portfolio over constantly (expensive). The right choice depends on your signal's decay rate relative to your transaction cost -- a fast-decaying signal cannot afford much lag and justifies paying more in costs to stay current, while a slowly decaying signal tolerates heavier smoothing and the turnover savings that come with it.
In practice, most shops choose $\lambda$ by estimating the signal half-life from historical IC decay, then computing the $\lambda$ that matches it, and back-checking whether the implied turnover is affordable. The formal optimization criterion (maximize net Sharpe) is theoretically correct but hard to estimate precisely; the half-life matching heuristic is more robust to estimation error. Knowing how to frame this tradeoff analytically -- and then admit where the theory breaks down in practice -- is exactly what a senior interviewer wants to see.