Net Sharpe Ratio With Transaction Costs

Finance · Medium · Free problem

A strategy produces i.i.d. daily gross returns $g_t \sim N(\mu, \sigma^2)$. The strategy executes with average daily turnover $\tau$ (fraction of portfolio traded per day). A proportional transaction cost of $c$ per dollar traded reduces each day's return, so net daily returns are:

$r_t = g_t - c \cdot \tau$

You run a backtest of length $n$ days and compute the sample mean $\bar{r}$ and sample standard deviation $\hat{\sigma}_r$ of the net returns.

Derive the net sample Sharpe ratio $\hat{S}_{\text{net}} = \bar{r} / \hat{\sigma}_r$ and express the population net Sharpe ratio $S_{\text{net}}$ in terms of $\mu$, $\sigma$, $c$, and $\tau$.

Construct a large-sample $t$-statistic for testing $H_0: S_{\text{net}} \leq 0$ versus $H_1: S_{\text{net}} > 0$. What is the distribution of this statistic under the null, and what sample size $n$ do you need to detect a given Sharpe ratio at a specified significance level?

Now suppose you estimate Sharpe using overlapping $k$-day returns instead of daily returns. Explain why this introduces bias and autocorrelation, and discuss how it affects the $t$-statistic you derived.

Hints

Transaction costs shift the mean return but leave the variance unchanged -- what does that do to the Sharpe ratio formula?
For large $n$, the sample Sharpe ratio is approximately normally distributed: $\hat{S} \sim N(S, (1 + S^2/2)/n)$. Use this to build your $t$-test and solve for required sample size.
Overlapping $k$-day returns share $k-1$ days of data, creating an MA($k-1$) autocorrelation structure. Think about what that does to the effective sample size in your $t$-statistic.

Worked Solution

How to Think About It: Transaction costs are the silent killer of backtested strategies. A strategy with a gross Sharpe of 2.0 can easily become mediocre after costs if it turns over aggressively. The key insight is dead simple: costs shift the mean return down by $c\tau$ per day but leave the volatility unchanged (since $c\tau$ is a constant subtracted every day). So the net Sharpe ratio is just the gross Sharpe minus a "cost drag" term. The statistical question is then: given finite sample noise, can you distinguish this net Sharpe from zero? That is where the $t$-statistic comes in.

Quick Estimate: Suppose $\mu = 5$ bps/day, $\sigma = 100$ bps/day, turnover $\tau = 0.10$, cost $c = 10$ bps per dollar traded. Then daily cost drag is $c\tau = 1$ bp. Net mean return: $5 - 1 = 4$ bps. Net Sharpe per day: $4/100 = 0.04$. Annualized net Sharpe: $0.04 \times \sqrt{252} \approx 0.63$. For the expected $t$-statistic just to reach the 5% one-sided critical value ($z_{0.05} = 1.645$), which gives only about a 50% chance of rejecting, you need roughly $n \geq (1.645 / 0.04)^2 \approx 1,691$ trading days, about 6.7 years. That is the cruel reality of Sharpe ratio inference -- you need long track records.

Approach: We derive the population net Sharpe, then use the asymptotic distribution of the sample Sharpe ratio to construct the test.

Formal Solution:

Part 1 -- Net Sharpe Ratio:

Since $r_t = g_t - c\tau$, the net return has:

$E[r_t] = \mu - c\tau, \qquad \text{Var}(r_t) = \sigma^2$

The constant cost $c\tau$ shifts the mean but does not change the variance. The population net Sharpe ratio (daily) is:

$S_{\text{net}} = \frac{\mu - c\tau}{\sigma}$

The sample estimator is:

$\hat{S}_{\text{net}} = \frac{\bar{r}}{\hat{\sigma}_r}$

where $\bar{r} = \frac{1}{n}\sum_{t=1}^n r_t$ and $\hat{\sigma}_r = \sqrt{\frac{1}{n-1}\sum_{t=1}^n (r_t - \bar{r})^2}$.

Note that the gross Sharpe ratio is $S_{\text{gross}} = \mu / \sigma$, so:

$S_{\text{net}} = S_{\text{gross}} - \frac{c\tau}{\sigma}$

The cost drag on Sharpe is $c\tau / \sigma$ -- it depends on cost, turnover, and volatility. A low-vol strategy with high turnover gets destroyed.
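The decomposition above can be checked numerically. The following is a minimal sketch using the Quick Estimate inputs; the function names `net_sharpe` and `cost_drag` and the 252-day annualization constant are illustrative choices, not part of the problem statement.

```python
import math

TRADING_DAYS = 252  # annualization convention used in the Quick Estimate


def net_sharpe(mu, sigma, cost, turnover):
    """Daily population net Sharpe ratio: (mu - c * tau) / sigma."""
    return (mu - cost * turnover) / sigma


def cost_drag(sigma, cost, turnover):
    """Sharpe lost to transaction costs: c * tau / sigma."""
    return cost * turnover / sigma


# Quick Estimate inputs expressed as decimals (1 bp = 1e-4).
mu, sigma = 5e-4, 100e-4
cost, turnover = 10e-4, 0.10

s_gross = mu / sigma
s_net = net_sharpe(mu, sigma, cost, turnover)
drag = cost_drag(sigma, cost, turnover)
s_net_annual = s_net * math.sqrt(TRADING_DAYS)

print(f"gross={s_gross:.3f} drag={drag:.3f} net={s_net:.3f} "
      f"annualized={s_net_annual:.2f}")
```

Halving `sigma` while holding cost and turnover fixed doubles the drag, which is the "low-vol, high-turnover" failure mode described above.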

Part 2 -- $t$-Statistic:

Since the $r_t$ are i.i.d. normal, $\bar{r}$ is exactly $N(\mu - c\tau,\; \sigma^2 / n)$ and $\hat{\sigma}_r^2$ is an independent scaled chi-squared. The ratio $\hat{S}_{\text{net}} = \bar{r}/\hat{\sigma}_r$ is a $t$-type statistic.

More precisely, define:

$t = \frac{\bar{r}}{\hat{\sigma}_r / \sqrt{n}} = \sqrt{n} \cdot \hat{S}_{\text{net}}$

Under $H_0: \mu - c\tau = 0$, this follows a $t_{n-1}$ distribution (or approximately $N(0,1)$ for large $n$). Under the alternative with true net Sharpe $S_{\text{net}}$, it follows a non-central $t$ with non-centrality parameter $\delta = \sqrt{n} \cdot S_{\text{net}}$, which for large $n$ is approximately:

$t \approx N\left(\sqrt{n} \cdot S_{\text{net}},\; 1 + \frac{S_{\text{net}}^2}{2}\right)$

The variance correction $1 + S_{\text{net}}^2/2$ comes from the asymptotic variance of the sample Sharpe ratio (Lo, 2002), but for typical Sharpe ratios ($S_{\text{net}} < 0.1$ daily), this is nearly 1 and can be ignored.
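A small Monte Carlo run can illustrate how the statistic $t = \sqrt{n} \cdot \hat{S}_{\text{net}}$ behaves under the null and under an alternative. This is only a sketch: the daily volatility of 1%, the trial counts, and the seed are arbitrary assumptions, and the normal critical value 1.645 stands in for the exact $t_{n-1}$ quantile.

```python
import math
import random


def sharpe_t_stat(returns):
    """t = sqrt(n) * sample mean / sample std (n - 1 in the denominator)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def rejection_rate(s_net, n_days, n_trials, z_crit=1.645, seed=0):
    """Fraction of simulated i.i.d. normal backtests with t > z_crit."""
    rng = random.Random(seed)
    sigma = 0.01  # assumed daily volatility; only the ratio mean/sigma matters
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(s_net * sigma, sigma) for _ in range(n_days)]
        if sharpe_t_stat(sample) > z_crit:
            rejections += 1
    return rejections / n_trials


# Under the boundary of the null (S_net = 0) the rate should sit near 5%.
size = rejection_rate(0.0, n_days=250, n_trials=1000)
# With S_net = 0.04 and n = 1691 the mean of t is about 1.645, so the
# rejection rate should sit near 50%.
power = rejection_rate(0.04, n_days=1691, n_trials=400)

print(f"size={size:.3f} power={power:.3f}")
```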

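For a one-sided test at significance level $\alpha$, reject $H_0$ when $t > z_\alpha$, where $z_\alpha = \Phi^{-1}(1 - \alpha)$. Writing $\beta$ for the desired power itself (not the type II error rate) and $z_\beta = \Phi^{-1}(\beta)$, the required sample size against true Sharpe $S_{\text{net}}$ is: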

$n \geq \left(\frac{z_\alpha + z_\beta}{S_{\text{net}}}\right)^2$

This is the key formula. With $\alpha = 0.05$, $\beta = 0.80$ (so $z_\alpha = 1.645$, $z_\beta = 0.842$), and daily $S_{\text{net}} = 0.04$, you need $n \geq (2.487/0.04)^2 \approx 3,866$ days -- about 15 years for 80% power. Sharpe ratio inference requires patience.
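The sample-size formula is easy to wrap in a helper. The sketch below uses the standard library's `statistics.NormalDist` for the normal quantiles; the function name `required_days` and its `power` argument (the quantity written as $\beta$ above) are illustrative, and the small $1 + S_{\text{net}}^2/2$ variance correction is ignored.

```python
import math
from statistics import NormalDist


def required_days(s_net, alpha=0.05, power=0.80):
    """Smallest n with n >= ((z_alpha + z_beta) / s_net) ** 2.

    s_net is the daily net Sharpe ratio; alpha is the one-sided
    significance level; power is the target probability of rejecting.
    """
    if s_net <= 0:
        raise ValueError("s_net must be positive to be detectable")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / s_net) ** 2)


n_80 = required_days(0.04)             # 80% power
n_50 = required_days(0.04, power=0.5)  # z_beta = 0, as in the Quick Estimate

print(n_80, round(n_80 / 252, 1), n_50, round(n_50 / 252, 1))
```

Because the helper uses unrounded quantiles, its result for 80% power can differ by a day or two from the hand calculation with $z_\alpha = 1.645$ and $z_\beta = 0.842$.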

Part 3 -- Overlapping Returns Bias:

Suppose you form overlapping $k$-day returns: $R_t^{(k)} = \sum_{j=0}^{k-1} r_{t+j}$. These have mean $k(\mu - c\tau)$ and variance $k\sigma^2$, so the $k$-day Sharpe ratio is $\sqrt{k} \cdot S_{\text{net}}$, which is just the square-root-of-time scaling.

The problem is that overlapping returns are serially correlated even when the underlying daily returns are i.i.d. Consecutive $k$-day windows share $k-1$ daily returns, inducing an MA($k-1$) structure with autocorrelation:

$\rho_h = \frac{k - h}{k} \quad \text{for } 0 < h < k, \qquad \rho_h = 0 \text{ for } h \geq k$

This means:

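The sample mean is unbiased, but its variance is no longer $\sigma_k^2 / n$, where $\sigma_k^2 = k\sigma^2$ is the variance of a $k$-day return. The effective number of independent observations is roughly $n/k$, not $n$.
The naive standard error $\hat{\sigma}_k / \sqrt{n}$, built from the sample standard deviation $\hat{\sigma}_k$ of the $k$-day returns, treats the overlapping observations as independent and so understates the true sampling variability of $\bar{R}$ by a factor of roughly $\sqrt{k}$. The sample standard deviation itself is also biased slightly downward in finite samples, because positively autocorrelated observations cluster around their own sample mean.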
The naive $t$-statistic $\sqrt{n} \cdot \hat{S}$ is inflated by a factor of roughly $\sqrt{k}$, making strategies look more significant than they are.
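The $\sqrt{k}$ factor follows from summing the autocorrelations: for large $n$, the variance of the sample mean is inflated by $1 + 2\sum_{h=1}^{k-1} \rho_h$, and with $\rho_h = (k-h)/k$ that sum equals $k$. A short sketch that checks the identity numerically (the function names are illustrative):

```python
import math


def overlap_autocorr(h, k):
    """Autocorrelation at lag h of overlapping k-day sums of i.i.d. returns."""
    h = abs(h)
    return (k - h) / k if h < k else 0.0


def variance_inflation(k):
    """Large-sample ratio of the true variance of the mean to the naive one."""
    return 1 + 2 * sum(overlap_autocorr(h, k) for h in range(1, k))


for k in (1, 5, 21):
    inflation = variance_inflation(k)
    # The t-statistic is overstated by the square root of the inflation.
    print(k, round(inflation, 6), round(math.sqrt(inflation), 3))
```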

To correct for this, use Newey-West or Hansen-Hodrick standard errors that account for the known MA($k-1$) structure, or simply use non-overlapping returns and accept the smaller sample size. Backtests that report Sharpe ratios from overlapping returns without such a correction overstate significance.
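As a rough illustration of the correction, the sketch below simulates i.i.d. daily returns, forms overlapping $k$-day sums, and compares the naive $t$-statistic with one based on an equal-weight (Hansen-Hodrick-style) long-run variance truncated at lag $k-1$. The simulation inputs, seed, and function names are assumptions for illustration; a production analysis would normally rely on a tested HAC implementation.

```python
import math
import random


def overlapping_sums(daily, k):
    """Overlapping k-day returns R_t = r_t + ... + r_{t+k-1}."""
    window = sum(daily[:k])
    out = [window]
    for t in range(k, len(daily)):
        window += daily[t] - daily[t - k]
        out.append(window)
    return out


def autocov(x, h):
    """Sample autocovariance at lag h (divides by len(x))."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n


def t_stats(daily, k):
    """Return (naive, adjusted) t-statistics for the mean k-day return."""
    x = overlapping_sums(daily, k)
    n = len(x)
    mean = sum(x) / n
    gamma0 = autocov(x, 0)
    # Equal-weight long-run variance over the known MA(k-1) lags.
    lrv = gamma0 + 2 * sum(autocov(x, h) for h in range(1, k))
    naive = mean / math.sqrt(gamma0 / n)
    adjusted = mean / math.sqrt(lrv / n) if lrv > 0 else float("nan")
    return naive, adjusted


random.seed(0)
k = 5
daily = [random.gauss(4e-4, 1e-2) for _ in range(5000)]
naive, adjusted = t_stats(daily, k)
ratio = naive / adjusted  # should be close to sqrt(k)

print(f"naive={naive:.2f} adjusted={adjusted:.2f} "
      f"ratio={ratio:.2f} sqrt(k)={math.sqrt(k):.2f}")
```

The equal-weight estimator can turn negative in small samples, which is why the code guards against it; Bartlett-weighted Newey-West estimates are always non-negative but downweight the higher lags.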

Answer: The population net Sharpe ratio is $S_{\text{net}} = (\mu - c\tau)/\sigma = S_{\text{gross}} - c\tau/\sigma$. The $t$-statistic is $t = \sqrt{n} \cdot \hat{S}_{\text{net}}$, which is approximately $N(0,1)$ under the null for large $n$. Required sample size is $n \geq (z_\alpha + z_\beta)^2 / S_{\text{net}}^2$. Overlapping returns inflate the $t$-statistic by roughly $\sqrt{k}$ due to artificial serial correlation, requiring HAC-adjusted standard errors for valid inference.

Intuition

The core lesson here is that transaction costs are a constant drag that erodes Sharpe ratios in a very predictable way -- the net Sharpe is just the gross Sharpe minus $c\tau/\sigma$. This formula is one of the most practically useful in quant finance because it tells you immediately whether a high-turnover strategy can survive real-world frictions. A strategy with a gross daily Sharpe of 0.05 and a cost drag of 0.03 only has a net Sharpe of 0.02, which is nearly undetectable statistically and probably not worth trading.

The statistical side is equally important and often misunderstood. The sample Sharpe ratio converges painfully slowly. By the same $n \geq (z_\alpha / S)^2$ rule used in the Quick Estimate, an annualized Sharpe of 1.0 (daily $S \approx 0.063$) needs roughly 680 trading days, about 2.7 years, just for the expected $t$-statistic to reach the 5% one-sided threshold, and an annualized Sharpe of 0.5 needs four times as long, about 11 years. This is one reason backtested strategies so often fail to hold up: many rest on too little data to be statistically significant after costs. The overlapping returns trap makes it worse by artificially inflating $t$-statistics, giving a false sense of precision. Whenever you see a backtest result, your first question should be: how many independent observations went into this Sharpe estimate?
