Profit-Aware Classification Threshold

Machine Learning · Hard · Free problem

You have fit a logistic regression model for predicting whether the next-day return $r_{t+1}$ is positive. The model outputs $\hat{p}_t = P(r_{t+1} > 0 \mid x_t)$, where $x_t$ is your feature vector.

Your trading rule is:

- If $\hat{p}_t > \theta$: go long (trade), earning P&L $= r_{t+1} - c$, where $c > 0$ is the per-trade cost.
- If $\hat{p}_t \leq \theta$: do nothing, P&L $= 0$.

Derive the optimal threshold $\theta^{*}$ that maximizes expected profit per period.
Explain why $\theta^{*} \neq 0.5$ in general, and give intuition for the direction of the shift.

Hints

Write expected profit as $\Pi(\theta) = \int_{\theta}^1 (\mu(p) - c) f(p) \, dp$ where $\mu(p) = E[r_{t+1} \mid \hat{p}_t = p]$. The optimal $\theta$ is where the integrand equals zero.
Differentiate $\Pi(\theta)$ with respect to $\theta$ using Leibniz's rule. The first-order condition gives $\mu(\theta^{*}) = c$.
With constant up/down magnitudes $\mu^+$ and $\mu^-$, the threshold becomes $\theta^{*} = (c + \mu^-)/(\mu^+ + \mu^-)$, which exceeds $0.5$ whenever $2c > \mu^+ - \mu^-$. In the symmetric case $\mu^+ = \mu^-$, this holds for any $c > 0$.

Worked Solution

How to Think About It: This is a decision theory problem dressed up as trading. The standard classification threshold of $0.5$ minimizes misclassification error, but that is not what we care about -- we care about expected profit. The asymmetry comes from the cost $c$: every time you trade, you pay $c$ regardless of whether the return is positive or negative. So you need the expected return conditional on trading to exceed $c$, not just be positive. This pushes the optimal threshold above $0.5$.

Quick Estimate: If $c = 0$, you should trade whenever $\hat{p}_t > 0.5$ (any positive expected return is worth taking for free). If $c$ is large, you should be more selective -- only trade when you are very confident the return is positive. So $\theta^{*}$ increases with $c$. In the symmetric case where up and down moves have equal magnitude $\mu$, the expected return at probability $p$ is $p \mu - (1 - p) \mu = (2p - 1)\mu$, so the break-even condition is $(2\theta^{*} - 1)\mu = c$, giving $\theta^{*} = 0.5 + c/(2\mu)$.

Approach: Write the expected profit as a function of $\theta$, take the derivative with respect to $\theta$, and find the optimum.

Formal Solution:

**(i) Deriving $\theta^{*}$:**

The expected profit per period is: $\Pi(\theta) = E\left[(r_{t+1} - c) \cdot \mathbf{1}(\hat{p}_t > \theta)\right]$

Using the law of iterated expectations, condition on $\hat{p}_t$: $\Pi(\theta) = E\left[E[r_{t+1} - c \mid \hat{p}_t] \cdot \mathbf{1}(\hat{p}_t > \theta)\right]$

Let $\mu(p) = E[r_{t+1} \mid \hat{p}_t = p]$ denote the expected return conditional on the model's predicted probability. Then: $\Pi(\theta) = \int_{\theta}^{1} (\mu(p) - c) \, f(p) \, dp$

where $f(p)$ is the density of $\hat{p}_t$.
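As a rough illustration, the sample analogue of $\Pi(\theta)$ can be computed directly from a history of predictions and realized returns. The sketch below is a minimal version under stated assumptions: the function name `expected_profit` and the toy inputs are hypothetical (not part of the problem), and a plain sample average stands in for the expectation.

```python
from typing import Sequence


def expected_profit(p_hat: Sequence[float], returns: Sequence[float],
                    theta: float, c: float) -> float:
    """Sample average of (r - c) * 1(p_hat > theta) across all periods."""
    if len(p_hat) != len(returns):
        raise ValueError("p_hat and returns must have the same length")
    if len(p_hat) == 0:
        raise ValueError("need at least one period")
    # Periods with p_hat <= theta contribute zero P&L but still count
    # in the denominator, because profit is measured per period.
    traded_pnl = sum(r - c for p, r in zip(p_hat, returns) if p > theta)
    return traded_pnl / len(p_hat)


# Toy, made-up inputs purely for illustration.
p_hat = [0.30, 0.55, 0.62, 0.80]
returns = [-0.010, 0.004, -0.002, 0.012]
profit = expected_profit(p_hat, returns, theta=0.6, c=0.001)
print(round(profit, 6))  # 0.002
```

Only the last two periods clear the threshold, so the total P&L is $(-0.002 - 0.001) + (0.012 - 0.001) = 0.008$, spread over four periods.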

Differentiating with respect to $\theta$ using Leibniz's rule: $\frac{d\Pi}{d\theta} = -(\mu(\theta) - c) \cdot f(\theta)$

Setting this to zero (and assuming $f(\theta) > 0$): $\mu(\theta^{*}) = c$

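The optimal threshold $\theta^{*}$ is the value of $\hat{p}_t$ at which the conditional expected return equals the trading cost. This stationary point is a maximum provided $\mu(p)$ is increasing in $p$ (a higher predicted probability implies a higher expected return), because then $\frac{d\Pi}{d\theta}$ is positive below $\theta^{*}$ and negative above it.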
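When $\mu(p)$ has no convenient closed form, the condition $\mu(\theta^{*}) = c$ can be solved numerically. The following is a minimal sketch, assuming $\mu$ is increasing on $[0, 1]$. The helper `solve_threshold` and the example function `mu_symmetric` are hypothetical names introduced here, and bisection is just one reasonable root-finding choice.

```python
from typing import Callable


def solve_threshold(mu: Callable[[float], float], c: float,
                    tol: float = 1e-12) -> float:
    """Solve mu(theta) = c on [0, 1] by bisection, assuming mu is increasing."""
    lo, hi = 0.0, 1.0
    if mu(hi) <= c:
        return 1.0  # even the most confident signal does not cover the cost
    if mu(lo) >= c:
        return 0.0  # every signal covers the cost
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu(mid) < c:
            lo = mid  # expected return too low at mid: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical conditional mean: symmetric 1% moves, so mu(p) = (2p - 1) * 0.01.
def mu_symmetric(p: float) -> float:
    return (2.0 * p - 1.0) * 0.01


theta_star = solve_threshold(mu_symmetric, c=0.002)
print(round(theta_star, 6))  # 0.6
```

In practice $\mu(p)$ would have to be estimated from data (for example by binning realized returns on $\hat{p}_t$), which this sketch does not attempt.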

To make this more explicit, assume the model is calibrated, so that $P(r_{t+1} > 0 \mid \hat{p}_t = p) = p$. By the law of total expectation, the conditional expected return then decomposes as: $E[r_{t+1} \mid \hat{p}_t = p] = p \cdot E[r_{t+1} \mid r_{t+1} > 0, \hat{p}_t = p] + (1-p) \cdot E[r_{t+1} \mid r_{t+1} \leq 0, \hat{p}_t = p]$

Denote the average up-move as $\mu^+$ and the average down-move magnitude as $\mu^-$ (both positive). In the simplest case where these are constant across $p$: $\mu(p) = p \cdot \mu^+ - (1-p) \cdot \mu^-$

Setting $\mu(\theta^{*}) = c$ and solving for $\theta^{*}$:

$\theta^{*} \cdot \mu^+ - (1 - \theta^{*}) \cdot \mu^- = c$

$\theta^{*}(\mu^+ + \mu^-) = c + \mu^-$

$\theta^{*} = \frac{c + \mu^-}{\mu^+ + \mu^-}$

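When $c = 0$ and $\mu^+ = \mu^-$: $\theta^{*} = 0.5$. When $c > 0$ and $\mu^+ = \mu^-$: $\theta^{*} = 0.5 + c/(2\mu^+) > 0.5$. In general, $\theta^{*} > 0.5$ exactly when $2c > \mu^+ - \mu^-$.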
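To get a feel for the magnitudes, the closed-form threshold can be evaluated for a few illustrative inputs. This is a sketch under the constant-magnitude assumption above. The function name `optimal_threshold`, the cap at 1, and the numbers are all assumptions made for illustration.

```python
def optimal_threshold(c: float, mu_up: float, mu_down: float) -> float:
    """Closed-form theta* = (c + mu_down) / (mu_up + mu_down), capped at 1."""
    if mu_up <= 0 or mu_down <= 0:
        raise ValueError("mu_up and mu_down must be positive magnitudes")
    if c < 0:
        raise ValueError("cost c must be non-negative")
    theta = (c + mu_down) / (mu_up + mu_down)
    # A value of 1 or more means no predicted probability justifies trading.
    return min(theta, 1.0)


# Hypothetical magnitudes and costs, in return units (0.01 = 1%).
print(optimal_threshold(c=0.0, mu_up=0.01, mu_down=0.01))              # 0.5
print(round(optimal_threshold(c=0.002, mu_up=0.01, mu_down=0.01), 4))  # 0.6
print(round(optimal_threshold(c=0.0, mu_up=0.01, mu_down=0.012), 4))   # 0.5455
```

The second call shows a 20 basis point cost moving the threshold from 0.5 to 0.6 when moves are symmetric at 1%. The third shows that larger down-moves alone raise the threshold, even at zero cost.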

**(ii) Why $\theta^{*} \neq 0.5$:**

The standard 0.5 threshold minimizes classification error -- it treats false positives and false negatives symmetrically. But in a trading context, the costs are asymmetric:

False positive (predict up, market goes down): you lose the negative return AND pay cost $c$. Total loss: $|r_{t+1}| + c$.
False negative (predict down, market goes up): you do not trade, so you miss a gain but pay nothing. Total loss: $0$ (opportunity cost only).

Since false positives are more costly than false negatives (you pay $c$ every time you trade), you should demand a higher probability of being right before trading. This pushes $\theta^{*}$ above $0.5$.

Additionally, if down moves are larger than up moves on average ($\mu^- > \mu^+$, which is typical for equities -- "stairs up, elevator down"), the threshold shifts further above $0.5$ even without costs.

Answer: Under the constant-magnitude assumption, the optimal threshold is $\theta^{*} = \frac{c + \mu^-}{\mu^+ + \mu^-}$, where $\mu^+$ is the average up-move and $\mu^-$ is the average down-move magnitude. More generally, $\theta^{*}$ is the value where $E[r_{t+1} \mid \hat{p}_t = \theta^{*}] = c$. With symmetric magnitudes, the threshold exceeds $0.5$ because trading costs make false positives more expensive than false negatives, so you need higher confidence before acting.
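As a sanity check on the closed form, one can maximize $\Pi(\theta)$ over a grid and compare. The sketch below adds one assumption that is not in the problem: $\hat{p}_t$ is uniform on $[0, 1]$, so $f(p) = 1$ and, with $\mu(p) = p \mu^+ - (1-p)\mu^-$, the integral evaluates to $\Pi(\theta) = \tfrac{1}{2}(\mu^+ + \mu^-)(1 - \theta^2) - (\mu^- + c)(1 - \theta)$. The function name and parameter values are hypothetical.

```python
def profit_uniform(theta: float, c: float, mu_up: float, mu_down: float) -> float:
    """Pi(theta) for p_hat ~ Uniform(0, 1) and mu(p) = p*mu_up - (1-p)*mu_down."""
    slope = mu_up + mu_down
    return 0.5 * slope * (1.0 - theta ** 2) - (mu_down + c) * (1.0 - theta)


# Hypothetical parameters: symmetric 1% moves and a 20 basis point cost.
c, mu_up, mu_down = 0.002, 0.01, 0.01

# Brute-force search over thresholds 0.000, 0.001, ..., 1.000.
grid = [i / 1000 for i in range(1001)]
best_theta = max(grid, key=lambda t: profit_uniform(t, c, mu_up, mu_down))

closed_form = (c + mu_down) / (mu_up + mu_down)
print(best_theta, round(closed_form, 4))  # 0.6 0.6
```

Under these assumed numbers, expected profit per period is 0.0016 at $\theta = 0.6$ versus 0.0015 at the naive $\theta = 0.5$.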

Intuition

This problem captures one of the most important lessons in systematic trading: the optimal decision threshold is not the one that maximizes prediction accuracy -- it is the one that maximizes expected profit. In classification terms, you are choosing the point on the ROC curve that maximizes a cost-weighted objective, not the point closest to the top-left corner. Transaction costs break the symmetry between false positives and false negatives, pushing the threshold higher than 0.5.

In practice, this means that even a model with high accuracy can lose money if the threshold is set too aggressively (too many small wins eaten by costs) or too conservatively (too few trades to cover fixed overhead). The optimal threshold depends not just on the model's calibration but also on the distribution of returns conditional on the signal. Getting this right is often the difference between a profitable strategy and an unprofitable one -- and it is exactly the kind of question a quant researcher should think about before a single backtest is run.
