Optimal Stopping on a Fair Coin Random Walk

Optimization · Hard · Free problem

You flip a fair coin repeatedly. After each flip, your running score $d$ changes by $+1$ (heads) or $-1$ (tails), starting at $d = 0$. At any time you may stop and collect payoff $d$. However, if $d$ ever drops to $-3$ (tails exceed heads by 3), the game ends immediately and you receive $-3$.

Set up the Bellman equation for the value function $V(d)$, where $d \in \{-2, -1, 0, 1, 2, \ldots\}$.

Solve for $V(d)$ exactly and determine the optimal stopping rule.

Verify your answer by computing the expected payoff of any threshold strategy "stop when $d = k

Loading problems...

Hints

The running score $d_n$ is a symmetric random walk -- what do you know about the expected value of a martingale at a stopping time?

Write the Bellman equation $V(d) = \max(d,\, \tfrac{1}{2}V(d+1) + \tfrac{1}{2}V(d-1))$ and try the guess $V(d) = d$. What happens to the continuation value?

Verify with gambler's ruin: from state $d$, the probability of reaching target $k$ before $-3$ is $(d+3)/(k+3)$. Compute the expected payoff of the threshold strategy and confirm it equals $d$.

Worked Solution

How to Think About It: The first thing to notice is that the running score $d_n = (\text{heads}) - (\text{tails})$ is a symmetric random walk -- a martingale. In any optimal stopping problem, when the underlying process is a martingale, the continuation value from any state equals the current state (by the optional stopping theorem). So your gut should say: the value of being at state $d$ is just $d$, and it does not matter when you stop. The absorbing barrier at $-3$ looks like it should hurt you, but the payoff at the barrier ($-3$) is exactly consistent with the martingale, so it introduces no distortion. That is the punchline -- let us prove it.

Quick Estimate: Consider starting at $d = 0$ and using the strategy "stop when you first reach $d = k$." By gambler's ruin for a symmetric walk on $\{-3, -2, \ldots, k\}$, the probability of hitting $k$ before $-3$ is

$P(\text{hit } k \text{ before } {-3} \mid d = 0) = \frac{0 - (-3)}{k - (-3)} = \frac{3}{k + 3}.$

The expected payoff is

$k \cdot \frac{3}{k+3} + (-3) \cdot \frac{k}{k+3} = \frac{3k - 3k}{k+3} = 0.$

No matter how large a target $k$ you pick, the expected payoff from $d = 0$ is exactly $0$. The upside from reaching $k$ is perfectly offset by the risk of absorption at $-3$. This strongly suggests $V(0) = 0$, and more generally $V(d) = d$.

Approach: Set up the Bellman equation and show that $V(d) = d$ satisfies it.

Formal Solution:

*Step 1 -- Bellman equation.* At each state $d \geq -2$, the player chooses to stop (payoff $d$) or continue (pay nothing, flip, and move to $d+1$ or $d-1$ each with probability

/2$). The boundary condition is $V(-3) = -3$. The Bellman equation is:

$V(d) = \max\!\left(d,\; \tfrac{1}{2}\,V(d+1) + \tfrac{1}{2}\,V(d-1)\right), \quad d \geq -2,$

with $V(-3) = -3$.

*Step 2 -- Guess and verify.* Claim: $V(d) = d$ for all $d \geq -3$.

Check the Bellman equation: the continuation value at state $d$ is

$\tfrac{1}{2}\,V(d+1) + \tfrac{1}{2}\,V(d-1) = \tfrac{1}{2}(d+1) + \tfrac{1}{2}(d-1) = d.$

So $V(d) = \max(d,\, d) = d$. The boundary condition gives $V(-3) = -3$. Both conditions are satisfied.

*Step 3 -- Why is this the correct value function (not just a solution to the Bellman equation)?*

We must check that no strategy can beat $V(d) = d$. Consider any stopping strategy starting from state $d_0$. The process $d_n$ is a martingale (fair coin), so for any bounded stopping time $\tau$, $E[d_\tau] = d_0$. But every strategy either stops voluntarily at some finite time or is absorbed at $-3$. Since the symmetric random walk is recurrent, it hits $-3$ with probability 1 if the player never stops. Thus the player cannot do better than $d_0$ in expectation.

Conversely, stopping immediately achieves exactly $d_0$. So $V(d) = d$.

*Step 4 -- Optimal stopping rule.* The player is indifferent between stopping and continuing at every state $d$. Any stopping rule is optimal, including stopping immediately. There is no unique optimal threshold -- every threshold $d^{*} \in \{-2, -1, 0, 1, 2, \ldots\}$ yields the same expected payoff.

*Step 5 -- Verification via gambler's ruin.* From state $d$ using the threshold strategy "stop at $d = k

Loading problems...

quot; (for $k > d$), the gambler's ruin probability on $\{-3, \ldots, k\}$ gives

$P(\text{hit } k \text{ before } {-3} \mid \text{start at } d) = \frac{d + 3}{k + 3}.$

The expected payoff is

$k \cdot \frac{d+3}{k+3} + (-3) \cdot \frac{k - d}{k+3} = \frac{kd + 3k - 3k + 3d}{k+3} = \frac{d(k+3)}{k+3} = d.$

This confirms $V(d) = d$ for every starting state.

Answer: The value function is $V(d) = d$ for all $d \geq -3$. Every stopping rule is optimal -- in particular, stopping immediately is optimal. The absorbing barrier at $d = -3$ with payoff $-3$ is consistent with the martingale property of the fair random walk, so it introduces no penalty beyond the current state value. The Bellman equation $V(d) = \max(d,\, \tfrac{1}{2}V(d+1) + \tfrac{1}{2}V(d-1))$ is satisfied by $V(d) = d$ because the continuation value equals the stopping value at every state.

Intuition

This problem is a beautiful illustration of why martingales make optimal stopping trivial -- or at least degenerate. When the underlying process is a martingale (fair coin, no drift), the optional stopping theorem tells you that the expected payoff at any stopping time equals the starting value. No amount of clever timing can beat the current score. The absorbing barrier at $d = -3$ looks dangerous, but because the forced payoff ($-3$) equals the process value at that state, it is perfectly consistent with the martingale property and introduces no additional cost.

In practice, this shows up whenever you are asked to time an exit from a fair game. If the game has no edge (no drift), the expected profit from waiting is zero -- you cannot create alpha from timing alone. This is the core intuition behind the efficient market hypothesis applied to trading: if prices are martingales, no stopping rule beats buy-and-hold. The common mistake is thinking you can "wait for a big win" -- gambler's ruin shows that the probability of reaching a large target shrinks in exact proportion to the target size, keeping the expected payoff pinned at the current level.