Optimal Stopping with Absorbing Perfect Squares
You roll a fair six-sided die repeatedly, keeping a running sum $S$. If $S$ ever lands exactly on a perfect square ($4, 9, 16, 25, 36, 49, \ldots$), the game ends immediately with payoff $\$0$. At any point before rolling, you may choose to stop and collect $S$ as your payoff.
- Formulate the Bellman optimality equation for $V(s)$, the maximum expected payoff starting from a current sum of $s$.
- Compute $V(35)$ exactly using backward induction and determine whether the optimal action at $s = 35$ is to stop or roll.
Hints
- Think of this as a Markov decision process with absorbing states at every perfect square. At each non-square state $s$, you choose between stopping (payoff $s$) and rolling (transition to $s + k$ for $k$ uniform on $\{1,\ldots,6\}$). Write the Bellman equation for $V(s)$.
- Solve by backward induction starting from known boundary values. The perfect squares $36$ and $49$ bracket the critical region. First determine $V(s)$ for states $43$-$48$ (just below $49$), noting that $V(49) = 0$ heavily penalizes rolling from nearby states.
- From $s = 35$, a roll of $ sends you to $36 = 6^2$ (payoff $0$), while rolls of$-$6$ land in $\{37, \ldots, 41\}$ where $V(s) \approx 44$-$45$. Compute $(1/6)(0 + V(37) + \cdots + V(41))$ and compare to $35$.
Worked Solution
How to Think About It: At $s = 35$, you are one pip away from $36 = 6^2$, an absorbing state that pays zero. Rolling a 1 kills you. That sounds dangerous -- but look at the other side. Rolls of 2 through 6 land you at $37$-$41$, which sit comfortably in the gap between the perfect squares $36$ and $49$ (a gap of 13, well beyond die range). From those safe states, you have room to accumulate more before the next threat at $49$. So the question is really: does the $5/6$ chance of landing in a safe zone worth roughly $45$ each outweigh the
Quick Estimate: Let us put numbers on this. From $s = 35$, the six outcomes are $\{36, 37, 38, 39, 40, 41\}$. We know $V(36) = 0$ (perfect square). For states $37$-$41$, these are in the safe zone between squares $36$ and $49$, meaning no single roll from any of them can hit a perfect square. From these safe states, you can roll again risk-free, accumulating toward the $43$-$48$ range where the values stabilize around $43$-$46$. A rough estimate: each of $V(37)$ through $V(41)$ is worth about $44$-$45$ (you are around $37$-$41$ but can safely roll up by an expected $3.5$ per roll, and the states near $43$-$45$ stop at their face value). Using $\bar{V} \approx 45$ for the five safe outcomes:
$\text{Roll value} \approx \frac{1}{6}\bigl(0 + 5 \times 45\bigr) = \frac{225}{6} = 37.5$
Since $37.5 > 35$, the quick estimate says roll. The exact calculation confirms $V(35) \approx 37.41$, remarkably close to this back-of-envelope figure.
Approach: Formulate the Bellman equation for the MDP, then solve by backward induction starting from states near the next few perfect squares and working back to $s = 35$.
Formal Solution
Part 1: Bellman Equation
Define $V(s)$ as the optimal expected payoff when the current sum is $s$ and it is your turn to decide.
- If $s$ is a perfect square, the game is already over: $V(s) = 0$.
- Otherwise, you choose between stopping (payoff $s$) and rolling (expected continuation value):
$V(s) = \begin{cases} 0 & \text{if } s \text{ is a perfect square} \\\\ \displaystyle\max\!\left(s, \;\frac{1}{6}\sum_{k=1}^{6} V(s+k)\right) & \text{otherwise} \end{cases}$
This is the standard Bellman optimality equation for a Markov decision process with absorbing states. The optimal policy is a state-dependent threshold: at each state, compare the certain payoff $s$ against the expected value of continuing.
Part 2: Backward Induction for $V(35)$
From $s = 35$, rolling gives outcomes $\{36, 37, 38, 39, 40, 41\}$. Since $36 = 6^2$, we have $V(36) = 0$. We need $V(37)$ through $V(41)$, which depend on further states. The relevant perfect squares are $36, 49, 64, 81, \ldots$, with gaps of
We solve layer by layer, starting far enough out that stopping is clearly optimal and working backward.
Layer 1 -- States near $49 = 7^2$.
States $43$-$48$ can reach the absorbing state $49$ in one roll:
- $s = 43$: roll value $\approx 38.83 < 43$. Stop. The $V(49) = 0$ drags the average down too much.
- $s = 44$: roll value $\approx 41.46 < 44$. Stop.
- $s = 45$: roll value $\approx 43.94 < 45$. Stop.
- $s = 46$: roll value $\approx 46.18 > 46$. Roll. Five of six outcomes avoid $49$, and states beyond $49$ have high value.
- $s = 47$: roll value $\approx 48.08 > 47$. Roll.
- $s = 48$: roll value $\approx 49.73 > 48$. Roll.
So $V(43) = 43$, $V(44) = 44$, $V(45) = 45$. States $43$-$45$ stop because they are too close to $49$ with insufficient upside, while $46$-$48$ roll because the probability of clearing $49$ is high enough and the beyond-$49$ values are large.
Layer 2 -- States $37$-$42$ (safe zone between $36$ and $49$).
From any state in $\{37, \ldots, 42\}$, the reachable states are $\{38, \ldots, 48\}$. No perfect square is reachable in one step from this zone (the nearest are $36$ below and $49$ above, both out of range). Rolling is therefore risk-free -- no immediate absorption can occur. The continuation values are:
| $s$ | Roll Value | $V(s)$ | Action | |-----|-----------|---------|--------| | 42 | 46.00 | 46.00 | Roll | | 41 | 45.38 | 45.38 | Roll | | 40 | 44.92 | 44.92 | Roll | | 39 | 44.72 | 44.72 | Roll | | 38 | 44.67 | 44.67 | Roll | | 37 | 44.78 | 44.78 | Roll |
All of these roll, since the continuation value (around $44$-$46$) far exceeds the stop value ($37$-$42$).
Layer 3 -- $V(35)$.
Now we assemble the final calculation:
$V(35) = \max\!\left(35, \;\frac{1}{6}\bigl(V(36) + V(37) + V(38) + V(39) + V(40) + V(41)\bigr)\right)$
$= \max\!\left(35, \;\frac{1}{6}(0 + 44.78 + 44.67 + 44.72 + 44.92 + 45.38)\right)$
$= \max\!\left(35, \;\frac{224.47}{6}\right) = \max(35,\; 37.41)$
The exact value, computed with rational arithmetic through full backward induction, is:
$V(35) = \frac{459{,}477{,}755{,}144{,}647{,}312{,}610{,}215}{12{,}281{,}884{,}428{,}929{,}630{,}994{,}432} \approx 37.411$
Since $37.41 > 35$, the expected value of rolling exceeds the stop payoff.
Answer: $V(35) \approx 37.41$. The optimal action at $s = 35$ is to roll. Despite the
Intuition
The core lesson is that proximity to an absorbing boundary does not automatically make stopping optimal -- what matters is the asymmetry between the disaster probability and the upside. At $s = 35$, one-sixth of outcomes kill you, but five-sixths vault you into a comfortable safe zone worth about $45$ each. The weighted average wins handily. This is a recurring pattern in optimal stopping problems: the stop region clusters just below absorbing barriers (here, states $43$-$45$ stop because they face $49$ with insufficient upside), while states further from the next barrier prefer to continue.
In practice, this type of analysis appears whenever you have a process with absorbing boundaries and a choice of when to exit. Think of a trader deciding whether to hold a position that could hit a stop-loss (absorbing at zero P&L) or continue toward a target. The answer depends on the geometry of the boundaries -- how far you are from each, and what the payoff landscape looks like beyond them. Backward induction from the absorbing states is the standard tool, and the key intuition is always the same: map out the danger zones (near boundaries) and safe zones (in the gaps), then let the Bellman equation tell you where the threshold falls.