80s. (The exact value turns out to be about
87.94.)
Approach: Solve the exact finite-horizon DP. Let the state be $(t, a, b)$ = (turns remaining, dollars in box 1, dollars in box 2); by symmetry $V_t(a,b)=V_t(b,a)$. Define $V_t(a,b)$ as the maximum expected future payoff. Terminal condition $V_0(a,b)=0$. The two actions give:
$\text{Place: } P_t(a,b) = \tfrac12 V_{t-1}(a+2,\,b) + \tfrac12 V_{t-1}(a,\,b+2)$ $\text{Take: } T_t(a,b) = \tfrac12\big[a + V_{t-1}(0,\,b)\big] + \tfrac12\big[b + V_{t-1}(a,\,0)\big]$ $V_t(a,b) = \max\{P_t(a,b),\, T_t(a,b)\}.$
We want $V_{100}(0,0)$, computed exactly with rational arithmetic.
Formal Solution: Backward induction on the recursion above (memoized over all reachable even balances $a,b$ with $a+b\le 200$) yields the optimal value. The optimal *policy* is intuitive and confirms why a fixed block plan is suboptimal:
- The first move from $(0,0)$ is place, and with many turns remaining (here roughly $t\ge 8$) placing is always optimal -- you build the pot.
- As the horizon shrinks, a take threshold kicks in: take only once the pot is large relative to turns left. For example, with $t=1$ left you take any positive pot; with $t=2$ left you take once the pot reaches $3$ units ($\$6$); with $t=3$, once it reaches $7$ units; the threshold grows steeply as $t$ rises. Crucially the policy *interleaves* -- after a take empties one box you may place again to refill it -- which a place-then-take block can never do.
Carrying out the exact DP (denominator
00$ turns) gives
$V_{100}(0,0) = \frac{7444974155886261287010383660543}{39614081257132168796771975168} = \frac{7444974155886261287010383660543}{2^{95}} \approx 187.93757.$
Sanity check against the block strategy. Restricting to "place $N$ times, then take
00-N$ times" gives the inferior value $f(N)=N\big(2-2^{-(99-N)}\big)$, maximized at $N=94$ with $f(94)=2961/16=\
85.0625$. The true optimum $\
87.9376$ strictly exceeds this, exactly because the optimal policy adapts to the state and interleaves places and takes rather than committing to a single switch point.
Answer: Under optimal play the expected total payoff is $\dfrac{7444974155886261287010383660543}{2^{95}} \approx \
87.94$ (more precisely $\
87.93756966270317$).
Intuition
This problem is really about balancing accumulation against extraction risk. Each placement grows the pot by
, but the pot is trapped behind a randomized gate -- you need your takes to cover both boxes. With $k$ takes, the chance of missing a box entirely is
^{-(k-1)}$, which drops exponentially. So even a modest number of takes (6 in this case) gives you a 97% chance of sweeping everything. The marginal value of one more take decreases exponentially, while the marginal value of one more place is a steady
. That is why the optimal split leaves so few rounds for taking.
This structure appears often in trading and resource allocation: you have a phase where you build a position and a phase where you liquidate, and the liquidation mechanism has some friction or randomness. The lesson is that when extraction success improves exponentially with the number of attempts, you should be aggressive about accumulation and lean on the exponential to protect you during extraction. The same logic applies to market making with random fill rates, or to coupon-collector setups where you need to cover multiple states.
Open the full interactive solver →