Place or Take: Two-Box Optimal Stopping

Expectation · Hard · Free problem

You are playing a one-player game with two opaque boxes. The game lasts exactly 100 turns. On each turn, you choose one of two actions:

  • Place: A third party puts $\
$ into one of the two boxes, chosen uniformly at random.
  • Take: One of the two boxes is chosen uniformly at random and emptied -- you pocket whatever was inside.
  • You must choose "place" or "take" on every turn (no skipping). Under optimal play, what is the expected total payoff?

    Hints

    1. Think about what happens if you place for $N$ turns and then take for the remaining
      00 - N$ turns. What is the total pot, and what could go wrong during the taking phase?
    2. The only way you fail to collect everything is if all your takes happen to land on the same box. What is the probability of this, and how much do you lose when it happens?
    3. The expected payoff is $f(N) = N(2 - 2^{-(99-N)})$. Evaluate this for $N$ near 94 to find the optimum.

    Worked Solution

    How to Think About It: Each turn you either *place* ( dropped into a uniformly random box) or *take* (a uniformly random box is emptied into your pocket). The tension is between hoarding and harvesting: placing grows the total in the boxes but pays nothing now, while taking converts box contents to cash but only from one randomly chosen box. The key subtlety -- and where a naive "place for a while, then take for the rest" plan goes wrong -- is that the decision should depend on the *current state* (turns left and how much sits in each box), not on a fixed pre-committed switch point. When lots of turns remain you want to keep building; near the end you should take whenever the pot is already large, and you can even profitably place again *after* a take to refill a box you just drained. This is a finite-horizon dynamic-programming (optimal-stopping) problem, and the optimal policy is an interleaved threshold rule, not a single block of places followed by a single block of takes.

    Quick Estimate: If you could always collect everything you place, 100 placements would yield 00. You can never quite achieve that -- some turns must be spent taking, and takes are random -- so the answer should be a bit under 00 but well above the

    50-
    85 range that crude block strategies give. A good ballpark is "almost all of
    00, minus a modest endgame inefficiency," i.e. high
    80s. (The exact value turns out to be about
    87.94.)

    Approach: Solve the exact finite-horizon DP. Let the state be $(t, a, b)$ = (turns remaining, dollars in box 1, dollars in box 2); by symmetry $V_t(a,b)=V_t(b,a)$. Define $V_t(a,b)$ as the maximum expected future payoff. Terminal condition $V_0(a,b)=0$. The two actions give:

    $\text{Place: } P_t(a,b) = \tfrac12 V_{t-1}(a+2,\,b) + \tfrac12 V_{t-1}(a,\,b+2)$ $\text{Take: } T_t(a,b) = \tfrac12\big[a + V_{t-1}(0,\,b)\big] + \tfrac12\big[b + V_{t-1}(a,\,0)\big]$ $V_t(a,b) = \max\{P_t(a,b),\, T_t(a,b)\}.$

    We want $V_{100}(0,0)$, computed exactly with rational arithmetic.

    Formal Solution: Backward induction on the recursion above (memoized over all reachable even balances $a,b$ with $a+b\le 200$) yields the optimal value. The optimal *policy* is intuitive and confirms why a fixed block plan is suboptimal:

    • The first move from $(0,0)$ is place, and with many turns remaining (here roughly $t\ge 8$) placing is always optimal -- you build the pot.
    • As the horizon shrinks, a take threshold kicks in: take only once the pot is large relative to turns left. For example, with $t=1$ left you take any positive pot; with $t=2$ left you take once the pot reaches $3$ units ($\$6$); with $t=3$, once it reaches $7$ units; the threshold grows steeply as $t$ rises. Crucially the policy *interleaves* -- after a take empties one box you may place again to refill it -- which a place-then-take block can never do.

    Carrying out the exact DP (denominator

    ^{95}$, since the only randomness is fair coin flips over
    00$ turns) gives

    $V_{100}(0,0) = \frac{7444974155886261287010383660543}{39614081257132168796771975168} = \frac{7444974155886261287010383660543}{2^{95}} \approx 187.93757.$

    Sanity check against the block strategy. Restricting to "place $N$ times, then take

    00-N$ times" gives the inferior value $f(N)=N\big(2-2^{-(99-N)}\big)$, maximized at $N=94$ with $f(94)=2961/16=\
    85.0625$. The true optimum $\
    87.9376$ strictly exceeds this, exactly because the optimal policy adapts to the state and interleaves places and takes rather than committing to a single switch point.

    Answer: Under optimal play the expected total payoff is $\dfrac{7444974155886261287010383660543}{2^{95}} \approx \

    87.94$ (more precisely $\
    87.93756966270317$).

    Intuition

    This problem is really about balancing accumulation against extraction risk. Each placement grows the pot by , but the pot is trapped behind a randomized gate -- you need your takes to cover both boxes. With $k$ takes, the chance of missing a box entirely is ^{-(k-1)}$, which drops exponentially. So even a modest number of takes (6 in this case) gives you a 97% chance of sweeping everything. The marginal value of one more take decreases exponentially, while the marginal value of one more place is a steady . That is why the optimal split leaves so few rounds for taking.

    This structure appears often in trading and resource allocation: you have a phase where you build a position and a phase where you liquidate, and the liquidation mechanism has some friction or randomness. The lesson is that when extraction success improves exponentially with the number of attempts, you should be aggressive about accumulation and lean on the exponential to protect you during extraction. The same logic applies to market making with random fill rates, or to coupon-collector setups where you need to cover multiple states.

    Open the full interactive solver →