Alternating Addition Game to a Target

Game Theory · Medium · Free problem

Two players take turns adding an integer from $\{1, 2, 3\}$ to a running total that starts at $0$. The first player to make the total reach or exceed a target $N$ wins.

Assume both players play optimally.

  1. Give a dynamic programming algorithm that, for a general target $N$, determines for each total $t \in \{0, 1, \ldots, N\}$ whether the next player to move has a winning strategy. Your algorithm should run in $O(N)$ time and $O(N)$ space.
  1. Apply your algorithm by hand for $N = 20$. Does the first or second player have a winning strategy? Describe the optimal play from the starting position.
  1. Briefly discuss how this framework extends to more general impartial combinatorial games where the set of allowed moves may differ from state to state.

Hints

  1. Work backwards from the target. Which positions within striking distance of $N$ are immediate wins?
  2. A position is losing if and only if every move from it leads to a winning position for the opponent. Look for a repeating pattern in the $W/L$ labels.
  3. The losing positions are multiples of
    + 2 + 3 + 1 = 4$. For $N = 20$, check whether $0$ is a multiple of 4 to determine who wins.

Worked Solution

How to Think About It: This is a classic combinatorial game that reduces to a modular arithmetic pattern. The key insight is to work backwards from the target. Any position within striking distance of $N$ (i.e., $N-1$, $N-2$, $N-3$) is an immediate win for the player to move. A position is losing if and only if every move from it leads to a winning position for the opponent. Once you see that the losing positions repeat with period 4, the entire analysis clicks into place -- and the DP is just the formal version of discovering that pattern.

Algorithm:

Label each state $t$ as $W$ (winning for the player to move) or $L$ (losing for the player to move), working backwards from $t = N$ down to $t = 0$.

  • Base cases: Any state $t \geq N$ is terminal -- the player who just moved already won, so these states are not reachable as "your turn" states. For the DP, set $W[t] = \text{true}$ for $t = N$ (you can win immediately if you are already at the target), and conceptually any $t > N$ is also a win for the previous mover.

More precisely, define:

$W[t] = \begin{cases} \text{true} & \text{if } \exists \, m \in \{1,2,3\} \text{ such that } t + m \geq N \text{ or } W[t+m] = \text{false} \\ \text{false} & \text{otherwise} \end{cases}$

  • A state is $W$ if you can move to some state that is $L$ for your opponent (or directly reach/exceed $N$).
  • A state is $L$ if every move you make leaves the opponent in a $W$ state.

Pseudocode:

```python def solve_game(N): # W[t] = True means the player to move from total t can force a win W = [False] * (N + 1)

# Fill from t = N down to 0 for t in range(N, -1, -1): for m in [1, 2, 3]: if t + m >= N: # Can reach or exceed target -- immediate win W[t] = True break elif not W[t + m]: # Opponent is in a losing state after this move W[t] = True break

return W ```

This runs in $O(N)$ time (each state does at most 3 lookups) and $O(N)$ space.

Part 2 -- applying the algorithm to $N = 20$:

Working backwards, the $W/L$ labels are:

| $t$ | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| | Label | -- | W | W | W | L | W | W | W | L | W | W | W | L | W | W | W | L | W | W | W | L |

  • $t = 19, 18, 17$: can reach $\geq 20$ in one move, so $W$.
  • $t = 16$: moves go to
    7(W), 18(W), 19(W)$ -- all winning for the opponent, so $L$.
  • $t = 15, 14, 13$: can move to
    6(L)$, so $W$.
  • $t = 12$: moves go to
    3(W), 14(W), 15(W)$, so $L$.
  • The pattern continues: $L$ positions are $\{0, 4, 8, 12, 16\}$ -- exactly the multiples of 4.

Since $t = 0$ is $L$, the second player has a winning strategy.

The second player's optimal strategy: after each move by the first player (who adds $m$), the second player adds $4 - m$. This ensures the total increases by exactly 4 on each pair of turns, hitting $4, 8, 12, 16, 20$ in sequence. Since

0 = 5 \times 4$, the second player always makes the total reach the next multiple of 4 and eventually lands on 20.

From the starting position $t = 0$, the first player has no winning move -- every move (add 1, 2, or 3) leaves the total at a $W$ position for the second player. The first player is forced to lose against optimal play.

Part 3 -- generalization:

This framework extends naturally to any impartial combinatorial game under normal play convention (last player to move wins). The generalization is Sprague-Grundy theory:

The DP approach remains the same: iterate backwards over states and compute each label (or Grundy number) from successors. The complexity depends on the state space size and the branching factor at each state.

Answer: The DP labels each state $W$ or $L$ in $O(N)$ time by backward induction. For $N = 20$, the losing positions are multiples of 4, so $t = 0$ is $L$ -- the second player wins by always complementing the first player's move to 4. The general framework is Sprague-Grundy theory, which assigns nimbers to states via the mex function and handles composite games via XOR.

Intuition

This game is a thinly disguised Nim variant, and the core lesson is that modular arithmetic governs who wins. When moves are drawn from $\{1, 2, \ldots, k\}$, the losing positions are always multiples of $k + 1$, because the second player can always "mirror" the first player's move to maintain that invariant. The DP is the formal proof, but the pattern is visible almost immediately once you compute a few states by hand. In an interview, showing that you can spot the period-4 structure before writing any code signals strong combinatorial intuition.

The deeper takeaway is Sprague-Grundy theory: every impartial game under normal play is equivalent to a Nim heap, and the Grundy number (computed by mex over successors) fully characterizes the strategic value of each position. This comes up in quant interviews not just as a puzzle but because the backward-induction structure is identical to pricing American options or solving optimal stopping problems -- you evaluate terminal states first, then propagate values backward through the state space. The habit of thinking "what are the base cases, and how do I propagate?" transfers directly to dynamic programming in pricing and risk.

Open the full interactive solver →