Tic-Tac-Toe AI via Minimax
Describe how you would implement an AI to play Tic-Tac-Toe optimally. Your answer should cover:
- How to model the game and define the value of a position.
- The minimax algorithm -- how it works and why it guarantees optimal play.
- At least two concrete optimizations that reduce the search space in practice.
What is the game-theoretic outcome of Tic-Tac-Toe under optimal play from both sides?
Hints
- Model the game as a tree where each node is a board position and each edge is a legal move; the leaves are terminal positions with values +1, -1, or 0.
- Minimax works by backward induction: at X's turn, pick the child with maximum value; at O's turn, pick the child with minimum value. Propagate values up to the root.
- Alpha-beta pruning eliminates branches where the current player's guaranteed outcome is already worse than what they could achieve elsewhere -- track the best guaranteed values for each player as you search.
Worked Solution
How to Think About It: Tic-Tac-Toe is a two-player, zero-sum, perfect-information game with a finite game tree. That combination has a known solution method: minimax (backward induction). One player (X) tries to maximize the outcome; the other (O) tries to minimize it. If you search the entire game tree and both players play optimally, the result is fully determined. The interesting parts of this problem are (1) the algorithm structure, (2) the optimizations that make it practical, and (3) the actual game value.
Quick Estimate: The state space is tiny by modern standards. The board has at most $9$ cells, so a loose upper bound on raw labelings is $3^9 = 19{,}683$, and the number of board states that can actually arise in legal play is even smaller. So an exhaustive minimax search is trivially feasible (microseconds), and the only real question is the game-theoretic value -- which, as for many simple symmetric games, turns out to be a draw under best play.
Approach: Model positions as Trie/game-tree nodes, value terminal positions, run minimax (backward induction), and add pruning/symmetry/memoization to shrink the work. Then read the value of the start position to get the outcome.
Formal Solution:
1. Game representation and position value. Model the board as a $3\times 3$ grid with cell states empty, X, O -- e.g. a length-9 array or a bitmask. Moves from a position are the empty cells. A position is terminal if (a) some player has three marks in a line (row, column, or diagonal), or (b) the board is full with no line. Value a terminal position from X's point of view: $+1$ if X has three in a row, $-1$ if O does, $0$ if the board is full with no winner. The value of a non-terminal position is *defined* by best play below it, which minimax computes.
2. Minimax algorithm and why it is optimal. Recurse to the bottom of the tree and propagate values up. At a terminal node return its score. At a non-terminal node it is some player's turn: - on X's move, the value is the $\max$ over the children's values; - on O's move, the value is the $\min$ over the children's values.
```python def minimax(board, is_maximizing): score = evaluate(board) # +1 / -1 / 0 / None if score is not None: # terminal return score if is_maximizing: # X's turn return max(minimax(play(board, m, 'X'), False) for m in available_moves(board)) else: # O's turn return min(minimax(play(board, m, 'O'), True) for m in available_moves(board)) ```
*Why it is optimal (backward induction on tree height):* terminal nodes are valued correctly by definition. If every child of a node already carries its correct best-play value, the player to move can do no better than picking the child best for them (max for X, min for O); any other move hands the opponent a position at least as good for the opponent. Hence the value is correct at that node, and inductively at the root. Neither side can deviate to improve -- exactly what optimal play means.
3. Optimizations (each preserves the minimax value).
*(a) Alpha-beta pruning.* Carry the best score the maximizer is already guaranteed ($\alpha$) and the best the minimizer is already guaranteed ($\beta$). If $\alpha \ge \beta$ at a node, stop expanding its remaining children: the side that moved into this node would never actually allow it, so the rest of that subtree cannot change the root value. It returns the *same* answer as full minimax, visiting fewer nodes; with good move ordering it cuts the effective branching factor from $b$ to about $\sqrt{b}$ ($O(b^{d/2})$ vs $O(b^d)$).
*(b) Symmetry reduction.* The $3\times 3$ board has $8$ symmetries (the dihedral group $D_4$: $4$ rotations and $4$ reflections). Positions related by a symmetry have the same value, so canonicalize each position (e.g. to the lexicographically smallest of its $8$ images) and treat equivalents as one. This collapses the distinct opening moves from $9$ to just $3$ (corner, edge, center) and shrinks the search throughout.
*(c) Memoization / transposition table.* Different move orders reach the same board (a transposition); cache each position's value keyed by its canonical form so every distinct position is solved once. A related lever is move ordering -- trying strong moves (center, then corners) first makes alpha-beta prune sooner -- which changes the order of work, not the value.
A note on the counts (these are commonly mislabeled): the figures