Predicting Platformer Level Difficulty from its Layout

Machine Learning · Medium · Free problem

Each game level is a pandas DataFrame describing its layout left-to-right:

| column | meaning |
|---|---|
| x_position | horizontal position along the level |
| obstacle_type | categorical (spike, pit, moving platform, ...) |
| gap_size | width of the gap to clear (0 if none) |
| enemy_count | enemies in that segment |

The target is a playtester difficulty rating. Train on rated levels, predict for new layouts.

Work through:

  1. How do you featurise a variable-length *spatial* sequence of obstacles?
  2. Levels belong to "worlds" (1-1, 1-2, …). Plot difficulty against position-within-world. What pattern emerges, and what does it imply about the label?
  3. How do you validate so that levels from the same world don't leak across folds?

Hints

  1. Aggregate the spatial sequence: obstacle density, spacing variance, the hardest local stretch (peak windowed density), and counts by obstacle type are all fixed-length features.
  2. Difficulty is rated within a 'world', so playtesters anchor to the other levels in that world: the hardest level in a world is over-rated relative to its absolute difficulty, and the rating scale resets at the start of each world.
  3. Group your cross-validation by world; levels in the same world share theme and reference set and will leak otherwise.

Worked Solution

How to Think About It: A platformer level is a variable-length spatial sequence, so this is the same featurise-then-regress shape as the hiking problem — and it shares the same label twist. Playtesters rate a level *relative to the other levels in its world*, so the difficulty label carries a within-world reference-point bias.

Key Insight: Ratings are world-relative. The hardest level in a world is over-rated and the scale resets at each world boundary, producing a sawtooth in difficulty vs. position-within-world.
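One way to look for the sawtooth is to compute each level's normalised position within its world and compare mean ratings across positions. The sketch below assumes a level-level table with hypothetical columns `world`, `level_index` and `rating`, none of which appear in the problem statement. The ratings are made-up toy values, not real playtest data.

```python
import pandas as pd

# Hypothetical table: one row per rated level (column names are assumptions).
levels = pd.DataFrame({
    "world": [1, 1, 1, 2, 2, 2],
    "level_index": [1, 2, 3, 1, 2, 3],
    "rating": [2.0, 3.5, 5.0, 2.5, 3.0, 5.0],  # made-up toy ratings
})

levels = levels.sort_values(["world", "level_index"]).reset_index(drop=True)

# Normalised position within world: 0 for the first level, 1 for the last.
rank = levels.groupby("world").cumcount()
size = levels.groupby("world")["level_index"].transform("count")
levels["pos_in_world"] = rank / (size - 1).clip(lower=1)

# Mean rating at each within-world position.
profile = levels.groupby("pos_in_world")["rating"].mean()

# Rating change when crossing into a new world (first level of each world
# versus the last level of the previous one). Negative values indicate a reset.
is_world_start = rank == 0
reset_drop = levels["rating"].diff()[is_world_start].dropna()
```

A `profile` that rises with position, combined with negative `reset_drop` values, is the sawtooth signature. The same `pos_in_world` column can be passed to the model as the reference-effect feature.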

The Method:

  1. Featurise the layout. Obstacle density (count per unit length), spacing variance, peak local density over a sliding window (the hardest stretch), gap-size distribution (mean/max), enemy density, and one-hot counts by obstacle_type. Add a couple of "shape" features: does obstacle intensity ramp monotonically along the level, or does it spike?
  2. Reference-effect feature. Position within the world (index, or normalised rank) so the model can represent the playtesters' anchoring instead of fighting it.
  3. Model. Gradient-boosted trees on the engineered features; a regularised linear model as an interpretable baseline.
  4. Validate. Group cross-validation by world: levels in a world share theme and reference set, so a random split leaks.
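The featurisation step can be sketched as a function that collapses one level's DataFrame into a fixed-length dictionary. This is a minimal illustration, not a reference implementation. It assumes each row is one obstacle or segment, that `x_position` and the `window` width share the same units, and that the level is non-empty. The function name `featurise_level` and the default window of 10 are hypothetical choices.

```python
import numpy as np
import pandas as pd

def featurise_level(level: pd.DataFrame, window: float = 10.0) -> dict:
    """Collapse one level's layout rows into a fixed-length feature dict."""
    level = level.sort_values("x_position")
    x = level["x_position"].to_numpy(dtype=float)
    length = max(x.max() - x.min(), 1.0)  # guard against zero-length levels
    spacing = np.diff(x)

    # Peak local density: the largest number of rows falling inside any
    # interval [x_i, x_i + window] that starts at a row's position.
    right = np.searchsorted(x, x + window, side="right")
    peak_count = int((right - np.arange(len(x))).max())

    gaps = level["gap_size"].to_numpy(dtype=float)
    feats = {
        "obstacle_density": len(x) / length,
        "spacing_var": float(spacing.var()) if len(spacing) else 0.0,
        "peak_window_count": peak_count,
        "gap_mean": float(gaps.mean()),
        "gap_max": float(gaps.max()),
        "enemy_density": float(level["enemy_count"].sum()) / length,
    }
    # Counts by obstacle type (one key per type seen in this level).
    for name, count in level["obstacle_type"].value_counts().items():
        feats[f"n_{name}"] = int(count)
    return feats

# Toy level for illustration only.
toy = pd.DataFrame({
    "x_position": [0, 2, 4, 20, 40],
    "obstacle_type": ["spike", "pit", "spike", "spike", "pit"],
    "gap_size": [0, 3, 0, 0, 5],
    "enemy_count": [1, 0, 2, 0, 1],
})
feats = featurise_level(toy)
```

When the per-level dictionaries are stacked with `pd.DataFrame([...])`, obstacle types missing from a level come out as NaN, so a `fillna(0)` on the `n_*` columns is usually needed before fitting.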

Practical Considerations: The graded insight is spotting the within-world sawtooth in the residuals and modelling it. Watch for length confounding (longer levels feel harder — check performance within length bands) and category sparsity (rare obstacle types need pooling). Decide whether you predict the world-relative playtester rating or an absolute difficulty, and design features accordingly.
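Two of these checks are easy to sketch. The helpers below are hypothetical (`pool_rare_types`, `mae_by_length_band`, and the `min_count` threshold are illustrative choices, not part of the problem). The first folds rare obstacle types into a single bucket. The second reports mean absolute error within quantile bands of level length, so that length confounding shows up as a band with noticeably worse error. The inputs are toy values.

```python
import numpy as np
import pandas as pd

def pool_rare_types(obstacle_type: pd.Series, min_count: int = 20,
                    other: str = "other") -> pd.Series:
    """Replace obstacle types seen fewer than min_count times with `other`."""
    counts = obstacle_type.value_counts()
    rare = counts[counts < min_count].index
    return obstacle_type.where(~obstacle_type.isin(rare), other)

def mae_by_length_band(length, y_true, y_pred, n_bands: int = 3) -> pd.Series:
    """Mean absolute error within quantile bands of level length."""
    abs_err = np.abs(np.asarray(y_true, dtype=float)
                     - np.asarray(y_pred, dtype=float))
    df = pd.DataFrame({"length": np.asarray(length, dtype=float),
                       "abs_err": abs_err})
    df["band"] = pd.qcut(df["length"], q=n_bands, duplicates="drop")
    return df.groupby("band", observed=True)["abs_err"].mean()

# Toy inputs for illustration only.
types = pd.Series(["spike"] * 3 + ["pit"] * 2 + ["laser"])
pooled = pool_rare_types(types, min_count=2)

band_mae = mae_by_length_band(
    length=[10, 20, 30, 40, 50, 60],
    y_true=[1, 2, 3, 4, 5, 6],
    y_pred=[1, 2, 3, 4, 4, 4],
)
```

In a real pipeline the rare-type set should be derived from the training folds only and then applied to held-out levels, otherwise the pooling itself leaks information.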

Answer: Featurise the spatial layout with density/spacing/peak/type aggregates, add a position-within-world feature to capture the playtesters' reference-point bias, fit gradient-boosted trees, and validate with world-grouped CV.
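The modelling and validation steps might look like the following, using scikit-learn's `GroupKFold` so that no world is split between training and test folds. The data here is synthetic and exists only to make the snippet runnable. The two features (a density-like column and position within world) and the fold count are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic stand-in for the engineered feature table (not real data).
rng = np.random.default_rng(0)
n_worlds, levels_per_world = 8, 4
world = np.repeat(np.arange(n_worlds), levels_per_world)
pos_in_world = np.tile(np.linspace(0, 1, levels_per_world), n_worlds)
density = rng.random(len(world))
X = np.column_stack([density, pos_in_world])
y = 2 * density + 3 * pos_in_world + rng.normal(0, 0.1, len(world))

# Group folds by world so levels from one world never straddle a split.
cv = GroupKFold(n_splits=4)
leaky_folds = sum(
    not set(world[train_idx]).isdisjoint(world[test_idx])
    for train_idx, test_idx in cv.split(X, y, groups=world)
)

model = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(
    model, X, y, groups=world, cv=cv, scoring="neg_mean_absolute_error"
)
mean_mae = -scores.mean()
```

Running the same model with a plain shuffled `KFold` gives a useful comparison: a much better score under the random split suggests the model is exploiting within-world leakage.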

Intuition

A spatial layout becomes features the same way a path does: density, spacing, peaks, and type counts. The twist mirrors the hiking problem — playtesters rate a level relative to its world, so the label has a within-world reference-point bias that resets at world boundaries. Add a 'position within world' feature and group CV by world, and the problem is well-posed; ignore the reference effect and your residuals will show a sawtooth.
