Predicting Platformer Level Difficulty from its Layout
Each game level is a pandas DataFrame describing its layout left-to-right:
| column | meaning |
|---|---|
| x_position | horizontal position along the level |
| obstacle_type | categorical (spike, pit, moving platform, ...) |
| gap_size | width of the gap to clear (0 if none) |
| enemy_count | enemies in that segment |
The target is a playtester difficulty rating. Train on rated levels, predict for new layouts.
Work through:
- How do you featurise a variable-length *spatial* sequence of obstacles?
- Levels belong to "worlds" (1-1, 1-2, …). Plot difficulty against position-within-world. What pattern emerges, and what does it imply about the label?
- How do you validate so that levels from the same world don't leak across folds?
Hints
- Aggregate the spatial sequence: obstacle density, spacing variance, the hardest local stretch (peak windowed density), and counts by obstacle type are all fixed-length features.
- Difficulty is rated within a 'world', so playtesters anchor to the other levels in that world: the hardest level in a world is over-rated relative to its absolute difficulty, and the rating scale resets at each world boundary.
- Group your cross-validation by world; levels in the same world share theme and reference set and will leak otherwise.
Worked Solution
How to Think About It: A platformer level is a variable-length spatial sequence, so this is the same featurise-then-regress shape as the hiking problem — and it shares the same label twist. Playtesters rate a level *relative to the other levels in its world*, so the difficulty label carries a within-world reference-point bias.
Key Insight: Ratings are world-relative. The hardest level in a world is over-rated and the scale resets at each world boundary, producing a sawtooth in difficulty vs. position-within-world.
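One way to look for the sawtooth is to compute each level's position within its world and check how the rating changes at world boundaries. The sketch below is illustrative only: the `world`, `level`, and `rating` columns and every value in the toy table are assumptions, not part of the problem's data.

```python
import pandas as pd

# Hypothetical ratings table, one row per level (all values are made up).
levels = pd.DataFrame({
    "world":  [1, 1, 1, 2, 2, 2],
    "level":  [1, 2, 3, 1, 2, 3],
    "rating": [2.0, 3.0, 4.5, 2.2, 3.1, 4.6],
})
levels = levels.sort_values(["world", "level"]).reset_index(drop=True)

# 0-based index of each level inside its world.
levels["pos_in_world"] = levels.groupby("world").cumcount()

# Normalised position in [0, 1]; single-level worlds get 0.
n_in_world = levels.groupby("world")["level"].transform("count")
levels["pos_norm"] = levels["pos_in_world"] / (n_in_world - 1).clip(lower=1)

# Rating change from the previous level in play order.
levels["delta"] = levels["rating"].diff()

# Worlds whose first level is rated below the previous world's last level:
# this drop is the "reset" that makes the sawtooth.
is_reset = (levels["pos_in_world"] == 0) & (levels["delta"] < 0)
resets = levels.loc[is_reset, "world"].tolist()

# Mean rating at each position, pooled across worlds.
mean_by_pos = levels.groupby("pos_in_world")["rating"].mean()
```

If `mean_by_pos` climbs with position while `resets` lists most worlds after the first, the ratings are behaving as world-relative rather than absolute.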
The Method:
1. Featurise the layout. Obstacle density (count per unit length), spacing variance, peak local density over a sliding window (the hardest stretch), gap-size distribution (mean/max), enemy density, and counts by obstacle_type. Add a couple of "shape" features: does obstacle density ramp up steadily along the level, or spike?
2. Reference-effect feature. Position within the world (index, or normalised rank) so the model can represent the playtesters' anchoring instead of fighting it.
3. Model. Gradient-boosted trees on the engineered features; a regularised linear model as an interpretable baseline.
4. Validate. Group cross-validation by world: levels in a world share theme and reference set, so a random split leaks.
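The featurisation step might look like the following minimal sketch. It assumes each row of the level DataFrame is one obstacle segment, that the level is non-empty, and that the window width is measured in the same units as `x_position`. The function name `featurise_level`, the `window` default, and the toy level are hypothetical.

```python
import numpy as np
import pandas as pd

def featurise_level(level: pd.DataFrame, window: float = 10.0) -> dict:
    """Collapse one variable-length layout into a fixed-length feature dict."""
    level = level.sort_values("x_position")
    x = level["x_position"].to_numpy(dtype=float)
    length = max(x[-1] - x[0], 1.0)  # guard against zero-length levels
    spacing = np.diff(x)

    # For each obstacle i, count obstacles with position in [x_i, x_i + window].
    right = np.searchsorted(x, x + window, side="right")
    peak = int((right - np.arange(len(x))).max())

    feats = {
        "obstacle_density": len(x) / length,
        "spacing_var": float(spacing.var()) if len(spacing) > 1 else 0.0,
        "peak_window_count": peak,
        "gap_mean": float(level["gap_size"].mean()),
        "gap_max": float(level["gap_size"].max()),
        "enemy_density": float(level["enemy_count"].sum()) / length,
    }
    # Counts by obstacle type; types absent from this level get no key.
    for obstacle, count in level["obstacle_type"].value_counts().items():
        feats[f"n_{obstacle}"] = int(count)
    return feats

# Hypothetical toy level (values are made up for illustration).
toy = pd.DataFrame({
    "x_position": [0, 2, 4, 20, 40],
    "obstacle_type": ["spike", "spike", "pit", "moving_platform", "spike"],
    "gap_size": [0, 0, 3, 0, 0],
    "enemy_count": [1, 0, 2, 0, 1],
})
feats = featurise_level(toy)
```

Stacking the per-level dicts with `pd.DataFrame(list_of_dicts).fillna(0)` gives one row per level, with zero counts for obstacle types a level does not contain.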
Practical Considerations: The graded insight is spotting the within-world sawtooth in the residuals and modelling it. Watch for length confounding (longer levels feel harder — check performance within length bands) and category sparsity (rare obstacle types need pooling). Decide whether you predict the world-relative playtester rating or an absolute difficulty, and design features accordingly.
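Both checks are quick to run in pandas. The sketch below bins hypothetical out-of-fold errors by level length and pools rare obstacle types into an `other` bucket. The band edges, the `min_count` threshold, the column names, and all numbers are illustrative assumptions.

```python
import pandas as pd

# Hypothetical out-of-fold predictions (made-up values).
oof = pd.DataFrame({
    "length": [50, 60, 120, 130, 300, 320],
    "rating": [2.0, 2.5, 3.0, 3.5, 4.0, 4.5],
    "pred":   [2.1, 2.4, 3.2, 3.1, 3.0, 3.5],
})
oof["abs_err"] = (oof["rating"] - oof["pred"]).abs()

# Length bands with assumed edges; error that grows with the band
# suggests the model is leaning on length rather than layout.
oof["length_band"] = pd.cut(
    oof["length"],
    bins=[0, 100, 200, float("inf")],
    labels=["short", "medium", "long"],
)
mae_by_band = oof.groupby("length_band", observed=True)["abs_err"].mean()

# Pool obstacle types seen fewer than min_count times into "other".
types = pd.Series(["spike", "spike", "pit", "pit", "laser", "crusher"])
min_count = 2
counts = types.value_counts()
rare = counts[counts < min_count].index
pooled = types.where(~types.isin(rare), "other")
```

The rare-type list should be computed on the training folds only and then applied to the held-out fold, so the pooling itself does not leak.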
Answer: Featurise the spatial layout with density/spacing/peak/type aggregates, add a position-within-world feature to capture the playtesters' reference-point bias, fit gradient-boosted trees, and validate with world-grouped CV.
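As a rough end-to-end illustration, the pipeline can be wired up with scikit-learn's `GroupKFold` and `GradientBoostingRegressor`. The feature table and labels below are synthetic stand-ins generated only so the sketch runs. The feature names, world and level counts, and label formula are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GroupKFold, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic per-level feature table (all values are made up).
n_worlds, levels_per_world = 6, 5
world = np.repeat(np.arange(n_worlds), levels_per_world)
pos_in_world = np.tile(np.arange(levels_per_world), n_worlds)
X = pd.DataFrame({
    "obstacle_density": rng.uniform(0.05, 0.5, size=world.size),
    "peak_window_count": rng.integers(1, 8, size=world.size),
    "pos_in_world": pos_in_world,
})
# Made-up label: a layout signal plus a within-world ramp plus noise.
y = (4 * X["obstacle_density"] + 0.5 * X["pos_in_world"]
     + rng.normal(0, 0.1, size=world.size))

cv = GroupKFold(n_splits=3)

# Sanity check: no world appears on both sides of any split.
for train_idx, test_idx in cv.split(X, y, groups=world):
    assert not set(world[train_idx]) & set(world[test_idx])

# Out-of-fold predictions from gradient-boosted trees.
model = GradientBoostingRegressor(random_state=0)
oof_pred = cross_val_predict(model, X, y, cv=cv, groups=world)
mae = float(np.mean(np.abs(y - oof_pred)))
```

Passing `groups=world` is what keeps each world entirely in either the training or the test side of a fold; swapping in a plain `KFold` would let levels from one world appear on both sides.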
Intuition
A spatial layout becomes features the same way a path does: density, spacing, peaks, and type counts. The twist mirrors the hiking problem — playtesters rate a level relative to its world, so the label has a within-world reference-point bias that resets at world boundaries. Add a 'position within world' feature and group CV by world, and the problem is well-posed; ignore the reference effect and your residuals will show a sawtooth.