Predicting Platformer Level Difficulty from its Layout

Question

Each game level is a pandas DataFrame describing its layout left-to-right:

| column | meaning |
|---|---|
| x_position | horizontal position along the level |
| obstacle_type | categorical (spike, pit, moving platform, ...) |
| gap_size | width of the gap to clear (0 if none) |
| enemy_count | enem…

Accepted Answer

How to Think About It: A platformer level is a variable-length spatial sequence, so this is the same featurise-then-regress shape as the hiking problem, and it shares the same label twist. Playtesters rate a level *relative to the other levels in its world*, so the difficulty label carries a within-world reference-point bias.

Key Insight: Ratings are world-relative. The hardest level in a world is over-rated and the scale resets at each world boundary, producing a sawtooth in difficulty vs. position-within-world.

The Method:

1. Featurise the layout. Obstacle density (count per unit length), spacing variance, peak local density over a sliding window (the hardest stretch), gap-size distribution (mean/max), enemy density, and one-hot counts by obstacle_type. Add a couple of "shape" features: does difficulty ramp monotonically or spike?
2. Reference-effect feature. Position within the world (index, or normalised rank) so the model can represent the playtesters' anchoring instead of fighting it.
3. Model. Gradient-boosted trees on the engineered features; a regularised linear model as a…
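The featurisation and reference-effect steps can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. It assumes one row per obstacle and a non-empty level. The names `featurise_level`, `add_world_position`, `world_id`, `level_index` and the `window` parameter are hypothetical and do not come from the question, and the toy values are made up purely to exercise the code.

```python
import numpy as np
import pandas as pd


def featurise_level(level: pd.DataFrame, window: float = 10.0) -> dict:
    """Collapse one variable-length layout into a fixed-length feature dict.

    Assumption: each row of `level` is one obstacle and the frame is non-empty.
    """
    xs = np.sort(level["x_position"].to_numpy(dtype=float))
    length = max(xs[-1] - xs[0], 1.0)  # guard against zero-length levels
    gaps = level["gap_size"].to_numpy(dtype=float)
    spacing = np.diff(xs)

    # Peak local density: the most obstacles found in any window
    # [x_i, x_i + window] that starts at an obstacle.
    in_window = np.searchsorted(xs, xs + window, side="right") - np.arange(len(xs))

    feats = {
        "obstacle_density": len(level) / length,
        "spacing_var": float(spacing.var()) if len(spacing) else 0.0,
        "peak_local_density": float(in_window.max()) / window,
        "gap_mean": float(gaps.mean()),
        "gap_max": float(gaps.max()),
        "enemy_density": float(level["enemy_count"].sum()) / length,
    }
    # Counts per obstacle type (the "one-hot counts" in step 1).
    for kind, n in level["obstacle_type"].value_counts().items():
        feats[f"n_{kind}"] = int(n)
    return feats


def add_world_position(meta: pd.DataFrame) -> pd.DataFrame:
    """Add index and normalised rank of each level within its world.

    Assumption: `meta` has hypothetical columns `world_id` and `level_index`.
    """
    out = meta.copy()
    grp = out.groupby("world_id")["level_index"]
    out["pos_in_world"] = grp.rank(method="first") - 1  # 0-based position
    n_levels = grp.transform("count")
    # Normalise to [0, 1]; single-level worlds get 0 instead of dividing by zero.
    out["pos_in_world_norm"] = out["pos_in_world"] / (n_levels - 1).clip(lower=1)
    return out


# Toy data, invented for illustration only.
level = pd.DataFrame({
    "x_position": [0, 5, 8, 30, 50],
    "obstacle_type": ["spike", "pit", "spike", "moving platform", "pit"],
    "gap_size": [0, 3, 0, 0, 4],
    "enemy_count": [1, 0, 2, 0, 1],
})
feats = featurise_level(level)

meta = add_world_position(pd.DataFrame({
    "world_id": [1, 1, 1, 2, 2],
    "level_index": [1, 2, 3, 1, 2],
}))
```

Stacking one `featurise_level` dict per level into a DataFrame and joining the `pos_in_world_norm` column gives a fixed-width table that a tree-based or linear regressor can consume. Levels that lack a given obstacle type will have missing `n_*` columns after stacking, which would need filling with 0.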

Hints

Worked Solution

Intuition