Forecasting an Article's 7-Day Pageviews from its First Six Hours

Question

Each article gives you an early-traffic pandas DataFrame:

| column | meaning |
|---|---|
| hour | hours since publication (0–6) |
| cumulative_views | total views so far |

plus a headline string. The target is total views at day 7. Train on older articles whose 7-day totals are kno…
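Before building features it helps to pin down the input contract. The sketch below is a minimal, hypothetical validator (`check_early_traffic` is not part of the question). It assumes one DataFrame per article with exactly the two columns in the table and enforces the one invariant a cumulative count must satisfy: it never decreases.

```python
import pandas as pd

# Columns from the question's table; anything else is ignored.
REQUIRED_COLUMNS = {"hour", "cumulative_views"}


def check_early_traffic(df: pd.DataFrame, max_hour: int = 6) -> pd.DataFrame:
    """Hypothetical helper: sanity-check one article's early-traffic frame
    and return it sorted by hour."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    d = df.sort_values("hour").reset_index(drop=True)
    if d["hour"].min() < 0 or d["hour"].max() > max_hour:
        raise ValueError(f"hour must lie in [0, {max_hour}]")
    if d["hour"].duplicated().any():
        raise ValueError("duplicate hour rows")
    # A cumulative count can stay flat but never drop; a drop points to a
    # logging or join error rather than real traffic.
    if not d["cumulative_views"].is_monotonic_increasing:
        raise ValueError("cumulative_views must be non-decreasing")
    return d
```

Sorting inside the check means any later feature code can rely on row order matching hour order.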

Accepted Answer

How to Think About It: This is early-trajectory extrapolation. The first six hours carry most of the predictive signal, so the work is (1) building shape features that generalise to day 7 and (2) recognising that some early traffic is *exogenous promotion*, not organic demand. Headline text is only a weak secondary signal.

Key Insight: Step jumps in the early curve are usually editorial promotion, a treatment effect rather than a measure of content quality. Mis-attributing them inflates predictions and corrupts the headline features.

The Method:

1. Curve-shape features. Cumulative views at hour 6, the early growth rate, and curvature (fit a power law, which is a straight line in log-log space, and check whether the log-log slope is rising or falling to tell accelerating growth from saturating growth). The shape of organic attention decay is broadly similar across articles, so the early slope extrapolates to day 7.
2. Seasonality. Publication hour-of-day and day-of-week; normalise the early curve by the expected diurnal pattern so that an article published overnight is not mistaken for a weak one.
3. Promotion detection. Flag discontinuities/step jumps in cumulative_views (a large residual against a smooth fit) and add a "was promoted" feature, rather than letting the jump leak into ot…
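The curve-shape and promotion-detection steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the answer's own code. `early_curve_features` and its `jump_ratio` threshold are hypothetical. The "smooth fit" is replaced by a simpler local baseline, the mean gain of the neighbouring hours. The optional `published_at` timestamp is assumed because the early-traffic DataFrame has no publication-time column. The diurnal normalisation from step 2 is not shown.

```python
from typing import Optional

import numpy as np
import pandas as pd


def early_curve_features(
    df: pd.DataFrame,
    published_at: Optional[pd.Timestamp] = None,  # assumed input, not in the frame
    jump_ratio: float = 3.0,  # hypothetical threshold, not a tuned value
) -> dict:
    """Shape, promotion and calendar features for one article's early curve."""
    d = df.sort_values("hour")
    hours = d["hour"].to_numpy(dtype=float)
    views = d["cumulative_views"].to_numpy(dtype=float)
    if (hours > 0).sum() < 3:
        raise ValueError("need at least three rows with hour > 0")

    # Promotion detection: compare each hour's gain with a local baseline,
    # the mean gain of its neighbouring hours (a stand-in for a smooth fit).
    inc = np.diff(views)
    baseline = np.array([
        np.mean([inc[j] for j in (i - 1, i + 1) if 0 <= j < len(inc)])
        for i in range(len(inc))
    ])
    residual = inc - baseline
    is_jump = residual > jump_ratio * np.maximum(baseline, 1.0)
    excess = np.where(is_jump, residual, 0.0)
    # Subtract the excess so the shape features describe organic demand only.
    organic = views - np.concatenate([[0.0], np.cumsum(excess)])

    # Curve shape in log-log space; hour 0 is dropped because log(0) is undefined.
    mask = hours > 0
    x = np.log(hours[mask])
    y = np.log1p(organic[mask])
    slope = np.polyfit(x, y, deg=1)[0]      # early growth exponent
    curvature = np.polyfit(x, y, deg=2)[0]  # < 0 saturating, > 0 accelerating

    feats = {
        "views_h6": float(views[-1]),  # assumes the last row is hour 6
        "organic_views_h6": float(organic[-1]),
        "loglog_slope": float(slope),
        "loglog_curvature": float(curvature),
        "promoted": bool(is_jump.any()),
        "promotion_excess": float(excess.sum()),
    }
    if published_at is not None:
        feats["pub_hour"] = published_at.hour
        feats["pub_dayofweek"] = published_at.dayofweek  # Monday = 0
    return feats
```

Subtracting the flagged excess before the log-log fit is one way to keep a promotion spike out of the shape features. Keeping both `views_h6` and `organic_views_h6` lets a downstream model learn how much of a jump persists to day 7. The neighbour-mean baseline only reliably catches a jump concentrated in a single hour. A promotion sustained over several hours shows up as a slope change instead and would need a different test.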


Hints

Worked Solution

Intuition