How to Think About It: The core challenge is not just predicting fare amounts -- it is predicting profit per unit time. A $50 fare that takes 2 hours and burns $20 in fuel is worse than a $15 fare that takes 15 minutes. The first nets ($50 - $20) / 2 h = $15 per hour. The second earns at most $15 / 0.25 h = $60 per hour before its fuel cost. The driver's scarce resource is time, so the target variable must be a rate, not a total. The second challenge is that this is a sequential decision problem: after dropping off a passenger, the driver must decide where to go next, and that choice affects future opportunities.

Key Insight: The right target variable is profit per hour, not total profit per trip. This accounts for both trip duration and dead time between fares.

The Method:

Step 1: Define the target variable.

$$\text{Profit per hour} = \frac{\text{fare} + \text{tip} - \text{fuel cost}}{\text{trip time} + \text{wait time for next fare}}$$

The denominator, with both times measured in hours, is critical -- it includes the time spent waiting or cruising for the next passenger. Without it, the model is biased toward long airport runs that look profitable per trip but waste time.

Step 2: Feature engineering.

- Spatial features: Pickup location (grid cell…
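Here is a minimal Python sketch of the Step 1 target. It assumes that all times are in hours and that fuel cost and the expected wait for the next fare are already known per trip. The function and argument names are illustrative, not part of the problem. The sketch reproduces the $50-versus-$15 comparison and shows that ranking trips by per-trip profit and ranking them by profit per hour pick different winners.

```python
def profit_per_hour(fare, tip, fuel_cost, trip_hours, wait_hours):
    """(fare + tip - fuel cost) / (trip time + wait time for next fare), times in hours."""
    total_hours = trip_hours + wait_hours
    if total_hours <= 0:
        raise ValueError("trip time plus wait time must be positive")
    return (fare + tip - fuel_cost) / total_hours


# The comparison from the text; wait time and the short trip's fuel are set to zero here.
long_run = profit_per_hour(fare=50, tip=0, fuel_cost=20, trip_hours=2.0, wait_hours=0.0)
short_run = profit_per_hour(fare=15, tip=0, fuel_cost=0, trip_hours=0.25, wait_hours=0.0)
print(long_run, short_run)  # 15.0 60.0

# A hypothetical 30-minute wait for the next fare cuts the short trip's rate to 15 / 0.75.
short_with_wait = profit_per_hour(fare=15, tip=0, fuel_cost=0, trip_hours=0.25, wait_hours=0.5)
print(short_with_wait)  # 20.0

# Ranking by per-trip profit and by profit per hour picks different trips.
trips = {
    "long run": dict(fare=50, tip=0, fuel_cost=20, trip_hours=2.0, wait_hours=0.0),
    "short run": dict(fare=15, tip=0, fuel_cost=0, trip_hours=0.25, wait_hours=0.0),
}
best_by_trip_profit = max(
    trips, key=lambda k: trips[k]["fare"] + trips[k]["tip"] - trips[k]["fuel_cost"]
)
best_by_rate = max(trips, key=lambda k: profit_per_hour(**trips[k]))
print(best_by_trip_profit, best_by_rate)  # long run short run
```

Dead time lowers the short trip's rate from $60 to $20 per hour, yet it still beats the long run. This is why the wait term belongs in the denominator and why ignoring it skews the comparison.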

Designing a Profitable Taxi Route Prediction Model

Machine Learning · Medium

You are given historical taxi trip data including pickup/dropoff locations, timestamps, fares, tips, and trip durations.

How would you design a model to predict the most profitable routes for a taxi driver throughout the day?

Discuss your choice of target variable, feature engineering, modeling approach, and how you would evaluate the system.

Hints

The target variable should be profit per hour, not profit per trip. Why does the denominator matter so much?
Think about what features capture demand variation: time of day, location clusters, weather, and events all create predictable demand patterns.
This is fundamentally a sequential decision problem. A greedy policy (pick the best next action) is a strong baseline, but reinforcement learning can capture multi-step value.
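The demand features named in the second hint could be encoded roughly as follows. This is a sketch under assumptions: the grid size, the function names, and the weather and event flags are all hypothetical, and the flags would have to come from external data joined on time and location. Time of day is encoded as a point on a circle so that late-night and early-morning pickups stay close together.

```python
import math
from datetime import datetime

# Hypothetical grid size in degrees (0.01 deg of latitude is roughly 1.1 km).
CELL_DEG = 0.01


def grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Bucket a coordinate into an integer (row, col) grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cyclical_hour(ts):
    """Encode time of day on a circle so 23:30 and 00:30 end up close together."""
    angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    return math.sin(angle), math.cos(angle)


def pickup_features(ts, lat, lon, is_raining=False, event_nearby=False):
    """Assemble one feature row; weather and event flags are assumed external inputs."""
    hour_sin, hour_cos = cyclical_hour(ts)
    row, col = grid_cell(lat, lon)
    return {
        "hour_sin": hour_sin,
        "hour_cos": hour_cos,
        "day_of_week": ts.weekday(),  # Monday = 0
        "is_weekend": ts.weekday() >= 5,
        "cell_row": row,
        "cell_col": col,
        "is_raining": is_raining,
        "event_nearby": event_nearby,
    }


late = cyclical_hour(datetime(2024, 1, 1, 23, 30))
early = cyclical_hour(datetime(2024, 1, 2, 0, 30))
print(math.dist(late, early))  # about 0.26, versus 23 for the raw hour difference

row = pickup_features(datetime(2024, 1, 1, 8, 0), 40.758, -73.9855)
```

A fixed grid is the simplest spatial encoding, and its cell indices would usually be treated as categorical. The hint's "location clusters" suggests an alternative: cluster historical pickup points into zones.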

Worked Solution

Intuition

This problem tests whether you can frame a real-world optimization as a machine learning pipeline. The most common mistake is treating it as a simple regression (predict fare from features) without thinking about what the driver actually needs to optimize. A driver does not care about the fare of a single trip in isolation -- they care about their earning rate across the entire shift. This means the target variable, the feature set, and the evaluation metric all need to reflect the time dimension.
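The Step 1 denominator needs a wait time for the next fare, but the data as described has no such column, so it must be derived. The sketch below assumes a hypothetical `vehicle_id` column that links consecutive trips by the same cab. It takes the gap from each dropoff to that vehicle's next pickup and averages it by dropoff zone and hour. The 1.5-hour cutoff, above which a gap counts as a break rather than a wait, is also an arbitrary illustrative choice. Without a vehicle ID, the expected wait would have to be estimated another way.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical cut-off: longer gaps are treated as breaks or shift ends, not waiting.
MAX_GAP_HOURS = 1.5


def add_wait_hours(trips, max_gap_hours=MAX_GAP_HOURS):
    """Set trip['wait_hours'] = hours from this dropoff to the same vehicle's next pickup.

    Assumes a hypothetical 'vehicle_id' key plus 'pickup'/'dropoff' datetimes.
    Unobserved or over-long gaps are stored as None.
    """
    by_vehicle = defaultdict(list)
    for trip in trips:
        by_vehicle[trip["vehicle_id"]].append(trip)
    for vehicle_trips in by_vehicle.values():
        vehicle_trips.sort(key=lambda t: t["pickup"])
        for current, nxt in zip(vehicle_trips, vehicle_trips[1:]):
            gap = (nxt["pickup"] - current["dropoff"]).total_seconds() / 3600
            current["wait_hours"] = gap if 0 <= gap <= max_gap_hours else None
        vehicle_trips[-1]["wait_hours"] = None  # no next fare observed
    return trips


def mean_wait_by_zone_hour(trips):
    """Average observed wait, keyed by (dropoff zone, dropoff hour)."""
    totals = defaultdict(lambda: [0.0, 0])
    for trip in trips:
        if trip.get("wait_hours") is None:
            continue
        key = (trip["dropoff_zone"], trip["dropoff"].hour)
        totals[key][0] += trip["wait_hours"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def at(hour, minute):
    return datetime(2024, 1, 1, hour, minute)


# Toy trips; zone names and times are invented for illustration.
trips = [
    {"vehicle_id": "A", "pickup": at(8, 0), "dropoff": at(8, 20), "dropoff_zone": "airport"},
    {"vehicle_id": "A", "pickup": at(8, 50), "dropoff": at(9, 10), "dropoff_zone": "downtown"},
    {"vehicle_id": "A", "pickup": at(18, 0), "dropoff": at(18, 30), "dropoff_zone": "downtown"},
    {"vehicle_id": "B", "pickup": at(8, 5), "dropoff": at(8, 30), "dropoff_zone": "airport"},
    {"vehicle_id": "B", "pickup": at(9, 30), "dropoff": at(9, 45), "dropoff_zone": "downtown"},
]
waits = mean_wait_by_zone_hour(add_wait_hours(trips))
print(waits)  # {('airport', 8): 0.75}
```

Vehicle A's 8h50m gap before its evening trip is discarded as a break. Only the two morning airport dropoffs contribute: 0.5 h and 1.0 h, averaging 0.75 h.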

The deeper lesson is about the gap between prediction and decision-making. A great fare prediction model is useless if it does not tell the driver what to do next. The decision layer -- whether greedy or RL-based -- is where the value is created. In quant finance, this is analogous to the difference between a return forecast and a portfolio optimizer: the forecast is necessary but not sufficient.
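The toy sketch below makes the greedy-versus-multi-step distinction concrete; all zones, fares, and times are invented for illustration. Time is discretized into 15-minute units, and each zone offers one expected fare with a fixed destination. The multi-step policy uses exact finite-horizon dynamic programming (backward induction) over this known model. That stands in for the value function an RL agent would have to learn from data; it is not itself an RL method.

```python
from functools import lru_cache

# Hypothetical toy model; time is in integer 15-minute units.
# FARES[zone] = (expected profit, trip duration incl. wait, destination zone)
FARES = {
    "X": (12, 1, "Z"),  # lucrative short fare that strands the driver in Z
    "Y": (10, 1, "Y"),  # slightly worse fare that keeps the driver in a busy zone
    "Z": (2, 1, "Z"),   # low-demand zone
}
# MOVE[a][b] = empty repositioning time from zone a to zone b
MOVE = {
    "X": {"X": 0, "Y": 1, "Z": 3},
    "Y": {"X": 1, "Y": 0, "Z": 3},
    "Z": {"X": 3, "Y": 3, "Z": 0},
}


def step_cost(zone, target):
    """Time units to reposition to target and complete one fare there."""
    return MOVE[zone][target] + FARES[target][1]


def greedy_profit(zone, units_left):
    """Repeatedly pick the feasible target with the best immediate profit rate."""
    total = 0
    while True:
        feasible = [z for z in FARES if step_cost(zone, z) <= units_left]
        if not feasible:
            return total
        best = max(feasible, key=lambda z: FARES[z][0] / step_cost(zone, z))
        total += FARES[best][0]
        units_left -= step_cost(zone, best)
        zone = FARES[best][2]


@lru_cache(maxsize=None)
def optimal_profit(zone, units_left):
    """Backward induction: V(s, h) = max over z of r_z + V(dest_z, h - cost(s, z))."""
    best = 0  # ending the shift (or idling) is always allowed
    for z, (profit, _, dest) in FARES.items():
        cost = step_cost(zone, z)
        if cost <= units_left:
            best = max(best, profit + optimal_profit(dest, units_left - cost))
    return best


print(greedy_profit("X", 4), optimal_profit("X", 4))  # 18 30
```

Greedy takes zone X's higher-rate fare, gets stranded in low-demand zone Z, and finishes the one-hour horizon with 18. Planning over the remaining time repositions to Y and earns 30. When fares tend to end in comparably busy zones, the two policies tend to agree, which is why greedy is a strong baseline.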