MDP Formulation for Stock Trading

Optimization · Medium
You want to trade a single stock over a finite horizon of $T$ time steps to maximize your expected total profit. At each step you observe the current stock price $S_t$ and whether you currently hold the stock. You can then take one of three actions: **buy** one share (if you have no position), **sell** your share (if you are holding one), or **hold** (do nothing). Buying costs you $S_t$ and selling earns you $S_t$. Assume you can hold at most one share at a time and that there are no transaction costs beyond the price itself. The stock price follows a discrete-time Markov process: $S_{t+1}$ depends only on $S_t$, not on the full history.

1. Formulate this problem as a Markov Decision Process: define the state space, action space, transition probabilities, and reward function.
2. Write down the Bellman equation for the value function $V(t, s, h)$, where $t$ is time, $s$ is the stock price, and $h \in \{0, 1\}$ indicates whether you hold the stock.
3. How would you solve this MDP in practice? Discuss the approach and any complications that arise when the price space is large or continuous.
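
One possible way to make parts 1 and 2 concrete, offered as a sketch rather than the worked solution: take the state to be $(t, s, h)$, allow {buy, hold} when $h = 0$ and {sell, hold} when $h = 1$, and give a one-step reward of $-s$ for buy, $+s$ for sell, and $0$ for hold. The problem does not fix a terminal condition, so this sketch *assumes* that a share still held at $T$ is liquidated at $S_T$. Under those assumptions the recursion for $t = 0, \dots, T-1$ reads

$$
V(t, s, 0) = \max\Big\{ -s + \mathbb{E}\big[V(t+1, S_{t+1}, 1) \mid S_t = s\big],\ \mathbb{E}\big[V(t+1, S_{t+1}, 0) \mid S_t = s\big] \Big\}
$$

$$
V(t, s, 1) = \max\Big\{ s + \mathbb{E}\big[V(t+1, S_{t+1}, 0) \mid S_t = s\big],\ \mathbb{E}\big[V(t+1, S_{t+1}, 1) \mid S_t = s\big] \Big\}
$$

with $V(T, s, 0) = 0$ and $V(T, s, 1) = s$. The Python sketch below solves this recursion by backward induction on a finite price grid. The grid `prices`, the transition matrix `P`, and the function name `solve_trading_mdp` are illustrative assumptions, since the problem only states that the price is Markov.

```python
from typing import List, Tuple

def solve_trading_mdp(
    prices: List[float], P: List[List[float]], T: int
) -> Tuple[List[List[List[float]]], List[List[List[str]]]]:
    """Backward induction for the one-share trading MDP on a discrete price grid.

    prices[i] : price level of grid state i (an assumed discretization)
    P[i][j]   : Pr(S_{t+1} = prices[j] | S_t = prices[i]); each row sums to 1
    T         : number of decision steps t = 0, ..., T-1
    Returns V[t][i][h] for t = 0..T and policy[t][i][h] for t = 0..T-1.
    """
    n = len(prices)
    # Terminal condition (assumption): a share still held at T is sold at S_T.
    V = [[[0.0, 0.0] for _ in range(n)] for _ in range(T + 1)]
    for i in range(n):
        V[T][i][1] = prices[i]
    policy = [[["hold", "hold"] for _ in range(n)] for _ in range(T)]

    for t in range(T - 1, -1, -1):
        for i in range(n):
            # Expected continuation value E[V(t+1, S_{t+1}, h') | S_t = prices[i]].
            cont = [sum(P[i][j] * V[t + 1][j][h] for j in range(n)) for h in (0, 1)]
            s = prices[i]
            # Flat (h = 0): buying pays s and moves to h = 1; holding stays flat.
            buy, wait = -s + cont[1], cont[0]
            V[t][i][0], policy[t][i][0] = (buy, "buy") if buy > wait else (wait, "hold")
            # Long (h = 1): selling earns s and moves to h = 0; holding stays long.
            sell, keep = s + cont[0], cont[1]
            V[t][i][1], policy[t][i][1] = (sell, "sell") if sell > keep else (keep, "hold")
    return V, policy

if __name__ == "__main__":
    # Hypothetical two-level, mean-reverting price process.
    prices = [10.0, 12.0]
    P = [[0.2, 0.8], [0.8, 0.2]]
    V, policy = solve_trading_mdp(prices, P, T=1)
    print(policy[0])  # → [['buy', 'hold'], ['hold', 'sell']]
```

In this toy example the sketch buys at the low price and sells at the high one, as the mean reversion suggests. Ties are broken toward hold. With a dense $n \times n$ transition matrix each backward step costs $O(n^2)$, so the whole pass is $O(T n^2)$. That cost is what makes a fine discretization of a large or continuous price space expensive, which is the complication part 3 asks about.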
