Knowledge Gradient for Two Bernoulli Arms

Optimization · Hard · Free problem

You face a two-armed bandit problem with a one-step lookahead. Arm $i$ ($i = 1, 2$) has an unknown success probability $p_i$, with independent priors $p_i \sim \text{Beta}(\alpha_i, \beta_i)$. You may collect one Bernoulli sample from one arm of your choice. You must then play the arm with the higher posterior mean once, for a payoff of 1 on success and 0 on failure.

1. Derive a closed-form expression for the expected value of information (EVOI) of sampling arm $i$.
2. State the decision rule for which arm to sample.
3. Simplify the EVOI formula for the case of symmetric priors ($\alpha_1 = \beta_1 = \alpha_2 = \beta_2$).
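A candidate closed form for the problem above can be checked numerically by enumerating the two possible sample outcomes. The sketch below is one way to do that, not a reference solution. It assumes that "best arm" means the arm with the higher posterior mean, that the priors are independent, and that the Beta parameters are positive integers so that exact fractions can be used. The function names `posterior_means`, `evoi`, and `arm_to_sample` are illustrative and do not come from the problem statement.

```python
from fractions import Fraction


def posterior_means(alpha, beta):
    """Return (prior mean, mean after one success, mean after one failure)
    for a Beta(alpha, beta) belief about a Bernoulli success probability."""
    n = alpha + beta
    return Fraction(alpha, n), Fraction(alpha + 1, n + 1), Fraction(alpha, n + 1)


def evoi(alpha_i, beta_i, alpha_j, beta_j):
    """Expected gain from drawing one sample from arm i before the final play.

    Assumption: the final play goes to the arm with the higher posterior
    mean, so its expected payoff equals that posterior mean.
    """
    mu_i, mu_up, mu_down = posterior_means(alpha_i, beta_i)
    mu_j = Fraction(alpha_j, alpha_j + beta_j)  # arm j's belief is unchanged

    # A success is observed with predictive probability mu_i, a failure otherwise.
    value_with_sample = mu_i * max(mu_up, mu_j) + (1 - mu_i) * max(mu_down, mu_j)
    value_without_sample = max(mu_i, mu_j)
    return value_with_sample - value_without_sample


def arm_to_sample(alpha_1, beta_1, alpha_2, beta_2):
    """Return 1 or 2, whichever arm has the larger EVOI (ties go to arm 1)."""
    gain_1 = evoi(alpha_1, beta_1, alpha_2, beta_2)
    gain_2 = evoi(alpha_2, beta_2, alpha_1, beta_1)
    return 1 if gain_1 >= gain_2 else 2
```

For example, with uniform priors on both arms, `evoi(1, 1, 1, 1)` returns `Fraction(1, 12)`, which gives a concrete value to compare against a simplified symmetric-prior formula.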
