Knowledge Gradient for Two Bernoulli Arms
You face a two-armed bandit problem with a one-step lookahead. Arm $i$ ($i = 1, 2$) has an unknown success probability $p_i$, with prior $p_i \sim \text{Beta}(\alpha_i, \beta_i)$.
You may collect one Bernoulli sample from one arm of your choice. You must then play the arm with the higher posterior mean once, earning a payoff of 1 on success and 0 on failure.
1. Derive a closed-form expression for the expected value of information (EVOI) of sampling arm $i$.
2. State the decision rule for which arm to sample.
3. Simplify the EVOI formula for the case of symmetric priors ($\alpha_1 = \beta_1 = \alpha_2 = \beta_2$).
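To check a derived formula numerically, the EVOI can be computed directly from its definition: the expected payoff of the best arm after observing one sample, minus the payoff of the best arm under the prior. The sketch below is one possible way to do this, not a reference solution. The function names `posterior_mean` and `evoi` are illustrative, arms are indexed 0 and 1 rather than 1 and 2, and the Beta parameters are assumed to be positive integers so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution (integer parameters assumed)."""
    return Fraction(alpha, alpha + beta)


def evoi(i, priors):
    """Expected value of information of sampling arm i once.

    priors: list of two (alpha, beta) tuples, one per arm.
    i: index (0 or 1) of the arm to sample.
    """
    j = 1 - i
    a, b = priors[i]
    mu_i = posterior_mean(a, b)            # prior mean of the sampled arm
    mu_j = posterior_mean(*priors[j])      # mean of the other arm (unchanged)
    mu_up = posterior_mean(a + 1, b)       # mean of arm i after a success
    mu_down = posterior_mean(a, b + 1)     # mean of arm i after a failure

    # A success on arm i is observed with predictive probability mu_i.
    value_with_sample = mu_i * max(mu_up, mu_j) + (1 - mu_i) * max(mu_down, mu_j)
    value_without_sample = max(mu_i, mu_j)
    return value_with_sample - value_without_sample


print(evoi(0, [(1, 1), (1, 1)]))  # prints 1/12
```

Comparing `evoi(0, priors)` with `evoi(1, priors)` for a few priors gives a quick sanity check on the decision rule in part 2. Swapping `Fraction` for floating-point division would allow non-integer parameters at the cost of exactness.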