Bayesian Kelly Bet Sizing

Finance · Hard · Free problem

You are repeatedly betting on a biased coin with unknown heads probability $p$. Your prior on $p$ is $\text{Beta}(\alpha, \beta)$. Each round you choose a fraction $f \in [0, 1]$ of your current wealth to bet on heads at even odds -- heads multiplies your wealth by $(1 + f)$, tails by $(1 - f)$.

After observing $k$ heads in $n$ total flips:

  1. What is the posterior distribution of $p$?
  2. Derive the fraction $f^{*}$ that maximizes the expected log-wealth growth per round, $\mathbb{E}[\log W_{n+1} - \log W_n \mid \text{data}]$, under the posterior predictive distribution.
  3. What does $f^{*}$ reduce to when $\alpha = \beta = 1$ (uniform prior)?

Hints

  1. Start by identifying the conjugate prior for a Binomial likelihood -- updating the Beta parameters after observing $k$ heads in $n$ flips is one line.
  2. Write out $G(f) = \mathbb{E}[\log W_{n+1} - \log W_n \mid \text{data}]$ using the posterior predictive probability of heads $\hat{p}$. The objective has the same form as the classical Kelly problem with $\hat{p}$ replacing $p$.
  3. Set $dG/df = 0$ and solve for $f$. The first-order condition gives $f^{*} = 2\hat{p} - 1$; remember to enforce the constraint $f \geq 0$ -- do not bet when the posterior mean is at or below $1/2$.

Worked Solution

How to Think About It: This is the Kelly Criterion with a Bayesian twist. The classic Kelly formula -- bet $f^{*} = 2p - 1$ when the coin has known probability $p > 1/2$ -- is derived by maximizing expected log-wealth. Here $p$ is unknown, so you cannot plug it in directly. But the Beta-Binomial conjugate pair makes this tractable: the posterior is still Beta, and the posterior predictive probability of heads on the next flip is just the posterior mean $\hat{p} = (\alpha + k)/(\alpha + \beta + n)$. The punchline is that the optimal Kelly fraction under the posterior predictive is the same formula with $\hat{p}$ in place of $p$ -- but only because, for a fixed $f$, the expected log-growth is linear in $p$, so averaging it over the posterior requires nothing beyond the posterior mean. That is worth verifying explicitly.
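As a sketch of that verification, assuming the flips are independent given $p$ and that $f$ is fixed before the next flip, condition on $p$ first and then average over the posterior:

$\mathbb{E}[\log W_{n+1} - \log W_n \mid \text{data}] = \mathbb{E}\big[\,p \log(1 + f) + (1 - p)\log(1 - f) \;\big|\; \text{data}\,\big] = \hat{p}\,\log(1 + f) + (1 - \hat{p})\log(1 - f)$

The inner expression is linear in $p$ for fixed $f$, so the posterior enters only through its mean $\hat{p} = \mathbb{E}[p \mid \text{data}]$; its variance and shape drop out of the one-step objective.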

Quick Estimate: Take $\alpha = \beta = 1$ (uniform prior), $n = 20$, $k = 14$. Posterior mean: $\hat{p} = 15/22 \approx 0.682$. Classical Kelly says bet $f^{*} = 2(0.682) - 1 = 0.364$, or about 36% of wealth. Sanity check: if you had seen 10 heads in 20 flips, $\hat{p} = 11/22 = 0.5$ and $f^{*} = 0$, correctly saying do not bet on a fair coin. If you had seen 20 heads in 20 flips, $\hat{p} = 21/22 \approx 0.955$ and $f^{*} \approx 0.91$ -- aggressive but the coin looks very biased. The formula behaves correctly at the extremes.
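The arithmetic above can be reproduced exactly with rational numbers. This is a minimal sketch; the helper name `quick_kelly` is made up for illustration and is not part of the original problem.

```python
from fractions import Fraction

def quick_kelly(k, n, alpha=1, beta=1):
    """Clipped Kelly fraction 2*p_hat - 1 at even odds, using integer Beta parameters."""
    p_hat = Fraction(alpha + k, alpha + beta + n)  # posterior mean
    return max(Fraction(0), 2 * p_hat - 1)

# The three cases from the quick estimate: n = 20 flips, uniform prior.
for k in (10, 14, 20):
    f = quick_kelly(k, 20)
    print(k, f, round(float(f), 3))
```

Working in `Fraction` avoids floating-point noise, so the fair-coin case comes out as exactly zero.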

Approach: Conjugate Bayesian update, then maximize $\mathbb{E}[\log W_{n+1} - \log W_n \mid \text{data}]$ over $f$ using the first-order condition.

Formal Solution:

Part 1 -- Posterior. The Beta distribution is the conjugate prior for the Binomial likelihood. Observing $k$ heads in $n$ flips, the posterior is:

$p \mid \text{data} \sim \text{Beta}(\alpha + k,\; \beta + n - k)$

The posterior mean (the predictive probability of heads on the next flip) is:

$\hat{p} = \frac{\alpha + k}{\alpha + \beta + n}$
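One consequence of conjugacy is that updating one flip at a time gives the same posterior as the batch formula above. The sketch below illustrates this; the function names and the flip sequence are hypothetical, chosen only for the example.

```python
def update_beta(alpha, beta, flips):
    """Update Beta(alpha, beta) flip by flip; flips holds 1 for heads, 0 for tails."""
    for flip in flips:
        if flip:
            alpha += 1   # a head increments the first parameter
        else:
            beta += 1    # a tail increments the second parameter
    return alpha, beta

def posterior_mean(alpha, beta):
    """Mean of Beta(alpha, beta), i.e. the predictive probability of heads."""
    return alpha / (alpha + beta)

flips = [1, 1, 0, 1, 0, 1, 1]             # hypothetical data: k = 5 heads in n = 7 flips
k, n = sum(flips), len(flips)
sequential = update_beta(2.0, 2.0, flips)  # assumed Beta(2, 2) prior
batch = (2.0 + k, 2.0 + (n - k))           # Beta(alpha + k, beta + n - k)
print(sequential, batch, posterior_mean(*sequential))
```

Because only the counts matter, the order of the flips is irrelevant to the posterior.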

Part 2 -- Optimal Kelly fraction. Let $\tilde{p}$ denote the posterior predictive probability of heads on the next flip. Under the posterior predictive, the next flip is heads with probability $\tilde{p} = \hat{p}$ and tails with probability $1 - \hat{p}$. The expected log-wealth increment is:

$G(f) = \mathbb{E}[\log W_{n+1} - \log W_n \mid \text{data}] = \hat{p} \log(1 + f) + (1 - \hat{p}) \log(1 - f)$

This is identical in form to the classical Kelly objective with $\hat{p}$ in place of $p$. Taking the derivative and setting it to zero:

$\frac{dG}{df} = \frac{\hat{p}}{1 + f} - \frac{1 - \hat{p}}{1 - f} = 0$

Cross-multiplying:

$\hat{p}(1 - f) = (1 - \hat{p})(1 + f)$

$\hat{p} - \hat{p} f = 1 - \hat{p} + f - \hat{p} f$

$2\hat{p} - 1 = f$

So the optimal fraction is:

$f^{*} = 2\hat{p} - 1 = \frac{2(\alpha + k) - (\alpha + \beta + n)}{\alpha + \beta + n} = \frac{\alpha - \beta + 2k - n}{\alpha + \beta + n}$

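The second derivative $d^2G/df^2 = -\hat{p}/(1+f)^2 - (1-\hat{p})/(1-f)^2 < 0$ shows that $G$ is strictly concave, so the stationary point is a maximum. The formula is valid when $\hat{p} > 1/2$, i.e., $f^{*} > 0$. If $\hat{p} \leq 1/2$, then $G'(0) = 2\hat{p} - 1 \leq 0$ and concavity makes $G$ non-increasing on $[0, 1)$, so the optimal bet is $f^{*} = 0$ (do not bet).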
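A brute-force check of the closed form is easy to run. The sketch below maximizes $G(f)$ over a grid and compares the result with $\max(0, 2\hat{p} - 1)$; the grid resolution and the sample values of $\hat{p}$ are arbitrary choices for illustration.

```python
import math

def growth(f, p_hat):
    """Expected log-wealth increment G(f) at even odds."""
    return p_hat * math.log(1 + f) + (1 - p_hat) * math.log(1 - f)

def grid_argmax(p_hat, steps=10_000):
    """Maximize G over the grid f = 0, 1/steps, ..., (steps-1)/steps.

    f = 1 is excluded because log(1 - f) diverges there.
    """
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda f: growth(f, p_hat))

for p_hat in (0.40, 0.55, 15 / 22):
    closed_form = max(0.0, 2 * p_hat - 1)
    print(round(p_hat, 3), grid_argmax(p_hat), round(closed_form, 4))
```

Since $G$ is concave, the grid maximizer should land within one grid step of the closed-form answer, including the clipped case $\hat{p} = 0.40$.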

Part 3 -- Uniform prior. With $\alpha = \beta = 1$:

$\hat{p} = \frac{1 + k}{2 + n}, \qquad f^{*} = \frac{2k - n}{2 + n}$

This is the Laplace-smoothed heads fraction turned into a Kelly bet. Note that even if you observed $k = n$ (all heads), $f^{*} = n/(n+2) < 1$ -- the uniform prior prevents you from betting your entire wealth.

Answer: Posterior is $\text{Beta}(\alpha + k,\; \beta + n - k)$ with mean $\hat{p} = (\alpha + k)/(\alpha + \beta + n)$. Optimal Bayesian Kelly fraction:

$f^{*} = \max\!\left(0,\; 2\hat{p} - 1\right) = \max\!\left(0,\; \frac{\alpha - \beta + 2k - n}{\alpha + \beta + n}\right)$
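To see the rule used round after round, here is a minimal simulation sketch. It assumes a coin with a fixed `true_p` that the bettor never observes directly, re-computes the clipped fraction from the current posterior before each flip, and tracks log-wealth starting from $W_0 = 1$. The function name, default prior, and seed are illustrative assumptions.

```python
import math
import random

def simulate_bayesian_kelly(true_p, rounds, alpha=1.0, beta=1.0, seed=0):
    """Bet on heads each round with f = max(0, 2*p_hat - 1); return (log-wealth, alpha, beta)."""
    rng = random.Random(seed)
    log_wealth = 0.0                      # log W_0 with W_0 = 1
    for _ in range(rounds):
        p_hat = alpha / (alpha + beta)    # posterior mean before this flip
        f = max(0.0, 2 * p_hat - 1)       # clipped Kelly fraction, always < 1 since beta > 0
        heads = rng.random() < true_p
        log_wealth += math.log(1 + f) if heads else math.log(1 - f)
        if heads:
            alpha += 1                    # conjugate update on heads
        else:
            beta += 1                     # conjugate update on tails
    return log_wealth, alpha, beta

log_w, a, b = simulate_bayesian_kelly(true_p=0.6, rounds=500)
print(log_w, a, b)
```

Because the posterior mean is strictly below 1 whenever $\beta > 0$, the bettor never stakes the entire bankroll and log-wealth stays finite.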

Intuition

The Kelly Criterion says: bet the fraction of your wealth equal to your edge. With a known coin of probability $p$, the edge is $2p - 1$. The Bayesian extension is conceptually clean -- replace the unknown $p$ with your best estimate of it, the posterior mean $\hat{p}$. What makes this more than just a heuristic is that it is provably optimal: under the posterior predictive distribution, $G(f)$ is strictly concave in $f$, and the first-order condition pins down exactly $f^{*} = 2\hat{p} - 1$. The Beta-Binomial conjugacy is what keeps the posterior mean in closed form and makes the whole derivation a few lines.

In practice, Bayesian Kelly sizing shows up in any regime where you have genuine parameter uncertainty -- which is to say, almost everywhere in trading. A pure frequentist would plug in $k/n$ directly; a Bayesian shrinks toward the prior mean, betting less aggressively when the sample is small. This matters enormously at the start of a trading strategy's life: with only 10 trades observed, the difference between $k/n$ and $\hat{p}$ can be the difference between a responsible position size and a ruinous one. The prior $\alpha, \beta$ encodes your pre-data belief about how biased coins (or strategies) tend to be -- a tight prior near $1/2$ is appropriate skepticism about any new system.
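The shrinkage effect is easy to quantify on a small sample. The sketch below compares the plug-in fraction with the Bayesian one; the track record of 8 wins in 10 trades and the Beta(20, 20) "skeptical" prior are hypothetical values chosen only to illustrate the gap.

```python
def plug_in_fraction(k, n):
    """Kelly fraction from the raw frequency k/n (no prior), clipped at zero."""
    return max(0.0, 2 * k / n - 1)

def bayes_fraction(k, n, alpha, beta):
    """Kelly fraction from the posterior mean under a Beta(alpha, beta) prior."""
    p_hat = (alpha + k) / (alpha + beta + n)
    return max(0.0, 2 * p_hat - 1)

k, n = 8, 10  # hypothetical track record: 8 wins in 10 trades
print(round(plug_in_fraction(k, n), 3))        # raw frequency, no shrinkage
print(round(bayes_fraction(k, n, 1, 1), 3))    # uniform prior
print(round(bayes_fraction(k, n, 20, 20), 3))  # tight prior centered on 1/2
```

With these inputs the plug-in rule stakes about 60% of wealth, the uniform prior about 50%, and the tight prior about 12%, which shows how strongly the prior governs position size when $n$ is small.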
