Expected Length of a Gamma-Coded Codeword
Gamma coding encodes a positive integer $x \geq 1$ as a binary string using three steps:
- Compute $N = \lfloor \log_2(x) \rfloor$.
- Write $N$ zero bits.
- Write $x$ in binary, which takes $N + 1$ bits.
The total codeword length is
For example, $x = 7$ has $N = 2$, so its gamma code is $00\,111$. And $x = 21$ has $N = 4$, giving $0000\,10101$.
Now generate an infinite string of random bits, where each bit is independently
Find $E[F]$ when $p = \frac{1}{3}$.
Hints
- The codeword length is entirely determined by the number of leading zeros. What familiar distribution describes the wait for the first $?
- If the first
$ appears at position $O$, then $N = O - 1$ leading zeros, and the total codeword length isN + 1$. Express $F$ in terms of $O$.- Use $O \sim \text{Geom}(p)$ with $E[O] = 1/p$ and apply linearity: $E[F] = 2E[O] - 1$.
Worked Solution
How to Think About It: The first codeword starts at the beginning of the random bit string. Its length is completely determined by how many leading zeros appear before the first
$. Once you see that first$, you know $N$ (the number of leading zeros), and you just have to read $N$ more bits to finish the codeword. So the total length is $N$ zeros $+$ $(N+1)$ bits of the encoded integer $= 2N + 1$. The problem reduces to: what is the expected number of leading zeros in a random binary string with bit probability $p$?/3$, so the number of leading zeros before the firstQuick Estimate: Each bit is $0$ with probability
$ is geometric-ish. The expected position of the first$ is/p = 3$, so $N$ (the number of zeros before it) is $3 - 1 = 2$ on average. The codeword length isN + 1 = 5$. Done -- the estimate gives the exact answer here because the relationship is linear.Approach: Let $Z$ be the number of leading zeros (bits before the first
$). Relate $Z$ to a geometric random variable and use linearity of expectation.Formal Solution:
Define $Z$ as the number of $0$-bits that appear before the first
$ in the string. Each bit is$ with probability $p$, independently. Let $O$ be the position of the first$ (i.e., the first-success trial number), so $O \sim \text{Geom}(p)$ with $E[O] = 1/p$. Then $Z = O - 1$, since the $O$-th bit is the first$ and the preceding $O - 1$ bits are all $0$.N + 1$. So the codeword length is a simple linear function of the number of leading zeros, which in a random bit string is just a shifted geometric random variable. Once you see that, the whole problem collapses to one line of algebra via linearity of expectation.The gamma codeword structure is: $Z$ zero-bits followed by $(Z + 1)$ bits encoding the integer. So the codeword length is:
$F = Z + (Z + 1) = 2Z + 1$
By linearity of expectation:
$E[F] = 2E[Z] + 1 = 2(E[O] - 1) + 1 = \frac{2}{p} - 1$
Substituting $p = 1/3$:
$E[F] = \frac{2}{1/3} - 1 = 6 - 1 = 5$
Answer: $E[F] = \dfrac{2}{p} - 1 = 5$ when $p = \dfrac{1}{3}$.
Intuition
The key insight is that in gamma coding, the "header" (the leading zeros) and the "payload" (the binary representation) have the same length up to one extra bit. Specifically, if there are $N$ leading zeros, the payload is $N + 1$ bits, making the total
This is a nice example of a problem that looks like it needs information theory but is really just geometric distributions in disguise. In quant interviews, encoding and compression problems often boil down to "how long until something happens?" -- which is almost always geometric or exponential. The general formula $E[F] = 2/p - 1$ also gives you quick intuition: as $p$ decreases (more zeros in the stream), codewords get longer because you are more likely to see long runs of zeros that inflate both the header and the payload.
- If the first