Expected Number and Variance of Records in a Random Permutation

Expectation · Medium · Free problem

Draw a uniformly random permutation of $\{1, 2, \ldots, n\}$. Call position $k$ a record if the value at position $k$ is strictly larger than all values at positions

, 2, \ldots, k-1$ (position 1 is always a record). Let $R_n$ denote the total number of records.

  1. Compute $E[R_n]$ exactly.
  1. Compute $\text{Var}(R_n)$ exactly.
  1. For large $n$, give simple approximations for both and explain the connection to harmonic numbers.

Hints

  1. Define an indicator variable $I_k$ for whether position $k$ is a record. What is the probability that the $k$-th element is the largest among the first $k$?
  2. By symmetry of random permutations, $P(I_k = 1) = 1/k$. For the variance, check whether the record indicators at different positions are correlated -- consider what happens when you condition on the relative ranking of the first $k$ values.
  3. The indicators $I_1, I_2, \ldots, I_n$ are mutually independent. So $\text{Var}(R_n) = \sum_{k=1}^{n} \text{Var}(I_k) = \sum 1/k - \sum 1/k^2$. Use $H_n \approx \ln n + \gamma$ for the asymptotics.

Worked Solution

How to Think About It: The trick with records is to avoid thinking about the permutation globally. Instead, ask a local question: what is the probability that position $k$ holds a record? That only depends on the relative ranking of the first $k$ entries, and by symmetry of random permutations, each of those $k$ values is equally likely to be the largest. So the probability of a record at position $k$ is

/k$. Once you see that, linearity of expectation hands you the answer immediately. Variance requires checking whether the indicators are correlated -- and the beautiful surprise is that they are independent.

Quick Estimate: For $n = 10$: $E[R_{10}] = H_{10} = 1 + 1/2 + 1/3 + \cdots + 1/10 \approx 2.93$. So out of 10 positions, you expect roughly 3 records. The variance is $H_{10} - \sum_{k=1}^{10} 1/k^2 \approx 2.93 - 1.55 \approx 1.38$, giving a standard deviation around

.17$. For large $n$, $H_n \approx \ln n + 0.5772$, so $E[R_{100}] \approx \ln 100 + 0.58 \approx 5.19$ and $\text{Var}(R_{100}) \approx 5.19 - \pi^2/6 \approx 5.19 - 1.64 \approx 3.54$. Records are rare and grow logarithmically.

Approach: Define indicator random variables and exploit symmetry of random permutations, then verify independence of the indicators.

Formal Solution:

Define $I_k = \mathbf{1}\{\text{position } k \text{ is a record}\}$ for $k = 1, 2, \ldots, n$. Then $R_n = \sum_{k=1}^{n} I_k$.

Step 1: Compute $P(I_k = 1)$.

Position $k$ is a record if and only if the value at position $k$ is the maximum of the first $k$ values. In a uniformly random permutation, the first $k$ values are an exchangeable set -- any of the $k$ values is equally likely to be in position $k$. So:

$P(I_k = 1) = \frac{1}{k}$

Step 2: Compute $E[R_n]$.

By linearity of expectation:

$E[R_n] = \sum_{k=1}^{n} E[I_k] = \sum_{k=1}^{n} \frac{1}{k} = H_n$

where $H_n$ is the $n$-th harmonic number.

Step 3: Show independence of $I_j$ and $I_k$.

For $j < k$, we need $P(I_j = 1, I_k = 1)$. Position $j$ is a record if the value at $j$ is the max of positions

, \ldots, j$. Position $k$ is a record if the value at $k$ is the max of positions
, \ldots, k$. Consider the relative ordering of the first $k$ values. There are $k!$ equally likely orderings. We need the $j$-th element to be the largest among the first $j$, AND the $k$-th element to be the largest among all $k$. The number of such orderings: choose the largest of the $k$ values for position $k$ (forced), then among the remaining $k-1$ values in positions
, \ldots, k-1$, the largest of positions
, \ldots, j$ must be in position $j$. These are independent constraints on disjoint parts of the ordering, giving:

$P(I_j = 1, I_k = 1) = \frac{1}{j} \cdot \frac{1}{k} = P(I_j = 1) \cdot P(I_k = 1)$

So $I_j$ and $I_k$ are independent for all $j \neq k$.

Step 4: Compute $\text{Var}(R_n)$.

Since the indicators are independent:

$\text{Var}(R_n) = \sum_{k=1}^{n} \text{Var}(I_k) = \sum_{k=1}^{n} \frac{1}{k}\left(1 - \frac{1}{k}\right) = \sum_{k=1}^{n} \frac{1}{k} - \sum_{k=1}^{n} \frac{1}{k^2}$

$\text{Var}(R_n) = H_n - H_n^{(2)}$

where $H_n^{(2)} = \sum_{k=1}^{n} 1/k^2$ is the $n$-th generalized harmonic number of order 2.

Step 5: Large-$n$ asymptotics.

The harmonic number satisfies:

$H_n = \ln n + \gamma + O(1/n)$

where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. The generalized harmonic number converges:

$H_n^{(2)} \to \frac{\pi^2}{6} \approx 1.6449$

So for large $n$:

$E[R_n] \approx \ln n + \gamma$

$\text{Var}(R_n) \approx \ln n + \gamma - \frac{\pi^2}{6}$

Since both the mean and the variance grow like $\ln n$, and the record indicators are independent Bernoullis, the CLT gives:

$\frac{R_n - \ln n}{\sqrt{\ln n}} \xrightarrow{d} N(0, 1)$

Answer:

$E[R_n] = H_n = \sum_{k=1}^{n} \frac{1}{k}$

$\text{Var}(R_n) = H_n - H_n^{(2)} = \sum_{k=1}^{n} \frac{1}{k} - \sum_{k=1}^{n} \frac{1}{k^2}$

For large $n$: $E[R_n] \approx \ln n + 0.5772$ and $\text{Var}(R_n) \approx \ln n - 1.07$. Records grow logarithmically -- in a million-entry permutation, you expect only about 14 records.

Intuition

This problem showcases the power of indicator variables and symmetry. Instead of reasoning about the global structure of a random permutation -- which is combinatorially nightmarish -- you zoom in on each position and ask a simple local question: is this spot a record? The symmetry of random permutations makes the answer trivially

/k$, and linearity of expectation does the rest. The surprise is that the indicators are not just uncorrelated but fully independent, which is unusual for combinatorial problems and makes the variance calculation clean.

The connection to harmonic numbers runs deep in combinatorics and computer science. The expected number of records $H_n \approx \ln n$ appears in the analysis of algorithms (quicksort comparisons, skip list levels, coupon collector variants) and in extreme value theory. In a trading context, if you track daily all-time highs of a stock price modeled as an exchangeable sequence, you would see about $\ln n$ new records in $n$ days -- far fewer than people intuitively expect. This mismatch between intuition and reality is why record-breaking events (new market highs) feel significant but are actually statistically inevitable even in a trendless market.

Open the full interactive solver →