Longest Substring with at Most K Distinct Characters
Coding · Medium · Free problem
Given a string $S$ of length $n$ over a fixed alphabet (e.g., lowercase English letters) and a non-negative integer $K$, find the length of the longest contiguous substring of $S$ that contains at most $K$ distinct characters.
Constraints: - $0 \leq K \leq 26$ - $0 \leq n \leq 10^5$ - $S$ consists of lowercase English letters
Edge cases: - If $K = 0$, return $0$ (no characters allowed). - If $S$ is empty, return $0$. - If $K \geq$ number of distinct characters in $S$, return $n$.
Examples:
Input: $S = $ "eceba", $K = 2$. Output: $3$. Explanation: The longest substring with at most 2 distinct characters is "ece" (length 3).
Input: $S = $ "aabbcc", $K = 1$. Output:
$. Explanation: The longest substring with at most 1 distinct character is "aa" or "bb" or "cc", each of length 2.
Input: $S = $ "aabbcc", $K = 3$. Output: $6$. Explanation: The entire string has exactly 3 distinct characters.
Target: $O(n)$ time, $O(1)$ extra space (alphabet size is fixed).
Hints
Think about maintaining a window of characters. What invariant should the window satisfy, and when do you need to shrink it?
Use a sliding window with two pointers. Keep a frequency count of each character in the window and track how many distinct characters are present.
When the distinct count exceeds $K$, advance the left pointer until you drop below the limit. Both pointers move forward only, giving $O(n)$ total work.
Worked Solution
How to Think About It: This is a classic sliding window problem. You maintain a window $[l, r]$ that represents the current candidate substring. As you expand the right boundary, you might add new distinct characters. When the number of distinct characters exceeds $K$, you shrink from the left until you are back to at most $K$ distinct characters. The key observation is that both $l$ and $r$ only move forward, so the total work is $O(n)$.
Algorithm:
1. Maintain an array count[26] tracking the frequency of each character in the current window. 2. Maintain a variable distinct counting how many characters have non-zero frequency. 3. Use two pointers l = 0 and r = 0. Expand r one step at a time: - Add S[r] to the window: if count[S[r]] was 0, increment distinct; increment count[S[r]]. - While distinct > K: remove S[l] from the window (decrement count, if it hits 0 decrement distinct), then increment l. - Update best = max(best, r - l + 1). 4. Return best.
Code:
```python def longest_substring_k_distinct(s: str, k: int) -> int: if k == 0 or not s: return 0
count = [0] * 26 distinct = 0 best = 0 l = 0
for r in range(len(s)): idx = ord(s[r]) - ord('a') if count[idx] == 0: distinct += 1 count[idx] += 1
while distinct > k: left_idx = ord(s[l]) - ord('a') count[left_idx] -= 1 if count[left_idx] == 0: distinct -= 1 l += 1
best = max(best, r - l + 1)
return best ```
Complexity: - Time: $O(n)$. Each character is added to the window once (when r advances) and removed at most once (when l advances). Both pointers make a single pass. - Space: $O(1)$ extra. The count array has fixed size 26 (alphabet size), independent of $n$.
Answer: Use a sliding window with two pointers and a fixed-size character frequency array. Both pointers traverse the string once, giving $O(n)$ time and $O(1)$ space. Track the distinct count to know when to shrink the window.
Intuition
The sliding window / two-pointer technique is one of the most important patterns for substring and subarray problems. The core idea is that if you have a "valid" window and you expand it, it might become invalid -- but you can always fix it by shrinking from the other side. As long as both pointers only move forward, the total work is linear regardless of how many shrink steps happen inside the loop.
This exact problem shows up in practice when you need to find segments of data with bounded diversity -- for example, finding the longest run of trades involving at most $K$ different instruments, or the longest stretch of log entries from at most $K$ distinct sources. The fixed-alphabet assumption is what gives you $O(1)$ space; if the alphabet were unbounded, you would need a hash map and the space would be $O(K)$.