Aaron Hakala

The power of two random choices

The major insight is that even a small amount of choice can lead to drastically better outcomes in load balancing.

You have a hundred servers and need to distribute incoming requests across them. The intuitive approach is to find the least loaded server and send the request there. That gives perfect distribution, but it requires global knowledge: a central view of every server's current load. In a distributed system, that view is expensive to maintain and inevitably stale.

Stale global state is dangerous. If queue lengths are only updated periodically, "always go to the shortest queue" can perform worse than picking a server at random. Everyone sees the same stale snapshot, everyone rushes to the same apparently-short queue, and you get a thundering herd. "Use the best information available" sounds right, but globally-visible stale state creates correlated decisions, which is worse than randomness.

A surprisingly good middle ground

Instead of checking every server, pick two at random and send the request to the less loaded one. That's it. No global coordination, no shared state to go stale, no thundering herd. Each caller sees a different pair of candidates, so their decisions are naturally decorrelated. The jump from one to two choices is where nearly all the gain lives: in the classic balls-into-bins model, the fullest bin holds roughly log n / log log n balls with one random choice, but only roughly log log n with two. A third probe only shaves a constant factor. This asymmetry is the core surprise of the result.
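A quick simulation makes the gap concrete. Here is a hedged sketch of the balls-into-bins experiment (the function and names are illustrative, not from any particular library): each ball probes d bins uniformly at random and lands in the least loaded one.

```python
import random

def max_load(n_bins: int, n_balls: int, d: int, rng: random.Random) -> int:
    """Throw n_balls into n_bins. Each ball probes d bins uniformly at
    random and lands in the least loaded; return the maximum bin load."""
    loads = [0] * n_bins
    for _ in range(n_balls):
        candidates = [rng.randrange(n_bins) for _ in range(d)]
        best = min(candidates, key=loads.__getitem__)
        loads[best] += 1
    return max(loads)

rng = random.Random(42)
one_choice = max_load(100_000, 100_000, 1, rng)
two_choices = max_load(100_000, 100_000, 2, rng)
print(one_choice, two_choices)  # the two-choice maximum is markedly smaller
```

Running this a few times with different seeds shows the pattern is stable: the one-choice maximum bounces around while the two-choice maximum stays pinned near its theoretical log log n value.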

This principle shows up everywhere: hash tables with two hash functions, task scheduling with limited queue information, circuit routing, and distributed memory emulation. You don't need to find the best server. You just need to avoid the worst one.
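The hash-table variant fits in a few lines. This is an illustrative sketch, not any real library's API: two derived hash functions name two candidate buckets, and the key goes into the emptier one, which keeps the longest chain short.

```python
def two_choice_insert(buckets: list, key) -> None:
    """Place key in the less full of its two candidate buckets.
    Tupling the key with a salt is just a cheap way to derive two
    independent-ish hash functions for illustration."""
    n = len(buckets)
    h1 = hash((key, 1)) % n
    h2 = hash((key, 2)) % n
    target = h1 if len(buckets[h1]) <= len(buckets[h2]) else h2
    buckets[target].append(key)

buckets = [[] for _ in range(1024)]
for k in range(1024):
    two_choice_insert(buckets, k)
# lookups probe both candidate buckets, so chains stay uniformly short
```

The trade is the same as on the server side: lookups now touch two buckets instead of one, and in exchange the worst-case chain length collapses.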

In practice

The result translates directly to real systems. The implementation cost is negligible: one extra random probe, one comparison. Yet the effect under heavy load is dramatic. Say your servers are at 90% capacity. With the "find the best" approach using slightly stale data, you get herding and hot spots. With two random choices, load stays evenly spread, and wait times drop by an order of magnitude. The closer you get to saturation, the bigger the win. NGINX uses a variant of this as its "least connections among two" strategy. It's the default in many service meshes.
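In code, the balancer's decision is a couple of lines. A minimal sketch, assuming a `load_of` callback that reports a server's current load (e.g. its open-connection count); the names here are my own, not NGINX's:

```python
import random

def pick_server(servers, load_of, rng=random):
    """Power of two choices: sample two distinct servers at random
    and return the less loaded one (ties go to the first sample)."""
    a, b = rng.sample(servers, 2)
    return a if load_of(a) <= load_of(b) else b

# usage: in a real balancer these counts would come from live connections
loads = {"s1": 7, "s2": 3, "s3": 11}
target = pick_server(list(loads), loads.get)
```

Note that `load_of` only needs to be locally accurate for the two probed servers at the moment of the call; no shared snapshot of the whole fleet is required, which is exactly what avoids the herding problem.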

The takeaway

When designing any distributed assignment scheme (load balancers, hash tables, cache placement, task schedulers) probe two random candidates instead of one. The cost is trivial. The improvement is not.