Every success has siblings and often you only meet one

Distributional Thinking, Probability

Mar 5, 2026 · 9 min

There are things you need to do to be successful. Focus on one thing and excel at it. Don't dilute your attention. Don't lose the purity of your vision. Keep it simple. Strip out everything that isn't essential. Never compromise on values. Don't chase money. Build the thing right and the money follows.

This advice shows up again and again, and it is often associated with Jan Koum. Maybe you've heard of him. He built a little app called WhatsApp, which he sold to Facebook in 2014 for nineteen billion dollars. You want to be as successful as him? Then focus, simplify, refuse to compromise, stay pure.

Or maybe not. In 2009, when Koum started WhatsApp, at least five other teams were building the same thing. All of them ran the same playbook. Six teams, same principles, same window, same opportunity. Only one of them sold for nineteen billion. The others got acquired for parts, absorbed into other companies or shut down. So either the same set of principles worked one time and failed five times or the principles were not the sole cause.

If you ran the Koum strategy again from the same starting conditions, with the same founder, the same skill, the same values, the same simplicity obsession, you would get a distribution of outcomes. The nineteen-billion-dollar outcome is one draw from that distribution, probably one from far out in a fat tail. A fat-tailed distribution is one where extreme outcomes are rare but still far more likely than a bell curve would suggest, and large enough to dominate the total. More on it later. Pluchino, Biondo and Rapisarda (2018) made this concrete in a simulation. They built a population with normally distributed talent and let agents accumulate success through random encounters with lucky and unlucky events. The biggest winners were not the most talented agents but people of moderately above-average talent who happened to encounter many lucky events. Talent set the ceiling. The path through the field of accidents decided where inside it you landed.
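A toy version of this kind of simulation fits in a few lines. This is a minimal sketch loosely inspired by Pluchino et al., not their model: the function name `simulate_careers`, the event probability, the doubling and halving rule and every parameter value are assumptions chosen for illustration.

```python
import random

def simulate_careers(n_agents=1000, n_steps=80, p_event=0.1, seed=0):
    """Toy talent-vs-luck simulation (illustrative assumptions throughout)."""
    rng = random.Random(seed)
    agents = []
    for _ in range(n_agents):
        # Talent is drawn from a normal distribution and clipped to [0, 1].
        talent = min(1.0, max(0.0, rng.gauss(0.6, 0.1)))
        capital = 10.0
        lucky_hits = 0
        for _ in range(n_steps):
            if rng.random() >= p_event:
                continue  # nothing happens this step
            if rng.random() < 0.5:
                # Lucky event: talent is the chance of actually exploiting it.
                if rng.random() < talent:
                    capital *= 2
                    lucky_hits += 1
            else:
                # Unlucky event: capital is halved regardless of talent.
                capital /= 2
        agents.append({"talent": talent, "capital": capital, "lucky_hits": lucky_hits})
    return agents

agents = simulate_careers()
richest = max(agents, key=lambda a: a["capital"])
most_talented = max(agents, key=lambda a: a["talent"])
print("richest agent:      ", richest)
print("most talented agent:", most_talented)
```

Changing `seed` reruns the same population rules with a different sequence of accidents. In this sketch the richest agent tends not to be the most talented one, though the exact numbers depend entirely on the assumed parameters.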

When most people look at the story of Jan Koum, they immediately jump to the conclusion that the inputs that produced it must contain a transferable lesson. Read the principles, copy the principles, get closer to the result. This is how almost all advice from successful people is portrayed and consumed. We see what happened and our pattern-matching storytelling machinery goes to work on it.

But almost nobody is asking "What else could have happened?" If I ran this again from the same starting conditions, what would the next draw probably look like? What was the shape of the range it came from? Were the good and bad draws roughly symmetric? Asking these questions is Distributional Thinking. It says the outcome you saw was one draw from a hidden range of outcomes that could have unfolded from the same starting conditions. It means holding the unseen draws in your head alongside the visible one.

Where intuition was trained

Say the outcome is IQ, on the usual scale with a mean of 100 and a standard deviation of 15. Draw 102 and you are slightly above the mean. Draw 75 and you are in the bottom 5%. Draw 138 and you are in the top 1%. All of these are draws from a Gaussian, the bell shape. Heights within a sex sit there too. So do measurement errors in a careful experiment.
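These percentiles can be checked directly. The sketch below assumes the common IQ convention of mean 100 and standard deviation 15, which the text does not state, and uses Python's standard-library `NormalDist`.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 15 (a common IQ convention).
iq = NormalDist(mu=100, sigma=15)

for score in (75, 102, 138):
    below = iq.cdf(score)  # fraction of the population scoring lower
    print(f"IQ {score}: {below:.1%} below, {1 - below:.1%} above")
```

Because the curve is symmetric and thin-tailed, the share above a score collapses quickly as the score moves away from 100.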

Figure: Gaussian Distribution. A symmetric distribution. The mean and the median sit in the same place, so the average lands where intuition expects it.

The Gaussian is the shape your intuition was trained on. Symmetric. The mean and the median agree. Extremes are rare and shrink fast as you move away from the centre. Insofar as you have any control over the outcome, which you obviously don't with IQ or height, the best strategy is to plan around the mean and expect the spread of your average to shrink as the sample grows.

The trap is what happens when you take this intuition outside the few domains where it actually applies.

Moving beyond symmetry

Bell curves are tidy. The mean sits in the middle, the median sits in the same place and "average" means roughly what your intuition thinks it means.

Now picture something different. You ask one hundred people what they earn. Most cluster somewhere reasonable but a handful earn five to ten times the others. A different shape entirely, built by different underlying logic.

Figure: Lognormal Distribution. A skewed distribution. A long right tail pulls the mean above the median, so the average sits higher than the typical case.

A lognormal distribution is what you get when the thing you are measuring is built by multiplication rather than addition. Height is additive. Your final height is the sum of many small genetic and developmental contributions, each pushing a little up or down. A salary can be multiplicative. A 10% raise on top of a 10% raise compounds. When small effects multiply instead of adding, the distribution stops being symmetric. It develops a long right tail and the mean drifts away from the median.

This is where the trouble starts. In a Gaussian, half the data sits above the mean and half below. In a lognormal, most of it sits below. The right tail does the heavy lifting. A handful of very high salaries pull the average up to a number that almost nobody actually experiences. The mean is a real number in the sense that you get it by adding everything up and dividing. It also describes almost nobody.
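The difference between adding and multiplying small effects is easy to see in a simulation. The sketch below is illustrative only: the helper names, the number of effects and the size of each push are assumptions, not measurements of heights or salaries.

```python
import random
from statistics import mean, median

rng = random.Random(42)

def additive_outcome(n_effects=50):
    # Sum of many small independent pushes up or down.
    return sum(rng.uniform(-1, 1) for _ in range(n_effects))

def multiplicative_outcome(n_effects=50):
    # Product of many small independent percentage changes.
    value = 1.0
    for _ in range(n_effects):
        value *= 1 + rng.uniform(-0.3, 0.3)
    return value

added = [additive_outcome() for _ in range(20_000)]
multiplied = [multiplicative_outcome() for _ in range(20_000)]

for name, data in (("additive", added), ("multiplicative", multiplied)):
    m = mean(data)
    share_below = sum(x < m for x in data) / len(data)
    print(f"{name:15s} mean={m:8.3f} median={median(data):8.3f} "
          f"share below mean={share_below:.0%}")
```

In the additive case roughly half the draws land below the mean. In the multiplicative case the mean sits above the median and well over half the draws land below it, even though each individual effect is symmetric.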

The lognormal shape shows up in project completion times, time to recover from illness, length of phone calls, adult body weights, time to product-market fit and many similar quantities.

Are you in the tail or not?

You just learned to distrust the mean. Now learn to distrust the median too.

In a lognormal world, most outcomes sit below the mean but they still cluster. A typical salary, a typical recovery time. These things exist. You can point at them and say "this is what most people experience." In a Pareto world, that sentence stops making sense.

Figure: Pareto Distribution. Most outcomes pile up near the peak, where the median sits. But the rare giant draws out in the tail drag the mean far to its right, out where almost nothing actually lands.

Picture book sales. Most published books sell a few hundred copies. A small number sell a few thousand. A tiny number sell millions. If you take the median you get something like four hundred copies, which is technically true and completely useless, because the entire economics of publishing is decided by the handful of books that sell a thousand times more than that. The median describes the typical book. The market is run by the atypical ones.

A specific dynamic of the Pareto distribution is that outcomes feed on themselves. A book that sells well gets recommended, which makes it sell more, which gets it onto bestseller lists, which makes it sell more again. A startup that raises one good round attracts better talent, which produces better results, which makes the next round easier. A city that grows attracts more people, which makes it grow faster. The mechanism is not multiplication of small independent effects, which is what made the lognormal. It is feedback. Success compounds into more success and the gap between the top and the rest opens without a ceiling. Barabási and Albert (1999) named this mechanism preferential attachment. Each new node joining a network connects to an existing node with probability proportional to the number of connections that node already has. The result is a power-law degree distribution, generated by feedback rather than by any underlying difference in node quality.
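Preferential attachment can be sketched in a few lines. This is a simplified illustration in which each new node makes exactly one link. The function name, the network size and the seed are assumptions, and every node is identical apart from its arrival time.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes=5000, seed=1):
    """Grow a network one node at a time; each newcomer links to one existing node."""
    rng = random.Random(seed)
    # Every edge contributes both of its endpoints to this list, so a uniform
    # pick from it selects a node with probability proportional to its degree.
    endpoints = [0, 1]  # start with a single edge between nodes 0 and 1
    degree = Counter({0: 1, 1: 1})
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        endpoints += [new_node, target]
        degree[new_node] += 1
        degree[target] += 1
    return degree

degree = preferential_attachment()
ranked = sorted(degree.values(), reverse=True)
print("largest degrees:", ranked[:5])
print("median degree:  ", ranked[len(ranked) // 2])
```

No node is "better" than any other here. The few heavily connected nodes got that way because early links made later links more likely.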

The shape this produces has no meaningful "typical." The top 20% holds 80% of the outcomes. Inside that top 20%, the same rule applies again. The top 4% holds 64%. Inside that, the top 0.8% holds about half of everything. The structure repeats at every scale. There is no level at which the distribution settles down and starts "behaving."
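The repeating 80/20 arithmetic can be reproduced from the standard top-share formula for a Pareto distribution, in which the top fraction p holds p^(1 - 1/alpha) of the total. In this sketch the tail index `alpha` is an assumption, chosen so that the top 20% hold exactly 80%. Real data will not be this tidy.

```python
import math

# Assumption: an idealised Pareto distribution whose tail index alpha is
# chosen so that the top 20% hold exactly 80% of the total.
alpha = math.log(5) / math.log(4)  # about 1.16

def top_share(p, alpha=alpha):
    """Share of the total held by the top fraction p under a Pareto law."""
    return p ** (1 - 1 / alpha)

for p in (0.2, 0.04, 0.008):
    print(f"top {p:.1%} holds {top_share(p):.1%}")
```

The same exponent applies at every scale, which is why zooming into the top group reproduces the same split.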

Many of the systems we live in look like they are ruled by such dynamics. Mass of stars, earthquake magnitudes, populations of cities, war casualties, startup outcomes, paper citations, file sizes on the internet, returns from venture capital, followers on social platforms. The claim needs some care, though. Clauset, Shalizi and Newman (2009) tested two dozen real-world datasets claimed to follow power laws and found the claim consistent with the data in some cases and ruled out in others. Where a pure power law fails, the competing candidates are other heavily skewed shapes, such as the lognormal or a power law with an exponential cutoff, and over the range you can observe these can be hard to tell apart.

The planning consequence is different again and harder. In a Gaussian world you plan around the mean. In a lognormal world you plan around the median. In a Pareto world neither number is the question. The question is which side of the distribution you are on. If your outcome is determined by a Pareto process, the average outcome and the median outcome are both irrelevant to you. What matters is whether you are in the tail or not. The tail is where almost all the value lives. A venture fund does not make money on its average investment. It makes money on the one in fifty that returns the whole fund. This is also why "expected value" thinking, which works fine in a Gaussian and limps along in a lognormal, breaks down completely here. The expected value of a Pareto outcome is dominated by events so rare that you cannot estimate their probability from your sample.
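One way to see why averages stop being reliable is to repeat the same experiment several times and compare the sample means. The sketch below is illustrative: the tail index `ALPHA`, the sample sizes and the Gaussian parameters are assumptions, not estimates from any real dataset.

```python
import random
from statistics import mean

rng = random.Random(7)
ALPHA = 1.16  # assumed tail index, roughly the 80/20 case

def sample_means(draw, n_samples=10, sample_size=1000):
    # Repeat the "same experiment" several times and record each average.
    return [mean(draw() for _ in range(sample_size)) for _ in range(n_samples)]

gaussian = sample_means(lambda: rng.gauss(100, 15))
pareto = sample_means(lambda: rng.paretovariate(ALPHA))

print("Gaussian sample means:", [round(m, 1) for m in gaussian])
print("Pareto sample means:  ", [round(m, 1) for m in pareto])
print("Gaussian max/min ratio:", round(max(gaussian) / min(gaussian), 2))
print("Pareto max/min ratio:  ", round(max(pareto) / min(pareto), 2))
```

The Gaussian averages barely move between repetitions. The Pareto averages jump around because a single giant draw can dominate a sample of a thousand.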

Accept that most attempts will look like failures by Gaussian standards. The question is not "Will this be average?" but "Is this the kind of thing where being in the top fraction of a percent is possible? And what does it cost me?" In any power-law domain the right move is the same. Maximize exposure to the tail.

There is much more to cover, as these have only been the foundational shapes. More groups as well as finer distinctions remain. The Gaussian basin includes the Gaussian, Binomial, Chi-squared and others. The Poisson family includes the Poisson, Geometric and Negative binomial. The Extreme Value family includes the Gumbel, Fréchet and Generalized Pareto. The Beta family includes Beta(2, 2), the arcsine Beta(0.5, 0.5), Beta(2, 5) and Beta(0.7, 3). The heavy-tailed but symmetric group includes the Student-t with 3 degrees of freedom, Laplace, Logistic and Hyperbolic secant. Topics for future essays.

What you are holding in your hands

The story you read at the start was not wrong. Jan Koum did focus. He did refuse to compromise. He did keep the thing simple. The mistake is in what we do with that information. We treat the visible draw as the recipe and assume that running the recipe again will produce something close to the result. But Koum's outcome lives in the tail of a Pareto distribution. The principles he followed describe what got him into the game. They do not describe why he, rather than one of the other five teams, walked out with nineteen billion.

This is the part most advice is quiet about. Most advice-givers are probably not dishonest people, but they genuinely cannot see the draws that did not happen. They lived inside one and the others are invisible to them. So they reach for the inputs they can name and offer those as the explanation.

Distributional thinking is holding the visible draw and the invisible ones in your head at the same time. You ask what shape the distribution has, where the action lives inside it and whether your strategy is exposing you to the part where the action actually happens. In a Gaussian world, the answer is a sensible average and a steady plan. In a lognormal world, it is the median and a respect for how heavy the tail can pull. In a Pareto world, it is a single question. Are you positioned where the rare event can find you and can you afford to wait until it does?

Most of what people call strategy is really just the inputs that improve your odds of being in the room when the draw happens. The draw itself is not in your hands. What is in your hands is whether you are still in the room when it does.

References

Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703.

Pluchino, A., Biondo, A. E., & Rapisarda, A. (2018). Talent versus luck: The role of randomness in success and failure. Advances in Complex Systems, 21(03–04), 1850014.
