Probability
made simple,
made vivid
You don't need a maths degree to understand distributions. Every concept is explained in plain language first — the formulas are just the shorthand on top.
Building blocks: axioms, random variables, PMF/PDF/CDF, moments, and convergence.
From coin flips to waiting times. Live parameter sliders and plain-English descriptions.
Overlay any two distributions on the same axes. Gold vs Teal.
"Why does everything look like a bell curve?" Watch the answer emerge live.
Estimate π with darts. Sample any distribution. Bootstrap confidence intervals.
Six complete worked problems: insurance, queues, genetics, finance, and more.
The Language of Chance
Understanding what probability distributions actually are — before a single formula
A probability distribution is just a recipe that tells you how likely different outcomes are. Roll a die: each face has a 1-in-6 chance. That simple description is a distribution.
Think of probability as a bag of weights. Each coloured marble represents an outcome — the fraction of that colour is its probability. A distribution is just the complete description of what's in the bag.
In 1933, Andrei Kolmogorov reduced all of probability theory to three simple rules. Everything else follows from these.
From just these three rules, mathematicians have derived centuries of probability theory — like how all of geometry follows from a handful of axioms.
All Possible Outcomes
Rolling a die: Ω = {1,2,3,4,5,6}. The universe of everything that could happen.
How Likely Is A?
A number in [0,1]. P=0 means impossible. P=1 means certain. P=0.5 is a fair coin.
Cumulative Probability
F(x) = "What's the chance of getting ≤ x?" Always goes from 0 to 1.
A Number from Randomness
A rule that assigns a number to each outcome. "The die face showing" is a random variable.
There are two kinds of random quantities: things you count (whole numbers — you can't have 2.7 children) and things you measure (any value on a scale — temperature can be 37.3°C). These need different mathematical tools.
Probability Mass Function (PMF)
Like a bar graph. Each bar shows exactly how probable that specific value is.
Examples: Coin flips, goals in a match, emails per hour, die face.
Probability Density Function (PDF)
Like a smooth curve where the area under it equals probability.
Examples: Height, weight, waiting time, temperature, stock returns.
The gold curve is the PDF — its height is density. The teal curve is the CDF — it accumulates area from left to right.
Want P(X ≤ 1.96)? Read off the CDF at x = 1.96. For a Standard Normal, that's ≈ 97.5%. The CDF is the most practically useful function in all of statistics.
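The read-off above can be checked in a couple of lines. A minimal sketch using only Python's standard library — the Normal CDF is expressed through the error function `math.erf`, a standard identity:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function: Φ((x-μ)/σ)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

p = normal_cdf(1.96)  # Standard Normal: mu=0, sigma=1
print(round(p, 4))    # ≈ 0.975, matching the 97.5% read-off
```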
A distribution's shape can be summarised with four numbers. Mean = centre, Variance = spread, Skewness = lean, Kurtosis = tail weight. Together, these "moments" give you a thumbnail sketch of any distribution.
Mean (μ) — The Centre
Where does the distribution "balance"? The expected value of X.
Variance (σ²) — The Spread
How far from the mean do values typically stray? σ is in the same units as X.
Skewness — The Lean
Positive: long tail right (income). Negative: long tail left. Normal = 0.
Kurtosis — Tail Weight
Excess kurtosis > 0: "fat tails" — extreme events more common than Normal predicts.
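All four moments can be estimated directly from raw samples. A standard-library sketch — Exponential(1) is an illustrative choice because its true values are known: mean 1, variance 1, skewness 2, excess kurtosis 6:

```python
import random
import statistics

random.seed(1)
data = [random.expovariate(1.0) for _ in range(100_000)]  # skewed sample

mu = statistics.fmean(data)                 # centre
var = statistics.pvariance(data, mu)        # spread
sd = var ** 0.5
# Standardised third and fourth central moments:
skew = sum((x - mu) ** 3 for x in data) / (len(data) * sd ** 3)
kurt = sum((x - mu) ** 4 for x in data) / (len(data) * sd ** 4) - 3  # excess

print(f"mean≈{mu:.2f}  var≈{var:.2f}  skew≈{skew:.2f}  excess kurt≈{kurt:.2f}")
```

The estimates land near (1, 1, 2, 6) — the positive skew and fat tails are exactly the "lean" and "tail weight" the cards describe.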
When you take more and more samples, things start to behave predictably. "Convergence" describes the different ways a sequence of random experiments can "settle down" to a fixed answer.
Almost Sure (Strongest) — With probability 1, the sequence of values eventually settles at the limit and stays arbitrarily close to it. The Strong Law of Large Numbers says your sample mean converges this way.
Mean Square — On average, squared distance to the limit shrinks to zero. Useful in signal processing and estimation theory.
In Probability — The probability of being far from the limit vanishes. Excursions become increasingly rare. The Weak LLN lives here.
In Distribution (Weakest) — The histogram shape converges to a fixed shape. This is all the Central Limit Theorem requires.
Chebyshev's Inequality: no matter what distribution (with finite mean & variance), at most 1/k² of probability lives more than k standard deviations from the mean. Universal guarantee — but conservative.
Test scores: mean = 70, sd = 10. Chebyshev says: at most 25% of students score below 50 or above 90 (k=2). A universal guarantee.
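The test-score guarantee is easy to verify by simulation. A sketch assuming (purely for illustration) that scores are Normal(70, 10) — the observed tail fraction should sit well under Chebyshev's 25% ceiling:

```python
import random

random.seed(0)
mu, sd, k = 70, 10, 2

scores = [random.gauss(mu, sd) for _ in range(100_000)]
# Fraction of scores more than k standard deviations from the mean
outside = sum(1 for s in scores if abs(s - mu) > k * sd) / len(scores)

print(f"beyond {k} sd: {outside:.3f}  (Chebyshev bound: {1 / k**2:.2f})")
```

For a Normal the true tail fraction is ≈ 4.6% — far below 25%, illustrating why the bound is universal but conservative.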
8 Essential Shapes
Select a distribution to explore its shape, parameters, and real-world uses.
See the Difference
Overlay any two distributions on the same axes. Gold = A, Teal = B.
Different distributions model different physical processes. Comparing side-by-side reveals why choosing the right model matters.
KL divergence measures how different distribution B is from A. Zero = identical. Higher = more different. Note: it's asymmetric — D(A‖B) ≠ D(B‖A) in general.
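For discrete distributions KL divergence is a one-line sum. A minimal sketch comparing a fair die with a hypothetical loaded one (the weights below are illustrative, not from the app):

```python
import math

def kl_divergence(p, q):
    """D(P‖Q) = Σ p(x)·log(p(x)/q(x)) for discrete PMFs, in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair   = [1/6] * 6
loaded = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]

print(kl_divergence(fair, fair))    # 0.0 — identical distributions
print(kl_divergence(fair, loaded))  # > 0 — and note the asymmetry:
print(kl_divergence(loaded, fair))  # a different positive number
```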
Two Profound Theorems
The Central Limit Theorem and Law of Large Numbers are why statistics works at all
The CLT says: no matter what distribution you're sampling from (finite mean & variance), the distribution of sample averages will look like a bell curve — provided each average is taken over enough observations.
Measure heights of 30 random people; record average. Repeat 2000 times. Those averages will follow a bell curve — even if individual heights don't. Averaging smooths out asymmetry.
Try n=1 with Exponential — you see a skewed histogram. Increase n to 30 — it becomes a near-perfect bell curve. The most important result in all of statistics.
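The demo above can be reproduced in a few lines. A standard-library sketch using Exponential(1) as the skewed source: each of 2000 averages pools n=30 draws, and the averages cluster in a bell shape around the true mean of 1:

```python
import random
import statistics

random.seed(42)

def sample_mean(n):
    # Average n draws from the skewed Exponential(1) distribution
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(30) for _ in range(2000)]

# CLT prediction: centre ≈ 1 (true mean), spread ≈ 1/sqrt(30) ≈ 0.18
print(statistics.fmean(means), statistics.stdev(means))
```

Plot a histogram of `means` and the skew has largely vanished — exactly the n=1 versus n=30 contrast described above.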
The LLN says: the more data you collect, the closer your sample average gets to the true population mean.
A casino doesn't gamble — each bet is random but across millions, the LLN guarantees their edge accumulates. Random on the small scale, predictable at scale.
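The casino claim can be simulated directly. A sketch using the American-roulette red-or-black bet as an illustrative choice: the player wins +1 with probability 18/38, so their true expected value per bet is (18 − 20)/38 ≈ −0.053:

```python
import random

random.seed(7)

def average_result(n):
    """Player's average winnings per bet over n spins."""
    wins = sum(1 for _ in range(n) if random.random() < 18 / 38)
    return (wins - (n - wins)) / n

results = {n: average_result(n) for n in (100, 10_000, 1_000_000)}
for n, avg in results.items():
    print(n, round(avg, 4))
```

At n=100 the average jumps around; by n=1,000,000 it is pinned near −0.053 — the LLN turning the house edge into a near-certainty.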
Randomness in Action
Watch probability theory come to life through simulation
Throw darts at a square. A circle fits inside. The fraction landing inside ≈ π/4. As the number of darts grows, the estimate converges to π. This is Monte Carlo simulation.
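The dart experiment is a few lines of standard-library Python. This sketch uses the unit square with a quarter circle of radius 1, which gives the same π/4 fraction:

```python
import random

random.seed(0)
n = 1_000_000

# A dart (x, y) lands "inside" if it falls within the quarter circle x²+y² ≤ 1
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)

pi_estimate = 4 * inside / n
print(pi_estimate)  # converges toward 3.14159... as n grows
```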
Computers only generate uniform random numbers. Yet we can simulate any distribution by feeding uniform numbers through the inverse CDF. This works because F⁻¹(U) has exactly the desired distribution.
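Inverse-CDF sampling in miniature: for Exponential(λ), solving u = F(x) = 1 − e^(−λx) gives F⁻¹(u) = −ln(1−u)/λ. A sketch with λ = 2 chosen for illustration — the sample mean should come out near 1/λ = 0.5:

```python
import math
import random
import statistics

random.seed(3)
lam = 2.0

def exp_inverse_cdf(u, lam):
    """F⁻¹(u) for Exponential(λ): solve u = 1 - e^(-λx) for x."""
    return -math.log(1.0 - u) / lam

# Feed uniform randoms through the inverse CDF → Exponential samples
samples = [exp_inverse_cdf(random.random(), lam) for _ in range(100_000)]
print(statistics.fmean(samples))  # ≈ 1/λ = 0.5
```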
You have 30 data points and want a confidence interval — but only one sample! Bootstrapping pretends your sample is the population, resamples thousands of times, and uses the spread to estimate the CI.
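A percentile-bootstrap sketch of that idea, using an illustrative Normal(100, 15) sample of 30 points as the "only data we have":

```python
import random
import statistics

random.seed(5)
data = [random.gauss(100, 15) for _ in range(30)]  # the one sample we have

boot_means = []
for _ in range(5000):
    # Pretend the sample IS the population: resample WITH replacement
    resample = random.choices(data, k=len(data))
    boot_means.append(statistics.fmean(resample))

# Percentile method: middle 95% of the bootstrap means
boot_means.sort()
lo, hi = boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)]
print(f"95% bootstrap CI for the mean: [{lo:.1f}, {hi:.1f}]")
```

The interval straddles the sample mean, and its width tracks the spread of the resampled means — no formula for the sampling distribution required.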
Where Distributions Live
Six real problems, complete with working solutions
Test Your Understanding
12 questions — scenarios, properties, and formulas.