Interactive Learning Guide · v2

Probability
made simple,
made vivid

You don't need a maths degree to understand distributions. Every concept is explained in plain language first — the formulas are just the shorthand on top.

Our promise: Every distribution gets a one-sentence description, a real-world analogy, and a "when to use it" guide — before a single formula appears.
8 Dists · 7 Modules · Sims · 12 Quiz Qs
📊 Explore Distributions
🎲 Run Simulations
🧠 Take the Quiz
🧱
01
The Language of Chance

Building blocks: axioms, random variables, PMF/PDF/CDF, moments, and convergence.

Foundations
📊
02
8 Essential Shapes

From coin flips to waiting times. Live parameter sliders and plain-English descriptions.

Distributions
⚖️
03
See the Difference

Overlay any two distributions on the same axes. Gold vs Teal.

Compare
📈
04
Two Profound Theorems

"Why does everything look like a bell curve?" Watch the answer emerge live.

CLT & LLN
🎲
05
Randomness in Action

Estimate π with darts. Sample any distribution. Bootstrap confidence intervals.

Simulations
🌍
06
Where It Actually Matters

Six complete worked problems: insurance, queues, genetics, finance, and more.

Real World
01 — Foundations

The Language of Chance

Understanding what probability distributions actually are — before a single formula

A probability distribution is just a recipe that tells you how likely different outcomes are. Roll a die: each face has a 1-in-6 chance. That simple description is a distribution.

Think of probability as a bag of marbles. Each coloured marble represents an outcome — the fraction of marbles of that colour is its probability. A distribution is just the complete description of what's in the bag.

Kolmogorov's Three Rules (1933)

In 1933, Andrei Kolmogorov reduced all of probability theory to three simple rules. Everything else follows from these.

Rule 1 — Non-Negative
\[P(A)\geq 0\]
Probabilities can't be negative. A −20% chance of rain is nonsense.
Rule 2 — Total Probability = 1
\[P(\Omega)=1\]
Something always happens. All outcomes together have probability exactly 1.
Rule 3 — Additive
\[P(A\cup B)=P(A)+P(B)\quad\text{if }A\cap B=\varnothing\]
Chance of heads OR tails = sum of the individual chances — valid when the events are mutually exclusive.

From just these three rules, mathematicians have derived centuries of probability theory — like how all of geometry follows from a handful of axioms.

Ω — Sample Space

All Possible Outcomes

Rolling a die: Ω = {1,2,3,4,5,6}. The universe of everything that could happen.

P(A) — Probability

How Likely Is A?

A number in [0,1]. P=0 means impossible. P=1 means certain. P=0.5 is a fair coin.

F(x) — The CDF

Cumulative Probability

F(x) = "What's the chance of getting ≤ x?" Always goes from 0 to 1.

X — Random Variable

A Number from Randomness

A rule that assigns a number to each outcome. "The die face showing" is a random variable.

There are two kinds of random quantities: things you count (whole numbers — you can't have 2.7 children) and things you measure (any value on a scale — temperature can be 37.3°C). These need different mathematical tools.

Discrete — Counting

Probability Mass Function (PMF)

Like a bar graph. Each bar shows exactly how probable that specific value is.

PMF: probability at each exact value
\[p_X(x)=P(X=x),\quad\sum_{x}p_X(x)=1\]

Examples: Coin flips, goals in a match, emails per hour, die face.
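The die example can be written out directly as a PMF — a minimal sketch using Python's fractions module so the probabilities stay exact:

```python
# PMF of a fair six-sided die: p_X(x) = P(X = x) = 1/6 for x in 1..6.
# A PMF assigns a probability to each exact value, and the bars sum to 1.
from fractions import Fraction

pmf = {face: Fraction(1, 6) for face in range(1, 7)}

assert all(p >= 0 for p in pmf.values())  # Rule 1: no negative probabilities
assert sum(pmf.values()) == 1             # Rule 2: total probability is 1

# P(X = 4) — the "bar height" at x = 4
print(pmf[4])                             # 1/6
# P(X is even) = sum of the bars at 2, 4, 6 (Rule 3: disjoint outcomes add)
print(sum(pmf[x] for x in (2, 4, 6)))     # 1/2
```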

Continuous — Measuring

Probability Density Function (PDF)

Like a smooth curve where the area under it equals probability.

Area under PDF gives probability
\[P(a \leq X \leq b)=\int_a^b f_X(x)\,dx,\quad\int_{-\infty}^{\infty}f_X(x)\,dx=1\]

Examples: Height, weight, waiting time, temperature, stock returns.
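The area-equals-probability identity is easy to verify numerically — a sketch using the Exponential(1) density, where the exact answer is e^(−a) − e^(−b):

```python
# Probability as area under a PDF: for Exponential(rate = 1), f(x) = e^(-x),
# so P(a <= X <= b) = e^(-a) - e^(-b). Check it by numerical integration.
import math

def exp_pdf(x: float) -> float:
    return math.exp(-x)

def area(f, a: float, b: float, n: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

numeric = area(exp_pdf, 0.5, 2.0)
exact = math.exp(-0.5) - math.exp(-2.0)
print(round(numeric, 6), round(exact, 6))  # the two agree to ~6 decimals
```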

PDF (gold) vs CDF (teal) — Standard Normal

The gold curve is the PDF — its height is density. The teal curve is the CDF — it accumulates area from left to right.

Want P(X ≤ 1.96)? Read off the CDF at x = 1.96. For a Standard Normal, that's ≈ 97.5%. The CDF is the most practically useful function in all of statistics.
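That CDF lookup needs no tables — the Standard Normal CDF can be evaluated with the standard library's error function, via Φ(x) = ½(1 + erf(x/√2)):

```python
# Standard Normal CDF via the error function.
import math

def phi(x: float) -> float:
    """P(X <= x) for X ~ Normal(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(phi(1.96), 4))   # ~0.975, the 97.5% quoted above
print(phi(0.0))              # 0.5 — half the area lies left of the mean
```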

A distribution's shape can be summarised with four numbers. Mean = centre, Variance = spread, Skewness = lean, Kurtosis = tail weight. Together, these "moments" give you a thumbnail sketch of any distribution.

The Four Moments
1st Moment

Mean (μ) — The Centre

Where does the distribution "balance"? The expected value of X.

Formula
\[\mu = E[X] = \int x\,f(x)\,dx\]
2nd Moment

Variance (σ²) — The Spread

How far from the mean do values typically stray? σ is in the same units as X.

Shortcut
\[\sigma^2 = E[X^2] - (E[X])^2\]
3rd Moment

Skewness — The Lean

Positive: long tail right (income). Negative: long tail left. Normal = 0.

Rule of thumb
\[\text{Mean} > \text{Median} \Rightarrow \text{positive skew}\]
4th Moment

Kurtosis — Tail Weight

Excess kurtosis > 0: "fat tails" — extreme events more common than Normal predicts.

Reference
\[\text{Normal: excess kurtosis} = 0\]
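All four moments can be computed from a sample straight from their definitions — a plain-Python sketch (population-style divisors by n, for simplicity):

```python
# Mean, variance (via the shortcut E[X^2] - E[X]^2 in spirit), skewness,
# and excess kurtosis of a sample, from the standardized-moment definitions.
import math

def moments(data):
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    sd = math.sqrt(var)
    skew = sum(((x - mean) / sd) ** 3 for x in data) / n
    excess_kurt = sum(((x - mean) / sd) ** 4 for x in data) / n - 3
    return mean, var, skew, excess_kurt

# A symmetric sample: the skewness should come out ~0.
mean, var, skew, kurt = moments([1, 2, 2, 3, 3, 3, 4, 4, 5])
print(mean, round(var, 4), round(skew, 10))
```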

When you take more and more samples, things start to behave predictably. "Convergence" describes the different ways a sequence of random experiments can "settle down" to a fixed answer.

Four Ways Randomness Can "Settle Down"

Almost Sure (Strongest) — With probability 1, values eventually get stuck at the limit and stay there. The Strong Law of Large Numbers says your sample mean converges this way.

Mean Square — On average, squared distance to the limit shrinks to zero. Useful in signal processing and estimation theory.

In Probability — The probability of being far from the limit vanishes. Excursions become increasingly rare. The Weak LLN lives here.

In Distribution (Weakest) — The histogram shape converges to a fixed shape. This is all the Central Limit Theorem requires.

Hierarchy: Almost Sure ⟹ In Probability ⟹ In Distribution (Mean Square also implies In Probability).
⚠️ Cauchy exception: Its mean is undefined. Neither LLN nor CLT applies — the sample average of a million Cauchy values is still Cauchy.

Chebyshev's Inequality: no matter what distribution (with finite mean & variance), at most 1/k² of probability lives more than k standard deviations from the mean. Universal guarantee — but conservative.

Chebyshev Explorer
Works for ANY Distribution with Finite Variance
\[P(|X-\mu|\geq k\sigma)\leq\frac{1}{k^2}\]

Test scores: mean = 70, sd = 10. Chebyshev says: at most 25% of students score below 50 or above 90 (k=2). A universal guarantee.

[Interactive: slider k = 2.0 — Chebyshev bound vs actual Normal tail, and how conservative the bound is]
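The bound-vs-reality comparison in the explorer takes only a few lines — a sketch pitting Chebyshev's universal guarantee against the exact Normal tail:

```python
# Chebyshev's bound vs the exact Normal tail at k standard deviations.
import math

def chebyshev_bound(k: float) -> float:
    # P(|X - mu| >= k*sigma) <= 1/k^2, for ANY finite-variance distribution
    return 1.0 / k**2

def normal_tail(k: float) -> float:
    # Exact two-sided tail for a Normal: 2 * (1 - Phi(k))
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(k / math.sqrt(2.0))))

for k in (1.5, 2.0, 3.0):
    print(k, round(chebyshev_bound(k), 4), round(normal_tail(k), 4))
# At k = 2: Chebyshev allows up to 25%; the Normal actually puts ~4.6% there.
```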
02 — Distributions

8 Essential Shapes

Select a distribution to explore its shape, parameters, and real-world uses.

03 — Compare

See the Difference

Overlay any two distributions on the same axes. Gold = A, Teal = B.

Different distributions model different physical processes. Comparing side-by-side reveals why choosing the right model matters.

Distribution A (Gold)
Distribution B (Teal)
PDF Overlay
KL Divergence

KL divergence measures how different distribution B is from A. Zero = identical; higher = more different. Note it is asymmetric: D(A‖B) ≠ D(B‖A) in general.
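For discrete distributions over the same outcomes, KL divergence is a one-liner — a sketch comparing a fair die with a hypothetical loaded one:

```python
# KL divergence between two discrete distributions on the same outcomes:
# D(A || B) = sum_x A(x) * log(A(x) / B(x)).  Asymmetric by construction.
import math

def kl(a, b):
    return sum(p * math.log(p / q) for p, q in zip(a, b) if p > 0)

fair = [1/6] * 6                          # fair die
loaded = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # die loaded towards six (made up)

print(round(kl(fair, fair), 6))   # 0.0 — identical distributions
print(round(kl(fair, loaded), 4))
print(round(kl(loaded, fair), 4))  # a different value: KL is asymmetric
```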

04 — CLT & LLN

Two Profound Theorems

The Central Limit Theorem and Law of Large Numbers are why statistics works at all

The CLT says: no matter what distribution you're sampling from (finite mean & variance), the distribution of sample averages will look like a bell curve — if you take enough samples.

Measure heights of 30 random people; record average. Repeat 2000 times. Those averages will follow a bell curve — even if individual heights don't. Averaging smooths out asymmetry.

Central Limit Theorem
\[\sqrt{n}\,\frac{\bar{X}_n-\mu}{\sigma}\;\xrightarrow{d}\;\mathcal{N}(0,1)\quad\text{as }n\to\infty\]

[Interactive: sample size n = 30, repetitions = 2000]

Try n=1 with Exponential — you see a skewed histogram. Increase n to 30 — it becomes a near-perfect bell curve. The most important result in all of statistics.
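The experiment described above — averages of skewed Exponential draws turning bell-shaped — runs in a few lines of standard-library Python, using the same n = 30 and 2000 repetitions:

```python
# CLT in action: averages of n Exponential(1) draws, repeated many times.
# Each draw is strongly skewed, but the averages cluster into a bell curve.
import random, statistics

random.seed(0)
n, reps = 30, 2000
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# Theory: the sample mean has mean 1 and sd 1/sqrt(30) ~ 0.18.
print(round(statistics.fmean(means), 3))
print(round(statistics.stdev(means), 3))
```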

The LLN says: the more data you collect, the closer your sample average gets to the true population mean.

A casino doesn't gamble — each bet is random but across millions, the LLN guarantees their edge accumulates. Random on the small scale, predictable at scale.

Weak Law
\[\bar{X}_n\xrightarrow{P}\mu\]
Convergence in probability
Strong Law
\[\bar{X}_n\xrightarrow{a.s.}\mu\]
Almost sure convergence

Try "Cauchy — NO convergence!" Watch paths wander forever. No finite mean → LLN fails.
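The LLN is easy to watch numerically — a sketch tracking the running average of fair-die rolls, whose true mean is 3.5:

```python
# LLN in action: the running average of fair-die rolls drifts towards 3.5.
import random

random.seed(1)
total, path = 0.0, []
for n in range(1, 100_001):
    total += random.randint(1, 6)
    path.append(total / n)

print(round(path[9], 3), round(path[999], 3), round(path[-1], 3))
# Early averages wander; after 100,000 rolls the mean is pinned near 3.5.
# (A Cauchy sample mean, by contrast, never settles: it has no finite mean.)
```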
05 — Simulations

Randomness in Action

Watch probability theory come to life through simulation

Throw darts at a square with a circle inscribed inside it. The fraction landing inside the circle ≈ π/4. As the number of darts grows, the estimate converges to π. This is Monte Carlo simulation.

Monte Carlo Estimator
\[\hat{\pi}_n=4\cdot\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}[x_i^2+y_i^2\leq 1]\;\xrightarrow{a.s.}\;\pi\]
Error shrinks as 1/√n — slow, but independent of the problem's dimension.
[Interactive: dart scatter (inside vs outside) with live readouts — estimated π, error, total darts]
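The dart experiment in code — a minimal Monte Carlo sketch of the estimator above:

```python
# Monte Carlo pi: throw random darts in the unit square; the fraction
# landing inside the quarter-circle x^2 + y^2 <= 1 estimates pi/4.
import random

random.seed(42)
n = 200_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_hat = 4.0 * inside / n
print(round(pi_hat, 3))
# Error is O(1/sqrt(n)): 100x more darts buys only one extra decimal digit.
```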

Computers natively generate only uniform random numbers. Yet we can simulate any distribution by feeding uniform numbers through the inverse CDF. This works because F⁻¹(U) has exactly the desired distribution.

Probability Integral Transform
\[U\sim\mathrm{Uniform}(0,1)\;\Longrightarrow\;X=F^{-1}(U)\sim F_X\]
[Interactive: select a distribution above to draw 2000 samples through its quantile function F⁻¹(u)]
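The transform is a one-liner once the inverse CDF is known — a sketch for the Exponential, where F(x) = 1 − e^(−rate·x) gives F⁻¹(u) = −ln(1−u)/rate (the rate value below is illustrative):

```python
# Inverse-transform sampling: feed Uniform(0,1) draws through F^{-1}.
import math, random, statistics

random.seed(7)
rate = 2.0
samples = [-math.log(1.0 - random.random()) / rate for _ in range(50_000)]

# Exponential(2) has mean 1/2 — the simulated mean should sit close to it.
print(round(statistics.fmean(samples), 3))
```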

You have 30 data points and want a confidence interval — but only one sample! Bootstrapping pretends your sample is the population, resamples thousands of times, and uses the spread to estimate the CI.

Bootstrap Principle
\[\hat{\theta}^*_b=T(X^*_b),\quad\mathrm{CI}_{95\%}:\!\left[\hat{\theta}^*_{(2.5\%)},\;\hat{\theta}^*_{(97.5\%)}\right]\]
[Interactive: 2000 bootstrap resamples]
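The percentile bootstrap fits in a dozen lines — a sketch on a small hypothetical sample (the data values are made up for illustration):

```python
# Percentile bootstrap 95% CI for the mean: resample the data WITH
# replacement many times, then take the 2.5% and 97.5% quantiles of the
# resampled means.
import random, statistics

random.seed(3)
data = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 5.1, 4.4, 6.0, 5.3,
        4.7, 5.8, 4.2, 5.6, 4.8]  # the one observed sample (hypothetical)

boot_means = sorted(
    statistics.fmean(random.choices(data, k=len(data)))
    for _ in range(2000)
)
lo, hi = boot_means[int(0.025 * 2000)], boot_means[int(0.975 * 2000)]
print(round(lo, 2), round(hi, 2))  # the 95% confidence interval
```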
06 — Real World

Where Distributions Live

Six real problems, complete with working solutions

07 — Quiz

Test Your Understanding

12 questions — scenarios, properties, and formulas.

[Interactive quiz: score, streak, correct-answer count, and accuracy update as you play — select an answer to begin.]