Interactive Learning Guide

Regression
you can feel

Every concept has a live calculator. Change a number, see the answer update instantly. No static examples — just direct interaction with the maths.

Try this now: Go to Simple Linear → click on the chart to add data points, watch the line fit live. Then go to Beyond Linear → drag the polynomial degree slider to 10 and watch it overfit.
5 Live Calculators · Real-time Fitting · 7 Modules · 12 Quiz Questions
📐
01 — Foundations
What is Regression?

OLS objective, correlation vs causation, t-tests. Includes an R² & significance calculator.

📈
02 — Simple Linear
Click-to-Fit Canvas

Click to add points, right-click to remove. Full OLS stats update in real time. Prediction calculator included.

🧮
03 — Multiple Regression
Salary & VIF Calculators

Live salary predictor with partial effects. Multicollinearity VIF explorer. Model comparison tool.

🔍
04 — Assumptions
Violation Simulator

Generate data with specific violations, see how residual plots look and what goes wrong with inference.

🚀
05 — Beyond Linear
Logistic, Poly, Ridge/Lasso

Drag the degree slider, tune regularisation λ. Overfitting vs underfitting made visceral.

🩺
06 — Diagnostics
Four Live Diagnostic Plots

Switch between dataset types and watch all four R-style diagnostic plots update simultaneously.

01 — Foundations

What is Regression?

The core idea, OLS, correlation vs causation, and inference — with a live significance calculator

Regression asks: "If I know X, how well can I predict Y?" You have data, you draw the best possible straight line through it, and that line becomes your model. Given a new X, you read off the predicted Y. Everything else in regression is refinement of this simple idea.

You're a coffee shop owner. Hotter days → more iced coffees sold. You plot temperature vs cups sold. Regression finds the best line — the one that minimises total prediction error. Now you can order the right amount of coffee for tomorrow based on the weather forecast.

The Regression Framework
Y — Response

What you're predicting

The dependent variable / outcome / target. House price, test score, blood pressure. Goes on the vertical axis.

X — Predictor

What you're using to predict

The independent variable / feature / covariate. House size, hours studied, dosage. Goes on the horizontal axis.

β — Coefficient

The effect size

How much Y changes per unit of X. The slope. This is usually what you care about — it quantifies the relationship.

ε — Residual

What the model gets wrong

Actual Y minus predicted Ŷ. Good models have small, random residuals. Patterned residuals signal a missing variable or wrong model form.

The Linear Model
\[Y = \beta_0 + \beta_1 X + \varepsilon\]
β₀ = intercept (Y when X=0). β₁ = slope (ΔY per unit ΔX). ε = everything the model doesn't capture.

OLS (Ordinary Least Squares) finds the line that minimises the sum of squared vertical distances from each point to the line. "Squared" punishes big errors more than small ones, and makes the loss function smooth so calculus can find the exact minimum.

OLS Loss Function Explorer

Drag the slope and intercept to see how RSS changes. OLS finds the unique (β₀, β₁) that minimises RSS.

Controls: slope (2.0), intercept (1.00) · Readouts: Your RSS, OLS RSS (min), Excess over OLS · Chart: Current Line vs OLS Best Fit
Closed-Form Solution
\[\hat{\beta}_1=\frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2},\quad\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\]
OLS has an exact algebraic solution — no iteration needed. The optimal line always passes through (x̄, ȳ).
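The closed-form solution translates directly to code; a minimal sketch with invented data (roughly y = 1 + 2x):

```python
def ols_fit(xs, ys):
    """Closed-form OLS: the slope and intercept that minimise RSS."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Illustrative data, not from the page's calculators
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = ols_fit(xs, ys)  # b1 = 1.97, b0 = 1.09
```

Note that `b0 + b1 * x_bar` recovers `y_bar` exactly, confirming the line passes through the point of means.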

Correlation (−1 to +1) measures linear association strength. Regression gives a predictive equation. But neither implies causation — that requires experiments, instrumental variables, or other causal methods.

Correlation Calculator — Live

Adjust the sliders to generate data with a specific correlation. See how the regression line and R² change.

Controls: target correlation (0.70), sample size (50) · Readouts: Sample r, R² (variance explained), Slope β₁, Interpretation
Remember: r = 0 means no linear relationship — not "no relationship". X and X² can have r ≈ 0 but a perfect non-linear relationship. And even r = 0.9 doesn't mean X causes Y.
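That first caveat is easy to verify in code: X and X² can be perfectly related yet have r exactly 0 when X is symmetric about zero. A sketch:

```python
def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    sxx = sum((x - xb) ** 2 for x in xs)
    syy = sum((y - yb) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # a perfect (deterministic) relationship
r = pearson_r(xs, ys)      # r = 0: no *linear* association at all
```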
🧮 Statistical Inference Calculator

Enter your regression output — get p-values, confidence intervals, and a plain-English verdict instantly.

Inputs: β̂₁ (2.5), SE (0.80), n (30) · Outputs: t-statistic, p-value (two-sided), 95% CI lower/upper, Significant at 5%?, Plain-English Verdict

Adjust the sliders to explore how β̂₁, SE, and n jointly determine significance.
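The calculator's arithmetic is straightforward; a sketch using its default inputs (β̂₁ = 2.5, SE = 0.80, n = 30) with a normal approximation standing in for the exact t distribution (close for n this large):

```python
import math

def inference(beta_hat, se, n, alpha=0.05):
    """t-statistic, two-sided p-value, and CI for a slope estimate.
    Uses the normal approximation to the t distribution (good for large n)."""
    t = beta_hat / se
    # two-sided p-value via the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    z = 1.959963985  # 97.5th percentile of the standard normal
    ci = (beta_hat - z * se, beta_hat + z * se)
    return t, p, ci

t, p, ci = inference(2.5, 0.80, 30)
# t = 3.125, p ≈ 0.002, CI excludes 0 → significant at the 5% level
```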

02 — Simple Linear

One line, live

Click the canvas to add data — OLS fits instantly with full statistics and a prediction calculator

Click anywhere on the chart to add data points. The OLS regression line fits immediately. Right-click to remove a point. Watch what happens to slope, R², and the prediction calculator when you add an outlier.

Live Regression Canvas — Left-click: add point · Right-click: remove nearest · Drag: move point
OLS Results — Live: Fitted Equation (add ≥ 2 points to fit)
Prediction Calculator — Input: X (5.0) · Outputs: Predicted Ŷ, 95% Prediction Interval, 95% Confidence Interval

PI vs CI: Prediction interval (wider) covers a single new observation. Confidence interval (narrower) covers the mean of Y at that X.
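That width difference comes straight from the formulas: the PI carries an extra "+1" noise term under the square root. A sketch with invented data, using the normal critical value in place of the exact t quantile:

```python
def intervals(xs, ys, x0, z=1.96):
    """Approximate 95% CI for the mean response and PI for a new
    observation at x0, for simple linear regression."""
    n = len(xs)
    xb = sum(xs) / n
    yb = sum(ys) / n
    sxx = sum((x - xb) ** 2 for x in xs)
    b1 = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sxx
    b0 = yb - b1 * xb
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = (rss / (n - 2)) ** 0.5             # residual standard error
    h = 1 / n + (x0 - xb) ** 2 / sxx       # leverage-style term
    y_hat = b0 + b1 * x0
    ci = (y_hat - z * s * h ** 0.5, y_hat + z * s * h ** 0.5)
    pi = (y_hat - z * s * (1 + h) ** 0.5, y_hat + z * s * (1 + h) ** 0.5)
    return y_hat, ci, pi  # PI is always wider: it adds the +1 noise term

# Illustrative data
y_hat, ci, pi = intervals([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.0], 3.0)
```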

03 — Multiple Regression

Many predictors, one model

Live salary calculator, partial effects decomposition, and VIF multicollinearity explorer

This model predicts salary from four inputs. Every slider updates the prediction, equation, and waterfall chart instantly. The chart shows each variable's contribution — the "partial effect" of each predictor while controlling for the others.

💼 Salary Prediction — Multiple Regression Model
Controls: four predictor sliders · Readouts: Live Equation, Predicted Salary, 95% Prediction Interval, Model R² = 0.79 · Chart: Contribution Breakdown

VIF (Variance Inflation Factor) measures how much a predictor is explained by the others. VIF = 1 is perfect independence. VIF > 10 means severe multicollinearity — your coefficient estimates become wildly unstable.
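With just two predictors, the "explained by the others" part reduces to their pairwise correlation r, so VIF is a one-liner. A sketch:

```python
def vif_two_predictors(r):
    """VIF when there are exactly two predictors with correlation r.
    Here R_j^2 (X1 regressed on X2) is simply r^2, so VIF = 1/(1 - r^2)."""
    return 1 / (1 - r ** 2)

vif_independent = vif_two_predictors(0.0)   # 1.0 — perfect independence
vif_severe = vif_two_predictors(0.95)       # ≈ 10.3 — severe multicollinearity
```

Since standard errors inflate by √VIF, a VIF of 10.3 means coefficient standard errors roughly triple.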

📡 Multicollinearity VIF Calculator

Drag the correlation between predictors. Watch how VIF, standard errors, and coefficient stability all respond.

Control: predictor correlation (0.30) · Readouts: VIF (X₁ and X₂), SE inflation factor, Effective n reduction, Severity, What this means
Multiple Regression — Core Theory
The Full Model
\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon\]
Each βⱼ = effect of Xⱼ on Y, holding all other Xₖ constant ("ceteris paribus").
Matrix Form — OLS Solution
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}\]
X is the n×(p+1) design matrix. This single formula gives all p+1 coefficients simultaneously.
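For the p = 1 case the matrix formula can be expanded by hand; a pure-Python sketch with an explicit 2×2 inverse (data invented for illustration):

```python
def ols_matrix(xs, ys):
    """OLS via the normal equations for one predictor plus intercept:
    beta = (X^T X)^{-1} X^T y, where each row of X is [1, x_i]."""
    n = len(xs)
    # X^T X is the 2x2 matrix [[n, Sx], [Sx, Sxx]]
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # X^T y is the vector [Sy, Sxy]
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Explicit 2x2 inverse applied to X^T y
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Same illustrative data as a closed-form fit would use; results agree
b0, b1 = ols_matrix([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.0])
```

For general p you would use a linear-algebra library rather than hand-inverting, but the structure is identical.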
Adjusted R²

Penalised goodness of fit

R² always rises when you add variables. Adjusted R² penalises each extra parameter. Use this when comparing models of different sizes.

F-test

Are any predictors useful?

Tests H₀: β₁ = ⋯ = βₚ = 0. A significant F-test means at least one βⱼ ≠ 0 — but it doesn't tell you which.
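Both statistics are simple functions of R², n, and p. A sketch using the salary model's R² = 0.79 with a hypothetical n = 50 and p = 4 (the sample size is an assumption):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2: penalises each of the p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def f_statistic(r2, n, p):
    """Overall F statistic for H0: beta_1 = ... = beta_p = 0."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

adj = adjusted_r2(0.79, 50, 4)  # ≈ 0.771 — slightly below the raw R^2
f = f_statistic(0.79, 50, 4)    # ≈ 42.3 — compared against an F(p, n-p-1) distribution
```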

04 — Assumptions

When does it break?

Simulate specific violations — see the residual plots and quantify how inference goes wrong

⚗️ Assumption Violation Simulator
Controls: violation strength (0.50), sample size (60), true slope β₁ = 2.00 · Readouts: Estimated β̂₁, Reported SE, True SE (robust), SE inflation · Chart: Residuals vs Fitted (select a violation type above)
1. Linearity
E[ε|X] = 0. Residuals should show no pattern vs fitted values.
STATUS: OK
2. Independence
Residuals uncorrelated. Violated by time series or clustered data.
STATUS: OK
3. Homoscedasticity
Constant error variance across all X values. No fan-shaped plots.
STATUS: OK
4. Normal Errors
Residuals ≈ Normal. Needed for exact inference; CLT relaxes this for large n.
STATUS: OK
05 — Beyond Linear

When a line isn't enough

Logistic regression, polynomial overfitting, and regularisation — all live

Logistic regression is for binary outcomes (yes/no, pass/fail). A sigmoid function squashes the linear prediction into a probability between 0 and 1. The threshold you choose trades off false positives vs false negatives.
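The sigmoid step fits in a few lines; the coefficients below mirror the calculator's default sliders (an assumption about what those sliders control):

```python
import math

def predict_prob(x, b0, b1):
    """Sigmoid squashes the linear score b0 + b1*x into (0, 1)."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def classify(x, b0, b1, threshold=0.5):
    """Raising the threshold trades sensitivity for specificity."""
    return predict_prob(x, b0, b1) >= threshold

p = predict_prob(0.0, 0.8, 1.2)  # score = 0.8 → p ≈ 0.69
odds_ratio = math.exp(1.2)       # each unit of X multiplies the odds by ≈ 3.32
```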

🎯 Logistic Regression — Live Classification
Controls: model coefficients (0.8, 1.2), decision threshold (0.50) · Readouts: Accuracy, Sensitivity (recall), Specificity, Precision (PPV), Odds Ratio Interpretation
Move the threshold to explore the sensitivity/specificity tradeoff.

Polynomial regression adds X², X³ etc. as predictors — still "linear regression" (linear in parameters). The danger: high degree = overfitting. The model memorises training noise. Watch R² → 1 on training data while the curve goes wild.
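The extreme case of the degree slider is a polynomial that passes through every training point exactly. A sketch of this "perfect memorisation" using Lagrange interpolation (data invented for illustration):

```python
def lagrange(xs, ys, x):
    """Evaluate the degree-(n-1) polynomial through all n points —
    the extreme of overfitting: zero training error by construction."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Nearly linear data with a little noise
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.2, 1.9, 3.2, 3.8, 5.1]
# The interpolant reproduces every training point exactly (train RMSE = 0,
# train R^2 = 1), yet between points it can swing away from the linear trend —
# exactly what the degree slider shows at high settings.
```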

Overfitting Explorer — Polynomial Degree
Controls: polynomial degree (1), noise level (1.0) · Readouts (test set = held-out 30%): Train R², Test R², Train RMSE, Test RMSE

Ridge (L2) shrinks all coefficients toward zero. Lasso (L1) can shrink some to exactly zero — automatic variable selection. Higher λ = more shrinkage = less overfitting but more bias. The sweet spot is found by cross-validation.
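For a single centred predictor both penalties have closed forms, which makes the shrink-vs-zero distinction concrete. A sketch under one common penalty convention (the sums below are illustrative, not from the page's data):

```python
def ridge_slope(sxy, sxx, lam):
    """Ridge for one centred predictor, objective 1/2*RSS + lam/2 * b^2:
    shrinks toward zero but never reaches it for finite lam."""
    return sxy / (sxx + lam)

def lasso_slope(sxy, sxx, lam):
    """Lasso for one centred predictor, objective 1/2*RSS + lam * |b|:
    soft-thresholding can set the slope exactly to zero."""
    mag = max(abs(sxy) - lam, 0.0)
    return (1 if sxy >= 0 else -1) * mag / sxx

sxy, sxx = 19.7, 10.0       # illustrative sums from some centred dataset
ridge_slope(sxy, sxx, 0.0)  # 1.97  — equals OLS at lam = 0
ridge_slope(sxy, sxx, 10.0) # 0.985 — shrunk, but still nonzero
lasso_slope(sxy, sxx, 30.0) # 0.0   — zeroed out: variable deselected
```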

🎛️ Ridge vs Lasso — Live Regularisation
Control: regularisation strength λ (0.0)

Coefficient values as λ increases — Ridge shrinks, Lasso zeros out

Train vs Test error — find the λ that minimises test error

Readouts: Coefficients set to 0 (Lasso), Train RMSE, Test RMSE, Total shrinkage, What is happening
06 — Diagnostics

Is my model any good?

All four R diagnostic plots, live. Switch dataset type and watch them all update simultaneously.

🩺 Four-Plot Diagnostic Dashboard
Residuals vs Fitted

Normal Q-Q

Scale-Location

Leverage vs Std Residuals (Cook's D)

07 — Quiz

Test Your Understanding

12 questions — interpretation, theory, scenarios, and diagnostics

Scoreboard: Score (0), Streak (0), Correct (0), Accuracy · Select an answer to begin.