LOGBOOK LOG-396
COMPLETE · MATHEMATICS · STATISTICS · MEAN · VARIANCE · STANDARD-DEVIATION · CLT · FOUNDATIONS · PROBABILITY

Statistics — Mean, Variance, Standard Deviation, and the CLT

Descriptive statistics, measures of spread, and the Central Limit Theorem — why averages behave so predictably.

Two Branches

Descriptive statistics: summarise and describe data you have. Inferential statistics: draw conclusions about a population from a sample.

They use the same tools but ask different questions. Descriptive statistics is about the data in hand; inferential statistics is about what you can conclude beyond it.


Measures of Centre

Mean (arithmetic average):

x̄ = (x₁ + x₂ + ... + xₙ) / n = Σxᵢ / n

Sensitive to outliers — one extreme value pulls the mean significantly.

Median: the middle value when sorted. For even n, average the two middle values.

Resistant to outliers — the median of {1, 2, 3, 4, 100} is 3; the mean is 22. For skewed data (income, house prices), median is the more representative “typical” value.
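The mean-versus-median contrast above can be checked directly with Python's standard library (used here as a minimal sketch; the dataset is the one from the text):

```python
import statistics

data = [1, 2, 3, 4, 100]

# One extreme value drags the mean far from the "typical" value,
# while the median is unaffected.
print(statistics.mean(data))    # 22
print(statistics.median(data))  # 3
```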

Mode: the most frequently occurring value. Can be multiple (bimodal, multimodal) or none (all unique). More useful for categorical data than numerical.

When to use which

  • Symmetric distribution: mean ≈ median, either works
  • Right-skewed (long tail right): mean > median — use median for “typical”
  • Left-skewed: mean < median
  • Categorical data: mode

Measures of Spread

Range: max − min. Simple but very sensitive to outliers.

Interquartile range (IQR): Q3 − Q1, the spread of the middle 50%.

  • Q1: 25th percentile (median of lower half)
  • Q2: 50th percentile (median)
  • Q3: 75th percentile (median of upper half)
  • IQR = Q3 − Q1

Robust to outliers. Used in box plots.
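A quick sketch of the quartile calculation, using `statistics.quantiles` with `method="inclusive"`. Note that quartile conventions vary slightly between textbooks and libraries, so the values may not match the median-of-halves rule exactly on every dataset:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data at the quartiles; "inclusive" interpolates
# between order statistics of the sample itself.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3)  # 3.0 5.0 7.0
print(iqr)         # 4.0
```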

Variance:

Population: σ² = Σ(xᵢ − μ)² / N
Sample:     s² = Σ(xᵢ − x̄)² / (n−1)

Average squared deviation from the mean. Squaring ensures positive values and penalises large deviations more than small ones.

Why n−1 for samples (Bessel’s correction): Using n underestimates the true population variance because the sample mean is closer to sample data than the true population mean is. Dividing by n−1 corrects this — it makes s² an unbiased estimator of σ².

Standard deviation: σ (population) or s (sample) = √variance. Same units as the data. The typical distance from the mean.
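The standard library exposes both conventions, which makes Bessel's correction easy to see on a small dataset (chosen here so the population values come out exact):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5

# pvariance/pstdev divide by N (population formulas);
# variance divides by n-1 (sample formula, Bessel's correction),
# so the sample estimate is slightly larger.
print(statistics.pvariance(data))  # 4
print(statistics.variance(data))   # 32/7 ≈ 4.571
print(statistics.pstdev(data))     # 2.0
```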


The Normal Distribution and Standard Deviations

For normally distributed data, standard deviations have predictable meaning:

x̄ ± 1s contains ≈ 68% of data
x̄ ± 2s contains ≈ 95% of data
x̄ ± 3s contains ≈ 99.7% of data

An observation 3 standard deviations from the mean is unusual (happens ~0.3% of the time). 6 standard deviations (“six sigma”) is extraordinarily rare.

Standardising (z-score):

z = (x − x̄) / s

Converts a value to “number of standard deviations from the mean.” Allows comparison across different scales.
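Both the 68–95–99.7 rule and z-scores can be sketched with `statistics.NormalDist` (the `zscore` method requires Python 3.9+; the exam-score distribution is a made-up illustration):

```python
from statistics import NormalDist

d = NormalDist(mu=0, sigma=1)

# Probability mass within k standard deviations of the mean
for k in (1, 2, 3):
    print(round(d.cdf(k) - d.cdf(-k), 4))
# 0.6827, 0.9545, 0.9973

# zscore standardises a value against a distribution:
# hypothetical exam scores with mean 70, sd 8
exam = NormalDist(mu=70, sigma=8)
print(exam.zscore(86))  # 2.0  — i.e. (86 - 70) / 8
```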


The Central Limit Theorem

The CLT: if you take sufficiently large random samples from any population with finite mean μ and variance σ², the distribution of sample means will be approximately normal:

X̄ ~ N(μ, σ²/n)

The mean of the sampling distribution is μ. The standard deviation is σ/√n — called the standard error.

What this means:

  • Doesn’t matter what shape the original population has — uniform, skewed, bimodal
  • The distribution of sample means becomes normal as n grows
  • With n ≥ 30, the approximation is usually good enough

Why it’s powerful: you can apply the well-understood normal distribution to real-world data even when the underlying distribution is unknown. Almost all classical statistical tests are built on this.

Example: Roll a die (uniform distribution) 30 times, record the mean. Do this thousands of times. A single roll has variance 35/12, so the distribution of those means is approximately N(3.5, 35/12/30) ≈ N(3.5, 0.097) — a normal distribution, despite the original uniform distribution.
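The dice experiment is easy to simulate (a minimal sketch; the seed is arbitrary and the tolerances are loose):

```python
import random
import statistics

random.seed(0)

# Roll a fair die 30 times and take the mean; repeat 10,000 times.
means = [statistics.mean(random.randint(1, 6) for _ in range(30))
         for _ in range(10_000)]

# CLT prediction: mean 3.5, standard error sqrt(35/12/30) ≈ 0.312
print(round(statistics.mean(means), 2))   # ≈ 3.5
print(round(statistics.stdev(means), 2))  # ≈ 0.31
```

A histogram of `means` would show the familiar bell shape, even though each individual roll is uniform.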


Standard Error vs Standard Deviation

These are frequently confused:

Standard deviation (s): how spread out individual data points are. A property of the data.

Standard error (SE = s/√n): how spread out sample means are. A property of your estimate.

As n increases, SE decreases (√n in denominator). Larger samples → more precise estimates of the mean. This is why bigger studies give tighter results.


Confidence Intervals

A 95% confidence interval means: if you repeated this sampling procedure many times, 95% of the intervals constructed this way would contain the true population mean.

CI = x̄ ± z* × (s/√n)

For 95%: z* = 1.96. For 99%: z* = 2.576.

Sample mean = 50, s = 10, n = 100
SE = 10/√100 = 1
95% CI = 50 ± 1.96 × 1 = (48.04, 51.96)
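The worked example translates directly into a few lines of Python:

```python
import math

xbar, s, n = 50, 10, 100
z = 1.96  # critical value for 95% confidence

se = s / math.sqrt(n)  # standard error
lo, hi = xbar - z * se, xbar + z * se

print(se)                        # 1.0
print(round(lo, 2), round(hi, 2))  # 48.04 51.96
```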

Common misinterpretation: “95% probability that the true mean is in this interval” is wrong. The true mean is fixed — it’s either in the interval or not. The 95% refers to the procedure, not any particular interval.


Correlation

Pearson correlation coefficient r measures linear association between two variables:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / [(n−1)sₓsᵧ]

Range: −1 to +1.

  • r = 1: perfect positive linear relationship
  • r = −1: perfect negative linear relationship
  • r = 0: no linear relationship (but could be nonlinear)

Correlation ≠ causation. Two variables can correlate because one causes the other, because a third variable causes both (confounding), or by pure chance. The number alone tells you nothing about mechanism.

Anscombe’s Quartet: four datasets with identical means, variances, and correlations (r ≈ 0.816) but completely different shapes — one linear, one curved, one with an outlier. Always plot your data.


Outliers

An outlier is a value unusually far from the rest. Common rule: x is an outlier if it falls more than 1.5 × IQR below Q1 or above Q3.

Effect on statistics:

  • Mean and variance: heavily affected by outliers
  • Median and IQR: robust to outliers

What to do with outliers: investigate before removing. An outlier might be:

  • A data entry error → correct or remove
  • A genuine extreme observation → keep
  • The most interesting finding in the dataset → investigate further

Removing outliers because they’re inconvenient is a form of data manipulation.
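The 1.5 × IQR rule can be sketched as a small helper (quartiles via `statistics.quantiles(method="inclusive")`; as noted earlier, quartile conventions vary slightly between libraries, so the fences may differ marginally from other tools):

```python
import statistics

def iqr_outliers(data):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

print(iqr_outliers([1, 2, 3, 4, 100]))  # [100]
```

Flagging is only the first step — each flagged value still needs the investigation described above before anything is removed.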


Distributions in Practice

Statistic            Best with         Sensitive to
Mean                 Symmetric data    Outliers
Median               Skewed data       Nothing much
Standard deviation   Normal data       Outliers
IQR                  Any shape         Nothing much

The choice of statistic is itself a modelling decision — it encodes assumptions about the shape of your data.