Probability Distributions
The key probability distributions — discrete and continuous — what they model, and when to reach for each.
What a Distribution Is
A probability distribution describes all possible values a random variable can take and how likely each is.
Discrete distributions: the variable takes countably many specific values (typically integers). Described by a probability mass function (PMF): P(X = x).
Continuous distributions: the variable can take any value in a range. Described by a probability density function (PDF): f(x), where probability is area under the curve, not the height.
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
For any continuous distribution, P(X = exactly x) = 0 — you can only ask about ranges.
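A minimal sketch of this distinction, assuming scipy is available: a PMF is evaluated at points, while a PDF only yields probability when integrated over a range.

```python
from scipy import stats
from scipy.integrate import quad

# Discrete: P(X = 3) for Binomial(10, 0.5) comes straight from the PMF.
print(stats.binom.pmf(3, n=10, p=0.5))   # ≈ 0.117

# Continuous: P(X = 0.5) exactly is 0; only ranges carry probability.
pdf = stats.norm(loc=0, scale=1).pdf
prob, _ = quad(pdf, -1, 1)               # area under the curve on [-1, 1]
print(prob)                              # ≈ 0.683
```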
Key Properties
Expected value (mean):
Discrete: E[X] = Σ x · P(X = x)
Continuous: E[X] = ∫ x · f(x) dx
Variance — average squared deviation from the mean:
Var(X) = E[(X − μ)²] = E[X²] − (E[X])²
Standard deviation: σ = √Var(X) — same units as X.
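A small sketch of these formulas for a discrete variable, using a fair six-sided die as an illustrative assumption:

```python
import numpy as np

values = np.arange(1, 7)       # possible outcomes of the die
probs = np.full(6, 1 / 6)      # P(X = x) for each outcome

mean = np.sum(values * probs)                 # E[X] = Σ x · P(X = x)
var = np.sum(values**2 * probs) - mean**2     # E[X²] − (E[X])²
std = np.sqrt(var)                            # same units as X

print(mean, var, std)          # 3.5, ≈ 2.917, ≈ 1.708
```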
Discrete Distributions
Bernoulli
One trial, two outcomes — success (p) or failure (1−p).
P(X = 1) = p
P(X = 0) = 1 − p
E[X] = p, Var(X) = p(1−p)
The building block: the binomial and geometric are constructed directly from Bernoulli trials, and the Poisson arises as their limit.
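A quick illustration of that claim, assuming numpy is available (n = 20, p = 0.5 are arbitrary choices): summing Bernoulli trials yields a binomial draw.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 0.5
flips = rng.random(n) < p      # n Bernoulli(p) trials as booleans
print(flips.sum())             # one Binomial(20, 0.5) sample
```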
Binomial — B(n, p)
n independent Bernoulli trials, count the successes.
P(X = k) = C(n,k) × pᵏ × (1−p)ⁿ⁻ᵏ
E[X] = np
Var(X) = np(1−p)
When to use: fixed number of independent trials, each with the same success probability. Number of heads in 20 flips, defective items in a batch, correct answers guessed on a test.
10 coin flips, p = 0.5. P(exactly 6 heads)?
P(X=6) = C(10,6) × 0.5⁶ × 0.5⁴ = 210 × (1/1024) ≈ 0.205
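Checking the worked example in code, assuming scipy is available (the direct formula needs only the standard library):

```python
from math import comb
from scipy import stats

# Direct formula: C(10,6) · 0.5⁶ · 0.5⁴
print(comb(10, 6) * 0.5**6 * 0.5**4)     # ≈ 0.2051
# Same value via scipy's binomial PMF
print(stats.binom.pmf(6, n=10, p=0.5))   # ≈ 0.2051
```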
Geometric
Number of Bernoulli trials until the first success.
P(X = k) = (1−p)^(k−1) × p
E[X] = 1/p, Var(X) = (1−p)/p²
When to use: waiting for the first success. Number of calls until a sale, number of attempts until a password guess succeeds.
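A simulation sketch, assuming numpy is available (p = 0.2 is an arbitrary choice): the sample mean of geometric waiting times should approach 1/p.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.2
samples = rng.geometric(p, size=100_000)  # trials until first success
print(samples.mean())                     # ≈ 5.0 = 1/p
```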
Poisson — Pois(λ)
Number of events in a fixed interval when events occur independently at a constant average rate λ.
P(X = k) = (λᵏ × e^(−λ)) / k!
E[X] = λ
Var(X) = λ (mean equals variance, a distinctive signature)
When to use: counts of rare, independent events in time or space. Emails per hour, accidents per week, mutations per genome, customers per minute at a queue.
Average 3 calls per hour. P(exactly 5 calls in one hour)?
P(X=5) = (3⁵ × e⁻³) / 5! = (243 × 0.0498) / 120 ≈ 0.101
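Verifying the worked example, assuming scipy is available:

```python
from scipy import stats

print(stats.poisson.pmf(5, mu=3))   # ≈ 0.1008
```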
Poisson as a limit of Binomial: when n is large and p is small (rare events), Binomial(n,p) ≈ Poisson(np). Useful approximation.
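A sketch of the approximation, assuming scipy is available; n = 10,000 and p = 0.0003 are illustrative choices giving λ = np = 3.

```python
from scipy import stats

n, p = 10_000, 0.0003
for k in range(6):
    b = stats.binom.pmf(k, n, p)        # exact binomial
    q = stats.poisson.pmf(k, n * p)     # Poisson approximation
    print(k, round(b, 5), round(q, 5))  # columns agree to ~4 decimals
```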
Continuous Distributions
Uniform — U(a, b)
Every value in [a, b] equally likely.
f(x) = 1/(b−a) for a ≤ x ≤ b
E[X] = (a+b)/2
Var(X) = (b−a)²/12
When to use: genuinely no preference among values in a range. Random number generators, rounding errors.
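A quick check of the moment formulas by simulation, assuming numpy is available (a = 0, b = 10 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 0.0, 10.0
x = rng.uniform(a, b, size=1_000_000)
print(x.mean())   # ≈ (a + b) / 2 = 5.0
print(x.var())    # ≈ (b − a)² / 12 ≈ 8.33
```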
Exponential — Exp(λ)
Time between events in a Poisson process. The continuous analogue of the geometric distribution.
f(x) = λe^(−λx) for x ≥ 0
E[X] = 1/λ
Var(X) = 1/λ²
Memoryless property: P(X > s + t | X > s) = P(X > t). The distribution has no memory — it doesn’t matter how long you’ve been waiting, the remaining wait has the same distribution. The exponential is the only continuous distribution with this property.
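A simulation sketch of memorylessness, assuming numpy is available (λ = 1 and s = 2 are arbitrary choices): among waits that exceed s, the remaining wait behaves like a fresh Exp(λ) draw.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s = 1.0, 2.0
x = rng.exponential(scale=1 / lam, size=1_000_000)

remaining = x[x > s] - s      # leftover wait, conditioned on X > s
print(remaining.mean())       # ≈ 1.0 = E[X], as if the clock restarted
print(x.mean())               # ≈ 1.0
```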
When to use: time until next event (next failure, next arrival, next earthquake). Service times, radioactive decay, component lifetimes.
Normal (Gaussian) — N(μ, σ²)
The bell curve. Parameterised by mean μ and standard deviation σ.
f(x) = (1/(σ√(2π))) × exp(−(x−μ)²/(2σ²))
E[X] = μ
Var(X) = σ²
Standard normal N(0,1): mean 0, standard deviation 1. Every normal can be converted to it:
Z = (X − μ) / σ
The 68-95-99.7 rule:
μ ± 1σ: 68.3% of the distribution
μ ± 2σ: 95.4%
μ ± 3σ: 99.7%
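The rule can be recovered from the standard normal CDF; a sketch assuming scipy is available:

```python
from scipy import stats

for k in (1, 2, 3):
    prob = stats.norm.cdf(k) - stats.norm.cdf(-k)   # P(−k ≤ Z ≤ k)
    print(f"μ ± {k}σ: {prob:.1%}")                  # 68.3%, 95.4%, 99.7%
```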
When to use: sums and averages of many independent variables (Central Limit Theorem). Measurement errors, heights, IQ scores, financial returns (approximately). The normal is the right model when many small independent effects add up.
Log-Normal
If ln(X) is normally distributed, X is log-normal. Skewed right, values strictly positive.
When to use: quantities that grow multiplicatively — incomes, city populations, stock prices, biological quantities. When you’re taking logs and the result looks normal, the original is log-normal.
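A sketch of the defining property, assuming numpy is available; the parameters of the underlying normal (mean 0, σ = 0.5) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)

logs = np.log(x)
print(logs.mean(), logs.std())   # ≈ 0.0 and ≈ 0.5: back to the normal
print(x.min() > 0)               # True: values are strictly positive
```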
Power Law
P(X > x) ∝ x^(−α). Heavy tail — extreme events are far more likely than the normal distribution predicts.
When to use: wealth distribution, city sizes, earthquake magnitudes, internet traffic, word frequencies (Zipf’s law), social network degrees. Any domain with “winner takes most” dynamics.
The signature: on a log-log plot, a power law appears as a straight line.
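A sketch of that signature by simulation, assuming numpy is available (α = 2 is an arbitrary choice): the empirical survival function of Pareto samples is nearly linear in log-log coordinates, with slope −α.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 2.0
x = rng.pareto(alpha, size=200_000) + 1   # classical Pareto, x_min = 1

xs = np.logspace(0, 1.5, 15)                          # thresholds 1 .. ~31.6
survival = [(x > t).mean() for t in xs]               # empirical P(X > t)
slope = np.polyfit(np.log(xs), np.log(survival), 1)[0]
print(slope)                                          # ≈ −2 = −α
```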
Choosing a Distribution
| Situation | Distribution |
|---|---|
| Count of successes in n trials | Binomial |
| Count of rare events in an interval | Poisson |
| Trials until first success | Geometric |
| Time between Poisson events | Exponential |
| Sum of many independent effects | Normal |
| Multiplicative growth, positive skew | Log-Normal |
| Winner-takes-most, heavy tail | Power Law |
| No prior preference over a range | Uniform |
The question to ask: what generates this quantity? Many small additive effects → Normal. Independent rare events → Poisson. Time between events → Exponential. Multiplicative processes → Log-Normal.
The CDF — Cumulative Distribution Function
The CDF F(x) = P(X ≤ x) gives the probability of being at or below x.
P(a ≤ X ≤ b) = F(b) − F(a)
For the normal distribution, probabilities are read from standard normal tables (or computed with erf). Most statistical software handles this directly. The key values to know:
P(Z ≤ 1.645) ≈ 0.95 (one-tailed 95%)
P(Z ≤ 1.96) ≈ 0.975 (two-tailed 95%, each tail 2.5%)
P(Z ≤ 2.576) ≈ 0.995 (two-tailed 99%)
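In practice these come from the CDF and its inverse; a sketch assuming scipy is available, with N(100, 15²) (an IQ-style scale) as an illustrative choice:

```python
from scipy import stats

# P(a ≤ X ≤ b) = F(b) − F(a), e.g. X ~ N(100, 15²) between 85 and 115:
print(stats.norm.cdf(115, loc=100, scale=15)
      - stats.norm.cdf(85, loc=100, scale=15))   # ≈ 0.683

# The key z-values, and their inverse (percent point function):
print(stats.norm.cdf(1.96))    # ≈ 0.975
print(stats.norm.ppf(0.975))   # ≈ 1.96
```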