Probability Basics — Events, Independence, and Conditional Probability
The rules of probability — how to assign and combine probabilities, what independence means, and how conditioning changes everything.
What Probability Is
Probability assigns a number between 0 and 1 to an event, measuring how likely it is to occur.
- P = 0: impossible
- P = 1: certain
- P = 0.5: equally likely to occur or not
Sample space (S): the set of all possible outcomes. Event (A): a subset of the sample space — outcomes where A occurs.
Roll a die: S = {1, 2, 3, 4, 5, 6}
Event A = "roll an even number" = {2, 4, 6}
P(A) = 3/6 = 1/2
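For equally likely outcomes, P(A) is just |A| / |S|. A minimal Python sketch of the die example (the names are illustrative):

```python
from fractions import Fraction

# Sample space for one die roll, and the event "roll an even number"
sample_space = {1, 2, 3, 4, 5, 6}
event_even = {x for x in sample_space if x % 2 == 0}

# Equally likely outcomes: P(A) = |A| / |S|
p_even = Fraction(len(event_even), len(sample_space))
print(p_even)  # 1/2
```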
The Axioms of Probability
All of probability theory follows from three axioms (Kolmogorov, 1933):
- P(A) ≥ 0 for any event A
- P(S) = 1 (something must happen)
- If A and B are mutually exclusive: P(A ∪ B) = P(A) + P(B)
Everything else — conditional probability, independence, Bayes — is derived from these.
Basic Rules
Complement rule:
P(not A) = P(Aᶜ) = 1 − P(A)
Often easier to calculate the complement and subtract from 1.
Addition rule (general):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Addition rule (mutually exclusive): if A and B can’t both happen:
P(A ∪ B) = P(A) + P(B)
Multiplication rule (general):
P(A ∩ B) = P(A) × P(B | A)
Multiplication rule (independent): if A and B don’t affect each other:
P(A ∩ B) = P(A) × P(B)
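A short Python check of the general addition and multiplication rules, using two events on one die (A = even, B = at most 3; the helper `p` is just for illustration):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {1, 2, 3}   # at most 3

def p(event):
    # Equally likely outcomes: P(E) = |E| / |S|
    return Fraction(len(event), len(S))

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/2 + 1/2 − 1/6
assert p(A | B) == p(A) + p(B) - p(A & B)      # 5/6

# Multiplication rule: P(A ∩ B) = P(A) × P(B | A)
p_B_given_A = Fraction(len(A & B), len(A))     # 1/3
assert p(A & B) == p(A) * p_B_given_A          # 1/6
print(p(A | B), p(A & B))
```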
Mutually Exclusive vs Independent
These are different — and frequently confused.
Mutually exclusive: A and B cannot both occur. P(A ∩ B) = 0.
- Rolling a 3 and rolling a 5 on one die — can’t happen together
Independent: knowing A occurred doesn’t change the probability of B.
- Rolling a 3 on the first die doesn’t affect the second die
Mutually exclusive events with nonzero probability are never independent — if A occurs, you know B definitely didn’t, so B’s probability went from P(B) to 0. Knowing A gave you information about B.
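You can see the dependence directly with one die (a quick sketch; `p` is again just a helper):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {3}   # roll a 3
B = {5}   # roll a 5

def p(event):
    return Fraction(len(event), len(S))

print(p(A & B))      # 0: mutually exclusive
print(p(A) * p(B))   # 1/36: not equal, so A and B are NOT independent
```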
Conditional Probability
P(B | A) is the probability of B given that A has occurred. We update the probability based on new information.
P(B | A) = P(A ∩ B) / P(A)
Roll a die. Given it's even, what's P(it's a 4)?
P(even) = 3/6 = 1/2
P(4 and even) = P(4) = 1/6
P(4 | even) = (1/6) / (1/2) = 1/3
This makes sense — among the three even outcomes {2, 4, 6}, one is a 4.
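The same computation in Python, straight from the definition:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
four = {4}

def p(event):
    return Fraction(len(event), len(S))

# P(B | A) = P(A ∩ B) / P(A)
print(p(four & even) / p(even))  # 1/3
```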
The Law of Total Probability
If B₁, B₂, …, Bₙ partition the sample space (exhaustive and mutually exclusive):
P(A) = Σ P(A | Bᵢ) × P(Bᵢ)
You can calculate P(A) by conditioning on each possible “cause”:
A disease affects 1% of the population.
The test is 95% accurate for the sick (it catches 95% of cases) and 90% accurate for the healthy, so 10% of healthy people test positive.
What's P(test positive)?
P(+) = P(+|sick)×P(sick) + P(+|healthy)×P(healthy)
= 0.95×0.01 + 0.10×0.99
= 0.0095 + 0.099
= 0.1085
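The same arithmetic as a Python sketch (variable names are illustrative):

```python
# Law of total probability over the partition {sick, healthy}
p_sick = 0.01
p_pos_given_sick = 0.95      # test catches 95% of the sick
p_pos_given_healthy = 0.10   # 90% accurate for healthy → 10% false positives

p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)
print(p_pos)  # 0.1085
```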
Independence — Formal Definition
A and B are independent if:
P(A ∩ B) = P(A) × P(B)
Equivalently: P(B | A) = P(B) — knowing A gives no information about B.
Testing independence:
P(A) = 0.4, P(B) = 0.3, P(A ∩ B) = 0.12
0.4 × 0.3 = 0.12 ✓ → independent
Multiple independent events: for n mutually independent events (pairwise independence alone isn't enough):
P(A₁ ∩ A₂ ∩ ... ∩ Aₙ) = P(A₁) × P(A₂) × ... × P(Aₙ)
The birthday problem: with 23 people in a room, P(at least two share a birthday) > 50%. Counterintuitive because you're counting all C(23, 2) = 253 possible pairs, not comparing everyone to one specific person.
P(all different) = 365/365 × 364/365 × 363/365 × ... × 343/365
≈ 0.493
P(at least one match) ≈ 0.507
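A few lines of Python reproduce the number exactly (ignoring leap years, as the formula above does):

```python
def p_all_different(n, days=365):
    # Multiply the chain 365/365 × 364/365 × ... × (365 − n + 1)/365
    prob = 1.0
    for k in range(n):
        prob *= (days - k) / days
    return prob

print(1 - p_all_different(23))  # ≈ 0.5073
```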
Conditional Independence
A and B are conditionally independent given C if:
P(A ∩ B | C) = P(A | C) × P(B | C)
Two events can be independent unconditionally but dependent given C, or vice versa. This distinction matters enormously in Bayesian networks and causal reasoning.
Simpson’s Paradox: a treatment can appear better in every subgroup but worse overall — because the subgroup sizes differ. Conditioning on the right variable flips the conclusion. This is why controlling for confounders in statistics is non-negotiable.
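A small numeric sketch of the paradox, using classic kidney-stone-style counts (the numbers are illustrative, chosen so the flip occurs):

```python
# (successes, trials) per treatment, split by case severity
data = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

# A wins within every subgroup...
for group, treatments in data.items():
    print(group, {t: round(s / n, 3) for t, (s, n) in treatments.items()})

# ...but B wins after aggregating, because B got far more mild cases
for t in ("A", "B"):
    s = sum(data[g][t][0] for g in data)
    n = sum(data[g][t][1] for g in data)
    print(t, "overall:", round(s / n, 3))
```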
Probability Trees
A tree diagram maps out all possible outcomes with their probabilities at each branch. Multiply along branches (AND), add across branches (OR).
Bag A: 3 red, 2 blue. Bag B: 1 red, 4 blue.
Pick a bag randomly, then a ball.
P(red) = P(red|A)×P(A) + P(red|B)×P(B)
= (3/5)(1/2) + (1/5)(1/2)
= 3/10 + 1/10 = 4/10 = 0.4
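In code, the tree is just the law of total probability again (dict names are illustrative):

```python
from fractions import Fraction

p_bag = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
p_red_given_bag = {"A": Fraction(3, 5), "B": Fraction(1, 5)}

# Multiply along each branch, then add across branches
p_red = sum(p_red_given_bag[b] * p_bag[b] for b in p_bag)
print(p_red)  # 2/5
```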
Expected Value
The expected value E[X] is the probability-weighted average outcome:
E[X] = Σ xᵢ × P(X = xᵢ)
Roll a fair die:
E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5
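Directly from the definition, in Python:

```python
from fractions import Fraction

# E[X] = Σ xᵢ × P(X = xᵢ) for a fair die
p = Fraction(1, 6)
expected = sum(x * p for x in range(1, 7))
print(expected)  # 7/2 = 3.5
```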
Expected value is not necessarily a possible outcome: you can never roll a 3.5. It's the long-run average over many repetitions, not a prediction for a single trial.
Linearity of expectation: E[X + Y] = E[X] + E[Y], always — even when X and Y are not independent. This is one of the most useful properties in probability.
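A quick sketch showing linearity surviving strong dependence: let Y = 7 − X, which is completely determined by X, and check that E[X + Y] = E[X] + E[Y]:

```python
from fractions import Fraction

p = Fraction(1, 6)
rolls = range(1, 7)

e_x = sum(x * p for x in rolls)                 # 7/2
e_y = sum((7 - x) * p for x in rolls)           # 7/2
e_sum = sum((x + (7 - x)) * p for x in rolls)   # exactly 7

print(e_sum == e_x + e_y)  # True, despite X and Y being totally dependent
```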
The Gambler’s Fallacy
Past independent events don’t affect future ones. If a fair coin lands heads 10 times in a row, the next flip is still 50/50. The coin has no memory.
The fallacy is persistent: casinos display roulette history boards precisely because players believe past spins matter. They don't; the probability of an independent event never changes based on history.
What is true: over a very large number of flips, the proportion of heads converges to 0.5 (Law of Large Numbers). But "due" doesn't exist; the proportion converges because new flips swamp any early streak, not because the streak gets corrected.
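A simulation sketch of the Law of Large Numbers (the seed and sample sizes are arbitrary choices):

```python
import random

random.seed(42)
for n in (10, 1_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    # Each flip stays 50/50; only the long-run proportion settles near 0.5
    print(n, heads / n)
```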