Matrices — Multiplication, Determinants, and Inverses
What matrices are, how matrix multiplication works, what determinants measure, and how to invert a matrix.
What a Matrix Is
A matrix is a rectangular array of numbers arranged in rows and columns. An m×n matrix has m rows and n columns.
A = [1 2 3]    (2×3 matrix)
    [4 5 6]

B = [7 8]      (3×2 matrix)
    [9 0]
    [1 2]
The entry in row i, column j is written Aᵢⱼ.
Matrices represent: systems of linear equations, linear transformations, data tables, graphs, and more. They’re the central object of linear algebra.
Addition and Scalar Multiplication
Addition: add corresponding entries. Matrices must have the same dimensions.
[1 2]   [5 6]   [ 6  8]
[3 4] + [7 8] = [10 12]
Scalar multiplication: multiply every entry by the scalar.
    [1 2]   [3  6]
3 × [3 4] = [9 12]
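Both operations act entrywise, so they take only a few lines of plain Python (a sketch using nested lists as matrices, no libraries assumed):

```python
def mat_add(A, B):
    """Entrywise sum; A and B must have the same dimensions."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * x for x in row] for row in A]

print(mat_add([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[6, 8], [10, 12]]
print(scalar_mul(3, [[1, 2], [3, 4]]))              # [[3, 6], [9, 12]]
```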
Matrix Multiplication
Multiplying A (m×n) by B (n×p) gives C (m×p). The inner dimensions must match.
The entry Cᵢⱼ is the dot product of row i of A with column j of B:
Cᵢⱼ = Σₖ Aᵢₖ Bₖⱼ
[1 2]   [5 6]   [1×5+2×7  1×6+2×8]   [19 22]
[3 4] × [7 8] = [3×5+4×7  3×6+4×8] = [43 50]
Matrix multiplication is not commutative: AB ≠ BA in general (and BA may not even be defined).
It is associative: (AB)C = A(BC).
Distributive: A(B + C) = AB + AC.
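The formula Cᵢⱼ = Σₖ Aᵢₖ Bₖⱼ translates directly into code. A sketch in plain Python, which also demonstrates the non-commutativity noted above:

```python
def mat_mul(A, B):
    """C[i][j] = sum over k of A[i][k] * B[k][j]; inner dimensions must match."""
    n = len(B)      # rows of B must equal columns of A
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
print(mat_mul(B, A))  # [[23, 34], [31, 46]] — AB ≠ BA
```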
Special Matrices
Identity matrix I: square matrix with 1s on the diagonal, 0s elsewhere. Acts as the multiplicative identity: AI = IA = A.
I₂ = [1 0]    I₃ = [1 0 0]
     [0 1]         [0 1 0]
                   [0 0 1]
Zero matrix: all entries zero. Additive identity.
Diagonal matrix: non-zero entries only on the main diagonal.
Transpose: flip rows and columns. (Aᵀ)ᵢⱼ = Aⱼᵢ.

[1 2 3]ᵀ   [1 4]
[4 5 6]  = [2 5]
           [3 6]

Symmetric matrix: A = Aᵀ (equal to its transpose), i.e. Aᵢⱼ = Aⱼᵢ.
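The transpose is a one-liner in Python: `zip(*A)` pairs up the i-th entries of every row, which is exactly the columns of A (a sketch, continuing with nested lists as matrices):

```python
def transpose(A):
    """(Aᵀ)[i][j] = A[j][i] — rows become columns."""
    return [list(col) for col in zip(*A)]

print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]

# A symmetric matrix equals its own transpose:
S = [[1, 2], [2, 5]]
print(transpose(S) == S)  # True
```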
The Determinant
The determinant is a single number computed from a square matrix. It encodes key information about the matrix.
2×2:
det [a b] = ad − bc
    [c d]
3×3 (cofactor expansion along first row):
det [a b c]
    [d e f] = a(ei − fh) − b(di − fg) + c(dh − eg)
    [g h i]
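Both formulas can be written out directly (a sketch in plain Python; the 3×3 version mirrors the cofactor expansion above term by term):

```python
def det2(M):
    """Determinant of a 2×2 matrix [[a, b], [c, d]]: ad − bc."""
    (a, b), (c, d) = M
    return a * d - b * c

def det3(M):
    """Cofactor expansion along the first row of a 3×3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

print(det2([[1, 2], [3, 4]]))                   # -2
print(det3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0 — the rows are linearly dependent
```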
What the determinant means
Geometrically: the absolute value of the determinant is the scale factor for areas (2D) or volumes (3D) under the transformation the matrix represents.
- |det(A)| = 2: the transformation doubles areas
- |det(A)| = 0.5: halves areas
- det(A) = 0: the transformation collapses space into a lower dimension (all area/volume becomes zero)
Sign: det > 0 means orientation is preserved (no flip); det < 0 means orientation is reversed.
Key properties
det(AB) = det(A) × det(B)
det(Aᵀ) = det(A)
det(I) = 1
det(cA) = cⁿ det(A) for an n×n matrix A
Row operations and determinants:
- Swap two rows: det changes sign
- Multiply a row by c: det multiplies by c
- Add a multiple of one row to another: det unchanged
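These properties are easy to check on concrete matrices. A small sketch verifying the product rule and the row-swap rule for 2×2 matrices (values chosen for illustration):

```python
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

A, B = [[1, 2], [3, 4]], [[2, 0], [1, 3]]   # det(A) = -2, det(B) = 6
AB = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

print(det2(AB) == det2(A) * det2(B))        # True: det(AB) = det(A)·det(B)
print(det2([[3, 4], [1, 2]]) == -det2(A))   # True: swapping rows flips the sign
```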
Singular vs Invertible
A square matrix is singular (non-invertible) if det(A) = 0. It’s invertible (non-singular) if det(A) ≠ 0.
Equivalent conditions for a matrix to be invertible:
- det(A) ≠ 0
- The columns are linearly independent
- Ax = 0 has only the trivial solution x = 0
- A has full rank
- The transformation is bijective (one-to-one and onto)
All these conditions are equivalent — they’re different ways of saying the same thing.
The Inverse
The inverse A⁻¹ of a square matrix A satisfies:
A A⁻¹ = A⁻¹ A = I
2×2 inverse:
A = [a b],   A⁻¹ = 1/(ad−bc) × [ d  −b]
    [c d]                      [−c   a]
Swap the diagonal, negate the off-diagonal, divide by the determinant.
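That recipe, with a guard for the singular case where ad − bc = 0, looks like this in plain Python (a sketch):

```python
def inv2(M):
    """Inverse of a 2×2 matrix; raises ValueError if the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0), no inverse exists")
    # swap the diagonal, negate the off-diagonal, divide by the determinant
    return [[d / det, -b / det], [-c / det, a / det]]

print(inv2([[1, 2], [3, 4]]))  # [[-2.0, 1.0], [1.5, -0.5]]
```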
Why it matters: the matrix equation Ax = b has solution x = A⁻¹b (when A is invertible). This is the matrix version of solving a linear system.
In practice: computing A⁻¹ explicitly is expensive. For solving Ax = b, Gaussian elimination is faster. The inverse is conceptually important but computationally avoided.
Gaussian Elimination
The standard algorithm for solving linear systems. Transform the augmented matrix [A|b] using row operations into row echelon form, then back-substitute.
Solve: x + 2y = 5
3x + 4y = 11
Augmented: [1 2 |  5]
           [3 4 | 11]

R2 → R2 − 3R1:

[1  2 |  5]
[0 −2 | −4]

R2 → R2 / −2:

[1 2 | 5]
[0 1 | 2]
Back-substitute: y = 2, x = 5 − 2(2) = 1
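The hand procedure above generalises to any square system. A sketch in plain Python, with partial pivoting added (picking the largest available pivot for numerical stability, which the small hand calculation didn't need):

```python
def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (square A)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A|b]
    for col in range(n):
        # pivot: pick the remaining row with the largest entry in this column
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):        # eliminate entries below the pivot
            f = M[r][col] / M[col][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back-substitution, bottom row up
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

print(solve([[1, 2], [3, 4]], [5, 11]))  # ≈ [1.0, 2.0]
```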
Row echelon form: zeros below the diagonal. Reduced row echelon form (RREF): zeros above and below each leading 1. From RREF the solution can be read off directly.
Rank
The rank of a matrix is the number of linearly independent rows (= number of linearly independent columns). It’s the “true” dimensionality of the matrix.
- Full rank: rank = min(m, n) — maximum possible
- Rank deficient: rank < min(m, n) — columns (or rows) are linearly dependent
For a square n×n matrix:
- Rank n: invertible
- Rank < n: singular
Rank-nullity theorem: rank(A) + nullity(A) = n, where nullity is the dimension of the null space (solutions to Ax = 0).
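Rank can be computed by the same elimination process: reduce to row echelon form and count the non-zero rows. A sketch in plain Python (the tolerance guards against floating-point noise being mistaken for a pivot):

```python
def rank(A, tol=1e-9):
    """Rank = number of non-zero rows after Gaussian elimination."""
    M = [row[:] for row in A]
    r = 0  # index of the next pivot row
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if abs(M[i][col]) > tol), None)
        if piv is None:
            continue                       # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):     # eliminate below the pivot
            f = M[i][col] / M[r][col]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

print(rank([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 2 — the rows are dependent
```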
Matrix as Data
In machine learning, data is a matrix: n rows (observations), p columns (features). Every operation — normalisation, dimensionality reduction, regression — is matrix algebra.
Linear regression in matrix form: β̂ = (XᵀX)⁻¹Xᵀy. One formula, any number of features. The normal equations emerge directly from minimising squared error, and the solution is a matrix inverse.
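On a hypothetical tiny dataset the normal equations reduce to a 2×2 solve. A sketch in plain Python (the data is made up for illustration and chosen to fit the line y = 1 + 2x exactly):

```python
# Fit y ≈ β0 + β1·x via β̂ = (XᵀX)⁻¹ Xᵀy on n = 4 observations.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]  # design matrix: intercept column + feature
y = [1, 3, 5, 7]                      # constructed so that y = 1 + 2x exactly

XtX = [[sum(X[k][i] * X[k][j] for k in range(4)) for j in range(2)]
       for i in range(2)]
Xty = [sum(X[k][i] * y[k] for k in range(4)) for i in range(2)]

# Apply the 2×2 inverse formula to solve (XᵀX) β = Xᵀy.
(a, b), (c, d) = XtX
det = a * d - b * c
beta = [(d * Xty[0] - b * Xty[1]) / det,
        (-c * Xty[0] + a * Xty[1]) / det]
print(beta)  # [1.0, 2.0] — intercept 1, slope 2
```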
PCA, SVD, neural network weight updates — all matrix operations. Understanding matrices is understanding why these algorithms work.