LOGBOOK LOG-252
EXPLORING · PHYSICS · PHILOSOPHY-OF-SCIENCE · SCIENTIFIC-METHOD · POPPER · KUHN · FALSIFIABILITY · EPISTEMOLOGY

The Scientific Method — What It Actually Is

Popper's falsifiability, Kuhn's paradigm shifts, Lakatos's research programmes. Science is not a procedure — it is a set of evolved social and epistemic practices for distinguishing reliable knowledge from noise.

The Textbook Version That Isn’t True

Most people who have been through a science education learned some version of the following: the scientific method consists of (1) observing a phenomenon, (2) forming a hypothesis, (3) making a prediction, (4) designing an experiment, (5) collecting data, and (6) accepting or rejecting the hypothesis based on the data. The procedure is mechanical, objective, and cumulative. Science marches steadily toward truth.

This account is not wrong exactly — individual experiments do have roughly this structure. But as a description of how science actually works, how knowledge is produced and evaluated across communities of scientists over decades, it misses almost everything important. It describes the formal reports scientists write, not the messy and deeply social process by which they decide what to believe and pursue.

The philosophy of science in the twentieth century was largely an extended argument about what actually distinguishes science from non-science, and how scientific knowledge actually grows. The answers are stranger and more interesting than the textbook version.

Popper and Falsifiability

Karl Popper’s problem was the demarcation problem: what distinguishes scientific claims from non-scientific ones? The obvious answer — scientific claims are verified by evidence — runs into immediate difficulties. Any sufficiently flexible theory can be made consistent with any evidence by adding auxiliary hypotheses. Freudian psychology, Marxist historical theory, astrology — all of them can accommodate apparently contradictory evidence by positing hidden factors, incomplete data, or misinterpretation.

Popper’s criterion: a claim is scientific if it is falsifiable — if it specifies conditions under which it would be shown to be false. Einstein’s general relativity predicted that light would be bent by gravity by a specific amount. This prediction was testable: if the deflection were not observed during a solar eclipse, the theory would be falsified. The prediction was confirmed in 1919. But the point is not just that it was confirmed — it’s that it could have been falsified. A theory that can accommodate any outcome regardless of what’s observed is not making a real claim.
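
For concreteness, here is the standard textbook version of that specific amount (the numbers are the usual ones, not figures stated above). For a light ray grazing the solar limb, general relativity predicts a deflection of

\[
\delta\theta \;=\; \frac{4\,G M_\odot}{c^{2} R_\odot} \;\approx\; 1.75'' ,
\]

roughly twice the ~0.87'' that a Newtonian calculation gives, which is why the 1919 eclipse measurement could discriminate between the two theories rather than merely stay consistent with both.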

Falsifiability does real work. Freudian theory can explain courage as an expression of the id, or of the ego, or of sublimated drives — the theory is flexible enough to accommodate any human behavior, which means it predicts nothing specifically enough to be wrong. This is not a feature. It is a sign that the theory is not doing the epistemological work a scientific theory should do.

Popper’s approach has been enormously influential and is probably the dominant popular understanding of what makes a theory scientific. It’s also incomplete.

The Problem Popper Missed

Experiments don’t test individual hypotheses in isolation. They test networks of assumptions. When you fire charged particles into a detector and record the outputs, you’re assuming your detector works as intended, your beam calibration is correct, your data acquisition software isn’t corrupting signals, your statistical analysis is valid, and dozens of other things besides the hypothesis you’re ostensibly testing.

When an experiment produces a result that contradicts a hypothesis, the logical structure is: if H and A1 and A2 and A3 … then O; O is false; therefore at least one of H, A1, A2, A3 … is false. The contradiction could be located in the hypothesis or in any of the auxiliary assumptions. Scientists routinely conclude that the auxiliary assumptions were wrong rather than the main hypothesis — and they’re often right to do so. The history of science is full of cases where experimental anomalies were explained by problems with the instruments, not problems with the theory.
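
Spelled out symbolically (this just restates the paragraph above, with A_1 … A_n standing for the auxiliary assumptions), the inference is modus tollens over a conjunction:

\[
(H \wedge A_1 \wedge \cdots \wedge A_n) \rightarrow O, \qquad \neg O \;\;\therefore\;\; \neg H \,\vee\, \neg A_1 \,\vee\, \cdots \,\vee\, \neg A_n .
\]

Logic alone locates the failure somewhere in the conjunction; it says nothing about whether to blame the hypothesis or the apparatus.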

This is Duhem-Quine underdetermination. It means that Popper’s picture of falsification as a decisive logical blow to a theory misses how science actually works. Theories are not abandoned when a single experiment contradicts them. They survive anomalies routinely.

Kuhn and the Structure of Scientific Revolutions

Thomas Kuhn’s 1962 book The Structure of Scientific Revolutions shifted the philosophy of science from logic to history. Kuhn’s question: how does science actually change over time? His answer differed radically both from Popper’s and from the cumulative-progress picture.

Kuhn introduced the concept of a paradigm: the set of exemplary problems, methods, assumptions, and standards shared by a scientific community at a given time. Normal science operates within a paradigm — scientists extend and apply the paradigm’s framework, solve the puzzles it defines, and treat anomalies as problems to be resolved within the framework rather than as threats to the framework itself.

Paradigm shifts — Kuhn called them scientific revolutions — happen when anomalies accumulate to the point where they can no longer be accommodated by the existing framework. When the anomalies reach a critical mass, a crisis develops. The old paradigm is replaced by a new one that recasts the fundamental questions, provides new standards of what counts as a good explanation, and redraws what is even in the domain of the science.

The transitions Kuhn analyzed — Ptolemaic to Copernican astronomy, phlogiston chemistry to Lavoisier’s oxygen theory, Newtonian to Einsteinian mechanics — share a pattern: the new paradigm is incommensurable with the old one. The two paradigms use different concepts, ask different questions, and evaluate evidence by different standards. Scientists don’t just change their minds about specific claims; they reconceive the entire problem space.

The implication that disturbed many readers: scientific change is not purely rational. The transition between paradigms involves judgment, social dynamics, the generational replacement of scientists who were trained in the old paradigm, and persuasion as much as proof. Science is a social enterprise, and its social structure shapes how knowledge grows.

Lakatos and the Research Programme

Imre Lakatos tried to rescue a normative account of scientific rationality from Kuhn’s historicism. His solution was the concept of the research programme: a core theoretical commitment protected by a “protective belt” of auxiliary hypotheses that absorb anomalies.

A research programme is progressive if its modifications to the protective belt consistently lead to new predictions that are confirmed. It is degenerating if its modifications are consistently post-hoc — explaining results already known rather than predicting new ones. Scientists are not irrational for protecting core commitments; they are doing the right thing as long as the programme is progressive. When it degenerates — when it can only explain, never predict — it is rational to abandon it.

Lakatos’s framework explains both why scientists are right to resist abandoning well-developed theories in the face of isolated anomalies (this is normal defensive behavior for a progressive programme) and why they are right to eventually abandon them (when the programme is degenerating and a competitor programme is progressive).

What This Means for Physics

Physics is the science with the most precisely confirmed predictions in human history. The Standard Model of particle physics predicts the anomalous magnetic moment of the electron to eleven decimal places of agreement with experiment. General relativity’s predictions for gravitational wave signatures have been confirmed by LIGO to extraordinary precision. These are not lucky guesses — they are the product of highly progressive research programmes that have been expanding their predictive reach for a century.

But physics also has open problems that push against the boundaries of what the scientific method can resolve cleanly. String theory is the most prominent: it remains untestable at currently achievable energies, which makes it hard to falsify in any practical sense. Whether this makes it unscientific (Popper) or simply a research programme at an early stage whose progressiveness must be judged over a longer timescale (Lakatos) is a live debate.

Dark matter is not directly observed but is inferred from gravitational effects. Quantum mechanics is confirmed to extraordinary precision but its interpretation — what the theory actually says is happening in physical reality — is genuinely contested, with multiple interpretations (Copenhagen, many-worlds, pilot-wave, relational) each consistent with all existing experimental evidence.

The scientific method is not a simple procedure for extracting facts from nature. It is a set of evolved practices that work well in many domains, that have internal tensions which philosophers have been probing for a century, and that break down in characteristic ways at the edges of current knowledge.

What You Can Trust, and Why

None of this is an argument for relativism about science. The systematic production of reliable knowledge through controlled experiment, quantitative prediction, and intersubjective reproducibility is a genuine epistemic achievement. The track record is not in question.

What the philosophy of science establishes is why you can trust it and under what conditions you should update. You can trust results that have been replicated across multiple independent labs using different methods. You should be more skeptical of single-study findings with marginal statistical significance and high-stakes implications. You should understand that established theories are not overturned by isolated anomalies — but that accumulated anomalies really do eventually force revision, even in physics.
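
As a toy illustration of that last point (the parameters here are illustrative assumptions, not measurements of any real literature), a short simulation shows why a single marginally significant result deserves less trust than an independently replicated one:

```python
# Toy Monte Carlo: how trustworthy is one "significant" study versus an
# independently replicated finding? All parameters are assumptions chosen
# for illustration only.
import random

random.seed(0)

N_HYPOTHESES = 100_000   # hypotheses tested across a hypothetical field
PRIOR_TRUE   = 0.10      # assumed fraction of tested hypotheses that are true
ALPHA        = 0.05      # false-positive rate of a single study
POWER        = 0.50      # assumed chance a study detects a real effect

def study_is_significant(effect_is_real: bool) -> bool:
    """Simulate one study; return True if it reports a significant result."""
    return random.random() < (POWER if effect_is_real else ALPHA)

single_hits = single_true = 0
replicated_hits = replicated_true = 0

for _ in range(N_HYPOTHESES):
    effect_is_real = random.random() < PRIOR_TRUE
    first = study_is_significant(effect_is_real)
    second = study_is_significant(effect_is_real)
    if first:                      # one significant study
        single_hits += 1
        single_true += effect_is_real
    if first and second:           # independently replicated
        replicated_hits += 1
        replicated_true += effect_is_real

print(f"P(real effect | one significant study) ~ {single_true / single_hits:.2f}")
print(f"P(real effect | replicated twice)      ~ {replicated_true / replicated_hits:.2f}")
```

With these assumed numbers, roughly half of the single-study "findings" are false positives, while the replicated ones are right about nine times out of ten; the exact figures depend entirely on the assumed base rate and power, which is the point.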

The demarcation between science and pseudoscience is not a bright line — it is a gradient of epistemic practices, and the gradient matters. Science doesn’t guarantee truth; it implements a set of practices that make error detection more likely and systematic bias less likely than alternatives. The textbook version leaves this out, which is why understanding it is both humbling and clarifying.