I’ve read a fair number of recent posts that discuss falsification. As this is not exactly a new experience, nor one limited to discussions here, I thought I’d take a minute to discuss the limitations of falsification, what role it plays in scientific research, and why it fails as a criterion for determining what questions fall under the purview of science.
Most questions posed by researchers are at least seemingly falsifiable. The first problem, however, is that no question is asked in isolation. Research questions arise in the context of theory, they are tested according to theory and theoretical concerns, and their findings are interpreted in the context of theory. So how does this play out in research? Here are some of the ways, with details and examples for those who want them:
1) You can’t falsify levels of statistical significance
2) You can’t falsify when failure to replicate can be caused by so many different things.
3) If the only options on the table are all wrong, falsification is extremely difficult, because each option can be continually confirmed and falsified
4) Failures to replicate and other methods of falsification can be explained (and explained away!)
5) Much research in many sciences is inherently unfalsifiable.
1) You can’t falsify levels of statistical significance
In particle physics, for example, researchers have for years borrowed a methodology developed in the 1920s and ’30s and long standard in the social and behavioral sciences: null hypothesis significance testing (NHST). The famous Higgs discovery was actually an attempt to reach a rather arbitrary level of statistical significance (the field’s five-sigma convention), just as is done all the time in the social sciences. The researchers propose a null hypothesis (the signal detected is noise) and an alternative (the signal is not noise), and assume the null to be true for the purposes of the test.
The relevant problem is that whether you are determining the efficacy of a medication by comparing a placebo group with a treatment group, or determining whether you’ve detected a particle or just background noise, your conclusion is a probabilistic statement about the likelihood of a particular statistical outcome given certain assumptions. The outcome itself can’t be falsified (it’s what happened), and all you’ve computed is how likely that outcome is given your assumptions, which is likewise unfalsifiable.
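To make the structure of such a test concrete, here is a minimal sketch in Python of a signal-versus-noise significance test of the kind described above. All of the counts and the background model are invented for illustration; this is of course nothing like an actual particle-physics analysis pipeline:

```python
from statistics import NormalDist

# Illustrative one-sided z-test for "signal vs. background noise".
# Every number below is invented, not real data.
def z_score(observed, expected_background, background_sd):
    """How many standard deviations the observed count sits above background."""
    return (observed - expected_background) / background_sd

def p_value(z):
    """P(an outcome at least this extreme | the null 'it is just noise').
    Note the direction of the conditional: this is the probability of the
    data given the assumption, not of the assumption given the data."""
    return 1.0 - NormalDist().cdf(z)

z = z_score(observed=550, expected_background=500, background_sd=10)  # z = 5.0
p = p_value(z)
print(z, p)
```

The output is a probability of the observed outcome under the null; nothing in the procedure is itself falsified, which is the point made above.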
2) You can’t falsify when failure to replicate can be caused by so many different things.
We are told in high school, and often in college, of scientific experiments that seemingly appear out of nowhere, are tested in isolation, and whose results are clear. Examples include Galileo’s tests of motion, Millikan’s measurement of the electron’s charge, Thomson and Rutherford’s work on atomic structure, and so on. The problem is that as our knowledge has increased, research questions have become more and more difficult to separate from a myriad of assumptions. In order to determine, e.g., whether cognition is “massively modular” or domain-general and embodied, we have to make assumptions about the tools we use to image the brain, the experimental designs employed, and the relationship between various (often divergent or outright incompatible) findings from neuroimaging, behavioral studies, etc., drawn from fields as diverse as linguistics and even philosophy on the one hand and computer science and computational neuroscience on the other. That’s why competing, incompatible theories can exist side by side for decades, each with mountains of research in its support and no end in sight. Every issue with every experiment can be attributed to a myriad of potential sources of error: one can blame equipment, complexities in high-dimensional data, issues with participants, problematic interpretations of outcomes, a bad choice of statistical tests, and on, and on.
For example, Millikan’s oil-drop experiments supported his conclusions about the charge of the electron partly by dismissing variant measurements as not actually being measurements of electrons. Research in modern particle physics relies heavily on mathematical structure because it deals with systems that cannot be directly observed and are instead specified, even derived, by equations and confirmed through them, in a manner that wouldn’t have been considered proper physics, or even science, a century ago. It is so difficult to work out how double-blind studies can be biased by placebo effects that many studies just test placebos to figure out the extent and nature of their biasing effects.
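One way to see how little a single failed replication tells us is a quick simulation (all parameters invented for illustration): many studies of the very same true effect, with the same design and the same analysis, still split into “successes” and “failures” through sampling variation alone:

```python
import random
from statistics import mean, stdev

# Illustrative simulation (all numbers invented): identical studies of the
# same true effect, same design, same analysis -- yet only some of them
# reach conventional significance, purely through sampling variation.
random.seed(42)

def t_statistic(true_effect=0.3, n=30):
    """One study: n noisy observations of the effect; return its t-like statistic."""
    sample = [true_effect + random.gauss(0, 1) for _ in range(n)]
    return mean(sample) / (stdev(sample) / n ** 0.5)

results = [t_statistic() for _ in range(1000)]
significant = sum(t > 2.0 for t in results) / len(results)  # rough t > 2 criterion
print(f"{significant:.0%} of identical studies 'replicated'")
```

A substantial fraction of these identical studies fail to “replicate” the others, with no difference in equipment, participants, or analysis to blame.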
3) If the only options on the table are all wrong, falsification is extremely difficult, because each option can be continually confirmed and falsified
This is what happened with light. A century of research before Einstein’s 1905 paper on the photoelectric effect had confirmed that light was a wave. Einstein’s work showed it was in fact a particle. The problem was that it was neither, because neither classical particles nor classical waves exist. We were lucky that we were dealing with an incredibly simple system and unbelievably clear contradictory results; otherwise we’d still have a division between supporters of a “particle theory of light” and those of a “wave theory of light”.
4) Failures to replicate and other methods of falsification can be explained (and explained away!)
This happened with planetary discovery. After the discovery of Uranus, its orbit seemed to indicate that Newton’s mechanics and theory of gravitation were wrong: the planet wasn’t travelling as predicted. However, rather than abandon the entire theory, or insist that errors of measurement were the issue, astronomers speculated that something else could explain both the observations and why they departed from prediction, without ridding us of Newton’s laws. It turns out the problem was the assumption of seven planets, and the issue vanished with the discovery of Neptune. A similar issue arose with Mercury’s orbit, and a similar attempt was made (the hypothesized planet Vulcan). It failed: Newtonian gravity was wrong, but we didn’t know that until many years later, thanks to Einstein.
5) Much research in many sciences is inherently unfalsifiable.
String theory in its various forms, multiverse theories, supersymmetry, and a host of other work in cosmology and theoretical physics are, and may remain (perhaps forever), untestable. Yet this work is very much related to extremely important research and helps to guide and stimulate discovery.
In short, falsification is problematic as a tool, useful as a guideline, and fails as a distinguishing criterion.