3/12/2016

the no-miracles argument may not commit the base-rate fallacy

Certain philosophers argue that the No-Miracles Argument for realism (Colin Howson, Peter Lipton), the Pessimistic Induction against realism (Peter Lewis), or both arguments (P.D. Magnus and Craig Callender) commit the base-rate fallacy. I am not sure these objections are correct, and will try to articulate the reason for my doubt here.

First I need to give some set-up. Many readers will be familiar with some or all of it, so you can skip the next few paragraphs if you already know about the base-rate objection to the No-Miracles Argument and the Pessimistic Induction.

I suspect many readers are familiar with the base-rate fallacy; there are plenty of explanations of it around the internet. But just to have a concrete example, let’s consider a classic case of base-rate neglect. We are given information like the following, about a disease and a diagnostic test for this disease:

(1) There is a disease D that, at any given time, 1 in every 1000 members of the population has: Pr(D)=.001.

(2) If someone actually has disease D, then the test always comes back positive: Pr(+|D)=1.

(3) But the test has a false positive rate of 5%. That is, if someone does NOT have D, there is a 5% chance the test still comes back positive: Pr(+|~D)=.05.

Now suppose a patient tests positive. What is the probability that this patient actually has disease D?

Someone commits the base-rate fallacy if they say the probability is fairly high, because they discount or ignore the information about the ‘base rate’ of the disease in the population. Only 1 in 1000 people have the disease. But for every 1000 people who don’t have it, about 50 will test positive anyway. You have to use Bayes’ Theorem to get the exact probability that someone who tests positive has the disease; the probability turns out to be slightly under 2%.
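For readers who want to check the arithmetic, here is a minimal sketch of the Bayes' Theorem calculation using the numbers from (1)-(3). The function name `posterior` and its parameter names are just illustrative labels, not anything from the literature.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Pr(D | +) via Bayes' Theorem.

    prior               = Pr(D)      -- the base rate, item (1)
    sensitivity         = Pr(+ | D)  -- item (2)
    false_positive_rate = Pr(+ | ~D) -- item (3)
    """
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Numbers from the medical example above.
p = posterior(prior=0.001, sensitivity=1.0, false_positive_rate=0.05)
print(f"Pr(D | +) = {p:.4f}")  # prints "Pr(D | +) = 0.0196"
```

So a positive result raises the probability of disease from 0.1% to a bit under 2%, which is a big jump but still nowhere near "fairly high."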

In the context of the No-Miracles and Pessimistic Induction arguments, the objection is that both arguments ignore a relevant base rate. For example, the No-Miracles argument says:

(A) Pr (T is empirically successful | T is approximately true) = 1

(B) Pr (T is empirically successful | ~ (T is approximately true)) <<1.

Inequality (B) is supposed to capture the ‘no-miracles intuition’: the probability that a false theory would be empirically successful is so low that it would be a MIRACLE if that theory were empirically successful. Hopefully you can see that (A) corresponds to (2) in the original, medical base-rate fallacy example, and (B) corresponds to (3). Empirical success is analogous to a positive test for the truth of a theory, and the no-miracles intuition is that the false-positive rate is very low (so low that a false positive would be a miracle).

The base-rate objection to the No-Miracles argument is just that the No-Miracles argument ignores the base rate of true theories in the population of theories. In other words, in the NMA, there is no analogue of (1) in the original example. Without that information, even a very low false-positive rate cannot license the conclusion that an arbitrary empirically successful theory is probably true. (And furthermore, that base rate is somewhere between extremely difficult and impossible to obtain: what exactly is the probability that an arbitrary theory in the space of all possible theories is approximately true?)
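To make the objectors' point concrete, here is a rough sketch of how the probability that a successful theory is approximately true depends on the base rate, holding (A) and (B) fixed. The false-positive rate of 0.001 and the list of candidate base rates are invented purely for illustration; nobody claims to know the real values, which is part of the dispute.

```python
def pr_true_given_success(base_rate,
                          pr_success_if_true=1.0,      # premise (A)
                          pr_success_if_false=0.001):  # premise (B); made-up value
    """Pr(T is approximately true | T is empirically successful)."""
    true_and_successful = pr_success_if_true * base_rate
    false_and_successful = pr_success_if_false * (1.0 - base_rate)
    return true_and_successful / (true_and_successful + false_and_successful)


# Hypothetical base rates of approximately true theories (assumed, not known).
for base_rate in (0.5, 0.01, 0.0001, 0.000001):
    result = pr_true_given_success(base_rate)
    print(f"base rate {base_rate:g}: Pr(true | successful) = {result:.4f}")
```

With the same 'miraculously' low false-positive rate, the answer swings from near certainty to near zero as the assumed base rate shrinks, so (A) and (B) alone do not settle anything.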

OK, that concludes the set-up. Now I can state my concern: I am not sure the objectors’ demand for the base rate of approximately true theories in the space of all possible theories is legitimate. Why? Think about the original medical example again. There, we are simply GIVEN the base rate, namely (1). But how would one acquire that sort of information, if one did not already have it?

Well, you would have to run tests on large numbers of people in the population at large, to determine whether or not they had disease D. These tests need not be fast-diagnosing blood or swab tests; they might involve looking for symptoms more ‘directly,’ but they will still be tests. And whatever test we use to establish the base rate of D in the population will still presumably have SOME false positives. (I’m guessing that most diagnostic tests are not perfect.) But if there are some false positives, and we don’t yet know the base rate of the disease in the population, then, if we follow the reasoning of the base-rate objectors to the NMA and the PI, any conclusion we draw about the proportion of the population that has the disease is fallacious, for we have neglected the base rate. But on that reasoning, we can never determine the base rate of a disease (unless we have an absolutely perfect diagnostic test): estimating the base rate requires interpreting test results, and interpreting test results requires already knowing the base rate, so we face an infinite regress.

In short: if the NMA commits the base-rate fallacy, then any attempt to discover a base rate (when detection tools have false positives) also commits the base-rate fallacy. But presumably, we do sometimes discover base rates (at least approximately) without committing the base-rate fallacy, so by modus tollens, the NMA does not commit the base-rate fallacy.

The NMA does not commit the base-rate fallacy, because it does not ignore AVAILABLE evidence about the base rate of true theories in the population of theories. In the medical example above, the base rate (1) is available information, and under-weighing it is what generates the fallacy. In the scientific realism case, however, the base rate is not available. If we did somehow have the base rate of approximately true theories in the population of all theories (the gods of science revealed it to us, say), then yes, it would be fallacious to ignore or discount that information when drawing conclusions about the approximate truth of a theory from its empirical success, i.e. the NMA would be committing the base-rate fallacy. But unfortunately the gods of science have not revealed that information to us. Not taking into account unavailable information is not a fallacy; in other words, the base-rate fallacy only occurs when one fails to take into account available information.

I am not certain about the above. I definitely want to talk to some more statistically savvy people about this. Any thoughts?


3 comments:

Protagoras said...

You can generally get evidence for the false positive rate by repeating the test and seeing how often the repeats replicate, or comparing results of multiple tests on the same person. In the case of diseases, diagnoses generally become more accurate as the disease progresses and more symptoms appear, so there are more accurate tests (later tests) to use to estimate base rates and apply to our interpretation of less accurate (earlier) tests. To have none of this available would be to be in a situation where one and only one test is in any way indicative of the presence of a particular disease (it couldn't even have any symptoms). And in that case I don't think we would be in a position to evaluate the accuracy of a test for the stealth disease.

Greg Frost-Arnold said...

Thanks for that, Aaron! I'm trying to think through the analogue for scientific theories -- is your thought that empirical success as the mark of approximate truth is "a situation where one and only one test is in any way indicative of the presence of a particular disease"? Or is each further confirmed prediction (or 'confirmed novel prediction') like a later diagnostic test? Or neither of those?

Protagoras said...

I wasn't quite sure where to go with it, but having thought about it further, I'm inclined to think something like this. Pretty much the only way to evaluate a test is in comparison to other tests, which provide us with things like evidence for base rates. But one could treat all the tests taken together as one test. In that case, evaluating the whole would seem to be impossible.

If I have correctly described the situation, NMA is trying to make an argument about the whole of science, and the base rate complaint is one way of pointing out that we can't really do that, because the fact that we're looking at science as a whole means it is in principle impossible to get the kind of independent evidence that would be required for us to evaluate that general argument.

But perhaps this is all my biases. I guess my own view is that the problem of induction has no solution; one can only ignore it. And NMA purports to give a solution, and I'm interpreting the base rate argument as an attempt to remind us that no, there really isn't a solution. But I'm not an expert on the details of NMA vs. base rate, so I may be missing subtleties of the particular issue and mistakenly assimilating it to broader issues from which it is separate.