
ERIC BARNES

SOCIAL PREDICTIVISM*

ABSTRACT. Predictivism holds that, where evidence E confirms theory T, E confirms T more strongly when E is predicted on the basis of T and subsequently confirmed than when E is known in advance of T's formulation and 'used', in some sense, in the formulation of T. Predictivism has lately enjoyed some strong supporting arguments from Maher (1988, 1990, 1993) and Kahn, Landsberg, and Stockman (1992). Despite the many virtues of the analyses these authors provide it is my view that they (along with all other authors on this subject) have failed to understand a fundamental truth about predictivism: the existence of a scientist who predicted T prior to the establishment that E is true has epistemic import for T (once E is established) only in connection with information regarding the social milieu in which the T-predictor is located and information regarding how the T-predictor was located. The aim of this paper is to show that predictivism is ultimately a social phenomenon that requires a social level of analysis, a thesis I deem 'social predictivism'.

1. INTRODUCTION

Predictivism holds that, where evidence E confirms theory T, E confirms T more strongly when E is predicted on the basis of T and subsequently confirmed than when E is known in advance of T's formulation and 'used', in some sense, in the formulation of T. Predictivism has lately enjoyed some strong supporting arguments from Maher (1988, 1990, 1993) and Kahn, Landsberg and Stockman (1992). Despite the many virtues of the analyses these authors provide it is my view that they (along with all other authors on this subject¹) have failed to understand a fundamental truth about predictivism: the existence of a scientist who predicted T (hereafter, a 'T-predictor') prior to the establishment that E is true has epistemic import for T (once E is established) only in connection with information regarding the social milieu in which the T-predictor is located and information about how the T-predictor was located within the milieu. The aim of this paper is to show that predictivism is ultimately a social phenomenon that requires a social level of analysis, a thesis I deem "social predictivism".

It is not my claim that the social dimension of predictivism has gone completely unnoted. As will be explained below, Maher requires an assumed value for the prior probability that an arbitrary scientist's method of theory construction is 'reliable' - but the assessment of this prior probability presumably will be an assessment of the available heuristic methods

Erkenntnis 45: 69-89, 1996. © 1996 Kluwer Academic Publishers. Printed in the Netherlands.


within the scientific community, or an assumption of the relative frequency of reliable methods among all the methods in use. This is clearly a kind of social judgement, mirrored by KLS's assumption (also explained below) of a prior probability that an arbitrary scientist is 'talented'. KLS (1992, Section 3) also sketch a social planning scenario in which a scientist's choice to accommodate or attempt to predict data carries information about the scientist's level of talent (though I argue below that their analysis of this point suffers from its failure to appreciate the thesis of this paper). The latter point does not address the issue of how social factors influence the epistemic import of successful prediction but simply reflects our suspicion that predictors are confident - often rightly so - of their own scientific abilities; the former point does address this issue, but only begins to scratch the surface of the extent to which predictivism turns on social factors.

In what follows I briefly summarize the analysis of predictivism provided by Maher and KLS. I then explain how both Maher and KLS fail to apprehend the social dimension of predictivism and identify how facts about the scientific community bear on the epistemic significance of the fact that a T-predictor existed in advance of the demonstration of confirming evidence E.

2. MAHER'S METHOD-BASED ANALYSIS OF PREDICTIVISM

Let's consider the example Maher ponders in his 1988 article. Consider a subject - the predictor - who predicts the outcome of 100 coin flips; his sequence of predicted flip outcomes is theory T. Thereafter the coin is flipped 99 times, and each flip results in just the predicted outcome - we deem this conjunction of the apparently random initial 99 outcomes E. Maher instructs us to consider what probability we would attach to the claim that the entire sequence of outcomes (T) generated by the predictor is true, given that he successfully predicted E.

Maher now constructs a different scenario. Another subject - the accommodator - is initially presented with the results of the first 99 flips. He then constructs the same theory T, based on his accommodation of the initial flips together with his prediction of the outcome of the 100th flip. We are now to consider what probability we would attach to T now, where T is based on just the 'accommodation' of evidence E, viz., the construction of T in the presence of the knowledge that an adequate theory must fit E. Clearly, the only reasonable reply is that T is substantially less confirmed in this case than in the former case, despite the fact that it would seem to be just evidence E that is offered in support of T in each case. This is a clear illustration of the predictivist thesis.


The 100th flip prediction is so much more believable when it follows the successful prediction of the initial 99 flips because, according to Maher, the successful prediction of the 99 flips constitutes persuasive evidence that the predictor 'has a reliable method' of making predictions of coin flip outcomes. T's consistency with E in the case of the accommodator provides no particular evidence that the accommodator's method of making predictions is reliable - thus we have no particular reason to trust his 100th prediction. I would argue that the gist of Maher's analysis should be put as follows. Following Maher (1988), we assume for simplicity that any predictor of coin flips is following a predictive method which is wholly reliable or is following a method no more reliable than a random method.² When we are confronted with a predictor who has predicted T in advance of his witnessing any flip outcomes, we are faced with the following dilemma once E is observed: either the T-predictor is following a reliable method of making predictions or is following a no better than random method but has luckily predicted T (and thus E) nonetheless. But if we judge that the prior probability of a randomly selected predictor following a reliable method is (though presumably small) much greater than the probability that a randomly selected predictor is both unreliable and will happen to generate T nonetheless, then we will judge that the predictor probably has a reliable method for making coin flip predictions. So what ultimately drives the inference that the predictor's predictive method is reliable is our antecedent judgement that

(%) P(R) >> P(ME/~R)P(~R)

where R asserts that the predictive method in use is wholly reliable, ~R that the method is no better than random, and ME that the relevant method predicts/predicted E. For example, suppose P(R) = 0.05 while P(ME/~R)P(~R) = 0.001; this entails that an E-predictor is, upon the demonstration of the truth of E, 50 times more likely to be following a reliable method than not (for it is fifty times more likely that an arbitrary predictor's method is reliable - and thus will predict E if E is true - than that an arbitrary predictor will happen to predict E though his method is unreliable). But in that any predictor's method must (on our above assumption) be either wholly reliable or unreliable, it follows that the probability that a successful E-predictor's method is reliable is approximately 0.98.
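This inference can be sketched numerically. The following is a minimal sketch using the illustrative values just given; the variable names are my own:

```python
# Sketch of the inference driven by (%), using the illustrative values above.
# A wholly reliable method predicts E with probability 1 when E is true, so
# once E is established, Bayes' theorem gives for a successful E-predictor:
#   P(R/ME) = P(R) / (P(R) + P(ME/~R)P(~R))

p_reliable = 0.05  # P(R): prior that an arbitrary method is reliable
p_lucky = 0.001    # P(ME/~R)P(~R): an unreliable method luckily predicts E

posterior = p_reliable / (p_reliable + p_lucky)
print(round(posterior, 2))  # 0.98: the E-predictor's method is very probably reliable
```

The 50-to-1 prior odds in favor of reliability translate into a posterior of 50/51, i.e. approximately 0.98.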

On the other hand, since the accommodator did not predict E, we are not faced with the dilemma of explaining a successful prediction, hence are free to point out that the probability of the accommodator's predictive method being reliable is just the value of P(R) (assumed above to be 0.05).

The truth of predictivism, Maher argues, depends heavily on various assumptions, one of which is that we who assess the epistemic significance


of predicted or accommodated evidence mustered by a scientist not know how reliable or unreliable the scientist's predictive method is. If we know that a scientist is following a wholly reliable predictive method and has endorsed T, then we know that T is true whether or not T has been confirmed; if we know the method is no better than random, then we know that T's future predictions have a very small chance of being true whether or not T has yielded successful predictions in the past (for we know that any successful predictions were the result of luck). Indeed Maher's point is that predictivism holds true just because successful prediction provides evidence for predictive method reliability - where the method's reliability is antecedently known, prediction has no special epistemic status.

3. KLS'S TALENT-BASED ANALYSIS

Although the technical methods employed by KLS (1992) differ from Maher's, and their aims differ on several points, there is a fundamental similarity between their understandings of the ground of predictivism. Where Maher considers predictive methods of unequal reliability, KLS consider scientists of unequal 'talent', where a scientist's talent is a measure of the likelihood that a theory that scientist proposes is a 'good' theory (in one of several senses, cf. p. 505). Where Maher idealizes that any method used is either wholly reliable or no better than a random method, KLS idealize that there are just two types of scientists: type i scientists are more 'talented' than type j scientists. This holds, KLS assume, whether or not they are 'looking first' (viz., accommodating evidence in the construction of a theory) or 'theorizing first' (viz., proposing the theory and then checking its empirical consequences). The basic point is that when a scientist theorizes first and makes a prediction that is subsequently confirmed, this raises the probability that the scientist is an i-type rather than a j-type scientist, thus raising the probability that the proposed theory is true more than if the scientist had looked first and constructed the theory with the acquired evidence in hand. The analogy between this analysis and Maher's is clear: successful prediction, for KLS, raises the probability that a scientist is talented, but KLS's very broad sense of talent is similar to Maher's very broad sense of 'using a reliable method': both Maher and KLS argue for a high credibility for theories when they are proposed by scientists who have demonstrated predictive skill.


4. ACCOMMODATOR MEETS PREDICTOR

Peter Lipton considers in his (1991, Ch. 8) the following scenario, which we translate here into the terms of Maher's coin flip example: imagine an accommodator - Alice - who is presented with E (the initial 99 flip outcomes) and constructs on its basis theory T (which conjoins E with a prediction that the 100th flip will result in the outcome heads). Alice now considers how likely theory T is given everything she knows - insofar as the 100th flip prediction was essentially a guess on her part, she assigns the probability p1 to T (presumably about 0.5). Alice now happens to encounter a T-predictor, Priscilla, who informs Alice that she predicted T in advance of seeing the outcomes constituting E - Priscilla, upon learning that E has held true, has assigned the probability p2 to T (presumably close to 1). Lipton inquires: which probability for T, p1 or p2, should Alice and Priscilla agree on?

The answer is clear: they should agree on p2. For on Maher's method-based analysis, there is substantial evidence that Priscilla is following a reliable method of coin flip prediction, thus it is very likely that T is true given that Priscilla endorsed T and thus successfully predicted E. (For KLS, Priscilla's successful prediction raises the probability that she is talented.) The fact that Alice had previously attached the lower value p1 to T does not undermine the epistemic significance of Priscilla's successful prediction of E.

Let us now consider Alice the accommodator in isolation again; we assume she does not know whether a T-predictor like Priscilla exists or not, so Alice still assigns T probability p1. Now it would seem reasonable at this point for Alice to concede that if she were to learn of a T-predictor's existence (i.e. learn that there was a predictor who predicted T in advance of E's becoming known), she should raise her estimate of T's probability to p2 (assuming she were not to acquire any other evidence in favor of or against T itself). We deem this counterfactual claim '(A)':

(A) If Alice were to learn that a T-predictor existed prior to the demonstration of E, Alice should raise her estimate of T's probability from p1 to p2 (assuming Alice were not to acquire any other evidence relevant to T's truth).

It would seem to be clearly in accordance with Maher's and KLS's analysis to accept (A) (though neither explicitly considers this precise claim). But while (A) has intuitive appeal, it falls apart on closer inspection. The failing of (A) is its assumption that the epistemic significance for the rational degree of belief in T of the T-predictor's existence can be evaluated


independently of various facts about the relevant portion of the scientific community in which the T-predictor works. In the remainder of this paper we consider various ways in which this proves true. For the sake of simplicity and concreteness we will continue to think in terms of Maher's coin flip example, bearing in mind that this example of course differs from actual scientific examples in any number of ways. It is my belief, as it is Maher's, that reflection on this simple example will pave the way toward a deeper understanding of the methodological premium on predictive success so widely endorsed by the scientific community.

5. THE COMMUNITY OF PREDICTING SCIENTISTS

We consider now the portion of the scientific community that has been working on the same problem that Alice devotes herself to: the identification of the true theory of the 100 coin flip outcomes of our example. More precisely, let us consider the portion of this community that made predictions regarding the outcomes of the 100 flips prior to the first coin flip; we deem this 'the predicting community' for this empirical domain. Now suppose that Alice (who is not herself a member of the predicting community as she did not make a prediction before witnessing the first 99 flips) learns that there are exactly N-many coin flipologists in this predicting community, thus N-many predictions of the flip outcomes were made (some of which may be identical; we assume only one prediction per predictor). Now the question becomes: what is the epistemic significance for T of the supposition that there was a T-predictor (i.e. at least one) amongst the predicting community prior to the establishment of E? The relevant point here is that this epistemic significance is surely not independent of the number N. For the larger N is, the greater the probability that there will be at least one T-predictor within the N-many predicting coin flipologists even if T is false, for it becomes more probable that some unreliable predictor will happen to predict T. This point establishes that (A) is too simple to be an adequate characterization of the epistemic significance for T of a T-predictor's existence.

We recall Alice's position: she witnesses E (the outcome of the first 99 flips) and then constructs T by conjoining E with the prediction that the 100th outcome will be heads; at this point she regards T's probability as about 0.5, since the heads prediction is a guess, so we stipulate that, for Alice, P(T) = 0.5. Let us define 'D' as 'There is at least one T-predictor in the predicting community'. The problem is to define P(T/D) for arbitrary N. For some N Bayes' theorem of course provides that:


(1) P(T/D) = P(D/T)P(T) / [P(D/T)P(T) + P(D/~T)P(~T)]

We recall that for Alice, P(T) = P(~T) = 0.5. So (1) reduces in this context immediately to:

(2) P(T/D) = P(D/T) / [P(D/T) + P(D/~T)]

Now P(D/T) will be equal to the probability that - on our assumption that T is true - not all of the N-many predicting scientists fail to predict T. We can determine P(D/T) thus by subtracting the probability that all fail to predict T (assuming T) from 1 (i.e., P(D/T) = 1 - P(~D/T)). Now the prior probability that an arbitrary predictor following discovery method M will happen to predict theory T (assuming T) is P(MT/T); the prior probability that an arbitrary predictor following M will predict some theory other than T (assuming T) is P(M~T/T). In that the probability that one predictor will fail to predict T is independent of what any other arbitrary predictor will predict, the probability that all N-many predictors will predict some theory other than T (assuming T) is P(M~T/T)^N, so the probability that at least one predictor will predict T (assuming T) is 1 - P(M~T/T)^N. Likewise, the probability that at least one predictor will predict T assuming ~T is 1 - P(M~T/~T)^N. Substituting in (2) we derive:

(3) P(T/D) = [1 - P(M~T/T)^N] / ([1 - P(M~T/T)^N] + [1 - P(M~T/~T)^N])

Let us pause here to convince ourselves of the following fact: if N = 1, so that there is only one member of the predicting community, then P(T/D) is very close to 1. However, as N increases indefinitely, P(T/D) drops toward the original prior probability of T, 0.5. The intuition in back of this result is straightforward: if the predicting community contains only a T-predictor, and E is confirmed, it is (as Maher argues) very likely that the predictor is following a reliable predictive method, thus very likely that T is true. But as the size of the predicting community gets larger, the fact that there is at least one T-predictor in that community gets less and less epistemically significant for T. For increasing N it becomes more probable that some unreliable predictor will predict T; thus the degree of new confirmation provided by the assurance of a T-predictor's existence falls off as N increases (though the probability of T should not drop below the original value of P(T) of 0.5, as the information that at least one T-predictor exists should never disconfirm T).


To convince ourselves of this, let us plug some plausible values into (3) and plot P(T/D) against N. The two terms in need of values are P(M~T/T) = 1 - P(MT/T) and P(M~T/~T) = 1 - P(MT/~T) (these equalities hold since we assume each method M generates some theory about the coin flip outcomes), so let us stipulate plausible values for P(MT/T) and P(MT/~T). Our assumption (%) that P(R) >> P(ME/~R)P(~R) asserts that it is much more likely that an arbitrary method is reliable than that a method will happen to generate E while itself unreliable. This assumption, however, entails that P(MT/T) >> P(MT/~T) for arbitrary method M, for if T is true then T is much more likely to be predicted by an arbitrary method than if T is false. This is because if T is true then it will be predicted by any method which is either reliable or is unreliable but happens to predict T, but if T is false then T can only be predicted by an unreliable method that happens to predict T. But again, (%) entails that it is much more likely that an arbitrary method is reliable than that an unreliable method will happen to generate E (which constitutes 99/100's of T), so P(MT/T) >> P(MT/~T).

Let us pause at this point to take note of a small technical point: if we stick to Maher's example in which T contains one hundred conjuncts, then P(MT/~T) will be virtually equal to the probability that a random heads/tails generator will generate a particular sequence of 100 outcomes - but this probability will be the mind-bogglingly small (0.5)^100, or 7.89 × 10^-31. This number is, I think, too small to mirror real world analogues of the coin flip example and will prove computationally unmanageable in the calculations to follow. I propose that at this point we redefine T to consist of an apparently random sequence of 10 (not 100) flip outcomes (the last outcome being heads); E is thus reconstrued as the initial 9 outcomes. Below we make assumptions that respect this modified version of Maher's coin flip example. We will work with this example throughout the rest of this paper.

(4) Assumptions: P(MT/T) = 0.05
                 P(MT/~T) = 0.001³

So, P(M~T/T) = 0.95 and P(M~T/~T) = 0.999 (given our assumption that any method will generate some theory). Substituting into (3) we obtain:

(5) P(T/D) = [1 - (0.95)^N] / ([1 - (0.95)^N] + [1 - (0.999)^N])

So for N = 1, P(T/D) = 0.98; P(T/D) clearly drops toward 0.5 for increasing N, as Figure 1 shows.

[Figure 1: P(T/D) plotted against N. The data points are as follows: for N = 1, P(T/D) = 0.98; 500: 0.72; 1000: 0.61; 1500: 0.56; 2000: 0.54; 2500: 0.52; 3000: 0.51.]
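These data points can be recomputed directly from (3) with the values assumed in (4); the following is a minimal sketch, with the function name my own:

```python
def p_T_given_D(N, p_mt_t=0.05, p_mt_nott=0.001):
    """P(T/D) per equation (3): the probability of T given that at least one
    of N predictors is a T-predictor, with P(M~T/T) = 1 - P(MT/T), etc."""
    p_d_t = 1 - (1 - p_mt_t) ** N        # P(D/T)  = 1 - 0.95^N
    p_d_nott = 1 - (1 - p_mt_nott) ** N  # P(D/~T) = 1 - 0.999^N
    return p_d_t / (p_d_t + p_d_nott)

for n in (1, 500, 1000, 1500, 2000, 2500, 3000):
    print(n, round(p_T_given_D(n), 2))
# reproduces the data points above: 0.98, 0.72, 0.61, 0.56, 0.54, 0.52, 0.51
```

The drop toward the prior 0.5 is driven by the denominator's second term, 1 - 0.999^N, which grows toward 1 as the community gets larger.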

Now let us vary the information Alice receives in the following way: instead of learning that there is at least one T-predictor among the N-many predicting scientists, let us assume she is informed that there is exactly one T-predictor in the predicting community. What should her updated probability for T be given this information? Intuitively, for sufficiently small N the existence of a T-predictor should still count as strong evidence for T once E is established. But for growing N, the existence of but one predictor will increasingly count against T! This is primarily because if T is true, all reliable predictors will predict T - but then if T is true and there is only one T-predictor, this entails that there is at most one reliable predictor in the predicting community, a claim increasingly improbable for sufficiently large N (assuming of course that P(R) > 0). If T is false, the existence of a unique predictor is more understandable for large N, as a unique T-predictor is compatible in this event with there being any number of reliable predictors in the community.

Let 'D!' assert that the predicting community contains exactly one T-predictor. We seek the value of P(T/D!) for arbitrary N. Again, given Alice's situation, P(T) = P(~T) = 0.5, so Bayes' Theorem gives

(5) P(T/D!) = P(D!/T) / [P(D!/T) + P(D!/~T)]

Assuming a predicting community of N-many members, any scenario in which some particular predictor is the unique T-predictor and the remaining N - 1 predictors predict some theory other than T will have a probability equal to P(MT/T)[P(M~T/T)]^(N-1). In that there are N-many possible states in which a unique predictor exists (corresponding to the possibility that each of the N predicting scientists is the unique T-predictor), the probability that there is a unique T-predictor if T is true is N·P(MT/T)[P(M~T/T)]^(N-1). By identical reasoning, the probability that there is a unique T-predictor if T is false is just N·P(MT/~T)[P(M~T/~T)]^(N-1). So, substituting in (5), and cancelling the N's, we have

(6) P(T/D!) = P(MT/T)[P(M~T/T)]^(N-1) / (P(MT/T)[P(M~T/T)]^(N-1) + P(MT/~T)[P(M~T/~T)]^(N-1))

[Figure 2: P(T/D!) plotted against N, for N = 1, 50, 100, 150.]

To show how P(T/D!) varies with increasing N let us plug in some appropriate numbers and plot the conditional probability against N. We use the same assumptions as given in (4).

So for some N,

(7) P(T/D!) = 0.05(0.95)^(N-1) / [0.05(0.95)^(N-1) + 0.001(0.999)^(N-1)]

In Figure 2 we represent P(T/D!) against rising N. (The data points are N = 1: P(T/D!) = 0.98; 50: 0.81; 100: 0.256; and 150: 0.027.) The point to appreciate is that the information that there is a unique T-predictor can have widely varying epistemic import for T depending on the size of N - while the case of N = 1 is indistinguishable from the case of N = 1 where there is at least one T-predictor, P(T/D!) falls off astonishingly rapidly for increasing N (given our Assumptions as described in (4)). For if T were true there would almost certainly be more than one T-predictor on our Assumptions - the absence of additional T-predictors thus counts against T for large N.⁴
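Figure 2's data points can likewise be recomputed from (7); a minimal sketch (the function name is my own):

```python
def p_T_given_D1(N, p_mt_t=0.05, p_mt_nott=0.001):
    """P(T/D!) per equation (7): the probability of T given exactly one
    T-predictor among N predictors (the common factor N has cancelled)."""
    num = p_mt_t * (1 - p_mt_t) ** (N - 1)        # 0.05 * 0.95^(N-1)
    alt = p_mt_nott * (1 - p_mt_nott) ** (N - 1)  # 0.001 * 0.999^(N-1)
    return num / (num + alt)

for n in (1, 50, 100, 150):
    print(n, round(p_T_given_D1(n), 3))
# reproduces Figure 2's data points: 0.98, 0.81, 0.256, 0.027
```

The collapse comes from the factor (0.95)^(N-1) in the numerator: if T is true, it rapidly becomes improbable that N - 1 predictors would all miss T.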


6. A PREDICTIVE PARADOX?

Let us consider again Alice. We recall that we have modified our assumptions about her evidence and theory: she has witnessed E (the initial, apparently random, sequence of 9 coin flip outcomes) and conjoined E with the hypothesis that the 10th flip will turn up heads (a guess on her part); she has thus proposed theory T, and attaches to T probability p1 (about 0.5). Now let us suppose that Alice randomly encounters the T-predictor Priscilla, and by reasoning explained above Alice raises her degree of belief in T to p2 (close to 1). Now let us suppose that Alice subsequently learns that Priscilla was part of a vast predicting community for the coin flip problem - in fact she was one of 3000 predictors for this problem! Now it might seem as though Alice should lower her degree of belief in T to 0.51 (see Figure 1), given that P(T/D) = 0.51 for N = 3000. But this is terribly counterintuitive - why should it matter with respect to Priscilla's putative reliability that there happened to be 2999 other predictors pondering the same problem?

In fact, it doesn't matter, and Alice should, given the way the above scenario was described, continue to attach probability p2 to T. The reason is that Alice encountered Priscilla 'at random', viz., as the result of what we imagine to be something like Alice's random sampling of the predicting community. This being so, it is nonetheless vastly more likely that Priscilla is a reliable predictor than that she is unreliable but happened to predict T nonetheless, as explained above, hence very likely that T is true. However, suppose that Alice discovered Priscilla's existence not by a random encounter but as the result of a systematic search through all 3000 predictors for a T-predictor. In this event, Alice's discovery of Priscilla amounts to nothing more than the information that there is at least one T-predictor among 3000, and Alice should simply set her degree of belief in T on D equal to P(T/D) for N = 3000 (0.51). The reason for the epistemic asymmetry between Alice's random and non-random sampling methods derives from the Assumptions we made in (4): if the sampling method is a single random selection, it is extremely unlikely that the method will produce a T-predictor if T is false (given that P(MT/~T) is so low). However, if T is true, the probability of its producing a T-predictor is comparatively much greater (since P(MT/T) >> P(MT/~T)). Since the random selection of a T-predictor is much less surprising assuming T rather than ~T, it thereby confirms T. But if the sampling method is exhaustive and thus non-random, then the method will reveal any T-predictor's existence - including those whose methods are unreliable - no matter how few there are of these in the predicting community. Hence it turns out that the epistemic significance of a T-predictor's existence depends not only on how large


the predicting community is, but on the method by which a T-predictor's existence is revealed to the rest of the community.
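The asymmetry between a random encounter and an exhaustive search can be illustrated with a small Monte Carlo simulation under the Assumptions in (4) with N = 3000. The code and all names below are an illustrative sketch of mine, not anything from Maher or KLS:

```python
import random

random.seed(0)                    # fixed seed, so the run is reproducible
N = 3000                          # size of the predicting community
P_MT_T, P_MT_NOTT = 0.05, 0.001   # the Assumptions in (4)
TRIALS = 100_000

# Probability that an exhaustive search finds at least one T-predictor:
P_ANY_T = 1 - (1 - P_MT_T) ** N        # ~1.0 if T is true
P_ANY_NOTT = 1 - (1 - P_MT_NOTT) ** N  # ~0.95 if T is false

rand_t = rand_n = 0   # trials where one randomly sampled predictor predicted T
srch_t = srch_n = 0   # trials where exhaustive search found a T-predictor

for _ in range(TRIALS):
    t_true = random.random() < 0.5          # prior P(T) = 0.5
    # (a) random encounter: sample a single predictor
    if random.random() < (P_MT_T if t_true else P_MT_NOTT):
        rand_n += 1
        rand_t += t_true
    # (b) exhaustive search: at least one T-predictor among all N
    if random.random() < (P_ANY_T if t_true else P_ANY_NOTT):
        srch_n += 1
        srch_t += t_true

print(rand_t / rand_n)   # about 0.98: a randomly met T-predictor strongly confirms T
print(srch_t / srch_n)   # about 0.51: a T-predictor found by searching barely does
```

The two estimates approximate p2 and P(T/D) for N = 3000 respectively; the same conditioning fact (a T-predictor exists) carries very different weight depending on how it was discovered.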

Let us consider another problem: imagine Alice again, prior to learning of any T-predictor's existence, still with degree of belief p1 in T. As before, Alice encounters Priscilla at random - and learning of Priscilla that she is a T-predictor prompts Alice to raise her degree of belief in T to p2. Now in this case, Alice is informed that Priscilla is a member of a community of 3000 predictors for the coin flip problem, but also informed that in fact Priscilla is the unique T-predictor in this community. Now, extrapolating from the analysis presented in Figure 2, it looks as though Alice should lower her degree of belief to a tiny probability, as P(T/D!) will clearly be extremely low (Figure 2 gives P(T/D!) as 0.027 for N = 150; for N = 3000 the value will be vastly lower). But should it matter vis-à-vis Priscilla's apparent reliability that she was a member of a community of 3000 predictors, one in which she was the only T-predictor? The answer, surprisingly, is yes! For by the analysis presented above, the failure of other predictors to predict T constitutes strong evidence against T for sufficiently large N, and the realization that there are no other T-predictors counts as new and powerful evidence against T for such N. The data graphed in Figure 2 is of course based on the assumed values for P(MT/T) and P(MT/~T) given in (4) - but the same basic point will hold for any non-zero values assumed for these conditional probabilities such that P(MT/T) > P(MT/~T); though different assumptions may shift the position and slope of the graph, it will remain a downward curve drifting toward 0 for increasing N.

The reason, again, that the realization that Priscilla is the unique T-predictor in a community of 3000 forces Alice to lower her degree of belief in T, while the realization that Priscilla was simply one of possibly more T-predictors in the community does not, is that the latter realization, unlike the former, is compatible with there being any number of other T-predictors in the community - thus this realization does not constitute evidence against T, as does the former.
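Figure 2's behavior can likewise be reproduced. A minimal sketch (assuming, as in (4), a binomial model over the predicting community; the function name is my own):

```python
# Posterior probability of T given that EXACTLY ONE T-predictor exists
# among N predictors, under the Assumptions in (4).

P_MT_T, P_MT_notT, PRIOR_T = 0.05, 0.001, 0.5

def p_T_given_unique(n):
    """P(T/D!): posterior for T when Priscilla is known to be the
    unique T-predictor in a community of n predictors."""
    like_T = n * P_MT_T * (1 - P_MT_T) ** (n - 1)          # binomial, k = 1
    like_notT = n * P_MT_notT * (1 - P_MT_notT) ** (n - 1)
    return (PRIOR_T * like_T) / (PRIOR_T * like_T + (1 - PRIOR_T) * like_notT)
```

This gives about 0.027 for N = 150, the value read off Figure 2, and a vastly smaller value for N = 3000 - a downward curve drifting toward 0, as the text describes.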

7. COUNTERPREDICTORS

Let us return to Alice one last time, and imagine that she randomly encounters Priscilla the T-predictor, whereupon Alice raises her degree of belief in T from p1 to p2 once E is established. But now let us suppose that Alice randomly encounters another predictor, who is addressed as 'Countess'. Now in advance of the demonstration of E Countess predicted T', where T' asserts E in combination with the prediction that the 10th flip would be tails; T' thus agrees with T on the initial 9 predictions (which will constitute E) but differs on the last flip prediction. The point, of course, is that upon the demonstration of E Countess has demonstrated no less predictive skill than Priscilla, and thus E counts equally in favor of T and T'. Countess is an example of what I deem a 'counterpredictor' with respect to the T-predictor Priscilla; she is a predictor of a theory inconsistent with T but which, like T, accords with E.

What is the probability of T conditional on the existence of a T-predictor and a T'-predictor once E is known? Once E is known, P(T ∨ T') = 1, but the total evidence here clearly favors neither theory over the other, so of course the probability of T in this case is 0.5. For the same reason T will deserve the same conditional probability if there is an equal number of T-predictors and counterpredictors, no matter what this number is (a proof of this claim is analogous to the proof provided below).

Suppose that Alice happens to encounter Priscilla and Countess, but then happens to meet another T-predictor. What is the updated probability of T on the existence of two T-predictors but just one T'-predictor? Upon meeting Priscilla and Countess, Alice's new probability for T is 0.5; this situation is thus epistemically equivalent (as regards the probability of T) to one in which Alice has not encountered any predictors. After meeting the second T-predictor, Alice should presumably be free to regard this T-predictor's existence as confirming T just as strongly as if Alice had not met the first two predictors, for the counteracting predictions of Priscilla and Countess do not in any way undermine the significance of the second T-predictor's existence, so it seems her probability for T in this event should just be p2.

Let's attempt to sharpen the above analysis. We consider three randomly encountered members of the predicting community whom we deem A, B, and C: A and B are T-predictors (which we symbolize as At and Bt) while C is a T'-predictor (Ct'). We seek the value of P(T/AtBtCt') in a context in which each predictor successfully predicted E. Bayes' theorem gives:

(8) P(T/AtBtCt') = P(T)P(AtBtCt'/T) / P(AtBtCt')

Since the predictors' predictive behaviours are mutually independent, we have

(9) P(AtBtCt'/T) = [P(At/T)P(Bt/T)]P(Ct'/T)

Let's consider the three conditional probabilities on the right hand side of (9). P(At/T), e.g., is the probability that predictor A will predict T on the assumption that T is true - this is clearly intended to be the probability that A will predict the true theory given the fact that A has demonstrated considerable predictive skill over the initial 9 predictions. We must therefore be careful in determining which information we include in the background knowledge on the basis of which our conditional probabilities are calculated, for if we include the fact that A has generated T, P(At/T) will have value 1, a value which does not reflect the intended probability. Let us therefore stipulate that the background knowledge includes only the fact that A, B, and C have generated their initial 9 predictions which collectively constitute E (predictions on which they all agree) and that E has been shown true. Now P(At/T) is equal to the probability that A is reliable (and hence will surely predict T if T is true) plus the probability that A is unreliable but will nonetheless just happen to predict the true theory T - where both probabilities take account of the fact that A has successfully predicted E. We let RA assert that the method used by A is reliable; P' denotes the probability function that takes the background knowledge to include E and the fact that A, B and C predicted E - but excludes the information regarding the predictions made by each for the 10th flip:

(10) P'(At/T) = P'(At/TRA)P'(RA) + P'(At/T~RA)[1 - P'(RA)].

P'(At/TRA) is 1, since any reliable method will generate the true theory; P'(At/T~RA) is 0.5, since if A's method is no better than a random method then we should assume that it is as likely that A will predict heads for the 10th flip as tails. We thus require only a value for P'(RA) (the probability that a method that has successfully predicted E is reliable). Surprisingly, we have come this far in our analysis without fixing the value of this probability. The only constraints on this value in our current context are the pair of Assumptions made in (4). These Assumptions entail P'(RA) = 0.96.⁵ Now the probability that method A will predict T on the assumption that T is true is as follows:

(11) P'(At/T) = P'(At/TRA)P'(RA) + P'(At/T~RA)[1 - P'(RA)]
            = (1)(0.96) + (0.5)(0.04)
            = 0.98

By identical reasoning, P'(Bt/T) = 0.98 as well. Now P'(Ct/T) = 0.98 too, so where P'(C~t/T) is the probability that C will predict some theory other than T (assuming T), P'(C~t/T) = 0.02 (since C will certainly predict either T or some theory other than T). Given that C has already predicted E, C will predict T' if and only if C predicts some theory other than T, hence P'(C~t/T) = P'(Ct'/T), so P'(Ct'/T) = 0.02. Substituting into (9),

(12) P'(AtBtCt'/T) = [(0.98)(0.98)](0.02)
            = 0.019208

Now of course

(13) P'(AtBtCt') = P'(AtBtCt'/T)P'(T) + P'(AtBtCt'/~T)[1 - P'(T)]

But again, given our knowledge of E, we know ~T ≡ T', so

(14) P'(AtBtCt') = P'(AtBtCt'/T)P'(T) + P'(AtBtCt'/T')[1 - P'(T)]

Now by reasoning structurally identical to that running from (9) to (12), we know that

(15) P'(AtBtCt'/T') = [P'(At/T')P'(Bt/T')]P'(Ct'/T')
            = [(0.02)(0.02)](0.98)
            = 0.000392

So, substituting into (14), we have

(16) P'(AtBtCt') = (0.019208)(0.5) + (0.000392)(0.5)
            = 0.0098

Substituting now into (8),

(17) P'(T/AtBtCt') = (0.5)(0.019208) / 0.0098
            = 0.98


Thus the probability of T on the condition that A and B have predicted T, while C has predicted T', is identical to the probability of T conditioned only on the existence of a single T-predictor. This result squares with the intuition explained above: the T'-predictor together with one of the T-predictors mutually annihilate each other's epistemic significance regarding T, but the remaining T-predictor counts no less impressively for T. In fact, it is a straightforward generalization of the reasoning applied above to show that the probability of T on the random encounter of x-many T-predictors and w-many T'-predictors (given w < x) is identical to the probability of T on x - w T-predictors where no T'-predictors have been sighted.⁶
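The generalization just stated can be verified numerically. The sketch below (Python; the log-odds formulation is my own device, used to avoid floating-point underflow for large communities) uses the values derived in (11), P'(At/T) = 0.98 and P'(Ct'/T) = 0.02, together with the prior P'(T) = 0.5:

```python
from math import exp, log

P_PREDICT_TRUE = 0.98   # P'(At/T): a successful E-predictor predicts the true theory
P_PREDICT_FALSE = 0.02  # P'(Ct'/T): such a predictor predicts the false one

def posterior_T(n_t, n_tprime):
    """P'(T) given n_t T-predictors and n_tprime T'-predictors, each
    encountered at random after successfully predicting E."""
    # With equal priors only the log-likelihood ratio matters, and the
    # matched pairs of T- and T'-predictors cancel exactly.
    llr = (n_t - n_tprime) * (log(P_PREDICT_TRUE) - log(P_PREDICT_FALSE))
    return 1.0 / (1.0 + exp(-llr))
```

posterior_T(2, 1) recovers the 0.98 of (17); posterior_T(1, 1) gives 0.5; and posterior_T(1001, 1000) is again 0.98, as note 6 claims.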

8. APPLICATIONS

It goes without saying that the illustrations of the social dimension of predictivism provided above are highly idealized. For one thing, surely the actual process of constructing and testing a theory is significantly disanalogous to the construction and testing of theories of coin flips. Readers worried about such disanalogies should consider the attempt of Howson and Franklin (1991) to show that such disanalogies make problems for Maher's explication of predictivism; this attempt, however, has in my view been satisfactorily answered by Maher (1993), though the point surely merits further consideration. For another thing, the assumption that every predictor is following a method that is either completely reliable or no better than a random method is particularly strong. Nonetheless, this assumption could be relaxed (following Maher (1990)) without compromising at all the spirit of the conclusions reached above: e.g., the existence of large numbers of predictors would nonetheless increase the probability that some unreliable (or moderately or highly unreliable) predictors will make surprisingly successful predictions and thus have the same sort of impact on the epistemic significance of at least one (or exactly one) T-predictor.

What requires illumination is the way in which these results apply to the actual history and methodology of science. Consider the following: Immanuel Velikovsky concocted a bizarre theory of the history of the solar system (Velikovsky 1950, 1955) which, he argued, had the virtue of explaining the remarkable amount of similarity between the stories of various ancient religious and mythological texts. Velikovsky's theories were universally denounced by the scientific community, but it was nonetheless granted that he managed to make some surprisingly successful predictions - and there were no clear falsifications of his theory (owing in part to Velikovsky's willingness to adopt ad hoc hypotheses when problems threatened). Contrary to then prevailing opinion, Velikovsky successfully predicted that Venus would be hot, that Jupiter would emit radio waves, and that the earth has a magnetosphere. The willingness of the community to discount these successful predictions can be explained in part on the basis of the community's judgement that his methods of theory construction seemed patently unreliable. But the following point is surely true as well: if we are to widen the definition of 'scientist' so far as to include the likes of Velikovsky, we will thereby include in the scientific community just about anyone who ever cared to make a bold prediction based on some kind of evidence. It is scarcely possible that none of this vast community of predictors will ever be lucky enough to watch their bold predictions come true. The willingness of the scientific community to denounce such apparently successful prognosticators as Jeane Dixon and Nostradamus can be straightforwardly explained in terms of social predictivism, whatever other explanations we might adduce as well.

One reason it is difficult to find clear historical illustrations of the epistemic relevance of the size of the community of predictors is that the actual number of scientists trying to construct theories of any particular empirical domain is typically quite small. Scientific research is a highly specialized activity, not only because successful research requires much training but because scientists have less incentive to tackle problems that a large number of other scientists are already working on, since the chance of succeeding where others fail is smaller when the playing field is large. So the epistemic relevance of the size of the predicting community has, I would argue, in part simply remained invisible in the history of science. This invisibility, however, should not diminish the philosopher's interest in social predictivism insofar as it is her aim to construct a complete theory of confirmation.

The actual relevance of the size of the predicting community is played out in part in the determination of what degree of novelty is required for a prediction to count as sufficiently bold. E.g., the fact that there are known to be relatively few scientists attempting to construct theories (and thus make novel predictions) pertaining to some empirical domain entails that predictions need not be so bold to count as successful demonstrations as they would need to be if there were a huge number of scientists constructing more theories of the domain. The fact that no one knows precisely how many predictors are at work in a particular domain need not affect this point - the rougher the estimate of the number, the rougher the corresponding determination of the required level of predictive boldness for an impressive experimental demonstration of a theory.


Does the fact that there is only one predictor of some theory among a large predicting community ever actually work to disconfirm the theory? It surely does. Suppose that among the very large community of economists who attempt to predict the degree of growth in the American economy next year only one predicts that there will be no growth - all others predict growth of some degree. At the end of the first quarter of the next year the lone economist's bold prediction is surprisingly vindicated - no growth is observed! Nonetheless, the fact that none of the other economists (some of whom must be among the most reliable alive) endorsed the hypothesis that there would be zero growth for the entire year works to accord low probability to this hypothesis despite the lone economist's successful prediction - growth will almost surely pick up in the remaining quarters.

The above results also find application to an issue discussed by KLS. KLS present a simple social planner's problem:

Imagine a planner who would like to build a bridge, and is seeking a scientific theory to guide its design. If the theory is true, the bridge will stand and if the theory is false the bridge will fail. The planner can direct the activities of a fixed population of scientists, and can require them all to either look first [i.e. acquire data and accommodate it in theory construction] or theorize first [endorse a theory, and then test its predictions]. What should he do? (1992, p. 511)

KLS go on to pose the following solution to the problem:

If there are many researchers, each working independently and each theorizing first, then the probability that all of their theories will be rejected is very small. Therefore, since the planner requires only one surviving theory, he should order researchers to theorize first. This increases the probability that the bridge will be built and stand, at the cost of only a very small risk that no bridge can be built at all. (1992, p. 512)

While the proposed solution is tempting, it reveals a clear failure to appreciate the relationship between the size of the predicting community and the epistemic significance of the existence of a successful predictor. KLS seem unaware that a large predicting community is naturally more likely than a smaller one to contain a scientist who endorses a theory that will survive rigorous testing but is nonetheless false (like a T'-predictor if T turns out true). A failure to appreciate the social nature of predictivism is not merely a theoretical failure - it could have disastrous consequences for bridge walkers!
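The point against KLS can be made vivid with a toy calculation. In the sketch below the parameter values are purely illustrative assumptions (not from KLS): each theorize-first researcher independently arrives at a true theory with probability q, and a false theory survives the planner's testing regime with probability s.

```python
def p_false_survivor(n, q=0.05, s=0.001):
    """Probability that at least one of n independent researchers ends up
    endorsing a FALSE theory that nonetheless survives testing."""
    p_one = (1 - q) * s          # a given researcher: false theory, passes the test
    return 1 - (1 - p_one) ** n

# The planner's risk of building on a false-but-tested theory grows with n:
for n in (10, 100, 10000):
    print(n, round(p_false_survivor(n), 4))
```

Whatever the exact values, this risk is monotonically increasing in the size of the community: a large population of theorizers buys the planner more surviving theories, but also a greater chance that some survivor is false.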

9. CONCLUSION

Maher argued at the conclusion of his (1988) that the truth of predictivism had remained hard to see for so long because so many philosophers were convinced that facts about the method by which a theory is constructed could have no relevance to the degree of confirmation a theory enjoyed on the available evidence - Maher's method-based analysis of predictivism, along with KLS's talent-based analysis, belies this conviction. I would argue analogously that the truth of social predictivism has proved likewise hard to see because there is a widespread assumption among philosophers of a Bayesian ilk that has blocked our view: that assumption is that social factors will not prove to have a substantive bearing on the nature of the evidence relation, whatever their role in determining the actual course of scientific development. This paper has refuted that assumption.

NOTES

" For comments and criticisms 1 am indebted to Doug Ehring, Mark Heller, Jean Kazez, Patrick Maher, and Alastair Norcross. Special thanks are due to Wayne Woodword for help with the proof in Section 7.

1 While in my view it is Maher and KLS who have been most successful in arguing for the truth of predictivism and in explaining its ground, there is a considerable literature on this subject that interested readers should consult (some of which denies the truth of predictivism). For some recent examples of this literature besides papers mentioned in the text of this paper see Schlesinger (1987), Eells (1987), Howson (1990), Brush (1993), and Collins (1994). For further analysis on Maher's program see Barnes (forthcoming).

2 Though this assumption is quite strong, it has the virtue of considerably simplifying the exposition of Maher's thesis (and the argument he provides for it in his (1988)) - and the same holds for the analysis to follow in this paper. Moreover, Maher shows in his (1990) that this assumption can be relaxed and the argument that the predictivist thesis is true (under certain conditions) remains sound even while allowing that there is a continuum of degrees of reliability of scientific methods.

3 Actually, P(MT/~T) will be approximately equal to 0.000928, but we round to 0.001 for computational ease, and because this does not distort any subsequent outcome. The stipulation that P(MT/T) = 0.05 is made to respect the requirement that P(MT/T) >> P(MT/~T) but is otherwise arbitrary. Consider:

(a) P(MT/T) = 0.05 = P(MT/TR)P(R) + P(MT/T~R)[1 - P(R)]

But P(MT/TR) = 1, as any reliable method will generate the true theory, and P(MT/T~R) is the probability that a random generating method will predict a particular sequence of 10 heads/tails outcomes, which is 0.5^10 or 0.000976. Inserting these values in (a) and rearranging gives P(R) = 0.049072. Now,

(b) P(MT/~T) = P(MT/~TR)P(R) + P(MT/~T~R)[1 - P(R)].

But of course P(MT/~TR) = 0, as reliable methods never generate false theories. Furthermore P(MT/~T~R) = P(MT/T~R) = 0.000976, since if a method is unreliable it is just as likely to predict T whether or not T is true; using the above value for P(R) entails P(MT/~T) is 0.000928.


4 Readers may indeed be astonished at the steep slope of the graph in Figure 2 as compared with Figure 1 - the reason for the difference can be found in our Assumptions made in (4), which establish that if T is true, about 5 out of one hundred predictors should predict T, but if T is false, only about 1 out of a thousand should do so. Thus for, say, N = 200, there will probably be about 10 T-predictors if T is true but most likely none if T is false. So the information that 'at least one T-predictor exists' will tend to be tightly correlated with the existence of other T-predictors and thus confirm T rather highly for this community size. But if we are assured that ONLY one T-predictor exists out of 200, this will make T's probability exceedingly low, for this information blocks the inference from the existence of the one T-predictor to the existence of others.

5 As the probability function P takes only ME as background knowledge (where M is any method that has predicted E, such as that used by A, B or C), we have

P(R/E) = P(R)P(E/R) / (P(E/R)P(R) + P(E/~R)[1 - P(R)])

But P(E/R) = 1 where ME is background, and P(R) = 0.049 (see footnote 3). P(E/~R) is the probability that a random heads/tails generator will generate a particular sequence of 9 consecutive outcomes, which is (0.5)^9 or 0.00195. This gives P(R/E) = 0.9635. But P'(R) = P(R/E) by definition of P'.

6 It may well seem surprising to some that, given our Assumptions in (4), the existence of, say, 1001 T-predictors and 1000 T'-predictors should be epistemically equivalent for our rational degree of belief in T to a scenario in which a single randomly encountered T-predictor exists, but this is precisely the case. The temptation to think otherwise is based, I suspect, on the fact that if there really were 1001 T-predictors and 1000 T'-predictors prior to the demonstration of E, we would be strongly inclined to deny the principle on which the Assumptions made in (4) were based - specifically, to deny the principle (*) that P(R) >> P(MT/~R)[1 - P(R)]. For the almost even distribution of T- and T'-predictors shows that among those predictors who predict some theory that entails E, it is not the case that many more of them are reliable than unreliable - which in turn implies that (*) is probably false. If we abandon (*) then of course we lose our motivation for (4), but the Assumptions in (4) are the basis for the current result. If we retain (4) then the 1001/1000 T/T'-predictors result is amazingly unlikely, but nonetheless would establish the probability of T to be 0.98.
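The chain of numbers in notes 3 and 5 can be reproduced in a few lines (a check of the arithmetic, using only the definitions given in the notes):

```python
# Note 3: solve (a) for P(R), then evaluate (b).
p_rand_10 = 0.5 ** 10                       # random method hits a given 10-flip sequence
P_R = (0.05 - p_rand_10) / (1 - p_rand_10)  # rearranged from (a); ~0.049072
P_MT_notT = p_rand_10 * (1 - P_R)           # from (b); ~0.000928

# Note 5: P(R/E) by Bayes' theorem, with P(E/R) = 1.
p_rand_9 = 0.5 ** 9                         # random method hits the 9-flip sequence E
P_R_given_E = P_R / (P_R + p_rand_9 * (1 - P_R))  # ~0.9635

print(round(P_R, 6), round(P_MT_notT, 6), round(P_R_given_E, 4))
```

The last value, rounded to 0.96, is the P'(RA) used in (11).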

REFERENCES

Barnes, E.: forthcoming, 'Thoughts on Maher's Predictivism', Philosophy of Science.

Brush, S.: 1993, 'Prediction and Theory Evaluation: Cosmic Microwaves and the Revival of the Big Bang', Perspectives on Science 1, 565-602.

Collins, R.: 1994, 'Against the Epistemic Value of Prediction Over Accommodation', Noûs 28, 210-224.

Eells, E.: 1990, 'Bayesian Problems of Old Evidence', in Savage, W. (ed.), Scientific Theories, Minnesota Studies in the Philosophy of Science, Vol. XIV, University of Minnesota Press, Minneapolis, pp. 205-223.


Howson, C.: 1990, 'Fitting Your Theory to the Facts: Probably Not Such a Bad Idea After All', in Savage, W. (ed.), Scientific Theories, Minnesota Studies in the Philosophy of Science, Vol. XIV, University of Minnesota Press, Minneapolis, pp. 224-244.

Howson, C. and A. Franklin: 1991, 'Maher, Mendeleev and Bayesianism', Philosophy of Science 58, 574-585.

Kahn, J. A., S. E. Landsberg, and A. C. Stockman: 1992, 'On Novel Confirmation', British Journal for the Philosophy of Science 43, 503-516.

Lipton, P.: 1991, Inference to the Best Explanation, Routledge and Kegan Paul, London and New York.

Maher, P.: 1988, 'Prediction, Accommodation and the Logic of Discovery', PSA 1988, 1, 273-285.

Maher, P.: 1990, 'How Prediction Enhances Confirmation', in J. M. Dunn and A. Gupta (eds.), Truth or Consequences: Essays in Honor of Nuel Belnap, Kluwer, Dordrecht, pp. 327-343.

Maher, P.: 1993, 'Howson and Franklin on Prediction', Philosophy of Science 60, 329-340.

Schlesinger, G.: 1987, 'Accommodation and Prediction', Australasian Journal of Philosophy 65, 33-42.

Velikovsky, I.: 1950, Worlds in Collision, Doubleday, New York.

Velikovsky, I.: 1955, Earth in Upheaval, Doubleday, New York.

Manuscript received August 31, 1995

Department of Philosophy
Southern Methodist University
Dallas, TX 75218
U.S.A.