Conditional Probabilities and the Probabilities of Conditionals

By Alan Rhoda | June 16, 2006

A conditional probability is represented P(A | B), read “the probability of A given B”, and is (by definition) equivalent to P(A & B) / P(B). To see how this works, consider a simple example. Let’s take a fair die and roll it. What is the probability that we will roll an even number? Clearly, it’s 1/2. And what is the probability that we will roll a 6? Clearly, it’s 1/6. Now, what is the probability that we’ll roll a 6 given that we roll an even number? In other words, what is P(roll a 6 | roll an even number)? By the definition of conditional probability, this is equivalent to P(roll a 6 and roll an even number) / P(roll an even number). This equals (1/6)/(1/2), which equals 1/3.
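
To make the arithmetic concrete, here is a minimal Python sketch (the helper names `prob` and `cond_prob` are mine, purely illustrative) that recovers these values by enumerating the six equally likely outcomes:

```python
from fractions import Fraction

# The six equally likely outcomes of a fair die.
outcomes = [1, 2, 3, 4, 5, 6]

def prob(event):
    """P(event), computed by counting favorable outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A & B) / P(B), the definition of conditional probability."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

is_even = lambda o: o % 2 == 0
is_six = lambda o: o == 6

print(prob(is_even))               # 1/2
print(prob(is_six))                # 1/6
print(cond_prob(is_six, is_even))  # (1/6)/(1/2) = 1/3
```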

Some philosophers (including myself a few years back) have thought that conditional probabilities could be used to represent the probabilities of conditionals. This view is commonly referred to as Stalnaker’s hypothesis (SH), after Robert Stalnaker, a prominent philosopher who explicitly proposed the idea. In other words,

(SH) P(If p then q) = P(q | p).

Unfortunately, while Stalnaker’s hypothesis is prima facie plausible, it is demonstrably false, as famously shown by David Lewis (a result that went a long way toward establishing Lewis’s reputation as one of the smartest philosophers on the planet, a reputation he held until his death in 2001). I won’t try to run through Lewis’s triviality proof here, but suffice it to say that it is now widely agreed among scholars working on conditionals that Stalnaker’s hypothesis fails.
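
Lewis’s triviality results are far more general, but the basic mismatch is easy to see already for the material conditional. The following sketch (my own illustration, not Lewis’s proof) reuses the die example: reading “if p then q” materially, its probability is 2/3, while the conditional probability is 1/3.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p = lambda o: o % 2 == 0  # p: the roll is even
q = lambda o: o == 6      # q: the roll is a six

# Material conditional: "if p then q" is true at an outcome iff p is false or q is true.
material = lambda o: (not p(o)) or q(o)

print(prob(material))                           # 2/3
print(prob(lambda o: p(o) and q(o)) / prob(p))  # 1/3
```

So at least the material reading of the conditional cannot satisfy (SH); Lewis showed that, roughly, no conditional connective can satisfy it in general without trivializing the probability function.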

But if the probabilities of conditionals cannot be equated with conditional probabilities, what can they be equated with? Intuitively, it seems that the notion of the probability of a conditional ought to make sense and that there should, in principle at least, be some way of estimating a value for large classes of conditionals, if not for all of them. I think this is right, though I should mention that not everyone agrees. The philosopher Ernest Adams has famously defended the view that there are no probabilities of conditionals. In other words, he holds that there is nothing that can be called the value of P(If p then q). He proposes instead that the conditional probability P(q | p) measures the assertibility (but not the probability) of the conditional if p then q.

I think Adams is wrong and that there’s a very straightforward way of thinking about the probabilities of conditionals. Consider if p then q. If p entails q, then it seems obvious that P(If p then q) ought to equal one. Similarly, if p and q are mutually incompatible (p entails ~q), then it seems obvious that P(If p then q) ought to equal zero. But if those two cases clearly have probabilities, then it’s hard to see why cases in which p is compatible with, but does not entail, q should not have probabilities. Suppose we think of this as an argument with p as a premise and q as the conclusion (a natural model, because every argument can be written as a conditional). Since p does not entail q, we have an enthymeme:

p
(unstated premise)
∴ q

Now, what proposition do we need for an unstated premise to make this argument valid? Let’s call this enthymematic premise the deductive complement of p in relation to q and represent it by X. X’s job is to supply any information in q that is not already in p, such that (p & X) entails q. My proposal, then, is this (I’ll call it Rhoda’s hypothesis, RH):

(RH) (a) If p entails q, then P(If p then q) = 1.
     (b) If p entails ~q, then P(If p then q) = 0.
     (c) If p entails neither q nor ~q, then P(If p then q) = P(X), where X is whatever information is contained in q that is not already contained in p.

The notion of information ‘contained in’ a proposition may be explicated via the notion of logical entailment. Propositions p and q contain the same information if and only if they have exactly the same entailments. If p entails q, then anything entailed by q is also entailed by p. X is equivalent to the conjunction of all propositions entailed by q that are not entailed by p.
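
For a concrete toy illustration (my own, not part of the original argument): with the die again, let p = “the roll is even” and q = “the roll is a six”. A natural candidate for X is “the roll is divisible by 3”, since the only even multiple of 3 on a die is 6. The sketch below checks the entailments by brute force and computes P(X):

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def entails(a, b):
    """a entails b iff b holds at every outcome where a holds."""
    return all(b(o) for o in outcomes if a(o))

p = lambda o: o % 2 == 0  # p: the roll is even
q = lambda o: o == 6      # q: the roll is a six
x = lambda o: o % 3 == 0  # candidate X: the roll is divisible by 3

assert not entails(p, q)                    # p alone does not entail q
assert entails(lambda o: p(o) and x(o), q)  # but p & X does
print(prob(x))  # 1/3, so RH(c) would put P(If p then q) at 1/3 here
```

Note that in this toy model the candidate X is picked by hand; the formal definition above (the conjunction of everything entailed by q but not by p) is meant to fix X uniquely.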

In some cases, X will be logically equivalent to ‘If p then q’, but it will never be logically stronger than that, and often it will be logically weaker. I’ll leave it as an exercise for the reader to explain why.

7 thoughts on “Conditional Probabilities and the Probabilities of Conditionals”

  1. Richard

    I would’ve thought that if p and q are mutually incompatible then P(If p then q) should equal P(~p), rather than zero. (After all, p&q is just one way for the conditional to come true. ~p is another.) Unless those aren’t material conditionals you’re talking about? But then what are their truth conditions?

  2. Alan Rhoda

    Hi Richard,

    You’re right in suspecting that I’m not thinking of material conditionals. I would adopt something along the lines of a relevance logic approach since I think most ordinary language conditionals presuppose that the premises are relevant to the conclusion in virtue of their content and not just in virtue of their truth values.

    Every analysis of conditionals, however, agrees that a true antecedent and a false consequent suffice to render the conditional unequivocally false. That’s part of the reason why I think P(If p then q), where p and q are incompatible, should equal zero. After all, if they are incompatible, then p entails ~q, not q. So P(If p then ~q) equals one. Given that and conditional excluded middle (which I accept), P(If p then q) has to equal zero, unless somehow P(If p then q) and P(If p then ~q) could sum to more than one.

  3. oudeisoudamou

    Dear Alan,

    Interesting hypothesis. Could I get you to illustrate how RH (c) will work in some concrete examples?

    First of all, do you accept the equivalence: prob[if a, then b] = [if a, then prob b]?

    Suppose, for example, I make the prediction “very probably, if in Phoenix the overnight min temp > 83 F, then the next day’s high > 110 F.” This on the basis of my long-time personal experience with summer weather here, but without consulting any official weather data. And my prediction turns out to be correct except for a very few cases, after we check the weather records for the last 50 years.

    Or suppose I’m doing stylometrics and I argue “very probably, if Plato wrote the VII Epistle, then he also wrote the II Epistle.”

    I’m curious to see how you’d construct X in these cases and assign the probability of X.

  4. Mike Theemling

    Have to mull this over further (at last, a subject that I have SOME know-how about), but you are assuming that we are dealing with classical prob and stats here (i.e., the sum of all prob = 1).

    In Bayesian statistics, it is perfectly permissible to have the sum of all prob be something else: less than one, greater than one (sometimes infinite), maybe even negative (not sure about this one).

    What’s strange, however, is that in a way Bayesian statistics would probably be a better fit for philosophy than classical stats. I wonder if Alan has considered investigating that.

  5. Alan Rhoda

    Mike,

    Thanks for your comment. I know you know a lot more about mathematical statistics than I do, but it surprises me that you’d say that Bayesian statistics allows the sum of all probabilities to differ from one. I’ve read philosophical discussions of Bayesianism and have been led to believe that Bayesians accept the standard (Kolmogorov) axioms of probability theory. The second axiom requires that the probability of the whole sample space be one, which is what forces the probabilities of an exhaustive set of exclusive alternatives to sum to one. So if (some) Bayesians reject that, might we not conclude that they are no longer dealing with the concept of ‘probability’ but perhaps some analogically related concept?
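
    For reference, the Kolmogorov axioms as standardly stated (for events A in a sample space Ω) are roughly:

    ```latex
    \begin{align*}
      &\text{1. Non-negativity:} && P(A) \ge 0 \\
      &\text{2. Unit measure:}   && P(\Omega) = 1 \\
      &\text{3. Countable additivity:} && P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i)
        \quad \text{for pairwise disjoint } A_i
    \end{align*}
    ```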

  6. Alan Rhoda

    Phil,

    Good questions. I’m going to try to answer them in a follow-up blog post that I hope to get to tomorrow (6/20).

  7. Mike Theemling

    Alan,

    In Bayesian statistics, your prior distribution may very well be a non-standard pdf.

    Common ones are f(x) = k, where k is some constant, with the domain being the entire real line. That means, of course, that the area under the curve is infinite, which is “not allowed” in classical statistics.

    Still, despite this, Bayesians claim that using this approach “works”, and even most classical statisticians don’t entirely dismiss it.
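
    A standard textbook illustration of why it “works”: pair the flat improper prior with a normal likelihood and the posterior comes out proper, because the infinite prior mass cancels in the normalizing constant. A minimal sketch (all numbers purely illustrative):

    ```python
    import numpy as np

    # Flat improper prior f(mu) = k over the whole real line, normal likelihood
    # with known sigma: the posterior for mu is the proper distribution
    # Normal(mean(data), sigma^2 / n).
    rng = np.random.default_rng(0)
    sigma = 2.0
    data = rng.normal(loc=5.0, scale=sigma, size=50)  # simulated observations

    post_mean = data.mean()
    post_sd = sigma / np.sqrt(len(data))
    print(f"posterior for mu: Normal({post_mean:.2f}, {post_sd:.2f}^2)")
    ```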

    As for how this relates to the philosophical discussion: essentially, the crux of Bayesian statistics is to use “past information” to determine the probability of a future event. In effect, you are saying “Because I know p, this would imply q”, or something along those lines.
