Truth Tables and Bayes
I want to share a motivation for defining conditional probabilities that involves only some elementary logic and no measure theory.
Binary Logic
Start by considering logical formulas and their truth values:
- Starting from two formulas \(A, B\), we can form more complex formulas such as \(A \rightarrow B\) and \(A \land B\).
- Denote the set of all possible formulas by \(\mathcal{F}.\) This set can be constructed from a set of atomic propositions \(\{ A_1, \dots, A_n \}\) by combining them - possibly more than once - as described in the previous point.
- Binary truth values can be modeled by a truth function \[t: \mathcal{F} \rightarrow \{ 0, 1 \},\] which assigns to each formula either the value \(0\) (false) or \(1\) (true).
- The truth function of a formula can be computed from the constituent parts of this formula, e.g. \(t(A \land B) = t(A) t(B)\) or \(t(A \rightarrow B) = t(B) + (t(A)-1)(t(B)-1)\).
- One particular identity is the following: \[t(A \rightarrow B) t(A) = t(A \land B). \tag{1}\]
- A proof of this identity is given by the following truth table, in which the last two columns agree row by row:

| \(t(A)\) | \(t(B)\) | \(t(A \rightarrow B)\) | \(t(A \rightarrow B)\,t(A)\) | \(t(A \land B)\) |
|:---:|:---:|:---:|:---:|:---:|
| 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 |

Proof of Equation 1 via a truth table
- Note that by proving Equation 1 we have also proved the equivalence \[A \land (A \rightarrow B) \iff A \land B,\] since two formulas are logically equivalent exactly when their truth functions agree on every assignment.
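The arithmetic identities above are easy to verify mechanically. As a minimal sketch (the function names `t_and` and `t_implies` are mine), here is a Python check that brute-forces all four rows of the truth table:

```python
from itertools import product

# Arithmetic truth functions on {0, 1}, following the formulas in the text.
def t_and(a: int, b: int) -> int:
    return a * b  # t(A ∧ B) = t(A) t(B)

def t_implies(a: int, b: int) -> int:
    return b + (a - 1) * (b - 1)  # t(A → B) = t(B) + (t(A)-1)(t(B)-1)

# Equation 1: t(A → B) t(A) = t(A ∧ B), checked on all four truth-table rows.
for a, b in product([0, 1], repeat=2):
    assert t_implies(a, b) * a == t_and(a, b)
print("Equation 1 holds for all binary assignments")
```

Running the loop over `product([0, 1], repeat=2)` enumerates exactly the four rows of the truth table, so the assertions constitute the same proof in executable form.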
Probability Logic
Let’s see what happens if we instead assign probability values to formulas, i.e. real numbers in the interval \([0, 1]\):
- Denote the generalization of the truth function \(t: \mathcal{F} \rightarrow \{0, 1\}\) by \(p: \mathcal{F} \rightarrow [0, 1]\), where \(\mathcal{F}\) denotes all valid formulas.
- We assign non-binary probability values to \(A \rightarrow B\) and \(A \land B\), which we denote by \(p(B \mid A)\) and \(p(A \cap B)\). Here we adhere to the convention that when dealing with probabilities, we write \(B \mid A\) for \(A \rightarrow B\) and \(A \cap B\) for \(A \land B\).
- The obvious extension of Equation 1 is just \[p(B \mid A) p(A) = p(A \cap B), \tag{2}\] which is one of several equivalent ways of writing Bayes’ rule.
- By rewriting Equation 2, we obtain the usual definition of conditional probabilities: \[p(B \mid A) = \frac{p(A \cap B)}{p(A)}. \tag{3}\]
- Note that Equation 2 is the only way to relate \(p(A), p(B \mid A), p(A \cap B)\) if we want this relation to be linear in each term and at the same time consistent with binary truth values.
- By being consistent with binary truth values, I just mean that if we restrict \(p(A), p(B \mid A), p(A \cap B)\) to binary values from \(\{ 0, 1 \}\), then we want to recover Equation 1.
- If we don’t ask for linearity, we can also raise each term to some positive power, e.g. \(p(A)^3 p(B \mid A) = p(A \cap B)^8\), and still remain consistent with Equation 1, since \(x^k = x\) for any \(x \in \{0, 1\}\) and positive power \(k\).
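The binary-consistency claim can likewise be checked by enumeration. A minimal Python sketch (variable names are mine) verifying that Equation 2 and the powered variant accept exactly the same binary triples:

```python
from itertools import product

# Restrict p(A), p(B|A), p(A ∩ B) to {0, 1}. Any positive power leaves a
# binary value unchanged (x**k == x), so the linear relation (Equation 2)
# and the powered variant hold for exactly the same binary triples.
for p_a, p_bga, p_ab in product([0, 1], repeat=3):
    linear = (p_a * p_bga == p_ab)            # Equation 2
    powered = (p_a ** 3 * p_bga == p_ab ** 8)  # non-linear variant from the text
    assert linear == powered
print("Equation 2 and the powered variant agree on all binary values")
```

Of course, on non-binary values in \((0, 1)\) the two relations differ, which is exactly why the linearity requirement is needed to single out Equation 2.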
Final Thoughts
We have seen that Bayes’ rule can be regarded as the linear extension of an identity from propositional logic.
Cox’s theorem goes to great lengths to derive the rules of conditional probability from assumptions that seem somewhat involved to me. I think the consideration proposed here can offer - at least in part - an alternative to Cox’s theorem for motivating conditional probabilities that is easy to understand.