Definition and intuition

Definition

Conditional Probability If $A$ and $B$ are events with $P(B) > 0$, then the conditional probability of $A$ given $B$, denoted by $P(A|B)$, is defined as:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

Note

It is important to interpret the event $B$ appearing after the vertical conditioning bar as the evidence that we have observed or that is being conditioned on: $P(A|B)$ is the probability of $A$ given the evidence $B$, not the probability of some entity called $A|B$; there is no such event as $A|B$.

Example

For any event $A$ with $P(A) > 0$, $P(A|A) = \frac{P(A \cap A)}{P(A)} = 1$. Upon observing that $A$ has occurred, our updated probability for $A$ is $1$.

Note

When we calculate conditional probabilities, we are considering what information observing one event provides about another event, not whether one event causes another.
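As a quick sanity check of the definition (the two-dice setup here is my own illustration, not from the text), $P(A|B) = P(A \cap B)/P(B)$ can be computed by counting equally likely outcomes:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

A = {(d1, d2) for (d1, d2) in outcomes if d1 + d2 == 7}  # sum is 7
B = {(d1, d2) for (d1, d2) in outcomes if d1 == 4}       # first die shows 4

# P(A|B) = P(A ∩ B) / P(B), computed by counting outcomes.
p_A_given_B = Fraction(len(A & B), len(B))
print(p_A_given_B)  # 1/6
```

Only the outcome $(4, 3)$ lies in both events, and $B$ contains six outcomes, so conditioning on $B$ updates the probability of a sum of 7 from $6/36$ to $1/6$ (here they happen to coincide).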

Bayes’ rule and the law of total probability

Theorem

Probability of the intersection of two events For any events $A$ and $B$ with positive probabilities:

$$P(A \cap B) = P(B)P(A|B) = P(A)P(B|A).$$

Applying this theorem repeatedly, we can generalize to the intersection of $n$ events:

Theorem

Probability of the intersection of $n$ events

For any events $A_1, \dots, A_n$ with $P(A_1, A_2, \dots, A_{n-1}) > 0$:

$$P(A_1, \dots, A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_1, A_2) \cdots P(A_n | A_1, \dots, A_{n-1}).$$

The commas denote intersections, e.g., $P(A_3 | A_1, A_2)$ is $P(A_3 | A_1 \cap A_2)$. In fact, this is $n!$ theorems in one, since we can permute $A_1, \dots, A_n$ however we want without affecting the left-hand side.

Theorem

Bayes’ rule

For any events $A$ and $B$ with positive probabilities:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}.$$

Definition

Odds

The odds of an event $A$ are:

$$\text{odds}(A) = \frac{P(A)}{P(A^c)}.$$

We can also convert from odds back to probability:

$$P(A) = \frac{\text{odds}(A)}{1 + \text{odds}(A)}.$$

Theorem

Odds form of Bayes’ rule

For any events $A$ and $B$ with positive probabilities, the odds of $A$ after conditioning on $B$ are:

$$\frac{P(A|B)}{P(A^c|B)} = \frac{P(B|A)}{P(B|A^c)} \cdot \frac{P(A)}{P(A^c)}.$$

In words, this says that the posterior odds $\frac{P(A|B)}{P(A^c|B)}$ are equal to the prior odds $\frac{P(A)}{P(A^c)}$ times the factor $\frac{P(B|A)}{P(B|A^c)}$, which is known in statistics as the likelihood ratio.
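The odds form is convenient for numeric updating. Here is a short Python sketch; the prior and the two conditional probabilities of the evidence are invented numbers for illustration:

```python
p_A = 0.01          # prior P(A): assumed for illustration
p_B_given_A = 0.95  # P(B|A): probability of the evidence if A holds (assumed)
p_B_given_Ac = 0.05 # P(B|A^c): probability of the evidence if A fails (assumed)

prior_odds = p_A / (1 - p_A)                   # P(A)/P(A^c)
likelihood_ratio = p_B_given_A / p_B_given_Ac  # P(B|A)/P(B|A^c)
posterior_odds = likelihood_ratio * prior_odds # odds form of Bayes' rule

# Convert odds back to a probability: p = odds / (1 + odds).
posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_prob, 4))
```

Even though the likelihood ratio is $19$, the small prior odds of $1/99$ keep the posterior probability modest, about $0.161$.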

Theorem

Law of total probability

Let $A_1, \dots, A_n$ be a partition of the sample space $S$ (i.e., the $A_i$ are disjoint events and their union is $S$), with $P(A_i) > 0$ for all $i$. Then for any event $B$:

$$P(B) = \sum_{i=1}^{n} P(B|A_i) P(A_i).$$
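LOTP is often what supplies the denominator in Bayes’ rule. A minimal sketch, with a hypothetical diagnostic test whose characteristics (prevalence, sensitivity, false-positive rate) are invented numbers:

```python
p_D = 0.01            # P(D): prior probability of the condition (assumed)
p_pos_given_D = 0.95  # P(+|D): sensitivity (assumed)
p_pos_given_Dc = 0.02 # P(+|D^c): false-positive rate (assumed)

# LOTP over the partition {D, D^c}:
#   P(+) = P(+|D)P(D) + P(+|D^c)P(D^c).
p_pos = p_pos_given_D * p_D + p_pos_given_Dc * (1 - p_D)

# Bayes' rule: P(D|+) = P(+|D)P(D) / P(+).
p_D_given_pos = p_pos_given_D * p_D / p_pos
print(round(p_D_given_pos, 3))
```

With these numbers a positive result raises the probability of the condition from $0.01$ to roughly $0.32$, since most positives still come from the large $D^c$ slice of the partition.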

Conditional probabilities are probabilities

Property

When we condition on an event $E$, we update our beliefs to be consistent with this knowledge, effectively putting ourselves in a universe where we know that $E$ occurred. Within our new universe, however, the laws of probability operate just as before.

  • Conditional probabilities are between $0$ and $1$.
  • If $A_1, A_2, \dots$ are disjoint, then $P(A_1 \cup A_2 \cup \cdots | E) = P(A_1|E) + P(A_2|E) + \cdots$.
  • Inclusion-exclusion: $P(A \cup B | E) = P(A|E) + P(B|E) - P(A \cap B | E)$.

Note

When we write $P(A|E)$, it does not mean that $A|E$ is an event and we’re taking its probability; $A|E$ is not an event. Rather, $P(\cdot|E)$ is a probability function which assigns probabilities in accordance with the knowledge that $E$ has occurred, and $P(\cdot)$ is a different probability function which assigns probabilities without regard for whether $E$ has occurred or not. When we take an event $A$ and plug it into the $P(\cdot)$ function, we’ll get a number, $P(A)$; when we plug it into the $P(\cdot|E)$ function, we’ll get another number, $P(A|E)$, which incorporates the information (if any) provided by knowing that $E$ occurred.

Conditional probabilities are probabilities, and all probabilities are conditional.

Theorem

Bayes’ rule with extra conditioning

Provided that $P(A \cap E) > 0$ and $P(B \cap E) > 0$, we have:

$$P(A|B, E) = \frac{P(B|A, E) P(A|E)}{P(B|E)}.$$

Theorem

LOTP with extra conditioning

Let $A_1, \dots, A_n$ be a partition of $S$. Provided that $P(A_i \cap E) > 0$ for all $i$, we have:

$$P(B|E) = \sum_{i=1}^{n} P(B|A_i, E) P(A_i|E).$$

Strategy

We often want to condition on more than one piece of information, and we now have several ways of doing that. For example, here are some approaches for finding $P(A|B, C)$:

  1. We can think of $B$ and $C$ as the single event $B \cap C$ and use the definition of conditional probability to get:

$$P(A|B, C) = \frac{P(A, B, C)}{P(B, C)}.$$

This is a natural approach if it’s easiest to think about $B$ and $C$ in tandem. We can then try to evaluate the numerator and denominator. For example, we can use LOTP in both the numerator and the denominator, or we can write the numerator as $P(B, C|A)P(A)$ (which would give us a version of Bayes’ rule) and use LOTP to help with the denominator.

  2. We can use Bayes’ rule with extra conditioning on $C$ to get:

$$P(A|B, C) = \frac{P(B|A, C) P(A|C)}{P(B|C)}.$$

This is a natural approach if we want to think of everything in our problem as being conditioned on $C$.
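Both approaches must give the same answer. A small check on an example of my own (three events defined on two fair dice), using exact fractions:

```python
from itertools import product
from fractions import Fraction

outcomes = set(product(range(1, 7), repeat=2))  # two fair dice

def P(event):
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == o[1]}    # doubles
B = {o for o in outcomes if sum(o) >= 8}     # sum at least 8
C = {o for o in outcomes if o[0] % 2 == 1}   # first die odd

# Approach 1: treat B and C as the single event B ∩ C.
p1 = P(A & B & C) / P(B & C)

# Approach 2: Bayes' rule with extra conditioning on C:
#   P(A|B,C) = P(B|A,C) P(A|C) / P(B|C).
p_B_given_AC = P(A & B & C) / P(A & C)
p_A_given_C = P(A & C) / P(C)
p_B_given_C = P(B & C) / P(C)
p2 = p_B_given_AC * p_A_given_C / p_B_given_C

print(p1, p1 == p2)
```

Both routes reduce algebraically to $P(A \cap B \cap C)/P(B \cap C)$, so they agree exactly; here the common value is $1/6$.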

Independence of events

Definition

Independence of two events

Events $A$ and $B$ are independent if:

$$P(A \cap B) = P(A)P(B).$$

If $P(A) > 0$ and $P(B) > 0$, then this is equivalent to

$$P(A|B) = P(A),$$

and also equivalent to $P(B|A) = P(B)$.

Note

Note that independence is a symmetric relation: if $A$ is independent of $B$, then $B$ is independent of $A$.

Note

Independence is completely different from disjointness. If $A$ and $B$ are disjoint, then $P(A \cap B) = 0$, so disjoint events can be independent only if $P(A) = 0$ or $P(B) = 0$. Knowing that $A$ occurs tells us that $B$ definitely did not occur, so $A$ clearly conveys information about $B$, meaning the two events are not independent (except if $A$ or $B$ already has zero probability).
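The contrast can be made concrete with two fair coin tosses (my own example): the two tosses are independent but not disjoint, while "first toss heads" and "first toss tails" are disjoint but not independent.

```python
from itertools import product
from fractions import Fraction

outcomes = set(product("HT", repeat=2))  # two fair coin tosses

def P(event):
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == "H"}  # first toss heads
B = {o for o in outcomes if o[1] == "H"}  # second toss heads
D = {o for o in outcomes if o[0] == "T"}  # first toss tails

# A and B are independent: P(A ∩ B) = P(A)P(B).
print(P(A & B) == P(A) * P(B))  # True
# A and D are disjoint but NOT independent: P(A ∩ D) = 0 ≠ P(A)P(D).
print(P(A & D), P(A) * P(D))    # 0 and 1/4
```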

Property

If $A$ and $B$ are independent, then $A$ and $B^c$ are independent, $A^c$ and $B$ are independent, and $A^c$ and $B^c$ are independent.

Definition

Independence of three events

Events $A$, $B$, and $C$ are said to be independent if all of the following equations hold:

$$P(A \cap B) = P(A)P(B),$$
$$P(A \cap C) = P(A)P(C),$$
$$P(B \cap C) = P(B)P(C),$$
$$P(A \cap B \cap C) = P(A)P(B)P(C).$$

Note

If the first three conditions hold, we say that $A$, $B$, and $C$ are pairwise independent. Pairwise independence does not imply independence.
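A standard counterexample (not spelled out in the text above) uses two fair coin tosses: "first toss heads", "second toss heads", and "both tosses land the same" are pairwise independent, yet the triple product rule fails.

```python
from itertools import product
from fractions import Fraction

outcomes = set(product("HT", repeat=2))  # two fair coin tosses

def P(event):
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == "H"}   # first toss heads
B = {o for o in outcomes if o[1] == "H"}   # second toss heads
C = {o for o in outcomes if o[0] == o[1]}  # both tosses land the same

# All three pairs satisfy the product rule (pairwise independence) ...
print(P(A & B) == P(A) * P(B),
      P(A & C) == P(A) * P(C),
      P(B & C) == P(B) * P(C))
# ... but the triple condition fails: A and B together force C.
print(P(A & B & C), P(A) * P(B) * P(C))  # 1/4 versus 1/8
```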

Hint

We can define independence of any number of events similarly. Intuitively, the idea is that knowing what happened with any particular subset of the events gives us no information about what happened with the events not in that subset.

Definition

Independence of many events

For events $A_1, \dots, A_n$ to be independent, we require any pair to satisfy $P(A_i \cap A_j) = P(A_i)P(A_j)$ (for $i \neq j$), any triplet to satisfy $P(A_i \cap A_j \cap A_k) = P(A_i)P(A_j)P(A_k)$ (for $i$, $j$, $k$ distinct), and similarly for all quadruplets, quintuplets, and so on. For infinitely many events, we say that they are independent if every finite subset of the events is independent.

Definition

Conditional independence

Events $A$ and $B$ are said to be conditionally independent given $E$ if:

$$P(A \cap B | E) = P(A|E)P(B|E).$$

Note

It is easy to make terrible blunders stemming from confusing independence and conditional independence. Two events can be conditionally independent given $E$, but not independent given $E^c$. Two events can be conditionally independent given $E$, but not independent. Two events can be independent, but not conditionally independent given $E$. In particular, $P(A \cap B) = P(A)P(B)$ does not imply $P(A \cap B | E) = P(A|E)P(B|E)$; we can’t just insert “given $E$” everywhere, as we did in going from LOTP to LOTP with extra conditioning. This is because LOTP always holds (it is a consequence of the axioms of probability), whereas $P(A \cap B | E)$ may or may not equal $P(A|E)P(B|E)$, depending on what $A$, $B$, and $E$ are.

Coherency of Bayes’ rule

Property

An important property of Bayes’ rule is that it is coherent: if we receive multiple pieces of information and wish to update our probabilities to incorporate all the information, it does not matter whether we update sequentially, taking each piece of evidence into account one at a time, or simultaneously, using all the evidence at once.
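Coherency is easy to see in the odds form. A sketch with invented numbers, additionally assuming the two pieces of evidence are conditionally independent given $A$ and given $A^c$ (so that each positive result contributes the same likelihood ratio):

```python
p_A = 0.01                                   # prior P(A), assumed
p_pos_given_A, p_pos_given_Ac = 0.95, 0.02   # assumed evidence probabilities;
# the two results are assumed conditionally independent given A and given A^c

prior_odds = p_A / (1 - p_A)
LR = p_pos_given_A / p_pos_given_Ac  # likelihood ratio of one positive result

# Sequential updating: fold in one positive result at a time.
odds_seq = LR * (LR * prior_odds)

# Simultaneous updating: fold in both positive results at once.
LR_both = (p_pos_given_A ** 2) / (p_pos_given_Ac ** 2)
odds_sim = LR_both * prior_odds

print(abs(odds_seq - odds_sim) < 1e-9)  # same posterior either way
```

Updating twice with likelihood ratio $LR$ is the same as updating once with $LR^2$; the order in which evidence arrives does not matter.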

Conditioning as a problem-solving tool

Strategy

Condition on what you wish you knew

When we encounter a problem that would be made easier if only we knew whether $E$ happened or not, we can condition on $E$ and then on $E^c$, consider these possibilities separately, and then combine them using LOTP.

Strategy

Condition on the first step

In problems with a recursive structure, it can often be useful to condition on the first step of the experiment; we call this approach first-step analysis.
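A classic setting for first-step analysis is the gambler’s ruin (the specific numbers below are my own illustration): with fair \$1 bets, let $p_i$ be the probability of reaching a total of $N$ dollars before going broke, starting from $i$. Conditioning on the first bet gives $p_i = \frac{1}{2} p_{i-1} + \frac{1}{2} p_{i+1}$ with $p_0 = 0$ and $p_N = 1$, whose solution is $p_i = i/N$:

```python
from fractions import Fraction

N = 10  # target stake; the player starts with i dollars and makes fair $1 bets

# Candidate solution of the first-step recursion
#   p_i = (1/2) p_{i-1} + (1/2) p_{i+1},  p_0 = 0,  p_N = 1,
# namely p_i = i/N.
p = [Fraction(i, N) for i in range(N + 1)]

# Verify that the candidate satisfies the recursion at every interior state.
ok = all(p[i] == Fraction(1, 2) * p[i - 1] + Fraction(1, 2) * p[i + 1]
         for i in range(1, N))
print(ok, p[3])  # recursion holds; starting with 3 of 10 dollars gives 3/10
```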

Pitfalls and paradoxes

Warning

  • Confusing the prior probability $P(A)$ with the posterior probability $P(A|B)$.
  • Prosecutor’s fallacy: confusing $P(A|B)$ with $P(B|A)$, e.g., the probability of innocence given the evidence with the probability of the evidence given innocence.
  • The defense attorney’s fallacy: failing to condition on all the evidence.
  • Simpson’s paradox: a comparison that holds within every subgroup of the data can reverse when the subgroups are aggregated, which shows the importance of thinking carefully about whether to aggregate data.
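Simpson’s paradox is easiest to believe with numbers in hand. The counts below are a standard style of illustration (two doctors, two types of surgery; the figures are invented):

```python
from fractions import Fraction

# (successes, attempts) for two doctors on two types of surgery.
data = {
    "Hibbert": {"heart": (70, 90), "bandaid": (10, 10)},
    "Nick":    {"heart": (2, 10),  "bandaid": (81, 90)},
}

def rate(successes, attempts):
    return Fraction(successes, attempts)

# Within EACH type of surgery, Hibbert has the higher success rate ...
for kind in ("heart", "bandaid"):
    h = rate(*data["Hibbert"][kind])
    n = rate(*data["Nick"][kind])
    print(kind, h > n)  # True for both kinds
# ... but aggregating over surgery types reverses the comparison:
h_all = rate(80, 100)  # Hibbert overall: (70+10)/(90+10)
n_all = rate(83, 100)  # Nick overall: (2+81)/(10+90)
print(h_all < n_all)   # True: Nick looks better in aggregate
```

The reversal happens because the surgery type is a confounder: Hibbert performs mostly the hard surgeries, Nick mostly the easy ones, so the aggregate rates mix very different conditional probabilities with very different weights.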