Definition and intuition
Definition
Conditional probability
If $A$ and $B$ are events with $P(B) > 0$, then the conditional probability of $A$ given $B$, denoted by $P(A|B)$, is defined as:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
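As a quick numerical check of the definition, conditional probabilities can be computed by brute-force enumeration. The sketch below uses two fair dice, with illustrative events $A$ = "the sum is 8" and $B$ = "the first die shows 3":

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under the uniform measure."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] + w[1] == 8   # the sum of the dice is 8
B = lambda w: w[0] == 3          # the first die shows 3

# Definition: P(A|B) = P(A and B) / P(B), which requires P(B) > 0.
p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)
print(p_A_given_B)  # 1/6: given the first die is 3, only a second-die 5 works
```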
Note
It is important to interpret the event appearing after the vertical conditioning bar as the evidence that we have observed or that is being conditioned on: $P(A|B)$ is the probability of $A$ given the evidence $B$, not the probability of some entity called $A|B$; there is no such event as $A|B$.
Example
For any event $A$ with $P(A) > 0$, $P(A|A) = \frac{P(A \cap A)}{P(A)} = 1$. Upon observing that $A$ has occurred, our updated probability for $A$ is $1$.
Note
When we calculate conditional probabilities, we are considering what information observing one event provides about another event, not whether one event causes another.
Bayes’ rule and the law of total probability
Theorem
Probability of the intersection of two events
For any events $A$ and $B$ with positive probabilities:
$$P(A \cap B) = P(B)P(A|B) = P(A)P(B|A).$$
Applying this theorem repeatedly, we can generalize to the intersection of $n$ events:
Theorem
Probability of the intersection of $n$ events
For any events $A_1, \dots, A_n$ with $P(A_1, A_2, \dots, A_{n-1}) > 0$:
$$P(A_1, A_2, \dots, A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1, A_2) \cdots P(A_n|A_1, \dots, A_{n-1}).$$
The commas denote intersections, e.g., $P(A_1, A_2)$ is $P(A_1 \cap A_2)$. In fact, this is $n!$ theorems in one, since we can permute $A_1, \dots, A_n$ however we want without affecting the left-hand side.
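The chain rule can be checked by enumeration on a toy example. Below, a hypothetical 10-card deck with 4 hearts is used: the probability that the first three cards drawn without replacement are all hearts is compared against the chain-rule product $P(H_1)P(H_2|H_1)P(H_3|H_1, H_2) = \frac{4}{10} \cdot \frac{3}{9} \cdot \frac{2}{8}$:

```python
from fractions import Fraction
from itertools import permutations

# Toy deck for illustration: 4 hearts ('H') and 6 other cards ('X');
# draw 3 cards without replacement, with all ordered draws equally likely.
deck = ['H'] * 4 + ['X'] * 6
omega = list(permutations(range(len(deck)), 3))

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

H1 = lambda w: deck[w[0]] == 'H'   # first draw is a heart
H2 = lambda w: deck[w[1]] == 'H'   # second draw is a heart
H3 = lambda w: deck[w[2]] == 'H'   # third draw is a heart

# Direct probability of the triple intersection, by counting.
direct = prob(lambda w: H1(w) and H2(w) and H3(w))

# Chain rule: P(H1) P(H2|H1) P(H3|H1,H2) = (4/10)(3/9)(2/8).
chain = Fraction(4, 10) * Fraction(3, 9) * Fraction(2, 8)
print(direct, chain)  # both 1/30
```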
Theorem
Bayes’ rule
For any events $A$ and $B$ with positive probabilities:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}.$$
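A standard way to exercise Bayes' rule is a diagnostic-testing calculation. The numbers below (1% prevalence, 95% sensitivity, 5% false positive rate) are made up for illustration:

```python
from fractions import Fraction

# Hypothetical numbers for a diagnostic test (illustrative, not from any real test):
p_D = Fraction(1, 100)         # prior P(D): disease prevalence
p_pos_D = Fraction(95, 100)    # P(+|D): sensitivity
p_pos_notD = Fraction(5, 100)  # P(+|D^c): false positive rate

# Denominator P(+) via LOTP, then Bayes' rule: P(D|+) = P(+|D)P(D)/P(+).
p_pos = p_pos_D * p_D + p_pos_notD * (1 - p_D)
p_D_given_pos = p_pos_D * p_D / p_pos
print(p_D_given_pos)  # 19/118, about 0.16: still unlikely despite a positive test
```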
Definition
Odds
The odds of an event $A$ are:
$$\text{odds}(A) = \frac{P(A)}{P(A^c)}.$$
We can also convert from odds back to probability:
$$P(A) = \frac{\text{odds}(A)}{1 + \text{odds}(A)}.$$
Theorem
Odds form of Bayes’ rule
For any events $A$ and $B$ with positive probabilities, the odds of $A$ after conditioning on $B$ are:
$$\frac{P(A|B)}{P(A^c|B)} = \frac{P(B|A)}{P(B|A^c)} \cdot \frac{P(A)}{P(A^c)}.$$
In words, this says that the posterior odds $P(A|B)/P(A^c|B)$ are equal to the prior odds $P(A)/P(A^c)$ times the factor $P(B|A)/P(B|A^c)$, which is known in statistics as the likelihood ratio.
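The odds form makes the update a single multiplication. Using illustrative diagnostic-test numbers (prior probability 1/100, sensitivity 95%, false positive rate 5%, so likelihood ratio 19), the sketch below multiplies prior odds by the likelihood ratio and converts back to a probability:

```python
from fractions import Fraction

# Illustrative numbers: prior P(D) = 1/100, P(+|D) = 95/100, P(+|D^c) = 5/100.
p_D, sens, fpr = Fraction(1, 100), Fraction(95, 100), Fraction(5, 100)

prior_odds = p_D / (1 - p_D)       # odds(D) = P(D)/P(D^c) = 1/99
likelihood_ratio = sens / fpr      # P(+|D)/P(+|D^c) = 19

# Odds form of Bayes' rule: posterior odds = likelihood ratio * prior odds.
posterior_odds = likelihood_ratio * prior_odds

# Convert odds back to a probability: p = odds/(1 + odds).
p_D_given_pos = posterior_odds / (1 + posterior_odds)
print(posterior_odds, p_D_given_pos)  # 19/99 and 19/118
```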
Theorem
Law of total probability
Let $A_1, \dots, A_n$ be a partition of the sample space $S$ (i.e., the $A_i$ are disjoint events and their union is $S$), with $P(A_i) > 0$ for all $i$. Then:
$$P(B) = \sum_{i=1}^{n} P(B|A_i)P(A_i).$$
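A minimal sketch of LOTP, using a hypothetical two-urn experiment as the partition: choose an urn by a fair coin flip, then draw one ball.

```python
from fractions import Fraction

# Hypothetical setup: choose an urn by a fair coin flip, then draw a ball.
partition = {            # P(A_i): which urn was chosen (a partition of S)
    "urn1": Fraction(1, 2),
    "urn2": Fraction(1, 2),
}
p_red_given = {          # P(B|A_i): chance of drawing red from each urn
    "urn1": Fraction(3, 5),  # urn 1 holds 3 red and 2 green balls
    "urn2": Fraction(1, 5),  # urn 2 holds 1 red and 4 green balls
}

# LOTP: P(B) = sum_i P(B|A_i) P(A_i).
p_red = sum(p_red_given[a] * partition[a] for a in partition)
print(p_red)  # 2/5
```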
Conditional probabilities are probabilities
Property
When we condition on an event $E$, we update our beliefs to be consistent with this knowledge, effectively putting ourselves in a universe where we know that $E$ occurred. Within our new universe, however, the laws of probability operate just as before.
- Conditional probabilities are between $0$ and $1$.
- If $A_1, A_2, \dots$ are disjoint, then $P\left(\bigcup_{j=1}^{\infty} A_j \,\middle|\, E\right) = \sum_{j=1}^{\infty} P(A_j|E)$.
- Inclusion-exclusion: $P(A \cup B|E) = P(A|E) + P(B|E) - P(A \cap B|E)$.
Note
When we write $P(A|E)$, it does not mean that $A|E$ is an event and we’re taking its probability; $A|E$ is not an event. Rather, $P(\cdot|E)$ is a probability function which assigns probabilities in accordance with the knowledge that $E$ has occurred, and $P(\cdot)$ is a different probability function which assigns probabilities without regard for whether $E$ has occurred or not. When we take an event $A$ and plug it into the $P(\cdot)$ function, we’ll get a number, $P(A)$; when we plug it into the $P(\cdot|E)$ function, we’ll get another number, $P(A|E)$, which incorporates the information (if any) provided by knowing that $E$ occurred.
Conditional probabilities are probabilities, and all probabilities are conditional.
Theorem
Bayes’ rule with extra conditioning
Provided that $P(A \cap E) > 0$ and $P(B \cap E) > 0$, we have:
$$P(A|B, E) = \frac{P(B|A, E)P(A|E)}{P(B|E)}.$$
Theorem
LOTP with extra conditioning
Let $A_1, \dots, A_n$ be a partition of $S$. Provided that $P(A_i \cap E) > 0$ for all $i$, we have:
$$P(B|E) = \sum_{i=1}^{n} P(B|A_i, E)P(A_i|E).$$
Strategy
We often want to condition on more than one piece of information, and we now have several ways of doing that. For example, here are some approaches for finding $P(A|B, C)$:
1. We can think of $B$ and $C$ as the single event $B \cap C$ and use the definition of conditional probability to get:
$$P(A|B, C) = \frac{P(A \cap B \cap C)}{P(B \cap C)}.$$
This is a natural approach if it’s easiest to think about $B$ and $C$ in tandem. We can then try to evaluate the numerator and denominator. For example, we can use LOTP in both the numerator and the denominator, or we can write the numerator as $P(B, C|A)P(A)$ (which would give us a version of Bayes’ rule) and use LOTP to help with the denominator.
2. We can use Bayes’ rule with extra conditioning on $C$ to get:
$$P(A|B, C) = \frac{P(B|A, C)P(A|C)}{P(B|C)}.$$
This is a natural approach if we want to think of everything in our problem as being conditioned on $C$.
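Both approaches must give the same answer, which can be confirmed by enumeration. The events on two fair dice below are arbitrary choices for illustration:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice

def prob(ev):
    return Fraction(sum(1 for w in omega if ev(w)), len(omega))

def both(*evs):
    return lambda w: all(e(w) for e in evs)

A = lambda w: w[0] == w[1]        # doubles
B = lambda w: w[0] + w[1] >= 7    # sum is at least 7
C = lambda w: w[0] % 2 == 0       # first die is even

# Approach 1: treat (B, C) as the single event B ∩ C.
p1 = prob(both(A, B, C)) / prob(both(B, C))

# Approach 2: Bayes' rule with extra conditioning on C:
# P(A|B,C) = P(B|A,C) P(A|C) / P(B|C).
p_B_given_AC = prob(both(A, B, C)) / prob(both(A, C))
p_A_given_C = prob(both(A, C)) / prob(C)
p_B_given_C = prob(both(B, C)) / prob(C)
p2 = p_B_given_AC * p_A_given_C / p_B_given_C

print(p1, p2)  # both 1/6
```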
Independence of events
Definition
Independence of two events
Events $A$ and $B$ are independent if:
$$P(A \cap B) = P(A)P(B).$$
If $P(A) > 0$ and $P(B) > 0$, then this is equivalent to
$$P(A|B) = P(A),$$
and also equivalent to $P(B|A) = P(B)$.
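A quick enumeration check of the definition, using a classic example on two fair dice ("first die shows 6" and "the sum is 7" turn out to be independent):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def prob(ev):
    return Fraction(sum(1 for w in omega if ev(w)), len(omega))

A = lambda w: w[0] == 6           # first die shows 6
B = lambda w: w[0] + w[1] == 7    # the sum is 7

# Check the defining equation P(A ∩ B) = P(A)P(B).
p_AB = prob(lambda w: A(w) and B(w))
print(p_AB == prob(A) * prob(B))  # True: A and B are independent
```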
Note
Note that independence is a symmetric relation: if $A$ is independent of $B$, then $B$ is independent of $A$.
Note
Independence is completely different from disjointness. If $A$ and $B$ are disjoint, then $P(A \cap B) = 0$, so disjoint events can be independent only if $P(A) = 0$ or $P(B) = 0$. Knowing that $A$ occurs tells us that $B$ definitely did not occur, so $A$ clearly conveys information about $B$, meaning the two events are not independent (except if $A$ or $B$ already has zero probability).
Property
If $A$ and $B$ are independent, then $A$ and $B^c$ are independent, $A^c$ and $B$ are independent, and $A^c$ and $B^c$ are independent.
Definition
Independence of three events
Events $A$, $B$, and $C$ are said to be independent if all of the following equations hold:
$$P(A \cap B) = P(A)P(B),$$
$$P(A \cap C) = P(A)P(C),$$
$$P(B \cap C) = P(B)P(C),$$
$$P(A \cap B \cap C) = P(A)P(B)P(C).$$
Note
If the first three conditions hold, we say that $A$, $B$, and $C$ are pairwise independent. Pairwise independence does not imply independence.
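The standard counterexample uses two fair coin flips: let $A$ = "first flip is heads", $B$ = "second flip is heads", and $C$ = "the two flips match". Each pair is independent, but the four-equation condition fails:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))  # two fair coin flips

def prob(ev):
    return Fraction(sum(1 for w in omega if ev(w)), len(omega))

A = lambda w: w[0] == "H"     # first flip is heads
B = lambda w: w[1] == "H"     # second flip is heads
C = lambda w: w[0] == w[1]    # the two flips match

# Pairwise independence: each pair multiplies.
pairs_ok = all(
    prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)
    for X, Y in [(A, B), (A, C), (B, C)]
)

# But the triple intersection does not: knowing A and B forces C.
triple = prob(lambda w: A(w) and B(w) and C(w))
print(pairs_ok, triple, prob(A) * prob(B) * prob(C))  # True, 1/4, 1/8
```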
Hint
We can define independence of any number of events similarly. Intuitively, the idea is that knowing what happened with any particular subset of the events gives us no information about what happened with the events not in that subset.
Definition
Independence of many events
For events $A_1, \dots, A_n$ to be independent, we require any pair to satisfy $P(A_i \cap A_j) = P(A_i)P(A_j)$ (for $i \neq j$), any triplet to satisfy $P(A_i \cap A_j \cap A_k) = P(A_i)P(A_j)P(A_k)$ (for $i, j, k$ distinct), and similarly for all quadruplets, quintuplets, and so on. For infinitely many events, we say that they are independent if every finite subset of the events is independent.
Definition
Conditional independence
Events $A$ and $B$ are said to be conditionally independent given $E$ if $P(A \cap B|E) = P(A|E)P(B|E)$.
Note
It is easy to make terrible blunders stemming from confusing independence and conditional independence. Two events can be conditionally independent given $E$, but not independent given $E^c$. Two events can be conditionally independent given $E$, but not independent. Two events can be independent, but not conditionally independent given $E$. In particular, $P(A \cap B) = P(A)P(B)$ does not imply $P(A \cap B|E) = P(A|E)P(B|E)$; we can’t just insert “given $E$” everywhere, as we did in going from LOTP to LOTP with extra conditioning. This is because LOTP always holds (it is a consequence of the axioms of probability), whereas $P(A \cap B|E)$ may or may not equal $P(A|E)P(B|E)$, depending on what $A$, $B$, and $E$ are.
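A classic example of events that are conditionally independent but not independent: pick a coin (fair or two-headed, a hypothetical 50/50 choice) and flip it twice. Given the coin, the flips are independent; unconditionally, they are not, because the first flip carries information about which coin was chosen:

```python
from fractions import Fraction

# Hypothetical setup: pick a coin (fair or two-headed, each with prob 1/2),
# then flip it twice. Given the coin, the flips are independent by construction.
coins = {"fair": Fraction(1, 2), "two_headed": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "two_headed": Fraction(1, 1)}

# Unconditionally, by LOTP:
p_A = sum(p_heads[c] * coins[c] for c in coins)        # P(first flip heads)
p_AB = sum(p_heads[c] ** 2 * coins[c] for c in coins)  # P(both flips heads)

# P(A ∩ B) = 5/8 but P(A)P(B) = 9/16, so A and B are NOT independent,
# even though they are conditionally independent given the coin.
print(p_AB, p_A * p_A)
```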
Coherency of Bayes’ rule
Property
An important property of Bayes’ rule is that it is coherent: if we receive multiple pieces of information and wish to update our probabilities to incorporate all the information, it does not matter whether we update sequentially, taking each piece of evidence into account one at a time, or simultaneously, using all the evidence at once.
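Coherency can be checked numerically. In the sketch below, two positive test results (with made-up numbers, and assumed conditionally independent given the hypothesis and given its complement) are incorporated one at a time and then all at once; the two posteriors agree:

```python
from fractions import Fraction

# Hypothetical numbers: prior P(D) = 1/100; each test has P(+|D) = 9/10 and
# P(+|D^c) = 1/10; the two results are conditionally independent given D
# (and given D^c), so the second update can reuse the same likelihoods.
prior = Fraction(1, 100)
tpr, fpr = Fraction(9, 10), Fraction(1, 10)

def update(p, pos_given_D, pos_given_notD):
    """One Bayes update on observing a positive result."""
    return pos_given_D * p / (pos_given_D * p + pos_given_notD * (1 - p))

# Sequential: update on the first positive, then on the second.
sequential = update(update(prior, tpr, fpr), tpr, fpr)

# Simultaneous: condition once on both positives at the same time.
simultaneous = update(prior, tpr**2, fpr**2)

print(sequential, simultaneous)  # both 9/20
```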
Conditioning as a problem-solving tool
Strategy
Condition on what you wish you knew
When we encounter a problem that would be made easier if only we knew whether $E$ happened or not, we can condition on $E$ and then on $E^c$, consider these possibilities separately, then combine them using LOTP.
Strategy
Condition on the first step
In problems with a recursive structure, it can often be useful to condition on the first step of the experiment, an approach which we call first-step analysis.
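As a small illustration of first-step analysis (the gambler's ruin setup here is a standard example, not from the text above): starting with $i$ dollars, bet one dollar on a fair coin each round until reaching $n$ dollars or going broke. Conditioning on the first step gives $p_i = \frac{1}{2}p_{i-1} + \frac{1}{2}p_{i+1}$ with $p_0 = 0$ and $p_n = 1$; the sketch below solves these equations by simple iteration and recovers the closed form $p_i = i/n$:

```python
# First-step analysis for a fair gambler's ruin: p_i = P(reach n before 0
# starting from i). Conditioning on the first coin flip gives the system
# p_i = (p_{i-1} + p_{i+1}) / 2, with boundary values p_0 = 0 and p_n = 1.
n = 10
p = [0.0] * (n + 1)
p[n] = 1.0

# Solve the linear system by sweeping until convergence (Gauss-Seidel style).
for _ in range(100_000):
    for i in range(1, n):
        p[i] = (p[i - 1] + p[i + 1]) / 2

print(p[3])  # approximately 3/10, matching the closed form p_i = i/n
```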
Pitfalls and paradoxes
Warning
- Confusing the prior probability $P(A)$ with the posterior probability $P(A|B)$.
- Prosecutor’s fallacy: confusing $P(\text{innocence}|\text{evidence})$ with $P(\text{evidence}|\text{innocence})$.
- The defense attorney’s fallacy: failing to condition on all the evidence.
- Simpson’s paradox: the importance of thinking carefully about whether to aggregate data.
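Simpson's paradox can be seen in a small made-up dataset (the doctor and surgery names and counts below are invented for illustration): one doctor has a higher success rate within each surgery type, yet a lower success rate in aggregate, because the two doctors take on very different caseloads.

```python
from fractions import Fraction

# Invented (successes, attempts) counts for two doctors and two surgery types,
# chosen so that the paradox appears.
data = {
    ("hibbert", "heart"):    (70, 90),
    ("hibbert", "band_aid"): (10, 10),
    ("nick", "heart"):       (2, 10),
    ("nick", "band_aid"):    (81, 90),
}

def rate(doctor, surgery=None):
    """Success rate for a doctor, overall or restricted to one surgery type."""
    items = [(s, n) for (d, t), (s, n) in data.items()
             if d == doctor and (surgery is None or t == surgery)]
    return Fraction(sum(s for s, _ in items), sum(n for _, n in items))

# Hibbert is better within each surgery type...
assert rate("hibbert", "heart") > rate("nick", "heart")
assert rate("hibbert", "band_aid") > rate("nick", "band_aid")
# ...yet worse in aggregate: the comparison reverses when the data are pooled.
print(rate("hibbert"), rate("nick"))  # 4/5 vs 83/100
```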