Day 12
Review
Why do we study probability in statistics?
Probability
A number between \(0\) and \(1\) that tells us how likely a given “event” is to occur
Probability equal to \(0\) means the event cannot occur
- \(P(x)=0\)
Probability equal to \(1\) means the event must occur
- \(P(x)=1\)
Probability equal to \(1/2\) means the event is as likely to occur as it is to not occur
- \(P(x)=0.5\)
Probability close to 0 (but not equal to 0) means the event is very unlikely to occur
- The event may still occur, but we’d tend to be surprised if it did
Probability close to 1 (but not equal to 1) means the event is very likely to occur
- The event may not occur, but we’d tend to be surprised if it didn’t
Terminology
Experiment (in context of probability):
- An activity that results in a definite outcome where the observed outcome is determined by chance
Sample space:
- The set of ALL possible outcomes of an experiment; denoted by \(S\)
- Flip a coin once
\[S = \{H, T\}\]
- Randomly select a person and then determine blood type
\[S = \{A, B, AB, O\}\]
Event
A subset of outcomes belonging to sample space \(S\)
A capital letter towards the beginning of the alphabet is used to denote an event
- i.e. \(A\), \(B\), \(C\), etc.
- Suppose we flip a coin twice
\[S = \{HH, HT, TH, TT\}\]
- Let \(A\) be the event we observe at least one tails
\[A = \{HT, TH, TT\}\]
- Let \(B\) be the event we observe at most one tails
\[B = \{HH, HT, TH\}\]
Simple event: An event containing a single outcome in the sample space \(S\)
\[S = \{HH, HT, TH, TT\}\]
\(A = \text{we observe two heads} = \{HH\}\)
- Simple event
Compound event: An event formed by combining two or more events (thereby containing two or more outcomes in the sample space \(S\))
\[S = \{HH, HT, TH, TT\}\]
\(B = \text{we observe a head in the first or in the second flip} = \{HT, TH, HH\}\)
- Compound event
Probability Methods
Subjective Probability
- Probability is assigned based on judgement or experience
- e.g. expert opinion, personal experience, “vibe math”
Classical Probability
- Make some assumptions in order to build a mathematical model from which we can derive probabilities
- It’s not vibe math but it can definitely feel like it
\[P(A) = \frac{\text{number of outcomes in event } A}{\text{total number of outcomes in } S}\]
Relative or Empirical Probability
- Think of the probability of an event as the proportion of times that the event occurs
\[P(x) \approx \frac{\text{number of times } x \text{ is observed}}{\text{number of samples}}\]
Law of Large Numbers
- As the size of our sample (i.e., number of experiments) gets larger and larger:
- The relative frequency of the event of our interest gets closer and closer to the true probability
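The Law of Large Numbers is easy to see in a simulation. Below is a minimal sketch (not part of the lecture) that flips a fair coin 100,000 times and prints the relative frequency of heads at a few checkpoints; the running proportion drifts toward the true probability 0.5 as the number of flips grows.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Flip a fair coin 100,000 times and track the relative frequency
# of heads as the number of flips grows (Law of Large Numbers).
heads = 0
for n, flip in enumerate(random.choices("HT", k=100_000), start=1):
    if flip == "H":
        heads += 1
    if n in (10, 100, 10_000, 100_000):
        print(f"n = {n:>7}: relative frequency of heads = {heads / n:.4f}")
```

Early on the relative frequency can be far from 0.5; by the last checkpoint it is typically within a fraction of a percent.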
Questions?
Goals for Today:
Introduce simple probability set theory
Define fundamental rules and laws for probability mathematics
Probability
Basic Concepts of Probability II
You decide you want to go to Dirty Dawgs this Friday to try and pretend you don’t have midterms next week. You’ve heard that a decent number of your friends who’ve gone in the past 3 weeks have ended up sick with some kind of Flu/Covid. This doesn’t stop you.
How can you measure the likelihood that you end up sick as well, and what does that measurement tell us about the possible outcomes of your night out?
Probability model: Assigns a probability to each possible event constructed from the simple events in a particular sample space describing a particular experiment
For a finite sample space with \(n\) simple events, i.e. \(S = \{E_1, E_2, \dots, E_n\}\):
- The probability model assigns a number \(p_i\) to event \(E_i\) where \(P(E_i) = p_i\) so that:
\[0 \leq p_i \leq 1\]
\[p_1 + p_2 + \dots + p_n = 1 \ \ \text{(as a consequence, }P(S) = 1\text{)}\]
For an equally-likely probability model, the probability of observing \(E_i\) is:
\[P(E_i) = p_i = \frac{1}{n}\]
If \(A\) is an event in an equally-likely sample space \(S\) and contains \(k\) outcomes, then:
\[P(A) = \frac{\text{No. of outcomes in } A}{\text{No. of outcomes in } S} = \frac{k}{n}\]
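As a quick illustrative sketch, the equally-likely formula \(P(A) = k/n\) can be checked by enumerating the two-coin-flip sample space from earlier:

```python
from itertools import product

# Two flips of a fair coin: an equally-likely sample space with n = 4.
S = list(product("HT", repeat=2))
A = [outcome for outcome in S if "T" in outcome]  # at least one tails

# P(A) = (number of outcomes in A) / (number of outcomes in S) = k / n
p_A = len(A) / len(S)
print(p_A)  # 0.75
```

Here \(k = 3\) of the \(n = 4\) outcomes contain at least one tails, so \(P(A) = 3/4\).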
You take a count of your friends and their friends who’ve been to Dawgs in the past 3 weeks and whether they were sick the next day. You list anyone who wasn’t sick as “Not Sick”, anyone who was briefly sick the next day as “Maybe Sick”, anyone who was sick for three or more days after as “Likely Sick”, and anyone who ended up at LaFene getting antivirals/antibiotics as “Definitely Sick”.
\[ \begin{array}{|c|c|c|c|c|c|} \hline \textbf{Status} & \textbf{Not Sick} & \textbf{Maybe Sick} & \textbf{Likely Sick} & \textbf{Definitely Sick} & \textbf{Total}\\ \hline \text{Count} & 7 & 13 & 20 & 10 & 50 \\ \hline \end{array} \]
You run an experiment with this data: select one individual and determine what their “Sick Status” was after going to Dirty Dawgs
- You make a random selection so that everyone has the same chance of being selected
- What kind of sample is this?
The sample space \(S\) is the set of all 50 individuals
- The probability model for this experiment is given as follows:
\[ \begin{array}{|c|c|c|c|c|c|} \hline \textbf{Status} & \textbf{Not Sick} & \textbf{Maybe Sick} & \textbf{Likely Sick} & \textbf{Definitely Sick} & \textbf{Total}\\ \hline \text{Probability} & 0.14 & 0.26 & 0.40 & 0.20 & 1.00 \\ \hline \end{array} \]
If \(A\) is an event in \(S\), then the event where \(A\) does not occur is called the complement of \(A\)
Denote the complement of \(A\) by \(A^c\) – read this as “A-complement”
Complement Rule
\[P(A^c) = 1 - P(A) \quad \text{or} \quad P(A) = 1 - P(A^c)\]
This rule is useful when \(P(A)\) is difficult to calculate but \(P(A^c)\) is easy (or vice versa)
Suppose we roll a fair 6-sided die twice, then \(S\) contains 36 equally-likely outcomes in the form of 36 ordered pairs, i.e. \((1, 1), (1, 2), \dots, (6, 5), (6, 6)\)
Let \(A\) be “roll doubles”
- Then \(P(A) = \frac{6}{36} = \frac{1}{6}\)
- \(A^c\) is the event we “do not roll doubles”, and:
\[P(A^c) = 1 - P(A) = 1 - \frac{1}{6} = \frac{5}{6}\]
We could have counted the number of non-doubles in \(S\), but this requires more effort
This rule can also let us circumvent excess arithmetic
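For this small example the brute-force count is still feasible, and a short sketch confirms the complement-rule answer:

```python
from itertools import product

# All 36 equally-likely outcomes of rolling a fair die twice.
S = list(product(range(1, 7), repeat=2))

p_doubles = sum(a == b for a, b in S) / len(S)  # P(A) = 6/36
p_no_doubles = 1 - p_doubles                    # complement rule
print(p_doubles, p_no_doubles)
```

Both routes agree: \(P(A^c) = 30/36 = 5/6\), with no need to list the 30 non-doubles by hand.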
What’s the probability that you won’t end up “Definitely Sick”?
\[ \begin{array}{|c|c|c|c|c|c|} \hline \textbf{Status} & \textbf{Not Sick} & \textbf{Maybe Sick} & \textbf{Likely Sick} & \textbf{Definitely Sick} & \textbf{Total}\\ \hline \text{Probability} & 0.14 & 0.26 & 0.40 & 0.20 & 1.00 \\ \hline \end{array} \]
\[0.14 + 0.26 + 0.40 = 0.80\]
\[1.00 - 0.20 = 0.80\]
Unions and Intersections
The union of two events \(A\) and \(B\), denoted \(A \cup B\), is the set of all outcomes that belong to \(A\), to \(B\), or to both
- Saying \(A \cup B\) is equivalent to saying “A or B”
The intersection of two events \(A\) and \(B\), denoted \(A \cap B\), is the set of all outcomes that belong to both \(A\) and \(B\)
- Saying \(A \cap B\) is equivalent to saying “A and B”
In rolling a die once, consider events \(A\) and \(B\):
\(A\): Roll an even number: \(\{2, 4, 6\}\)
\(B\): Roll a number greater than 4: \(\{5, 6\}\)
\[A \cup B = A \text{ or } B = \{2, 4, 5, 6\}\]
\[A \cap B = A \text{ and } B = \{6\}\]
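These definitions map directly onto Python's set operators, which makes for a handy sanity check (a sketch, not part of the lecture): `|` is union and `&` is intersection.

```python
# One roll of a die: events as Python sets.
A = {2, 4, 6}  # roll an even number
B = {5, 6}     # roll a number greater than 4

print(sorted(A | B))  # A or B  -> [2, 4, 5, 6]
print(sorted(A & B))  # A and B -> [6]
```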
You’re not certain that Dirty Dawgs is the cause of the issue. You decide to double-check by expanding your sample with 150 individuals who spent most of their time at other bars.
\[ \begin{array}{|c|c|c|c|c|c|} \hline \textbf{Bar} & \textbf{Not Sick} & \textbf{Maybe Sick} & \textbf{Likely Sick} & \textbf{Definitely Sick} & \textbf{Total}\\ \hline \text{Dawgs} & 7 & 13 & 20 & 10 & 50 \\ \hline \text{Yard Bar} & 17 & 20 & 7 & 13 & 57 \\ \hline \text{Kaw's} & 7 & 6 & 13 & 9 & 35 \\ \hline \text{Tubby's} & 0 & 19 & 20 & 19 & 58 \\ \hline \text{Total} & 31 & 58 & 60 & 51 & 200 \\ \hline \end{array} \]
You select an individual at random
- What’s the probability that person went to Yard Bar?
- What’s the probability that person went to Dirty Dawgs or Yard Bar?
- What’s the probability that person went to Tubby’s and was definitely sick?
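One way to check your answers is a quick sketch that works straight from the cell counts (row totals are recomputed from the cells rather than copied):

```python
# Cell counts copied from the table; columns are
# Not Sick, Maybe Sick, Likely Sick, Definitely Sick.
counts = {
    "Dawgs":    [7, 13, 20, 10],
    "Yard Bar": [17, 20, 7, 13],
    "Kaw's":    [7, 6, 13, 9],
    "Tubby's":  [0, 19, 20, 19],
}
n = sum(sum(row) for row in counts.values())  # 200 individuals in total

p_yard = sum(counts["Yard Bar"]) / n
# No one appears in more than one bar's row, so "or" is just addition:
p_dawgs_or_yard = (sum(counts["Dawgs"]) + sum(counts["Yard Bar"])) / n
p_tubbys_and_definitely = counts["Tubby's"][3] / n  # last column

print(p_yard, p_dawgs_or_yard, p_tubbys_and_definitely)
```

Since every cell count is divided by the same \(n = 200\), each answer is just the relevant count (or sum of counts) over 200.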
Mutual Exclusivity
Two events \(A\) and \(B\) are mutually exclusive if they do not share any common outcomes
Roll a die:
\(A\): Roll a 1 or a 2: \(\{1, 2\}\)
\(B\): Roll an even number: \(\{2, 4, 6\}\)
\(C\): Roll a 3, 4, or 5: \(\{3, 4, 5\}\)
Events \(A\) and \(C\) are mutually exclusive:
If \(A\) occurred, a \(1\) or a \(2\) was rolled
- Thus, none of the outcomes in \(C\) could have occurred
Events \(A\) and \(B\) are not mutually exclusive:
\[A \text{ and } B = \{2\}\]
Addition rule for mutually exclusive events
In general, for any two events \(A\) and \(B\):
\[P(A\cup B)=P(A)+P(B)-P(A\cap B)\]
If \(A\) and \(B\) are mutually exclusive, then \(P(A \cap B) = 0\), and the rule simplifies to:
\[P(A \text{ or } B) = P(A) + P(B)\]
\[P(A \cup B) = P(A) + P(B)\]
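A small sketch with the die events from above verifies both versions of the rule against a direct count of the union:

```python
A = {1, 2}      # roll a 1 or a 2
B = {2, 4, 6}   # roll an even number
C = {3, 4, 5}   # roll a 3, 4, or 5
n = 6           # equally-likely outcomes of one die roll

# General addition rule: subtract the overlap so it isn't counted twice.
p_A_or_B = len(A) / n + len(B) / n - len(A & B) / n
# A and C are mutually exclusive (A & C is empty), so the overlap term vanishes.
p_A_or_C = len(A) / n + len(C) / n

print(p_A_or_B, len(A | B) / n)  # both equal 4/6: rule vs. direct count
print(p_A_or_C, len(A | C) / n)  # both equal 5/6
```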
\[ \begin{array}{|c|c|c|c|c|c|} \hline \textbf{Bar} & \textbf{Not Sick} & \textbf{Maybe Sick} & \textbf{Likely Sick} & \textbf{Definitely Sick} & \textbf{Total}\\ \hline \text{Dawgs} & 7 & 13 & 20 & 10 & 50 \\ \hline \text{Yard Bar} & 17 & 20 & 7 & 13 & 57 \\ \hline \text{Kaw's} & 7 & 6 & 13 & 9 & 35 \\ \hline \text{Tubby's} & 0 & 19 & 20 & 19 & 58 \\ \hline \text{Total} & 31 & 58 & 60 & 51 & 200 \\ \hline \end{array} \]
What events in this sample space could be considered mutually exclusive?
Let \(A = \{\text{Someone went to Tubby's and didn't get sick}\}\)
Let \(B = \{\text{Someone went to Kaw's}\}\)
Find \(P(A\cup B)\)
Conditional Probability
A conditional probability of an event is a probability obtained with the additional information that some other event has already occurred
\(P(A|B)\) denotes the conditional probability of event \(A\) given that event \(B\) has already occurred
\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]
\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]
Similarly:
\[P(B|A) = \frac{P(A \text{ and } B)}{P(A)}\]
\[P(B|A) = \frac{P(A \cap B)}{P(A)}\]
An economist predicts a 60% chance that stock \(A\) will perform poorly and a 25% chance that stock \(B\) will perform poorly. There is also a 16% chance that both stocks will perform poorly
What is the probability that stock \(A\) performs poorly given that stock \(B\) performs poorly?
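Plugging the given values straight into the definition of conditional probability:

\[P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.16}{0.25} = 0.64\]

So there is a 64% chance that stock \(A\) performs poorly given that stock \(B\) performs poorly; conditioning on \(B\) raised the probability from the unconditional 60%.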
\[ \begin{array}{|c|c|c|c|c|c|} \hline \textbf{Bar} & \textbf{Not Sick} & \textbf{Maybe Sick} & \textbf{Likely Sick} & \textbf{Definitely Sick} & \textbf{Total}\\ \hline \text{Dawgs} & 7 & 13 & 20 & 10 & 50 \\ \hline \text{Yard Bar} & 17 & 20 & 7 & 13 & 57 \\ \hline \text{Kaw's} & 7 & 6 & 13 & 9 & 35 \\ \hline \text{Tubby's} & 0 & 19 & 20 & 19 & 58 \\ \hline \text{Total} & 31 & 58 & 60 & 51 & 200 \\ \hline \end{array} \]
Select an individual at random
- What’s the probability that someone went to Dirty Dawgs?
- What’s the probability that someone went to Dirty Dawgs and was “Likely Sick”?
- What’s the probability that someone is “Likely Sick” given that they went to Dirty Dawgs?
- What’s the probability that someone went to Dirty Dawgs, given that they’re “Likely Sick”?
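A sketch working from the table's counts shows how the four questions relate; in particular, the two conditional probabilities differ because they divide the same intersection by different marginals:

```python
# Counts read from the table above.
n = 200                # total individuals
dawgs = 50             # row total: went to Dirty Dawgs
likely = 60            # column total: "Likely Sick"
dawgs_and_likely = 20  # cell: Dirty Dawgs AND "Likely Sick"

p_dawgs = dawgs / n
p_dawgs_and_likely = dawgs_and_likely / n

# Conditioning rescales by the probability of the given event:
p_likely_given_dawgs = p_dawgs_and_likely / p_dawgs        # = 20/50
p_dawgs_given_likely = p_dawgs_and_likely / (likely / n)   # = 20/60

print(p_dawgs, p_dawgs_and_likely)
print(p_likely_given_dawgs, p_dawgs_given_likely)
```

Note that \(P(\text{Likely} \,|\, \text{Dawgs}) = 20/50 = 0.40\) while \(P(\text{Dawgs} \,|\, \text{Likely}) = 20/60 \approx 0.33\): the order of conditioning matters.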
Multiplication Rule
Using the definition of conditional probability:
\[ P(A|B) = \frac{P(A \cap B)}{P(B)}, \]
We can do some simple algebra and find ourselves at the multiplication rule:
\[P(A \cap B) = P(A|B) P(B)\]
Independence
Events \(A\) and \(B\) are independent if whether \(A\) occurs does not affect the probability that \(B\) occurs, and vice versa
In terms of conditional probability:
- The probability of \(A\) does not change given \(B\) happened and vice versa
That is, \(A\) and \(B\) are independent if one of the following is true:
\[P(A|B) = P(A)\]
\[P(B|A) = P(B)\]
\[P(A \cap B) = P(A)P(B)\]
(When \(P(A) > 0\) and \(P(B) > 0\), you can show that all three statements are equivalent)
Multiplication Rule for Independent Events
Given:
\[A \text{ and } B \Rightarrow \text{ Independent}\]
\[P(A \cap B) = P(A)P(B)\]
Suppose we roll a fair die twice. What is the probability that the first roll is a 1 and the second roll is a 6?
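The two rolls don't influence each other, so the multiplication rule for independent events applies; a tiny sketch using exact fractions:

```python
from fractions import Fraction

# Independent rolls: multiply the individual probabilities.
p_first_is_1 = Fraction(1, 6)
p_second_is_6 = Fraction(1, 6)
p_both = p_first_is_1 * p_second_is_6

print(p_both)  # 1/36
```

This matches the counting argument: \((1, 6)\) is exactly one of the 36 equally-likely ordered pairs.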
\[ \begin{array}{|c|c|c|c|c|c|} \hline \textbf{Bar} & \textbf{Not Sick} & \textbf{Maybe Sick} & \textbf{Likely Sick} & \textbf{Definitely Sick} & \textbf{Total}\\ \hline \text{Dawgs} & 7 & 13 & 20 & 10 & 50 \\ \hline \text{Yard Bar} & 17 & 20 & 7 & 13 & 57 \\ \hline \text{Kaw's} & 7 & 6 & 13 & 9 & 35 \\ \hline \text{Tubby's} & 0 & 19 & 20 & 19 & 58 \\ \hline \text{Total} & 31 & 58 & 60 & 51 & 200 \\ \hline \end{array} \]
Suppose that every individual in this sample spent their entire night out at the bar they’re associated with in the data set, and had no interaction with one another
Select two individuals at random
Let \(A=\{\text{Individual 1 went to Tubby's and was some form of sick}\}\)
Let \(B=\{\text{Individual 2 went to Yard Bar and was Definitely Sick}\}\)
Find \(P(B|A)\)
Does this make sense within our example? Why / Why not?
Does this make sense in real life? Why / Why not?