Day 11

We finally made it to statistics y’all


Addressing Changes

  • Moving forward, I’ll be providing more regular practice problems beyond your homework and QOTD

    • The segments that we’re moving into aren’t generally intuitive; don’t trick yourself into thinking they are
  • Statistical theory is best shown through simulations and pictures

    • I’ll be leaning on the blackboard more
  • Some of these concepts will get fairly abstract

    • Don’t hesitate to ask questions BUT

    • We actually have to limit questions more as we progress

    • Attend office hours, email me, go to the help lab




Basic Concepts of Probability I

Why do we study probability in statistics?






Probability: A number between \(0\) and \(1\) that tells us how likely a given “event” is to occur



Probability equal to \(0\) means the event cannot occur


\[P(x)=0\]


Probability equal to \(1\) means the event must occur


\[P(x)=1\]


Probability equal to \(1/2\) means the event is as likely to occur as it is to not occur


\[P(x)=0.5\]



Probability close to 0 (but not equal to 0) means the event is very unlikely to occur


  • The event may still occur, but we’d tend to be surprised if it did

    • \(x=\{\text{A shark attack on a beach in the U.S.}\}\)

      • \(P(x) \approx 0.00000008\)

      • \(P(x) \ne 0\)


Probability close to 1 (but not equal to 1) means the event is very likely to occur


  • The event may not occur, but we’d tend to be surprised if it didn’t

    • \(x=\{ \text{You lose the national lottery you bought a ticket for} \}\)

      • \(P(x) \approx 0.999999997\)

      • \(P(x) \ne 1\)

        • Do not gamble




Probability Terminology

To study probability formally, we need some basic terminology



Experiment (in context of probability):

  • An activity that results in a definite outcome where the observed outcome is determined by chance


Sample space:

  • The set of ALL possible outcomes of an experiment; denoted by \(S\)



  1. Flip a coin once


\[S = \{H, T\}\]


  2. Randomly select a person and then determine their blood type


\[S = \{A, B, AB, O\}\]



Event

  • A subset of outcomes belonging to sample space \(S\)

  • A capital letter towards the beginning of the alphabet is used to denote an event

    • e.g. \(A\), \(B\), \(C\)



  1. Suppose we flip a coin twice


\[S = \{HH, HT, TH, TT\}\]


  2. Let \(A\) be the event we observe at least one tails


\[A = \{HT, TH, TT\}\]


  3. Let \(B\) be the event we observe at most one tails


\[B = \{HH, HT, TH\}\]


Simple event: An event containing a single outcome in the sample space \(S\)


\[S = \{HH, HT, TH, TT\}\]


\(A = \text{we observe two heads} = \{HH\}\)


  • Simple event


Compound event: An event formed by combining two or more events (thereby containing two or more outcomes in the sample space \(S\))


\[S = \{HH, HT, TH, TT\}\]


\(B = \text{we observe a head in the first or in the second flip} = \{HT, TH, HH\}\)


  • Compound event
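
The two-flip examples above can be sketched in Python using sets. This is only an illustration: the names `sample_space`, `at_least_one_tail`, `at_most_one_tail`, `two_heads`, and `is_simple` are made up here and are not part of the course material.

```python
from itertools import product

# Sample space for two coin flips: every ordered pair of H/T, joined into a string.
sample_space = {"".join(flips) for flips in product("HT", repeat=2)}

# Events are subsets of the sample space (hypothetical names for A, B, and "two heads").
at_least_one_tail = {o for o in sample_space if o.count("T") >= 1}  # event A
at_most_one_tail = {o for o in sample_space if o.count("T") <= 1}   # event B
two_heads = {o for o in sample_space if o == "HH"}


def is_simple(event):
    """A simple event contains exactly one outcome."""
    return len(event) == 1


print(sorted(sample_space))               # ['HH', 'HT', 'TH', 'TT']
print(sorted(at_least_one_tail))          # ['HT', 'TH', 'TT']
print(at_least_one_tail <= sample_space)  # True: an event is a subset of S
print(is_simple(two_heads), is_simple(at_most_one_tail))  # True False
```

Using sets keeps the code close to the notation on the slides: `<=` is the subset check, and the size of the set tells us whether the event is simple.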




Probability Methods

There are 3 general views of probability


  • Consider these as methods of assigning probabilities to events




Subjective Probability

Probability is assigned based on judgement or experience

  • e.g. expert opinion, personal experience, “vibe math”


  • A doctor assessing the chance of a patient recovering from a medical procedure

  • A managerial team estimating the probability a project will achieve technical success


This probability may not be expressed as an actual number; instead, we may say “low”, “high”, “almost certain”, etc.




Classical Probability

Make some assumptions in order to build a mathematical model from which we can derive probabilities


  • It’s not vibe math but it can definitely feel like it


Suppose we want to put a probability on the event of observing “tails” in one flip of a coin. We might assume the following:


  • 2 possible outcomes: “heads” or “tails”

  • The coin is “fair”

    • i.e. heads & tails have an equal chance of occurring


Based on our model:


\[\text{the probability of observing tails} = P(\text{tails}) = {1\over 2}\]


Note: This is an example of an equally-likely probability model (i.e., all possible outcomes are equally likely to occur) where:

\[P(A) = \frac{\text{number of outcomes in event } A}{\text{total number of outcomes in } S}.\]

  • Here \(P(A)\) denotes the probability of the event \(A\)
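
As a rough sketch, the equally-likely formula above is just a ratio of counts. The helper name `classical_probability` is hypothetical, and the code only makes sense under the assumption that every outcome in the sample space is equally likely.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, sample_space):
    """P(A) = (# outcomes in A) / (# outcomes in S), assuming equally likely outcomes."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# One flip of a fair coin.
coin = {"H", "T"}
print(classical_probability({"T"}, coin))  # 1/2

# Two flips of a fair coin: the event "we observe two heads".
two_flips = {"".join(f) for f in product("HT", repeat=2)}
two_heads = {"HH"}
print(classical_probability(two_heads, two_flips))  # 1/4
```

`Fraction` is used so the answer stays an exact ratio instead of a rounded decimal.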




Relative Frequency or Empirical Probability

Think of the probability of an event as the proportion of times that the event occurs


Flipping a tack:


\[P(\text{point up}) = \text{the proportion of all possible flips of the tack where it lands "point up"}\]


  • We could flip our tack a large number of times (let’s say \(1000\)) and count the number of times it lands point up

  • This is like a simple random sample (SRS) from the population of all tack flips


\[P(\text{point up}) \approx \frac{\text{number of times "point up" is observed}}{1000}\]
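
Since we can't flip a real tack in code, here is a minimal simulated version of the same idea. Big assumption: the "true" chance of point up is set to `0.6` purely for illustration; it is a made-up value, not a measured one. The function name `estimate_point_up` is also hypothetical.

```python
import random


def estimate_point_up(n_flips, p_point_up, seed=0):
    """Simulate n_flips tack flips and return the proportion that land point up.

    p_point_up is an ASSUMED true probability, used only to drive the simulation.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    point_up = sum(rng.random() < p_point_up for _ in range(n_flips))
    return point_up / n_flips


# 1000 simulated flips with a made-up true probability of 0.6.
estimate = estimate_point_up(1000, p_point_up=0.6)
print(estimate)
```

The printed proportion plays the role of the fraction above: number of "point up" flips divided by 1000.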



Law of Large Numbers

As the size of our sample (i.e., number of experiments) gets larger and larger:

  • The relative frequency of the event of our interest gets closer and closer to the true probability

  • What does this mean to us?

    • With a large enough sample, the relative frequency will almost surely be as close as we need to the true probability

    • It is also used to justify the relative frequency view of probability

    • This is how some statisticians evaluate new statistical procedures; they simulate many datasets and observe the proportion of time the procedure “works”
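
A small simulation sketch of the idea: track the running proportion of heads as we keep flipping a simulated fair coin. The function name `running_relative_frequency` and the choice of checkpoints are assumptions made for this example.

```python
import random


def running_relative_frequency(n_flips, p=0.5, seed=1):
    """Return the relative frequency of heads after each of n_flips simulated flips."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    freqs = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < p  # True counts as 1, False as 0
        freqs.append(heads / i)
    return freqs


freqs = running_relative_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

The early proportions tend to bounce around, while the later ones should settle near `p`. Any single run can still wobble, which is why the statement is about what happens as the sample gets large.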




Assume that a fair die is rolled (i.e., all outcomes are equally-likely)


  1. What is the sample space?



  2. What’s the probability of rolling a \(5\)?



  3. What’s the probability of rolling an even number?



  4. What’s the probability of rolling a number less than \(3\)?



An automobile insurance company divides customers into three categories: good risks, medium risks, and poor risks. Assume that of a total of \(11{,}217\) customers, \(7792\) are good risks, \(2478\) are medium risks, and \(947\) are poor risks. As part of an audit, one customer is chosen at random.


  1. What’s the probability that the customer is a good risk?


  2. What’s the probability that the customer is not a poor risk?





Law of Large Numbers Visualized

Say you flip a coin twice:

  1. What is the sample space?


  2. What is the probability, given the sample space, of having at least one heads?


  3. What is the probability of at least one heads if you flip a coin three times?
    • Can you reasonably prove it?


  4. Let’s actually flip a coin “a few times”
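
If you want to try the "flip a few times" step on your own, here is one possible sketch. It repeats the whole experiment (2 or 3 flips of a simulated fair coin) many times and reports how often at least one heads shows up. The function name `estimate_at_least_one_heads` is made up for this example, and the coin is assumed to be fair.

```python
import random


def estimate_at_least_one_heads(n_flips, n_repeats, seed=2):
    """Estimate P(at least one heads in n_flips flips) by repeating the experiment."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_repeats):
        flips = [rng.choice("HT") for _ in range(n_flips)]  # one experiment
        hits += "H" in flips  # counts 1 if the experiment had at least one heads
    return hits / n_repeats


# Compare the estimates as the number of repeated experiments grows.
for n_repeats in (10, 100, 10_000):
    two = estimate_at_least_one_heads(2, n_repeats)
    three = estimate_at_least_one_heads(3, n_repeats)
    print(n_repeats, two, three)
```

Compare the last row to the answers you worked out from the sample space. With only 10 repeats the estimate can be pretty far off; with 10,000 it should be much closer.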





Attendance QOTD


Go away