STAT 240 - Fall 2025
In Ronald Fisher’s seminal work, The Design of Experiments (1935), he described a party he attended where a woman claimed to be able to tell whether the tea or the milk was added first in a cup of tea. To test this claim, Fisher randomly arranged 8 cups of tea, 4 of which had milk poured first and 4 tea first. The woman was left to perform a blind tasting and report her guesses on the order.
Easy to understand, harder to perform than it looks (see: Combinatorics)
Counting is a very useful tool for probability assessment
Counts, small or large, are easily turned into probabilities via relative frequency
\[ H_0: \text{The woman cannot distinguish between milk vs. tea first} \]
The null should arise naturally from the chosen test as an exact statistical hypothesis
We refer to it as the null because our goal is to nullify the statement with data
We refine the null to that “exact” sense by counting. Since the woman knows there are 4 of each, she only has to pick the 4 cups she believes had milk poured first:
\[ \begin{array}{|c|c|} \hline \text{Correct Guesses} & \text{Configurations} \\ \hline 0 & OOOO \\ \hline 1 & OOOX, OOXO, OXOO, XOOO\\ \hline 2 & OOXX, OXOX, OXXO, XOXO, XOOX, XXOO \\ \hline 3 & OXXX, XOXX, XXOX, XXXO \\ \hline 4 & XXXX \\ \hline \end{array} \]
Since order doesn’t matter (think about why) we can use the choose function:
\[ \begin{array}{|c|c|} \hline \text{Correct Guesses} & \text{# of Combinations} \\ \hline 0 & {4 \choose 0} \times {4 \choose 4} = 1\\ \hline 1 & {4 \choose 1} \times {4 \choose 3} = 16\\ \hline 2 & {4 \choose 2} \times {4 \choose 2} = 36\\ \hline 3 & {4 \choose 3} \times {4 \choose 1} = 16\\ \hline 4 & {4 \choose 4} \times {4 \choose 0} = 1\\ \hline \end{array} \]
We finalize the null by taking the relative frequency of each outcome (70 combinations in total):
\[ \begin{array}{|c|c|} \hline \text{Correct Guesses} & \text{Probability} \\ \hline 0 & 1/70 = 0.0143\\ \hline 1 & 16/70 = 0.2286\\ \hline 2 & 36/70 = 0.5143\\ \hline 3 & 16/70 = 0.2286\\ \hline 4 & 1/70 = 0.0143\\ \hline \end{array} \]
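As a quick sanity check (this snippet is not from the original notes), the whole table can be reproduced in a few lines of Python with `math.comb`:

```python
from math import comb

# Null distribution for the lady tasting tea: 8 cups, 4 milk-first,
# and the taster must pick 4 cups as "milk first".
total = comb(8, 4)  # 70 equally likely ways to pick 4 cups out of 8

for correct in range(5):
    # choose `correct` of the 4 true milk-first cups, and fill the rest
    # of her picks from the 4 tea-first cups
    ways = comb(4, correct) * comb(4, 4 - correct)
    print(correct, ways, round(ways / total, 4))

# A perfect score has probability P(4 correct) = 1/70, roughly 0.0143
```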
We now have everything necessary to perform a very simple and direct hypothesis test.
If she gets 4 correct, then our assumption of reality (the null) doesn’t match the data
Any other result could plausibly be explained by luck/chance alone
The \(p\)-value is the probability of that outcome under the null, and here it is exact: \(P(4 \text{ correct}) = 1/70 \approx 0.014\)
Fisher’s Exact Test
I promised I’d teach card counting
Pull a card from a standard \(52\) card deck
Pull another card, how many are left in the deck?
The r.v. \(X\) is distributed hypergeometric if:
Each draw from \(X\) (realizing \(x\)) has one of two mutually exclusive outcomes
The probabilities of those outcomes change as each draw occurs (we sample without replacement)
We can use this to form a counting rule
DISCLAIMER: Don’t do this.
\(X\) counts the “success” cards drawn from the deck, \(X \sim \text{Hyper}(N,K,n)\): \(N\) cards in the deck, \(K\) of the kind we are tracking, and \(n\) draws; \(k\) is the realized count.
\[ P(X = k) = \frac{{K \choose k}{N-K \choose n-k}}{{N \choose n}} \]
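To make the parameters concrete, here is a small Python sketch (not from the notes; the 5-card draw is a made-up example) that implements this PMF directly and checks it against scipy.stats.hypergeom, which orders its arguments as (population size, successes, draws):

```python
from math import comb
from scipy.stats import hypergeom

def hyper_pmf(k, N, K, n):
    """P(X = k): exactly k successes in n draws, without replacement,
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustrative example: probability of exactly 1 ace (K = 4) in a
# hypothetical n = 5 card draw from a standard N = 52 card deck.
print(hyper_pmf(1, N=52, K=4, n=5))

# scipy parameterizes the same distribution as (M, n, N) = (population, successes, draws)
print(hypergeom.pmf(1, 52, 4, 5))
```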
Simplifying the calculation:
Classify three buckets of cards: “Bad”, “Neutral”, “Good”
2 to 6 are bad, 7 to 9 are neutral, and 10 to Ace are good
Bad = 1, Neutral = 0, Good = -1
There are 20 bad cards, 12 neutral, and 20 good
By adding up the scores (1, 0, or -1) we keep a running total of weighted values
By dividing the summed scores by the total remaining cards, we get a weighted expectation
\(\mathbb{E}\) bad = \(\mathbb{E}\) good \(<\) \(\mathbb{E}\) bad + neutral
As you play, sum the scores and divide by the remaining cards
This running ratio acts as a test statistic, from which a \(p\)-value can be computed
Bet higher once the value passes your risk-acceptance threshold, e.g.:
\[ \frac{\text{Score}}{\text{Cards Remaining}} = \frac{7}{25} = 0.28 \]
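As a rough illustration of the bookkeeping only (not a betting recommendation, and the example hand is made up), a minimal Python sketch:

```python
# Hi-Lo style scoring: 2-6 -> +1 ("bad"), 7-9 -> 0 ("neutral"), 10/J/Q/K/A -> -1 ("good")
def card_score(rank):
    if rank in {"2", "3", "4", "5", "6"}:
        return 1
    if rank in {"7", "8", "9"}:
        return 0
    return -1  # 10, J, Q, K, A

def running_ratio(seen_cards, deck_size=52):
    """Sum the scores of the cards seen so far and divide by the cards remaining."""
    score = sum(card_score(r) for r in seen_cards)
    remaining = deck_size - len(seen_cards)
    return score / remaining

# Made-up example hand; a state with a running score of 7 and 25 cards
# remaining would give 7 / 25 = 0.28, matching the example above.
seen = ["2", "5", "K"]
print(running_ratio(seen))
```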
Fisher’s Exact Test uses the hypergeometric distribution
We’re sampling without replacement and counting results
Know how the function you’re using works so you know its strengths/weaknesses
To set up the exact test we start by building a contingency table
\[ \begin{array}{|c|c|c|c|} \hline & \text{A} & \text{B} & \text{Row Total} \\ \hline \text{P} & a & b & a+b \\ \hline \text{P}^c & c & d & c+d \\ \hline \text{Column Total} & a+c & b+d & a+b+c+d \\ \hline \end{array} \]
A company is testing a new anti-helminth treatment for fish farms. 30 fish are selected for the experiment, with 12 being administered the treatment and 18 left as controls. Of the treatment group, 6 became infected. Of the control 11 became infected.
\[ \begin{array}{|c|c|c|c|} \hline & \text{Infected} & \text{Not Infected} & \text{Row Total} \\ \hline \text{Treatment} & 6 & 6 & 12 \\ \hline \text{Control} & 11 & 7 & 18 \\ \hline \text{Column Total} & 17 & 13 & 30 \\ \hline \end{array} \]
Remember, we only have a null.
\[ H_0: \text{No effect of treatment} \]
There are notational ways to express this; we can just use natural language
Our \(p\)-value will be the probability, under the null, of data at least as extreme as what we observed
\[ \begin{aligned} \frac{{a+b \choose a}{c+d \choose c}}{{n \choose b+d}} = \frac{{6+6 \choose 6}{11+7 \choose 11}}{{30 \choose 6+7}} = \\ \\ \frac{{12 \choose 6}{18 \choose 11}}{{30 \choose 13}} = 0.24554 \end{aligned} \]
The full \(p\)-value adds in every table at least as extreme as the observed one, so it is even larger than \(0.24554\) and nowhere near any conventional significance threshold.
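As a check on the arithmetic (this snippet is not part of the original notes), the single-table probability can be recomputed with `math.comb`, and scipy.stats.fisher_exact runs the full test; its two-sided \(p\)-value aggregates all tables at least as extreme, so it will be larger than the single-table value:

```python
from math import comb
from scipy.stats import fisher_exact

# Observed table: rows = treatment/control, columns = infected/not infected
table = [[6, 6],
         [11, 7]]

# Probability of this exact table under the null (all margins fixed)
p_table = comb(12, 6) * comb(18, 11) / comb(30, 13)
print(p_table)  # about 0.24554

# Full Fisher's exact test: sums the probabilities of every table
# at least as extreme as the observed one.
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(odds_ratio, p_value)
```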
We would say “the results are statistically insignificant”
“Be wise, generalize!”
\[ OR = \frac{a / b}{c / d} = \frac{6 / 6}{11 / 7} = \frac{1}{1.5714} = 0.6364 \]
Odds ratios describe the strength of association between two events
In this case, “the odds of infection for treated fish are about 0.64 times the odds for untreated fish”
Anything can have a confidence interval
\[ CI(OR) = \text{exp} \{ \ln{OR} \pm z_{\alpha/2}\sqrt{1/a + 1/b + 1/c + 1/d} \} \]
where \(z_{\alpha/2} \approx 1.96\) for a 95% interval
\(\text{exp} \{ x \}\) is another way of writing \(e^x\)
The rule of thumb is that an OR confidence interval containing \(1\) is insignificant (an OR of 1 means equal odds in both groups)
The OR interval from our example was \((0.115,3.541)\), which contains \(1\), so the association is not statistically significant
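A minimal Python sketch of the odds ratio and a Wald-type 95% interval from the formula above (not from the original notes). The Wald interval is only an approximation; exact intervals reported by software alongside Fisher’s exact test are generally different and often wider, which is likely why the interval quoted above does not match what this snippet prints:

```python
from math import log, exp, sqrt

# treated-infected, treated-healthy, control-infected, control-healthy
a, b, c, d = 6, 6, 11, 7

odds_ratio = (a / b) / (c / d)  # (6/6) / (11/7), about 0.64

# Wald-type 95% CI on the log-odds-ratio scale, z = 1.96
se = sqrt(1/a + 1/b + 1/c + 1/d)
lower = exp(log(odds_ratio) - 1.96 * se)
upper = exp(log(odds_ratio) + 1.96 * se)
print(odds_ratio, (lower, upper))

# If the interval contains 1, the association is not statistically significant.
```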