STAT 240 - Fall 2025
How much more probable is a baby boy versus a girl?
Does the probability differ between Paris and London?
\[ \begin{array}{|c|c|c|} \hline & \text{Male} & \text{Female} \\ \hline \text{Paris} & 251527 & 241945 \\ \hline \text{London} & 738629 & 698958 \\ \hline \end{array} \]
Fisher’s Exact Test uses a counting system
We need a method that can work with relative frequency
The null and alternate hypotheses propose two separate populations
Effect size is a major assumption of N-P’s method
A method of estimating effect size is Cohen’s \(h\)
\[ h = 2 \arcsin{\sqrt{p_1}} - 2 \arcsin{\sqrt{p_2}} \]
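As a quick numerical check of the formula above, here is a minimal Python sketch. The function name `cohens_h` and the example proportions are illustrative assumptions, not part of the course materials.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of the arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Illustrative proportions (assumed for this example only)
h = cohens_h(0.60, 0.50)
print(round(h, 4))
```

The sign of \(h\) follows the order of the arguments, so swapping the two proportions flips the sign but not the magnitude.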
In practice, effect size is a product of the alternate hypothesis
N-P doesn’t see this as a hypothesis worth testing
We will assign a fixed effect size in the form of \(\beta\) later on
Neyman and Pearson would ideally have us use their lemma
The philosophy is that we select our test dynamically
Based on our assumptions and hypotheses
For our purposes, we’ll always be provided the test choice
N-P propose two hypotheses, null (or main) and the alternate
Unlike Fisher’s method, we aren’t intending to eliminate the null
The alternate isn’t ever interesting, but it’s important to have
The assumption is that if we reject the main then the alternate is accepted
Fisher’s method doesn’t distinguish between error types; it assumes zero replication
N-P care about error because everything is done “a priori”
\(\alpha\) is the probability of type 1 error
\(\beta\) is the probability of type 2 error
We want to minimize both \(\alpha\) and \(\beta\)
By convention, \(\beta\) is set greater than \(\alpha\)
If \(\beta\) needs to be less than \(\alpha\), just switch your hypotheses
The conventional choices are \(\alpha = 0.05\) and \(\beta = 0.20\)
\(\alpha\) is the chance of rejecting a true null
\(\beta\) is the chance of maintaining a false null
\(1-\beta\) is the chance of rejecting a false null
This is called “power”
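To make \(\alpha\), \(\beta\), and power concrete, the sketch below approximates the power of a two-sided one-sample \(z\)-test of a proportion using the normal approximation. The function name `z_test_power` and the numbers \(p_0 = 0.5\), \(p_1 = 0.55\), \(n = 400\) are assumptions chosen only for illustration.

```python
from statistics import NormalDist

def z_test_power(p0: float, p1: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z-test of H0: p = p0
    when the true proportion is p1 (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se0 = (p0 * (1 - p0) / n) ** 0.5  # standard error used by the test (null)
    se1 = (p1 * (1 - p1) / n) ** 0.5  # standard error under the true p1
    shift = (p1 - p0) / se0           # mean of the z statistic under p1
    scale = se1 / se0                 # sd of the z statistic under p1
    upper = 1 - std_normal.cdf((z_crit - shift) / scale)
    lower = std_normal.cdf((-z_crit - shift) / scale)
    return upper + lower

# Illustrative values (assumed): null p0 = 0.5, true p1 = 0.55, n = 400
power = z_test_power(p0=0.5, p1=0.55, n=400)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When the true proportion equals the null value, the same function returns \(\alpha\), the chance of rejecting a true null.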
It’s common practice to estimate a sample size to achieve “good power”.
Regardless of method or philosophy used you’ll find this everywhere
It’s required in some cases to be awarded funding from a grant
There are many methods for calculating sample size
We typically stick to computers for this
A candidate sample size formula based on Cohen’s \(h\), however, is:
\[ n = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{h^2} \]
Show at \(\alpha = 0.05\), \(\beta = 0.20\), and \(h = 0.30\) that the ideal sample size is roughly 175
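The calculation can be checked with a short Python sketch of the formula above. The function name `sample_size` is an illustrative choice. The quantiles come from the standard library's `NormalDist` rather than a table, so the unrounded result differs slightly from a hand calculation with 1.96 and 0.84.

```python
import math
from statistics import NormalDist

def sample_size(effect: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """n = 2 (z_{1-alpha/2} + z_{1-beta})^2 / effect^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)        # about 0.84 at beta = 0.20
    n = 2 * (z_alpha + z_beta) ** 2 / effect ** 2
    return math.ceil(n)

print(sample_size(0.30))  # 175
```

By hand with the rounded quantiles, \(2(1.96 + 0.84)^2 / 0.30^2 = 15.68 / 0.09 \approx 174.2\), which also rounds up to 175.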
How much more probable is a baby boy versus a girl?
\[ \begin{aligned} H_M: p_b - p_g = 0 \\ \\ H_A: p_b - p_g \neq 0 \\ \end{aligned} \]
\[ z = \frac{\hat{p} - 0.5}{\sqrt{0.5(1-0.5)/n}} \]
Note: the true UMP (uniformly most powerful) test is the binomial test, but at these sample sizes the \(z\)-test gives essentially the same result
Sticking with N-P a priori philosophy:
\[ h = 2 \arcsin{\sqrt{0.51}} - 2 \arcsin{\sqrt{0.5}} = 0.02 \]
\[ n = \frac{2(1.96 + 0.84)^2}{0.02^2} = 39200 \]
Since the table gives \(n = 493472 + 1437587 = 1931059\), we should be fine to proceed.
\[ z = \frac{0.5128 - 0.5}{\sqrt{0.5(1-0.5)/1931059}} = 35.574 \]
\[ \begin{aligned} z_{Paris} = \frac{0.5097 - 0.5}{\sqrt{0.5(1-0.5)/493472}} = 13.629\\ \\ z_{London} = \frac{0.5138 - 0.5}{\sqrt{0.5(1-0.5)/1437587}} = 33.092\\ \end{aligned} \]
Does the probability differ between Paris and London?
\[ \begin{aligned} H_M: p_P - p_L = 0 \\ \\ H_A: p_P - p_L \neq 0 \\ \end{aligned} \]
\[ z = \frac{\hat{p}_P - \hat{p}_L}{\sqrt{\tilde{p}(1-\tilde{p})(n_P^{-1} + n_L^{-1})}} \]
\[ h = 2 \arcsin{\sqrt{0.5097}} - 2 \arcsin{\sqrt{0.5138}} = -0.0082 \]
\[ \tilde{p} = \frac{b_P + b_L}{n_P + n_L} = \frac{990156}{1931059} = 0.5128 \]
\[ z = \frac{0.5097 - 0.5138}{\sqrt{0.5128(1-0.5128)(493472^{-1} + 1437587^{-1})}} = -4.972 \]
There is a difference in sex ratio among humans
Pooled across both cities, male births are slightly favored
Each city on its own shows the same direction
There is a difference in sex ratio between cities
London has a higher ratio
The pooled proportion sits between the two city values, closer to London’s because London contributes more births
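The test statistics for both questions can be recomputed directly from the table counts. The following is a minimal sketch using only the Python standard library. The function names `one_sample_z` and `two_proportion_z` are illustrative, and the formulas are the one-sample and pooled two-proportion \(z\) statistics shown above.

```python
import math

# Birth counts from the table: (male, female)
paris = (251527, 241945)
london = (738629, 698958)

def one_sample_z(successes: int, n: int, p0: float = 0.5) -> float:
    """z statistic for H_M: p = p0 using the null standard error."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H_M: p1 - p2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

n_paris, n_london = sum(paris), sum(london)

z_pooled = one_sample_z(paris[0] + london[0], n_paris + n_london)
z_paris = one_sample_z(paris[0], n_paris)
z_london = one_sample_z(london[0], n_london)
z_diff = two_proportion_z(paris[0], n_paris, london[0], n_london)

print(f"pooled: {z_pooled:.3f}  Paris: {z_paris:.3f}  London: {z_london:.3f}")
print(f"Paris vs London: {z_diff:.3f}")
```

The statistics are computed from unrounded proportions, so they can differ in the second decimal place from hand calculations that round each \(\hat{p}\) to four digits.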
Cohen’s \(d\):
\[ d = \frac{\bar{x}_1 - \bar{x}_2}{\tilde{s}} \]
\[ \tilde{s} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 -2}} \]
Sample size estimation:
\[ n = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2} \]
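A short Python sketch of Cohen’s \(d\) and the matching sample size estimate follows. It takes \(\tilde{s}\) to be the pooled standard deviation, that is, the square root of the pooled variance. The function names and the two small data sets are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def cohens_d(x1, x2) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)  # sample standard deviations (n - 1 divisor)
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / math.sqrt(pooled_var)

def sample_size_d(d: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """n = 2 (z_{1-alpha/2} + z_{1-beta})^2 / d^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d**2)

# Made-up data for illustration only
group_1 = [2, 4, 6]
group_2 = [1, 3, 5]

d = cohens_d(group_1, group_2)
print(d, sample_size_d(d))
```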
Friday is bulk exam review
Next Monday and Wednesday are practice exam/open office hours
Next Friday is the exam, during normal class hours
Clear up outstanding assignments
Individual meetings
General apology tour
TEVALs
Biometrics II? Start deciding on a project now.