Hypothesis Testing IV

STAT 240 - Fall 2025

Robert Sholl

Neyman-Pearson Hypothesis Testing

Human Sex Ratios

  1. How much more probable is a baby boy versus a girl?

  2. Does the probability differ between Paris and London?

\[ \begin{array}{|c|c|c|} \hline & \text{Male} & \text{Female} \\ \hline \text{Paris} & 251527 & 241945 \\ \hline \text{London} & 738629 & 698958 \\ \hline \end{array} \]

Why not Fisher’s Exact?

  • Fisher’s Exact Test uses a counting system

    • Calculate \((251527 + 241945 + 738629 + 698958)!\) (don’t)
  • We need a method that can work with relative frequency

    • We also need a method of extracting a solution to the second question
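
To see why the factorial is out of reach, the sketch below (an illustration, not part of the original slides) counts the decimal digits of \(N!\) through the log-gamma function instead of computing the factorial itself. The helper name `factorial_digits` is hypothetical.

```python
import math

# Total births in the table (Paris + London, both sexes)
n_total = 251527 + 241945 + 738629 + 698958

def factorial_digits(n: int) -> int:
    """Number of decimal digits in n!, using lgamma(n + 1) = ln(n!)."""
    return math.floor(math.lgamma(n + 1) / math.log(10)) + 1

digits = factorial_digits(n_total)
print(n_total, digits)  # the factorial runs to millions of digits
```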

Cohen’s \(h\)

  • The null and alternate hypotheses propose two separate populations

  • Effect size is a major assumption of N-P’s method

    • This is the degree to which the null and alternate groups differ
  • A method of estimating effect size is Cohen’s \(h\)

\[ h = 2 \arcsin{\sqrt{p_1}} - 2 \arcsin{\sqrt{p_2}} \]
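
A minimal Python sketch of the formula above. The function name `cohens_h` and the example proportions are assumptions made for illustration.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical proportions chosen only for illustration
h_example = cohens_h(0.60, 0.50)
print(round(h_example, 4))
```

The sign of \(h\) follows the order of the arguments, so swapping \(p_1\) and \(p_2\) flips it.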

Effect Size

  • In practice, effect size is a product of the alternate hypothesis

  • N-P doesn’t see this as a hypothesis worth testing

    • So measurement isn’t necessary
  • We will assign a fixed effect size in the form of \(\beta\) later on

    • Cohen statistics will still come in handy

Optimal Test

  • Neyman and Pearson would ideally have us use their lemma

    • Select the Uniformly Most Powerful (UMP) test
  • The philosophy is that we select our test dynamically

    • Based on our assumptions and hypotheses

    • For our purposes, we’ll always be provided the test choice

Hypotheses

  • N-P propose two hypotheses, null (or main) and the alternate

  • Unlike Fisher’s method, we aren’t intending to eliminate the null

    • Hence the language of “Main” hypothesis
  • The alternate isn’t ever interesting, but it’s important to have

  • The assumption is that if we reject the main then the alternate is accepted

Errors

  • Fisher’s method doesn’t care about error types; it assumes zero replication

  • N-P care about error because everything is done “a priori”

    • Before data is collected/observed
  • \(\alpha\) is the probability of type 1 error

  • \(\beta\) is the probability of type 2 error

Errors

  • We want to minimize both \(\alpha\) and \(\beta\)

    • By convention, \(\beta\) is set greater than \(\alpha\)

    • If \(\beta\) needs to be less than \(\alpha\), just switch your hypotheses

  • N-P tended to set \(\alpha = 0.05\) and \(\beta = 0.20\)

Power

  • \(\alpha\) is the chance of rejecting a true null

    • \(1-\alpha\) is the chance of maintaining a true null
  • \(\beta\) is the chance of maintaining a false null

    • \(1-\beta\) is the chance of rejecting a false null

    • This is called “power”

Achieving Good Power

It’s common practice to estimate a sample size to achieve “good power”.

  • Regardless of method or philosophy used you’ll find this everywhere

  • It’s required in some cases to be awarded funding from a grant

  • There are many methods for calculating sample size

Sample Size Estimation

  • We typically stick to computers for this

  • A candidate formula based on Cohen’s \(h\), however, is:

\[ n = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{h^2} \]

Show at \(\alpha = 0.05\), \(\beta = 0.20\), and \(h = 0.30\) that the ideal sample size is roughly 175
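
One way to check the exercise is sketched below, using `statistics.NormalDist` for the normal quantiles. The function names are hypothetical, and `approx_power` is a normal-approximation assumption that ignores the far rejection tail.

```python
import math
from statistics import NormalDist

def sample_size(h: float, alpha: float = 0.05, beta: float = 0.20) -> float:
    """Per-group n from n = 2 (z_{1-alpha/2} + z_{1-beta})^2 / h^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return 2 * (z_alpha + z_beta) ** 2 / h ** 2

def approx_power(h: float, n: float, alpha: float = 0.05) -> float:
    """Approximate power at per-group size n (far tail ignored)."""
    z = NormalDist()
    return z.cdf(abs(h) * math.sqrt(n / 2) - z.inv_cdf(1 - alpha / 2))

n = sample_size(0.30)
print(math.ceil(n))                        # 175
print(round(approx_power(0.30, 175), 3))   # just above the 0.80 target
```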

Question 1

How much more probable is a baby boy versus a girl?

\[ \begin{aligned} H_M: p_b - p_g = 0 \\ \\ H_A: p_b - p_g \neq 0 \\ \end{aligned} \]

  • \(\alpha = 0.05\), \(\beta = 0.20\)

Question 1

  • Our optimal test will be a one-sample \(z\) test

\[ z = \frac{\hat{p} - 0.5}{\sqrt{0.5(1-0.5)/n}} \]

Note: the true UMP is the binomial test but at these sample sizes it’s equivalent to the \(z\)-test

Question 1

  • Sticking with N-P a priori philosophy:

    • Assume male birth rates are believed to be higher by \(1\%\)

\[ h = 2 \arcsin{\sqrt{0.51}} - 2 \arcsin{\sqrt{0.5}} = 0.02 \]

\[ n = \frac{2(1.96 + 0.84)^2}{0.02^2} = 39200 \]

  • Since it’s only one proportion, we cut the sample size in half: 19600
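
The planning step can be reproduced as a rough sketch. With unrounded quantiles and an unrounded \(h\), the two-group figure comes out near 39,240 rather than 39,200. The variable names are illustrative.

```python
import math
from statistics import NormalDist

z = NormalDist()
alpha, beta = 0.05, 0.20

# Assumed a priori rates: 51% boys under the alternate, 50% under the main
h = 2 * math.asin(math.sqrt(0.51)) - 2 * math.asin(math.sqrt(0.50))

# Two-group sample size, then halved for the one-proportion case
n_two = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) ** 2 / h ** 2
n_one = n_two / 2

n_observed = 251527 + 241945 + 738629 + 698958  # births in the table
print(round(h, 4), math.ceil(n_two), math.ceil(n_one), n_observed >= n_one)
```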

Question 1

Since \(n = 1931059\), we should be fine to proceed.

  • Proportion of boys: \(990156/1931059 = 0.51275\)

\[ z = \frac{0.51275 - 0.5}{\sqrt{0.5(1-0.5)/1931059}} \approx 35.44 \]

  • We now resolve to reject the main hypothesis in favor of the alternate

Clustering

  • Intro to spatial statistics: space matters sometimes

\[ \begin{aligned} z_{Paris} = \frac{0.5097 - 0.5}{\sqrt{0.5(1-0.5)/493472}} = 13.629\\ \\ z_{London} = \frac{0.5138 - 0.5}{\sqrt{0.5(1-0.5)/1437587}} = 33.092\\ \end{aligned} \]
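
The one-sample statistics can be recomputed directly from the table counts, as sketched below. Because the proportions are not rounded to four decimals first, the city values differ slightly in the second decimal from the figures above. The function name `one_sample_z` is hypothetical.

```python
import math

def one_sample_z(successes: int, n: int, p0: float = 0.5) -> float:
    """z statistic for H_M: p = p0, using the null standard error."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# (boys, total births) taken from the table
paris = (251527, 251527 + 241945)
london = (738629, 738629 + 698958)
pooled = (paris[0] + london[0], paris[1] + london[1])

z_paris = one_sample_z(*paris)
z_london = one_sample_z(*london)
z_pooled = one_sample_z(*pooled)
print(round(z_paris, 2), round(z_london, 2), round(z_pooled, 2))
```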

Question 2

Does the probability differ between Paris and London?

\[ \begin{aligned} H_M: p_P - p_L = 0 \\ \\ H_A: p_P - p_L \neq 0 \\ \end{aligned} \]

  • \(\alpha = 0.05\), \(\beta = 0.20\)

Question 2

  • Our test will be a 2 proportion \(z\) test (simplifying once again)

\[ z = \frac{\hat{p_P} - \hat{p_L}}{\sqrt{\tilde{p}(1-\tilde{p})(n_P^{-1} + n_L^{-1})}} \]

\[ h = 2 \arcsin{\sqrt{0.5097}} - 2 \arcsin{\sqrt{0.5138}} = -0.0082 \]

  • Show that the estimated \(n\) is roughly \(233000\) per group

Question 2

\[ \tilde{p} = \frac{b_P + b_L}{n_P + n_L} = \frac{990156}{1931059} = 0.5128 \]

\[ z = \frac{0.5097 - 0.5138}{\sqrt{0.5128(1-0.5128)(493472^{-1} + 1437587^{-1})}} \approx -4.972 \]

  • We resolve to reject the main hypothesis in favor of the alternate
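
A short sketch of the pooled two-proportion \(z\) test, computed from the raw table counts. Unrounded proportions shift the statistic slightly relative to a hand calculation with four-decimal inputs. The function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for H_M: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Paris boys and births, then London boys and births
z_stat = two_proportion_z(251527, 493472, 738629, 1437587)
reject = abs(z_stat) > 1.96   # two-sided critical value at alpha = 0.05
print(round(z_stat, 3), reject)
```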

Conclusions

  • There is a difference in sex ratio among humans

    • Pooled over both cities, male births are slightly favored (\(990156/1931059 = 0.5128\))

    • Testing each city separately points in the same direction

  • There is a difference in sex ratio between cities

    • London has a higher ratio

    • London’s larger sample pulls the combined proportion toward its own

Dealing with Means

Cohen’s \(d\):

\[ d = \frac{\bar{x}_1 - \bar{x}_2}{\tilde{s}} \]

\[ \tilde{s} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 -2}} \]

Sample size estimation:

\[ n = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2} \]
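
A sketch of the same workflow for means, taking \(\tilde{s}\) as the pooled standard deviation (the square root of the pooled variance). The function names and the summary statistics are hypothetical and are not taken from the lecture data.

```python
import math
from statistics import NormalDist

def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled standard deviation: square root of the pooled variance."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

def cohens_d(mean1: float, mean2: float,
             s1: float, n1: int, s2: float, n2: int) -> float:
    """Cohen's d: mean difference in units of the pooled SD."""
    return (mean1 - mean2) / pooled_sd(s1, n1, s2, n2)

def sample_size_d(d: float, alpha: float = 0.05, beta: float = 0.20) -> float:
    """Per-group n from n = 2 (z_{1-alpha/2} + z_{1-beta})^2 / d^2."""
    z = NormalDist()
    return 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) ** 2 / d ** 2

# Hypothetical summary statistics, for illustration only
d = cohens_d(mean1=10.0, mean2=8.0, s1=2.0, n1=20, s2=2.0, n2=20)
print(d, math.ceil(sample_size_d(d)))
```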

That’s all folks

  • Friday is bulk exam review

  • Next Monday and Wednesday are practice exam/open office hours

    • I’ll provide practice problems/give feedback/clear makeups
  • Next Friday is the exam, during normal class hours

Closing statements

  • Clear up outstanding assignments

  • Individual meetings

  • General apology tour

  • TEVALs

  • Biometrics II? Start deciding on a project now.

Go away