Lec 25

Review

Null Hypothesis \(H_0\)

The statement we are holding as known and established information

  • e.g., the average body weight of an adult cat is \(10\) lbs.


\[H_0:\mu=10\]


Alternate Hypothesis \(H_a\) or \(H_1\)


The claim we are testing against the null hypothesis

  • I believe that the cats I interact with regularly have a different average body weight than the population


\[H_a:\mu \neq 10\]


Test Statistic \(t^*\)


A value calculated as part of the hypothesis testing process. We look it up in a \(t\)-table (or \(z\)-table, depending on the test) to get a \(p\)-value.


\[t^* = \frac{\bar{x} - \mu_0}{{s}/{\sqrt{n}}}\]


  • I weighed \(4\) of my friends' cats and my own cat and found that their average body weight was \(8\) pounds, with a standard deviation of \(2.49\) pounds


\[t^* = \frac{8 - 10}{{2.49}/{\sqrt{5}}}\]


\[t^*=-1.796039\]
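The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `one_sample_t` is made up for illustration, and the inputs are the cat-example values from this slide.

```python
import math

def one_sample_t(sample_mean, mu_0, sample_sd, n):
    """t* = (x_bar - mu_0) / (s / sqrt(n)). Hypothetical helper name."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu_0) / standard_error

# Cat example: x_bar = 8, mu_0 = 10, s = 2.49, n = 5
t_star = one_sample_t(8, 10, 2.49, 5)
print(f"t* = {t_star:.6f}")
```

The sign is negative because the sample mean (8) sits below the hypothesized mean (10).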



Significance level \(\alpha\)


The probability of committing a Type I error (rejecting \(H_0\) when it is actually true) in our hypothesis testing process

  • I want to test my cat weight hypothesis at \(\alpha=0.05\)


P-value


The final statistic calculated in a hypothesis test, used to determine if we reject or fail to reject the null hypothesis

\[2\cdot P(T>|t^*|)\approx 0.15\]

\[0.15>\alpha \quad \text{Fail to Reject} \ H_0\]
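If no \(t\)-table is handy, the two-tailed P-value can be approximated numerically. The sketch below assumes \(df = n - 1 = 4\) for the five cats and integrates the Student's \(t\) density with Simpson's rule. The helper names `t_pdf` and `upper_tail` are invented for this example; a statistics package would normally do this step for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using composite Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3         # half the area lies to the right of 0

t_star, df, alpha = -1.796039, 4, 0.05   # df = n - 1 is an assumption here
p_value = 2 * upper_tail(abs(t_star), df)
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"P-value = {p_value:.3f}: {decision}")
```

Taking the absolute value of \(t^*\) matters: the statistic is negative here, so doubling \(P(T>t^*)\) directly would give a value above 1.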


Statistically Significant


We refer to a result as statistically significant if we tested it against a null hypothesis and proceeded to reject the null hypothesis

  • There is insufficient evidence to suggest that the cats I interact with regularly differ in average body weight from the population, so the result is not statistically significant




Questions?




Goals for today:

  1. Wrap up hypothesis testing (two-sample)

  2. Wrap up the class



Hypothesis Testing II


Hypothesis Tests for Difference Between Two Means (Independent)


We’ve covered hypothesis testing for a single population parameter

  • (e.g., population mean \(\mu\))


Let’s look at testing a claim about the difference between two population means


\[\mu_1-\mu_2\]


We need two independent random samples from two distinct populations

  • Independence means that the observations in one sample have no effect on the observations in the other


As with everything we do in this class, we need to confirm that each sample mean can be treated as approximately normal, which for us means both sample sizes are large


\[n_1>30 \quad \text{and} \quad n_2>30\]


We want to see if population means \(\mu_1\) and \(\mu_2\) are equal:


\[H_0:\mu_1=\mu_2\]



There are three possible alternate hypotheses:


Left-tailed: \(H_1:\mu_1<\mu_2\)


Right-tailed: \(H_1:\mu_1>\mu_2\)


Two-tailed: \(H_1:\mu_1\neq\mu_2\)



We need a test statistic, \(t\):


\[t=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]


Under \(H_0\), \(\mu_1=\mu_2\), so \(\mu_1-\mu_2=0\)


\[t=\frac{(\bar{x}_1-\bar{x}_2)-0}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]


  • \(\mu_1\), \(\mu_2\) are population means (under the assumption that \(H_0\) is true)

  • \(\bar{x}_1\), \(\bar{x}_2\) are sample means

  • \(s_1\), \(s_2\) are sample standard deviations

  • \(n_1\), \(n_2\) are sample sizes


  • The test statistic measures how far the sample mean difference \((\bar{x}_1-\bar{x}_2)\) is from the hypothesized value of \(\mu_1-\mu_2\) in \(H_0\), in units of standard error


  • The test statistic comes from a Student’s \(t\) distribution with degrees of freedom:


\[df=\min(n_1-1,n_2-1)\]


  • (i.e., the smaller of \(n_1-1\) and \(n_2-1\)).


For the P-value calculation:


Left-tailed: \(H_1:\mu_1<\mu_2\)


\[\text{P-value}=P(T<t)\]


Right-tailed: \(H_1:\mu_1>\mu_2\)


\[\text{P-value}=P(T>t)\]


Two-tailed: \(H_1:\mu_1\neq\mu_2\)


\[\text{P-value}=2\cdot P(T<-|t|) \text{ OR } 2\cdot P(T>|t|)\]
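The three tail rules can be written as one small function. This is a rough sketch, not a library routine: `t_cdf` approximates \(P(T<t)\) by integrating the \(t\) density with Simpson's rule, and the tail labels `'left'`, `'right'`, and `'two'` are names chosen for this example.

```python
import math

def t_cdf(t, df, steps=2000):
    """Approximate P(T < t) for Student's t with df degrees of freedom."""
    def pdf(x):
        log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                       # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t, df, tail):
    """tail is 'left', 'right', or 'two' (labels assumed for this sketch)."""
    if tail == "left":
        return t_cdf(t, df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(p_value(1.5, 20, "left"), p_value(1.5, 20, "right"), p_value(1.5, 20, "two"))
```

Because the \(t\) distribution is symmetric about zero, the two forms of the two-tailed rule give the same number, which is why the function only needs the right-hand version.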


The steps for this hypothesis test are:


  1. State the null and alternate hypotheses


  2. Choose a significance level \(\alpha\)


  3. Compute the test statistic:


\[t=\frac{(\bar{x}_1-\bar{x}_2)}{\sqrt{({s_1^2}/{n_1})+({s_2^2}/{n_2})}}\]


  4. Compute the P-value of the test statistic \(t\)


  • Left-tailed: \(\text{P-value}=P(T<t)\)

  • Right-tailed: \(\text{P-value}=P(T>t)\)

  • Two-tailed: \(\text{P-value}=2\cdot P(T<-|t|)\) or \(2\cdot P(T>|t|)\)


Note: The degrees of freedom of the \(t\) distribution are: \[df=\min(n_1-1,n_2-1)\]


  5. Determine whether to reject \(H_0\):
  • Reject \(H_0\) if \(\text{P-value} \leq \alpha\)


  6. State a conclusion




The National Assessment of Educational Progress tested a sample of students who had used a computer in their mathematics classes, and another sample of students who had not used a computer. The sample mean score for students using a computer was 309, with a sample standard deviation of 29. For students not using a computer, the sample mean was 303, with a sample standard deviation of 32. Assume there were 60 students in the computer sample and 40 students in the sample that hadn’t used a computer.

At the 5% significance level, conduct a hypothesis test to determine whether the population mean scores differ between students who use a computer and those who do not.


Step 1. State the null and alternative hypotheses


\[H_0: \mu_1 = \mu_2\]


\[H_1: \mu_1 \neq \mu_2 \quad (\rightarrow \text{two-tailed})\]


Step 2. The significance level is \(\alpha=0.05\)


Step 3. Compute the test statistic


\[ \begin{array}{|c|c|c|c|} \hline \text{} & \text{Sample Mean} & \text{Sample Std. Dev.} & \text{Sample Size} \\ \hline \text{With Computer} & \bar{x}_1 = 309 & s_1 = 29 & n_1 = 60 \\ \hline \text{Without Computer} & \bar{x}_2 = 303 & s_2 = 32 & n_2 = 40 \\ \hline \end{array} \]


\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]


\[t = \frac{(309 - 303) - (0)}{\sqrt{\frac{29^2}{60} + \frac{32^2}{40}}} \approx 0.953\]


Step 4. Compute the P-value


We use the t-table with \(df=\min(n_1-1, n_2-1)=39\). Then:


\[P(T>0.953) \text{ is between } P(T>1.304)=0.10 \text{ and } P(T>0.681)=0.25\]


For the two-tailed test, the P-value:


\[\text{P-value} = 2 \cdot P(T>0.953) \text{ is between } 0.20 \text{ and } 0.50\]


Step 5. Determine whether to reject \(H_0\)


Since the P-value \(> \alpha = 0.05\), we fail to reject \(H_0\)


Step 6. State a conclusion


There is not enough evidence to conclude that the mean scores differ between those students who use a computer and those who do not (i.e., the mean scores may be the same)
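The six steps of this example can be traced in a short Python sketch. It recomputes \(t\) and \(df\) from the summary statistics, then brackets the P-value using only the two \(df=39\) table entries quoted in Step 4, so it mirrors the table lookup rather than computing an exact P-value. Variable names are chosen for illustration.

```python
import math

# Summary statistics from the example above
x1, s1, n1 = 309, 29, 60   # with computer
x2, s2, n2 = 303, 32, 40   # without computer
alpha = 0.05

# Step 3: test statistic (mu_1 - mu_2 = 0 under H0)
t = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
df = min(n1 - 1, n2 - 1)

# Step 4: bracket the P-value with the two table entries quoted above,
# stored as (critical value, upper-tail area)
smaller, larger = (0.681, 0.25), (1.304, 0.10)
if not smaller[0] < abs(t) < larger[0]:
    raise ValueError("t falls outside the two table entries used here")
p_low, p_high = 2 * larger[1], 2 * smaller[1]   # two-tailed bounds

# Step 5: decide using the bounds
if p_low > alpha:
    decision = "fail to reject H0"
elif p_high <= alpha:
    decision = "reject H0"
else:
    decision = "table bounds are inconclusive"

print(f"t = {t:.3f}, df = {df}, {p_low} < P-value < {p_high}: {decision}")
```

Since even the lower bound on the P-value exceeds \(\alpha\), the table alone is enough to make the decision here.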




Hypothesis Tests for Difference Between Two Means (Paired)


Next we turn our attention to a hypothesis test for paired (or matched) samples


Example: Gas mileage before and after tune-up for automobiles


\[ \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{Automobile} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \text{After Tune-up} & 35.44 & 35.17 & 31.07 & 31.57 & 26.48 & 23.11 & 25.18 & 32.39 \\ \hline \text{Before Tune-up} & 33.76 & 34.30 & 29.55 & 30.90 & 24.92 & 21.78 & 24.30 & 31.25 \\ \hline \end{array} \]


Both mileages before and after tune-up are obtained from the same automobile (i.e., the values are paired within the subject)


Now, we are interested in testing the population mean difference for the matched pairs


Our hypothesis test will involve two paired random samples from a single population

  • The set of differences between the values in the matched pairs is considered as the sample data


The population mean difference for the matched pairs is denoted \(\mu_d\)


\[\mu_d = \text{the population mean mileage difference (after tune-up minus before tune-up)}\]


The sample mean of the differences is denoted \(\bar{d}\)


\[\bar{d} = \frac{1.68 + 0.87 + \ldots + 1.14}{8} \approx 1.2063\]


The sample std. deviation of the differences is denoted \(s_d\)


\[s_d = \sqrt{\frac{(1.68 - 1.206)^2 + \ldots + (1.14 - 1.206)^2}{7}} \approx 0.3732\]
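The two summary values above can be reproduced from the table with Python's standard `statistics` module. A minimal sketch, taking each difference as after minus before (which matches the first difference, \(35.44 - 33.76 = 1.68\)):

```python
import statistics

after  = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

# One difference per automobile: after minus before
d = [a - b for a, b in zip(after, before)]

d_bar = statistics.mean(d)    # sample mean of the differences
s_d = statistics.stdev(d)     # sample standard deviation (divides by n - 1)
print(f"d_bar = {d_bar:.5f}, s_d = {s_d:.5f}")
```

Note that `statistics.stdev` uses the \(n-1\) denominator, which is the 7 in the formula above; `statistics.pstdev` would divide by 8 and give a different answer.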



Step 1. State the null and alternate hypotheses. The null hypothesis is of the form


\[H_0: \mu_d = \mu_0\]


where \(\mu_0\) is a prespecified value (\(\mu_0 = 0\) is the most common choice)


The alternate hypothesis:


  • Left-tailed: \(H_1: \mu_d < \mu_0\)


  • Right-tailed: \(H_1: \mu_d > \mu_0\)


  • Two-tailed: \(H_1: \mu_d \neq \mu_0\)



Step 2. Choose a significance level \(\alpha\)



Step 3. Compute the test statistic:


\[t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}\]


which follows a Student’s \(t\) distribution with \(df = n - 1\), where \(n\) is the number of matched pairs



Step 4. Compute the P-value of the test statistic \(t\)


Left-tailed: P-value = area under the Student’s \(t\) distribution to the left of \(t\), i.e., \(P(T < t)\)


Right-tailed: P-value = area under the Student’s \(t\) distribution to the right of \(t\), i.e., \(P(T > t)\)


Two-tailed: P-value = sum of the areas under the Student’s \(t\) distribution to the left of \(-|t|\) and right of \(|t|\), i.e., \(2 \cdot P(T < -|t|)\) or \(2 \cdot P(T > |t|)\)



Step 5. Determine whether to reject \(H_0\):


  • Reject \(H_0\) if P-value \(\leq \alpha\)


Step 6. State a conclusion
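As a sketch of how these steps could play out for the tune-up data, assume (this is not stated above) that we test \(H_0: \mu_d = 0\) against the right-tailed \(H_1: \mu_d > 0\) at \(\alpha = 0.05\), with each difference taken as after minus before. Using \(\bar{d} \approx 1.2063\), \(s_d \approx 0.3732\), and \(n = 8\):

\[t = \frac{1.2063 - 0}{0.3732/\sqrt{8}} \approx 9.14, \quad df = 8 - 1 = 7\]

\[\text{P-value} = P(T > 9.14) < 0.05 \quad \text{Reject} \ H_0\]

Under those assumptions, the data would suggest that the mean mileage is higher after the tune-up than before it.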




Attendance QOTD




Go away