Day 16

Review

A standard normal distribution is a normal distribution with:

  • \(\mu=0\)

  • \(\sigma = 1\)



We use the letter \(Z\) to represent a standard normal random variable (referring to \(z\)-score)

The probability that a standard normal random variable \(Z\) is between \(a\) and \(b\) (\(P(a<Z<b)\)) is equal to the area under the standard normal curve over the interval \([a,b]\)


To find probabilities for the standard Normal distribution, in this class we use the \(z\)-table


  • Reading it isn’t difficult

  • You can put a reminder on your cheat sheet if you so choose:


\[\text{Left Column}=X.X, \ \text{e.g. } 1.2\]


\[\text{Top Row}=0.0X, \ \text{e.g. } 0.06\]


\[L+T=1.26\]
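A table lookup can be double-checked in software. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` (not part of the course materials) and assuming the \(z\)-table reports area to the left of \(z\); the names `Z`, `z`, and `area` are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
Z = NormalDist(mu=0, sigma=1)

# Left-column value 1.2 plus top-row value 0.06 gives z = 1.26
z = 1.2 + 0.06

# Area to the left of z, the quantity a cumulative z-table reports
area = Z.cdf(z)
print(round(area, 4))
```

The printed value should agree with the table entry at row 1.2, column 0.06, up to rounding.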


The difficulty is getting to the point where we read it


Conceptually you need to grasp a couple things:









\[0.8413-0.1586=0.6827\]



\[1-0.6827=0.3173\]



A tool for your toolbox


\[1-0.1586=0.8414 \newline 0.8414-0.1586=0.6828\]
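The areas above can be reproduced numerically. A small sketch, again assuming Python's `statistics.NormalDist` as a stand-in for the \(z\)-table; software carries more decimals than the table, so the last digit may differ from the hand calculation.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal by default (mu=0, sigma=1)

left_of_neg = Z.cdf(-1)   # area to the left of z = -1
left_of_pos = Z.cdf(1)    # area to the left of z = +1

between = left_of_pos - left_of_neg   # P(-1 < Z < 1)
outside = 1 - between                 # area in the two tails combined

# The "tool": by symmetry, the area left of +1 is 1 minus the area left of -1,
# so a single table lookup is enough
left_of_pos_by_symmetry = 1 - left_of_neg
between_by_symmetry = left_of_pos_by_symmetry - left_of_neg

print(round(between, 4), round(outside, 4), round(between_by_symmetry, 4))
```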


What have we done?








Questions?




Goals for today:

  1. Apply the \(z\)-score method for non-standard Normal data

  2. Build a conceptual framework for sampling distributions

  3. Define the Central Limit Theorem




The Normal Distribution


Non-standard Normal Distributions

If \(X\) is a normal random variable with mean \(\mu\) and standard deviation \(\sigma\) we write \(X \sim N(\mu,\sigma^2)\)


\(\sim\) means “is distributed (as)”

\(N()\) refers to the normal distribution


Together we’re saying “\(X\) is distributed normal with mean \(\mu\) and variance \(\sigma^2\)”


  • So if \(\mu=100\) and \(\sigma=5\) we’ll write \(X\sim N(100,5^2)\) or \(X \sim N(100,25)\)


  • If we write \(X \sim N(16,5)\) then \(\mu=16\) and \(\sigma=\sqrt{5}\)
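The variance-versus-standard-deviation distinction matters when moving to software. As a hedged illustration, Python's `statistics.NormalDist` takes the standard deviation as its second argument, not the variance, so the two examples above would be set up as follows (the names `X1` and `X2` are illustrative).

```python
import math
from statistics import NormalDist

# X ~ N(100, 5^2): the second slot in the notation is the variance, so sigma = 5
X1 = NormalDist(mu=100, sigma=5)

# X ~ N(16, 5): the variance is 5, so sigma = sqrt(5)
X2 = NormalDist(mu=16, sigma=math.sqrt(5))

print(X1.stdev, X1.variance)
print(round(X2.stdev, 4), round(X2.variance, 4))
```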


We’ve seen that the standard normal distribution is well understood and we can find probabilities, percentiles, etc. “easily” using a \(z\)-table


If we want to easily learn these things about non-standard random variables it would be convenient if we could transform them into standard normal random variables


Fortunately we can


\[\text{if} \ X \sim N(\mu,\sigma^2), \ \text{then} \ Z={X-\mu \over \sigma} \sim N(0,1)\]



Say \(X=110\)



\[X \sim N(100,10^2) \Rightarrow Z={X-100 \over 10} \sim N(0,1)\]



\[Z={110-100 \over 10} =1\]





\[z={x-\mu \over \sigma}\]
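The formula translates directly into a one-line function. A minimal sketch using the example above (an observation of 110 from a distribution with mean 100 and standard deviation 10); the function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


# X = 110, mu = 100, sigma = 10
z = z_score(110, mu=100, sigma=10)
print(z)
```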


What we’ve done is convert the non-standard normal distribution into a set of \(z\)-scores


  • What do these \(z\)-scores measure?


The process we’ve performed in making this conversion is called standardizing a normal random variable


  • We can do this moving forward to find out information from a non-standard normal r.v. using the \(z\)-tables


\[P(X\leq x)=P\left({X-\mu \over \sigma}\leq{x-\mu \over \sigma}\right)=P(Z\leq z)\]




Suppose that the heights of American men (\(20\) years and older) are approximately normal with a mean of \(70\) inches and a standard deviation of \(4\) inches


  1. What proportion of American men are less than \(6\) feet tall?


  • (\(6\text{’}=72”\))

  • \(X \sim N(70,4^2)\)



\[P(X\leq 72)=P\left(Z\leq{72-70 \over 4}\right)=P(Z\leq 0.5)\]
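To check the standardization step, the probability can be computed two ways: on the standard normal scale after converting to a \(z\)-score, and directly on the height scale. A sketch assuming Python's `statistics.NormalDist`; both routes should give the same answer.

```python
from statistics import NormalDist

mu, sigma = 70, 4            # heights: X ~ N(70, 4^2)
z = (72 - mu) / sigma        # standardize 72 inches

p_via_z = NormalDist().cdf(z)                 # P(Z <= 0.5), the z-table route
p_direct = NormalDist(mu, sigma).cdf(72)      # P(X <= 72), no standardizing

print(z, round(p_via_z, 4), round(p_direct, 4))
```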



  2. What proportion of American men are between 5’ and 6’8” tall?


(\(5\text{’}=60”\) and \(6\text{’}8” = 80”\))


\[P(60<X<80)=P\left({60-70 \over 4}<Z<{80-70 \over 4}\right)\]


We know that for any \(z\)-score, the area to the left of \(-z\) is exactly equal to the area to the right of \(+z\):




So:


\[=P(-2.50<Z<2.50) =1-2\times P(Z<-2.50)\]


\[=1-2(\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ )=\]
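Once you have filled in the blank by hand, the result can be checked numerically. A sketch assuming Python's `statistics.NormalDist` in place of the \(z\)-table; it computes the area both as a difference of two left-tail areas and with the symmetry shortcut.

```python
from statistics import NormalDist

Z = NormalDist()             # standard normal
mu, sigma = 70, 4

z_low = (60 - mu) / sigma    # -2.5
z_high = (80 - mu) / sigma   # +2.5

# Difference of two left-tail areas
p_difference = Z.cdf(z_high) - Z.cdf(z_low)

# Symmetry shortcut: remove both (equal) tails from the total area of 1
p_symmetry = 1 - 2 * Z.cdf(z_low)

print(round(p_difference, 4), round(p_symmetry, 4))
```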



Given our height example (\(X \sim N(70,4^2)\)), how tall would you have to be so that you are taller than \(90\%\) of American men?


Refer to your \(z\)-table


Find the closest value to \(0.90\)


“Un-standardize” the value:


\[x=\sigma z + \mu=4( \ \ \ \ \ \ \ \ \ \ \ \ \ ) + 70 =\]
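Working backwards from an area to a \(z\)-score is an inverse lookup. A sketch assuming Python's `statistics.NormalDist`, whose `inv_cdf` method plays the role of reading the table "inside out". The table gives the closest tabulated \(z\), so a hand answer may differ slightly from the software answer of about 75.1 inches.

```python
from statistics import NormalDist

mu, sigma = 70, 4

# z-score with 90% of the area to its left
z_90 = NormalDist().inv_cdf(0.90)

# Un-standardize: x = sigma * z + mu
x_90 = sigma * z_90 + mu

# Same answer directly on the height scale
x_direct = NormalDist(mu, sigma).inv_cdf(0.90)

print(round(z_90, 2), round(x_90, 2), round(x_direct, 2))
```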


What is an interval of two heights that contains approximately \(50\%\) of American men?



Find a value on your \(z\)-table that is approximately \(0.25\) or \(0.75\)


  • Why these values?



We should end up with \(z=0.67\) for an area of \(0.75\)

  • For an area of \(0.25\) it should be \(z=-0.67\)


\[x=\sigma (-z) + \mu=4(-0.67) + 70 =67.32\]


\[x=\sigma z + \mu=4(0.67) + 70 =72.68\]
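The same interval can be found in software as a check. A sketch assuming Python's `statistics.NormalDist`; because it does not round \(z\) to two decimals, the endpoints come out near 67.3 and 72.7 rather than exactly matching the table-based values.

```python
from statistics import NormalDist

mu, sigma = 70, 4
heights = NormalDist(mu, sigma)

# Cut off 25% in each tail to leave the middle 50%
z_lower = NormalDist().inv_cdf(0.25)
z_upper = NormalDist().inv_cdf(0.75)

lower = sigma * z_lower + mu
upper = sigma * z_upper + mu

# Confirm the interval really holds half of the distribution
coverage = heights.cdf(upper) - heights.cdf(lower)

print(round(z_lower, 2), round(z_upper, 2))
print(round(lower, 2), round(upper, 2), round(coverage, 2))
```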




Sampling Distribution of the Sample Mean & Central Limit Theorem

Let’s remember some core vocabulary:


  • Population: The entire collection of individuals we’re seeking information from


  • Sample: A subset of a population from which we can gather real observations


  • Parameter: A value derived from a population


  • Statistic: A value derived from a sample


Realistically we will never quantify a parameter directly from a population


  • The major goal of the statistical sciences is to make inference about a population and its parameters by gathering a sample and deriving statistics


In practice:


Start with a research question


  • “How effective are seasonal Influenza vaccine campaigns in Kansas?”


\[ \begin{array}{|c|c|c|c|} \hline \text{Population} & \text{Parameter} & \text{Sample} & \text{Statistic } \\ \hline \text{Kansas Residents} & p_V & 10 \ \text{Kansas Towns} & \hat{p}_V\\ \hline \\ \hline \\ \hline \\ \hline \end{array} \]




Business Week reported on the cost per treatment of Herceptin, a drug used to treat breast cancer. Typical treatment costs (in dollars) for Herceptin are provided by a simple random sample of 5 patients.


\[ \begin{array}{|c|c|c|c|c|} \hline 4376 & 5578 & 2717 & 4920 & 4495 \\ \hline \end{array} \]


Find a number that can be used as an estimate of the mean cost per treatment with Herceptin.
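The natural estimate of a population mean is the sample mean \(\bar{x}\). A short sketch of the arithmetic using Python's standard-library `statistics.mean`; the variable names are illustrative.

```python
from statistics import mean

# Treatment costs (dollars) for the 5 sampled patients
costs = [4376, 5578, 2717, 4920, 4495]

# Sample mean: sum of the observations divided by the sample size
x_bar = mean(costs)

print(sum(costs), len(costs), x_bar)
```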



Suppose we are interested in determining the average time (in minutes) it takes K-State students to travel to their hometowns.


  • We take a simple random sample of 100 K-State students, ask each selected student how long it takes to travel home, and then compute the sample mean:


\[\bar{x} = 91.34\]


Suppose we take another sample of 100 K-State students. This time our sample mean is:


\[\bar{x} = 89.63\]


  • If we view taking a random sample as an experiment, then the sample mean \(\bar{x}\) is a numerical value assigned to each outcome of the experiment.


As we’ve discussed previously, our sample mean \(\bar{x}\) is a random variable


When a value arises from a sample, a limited subset of the population, its value will vary each time our sample changes



So all statistics derived from a sample are random variables
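A small simulation can make this concrete. The sketch below is illustrative only: the population is made up (right-skewed travel times with a mean near 90 minutes) and is not real K-State data, and the seed and sample counts are arbitrary choices.

```python
import random
from statistics import mean

random.seed(16)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL population: 20,000 made-up travel times in minutes
population = [random.expovariate(1 / 90) for _ in range(20_000)]

# Repeat the "experiment" five times: sample 100 students, average their times
sample_means = []
for _ in range(5):
    sample = random.sample(population, 100)
    sample_means.append(round(mean(sample), 2))

print(sample_means)  # one x-bar per sample
```

Each pass through the loop is a new sample from the same population, yet it produces its own value of \(\bar{x}\).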


This is a fundamental concept to grasp for all of statistics:


All random variables have a probability distribution


As all statistics are random variables:


All statistics arise from a probability distribution


We refer to the probability distribution of a sample statistic as the sampling distribution


  • We’re going to look at this through the lens of the sample mean


Let \(\bar{x}\) be the mean of a random sample of size \(n\), drawn from a population with mean \(\mu\) and standard deviation \(\sigma\)


Since \(\bar{x}\) is a random variable, it has a mean and a standard deviation


The mean of \(\bar{x}\) is \(\mu\)


\[\mu_{\bar{x}} = \mu = \text{population mean}\]


The standard deviation of \(\bar{x}\) is \(\sigma / \sqrt{n}\)


\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\]



a. A population has mean \(\mu = 6\) and standard deviation \(\sigma = 4\). Find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) for a sample size of \(n = 25\)



b. A population has mean \(\mu = 17\) and standard deviation \(\sigma = 20\). Find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) for a sample size of \(n = 100\)
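After working (a) and (b) by hand, the two formulas can be checked with a few lines of code. A minimal sketch; the function name `sampling_mean_and_sd` is illustrative.

```python
import math


def sampling_mean_and_sd(mu, sigma, n):
    """Return (mean of x-bar, standard deviation of x-bar) for sample size n."""
    return mu, sigma / math.sqrt(n)


part_a = sampling_mean_and_sd(mu=6, sigma=4, n=25)
part_b = sampling_mean_and_sd(mu=17, sigma=20, n=100)

print(part_a)
print(part_b)
```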



The mean and standard deviation of the sample mean \(\bar{x}\) are


\[\mu_{\bar{x}} = \mu\]


\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\]
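These two formulas can be checked by simulation. The sketch below assumes, purely for illustration, a normal population with the values from exercise (a); the number of repeated samples and the seed are arbitrary choices. The simulated mean and standard deviation of \(\bar{x}\) should land close to \(\mu\) and \(\sigma/\sqrt{n}\), not exactly on them.

```python
import math
import random
from statistics import mean, stdev

random.seed(16)  # fixed seed so the sketch is repeatable

mu, sigma, n = 6, 4, 25   # values from exercise (a)
reps = 5_000              # number of repeated samples (arbitrary)

# ASSUMPTION: a normal population, drawn with random.gauss
x_bars = [
    mean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

simulated_mean = mean(x_bars)    # compare with mu
simulated_sd = stdev(x_bars)     # compare with sigma / sqrt(n)
theory_sd = sigma / math.sqrt(n)

print(round(simulated_mean, 2), round(simulated_sd, 2), theory_sd)
```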


This is true even when the true values of \(\mu\) and \(\sigma\) are unknown


This is how we make inference about population parameters with only sample statistics


We know the values of two parameters associated with the sampling distribution of \(\bar{x}\)

  • To fully understand its distribution, we also need to know its shape

  • Accessing all of this information is typically done through something called an exploratory analysis




Attendance QOTD




Go away