Day 16

Review

A standard normal distribution is a normal distribution with:

\(\mu=0\)
\(\sigma = 1\)

We use the letter \(Z\) to represent a standard normal random variable (referring to \(z\)-score)

The probability that a standard normal random variable \(Z\) is between \(a\) and \(b\) (\(P(a<Z<b)\)) is equal to the area under the standard normal curve over the interval \([a,b]\)

To make inference about the standard Normal distribution, for this class we use the \(z\)-table

Reading it isn’t difficult
You can put a reminder on your cheat sheet if you so choose:

\[\text{Left Column}=\text{ones and tenths digits}, \ \text{e.g.} \ 1.2\]

\[\text{Top Row}=\text{hundredths digit}, \ \text{e.g.} \ 0.06\]

\[L+T=1.2+0.06=1.26\]

The difficulty is getting to the point where we read it

Conceptually you need to grasp a couple things:

\[0.8413-0.1586=0.6827\]

\[1-0.6827=0.3173\]

A tool for your toolbox

\[1-0.1586=0.8414 \newline 0.8414-0.1586=0.6828\]
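The table arithmetic above can be cross-checked in software. This is a minimal sketch, assuming Python 3.8 or later, that uses `statistics.NormalDist` from the standard library in place of the printed \(z\)-table. The variable names are illustrative, and software values may differ from a four-digit table in the last decimal place.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal distribution

left_of_minus_1 = Z.cdf(-1.0)   # area to the left of z = -1
left_of_plus_1 = Z.cdf(1.0)     # area to the left of z = +1

# Area between -1 and +1: subtract one left-tail area from the other
between = left_of_plus_1 - left_of_minus_1

# Symmetry tool: the area to the right of +1 equals the area to the left of -1
right_of_plus_1 = 1 - left_of_plus_1

print(round(left_of_minus_1, 4), round(left_of_plus_1, 4), round(between, 4))
```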

What have we done?

Questions?

Goals for today:

Apply the \(z\)-score method for non-standard Normal data
Build a conceptual framework for sampling distributions
Define the Central Limit Theorem

The Normal Distribution

Non-standard Normal Distributions

If \(X\) is a normal random variable with mean \(\mu\) and standard deviation \(\sigma\) we write \(X \sim N(\mu,\sigma^2)\)

\(\sim\) means “is distributed (as)”

\(N()\) refers to the normal distribution

Together we’re saying “\(X\) is distributed normal with mean \(\mu\) and variance \(\sigma^2\)”

So if \(\mu=100\) and \(\sigma=5\) we’ll write \(X\sim N(100,5^2)\) or \(X \sim N(100,25)\)

If we write \(X \sim N(16,5)\) then \(\mu=16\) and \(\sigma=\sqrt{5}\)
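The variance-versus-standard-deviation distinction matters as soon as the notation is typed into software. As a hedged illustration (Python's `statistics.NormalDist` is our choice here, not something the course requires), that class expects the standard deviation as its second argument, so the variance from the \(N(16,5)\) notation has to be square-rooted first:

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(16, 5): the second number is the variance, so sigma = sqrt(5)
X = NormalDist(mu=16, sigma=sqrt(5))

print(X.mean, round(X.stdev, 4), round(X.variance, 4))
```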

We’ve seen that the standard normal distribution is well understood and we can find probabilities, percentiles, etc. “easily” using a \(z\)-table

If we want to easily learn these things about non-standard random variables it would be convenient if we could transform them into standard normal random variables

Fortunately we can

\[\text{if} \ X \sim N(\mu,\sigma^2), \ \text{then} \ Z={X-\mu \over \sigma} \sim N(0,1)\]

Say \(X=110\)

\[X \sim N(100,10^2) \Rightarrow Z={X-100 \over 10} \sim N(0,1)\]

\[Z={110-100 \over 10} =1\]

\[z={x-\mu \over \sigma}\]

What we’ve done is convert the non-standard normal distribution into a set of \(z\)-scores

What do these \(z\)-scores measure?

The process we’ve performed in making this conversion is called standardizing a normal random variable

We can do this moving forward to find out information from a non-standard normal r.v. using the \(z\)-tables

\[P(X\leq x)=P\left({X-\mu \over \sigma}\leq{x-\mu \over \sigma}\right)=P(Z\leq z)\]
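A short sketch of the standardizing step, assuming the \(X=110\) example with \(\mu=100\) and \(\sigma=10\). The helper name `z_score` is made up for illustration. It checks that the probability found through the \(z\)-score matches the probability computed directly on the non-standard distribution.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from mu."""
    return (x - mu) / sigma

mu, sigma = 100, 10   # sigma is the standard deviation, not the variance
x = 110

z = z_score(x, mu, sigma)
p_via_z = NormalDist().cdf(z)             # P(Z <= z) on the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale

print(z, round(p_via_z, 4), round(p_direct, 4))
```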

Suppose that the heights of American men (\(20\) years and older) are approximately normal with a mean of \(70\) inches and a standard deviation of \(4\) inches

What proportion of American men are less than \(6\) feet tall?

(\(6\)’ \(=\) \(72\)”)
\(X \sim N(70,4^2)\)

\[P(X\leq 72)=P\left(Z\leq{72-70 \over 4}\right)=P(Z\leq 0.5)\]
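To check a table lookup for this question, the same calculation can be sketched in Python. The standard-library `NormalDist` stands in for the \(z\)-table here, and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 70, 4       # heights in inches, from the example
z = (72 - mu) / sigma   # standardize 6 feet = 72 inches

p_less_than_6ft = NormalDist().cdf(z)   # P(Z <= 0.5)

print(z, round(p_less_than_6ft, 4))
```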

What proportion of American men are between 5’ and 6’ 8” tall?

(\(5\text{’}=60”\) and \(6\text{’}8” = 80”\))

\[P(60<X<80)=P\left({60-70 \over 4}<Z<{80-70 \over 4}\right)\]

We know that for any \(z\)-score, the area to the left of \(-z\) is exactly equal to the area to the right of \(+z\), so the area between them is \(1\) minus twice one tail:

So:

\[=P(-2.50<Z<2.50) =1-2\times P(Z<-2.50)\]

\[=1-2(\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ )=\]
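One way to confirm the symmetry shortcut \(P(-z<Z<z)=1-2P(Z<-z)\) is to compute the area both ways. The sketch below, which again assumes Python's `statistics.NormalDist` as a stand-in for the table, compares the direct subtraction with the shortcut.

```python
from statistics import NormalDist

Z = NormalDist()
z_low = (60 - 70) / 4    # -2.5
z_high = (80 - 70) / 4   #  2.5

direct = Z.cdf(z_high) - Z.cdf(z_low)   # subtract the two left-tail areas
via_symmetry = 1 - 2 * Z.cdf(z_low)     # one minus both tails

print(round(direct, 4), round(via_symmetry, 4))
```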

Given our height example (\(X \sim N(70,4^2)\)), how tall would you have to be so that you are taller than \(90\%\) of American men?

Refer to your \(z\)-table

Find the closest value to \(0.90\)

“un-standardize” the value:

\[x=\sigma z + \mu=4( \ \ \ \ \ \ \ \ \ \ \ \ \ ) + 70 =\]
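Going from a probability back to a height can also be checked in code. In this sketch `inv_cdf` plays the role of reading the \(z\)-table "backwards". It returns more decimal places than the table, so the final height can differ slightly from a hand calculation that rounds \(z\) to two decimals.

```python
from statistics import NormalDist

mu, sigma = 70, 4
z_90 = NormalDist().inv_cdf(0.90)   # z-score with 90% of the area to its left
x_90 = sigma * z_90 + mu            # un-standardize: x = sigma * z + mu

print(round(z_90, 2), round(x_90, 2))
```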

What is an interval of two heights that contains approximately \(50\%\) of American men?

Find a value on your \(z\)-table that is approximately \(0.25\) or \(0.75\)

Why these values?

We should end up with \(0.67\) at \(0.75\)

at \(0.25\) it should be \(-0.67\)

\[x=\sigma (-z) + \mu=4(-0.67) + 70 =67.32\]

\[x=\sigma z + \mu=4(0.67) + 70 =72.68\]
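As a rough check on the interval, the 25th and 75th percentiles can be computed directly on the height scale. This sketch assumes `statistics.NormalDist` and works with unrounded \(z\)-scores (about \(\pm 0.6745\)), so the endpoints land near, but not exactly on, the table-based values.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=4)

lower = heights.inv_cdf(0.25)   # 25th percentile
upper = heights.inv_cdf(0.75)   # 75th percentile

# The area between the two endpoints should be 0.50
middle_area = heights.cdf(upper) - heights.cdf(lower)

print(round(lower, 2), round(upper, 2), round(middle_area, 2))
```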

Sampling Distribution of Sample Mean & Central Limit

Let’s remember some core vocabulary:

Population: The entire collection of individuals we’re seeking information from

Sample: A subset of a population from which we can gather real observations

Parameter: A value derived from a population

Statistic: A value derived from a sample

Realistically we will never quantify a parameter directly from a population

The major goal of the statistical sciences is to make inference about a population and its parameters by gathering a sample and deriving statistics

In practice:

Start with a research question

“How effective are seasonal Influenza vaccine campaigns in Kansas?”

\[ \begin{array}{|c|c|c|c|} \hline \text{Population} & \text{Parameter} & \text{Sample} & \text{Statistic } \\ \hline \text{Kansas Residents} & p_V & 10 \ \text{Kansas Towns} & \hat{p}_V\\ \hline \\ \hline \\ \hline \\ \hline \end{array} \]

Business Week reported on the cost per treatment of Herceptin, a drug used to treat breast cancer. Typical treatment costs (in dollars) for Herceptin are provided by a simple random sample of 5 patients.

\[ \begin{array}{|c|c|c|c|c|} \hline 4376 & 5578 & 2717 & 4920 & 4495 \\ \hline \end{array} \]

Find a number that can be used as an estimate of the mean cost per treatment with Herceptin.
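The natural estimate here is the sample mean \(\bar{x}\). A minimal sketch of the arithmetic, using the five costs listed above:

```python
costs = [4376, 5578, 2717, 4920, 4495]   # dollars, from the sample of 5 patients

x_bar = sum(costs) / len(costs)          # sample mean

print(x_bar)
```

The sum of the five costs is \(22086\), so \(\bar{x} = 22086/5 = 4417.2\) dollars.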

Suppose we are interested in determining the average time (in minutes) it takes K-State students to travel to their hometowns.

We take a simple random sample of 100 K-State students, ask each selected student how long it takes to travel home, and then compute the sample mean:

\[\bar{x} = 91.34\]

Suppose we take another sample of 100 K-State students. This time our sample mean is:

\[\bar{x} = 89.63\]

If we view taking a random sample as an experiment, then the sample mean \(\bar{x}\) is a numerical value assigned to each outcome of the experiment.

We’ve discussed this previously: our sample mean, \(\bar{x}\), is a random variable

When our value arises from a sample, a limited subset of the population, its value will vary each time our sample changes

So all statistics derived from a sample are random variables

This is a fundamental concept to grasp for all of statistics:

All random variables have a probability distribution

As all statistics are random variables:

All statistics arise from a probability distribution

We refer to the probability distribution of a sample statistic as the sampling distribution

We’re going to look at this through the lens of the sample mean

Let \(\bar{x}\) be the mean of a random sample of size \(n\), drawn from a population with mean \(\mu\) and standard deviation \(\sigma\)

Since \(\bar{x}\) is a random variable, it has a mean and a standard deviation

The mean of \(\bar{x}\) is \(\mu\)

\[\mu_{\bar{x}} = \mu = \text{population mean}\]

The standard deviation of \(\bar{x}\) is \(\sigma / \sqrt{n}\)

\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\]

a. A population has mean \(\mu = 6\) and standard deviation \(\sigma = 4\). Find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) for a sample size of \(n = 25\)

b. A population has mean \(\mu = 17\) and standard deviation \(\sigma = 20\). Find \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\) for a sample size of \(n = 100\)
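Both exercises apply the same two formulas, which can be sketched as a small helper. The function name `sampling_mean_and_sd` is made up for illustration.

```python
from math import sqrt

def sampling_mean_and_sd(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for sample size n."""
    return mu, sigma / sqrt(n)

a = sampling_mean_and_sd(mu=6, sigma=4, n=25)      # exercise a
b = sampling_mean_and_sd(mu=17, sigma=20, n=100)   # exercise b

print(a, b)
```

For (a) this gives \(\mu_{\bar{x}} = 6\) and \(\sigma_{\bar{x}} = 4/5 = 0.8\); for (b), \(\mu_{\bar{x}} = 17\) and \(\sigma_{\bar{x}} = 20/10 = 2\).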

The mean and standard deviation of the sample mean \(\bar{x}\) are

\[\mu_{\bar{x}} = \mu\]

\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\]

This is true even when the true values of \(\mu\) and \(\sigma\) are unknown
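A simulation can make these two facts feel less abstract. The sketch below is only an illustration under stated assumptions: it pretends the population is normal with \(\mu = 6\) and \(\sigma = 4\) (values borrowed from exercise a; the normal shape is our assumption), draws many samples of size \(n = 25\), and looks at how the resulting sample means behave. The seed and the number of repeated samples are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(16)    # fixed seed so the run is repeatable
mu, sigma, n = 6, 4, 25    # assumed population values, from exercise a
num_samples = 10_000       # arbitrary number of repeated samples

sample_means = []
for _ in range(num_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]   # one random sample
    sample_means.append(sum(sample) / n)                # its sample mean

mean_of_means = mean(sample_means)   # should land close to mu = 6
sd_of_means = stdev(sample_means)    # should land close to sigma / sqrt(n) = 0.8

print(round(mean_of_means, 2), round(sd_of_means, 2))
```

Each pass through the loop is one "experiment", so the list `sample_means` is an approximation of the sampling distribution of \(\bar{x}\).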

This is how we make inference about population parameters with only sample statistics

We know the values of two parameters associated with the sampling distribution of \(\bar{x}\)

To fully understand its distribution, we also need to know its shape
Accessing all of this information is typically done through something called an exploratory analysis


Attendance QOTD

Go away