\[ \begin{array}{|c|cccccccccccccccccccc|} \hline \text{Age} & 67 & 52 & 57 & 56 & 60 & 42 & 58 & 53 & 52 & 39 & 54 & 50 & 62 & 38 & 54 & 48 & 64 & 69 & 53 & 62 \\ \hline \text{Cholesterol} & 229 & 196 & 241 & 283 & 253 & 265 & 218 & 282 & 205 & 219 & 304 & 196 & 263 & 175 & 214 & 245 & 325 & 239 & 264 & 209 \\ \hline \end{array} \]
Day 9
Review
Sample Mean
\[\bar{x} = {1 \over n}\sum\limits_{i=1}^nx_i\]
\[\bar{y} = {1 \over n}\sum\limits_{i=1}^ny_i\]
Sample Variance
\[s_x^2 = \frac{\sum\limits_{i=1}^n(x_i-\bar{x})^2}{n-1}\]
\[s_y^2 = \frac{\sum\limits_{i=1}^n(y_i-\bar{y})^2}{n-1}\]
Sample Standard Deviation
\[\sqrt{s_x^2}=s_x\]
\[\sqrt{s_y^2}=s_y\]
Correlation Coefficient
Given \(n\) ordered pairs: (\(x_i,y_i\))
With sample means: \(\bar{x}\) and \(\bar{y}\)
Sample standard deviations: \(s_x\) and \(s_y\)
The correlation coefficient \(r\) is given by:
\[r = \frac{1}{n-1} \sum_i \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)\]
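As a sketch of how these review formulas fit together (my own illustration, not part of the slides), the Python below applies the sample mean, sample standard deviation, and correlation formulas to the Age/Cholesterol data in the table above; the variable names are mine.

```python
import math

# Age (x) and Cholesterol (y) values from the table above
x = [67, 52, 57, 56, 60, 42, 58, 53, 52, 39, 54, 50, 62, 38, 54, 48, 64, 69, 53, 62]
y = [229, 196, 241, 283, 253, 265, 218, 282, 205, 219, 304, 196, 263, 175, 214, 245, 325, 239, 264, 209]
n = len(x)

# Sample means
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (n - 1 in the denominator)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation coefficient r
r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)

print(x_bar, y_bar, s_x, s_y, r)
```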
Correlation Coefficient Properties:
- The value always satisfies \(-1 \le r \le 1\)
  - If \(r=1\), all of the data falls on a line with a positive slope
  - If \(r=-1\), all of the data falls on a line with a negative slope
  - The closer \(r\) is to 0, the weaker the linear relationship between \(x\) and \(y\)
  - If \(r=0\), no linear relationship exists
- The correlation does not depend on the unit of measurement for the two variables
- Correlation is very sensitive to outliers
- Correlation measures only the linear relationship and may not (by itself) detect a nonlinear relationship
Cholesterol Example
We can draw a line through this data:
We come up with this line using Least Squares Regression:
Given ordered pairs: (\(x,y\))
With sample means: \(\bar{x}\) and \(\bar{y}\)
Sample standard deviations: \(s_x\) and \(s_y\)
Correlation coefficient: \(r\)
The equation of the least-squares regression line for predicting \(y\) from \(x\) is:
\[\hat{y}=\beta_0+\beta_1x\]
Where the slope (\(\beta_1\)) is:
\[\beta_1 = r \cdot \frac{s_y}{s_x}\]
And the intercept (\(\beta_0\)) is:
\[\beta_0=\bar{y}-\beta_1 \bar{x}\]
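To make the slope and intercept formulas concrete, here is a short Python sketch (mine, not from the slides) that computes \(\beta_1\) and \(\beta_0\) for the Age/Cholesterol data, this time letting NumPy supply the summary statistics; the prediction for a 55-year-old at the end is an arbitrary illustration.

```python
import numpy as np

# Age (x) and Cholesterol (y) values from the table at the top of these notes
x = np.array([67, 52, 57, 56, 60, 42, 58, 53, 52, 39, 54, 50, 62, 38, 54, 48, 64, 69, 53, 62], dtype=float)
y = np.array([229, 196, 241, 283, 253, 265, 218, 282, 205, 219, 304, 196, 263, 175, 214, 245, 325, 239, 264, 209], dtype=float)

# Summary statistics (ddof=1 gives the n - 1 denominator)
x_bar, y_bar = x.mean(), y.mean()
s_x, s_y = x.std(ddof=1), y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]

# Least-squares slope and intercept
beta_1 = r * s_y / s_x
beta_0 = y_bar - beta_1 * x_bar

# Predicted cholesterol (y-hat) for, e.g., a 55-year-old
y_hat_55 = beta_0 + beta_1 * 55
print(beta_0, beta_1, y_hat_55)
```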
- What do we call \(y\)?
- \(x\)?
- \(\beta_0\)?
- \(\beta_1\)?
- \(\hat{y}\)?
- How do we interpret \(\beta_0\) and \(\beta_1\)?
Questions?
Goals for Today:
Define and discuss mathematical modeling
Define statistical models and the linear model
Make predictions about some birds
Linear Regression
Mathematical Modeling
What is a mathematical model?
A simplified mathematical representation of an existing system
Think back to LSR
\[Y=\beta_0+\beta_1X\]
Least Squares is a mathematical model
- We’re taking observed coordinates and running them through a function that produces a simplified representation of those coordinates
\[y=mx+b\]
\[y=b+ax\]
Mathematical modeling comes in many forms:
Quadratic/Polynomial regression
Quantile regression
Categorical/Ordinal modeling
Differential/Difference equations
Network/Graph models
None of these are used as commonly as the OLS/LSR model
Why?
Statistical Modeling
“All models are wrong, but some are useful” - George Box
Where there is OLS/LSR, there is the Linear Model
\[y_i=\beta_0+\beta_1x_i+\epsilon_i\] \[\epsilon_i\sim N(0,\sigma^2)\]
\(\epsilon_i\) (error) is what separates the mathematical model from the statistical model
Roughly, this is the result of combining the major goal of OLS/LSR with the major assumption of the linear model:
Minimizing the sum of squared errors (residuals)
Assumption of normality in the residuals
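A minimal simulation sketch (my own, assuming NumPy; the parameter values and seed are arbitrary) of what the linear model asserts: responses are a line plus \(N(0,\sigma^2)\) noise, and least squares estimates that line by minimizing the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

# True parameters for the simulation (arbitrary illustrative choices)
beta_0, beta_1, sigma = 2.0, 0.5, 1.0

# Generate data from the linear model: y_i = beta_0 + beta_1 * x_i + eps_i, eps_i ~ N(0, sigma^2)
x = np.linspace(0, 10, 50)
eps = rng.normal(0.0, sigma, size=x.size)
y = beta_0 + beta_1 * x + eps

# Least-squares fit (np.polyfit minimizes the sum of squared residuals; returns slope, intercept)
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

# Residuals: under the model's assumption these should look roughly N(0, sigma^2)
residuals = y - (b0_hat + b1_hat * x)
print(b0_hat, b1_hat, residuals.std(ddof=2))
```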
Do we see any issues with this process?
\[ \begin{array}{|c|c|} \hline \text{Year} & \text{Count}\\ \hline 1980 & 78 \\ \hline 1981 & 73 \\ \hline 1982 & 73 \\ \hline 1983 & 75 \\ \hline 1984 & 86 \\ \hline 1985 & 97 \\ \hline 1986 & 110 \\ \hline 1987 & 134 \\ \hline 1988 & 138 \\ \hline 1989 & 146 \\ \hline 1990 & 146 \\ \hline \end{array} \]
\[ \begin{array}{|c|c|c|c|c|c|} \hline \text{n} & \bar{x} & \bar{y} & s_x & s_y & r\\ \hline \quad \quad & \quad \quad \quad & \quad \quad \quad & \quad \quad \quad & \quad \quad \quad & \quad \quad \quad\\ \hline \end{array} \]
\[r = \frac{1}{n-1} \sum_i \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)\]
\[\beta_1 = r \cdot \frac{s_y}{s_x}\]
\[\beta_0=\bar{y}-\beta_1 \bar{x}\]
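For the bird-count data above, this sketch (mine, not part of the slides; the 1995 prediction year is an arbitrary example) computes the blank summary-table quantities and uses the resulting line to predict a future count.

```python
import numpy as np

# Year (x) and Count (y) from the table above
year = np.arange(1980, 1991, dtype=float)   # 1980 through 1990
count = np.array([78, 73, 73, 75, 86, 97, 110, 134, 138, 146, 146], dtype=float)

# Quantities for the summary table: n, x-bar, y-bar, s_x, s_y, r
n = year.size
x_bar, y_bar = year.mean(), count.mean()
s_x, s_y = year.std(ddof=1), count.std(ddof=1)
r = np.corrcoef(year, count)[0, 1]

# Least-squares line and an (extrapolated) prediction
beta_1 = r * s_y / s_x
beta_0 = y_bar - beta_1 * x_bar
y_hat_1995 = beta_0 + beta_1 * 1995  # extrapolation beyond 1990; interpret with caution

print(n, x_bar, y_bar, s_x, s_y, r, beta_0, beta_1, y_hat_1995)
```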
Attendance QOTD
Go away