STAT 240 - Fall 2025
Colorado has seen a sharp drop in divorces over the past two decades
What are they doing to produce such results?!
Spurious correlation: two or more variables that share a trend due to an unknown cause or a shared underlying cause
\[ \begin{aligned} \text{Let } & X \equiv \{\text{Milk Consumption}\} \\ & Y \equiv \{\text{Divorce Rate}\} \\ \\ & x_i = \text{pounds of milk consumed in year } i \\ & y_i = \text{divorce rate in year } i \\ \\ & (x_i, y_i) \text{ is an ordered pair of bivariate data} \\ & (x_1, y_1), \ldots, (x_n, y_n) \text{ are points plotted on a scatterplot} \end{aligned} \]
Linear association: the relationship between two variables can be reasonably described by a straight line
Positive association: larger values of one variable relate to larger values of the other
Strength of association: how closely the data points cluster around a straight line
Negative association: larger values of one variable relate to smaller values of the other
When the points show no trend (for example, \(y\) stays flat as \(x\) changes), the variables are said to lack association; when the trend follows a curve rather than a straight line, the association is called nonlinear
\[ z_{x_i} = \frac{x_i - \bar x}{s_x}, \qquad z_{y_i} = \frac{y_i - \bar y}{s_y} \]
\[ r = \frac{\sum_i z_{x_i} \, z_{y_i}}{n-1} \]
\[ r = \frac{1}{n-1}\sum_i \left(\frac{x_i - \bar x}{s_x} \right) \left(\frac{y_i - \bar y}{s_y} \right) \]
If \(r = 1\), all of the data points fall exactly on a line with a positive slope
If \(r = -1\), all of the data points fall exactly on a line with a negative slope
As \(r\) moves toward 0 from either direction, the linear relationship between \(x\) and \(y\) weakens
If \(r = 0\), there is no linear relationship (a nonlinear relationship may still exist)
As a rule of thumb, \(-0.6<r<0.6\) is considered a weak relationship
Correlation does not depend on the units of measurement
Correlation is sensitive to outliers
Correlation cannot capture nonlinear relationships
\[ \begin{array}{|c|c|c|c|c|} \hline \text{Min} & \text{Q}_1 & \text{Median} & \text{Q}_3 & \text{Max} \\ \hline 134 & 154 & 177 & 185 & 197 \\ \hline \end{array} \]
Find the five values in the five-number summary
Compute the IQR: \(\text{IQR} = \text{Q}_3 - \text{Q}_1\)
Find the lower and upper bounds for outliers
Draw a number line to represent the scale
Above the number line, draw a box with one end at \(\text{Q}_1\) and the other at \(\text{Q}_3\)
Draw horizontal lines (“whiskers”) from the box out to the smallest and largest observations that lie within the lower and upper outlier bounds
Plot observations outside the bounds with a “star” (*) to identify them as outliers
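The steps above can be sketched in Python. This assumes the common 1.5 × IQR rule for the outlier bounds (the multiplier is not stated above) and computes quartiles with `statistics.quantiles(..., method="inclusive")`; hand methods or other software may give slightly different quartiles. The function names `fences` and `boxplot_parts` are hypothetical.

```python
import statistics

def fences(q1, q3, k=1.5):
    """Lower and upper outlier bounds: Q1 - k*IQR and Q3 + k*IQR (k = 1.5 assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def boxplot_parts(data, k=1.5):
    """Pieces needed to draw a box plot from raw data."""
    values = sorted(data)
    # Quartile convention is an assumption; other methods can differ slightly
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = fences(q1, q3, k)
    inside = [v for v in values if lower <= v <= upper]
    return {
        "five_number": (values[0], q1, median, q3, values[-1]),
        "iqr": q3 - q1,
        "fences": (lower, upper),
        "whiskers": (inside[0], inside[-1]),  # smallest/largest non-outliers
        "outliers": [v for v in values if v < lower or v > upper],  # plot as *
    }

# Bounds for the summary in the table (Q1 = 154, Q3 = 185)
print(fences(154, 185))  # (107.5, 231.5): min 134 and max 197 are inside, so no outliers

# Toy raw data (made up) with one obvious outlier
parts = boxplot_parts([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(parts["iqr"], parts["fences"], parts["whiskers"], parts["outliers"])
# 4.5 (-3.5, 14.5) (1, 9) [50]
```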
Correlation doesn’t depend on unit of measure
It also can’t consider them
What other variable do \(x\) and \(y\) change with?