STAT 240 - Fall 2025
Deer body weights
\[ \begin{array}{|c|c|c|c|c|} \hline \text{Sex} & \text{F} & \text{M} & \text{F} & \text{M} \\ \hline \text{Body Mass (kg)} & 45.8 & 65.8 & 44.5 & 71.2\\ \hline \text{Sex} & \text{F} & \text{M} & \text{M} & \text{M} \\ \hline \text{Body Mass (kg)} & 42.2 & 68.9 & 61.7 & 19.5 \\ \hline \end{array} \]
\[\bar x = \frac{1}{n}\sum_{i=1}^nx_i\]
\[\mu = \frac{1}{N}\sum_{i=1}^N x_i\]
“Average”
Most common measure of center
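A minimal Python sketch of the sample mean \(\bar x\), applied to the eight deer body masses from the table and treating them as a sample. The helper name `sample_mean` is hypothetical; the standard library's `statistics.mean` returns the same value.

```python
# Body masses (kg) from the deer table, in the order listed
masses = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

def sample_mean(xs):
    """x-bar = (1/n) * (sum of the x_i)."""
    return sum(xs) / len(xs)

x_bar = sample_mean(masses)
print(round(x_bar, 2))  # 52.45
```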
Middle value of the data once it is ordered
Organize data from lowest to highest value
If \(n\) is odd: the median is the value at position \(\frac{n+1}{2}\) in the ordered data set
If \(n\) is even: average the two values at positions \(\frac{n}{2}\) and \(\frac{n}{2}+1\)
\[ \begin{array}{|c|c|c|c|c|} \hline \xcancel{1.23} & \xcancel{1.25} & 1.4 & \xcancel{1.45} & \xcancel{1.92} \\ \hline \end{array} \]
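The ordered-position rule for the median can be sketched in Python as below. The function name `median` is our own (it shadows nothing in the slides); positions in the rule are 1-based, so the code subtracts 1 to get Python's 0-based indices. `statistics.median` from the standard library should agree.

```python
def median(xs):
    """Median via the ordered-position rule (positions are 1-based)."""
    ordered = sorted(xs)  # step 1: order from lowest to highest
    n = len(ordered)
    if n % 2 == 1:
        # n odd: take the value at position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]  # -1 converts to a 0-based index
    # n even: average the values at positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([1.23, 1.25, 1.4, 1.45, 1.92]))  # 1.4

# The eight deer masses (n = 8, even): averages positions 4 and 5
deer = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
print(median(deer))
```

For the deer data the two middle ordered values are 45.8 and 61.7, so the median is about 53.75 kg.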
\[ \begin{array}{|c|c|c|c|c|} \hline \boldsymbol{3} & \boldsymbol{3} & 5 & 7 & 12 \\ \hline \end{array} \]
Most frequent observation
Resistant to outliers
Not always very informative
Easy to skew in a few cases
Imagine a homogeneous data set
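Python's standard library provides `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. This short sketch shows both sides of the mode: on the array \(3, 3, 5, 7, 12\) it picks out 3, while on the deer masses, where no value repeats, every value "ties" and the mode says little about the center.

```python
from statistics import multimode

# multimode returns a list of all values tied for the highest count
print(multimode([3, 3, 5, 7, 12]))  # [3]

# Deer masses: each value occurs exactly once, so all eight are returned
deer = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
print(multimode(deer))  # all eight values, in their original order
```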
Difference between the largest and smallest data values
\[\text{Range} = \text{Maximum} - \text{Minimum}\]
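A one-line Python check of the range on the deer masses. Because the range uses only the two most extreme observations, the single light 19.5 kg animal alone sets the minimum here.

```python
deer = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

data_range = max(deer) - min(deer)  # Range = Maximum - Minimum
print(round(data_range, 1))  # 51.7
```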
\[\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)^2\]
\[s^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2\]
\[\sigma = \sqrt{\sigma^2}\]
\[s = \sqrt{s^2}\]
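The population and sample variances differ only in the divisor (\(N\) versus \(n-1\)). A sketch with hypothetical helpers `pop_variance` and `sample_variance`, cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`. The small data set is made up for illustration and is not from the slides.

```python
import math
import statistics

def pop_variance(xs):
    """sigma^2: squared deviations from mu, averaged over N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2: squared deviations from x-bar, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the slides

print(pop_variance(data), math.sqrt(pop_variance(data)))  # 4.0 2.0
print(round(sample_variance(data), 3))  # 4.571 (exactly 32/7)

# The standard library uses the same two divisors
assert math.isclose(pop_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
```

Taking the square root undoes the squaring, so \(\sigma = 2\) is back in the data's original units.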
Restores the original units
Has a ton of useful mathematical properties
Any value can be described by how many standard deviations it lies from the mean
Population:
\[ z = \frac{x-\mu}{\sigma} \]
Sample:
\[ z = \frac{x-\bar x}{s} \]
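A sketch of sample z-scores for the deer masses, using the sample mean \(\bar x\) and the sample standard deviation \(s\) (`statistics.stdev` divides by \(n-1\)). The helper `z_score` is hypothetical; a positive \(z\) means above the mean, a negative \(z\) below it.

```python
import statistics

deer = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

x_bar = statistics.mean(deer)  # sample mean, x-bar
s = statistics.stdev(deer)     # sample standard deviation (divides by n - 1)

def z_score(x, center, spread):
    """How many standard deviations x lies above (+) or below (-) the center."""
    return (x - center) / spread

for x in sorted(deer):
    print(f"{x:5.1f} kg   z = {z_score(x, x_bar, s):+.2f}")
```

With these data \(s \approx 17.66\) kg, and the 19.5 kg deer comes out at roughly \(z \approx -1.87\), the farthest from the mean of any observation.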
Arrange the data in increasing order (low to high)
Define a number, \(p\), between \(1\) and \(99\)
Let \(n\) be the sample size. The \(p^{th}\) percentile is located at position \(L\) in the ordered data, where:
\[ L = \frac{p}{100} \times n \]
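The locator \(L\) is not always a whole number, so a rule is needed to turn it into a position. The sketch below assumes one common textbook convention (not stated on these slides): if \(L\) is whole, average the values at positions \(L\) and \(L+1\); otherwise round \(L\) up. The function name `percentile` is hypothetical, and \(p\) is assumed to be an integer. Software such as NumPy's `numpy.percentile` defaults to linear interpolation and can give slightly different answers.

```python
def percentile(xs, p):
    """p-th percentile (integer p from 1 to 99) via L = (p / 100) * n.

    ASSUMED convention for converting L to a 1-based position:
      * L a whole number -> average the values at positions L and L + 1
      * otherwise        -> round L up to the next whole number
    """
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(xs)  # arrange from low to high
    n = len(ordered)
    if (p * n) % 100 == 0:  # L = p * n / 100 is a whole number
        L = p * n // 100
        return (ordered[L - 1] + ordered[L]) / 2
    L = -(-p * n // 100)  # ceiling of p * n / 100 in integer arithmetic
    return ordered[L - 1]

deer = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
print(percentile(deer, 10))  # 19.5  (L = 0.8, rounded up to position 1)
print(percentile(deer, 25))  # L = 2: average of positions 2 and 3
```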
\[ \begin{aligned} \text{Q}_1 = 25^{th} \text{ percentile} \\ \\ \text{Q}_2 = 50^{th} \text{ percentile} \\ \\ \text{Q}_3 = 75^{th} \text{ percentile} \\ \end{aligned} \]
Percentiles are a special case of quantiles
Percentiles can only be between \(1\) and \(99\)
Quantiles can take on any value between \(0\) and \(100\)
They are located the same way as percentiles:
\[ L = \frac{q}{100} \times n \]
\[ \begin{aligned} \text{Min} = 0^{th} \text{ quantile} \\ \\ \text{Q}_1 = 25^{th} \text{ percentile} \\ \\ \text{Q}_2 = 50^{th} \text{ percentile} \\ \\ \text{Q}_3 = 75^{th} \text{ percentile} \\ \\ \text{Max} = 100^{th} \text{ quantile} \\ \end{aligned} \]
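A sketch of the five quantiles above (often called the five-number summary) for the deer masses. The helper `quantile` is hypothetical and, for simplicity, accepts only integer \(q\) from 0 to 100. It assumes \(q=0\) returns the minimum and \(q=100\) the maximum, and otherwise uses the same assumed locator convention as for percentiles (whole \(L\): average positions \(L\) and \(L+1\); fractional \(L\): round up).

```python
def quantile(xs, q):
    """q-th quantile for an integer q in 0..100 using L = (q / 100) * n.

    ASSUMED conventions (hypothetical helper):
      * q = 0 -> minimum, q = 100 -> maximum
      * otherwise: whole L -> average positions L and L + 1;
        fractional L -> round up to the next position
    """
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(xs)
    n = len(ordered)
    if q == 0:
        return ordered[0]
    if q == 100:
        return ordered[-1]
    if (q * n) % 100 == 0:  # L is a whole number
        L = q * n // 100
        return (ordered[L - 1] + ordered[L]) / 2
    L = -(-q * n // 100)  # ceiling of q * n / 100
    return ordered[L - 1]

deer = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
labels = ["Min", "Q1", "Q2 (median)", "Q3", "Max"]
for label, q in zip(labels, [0, 25, 50, 75, 100]):
    print(f"{label:>11}: {quantile(deer, q):.2f}")
```

Under these conventions the deer data give Min 19.50, Q1 43.35, Q2 53.75, Q3 67.35 and Max 71.20 (kg); Q2 matches the median from the ordered-position rule.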
\[ \text{IQR} = \text{Q}_3 - \text{Q}_1 \]
\[ \text{Lower Boundary} = \text{Q}_1 - 1.5 \times \text{IQR} \]
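\[ \text{Upper Boundary} = \text{Q}_3 + 1.5 \times \text{IQR} \]
Observations below the lower boundary or above the upper boundary are flagged as potential outliers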
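A sketch of the IQR and the two boundaries, again assuming the locator convention "whole \(L\): average positions \(L\) and \(L+1\); fractional \(L\): round up" for \(\text{Q}_1\) and \(\text{Q}_3\). The helper names `_percentile`, `fences` and `flag_outliers` are hypothetical, and the second data set is made up to show a case that does get flagged.

```python
def _percentile(ordered, p):
    """p-th percentile of already-sorted data (ASSUMED locator convention)."""
    n = len(ordered)
    if (p * n) % 100 == 0:  # L whole: average positions L and L + 1
        L = p * n // 100
        return (ordered[L - 1] + ordered[L]) / 2
    L = -(-p * n // 100)  # L fractional: round up
    return ordered[L - 1]

def fences(xs):
    """Return (lower boundary, upper boundary) = (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    ordered = sorted(xs)
    q1 = _percentile(ordered, 25)
    q3 = _percentile(ordered, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(xs):
    """Values falling below the lower or above the upper boundary."""
    lower, upper = fences(xs)
    return [x for x in xs if x < lower or x > upper]

deer = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
lower, upper = fences(deer)
print(round(lower, 2), round(upper, 2))  # 7.35 103.35
print(flag_outliers(deer))  # []

print(flag_outliers([1, 2, 3, 4, 5, 6, 7, 100]))  # [100]  (made-up data)
```

Under this convention the deer IQR is about 24 kg, so the 19.5 kg animal stays inside the boundaries and is not flagged, even though it is far lighter than the rest.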
(If there’s time)