Summary Statistics III

STAT 240 - Fall 2025

Robert Sholl

Review

Data

Deer body weights

\[ \begin{array}{|c|c|c|c|c|} \hline \text{Sex} & \text{F} & \text{M} & \text{F} & \text{M} \\ \hline \text{Body Mass (kg)} & 45.8 & 65.8 & 44.5 & 71.2\\ \hline \text{Sex} & \text{F} & \text{M} & \text{M} & \text{M} \\ \hline \text{Body Mass (kg)} & 42.2 & 68.9 & 61.7 & 19.5 \\ \hline \end{array} \]

Measures of Center

Mean

\[\bar x = \frac{1}{n}\sum_{i=1}^nx_i\]

\[\mu = \frac{1}{N}\sum_{i=1}^N x_i\]

“Average”
Most common measure of center
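A minimal sketch of the sample mean formula applied to the deer body masses, written in Python as an illustration. The variable names are assumptions made for this example.

```python
# Deer body masses (kg) from the data table, in the order listed.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

n = len(body_mass_kg)            # sample size
x_bar = sum(body_mass_kg) / n    # (1/n) * sum of the x_i

print(f"n = {n}, mean = {x_bar:.2f} kg")
```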

Median

“True” center or middle value
Organize data from lowest to highest value
- If \(n\) is odd: Choose position \({(n+1)\over2}\) in the ordered data set
- If \(n\) is even: Pick positions \(n\over 2\) and \({n \over 2}+1\) in the ordered data set and average the two data points

\[ \begin{array}{|c|c|c|c|c|} \hline \xcancel{1.23} & \xcancel{1.25} & 1.4 & \xcancel{1.45} & \xcancel{1.92} \\ \hline \end{array} \]
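The odd/even position rule above can be sketched in Python as follows. The function name `median_by_position` is hypothetical, and positions are converted from the 1-based counting used on the slide to Python's 0-based indexing.

```python
def median_by_position(values):
    """Median using the (n+1)/2 rule for odd n and the n/2, n/2 + 1 rule for even n."""
    ordered = sorted(values)          # lowest to highest
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the single middle value at 1-based position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

five_values = [1.23, 1.25, 1.4, 1.45, 1.92]                          # odd n
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]      # even n

odd_median = median_by_position(five_values)
even_median = median_by_position(body_mass_kg)
print(f"odd n: {odd_median}, even n: {even_median:.2f} kg")
```

For the deer data, the two middle values are 45.8 and 61.7, so the median lands between them rather than on an observed weight.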

Mode

\[ \begin{array}{|c|c|c|c|c|} \hline \boldsymbol{3} & \boldsymbol{3} & 5 & 7 & 12 \\ \hline \end{array} \]

Most frequent observation
Resistant to outliers
Not always very informative
Easy to skew in a few cases
Imagine a homogeneous data set
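A short Python sketch of why the mode is not always informative, assuming the standard library's `statistics.multimode` is an acceptable way to find the most frequent values. The second data set is the deer body masses, where no value repeats.

```python
from statistics import multimode

repeated = [3, 3, 5, 7, 12]
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

# multimode returns every value tied for the highest frequency.
modes_repeated = multimode(repeated)
modes_deer = multimode(body_mass_kg)

print("modes of repeated data:", modes_repeated)
print("number of tied modes in deer data:", len(modes_deer))
```

When every observation occurs exactly once, all of them tie for most frequent, so the mode says nothing about the center.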

Measures of Spread

Range

Difference between the largest and smallest data value

\[\text{Range} = \text{Maximum} - \text{Minimum}\]
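As a worked example with the deer body masses, where the largest value is 71.2 kg and the smallest is 19.5 kg:

\[\text{Range} = 71.2 - 19.5 = 51.7 \text{ kg}\]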

Variance

Population

\[\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)^2\]

Sample

\[s^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2\]
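The two formulas differ only in the divisor, which a short Python sketch can make concrete. Treating the eight deer as an entire population is an assumption made purely for comparison. In practice they would be a sample, so \(s^2\) is the relevant value.

```python
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

n = len(body_mass_kg)
x_bar = sum(body_mass_kg) / n

# Sum of squared deviations from the mean, shared by both formulas.
sum_sq_dev = sum((x - x_bar) ** 2 for x in body_mass_kg)

pop_variance = sum_sq_dev / n            # divide by N (population formula)
sample_variance = sum_sq_dev / (n - 1)   # divide by n - 1 (sample formula)

print(f"population form: {pop_variance:.4f} kg^2")
print(f"sample form:     {sample_variance:.4f} kg^2")
```

Note that the units are squared kilograms, which is part of the motivation for taking a square root afterwards.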

Standard Deviation

\[\sigma = \sqrt{\sigma^2}\]

\[s = \sqrt{s^2}\]

Restores the original units
Has a ton of useful mathematical properties
Almost everything can be measured relative to its standard deviation
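As a worked example, the deer body masses have a sample variance of \(312.02 \text{ kg}^2\), or \(273.0175 \text{ kg}^2\) if the eight deer are treated as a whole population (an assumption made only for comparison). Taking square roots returns both to kilograms:

\[s = \sqrt{312.02} \approx 17.66 \text{ kg}\]

\[\sigma = \sqrt{273.0175} \approx 16.52 \text{ kg}\]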

Empirical Rule

Measures of Position

z-scores

Population:

\[ z = \frac{x-\mu}{\sigma} \]

Sample:

\[ z = \frac{x-\bar x}{s} \]
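A minimal Python sketch of sample z-scores for the deer body masses, dividing by the sample standard deviation \(s\). It assumes the standard library's `statistics.stdev`, which uses the \(n-1\) divisor.

```python
from statistics import mean, stdev

body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

x_bar = mean(body_mass_kg)
s = stdev(body_mass_kg)   # sample standard deviation (n - 1 divisor)

# z = (x - x_bar) / s for each observation.
z_scores = [(x - x_bar) / s for x in body_mass_kg]

for x, z in zip(body_mass_kg, z_scores):
    print(f"{x:5.1f} kg -> z = {z:+.2f}")
```

The 19.5 kg deer sits furthest from the mean, a little under two standard deviations below it.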

Percentiles

Arrange the data in increasing order (low to high)
Define a number, \(p\), between \(1\) and \(99\)
Let \(n\) be the sample size. The \(p^{th}\) percentile is located at:

\[ L = \frac{p}{100} \times n \]
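The location formula can be sketched in Python as below. The slide does not say how to turn \(L\) into a data value, so this sketch assumes one common textbook convention: if \(L\) is a whole number, average the values at positions \(L\) and \(L+1\); otherwise round \(L\) up to the next whole position. The function name `percentile` is hypothetical, and other conventions (including software defaults) can give slightly different answers.

```python
import math

def percentile(values, p):
    """p-th percentile using L = (p / 100) * n with an assumed rounding convention."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(values)          # low to high
    n = len(ordered)
    location = p * n / 100            # L = (p / 100) * n
    if location.is_integer():
        # Assumed rule: whole L -> average positions L and L + 1 (1-based).
        k = int(location)
        return (ordered[k - 1] + ordered[k]) / 2
    # Assumed rule: fractional L -> round up to the next position.
    k = math.ceil(location)
    return ordered[k - 1]

body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

p25 = percentile(body_mass_kg, 25)   # L = 2, whole number
p40 = percentile(body_mass_kg, 40)   # L = 3.2, rounds up to position 4
print(f"25th percentile: {p25:.2f} kg, 40th percentile: {p40:.1f} kg")
```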

Quartiles

Special case of percentiles

\[ \begin{aligned} \text{Q}_1 = 25^{th} \text{ percentile} \\ \\ \text{Q}_2 = 50^{th} \text{ percentile} \\ \\ \text{Q}_3 = 75^{th} \text{ percentile} \\ \end{aligned} \]

Quantiles

Percentiles are a special case of quantiles
- Percentiles can only be between \(1\) and \(99\)
- Quantiles can take on any value between \(0\) and \(100\)
Locating them is the same as percentiles:

\[ L = \frac{q}{100} \times n \]

Where \(q\) is any real number between \(0\) and \(100\)

Five Number Summary

\[ \begin{aligned} \text{Min} = 0^{th} \text{ percentile} \\ \\ \text{Q}_1 = 25^{th} \text{ percentile} \\ \\ \text{Q}_2 = 50^{th} \text{ percentile} \\ \\ \text{Q}_3 = 75^{th} \text{ percentile} \\ \\ \text{Max} = 100^{th} \text{ percentile} \\ \end{aligned} \]

IQR

\[ \text{IQR} = \text{Q}_3 - \text{Q}_1 \]

Define outlier boundaries

\[ \text{Lower Boundary} = \text{Q}_1 - 1.5 \times \text{IQR} \]

\[ \text{Upper Boundary} = \text{Q}_3 + 1.5 \times \text{IQR} \]

Check for values outside of the boundaries
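The boundary check can be sketched in Python as follows. The quartile values passed in are assumptions: they are what the deer data give when \(L\) is a whole number and the two neighboring values are averaged, and a different quartile convention would shift them. The function name `outlier_boundaries` is hypothetical.

```python
def outlier_boundaries(q1, q3):
    """Return (lower, upper) boundaries from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

# Assumed quartiles for the deer data (see the convention noted above).
q1, q3 = 43.35, 67.35

lower, upper = outlier_boundaries(q1, q3)
outliers = [x for x in body_mass_kg if x < lower or x > upper]

print(f"boundaries: [{lower:.2f}, {upper:.2f}] kg")
print("values outside the boundaries:", outliers)
```

Under these assumed quartiles, even the 19.5 kg deer falls inside the boundaries, so the rule flags nothing in this data set.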

Whiteboard Lecture

Program R

(If there’s time)