Summary Statistics III

STAT 240 - Fall 2025

Robert Sholl

Review

Data

Deer body weights

\[ \begin{array}{|c|c|c|c|c|} \hline \text{Sex} & \text{F} & \text{M} & \text{F} & \text{M} \\ \hline \text{Body Mass (kg)} & 45.8 & 65.8 & 44.5 & 71.2\\ \hline \text{Sex} & \text{F} & \text{M} & \text{M} & \text{M} \\ \hline \text{Body Mass (kg)} & 42.2 & 68.9 & 61.7 & 19.5 \\ \hline \end{array} \]

Measures of Center

Mean

\[\bar x = \frac{1}{n}\sum_{i=1}^nx_i\]

\[\mu = \frac{1}{N}\sum_{i=1}^N x_i\]

  • “Average”

  • Most common measure of center
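A minimal sketch of the sample mean formula applied to the deer body weights from the data table. Python is used only for illustration, and the variable names are assumptions.

```python
# Deer body masses (kg) copied from the data table in this review.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

n = len(body_mass_kg)                 # sample size
sample_mean = sum(body_mass_kg) / n   # x-bar = (1/n) * sum of x_i

print(f"n = {n}, sample mean = {sample_mean:.2f} kg")
```

The eight weights sum to 419.6 kg, so the mean works out to 419.6 / 8 = 52.45 kg.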

Median

  • “True” center or middle value

  • Organize data from lowest to highest value

    • If \(n\) is odd: Choose position \({(n+1)\over2}\) in the ordered data set

    • If \(n\) is even: Take positions \(n\over 2\) and \({n \over 2}+1\) in the ordered data set and average the two data points

\[ \begin{array}{|c|c|c|c|c|} \hline \xcancel{1.23} & \xcancel{1.25} & 1.4 & \xcancel{1.45} & \xcancel{1.92} \\ \hline \end{array} \]
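The odd/even position rule can be sketched as a short function. This is an illustrative sketch that assumes a non-empty list; the function name `median` is a choice made here, not something from the slides.

```python
def median(values):
    """Return the median using the odd/even position rule."""
    ordered = sorted(values)   # organize from lowest to highest
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: take position (n + 1) / 2, counting from 1.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values at positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Odd n: the five-value example, where 1.4 is the middle value.
print(median([1.23, 1.25, 1.4, 1.45, 1.92]))

# Even n: the eight deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
print(round(median(body_mass_kg), 2))
```

For the deer data, positions 4 and 5 of the ordered weights are 45.8 and 61.7, so the median is their average, 53.75 kg.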

Mode

\[ \begin{array}{|c|c|c|c|c|} \hline \boldsymbol{3} & \boldsymbol{3} & 5 & 7 & 12 \\ \hline \end{array} \]

  • Most frequent observation

  • Resistant to outliers

  • Not always very informative

  • Easy to skew in a few cases

  • Imagine a homogeneous data set
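A small sketch of finding the mode by counting. It returns every value tied for the highest count, which is an assumption about how to handle ties. The all-identical list is one reading of a "homogeneous" data set and is hypothetical data.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, sorted."""
    counts = Counter(values)            # value -> number of occurrences
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 3, 5, 7, 12]))   # the slide example: 3 appears twice
print(modes([4, 4, 4, 4]))       # hypothetical data, every value identical

# Deer body masses (kg): no value repeats, so every value ties as a "mode".
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
print(modes(body_mass_kg))
```

The deer data illustrates why the mode is not always informative: with continuous measurements, repeats are rare and every observation ties.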

Measures of Spread

Range

Difference between the largest and smallest data value

\[\text{Range} = \text{Maximum} - \text{Minimum}\]
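As a quick illustration, the range of the deer body weights can be computed directly from the formula. The variable names are assumptions.

```python
# Deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

# Range = Maximum - Minimum
data_range = max(body_mass_kg) - min(body_mass_kg)

print(f"range = {data_range:.1f} kg")
```

Here the maximum is 71.2 kg and the minimum is 19.5 kg, giving a range of 51.7 kg.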

Variance

  • Population

\[\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)^2\]

  • Sample

\[s^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2\]
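The two variance formulas differ only in the divisor. The sketch below computes both for the deer body weights. Treating the eight deer as an entire population is done purely for comparison and is an assumption, not a claim about the data.

```python
# Deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

n = len(body_mass_kg)
mean = sum(body_mass_kg) / n

# Sum of squared deviations from the mean, shared by both formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in body_mass_kg)

population_variance = sum_sq_dev / n    # divide by N
sample_variance = sum_sq_dev / (n - 1)  # divide by n - 1

print(f"sum of squared deviations = {sum_sq_dev:.2f}")
print(f"population variance = {population_variance:.4f} kg^2")
print(f"sample variance     = {sample_variance:.2f} kg^2")
```

The squared deviations sum to 2184.14, so the sample variance is 2184.14 / 7 = 312.02 kg² and the population version is 2184.14 / 8 = 273.0175 kg².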

Standard Deviation

\[\sigma = \sqrt{\sigma^2}\]

\[s = \sqrt{s^2}\]

  • Restores the original units

  • Has a ton of useful mathematical properties

  • Almost everything can be measured relative to its standard deviation
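A brief sketch using Python's standard `statistics` module, where `stdev` uses the \(n-1\) divisor and `pstdev` uses the \(N\) divisor. Applying both to the deer data is only for illustration.

```python
import math
import statistics

# Deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

s = statistics.stdev(body_mass_kg)        # sample standard deviation
sigma = statistics.pstdev(body_mass_kg)   # population standard deviation

# Each standard deviation is the square root of the matching variance.
s_from_variance = math.sqrt(statistics.variance(body_mass_kg))

print(f"s = {s:.2f} kg, sigma = {sigma:.2f} kg")
```

Because the variance is in kg², taking the square root returns the spread to kilograms, the same units as the data.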

Empirical Rule

Measures of Position

z-scores

Population:

\[ z = \frac{x-\mu}{\sigma} \]


Sample:

\[ z = \frac{x-\bar x}{s} \]
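A minimal sketch of sample z-scores for the deer body weights, using the sample mean and sample standard deviation. The helper name `z_score` is an assumption made for this example.

```python
import statistics

# Deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

x_bar = statistics.mean(body_mass_kg)   # sample mean
s = statistics.stdev(body_mass_kg)      # sample standard deviation


def z_score(x, center, spread):
    """Number of standard deviations that x lies from the center."""
    return (x - center) / spread


z_scores = [z_score(x, x_bar, s) for x in body_mass_kg]

for x, z in zip(body_mass_kg, z_scores):
    print(f"{x:5.1f} kg -> z = {z:+.2f}")
```

The 19.5 kg deer sits a little under two standard deviations below the mean, and the z-scores of a full sample always average to zero.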

Percentiles

  • Arrange the data in increasing order (low to high)

  • Define a number, \(p\), between \(1\) and \(99\)

  • Let \(n\) be the sample size. The \(p^{th}\) percentile is located at:

\[ L = \frac{p}{100} \times n \]
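The locator gives a position, not a value, so a rule is needed for reading the data at \(L\). The sketch below assumes one common textbook convention that the slides do not state: if \(L\) is a whole number, average the values at positions \(L\) and \(L+1\); otherwise round \(L\) up to the next whole position. Other conventions exist and give slightly different answers.

```python
import math


def percentile(values, p):
    """p-th percentile (integer p from 1 to 99) via L = (p / 100) * n.

    Assumed convention: whole L -> average positions L and L + 1;
    fractional L -> round up to the next position.
    """
    ordered = sorted(values)      # increasing order
    n = len(ordered)
    location = p * n / 100        # L = (p / 100) * n
    if location.is_integer():
        k = int(location)
        return (ordered[k - 1] + ordered[k]) / 2   # positions L and L + 1
    k = math.ceil(location)
    return ordered[k - 1]                          # position L rounded up


# Deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

for p in (25, 50, 90):
    print(f"{p}th percentile = {percentile(body_mass_kg, p):.2f} kg")
```

With \(n = 8\), the 25th percentile has \(L = 2\), so it averages the 2nd and 3rd ordered weights (42.2 and 44.5), while the 90th has \(L = 7.2\), which rounds up to the 8th weight.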

Quartiles

  • Special case of percentiles

\[ \begin{aligned} \text{Q}_1 = 25^{th} \text{ percentile} \\ \\ \text{Q}_2 = 50^{th} \text{ percentile} \\ \\ \text{Q}_3 = 75^{th} \text{ percentile} \\ \end{aligned} \]

Quantiles

  • Percentiles are a special case of quantiles

    • Percentiles can only be between \(1\) and \(99\)

    • Quantiles can take on any value between \(0\) and \(100\)

  • Locating them is the same as percentiles:

\[ L = \frac{q}{100} \times n \]

  • Where \(q\) is any real number between \(0\) and \(100\)

Five Number Summary

\[ \begin{aligned} \text{Min} = 0^{th} \text{ percentile} \\ \\ \text{Q}_1 = 25^{th} \text{ percentile} \\ \\ \text{Q}_2 = 50^{th} \text{ percentile} \\ \\ \text{Q}_3 = 75^{th} \text{ percentile} \\ \\ \text{Max} = 100^{th} \text{ percentile} \\ \end{aligned} \]
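A sketch of the five number summary for the deer body weights. The quartiles use the locator \(L = \frac{p}{100} \times n\) with an assumed reading rule (average positions \(L\) and \(L+1\) when \(L\) is whole, otherwise round up). The function names are hypothetical.

```python
import math


def value_at_percentile(ordered, p):
    """Read the p-th percentile from already sorted data (assumed rule)."""
    location = p * len(ordered) / 100
    if location.is_integer():
        k = int(location)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(location) - 1]


def five_number_summary(values):
    ordered = sorted(values)
    return {
        "Min": ordered[0],
        "Q1": value_at_percentile(ordered, 25),
        "Q2": value_at_percentile(ordered, 50),
        "Q3": value_at_percentile(ordered, 75),
        "Max": ordered[-1],
    }


# Deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]
summary = five_number_summary(body_mass_kg)

for name, value in summary.items():
    print(f"{name:>3} = {value:.2f} kg")
```

The minimum and maximum are read straight from the ends of the ordered data, since the locator is only defined here for the interior percentiles.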

IQR

\[ \text{IQR} = \text{Q}_3 - \text{Q}_1 \]

  1. Define outlier boundaries

\[ \text{Lower Boundary} = \text{Q}_1 - 1.5 \times \text{IQR} \]

\[ \text{Upper Boundary} = \text{Q}_3 + 1.5 \times \text{IQR} \]

  2. Check for values outside of the boundaries
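The two steps can be sketched as a small function. Quartile conventions vary, so \(\text{Q}_1\) and \(\text{Q}_3\) are passed in rather than computed. The example values 43.35 and 67.35 for the deer data assume the locator \(L = \frac{p}{100} \times n\) with averaging of positions \(L\) and \(L+1\) when \(L\) is whole; the function name is hypothetical.

```python
def iqr_outliers(values, q1, q3):
    """Return (lower boundary, upper boundary, values outside them)."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr    # step 1: define the outlier boundaries
    upper = q3 + 1.5 * iqr
    # Step 2: check for values outside of the boundaries.
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers


# Deer body masses (kg) from the data table.
body_mass_kg = [45.8, 65.8, 44.5, 71.2, 42.2, 68.9, 61.7, 19.5]

# Assumed quartiles (see the convention described above).
lower, upper, outliers = iqr_outliers(body_mass_kg, q1=43.35, q3=67.35)

print(f"boundaries: [{lower:.2f}, {upper:.2f}] kg")
print(f"outliers: {outliers}")
```

Under these assumed quartiles the IQR is 24 kg and the boundaries are about 7.35 kg and 103.35 kg, so even the 19.5 kg deer is not flagged as an outlier.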

Whiteboard Lecture

Program R

(If there’s time)

Go away