Linear Regression

STAT 240 - Fall 2025

Robert Sholl

Least Squares Regression

\[ y_i = \beta_0 + \beta_1x_i + \epsilon_i \]

  • Linearity

    • We can describe the process with a line

  • Independence in the residuals

    • The errors are independent of one another

  • Zero-mean errors

\[ E[\epsilon_i] = 0 \]

  • Homoscedasticity

    • Constant variance: the spread of the errors does not change with x
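A minimal sketch of the least-squares idea in Python, assuming a small made-up data set (`xs`, `ys`) and a hand-picked step size. Plain gradient descent stands in for the exact solution here purely for illustration; it is not how R's `lm()` fits models. Because the model includes an intercept, the fitted residuals average to zero at the minimum.

```python
# Least squares: choose (b0, b1) to minimize the sum of squared residuals (SSE).
# Gradient descent is an illustrative stand-in, not the method lm() uses.

def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_by_gradient_descent(xs, ys, lr=0.005, steps=20000):
    """Minimize SSE by gradient descent; lr and steps are illustrative choices."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        # Partial derivatives of SSE with respect to b0 and b1
        g0 = -2.0 * sum(resid)
        g1 = -2.0 * sum(x * r for x, r in zip(xs, resid))
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


xs = [1, 2, 3, 4, 5]             # hypothetical toy data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_by_gradient_descent(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(b0, 4), round(b1, 4))  # → 0.05 1.99
print(sum(resid) / len(resid))     # mean residual: zero up to floating-point error
```

Gradient descent is slow but makes the objective explicit; the closed-form estimates later in the deck reach the same line directly.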

Linear Regression

\[ \begin{aligned} y_i = \beta_0 + \beta_1x_i + \epsilon_i \\ \\ \epsilon_i \sim N(0,\sigma^2) \\ \end{aligned} \]

  • Everything from before

  • Normality!

    • (in the residuals)
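To make the full model concrete, here is a hedged Python sketch that simulates data from it. The parameter values (β0 = 1, β1 = 2, σ = 1), the uniform x-values, the seed, and the function name `simulate_linear_model` are all hypothetical choices for illustration.

```python
import random
import statistics


def simulate_linear_model(n, beta0, beta1, sigma, seed=240):
    """Draw data from y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Independent errors, mean 0, same standard deviation for every observation
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]
    return xs, ys, eps


xs, ys, eps = simulate_linear_model(n=10_000, beta0=1.0, beta1=2.0, sigma=1.0)
# The simulated errors should have mean near 0 and standard deviation near sigma
print(statistics.mean(eps), statistics.stdev(eps))
```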

Normality

\[ \epsilon \sim N(0,\sigma^2) \]

\[ f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}}\exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right) \]
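The density can be checked numerically. A small sketch assuming Python 3.8+ (for `statistics.NormalDist`); note that `NormalDist` is parameterized by the standard deviation σ, while the formula above uses the variance σ².

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Normal density with mean mu and variance sigma2, written as on the slide."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


# Cross-check against the standard library: variance 4 means sigma = 2
print(normal_pdf(1.5, mu=0.0, sigma2=4.0), NormalDist(0.0, 2.0).pdf(1.5))
```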

Maximum Likelihood Estimation

\[L(\mu,\sigma^2 | x_1,x_2,...,x_n)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi \sigma^2}}\exp\left({-\frac{(x_i-\mu)^2}{2\sigma^2}}\right)\]

\[ L(\mu,\sigma^2 | x_1,x_2,...,x_n)=\left(\frac{1}{\sqrt{2\pi \sigma^2}}\right)^n\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2\right) \]

\[ \hat{\theta} = \arg \max_\theta L(\theta | y) \]
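Taking logs turns the product into a sum without moving the maximizer, since log is increasing. A sketch on a hypothetical five-point sample: the closed-form normal MLEs (the sample mean, and the average squared deviation with divisor n) give a higher log-likelihood than nearby parameter values. The name `normal_loglik` is illustrative.

```python
import math


def normal_loglik(mu, sigma2, xs):
    """Log of the likelihood product: the sum of log normal densities."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


xs = [4.2, 5.1, 3.8, 6.0, 4.9]  # hypothetical sample
mu_hat = sum(xs) / len(xs)
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)  # divisor n, not n - 1

best = normal_loglik(mu_hat, sigma2_hat, xs)
# Moving either parameter away from the MLE lowers the log-likelihood
for dmu, ds2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert normal_loglik(mu_hat + dmu, sigma2_hat + ds2, xs) < best
print(mu_hat, sigma2_hat)
```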

Maximum Likelihood Estimation

\[\frac{\partial\ell(y_i| x_i,\beta_0,\beta_1,\sigma^2)}{\partial\beta_0}=0\]

\[\frac{\partial\ell(y_i| x_i,\beta_0,\beta_1,\sigma^2)}{\partial\beta_1}=0\]

\[\frac{\partial\ell(y_i| x_i,\beta_0,\beta_1,\sigma^2)}{\partial\sigma^2}=0\]
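Assuming ℓ denotes the log-likelihood of the regression model with independent N(0, σ²) errors, the three conditions expand as follows (a worked sketch):

\[ \begin{aligned} \ell(\beta_0,\beta_1,\sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\beta_0-\beta_1x_i)^2 \\ \frac{\partial\ell}{\partial\beta_0} &= \frac{1}{\sigma^2}\sum_{i=1}^n (y_i-\beta_0-\beta_1x_i) = 0 \\ \frac{\partial\ell}{\partial\beta_1} &= \frac{1}{\sigma^2}\sum_{i=1}^n x_i(y_i-\beta_0-\beta_1x_i) = 0 \\ \frac{\partial\ell}{\partial\sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i-\beta_0-\beta_1x_i)^2 = 0 \end{aligned} \]

After multiplying through by σ², the first two conditions are exactly the equations obtained by minimizing the residual sum of squares, which is why the maximum likelihood estimates of β0 and β1 coincide with the least-squares estimates.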

Maximum Likelihood Estimation

\[\hat\beta_0=\bar y - \hat\beta_1 \bar x\]

\[\hat\beta_1 = \frac{\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n(x_i-\bar x)^2}\]

\[\hat \sigma^2 = \frac{1}{n} \sum_{i=1}^n(y_i-(\hat\beta_0+\hat\beta_1x_i))^2\]
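The closed-form estimates translate directly into code. A sketch on hypothetical toy data; `fit_simple_ols` is an illustrative name. It also returns RSS/(n − 2), because the "Residual standard error" in R's `lm()` summary is the square root of that quantity rather than of the MLE above, which divides by n.

```python
def fit_simple_ols(xs, ys):
    """Closed-form estimates for y = b0 + b1*x + eps."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma2_mle = rss / n        # maximum likelihood estimate (divisor n)
    sigma2_rse = rss / (n - 2)  # squared residual standard error (divisor n - 2)
    return b0, b1, sigma2_mle, sigma2_rse


xs = [1, 2, 3, 4, 5]  # hypothetical toy data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print([round(v, 4) for v in fit_simple_ols(xs, ys)])  # → [0.05, 1.99, 0.0214, 0.0357]
```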

Coefficient of Determination

\[R^2=1-\frac{\text{RSS}}{\text{TSS}}\]

\[\text{RSS}=\sum_{i=1}^n(y_i-\hat y_i)^2\]

\[\text{TSS}=\sum_{i=1}^n(y_i-\bar y)^2\]

\[R^2=r^2\]

  • In simple linear regression, where r is the sample correlation between x and y
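A quick numerical check of both formulas on hypothetical data; for a single predictor, 1 − RSS/TSS and the squared sample correlation agree up to floating-point error. The function name `r_squared_two_ways` is illustrative.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (1 - RSS/TSS, r^2) for a simple linear regression of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    tss = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * tss)  # Pearson correlation of x and y
    return 1 - rss / tss, r * r


xs = [1, 2, 3, 4, 5]  # hypothetical toy data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(r_squared_two_ways(xs, ys))
```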

Applied Review

Fentanyl

  • Synthetic opioid

  • ~50-100x stronger than morphine

  • ~70,000 U.S. deaths per year

  • Potentially lethal dose of about 2 milligrams

Exploratory Analysis

Variable Selection

Model Proposals

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]

\[ \epsilon_i \sim N(0,\sigma^2) \]

Results


Call:
lm(formula = Deaths ~ Personal_income)

Residuals:
    Min      1Q  Median      3Q     Max 
-2574.1 -1017.8  -339.0   522.9  8226.3 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -1.203e+03  3.918e+02  -3.069  0.00228 ** 
Personal_income  5.106e-02  7.276e-03   7.017 8.43e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1527 on 448 degrees of freedom
Multiple R-squared:  0.09902,   Adjusted R-squared:  0.09701 
F-statistic: 49.24 on 1 and 448 DF,  p-value: 8.431e-12
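Several numbers in this summary follow from others, which makes a useful sanity check. A sketch using only values copied from the printed output, so results match only up to the printed rounding; n = 450 is inferred from the 448 residual degrees of freedom plus the two estimated coefficients.

```python
# Values copied from the lm() summary for Deaths ~ Personal_income
estimate, std_error = 5.106e-02, 7.276e-03
r2 = 0.09902
n, p = 450, 1  # 448 residual df + 2 coefficients; one predictor

t_value = estimate / std_error                  # t = Estimate / Std. Error
f_stat = t_value ** 2                           # with a single predictor, F = t^2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R-squared

print(t_value, f_stat, adj_r2)
```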

Results


Call:
lm(formula = Deaths ~ Education)

Residuals:
    Min      1Q  Median      3Q     Max 
-2003.6 -1130.6  -407.7   522.9  9143.4 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -426.51     434.22  -0.982    0.327    
Education      60.41      13.41   4.503 8.54e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1574 on 448 degrees of freedom
Multiple R-squared:  0.04331,   Adjusted R-squared:  0.04117 
F-statistic: 20.28 on 1 and 448 DF,  p-value: 8.543e-06
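Both models share the same response (Deaths), so they share the same TSS; only the RSS differs. A sketch that recovers each model's RSS from its residual standard error (RSS = RSE² × df) and the implied TSS from R² = 1 − RSS/TSS, using only the printed values.

```python
# (residual standard error, residual df, Multiple R-squared) from the two summaries
models = {
    "Deaths ~ Personal_income": (1527, 448, 0.09902),
    "Deaths ~ Education": (1574, 448, 0.04331),
}

implied_tss = {}
for name, (rse, df, r2) in models.items():
    rss = rse ** 2 * df                  # RSE = sqrt(RSS / df)
    implied_tss[name] = rss / (1 - r2)   # from R^2 = 1 - RSS / TSS
    print(f"{name}: RSS = {rss:.4g}, implied TSS = {implied_tss[name]:.4g}")
```

The two implied totals agree to within rounding, so the income model's larger R² (0.099 versus 0.043) comes entirely from its smaller RSS; neither predictor explains much of the variation in deaths on its own.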

Results

Go Away