STAT 240 - Fall 2025
Last stop out of theory-land
Describe the support of the following random variables:
\[ \begin{array}{|c|c|} \hline \text{r.v.} & \text{Description} \\ \hline T & \text{Time it takes for a drug to enter the blood stream} \\ \hline U & \text{Probability of landing on 3 on a fair 6-sided die} \\ \hline V & \text{Volume of water in a swimming pool} \\ \hline W & \text{Proportion of people who get sick at an Aggieville bar}\\ \hline X & \text{The result of 50 coin flips} \\ \hline Y & \text{Number of wild boar observed in a 10 acre plot} \\ \hline \end{array} \]
Given:
\[ X \sim N(\mu,\sigma^2) \]
\(X\) is continuous
The support of \(X\) is \(S_X = (-\infty,\infty)\)
\(X\) has mean \(\mu\) (its center) and constant variance \(\sigma^2\)
There are two ways to approach the application of statistics: Design or modeling
We can use simplistic modeling if we have a very good study/experimental design
If our design is flawed we can make up for this with better modeling
Both can be improved with increased sample sizes
For modeling we have two options for getting stronger inference:
Increase sample size
Load on assumptions
Is the assumption of normality asking much?
If \(\mu\) is large and \(\sigma\) is small we don’t care about the lower bound (negative values have negligible probability)
\[ \begin{aligned} & Y = \{\text{The number of wild boar observed in a 10 acre plot}\} \\ & S_Y = \{0,1,2,...\} \\ \end{aligned} \]
Let \(\ell = \ln{(Y+1)}\) (the \(+1\) keeps \(\ell\) defined when \(Y=0\))
\[ S_{\ell} = \{0, \ln 2, \ln 3, ...\} \]
\[ T = \{\text{The time it takes for a drug to enter the blood stream}\} \]
Let \(\tilde T = T + 100\)
Now no matter how small \(T\) is, \(\tilde T\) won’t cross \(0\)
If \(\tilde T\) looks bell shaped we can use the normal distribution!
\[ W = \{\text{The proportion of people who get sick at an Aggieville bar}\} \]
Say that \(W\) is right-skewed:
Let \(Q = \arcsin \sqrt{W}\)
Hand-wavey transformations are useful
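As a quick sanity check on the hand-waving, here is a minimal sketch of how \(Q = \arcsin\sqrt{W}\) pulls in right skew. Since no real data is given, the proportions are simulated from a right-skewed Beta distribution (an arbitrary choice for illustration):

```python
# Sketch: effect of the arcsine-square-root transform on skewed proportions.
# The Beta(2, 8) data below are simulated for illustration only.
import math
import random

random.seed(1)
# Right-skewed proportions in (0, 1): Beta(2, 8) has mean 0.2.
w = [random.betavariate(2, 8) for _ in range(10_000)]
q = [math.asin(math.sqrt(wi)) for wi in w]  # Q = arcsin(sqrt(W))

def skewness(xs):
    """Sample skewness: third standardized moment."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

print(f"skew(W) = {skewness(w):.2f}, skew(Q) = {skewness(q):.2f}")
```

The transformed values should show noticeably less skew than the raw proportions.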
Let’s look at what we dealt with last class
\[ \mu_{\bar x} = \mu \quad , \quad \sigma_{\bar x} = \frac{\sigma}{\sqrt{n}} \]
Let \(\mu = 10\) and \(\sigma = 5\). Show the distribution of \(\bar x\) at \(n=1\), \(n=10\), \(n=100\), and \(n=1000\).
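One way to work this exercise is by simulation. A minimal sketch (the number of replications is an arbitrary choice) checks that the spread of \(\bar x\) shrinks like \(\sigma/\sqrt{n}\):

```python
# Sketch of the exercise: with mu = 10 and sigma = 5, simulate the
# sampling distribution of xbar at several n and compare its standard
# deviation to sigma / sqrt(n).
import random
import statistics

random.seed(42)
mu, sigma = 10, 5
reps = 2_000  # simulated samples per n (arbitrary)

sd_of_xbar = {}
for n in (1, 10, 100, 1000):
    xbars = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    sd_of_xbar[n] = statistics.stdev(xbars)
    print(f"n = {n:4d}: sd(xbar) = {sd_of_xbar[n]:.3f}, "
          f"theory = {sigma / n ** 0.5:.3f}")
```

Plotting histograms of the four sets of `xbars` would show the distributions tightening around \(\mu = 10\).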
As we change our assumptions we see a trade-off:
More assumptions \(\rightarrow\) less variance (but more risk of bias)
Fewer assumptions \(\rightarrow\) less bias (but more variance)
What’s a possible explanation for this? Does the opposite relationship make sense?
In distribution theory we seek to:
Formally describe distributions of random variables
Determine their moments and other features
Put them into practice
Locate any useful transformations or cases of the distributions
Let \(X\) be the result of a fair coin toss.
\[ X = \begin{cases} 1 & \text{if heads} \\ 0 & \text{if tails} \\ \end{cases} \]
\[ P(X=x)= \begin{cases} 0.5 & \text{if }\ x=1 \\ 0.5 & \text{if }\ x=0 \end{cases} \]
Let \(X\) be the result of an unfair coin toss.
\[ P(X=x)= \begin{cases} p & \text{if }\ x=1 \\ 1-p & \text{if }\ x=0 \end{cases} \]
\[X \sim \text{Bern}(p)\]
Let \(X\) be the result of \(2\) fair coin tosses.
\[ \begin{array}{|c|c|} \hline \text{Flip 1} & \text{Flip 2}\\ \hline \text{1} & \text{1}\\ \hline \text{1} & \text{0}\\ \hline \text{0} & \text{1}\\ \hline \text{0} & \text{0}\\ \hline \end{array} \]
Thinking a little differently:
\[ \begin{array}{|c|c|c|c|} \hline n \backslash x & 0 & 1 & 2\\ \hline 0 & 1 & 0 & 0 \\ \hline 1 & 1 & 1 & 0 \\ \hline 2 & 1 & 2 & 1 \\ \hline \end{array} \]
(rows: number of flips \(n\); columns: number of sequences with \(x\) heads)
We can represent this with the “choose” function:
\[ {n \choose x} = \frac{n!}{x!(n-x)!} \]
\(\text{where } n!=n\times(n-1)\times (n-2) \times...\times3 \times 2 \times 1\)
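The choose function is easy to check in code; Python's standard library computes the same quantity as `math.comb`:

```python
# Check the "choose" function against the factorial formula from the notes.
import math

def choose(n, x):
    """n choose x via n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

# The 2-flip table: ways to get x = 0, 1, 2 heads out of n = 2 flips.
ways = [choose(2, x) for x in range(3)]
print(ways)  # matches the table's counts: [1, 2, 1]
print(choose(50, 25) == math.comb(50, 25))
```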
If the flips are independent we can express the probability as:
\[ \begin{aligned} & x \text{ successes: } p^x \\ & n-x \text{ failures: } (1-p)^{n-x} \\ \end{aligned} \]
Smashing those together:
\[P(X=x)={n \choose x}p^x(1-p)^{n-x}\]
\[X \sim \text{Binom}(n,p)\]
\[ \begin{aligned} & X \sim \text{Binom}(n,p) \\ \\ & \mathbb{E}X = np \\ \\ & \mathbb{V}X=np(1-p) \\ \\ \end{aligned} \]
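These moment formulas can be checked by simulation; a sketch with arbitrarily chosen \(n\) and \(p\):

```python
# Sketch: simulate X ~ Binom(n, p) as a sum of n Bernoulli(p) flips,
# then compare the sample mean and variance to np and np(1 - p).
import random
import statistics

random.seed(0)
n, p = 50, 0.3
draws = [sum(random.random() < p for _ in range(n)) for _ in range(20_000)]

mean_hat = statistics.fmean(draws)
var_hat = statistics.pvariance(draws)
print(f"mean: {mean_hat:.2f} vs np = {n * p}")
print(f"var:  {var_hat:.2f} vs np(1-p) = {n * p * (1 - p)}")
```

Summing Bernoulli flips is exactly the construction in the notes: a Binom\((n,p)\) draw is \(n\) independent Bern\((p)\) draws added together.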
Poisson: the French word for “fish”
Think about counting fish in a pond
\[X \sim \text{Pois}(\lambda)\]
\[\mathbb{E}X=\mathbb{V}X=\lambda\]
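A sketch checking \(\mathbb{E}X = \mathbb{V}X = \lambda\) by simulation. Python's standard library has no Poisson sampler, so this uses Knuth's classic multiplication method (\(\lambda = 4\) is an arbitrary choice):

```python
# Sketch: sample from Pois(lambda) with Knuth's multiplication method
# and check that the sample mean and variance both sit near lambda.
import math
import random
import statistics

random.seed(7)

def poisson(lam):
    """One Pois(lam) draw: multiply uniforms until the product
    drops below exp(-lam); the number of extra factors is the draw."""
    threshold = math.exp(-lam)
    k, prod = 0, random.random()
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k

lam = 4.0
draws = [poisson(lam) for _ in range(20_000)]
print(f"mean = {statistics.fmean(draws):.2f}, "
      f"var = {statistics.pvariance(draws):.2f}, lambda = {lam}")
```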
Earlier we described this distribution only as a shape
Imagine now that you can control the interval of that uniform bar
\[f(x)=\begin{cases} \frac{1}{b-a} & \text{for }\ a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}\]
\[X\sim \text{Unif}(a,b)\]
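A quick simulation check; the mean \((a+b)/2\) and variance \((b-a)^2/12\) are the standard uniform moments (not derived in these notes), and \(a\), \(b\) below are arbitrary:

```python
# Sketch: draw from Unif(a, b) and compare the sample mean and variance
# to the standard formulas (a + b)/2 and (b - a)^2 / 12.
import random
import statistics

random.seed(3)
a, b = 2.0, 10.0
draws = [random.uniform(a, b) for _ in range(50_000)]

print(f"mean = {statistics.fmean(draws):.2f} vs (a+b)/2 = {(a + b) / 2}")
print(f"var  = {statistics.pvariance(draws):.2f} "
      f"vs (b-a)^2/12 = {(b - a) ** 2 / 12:.2f}")
```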
For data that takes on values between \(0\) and \(1\)
\[X \sim \text{Beta}(\alpha,\beta)\]
\[\mathbb{E}X=\frac{\alpha}{\alpha+\beta} \quad \quad \mathbb{V}X=\frac{\alpha \beta}{(\alpha+ \beta)^2(\alpha+\beta+1)}\]
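These formulas can also be checked by simulation; the \(\alpha\) and \(\beta\) below are arbitrary:

```python
# Sketch: check the Beta mean and variance formulas by simulation.
import random
import statistics

random.seed(11)
alpha, beta = 2.0, 5.0
draws = [random.betavariate(alpha, beta) for _ in range(50_000)]

mean_theory = alpha / (alpha + beta)
var_theory = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(f"mean = {statistics.fmean(draws):.3f} vs {mean_theory:.3f}")
print(f"var  = {statistics.pvariance(draws):.4f} vs {var_theory:.4f}")
```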
Is time discrete or continuous?
Imagine how you would describe both!
Now imagine the constraints for both
Can time be negative?
Can time stretch to infinities?
For continuous and strictly positive data:
\[X \sim \text{Gamma}(\alpha,\beta)\]
The expectation and variance depend on how we parameterize it
(shape–rate and shape–scale give different formulas)
As such, we won’t discuss those here
Time to draw lines
Any mathematical representation of a process
Think about model planes:
Can we make a very simple one?
What about a very complex one?
Can they both fly?
Is either as good as a real plane?
Building a model, blind to the process, to better understand that process.
\[y = mx + b\]
Building a model using knowledge of a process, to better understand the data.
\[\lambda(t)=\lambda_0e^{\gamma(t-t_0)}\]
Linear in the parameters:
\[Y=aX+b\]
\[w=\beta_0+\beta_1x+\beta_2x^2\]
\[P=\mu+\alpha\log(r\times t)\]
\[\boldsymbol \Delta=\boldsymbol \beta \cos(\boldsymbol X)+10\]
These are nonlinear in the parameters:
\[Y=a^2X+b\]
\[w=\beta_0+\log(\beta_1)x\]
\[\lambda(t)=\lambda_0e^{\gamma(t-t_0)}\]
The same inputs always result in the same outputs:
\[y = 2x\]
Let \(x=4\)
Let \(x=4\) again
Did you get the same result?
There’s some random element to the model:
\[ y = mx + b + \text{Random Error} \]
The same input won’t always give the same output
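A tiny sketch of the contrast; the normal error distribution and the parameter values are arbitrary choices for illustration:

```python
# Sketch: deterministic vs stochastic model. Feeding the same x in
# twice gives identical outputs for the first, differing outputs for
# the second.
import random

random.seed(2025)
m, b = 2.0, 1.0

def deterministic(x):
    """y = mx + b, no randomness."""
    return m * x + b

def stochastic(x):
    """y = mx + b + random error (N(0, 1) chosen for illustration)."""
    return m * x + b + random.gauss(0, 1)

print(deterministic(4), deterministic(4))  # identical: 9.0 9.0
print(stochastic(4), stochastic(4))        # almost surely different
```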
Explicit models: Input values for the parameters and solve for the output (a.k.a. response)
Implicit models: Solve for the parameters to understand the structure of the model
We do both in statistics (usually back to back)
\[P(t)=P_0e^{rt}\]
\[y=mx+b\]
There are two more “distinctions” of models, specific to statistics:
Frequentist
Bayesian
This exists; we’ll talk about it a little, but not too much
STAT 341, STAT 610/611, STAT 768