Observational Studies

STAT 240 - Fall 2025

Robert Sholl

Motivating Examples

Norovirus (Norwalk Virus)

  • \(\#1\) causative agent of gastroenteritis

  • Highly infectious and especially concerning for the YOPI population (young, old, pregnant, immunocompromised)

  • Most prevalent in winter months and on cruise vessels

Norovirus

  • Symptoms include:

    • Nausea

    • Fever

    • Vomiting

    • Abdominal pain

    • Loss of taste

How can we study what disease control methods are most effective with this virus?

The Dust Bowl

  • Severe drought and dust storms

    • Western Kansas to Eastern Colorado

  • Poor farming techniques

  • Erosion / water quality vigilance

Could this happen again?

Kansas Lower-Republican River Basin

  • The river basin that Manhattan, KS (MHK) sits in

  • Contains rural and metro populations

How can we determine which methods for preserving our land and water work?

Observational Studies

A study where the independent variable is not controlled by the researchers

  • Observing nature as nature

  • Ethics

  • Finances

  • Influence versus control

Observational Studies

Advantages:

  • An ethical alternative to experiments that would be unethical

  • Typically cheaper than experiments

  • Easy to get a lot of data

Disadvantages:

  • Messy data

  • Lack of replication

  • Causal insufficiency

Case-control Studies

Intentionally selecting confirmed case-positive participants (cases) and pairing them with confirmed case-negative participants (controls).

  • Prospective: gather subjects beforehand and observe their progression

  • Retrospective: gather subjects after progression has occurred and survey them

Good at removing redundancy in sampling

Hard to execute / get large samples
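The selection step can be sketched in Python. This is a minimal illustration, assuming 1:1 matching on a single variable; the function name `build_case_control_pairs`, the field names, and the toy population are hypothetical and not taken from any real study. Cases with no available control are simply dropped, which is one reason large case-control samples are hard to obtain.

```python
import random


def build_case_control_pairs(population, is_case, match_key, seed=0):
    """Pair each case with one randomly chosen control that shares its match_key value.

    Controls are used at most once; cases with no available control are dropped.
    """
    rng = random.Random(seed)
    cases = [p for p in population if is_case(p)]

    # Group the available controls by the matching variable (e.g., age group)
    controls_by_key = {}
    for p in population:
        if not is_case(p):
            controls_by_key.setdefault(match_key(p), []).append(p)

    pairs = []
    for case in cases:
        pool = controls_by_key.get(match_key(case), [])
        if pool:
            # Remove the chosen control from the pool so it cannot be reused
            control = pool.pop(rng.randrange(len(pool)))
            pairs.append((case, control))
    return pairs


# Hypothetical toy population: "sick" marks a confirmed case
population = [
    {"id": 1, "sick": True, "age_group": "65+"},
    {"id": 2, "sick": False, "age_group": "65+"},
    {"id": 3, "sick": True, "age_group": "18-64"},
    {"id": 4, "sick": False, "age_group": "18-64"},
    {"id": 5, "sick": False, "age_group": "18-64"},
    {"id": 6, "sick": True, "age_group": "0-17"},  # no control in this group
]

pairs = build_case_control_pairs(
    population, is_case=lambda p: p["sick"], match_key=lambda p: p["age_group"]
)
for case, control in pairs:
    print(f"case {case['id']} <-> control {control['id']}")
```

Case 6 ends up unpaired because no control shares its age group, so three cases yield only two usable pairs.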

Cohort Studies

Subjects sharing a common demographic characteristic are enrolled and observed at regular intervals over an extended period of time.

  • Almost always prospective studies

    • Retrospective is possible though

Very information-dense data

Equally resource-intensive

Problems with sample “drop-outs”
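A small simulation shows how drop-outs erode a cohort over repeated visits. It assumes every remaining subject has the same constant chance of leaving between visits, which real cohorts rarely satisfy; `simulate_cohort_retention` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def simulate_cohort_retention(n_subjects, n_visits, dropout_prob, seed=0):
    """Count how many subjects are still enrolled at each scheduled visit.

    Assumes each remaining subject independently drops out with the same
    probability between every pair of visits (a simplifying assumption).
    """
    rng = random.Random(seed)
    enrolled = n_subjects
    counts = [enrolled]  # index 0 = baseline enrollment
    for _ in range(n_visits):
        dropped = sum(1 for _ in range(enrolled) if rng.random() < dropout_prob)
        enrolled -= dropped
        counts.append(enrolled)
    return counts


retention = simulate_cohort_retention(n_subjects=500, n_visits=8, dropout_prob=0.10)
print("enrolled at each visit:", retention)
# Under these assumptions, the expected number left after 8 visits is 500 * 0.9**8
print(f"expected after 8 visits: {500 * 0.9 ** 8:.0f}")
```

With 10% attrition per visit, fewer than half of the 500 enrolled subjects (about 215 expected) remain after eight visits. If the people who drop out differ from those who stay, the remaining cohort no longer represents the group that was enrolled.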

Grand Experiments

Experimenting on massive samples / populations is almost always unethical, outside of “grand” experiments

  • COVID-19 pandemic:

    • Florida and California performed a grand experiment

    • By accident

  • Florida \(\rightarrow\) anti-mask

  • California \(\rightarrow\) “militant” mask enforcement

Problems in Sampling

  • Never make up data

    • There’s a line between simulation, data correction, and falsifying information

    • It’s paper thin

  • Samples of convenience

    • Economic issue

    • Rarely valid

  • Surveys / Voluntary response

    • Necessary for a lot of observational studies

    • Easy to mess up

Making up data

You’ve been working on a series of field experiments for 18 months. When it comes time to analyze the data, you find results that miss significance by only a small margin. You know your boss won’t force you to keep collecting data since you’re graduating soon; instead, they will pass the project off to the next student, who can finish data collection.

Looking through the data, you find some results that seem like measurement errors. If you adjust them to the average outcome, your results become significant and you can publish your work before graduation.

  • This is a common story

  • The story rarely ends with the “right” outcome
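The "adjust them to the average" move can be checked numerically. The sketch below uses hypothetical illustration values (not real field data) and a hypothetical flagging rule: any value more than 1.5 standard deviations from the mean is treated as a "measurement error" and replaced with the mean. It then compares one-sample t statistics (testing a mean of 0) before and after. Pulling the extreme values to the mean both shifts the mean and shrinks the standard deviation, so the t statistic grows even though no new information was collected.

```python
import math
import statistics


def one_sample_t(data, mu0=0.0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(data)
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))


def replace_suspicious_with_mean(data, k=1.5):
    """Replace values more than k sample SDs from the mean with the mean.

    This mimics the 'adjustment' in the story; it is shown to expose the
    problem, not as a valid cleaning step.
    """
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return [m if abs(x - m) > k * s else x for x in data]


# Hypothetical measured differences (illustration only, not real data)
observed = [2.1, 1.8, -3.5, 2.4, 1.5, 2.0, -4.0, 1.9, 2.2, 1.7]
adjusted = replace_suspicious_with_mean(observed)

T_CRIT = 2.262  # two-sided alpha = 0.05 with n - 1 = 9 degrees of freedom

for label, data in [("observed", observed), ("adjusted", adjusted)]:
    t = one_sample_t(data)
    print(f"{label}: mean={statistics.mean(data):.2f}  sd={statistics.stdev(data):.2f}  "
          f"t={t:.2f}  significant={abs(t) > T_CRIT}")
```

Only two of the ten values change, yet the t statistic jumps from about 1 to about 10. The "significant" result comes entirely from the edit, not from the field work.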

Samples of Convenience

When we sample, we have to ensure we’re reaching the correct population

  • Who does your sample actually represent?


  • Self-reported study limitations

    • Convenience sample, \(814\) participants, \(48\%\) women

    • \(45\%\) of women held a Master’s or higher

    • \(>90\%\) white respondents

    • ‘…we don’t expect much political ideological bias.’
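The effect of a convenience sample can be simulated. Assume a hypothetical population in which about 30% hold a Master's or higher, and a recruitment channel that reaches those people four times as often as everyone else (inclusion probabilities 0.20 vs. 0.05). All numbers and names here are illustrative assumptions, not the study above. Under these assumptions the expected share in the sample is \(0.3 \cdot 0.2 / (0.3 \cdot 0.2 + 0.7 \cdot 0.05) \approx 0.63\), roughly double the population share.

```python
import random


def convenience_sample(population, inclusion_prob, seed=0):
    """Keep each person with a probability that depends on who they are.

    Unlike a simple random sample, inclusion_prob varies across people.
    """
    rng = random.Random(seed)
    return [p for p in population if rng.random() < inclusion_prob(p)]


def share_grad(people):
    """Fraction of people holding a Master's degree or higher."""
    return sum(p["grad_degree"] for p in people) / len(people)


pop_rng = random.Random(42)
# Hypothetical population: about 30% hold a Master's or higher
population = [{"grad_degree": pop_rng.random() < 0.30} for _ in range(10_000)]

# Hypothetical recruitment channel that reaches graduate-degree holders 4x as often
sample = convenience_sample(population, lambda p: 0.20 if p["grad_degree"] else 0.05)

print(f"population share: {share_grad(population):.2f}")
print(f"sample share:     {share_grad(sample):.2f}  (n = {len(sample)})")
```

The sample is large enough to look convincing, but it describes the people the recruitment channel reaches, not the population.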

Voluntary Response

Get-rich-quick idea:

Fix political polling in the United States.

  • How do we sample from a population that has no interest in being sampled?

  • How do we ensure the sample we’ve obtained is the sample we intended?

  • How can we guarantee the results we get from our samples are accurate?
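A sketch of why voluntary response is hard to fix: in this hypothetical poll, support is truly 50%, but opponents are three times as likely to answer (15% vs. 5%). The function `voluntary_poll` and all rates are assumptions for illustration. Inviting more people shrinks the random error, but the estimate settles near \(0.025 / (0.025 + 0.075) = 0.25\), not \(0.50\), so a bigger voluntary sample does not remove the bias.

```python
import random


def voluntary_poll(n_invited, true_support, p_respond_support, p_respond_oppose, seed=0):
    """Simulate a poll where each invited person decides whether to respond.

    Returns (estimated support among respondents, number of respondents).
    """
    rng = random.Random(seed)
    responses = []
    for _ in range(n_invited):
        supports = rng.random() < true_support
        # Response probability depends on the opinion being measured
        p_respond = p_respond_support if supports else p_respond_oppose
        if rng.random() < p_respond:
            responses.append(supports)
    if not responses:
        return float("nan"), 0
    return sum(responses) / len(responses), len(responses)


for n in (1_000, 100_000):
    est, n_resp = voluntary_poll(n, true_support=0.50,
                                 p_respond_support=0.05, p_respond_oppose=0.15)
    print(f"invited={n:>7,}  responded={n_resp:>6,}  estimate={est:.2f}  (truth: 0.50)")
```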

Question Bank Problems

Go Away