Lec 19

Announcements

Welcome back from break

  • If you missed the memo on homework being due last Friday, no worries

    • Submit it, move on with your life


  • Exam 2 is next Friday, April 4

  • Quiz 2 is due this Friday, March 28

  • Homework 4 is due next Monday, March 31

  • Lecture 19 is the last lecture pertaining to Exam 2 / Quiz 2 / Homework 4

    • Any lectures past that will be part of the final


  • The final exam will be take-home

    • It will be comprehensive

    • Timeline on it is not clear yet, but details will come soon


  • If you are worried about your grade, come talk to me

    • I have no interest in failing anyone in this course if they’re willing to put in effort

    • By the same logic, if you want a specific grade I’m happy to let you try to earn it (I can’t promise it will be easy)

    • If you don’t communicate with me I can’t help you. Don’t wait until the week before finals to tell me you’re unhappy with your grade.


  • TEVALS are coming sooner than it seems

    • I am a statistician, I love data, please fill out your TEVALS

    • I read through every comment individually and make adjustments where I can

      • I am not your Expos 100 professor, I have no clue how any of you write, so be brutal, be honest


  • ASA DataFest

  • Event Schedule

    • If you’re interested, I encourage you to grab a friend and sign up (you can also sign up solo)

    • There’s no harm in showing up and not doing well, it’s a learning experience

    • Participation looks amazing on a resume

    • If you want help to prepare I have an overwhelming amount of resources

    • You don’t need to be a computer science student to learn programming, everything I know is self-taught




Review


Observational Studies

An observational study is one in which we study something as it exists

  • Observe individuals and measure variables of interest

    • Gather medical records to study relationship between smoking and heart disease


Does NOT attempt to influence the response



Designed Experiments

An experiment occurs when we control the conditions under which observations are taken

  • Deliberately imposes treatments on individuals in order to observe response

    • Lab rats are given either a low carbohydrate diet or a high carbohydrate diet to see the effect on weight


Does influence response


Experimental Unit: Subject, animal, or object used in the experiment


Treatment: Experimental condition applied to experimental unit


Response: The thing we measure to determine the effect of the treatments



Confounding

Two variables in a study/experiment

  • Effects are indistinguishable

  • Can’t tell which variable caused an effect

  • Observational studies don’t show causality

  • Experiments can show causality

    • Typically have to have that intent designed into them



Collinearity

Two variables are linearly dependent

“A form of extreme confounding”

  • The variables contain the same information to an extent



Bias

The design of a statistical study is biased when one outcome is systematically preferred to others


Impossible to correct for


It is a systematic error caused by bad sampling design


Problems in Sampling

Making up data is always bad


Samples of convenience are easy, cheap, and easy to intentionally bias


Voluntary response surveys can work well if designed well, but they’re very easy to design poorly


Undercoverage


Nonresponse


Response Bias


Question Wording


Order of Questions


Some Observational Study Types

Case-control studies:

A form of observational study that adjusts for this unique problem


  • Select the case-subjects (those with the trait/condition) of interest

    • Take a random sample of those individuals


  • Select a control group without the condition (ideally with similarities to the case subjects)

    • Take a random sample of those individuals


Cohort Studies:

Subjects sharing a common demographic characteristic are enrolled and observed at regular intervals over an extended period of time


  • Starts with a group of similar individuals

  • Observations made over regular intervals




Questions?




Goals for Today:




Design of Experiments


Design Vocab

  • Experimental Units (EU)

    • The individuals (or units) studied in an experiment

    • The unit to which a treatment is applied

    • Homogeneous and independent


  • Factors

    • Explanatory variables (predictors) in an experiment


  • Treatment (trt)

    • Any specific experimental condition applied to the experimental unit

    • If an experiment has several factors, a treatment is a combination of specific levels of each factor


  • Response

    • Outcome being measured




Advantages of Designed Experiments

Imagine if we could run a grand experiment where one city requires masks to be worn at all times throughout a new COVID outbreak, and one bans the use of all masks.

  • NOTE: This is wildly unethical


We would gain a lot of information-dense data:

  • We’d tease out confounding effects

  • All of the variance (or noise) in the data would be explainable with the variables at our disposal

  • We would be able to study combined effects (i.e., What’s the interaction between mask usage and age relative to disease prevalence?)


These are the advantages of Experiments:

  • We avoid and tease out confounding effects

  • We control the environment and remove factors we’re not interested in

  • We can study interaction effects (the combined effects of multiple factors)


A clinical trial is being conducted to compare four diets (A, B, C, D) by assessing their ability to reduce LDL cholesterol in humans, as measured in mg/dl from a 10-ml blood sample. Forty subjects total are available for this trial and will be assigned at random so that each diet has equal numbers of subjects.

  • Factor(s)?

  • Trt(s)?

  • EU?

  • Response?

  • How many EUs are there per trt?


Suppose we have a total of 20 mares (female horses). Ten of them will be implanted with a capsule containing fish oil. The other ten mares will be implanted with a capsule containing sterile water. At the end of a week a blood sample will be taken from each mare and the serum fatty acid concentration (in mg/l) will be measured. Mares will be kept in separate stalls for the week of the experiment.

  • Factor(s)?

  • Trt(s)?

  • EU?

  • Response?

  • How many EUs are there per trt?


Generally speaking, we hold a few important principles in Experimental Design:

  • Control: We want to control all our lurking variables affecting the response

  • Randomization: Use some form of impersonal chance to assign treatments to EUs

  • Large Sample Sizes: We tend to want our experiment to have a high power so that we can determine if our results are significant
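To make the link between sample size and power concrete, here is a rough sketch. It assumes a two-sided, two-sample z-test with a known, common standard deviation and normally distributed responses, which is a simplification chosen for illustration rather than the analysis used in this course. The function name `two_sample_power` and the effect size of 0.5 standard deviations are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(n_per_group, effect, sd=1.0, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with known sd.

    Assumes equal group sizes and normally distributed responses.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection cutoff
    se = sd * sqrt(2 / n_per_group)              # SE of the difference in means
    shift = effect / se                          # true difference in SE units
    # Probability the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Hypothetical effect of half a standard deviation at several sample sizes.
for n in (10, 30, 100):
    print(n, round(two_sample_power(n, effect=0.5), 3))
```

Under these assumptions the same true effect is much easier to detect with 100 EUs per trt than with 10, which is the motivation behind wanting larger samples.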




Common Experimental Designs

You want to determine if a new form of diet could help with weight loss. You gather 30 lab rats and assign each of them a random number (without replacement) between 1 and 30. Then you order them from 1 to 30 and randomly generate 15 distinct numbers in that interval. The rats with those numbers are given the new diet and the remaining rats are given a standard diet. You homogenize their exercise and activity throughout the trial. The rats are weighed before and after the trial.


What we’ve performed is called a Completely Randomized Design (CRD)

  • Individuals are randomly assigned to groups, then the groups are assigned to trts completely at random

  • It doesn’t need to be an equal number of individuals in each trt group, but balanced designs are mathematically convenient

  • Can include more than one trt

  • The fundamental framework for all experimental design

  • Simplest form of experiment, provides good evidence for different trts causing different responses
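The rat example can be sketched in a few lines of Python. This is one possible way to carry out the randomization, not the only one: it shuffles the EUs and deals them out to the trts in turn. The function name `completely_randomized_design` and the seed value are made up for illustration.

```python
import random


def completely_randomized_design(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)          # seeded so the assignment is repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {trt: [] for trt in treatments}
    for i, unit in enumerate(shuffled):
        # Dealing in turn keeps the group sizes as balanced as possible.
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment


rats = list(range(1, 31))              # rats numbered 1 to 30
groups = completely_randomized_design(rats, ["new diet", "standard diet"], seed=19)
for trt, members in groups.items():
    print(trt, sorted(members))
```

Every rat has the same chance of landing in either group, and passing a longer list of trts handles designs with more than two groups.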



You want to determine the effect of a specific headache medication for both biological males and females. You acquire 50 volunteers and separate them by sex (30 males and 20 females). You perform a CRD on each group, assigning half of each group the trt of the headache medication and the other half a control. You compare the effect for each group.


We’ve performed a Randomized Complete Block Design (RCBD):

  • It looks like stratification, it sounds like stratification, it’s probably stratification

  • A lot of design, exploration, and observation vocabulary is the exact same terms being reworded to fit convention


RCBDs are defined by:

  • Blocking: Taking a group of individuals known prior to the experiment, who share some attribute that is expected to affect the response

  • The blocks are not randomly assigned, but within the blocks we randomly assign trts
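As a sketch of the headache example, the code below runs a separate randomization inside each block. The subject labels (`M1`, `F1`, ...), the function name `randomized_block_design`, and the seed are hypothetical; only the block sizes (30 males, 20 females) come from the example above.

```python
import random


def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each pre-defined block."""
    rng = random.Random(seed)
    design = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)          # randomization happens inside the block only
        design[block_name] = {trt: [] for trt in treatments}
        for i, unit in enumerate(shuffled):
            design[block_name][treatments[i % len(treatments)]].append(unit)
    return design


# Blocks are fixed before the experiment; they are not randomly formed.
blocks = {
    "male": [f"M{i}" for i in range(1, 31)],
    "female": [f"F{i}" for i in range(1, 21)],
}
design = randomized_block_design(blocks, ["medication", "control"], seed=19)
for block_name, arms in design.items():
    print(block_name, {trt: len(members) for trt, members in arms.items()})
```

Block membership is never shuffled. Chance only decides which trt each individual receives within their own block.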



If you wanted to observe the outcome of two different trts, one of the ways you could observe that would be by assigning a different trt to two EUs of the exact same characteristic. Another could be by assigning a different trt to the same EU.


In this case we’re met with two different designs: Matched Pair and Repeated Measure


  • Matched Pairs:

    • Two individuals are determined to be similar in the ways that are important to a study, one is given trt A the other is given trt B


  • Pairs of identical twins are selected for a study of blood pressure medicine

  • One of the twins is given medicine A, and the other is given medicine B

  • Which gets which is determined randomly

  • Reduction in blood pressure measured after one week


  • Repeated Measures:

    • Each individual is given both treatments

    • Randomized assignment and order to avoid systematic bias


  • Thirty subjects are selected to compare Pepsi to Coke

  • Each subject tastes both Pepsi and Coke

  • The cola tasted first is determined randomly (as in a coin toss)

  • Subjects not told which one they taste first

  • Subjects give scores to each on scale of 1 to 10 to indicate how much they like the taste
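Both randomization steps above can be sketched as follows. In the matched pairs case a "coin toss" decides which member of each pair gets which trt. In the repeated measures case it decides the order in which each subject receives the trts. The function names, twin labels, and seeds are hypothetical.

```python
import random


def assign_matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, randomly decide who gets which of the two treatments."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)             # the coin toss for this pair
        assignment[first], assignment[second] = order
    return assignment


def randomize_order(subjects, treatments=("Pepsi", "Coke"), seed=None):
    """Every subject gets every treatment; only the order is random."""
    rng = random.Random(seed)
    return {s: rng.sample(treatments, k=len(treatments)) for s in subjects}


twin_pairs = [(f"twin{i}a", f"twin{i}b") for i in range(1, 6)]
medicine = assign_matched_pairs(twin_pairs, seed=19)

tasters = [f"subject{i}" for i in range(1, 31)]   # thirty subjects
tasting_order = randomize_order(tasters, seed=19)

print(medicine)
print(tasting_order["subject1"])
```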


Independence

A huge facet of statistics is the assumption of independence. It’s a highly convenient assumption to make since it lets us simplify our math (sometimes the math isn’t possible without the assumption)


Imagine I have a random sample of \(10\) ponds. I want to understand the effect of global climate change on the number of fish in each pond. If I assume that each pond is “iid” or “Independent and Identically Distributed” then I can essentially treat each pond as if they’re the exact same.


If we’re certain we lack independence, we must address that in our analysis


If we aren’t certain we can usually just assume we have independence


CRD and Matched Pair designs differ fundamentally since:

  • The matched pair groups are not independent

  • Each pair was intentionally decided and the analysis hinges on them being linked


In a CRD all of the individuals are independent of one another since everything was assigned via fair random chance
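A small simulation can show why the analysis has to respect the pairing. Everything in this sketch is invented for illustration: the data are simulated, the numbers (baseline mean 120, spread 10, effect 2.0) are arbitrary, and the two standard error formulas are the usual textbook ones for paired and independent samples rather than anything specific to this course.

```python
import math
import random
import statistics

rng = random.Random(19)
n_pairs = 200
effect = 2.0                           # hypothetical true difference, trt B minus trt A

trt_a, trt_b = [], []
for _ in range(n_pairs):
    baseline = rng.gauss(120, 10)      # shared by both members of a matched pair
    trt_a.append(baseline + rng.gauss(0, 1))
    trt_b.append(baseline + effect + rng.gauss(0, 1))

# Paired analysis: work with the within-pair differences.
diffs = [b - a for a, b in zip(trt_a, trt_b)]
se_paired = statistics.stdev(diffs) / math.sqrt(n_pairs)

# Analysis that (wrongly) treats the two groups as independent samples.
se_independent = math.sqrt(
    statistics.variance(trt_a) / n_pairs + statistics.variance(trt_b) / n_pairs
)

print("SE using the pairing:      ", round(se_paired, 3))
print("SE pretending independence:", round(se_independent, 3))
```

Because the pair members share a baseline, the differences are far less noisy than the raw measurements. Ignoring the link throws that information away.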




Statistical Significance

How can we be certain that Trt A was responsible for Response A?


At one point in time, doctors prescribed cigarettes to patients. Now we talk about cigarette smoking being one of the leading causes of lung cancer. Prior to chemical confirmatory analysis and the genesis of benchtop cancer studies, we didn’t have a guaranteed way to make that connection. How did we make it?


Imagine we have \(100\) individuals who enter into an experimental lung cancer treatment program. As part of their entry they have to fill out a detailed demographic survey that includes questions about diet, exercise, and other habits. We find that of the \(100\), \(98\) of them are regular smokers.

  • Can we say smoking causes cancer?


A fictional tobacco company, Kamel, does their own independent research and finds that roughly \(82\%\) of Americans smoke cigarettes more than once a week. Yet only \(6\%\) of Americans are developing lung cancer.

  • Can we say smoking doesn’t cause cancer?


When we see a difference between average responses we have two possible explanations:

  • There is, in fact, a true difference between trt groups

  • The differences are entirely due to random chance


Statistics allows us to use laws of probability to determine, given the magnitude of differences in trt, how likely something is to fall within the boundary of random chance

  • If the difference is so great that it is unlikely to happen by chance, we say that the difference is statistically significant


Statistical Significance:

  • An observed effect is said to be statistically significant if the effect is so large that it would rarely occur by chance

In general, a strong observed association does not imply causation, but a statistically significant association in data from a well-designed experiment does imply causation.
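One way to put a number on "would rarely occur by chance" is a permutation test, sketched below. The idea: if trt made no difference, the group labels are arbitrary, so we can relabel the EUs every possible way and see how often a difference at least as large as the observed one appears. The response values are made up for illustration, and this is a swapped-in technique for demonstration, not necessarily the test used later in the course.

```python
from itertools import combinations


def exact_permutation_test(group_a, group_b):
    """Count relabelings whose |difference in means| is at least the observed one."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        total += 1
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            as_extreme += 1
    return observed, as_extreme, total


# Hypothetical responses for two trt groups of five EUs each.
trt_a = [12, 15, 14, 16, 13]
trt_b = [9, 10, 11, 8, 10]

observed, as_extreme, total = exact_permutation_test(trt_a, trt_b)
p_value = as_extreme / total
print(f"observed difference = {observed:.1f}")
print(f"{as_extreme} of {total} relabelings are at least as extreme (p = {p_value:.4f})")
```

A small proportion means the observed difference is hard to explain by random assignment alone, which is what "statistically significant" is getting at.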


Despite the prevalence of lung cancer in the total population being \(6\%\) while the prevalence of smoking is \(82\%\), researchers find that the prevalence of smoking within the population of lung cancer patients is nearly \(96\%\). After some careful statistical analysis, it’s found that the probability that smoking doesn’t have some direct correlation with lung cancer is \(<3\%\).

  • At this point, can we say that smoking causes lung cancer?
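As a starting point for that question, Bayes' rule lets us flip the percentages around. The sketch below uses only the fictional numbers from the scenario above (6% lung cancer, 82% smoking, 96% smoking among lung cancer patients) and rounds "nearly 96%" to exactly 0.96. The variable names are made up for illustration.

```python
# Fictional values from the scenario above, not real-world statistics.
p_cancer = 0.06                 # P(lung cancer)
p_smoke = 0.82                  # P(smoker)
p_smoke_given_cancer = 0.96     # P(smoker | lung cancer)

# Bayes' rule: P(cancer | smoker) = P(smoker | cancer) * P(cancer) / P(smoker)
p_cancer_given_smoke = p_smoke_given_cancer * p_cancer / p_smoke

# Same rule applied to non-smokers.
p_cancer_given_nonsmoke = (1 - p_smoke_given_cancer) * p_cancer / (1 - p_smoke)

relative_risk = p_cancer_given_smoke / p_cancer_given_nonsmoke

print(f"P(cancer | smoker)     = {p_cancer_given_smoke:.4f}")
print(f"P(cancer | non-smoker) = {p_cancer_given_nonsmoke:.4f}")
print(f"relative risk          = {relative_risk:.2f}")
```

With these numbers, smokers are roughly five times as likely to have lung cancer as non-smokers. That is a strong association, but the data come from observation rather than a designed experiment, so the question of causation is still open.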




Cautions

Experimental design is a complex field because it attempts to capture the complexities of natural phenomena. The inference we make from any given experiment is typically taken as absolute, so we have to be cautious and careful about the mandatory features and pitfalls of design.


Individuals must all be treated identically in every way

  • Except for the treatments being compared

Blind Studies (Single-Blind)

  • Subjects are unaware of which treatment they’re getting

Double-Blind

  • Not only are the subjects unaware, the person who applies the treatment to the subjects is also unaware

  • Researcher

  • Doctor

  • This is used to prevent researcher bias



All studies must be replicable

  • Some would say we’re in a “crisis of reproducibility”

    • I would personally agree


It is absolutely vital that any given experiment CAN be reproduced if provided the same general materials/substrates/environment

  • Your specific laboratory bench at Kansas State University is not considered an environment in this case

  • If you cannot provide a cookbook with detailed steps that could allow anyone in the world to perform your exact same experiment and get very similar results

    • Your experiment could be (and should be) considered fraudulent

    • Hold the same standard when reviewing the work of others



“If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.” - John von Neumann

If a study is interpreting the results of an experimental analysis as if they have captured a real life phenomenon exactly

  • Proceed with caution

  • The lab is never the same as the real world

    • The best we can do is try to mimic it poorly



About 15 years ago the Food and Drug Administration started requiring both sexes to participate in clinical trials for drugs applicable to both males and females. Before this, drugs were approved at the prescribed dosage for use by both men and women. The clinical trials only had males participating.

Recognize that the only true application of our results is in the one setting where we acquired them:

  • Beware of lurking variables, bias, and confounding effects

  • If you’re seeing a broad generalization of results, stay skeptical




Ethics

PREFACE – I am not a moral philosopher. Every student going into the sciences should (keyword) be mandated to take two philosophy courses: Scientific Ethics and Philosophy of Science.


I cannot provide definitive answers on experimental ethics. Instead, I can provide some scenarios and questions to help you start the thought process (and hopefully recognize the importance of taking a moral philosophy course):


If you’re conducting a trial to determine the impact of a social welfare program, should you reward your participants?

  • Poor/homeless individuals are easy to exploit

  • Paying people for their time and effort is how our economic system functions


Sign up \(3000\) patients for an experimental HIV medication trial and provide \(100\) of them a placebo

  • Is it wrong to administer a placebo?

  • I’ve created a clear effect of treatment and eliminated a lot of confounding variables


You want to determine the pest resilience of a strain of GMO corn. Your experiment is industry sponsored and is taking place at an industry owned farm, directly adjacent to a family owned farm.

If your crop spills over to the family owned farm and taints the genetic makeup of their crop:

  • Have you harmed or helped them?

  • Do you owe them compensation?

  • Or are they stealing your intellectual property by selling their crop now?




Examples

Ability to grow in shade may help pines found in the dry forests of Arizona resist drought. To test this hypothesis, investigators planted pine seedlings in a greenhouse in either full light, light reduced to 25% of normal, or light reduced to 5% of normal. At the end of the study, they dried the young trees and weighed them.

  • Experiment or observational study? What type?

  • EU?

  • Factors?

  • Treatments?

  • Response variable?




One hundred volunteers who suffer from agoraphobia are available for a study. Fifty are selected at random and are given the drug imipramine, which is believed to be effective in treating agoraphobia. The other 50 are given a placebo. A psychiatrist evaluates the symptoms of all volunteers after two months to determine if there has been substantial improvement in the severity of the symptoms.

Suppose the volunteers were first divided into men and women and then half of the men were randomly assigned to the new drug and half of the women were assigned to the new drug. The remaining volunteers received the placebo.

  • Experiment or observational study? What type?

  • EU?

  • Factors?

  • Treatments?

  • Response variable?




Go Away