Experimental power

When we conduct hypothesis tests there are two kinds of error: \(\alpha\) (which we set, typically at \(5\%\)) and \(\beta\). \(\alpha\) is the probability of making a Type I error: rejecting the null or primary hypothesis when it is true. \(\beta\) is the probability of making a Type II error: failing to reject the null or primary hypothesis when the alternative is true.

\(\beta\) can be tricky since we don’t set it directly. Instead, it’s influenced by sample size, population variation, absolute effect size, experimental design, and \(\alpha\). But we should still try to get an idea of what our power is before we run an experiment so that we can:

  1. Adjust our sample size or design to decrease \(\beta\)
  2. Properly interpret our statistical results.

\(\beta\) is directly related to experimental power (\(1-\beta\)). Power is the probability that we correctly reject the null or primary hypothesis when the alternative is true.
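To make these two quantities concrete, here's a minimal simulation sketch (the means, sample sizes, and number of replications are made up purely for illustration) that estimates the Type I error rate under the null and the power under a specific alternative, using a two-sample t-test:

# A minimal sketch: estimate alpha and power for a two-sample t-test
  # by simulation. All numbers here are illustrative, not recommendations.
set.seed(1)
n_sims <- 1000

# Type I error rate: simulate data where the null is true (no difference)
p_null <- replicate(n_sims,
                    t.test(rnorm(20, mean = 0), rnorm(20, mean = 0))$p.value)
mean(p_null < 0.05)   # should land near alpha = 0.05

# Power: simulate data where the alternative is true (difference of 1 SD)
p_alt <- replicate(n_sims,
                   t.test(rnorm(20, mean = 0), rnorm(20, mean = 1))$p.value)
mean(p_alt < 0.05)    # estimated power = 1 - beta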

A problem with determining experimental power is that there may not be a nice, closed-form equation for power given your experimental design. No problem, we can simulate it.
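For comparison, simple designs sometimes do have a closed form. Base R's power.t.test() covers the two-sample t-test case, for example (the n, delta, and sd values below are placeholders, not recommendations):

# Closed-form power calculation for a two-sample t-test
# n = per-group sample size, delta = true difference, sd = population SD
power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05)

For a design like the regression example below, though, simulation is the more flexible route.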

Things you’ll need

  1. An idea of how the data-generating process works.
  2. Approximate values that you could expect
    • Use literature values if possible, even from other species or similar studies
  3. An idea of what constitutes “biologically significant” for your problem.
    • Field experiments are often underpowered (they can’t detect biologically significant effects)
    • Genetic experiments are often overpowered (they can detect lots of effects, but few of them are biologically significant)
    • You absolutely need to know what a biologically significant effect looks like before doing your study.
      • This is not trivial, but also isn’t the focus of this example.
      • Ask yourself: “What size of biological effect do we actually care about?”

The worked example

Let’s do an example with simple linear regression. Let’s say that we’re examining a relationship between \(x\) and \(y\). Let’s also say that a slope of 0.5 would be biologically significant for our purposes. What can we do to increase the chance of detecting this relationship assuming that it exists?

  1. Well, we can increase our sample size.
    • This is typical for all experiments.
  2. We can increase the range of \(x\).
    • This is due to leverage, and it will help specifically with a regression (see the quick sketch below).
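To see why the range matters: for simple linear regression the standard error of the slope is \(\sigma/\sqrt{\sum_i (x_i-\bar{x})^2}\), so spreading the \(x\) values out makes the slope estimate more precise even at the same sample size. A quick sketch (variable names and numbers are just for illustration):

# Compare the standard error of the slope for a narrow vs. wide range of x,
  # holding sample size and residual SD constant. Illustrative values only.
set.seed(42)
n <- 20

x_narrow <- seq(5, 10, length.out = n)
x_wide   <- seq(0, 20, length.out = n)

y_narrow <- 10 + 0.5*x_narrow + rnorm(n, mean = 0, sd = 5)
y_wide   <- 10 + 0.5*x_wide   + rnorm(n, mean = 0, sd = 5)

# Standard error of the slope (row 2, column 2 of the coefficient table)
coef(summary(lm(y_narrow ~ x_narrow)))[2, 2]
coef(summary(lm(y_wide ~ x_wide)))[2, 2]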

We’ll need to do the following:

  1. Create a vector of sample sizes to check.
  2. Create a vector of \(x\) widths to check.
    • By this I mean increasing the range of our \(x\) variable.
    • For example, instead of sampling between \(5 \leq x \leq 10\), try \(0 \leq x \leq 20\) if possible.
  3. Use expand.grid to create a data frame of all possible combinations of sample size and width.
  4. Run the simulation using your grid.
    • We’ll use the mapply function to iterate the simulations over our grid.
    • To do this, we’ll create a function pow_fun that calculates a p-value for each simulated data set.
    • We’ll store these p-values along with their sample size and width information.
  5. Using the simulation output we’ll calculate the simulated power for each sample size/range combination.
    • Power analyses work under the assumption that there is an effect to detect.
    • We can use an indicator variable (0, 1) to tell us whether or not a simulated result was “statistically significant”.
    • Then we average over the indicator variable for each sample size/range combination.
    • That’s power!
  6. Graphs, graphs, graphs!
library(tidyverse)
library(directlabels)


# Set a vector of sample sizes we want to try
sample_size <- seq(5,50,5)

# Create a vector of x-range widths to try
# This code may look strange, I'll try to explain what I'm doing
  # in class
# Essentially, I'm saying the total range of x can be [0,100]
# And we're seeing what happens when that width is centered on 50
  #and increases. Clear as mud?
spread_min <- seq(45,0,-5)
spread_max <- seq(55,100,5)

# This is the vector I was after.
diff <- spread_max-spread_min


# How many simulation replications?
# We'll try 100 replications per sample size/width combination

rep <- 1:100

# Expand grid creates all possible combinations of your vectors
# Very useful

sample_grid <- expand.grid(sample_size,diff,rep)


pow_fun <- function(size, diff){
    # Generate our x variable based on the sample size and width,
      # centred on 50. length.out gives exactly `size` points across
      # the width.
    x <- seq(from = 50 - diff/2,
             to = 50 + diff/2, 
             length.out = size)
    
    # Generate y based on what we think would be biologically significant
      # using a reasonable variance estimate from the literature.
    # What's "reasonable" will change based on topic and discipline.
    # Look to previous studies on similar topics and species for 
      # good values.
    y <- 10 + x*0.5 + rnorm(n = length(x), mean = 0, sd = 20)
    
    # Run the fit
    fit <- lm(y ~ x)
    
    # Find and store the relevant p-value (slope row, p-value column)
    sum_fit <- summary(fit)
    sig <- coef(sum_fit)[2, 4]
    
    # Return the sample size, width, and p-value for this simulation
    res <- cbind(size, diff, sig)
    return(res)
}

OK! Now that we have our grid and our power function, we need to apply the function to every row in our grid. We’ll use mapply. The apply family of functions are essentially for loops with benefits: mapply will run the function we created for each row in our grid and then spit out the results we asked for. It returns the transpose of the matrix we’re interested in, so we’ll use t() to get what we want.

Then, we’ll turn it into a data frame and name the columns.

result <- mapply(pow_fun, size = sample_grid$Var1, diff = sample_grid$Var2)
result.dat <- as.data.frame(t(result))
names(result.dat) <- c("sample_size", "width", "p")
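If mapply feels like a black box, here’s the equivalent explicit for loop over the grid (just a sketch that does the same thing more verbosely; result_loop is a throwaway name):

# Explicit-loop equivalent of the mapply call above
result_loop <- matrix(NA, nrow = nrow(sample_grid), ncol = 3)
for (i in 1:nrow(sample_grid)) {
  result_loop[i, ] <- pow_fun(size = sample_grid$Var1[i],
                              diff = sample_grid$Var2[i])
}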

Now that we have our data frame we’ll need to get what we really wanted: the simulated power for each sample size/range combination. We’ll use an ifelse statement to identify which results were “statistically significant”. If a result was significant (\(p<0.05\)) then we code it as a \(1\), otherwise \(0\).

We can then use dplyr (part of the tidyverse) to summarise our indicator variable. If we take the average (mean) of the indicator variable for each sample size/range combination, that average is the simulated power.

result.dat2 <- result.dat %>%
  mutate(sig = ifelse(p < 0.05, 1, 0))

result.dat3 <- result.dat2 %>%
  group_by(sample_size, width) %>%
  summarise(power = mean(sig))

And of course we have to do graphs. Graphs, graphs, graphs!

If you’re planning to publish a graph with 3 variables, you should consider a contour plot like this (3D doesn’t work well in print):

plot_new_contour <- ggplot(data = result.dat3, aes(x = sample_size, y = width, z = power)) + 
  geom_raster(aes(fill = power), show.legend = TRUE) +
  scale_fill_gradient(limits = range(result.dat3$power), high = 'blue', low = 'yellow') + 
  geom_contour(aes(colour = ..level..)) +
  scale_colour_gradient(guide = 'none') +
  labs(x = "Sample size", y = "Range/width of x",
       title = "Contour plot from a simple linear regression power simulation. Power was simulated for different sample sizes and ranges of x.") +
  theme(title = element_text(size = 6, face = 'bold'), axis.title = element_text(size = 12))

neat_plot = direct.label(plot_new_contour, 
                         list("far.from.others.borders", "calc.boxes", "enlarge.box", 
      hjust = 1, vjust = 1, box.color = NA, fill = "transparent", "draw.rects"))
neat_plot

You could also do a 3D plot if you like. This won’t show up in the HTML output.

library(rgl)
library(plot3D)
# Reshape the power column into a matrix with rows = sample sizes and
  # columns = widths. Fill by row because result.dat3 is sorted by
  # sample_size first, then width.
zmat <- matrix(result.dat3$power, byrow = TRUE, nrow = length(sample_size), ncol = length(diff))


persp3d(sample_size,diff,zmat,theta=30, phi=50, axes=TRUE,scale=2, box=TRUE, nticks=5,
        ticktype="detailed",col="orange",xlab="sample_size", ylab="diff", zlab="power",
        main="Power/sample size")