When scientists conduct studies using functional magnetic resonance imaging (fMRI), they are able to get a sense of which regions of the brain are active at specific points in time. This information can help researchers understand whether tasks or stimuli – say, seeing a happy face versus a sad one – produce measurable changes in areas of the brain associated with emotion.
But this method also brings a need to understand the data, to identify standards that produce more verifiable results, and to phase out practices that can introduce errors.
At the Center for Healthy Minds, Associate Scientist Jeanette Mumford provides statistical expertise for the Center’s studies involving fMRI and produces videos and workshops showcasing the latest methods in the field.
Mumford shares how challenges in the field are shaping it for the better.
What drew you to statistics and neuroscience?
I was originally going to be a high school math and physics teacher, but then decided to continue my training and attend graduate school. My undergraduate advisor pushed me in the direction of biostatistics since I was more interested in applied work, as opposed to theoretical. Two years into my graduate work in the Department of Biostatistics at the University of Michigan, I had to choose a specialty and was drawn to fMRI because it was new and different. I thought it was an interesting type of data with a lot of unique problems due to the size of the data sets and its temporal and spatial characteristics.
What is the role of statistics when studying brain data?
Generally, all statistical analyses start with a hypothesis and a data set. For example, one hypothesis might look something like this: if we compared brain activation between two groups of people, those who were depressed and those who were not, we would predict that brain activation while viewing negative images would be greater in the depressed group. The data to test this prediction could consist of measured brain activity while people with depression and people without depression viewed negative images.
The model we use would estimate the brain activation while viewing negative images for both groups of people, and we then evaluate whether these brain activation values are statistically different while taking into account the variability in the data. When we view poll results for the upcoming election, we typically view the polling percentages while taking into account the margin of error. A 5-point lead with a 6-point margin of error is different from a 5-point lead with a 1-point margin of error. With fMRI analysis, we’re basically doing the same thing, but use a more analytical process.
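To make that analogy concrete, here is a minimal sketch in Python of the kind of two-group comparison described above, using made-up numbers for a single brain measure rather than real fMRI data. The group sizes, effect size, and use of a simple two-sample t-test are assumptions for illustration; the Center's actual fMRI models are considerably more involved.

```python
# Minimal illustration (not an actual fMRI pipeline): comparing mean
# activation between two hypothetical groups while accounting for variability.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated activation estimates (arbitrary units) for one brain region
depressed     = rng.normal(loc=0.8, scale=1.0, size=30)   # hypothetical group 1
not_depressed = rng.normal(loc=0.3, scale=1.0, size=30)   # hypothetical group 2

# Two-sample t-test: is the group difference large relative to the variability?
t_stat, p_value = stats.ttest_ind(depressed, not_depressed)
print(f"group difference = {depressed.mean() - not_depressed.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The p-value plays the role of the margin of error in the polling analogy: it tells us how plausible a difference of this size would be if the two groups were really the same.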
Statistically, there are two primary errors that we worry about in our analyses. The first is concluding an effect is statistically significant when, in fact, there is no effect present. In other words, what we call a “false positive.” The second is missing an effect that is truly there. To avoid the second problem, we try to collect data from as many participants as possible, as this increases our ability to find true relationships in our data. For the first problem of false positives, we rely on special statistical methodologies to control the false positive rate.
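As a rough illustration of why more participants help with the second problem, the toy simulation below (with an assumed effect size and assumed group sizes, purely for illustration) shows how often a real group difference is detected at the conventional p < 0.05 threshold as the sample size grows.

```python
# Toy simulation: how sample size affects the chance of missing a true effect,
# with the false positive threshold fixed at p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n_sims = 0.5, 2000   # assumed effect size and number of simulations

for n in (15, 30, 60):
    detected = 0
    for _ in range(n_sims):
        group_a = rng.normal(true_effect, 1.0, n)   # group with a real effect
        group_b = rng.normal(0.0, 1.0, n)           # comparison group
        if stats.ttest_ind(group_a, group_b).pvalue < 0.05:
            detected += 1
    print(f"n = {n:2d} per group: effect detected {detected / n_sims:.0%} of the time")
```

Larger samples catch the true effect far more often, which is exactly why we try to collect data from as many participants as possible.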
"When we view poll results for the upcoming election, we typically view the polling percentages while taking into account the margin of error... With fMRI analysis, we’re basically doing the same thing, but use a more analytical process."
Typically, our goal is to control the false positive rate at 5 percent. This is quite easy to do when you’re running a single analysis with standard statistics. In the case of fMRI data, the brain is divided into 100,000 to 200,000 subunits called voxels, and each voxel contains its own set of data, consisting of each participant’s brain activation at that location. If you’re running 100,000 analyses, each with its false positive rate controlled at 5 percent, you can expect a very large number of false positives (around 5,000) by chance alone. Therefore, fMRI analyses require special strategies for determining where statistically significant effects are located in the brain while controlling the false positive rate.
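The arithmetic behind that multiple-comparisons problem can be sketched in a few lines, along with one simple (and very conservative) fix, a Bonferroni correction. This is only an illustration; the software packages actually used for fMRI rely on more sophisticated spatial methods.

```python
# Back-of-the-envelope arithmetic behind the multiple-comparisons problem.
n_voxels = 100_000
alpha = 0.05

# With no true effects anywhere, each voxel still has a 5% chance of a false positive.
expected_false_positives = n_voxels * alpha
print(f"expected false positives at p < {alpha}: {expected_false_positives:.0f}")  # ~5000

# Bonferroni correction: divide the threshold by the number of tests so the chance
# of even one false positive across the whole brain stays near 5%.
bonferroni_threshold = alpha / n_voxels
print(f"per-voxel threshold after correction: p < {bonferroni_threshold:.1e}")
```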
A paper came out recently suggesting that conclusions from many neuroscience studies could be wrong because of a statistical error found in a commonly used computer program. How has this factored into the Center’s research?
The paper, “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates” by Eklund, Nichols and Knutsson that appeared in the journal PNAS early this year really caused quite a stir. This is a really important paper for our field, and I think it has had a positive impact. First, it is worth mentioning that some splashy wording in the original manuscript overstated the problem, and the authors have submitted a correction. The original wording implied that the validity of 40,000 fMRI studies was questionable, while the revised wording instead focuses on questioning weakly significant findings. In fact, Tom Nichols (a coauthor of that work) estimates the number of questionable results to be closer to 3,500.
An important finding in the paper was that one of the most popular software packages used to analyze fMRI data, AFNI, had an error in it that increased the number of false positives. This error was quickly fixed by the AFNI developers. Another finding estimated very large false positive rates for a commonly used ad hoc method that has no statistical foundation for determining significance. The rest of the findings provide the information we need to control false positives according to the software we’re using.
So, what does this mean for the fMRI work in this lab? First, focusing on the past: the Cluster Failure results apply when a whole brain analysis is used, requiring 100,000 tests or more, but many of our analyses focus on a specific region of the brain or a “region of interest,” where only a single model is run and controlling false positives is much easier. Have there been whole brain analyses published by our lab that used the flawed version of AFNI? Since I joined the lab in July of 2014, I have not seen anybody use that method on any projects I have collaborated on.
Moving forward, how does this change things? This new information supplies us with a couple of easy-to-implement strategies for ensuring false positive rates are controlled. We were already employing some of these strategies in our work, but now all of our whole brain analyses will have carefully controlled false positive rates.