Thursday, March 13, 2014

I'm Using the New Statistics

Do you remember your elementary school science project? Mine was about ant poison. I mixed borax with sugar and put that mixture outside our house during the summer in a carefully crafted/aesthetically pleasing "ant motel." My prediction, I think, was that we would kill ants just like in the conventional ant killing brands, but we'd do so in an aesthetically pleasing way. In retrospect, not sure I was cut out for science back then.

Anyway, from what I remember about that process, there was a clear study design and an articulated hypothesis: a prediction about what I expected to happen in the experiment. Years later, I would learn more about hypothesis testing in undergraduate and graduate statistics courses on my way to a social psychology PhD. For that degree, Null Hypothesis Significance Testing (NHST) would be my go-to method of inferential statistics.

Through NHST, I have developed an unhealthy worship of p-values. The p-value is the probability of observing a relationship between variables X and Y at least as strong as the one in my data, if the null hypothesis (of no relationship) were true. If p < .05, rejoice! If p < .10, claim emerging trends/marginal significance and be cautiously optimistic. If p > .10, find another profession. By NHST standards, an experiment fails or succeeds based solely on this one statistic.

When the Association for Psychological Science (APS) proposed an alternative statistical approach called the New Statistics (actually not new; these methods have been around for decades), I was intrigued by the possibility of living an academic life beyond the tyranny of p < .05.

Just what are the new statistics? As an alternative to NHST, the new statistics are a slightly modified approach to research design and data analysis. Rather than designing studies to test hypotheses and verify the existence of effects, researchers are encouraged to measure things with precision. More specifically, instead of verifying whether two groups differ in an experiment, researchers measure how much the groups differ and report the precision of that estimate. This is usually done with a 95% confidence interval (CI) around the estimated difference or mean. As for p-values, the recommendation is to do away with them for two reasons. (A) The information in a p-value is redundant with the information in a 95% CI. (B) P-values are an unreliable guide to the precision of a research design, because they fluctuate wildly from one replication to the next and with sample size. Finally, researchers are encouraged to use meta-analytic techniques, presenting effect sizes and CIs across the studies within a single paper. This provides the pieces of information that will help build a cumulative science.
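To make the estimation approach concrete, here is a minimal Python sketch that is not taken from Cumming's paper. It estimates the difference between two group means with a 95% CI, then pools several studies' estimates with a simple fixed-effect (inverse-variance) meta-analysis. The function names and all the scores are hypothetical, made up for illustration. As a simplifying assumption, it uses the normal critical value (about 1.96) instead of the t distribution. That is only a reasonable approximation for larger groups. With small samples like these, you would swap in a t critical value (for example, from `scipy.stats.t.ppf`).

```python
import math
from statistics import NormalDist, mean, variance

# Normal critical value (about 1.96): a large-sample stand-in for the t critical value
Z95 = NormalDist().inv_cdf(0.975)


def mean_difference_ci(group_a, group_b, z=Z95):
    """Return (difference, standard error, (lower, upper)) for mean(A) - mean(B)."""
    diff = mean(group_a) - mean(group_b)
    # Welch-style standard error: does not assume equal variances in the two groups
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff, se, (diff - z * se, diff + z * se)


def fixed_effect_meta(estimates, standard_errors, z=Z95):
    """Pool per-study estimates with inverse-variance (fixed-effect) weights."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical scores (made up for illustration) from two conditions in two small studies
study1 = mean_difference_ci([5.1, 6.0, 5.8, 6.4, 5.5, 6.1], [4.9, 5.2, 5.0, 5.6, 4.7, 5.3])
study2 = mean_difference_ci([6.2, 5.9, 6.8, 6.1, 5.7, 6.5], [5.4, 5.8, 5.1, 5.9, 5.5, 5.2])

for label, (diff, _se, (lo, hi)) in [("Study 1", study1), ("Study 2", study2)]:
    print(f"{label}: difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

pooled, (lo, hi) = fixed_effect_meta([study1[0], study2[0]], [study1[1], study2[1]])
print(f"Pooled:  difference = {pooled:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

No p-value appears anywhere. Each result is an effect size with an interval. The pooled interval is narrower than either single-study interval, because inverse-variance weighting combines the information from both studies.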

[If you'd like a more detailed description of the new statistics, I advise you to read the paper Geoff Cumming wrote for APS. It's brilliant AND accessible! Huzzah!!!]

Are you using the new statistics? Yes. I just completed a first draft of my second full paper using the new statistics. Getting started was pretty difficult, because I have spent my entire career designing studies focused exclusively on NHST. Converting predictions like "In this experiment, X and Y will differ" into non-NHST language (e.g., "we measured the magnitude of the difference between X and Y") is still a challenge for me.

I also had a little trouble removing the p-values from my results section. In the Cumming (2014) paper, he argues that researchers should give up p-values outright rather than flip between p-values and confidence intervals. This includes scrubbing the word "significant" from your manuscript. Results are not significant; rather, the magnitude of the difference between groups is measured. I am still trying to convert to this language in my newest papers.

One other difficult thing about using the new statistics is that reporting confidence intervals really requires embracing the ambiguity in research. For example, one correlation in the study (traditionally significant at p < .05) has a 95% CI of [.001, .23]. That's a huge range of plausible correlations, running from meaningless to nearly medium-sized. Using this statistical approach reveals just how gosh darn important it is for researchers to use larger samples. Larger samples produce narrower confidence intervals, because the width shrinks roughly in proportion to 1/√n.
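The link between sample size and interval width can be explored with the standard Fisher z-transformation for the CI of a Pearson correlation. This is a minimal Python sketch that assumes approximately normal sampling on the Fisher z scale. The function name and the r = .12 value are hypothetical and are not taken from the study described above. The sketch holds the correlation fixed and varies n to show how the interval narrows as the sample grows.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def correlation_ci(r, n, z=Z95):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transformation."""
    fisher_z = math.atanh(r)        # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)       # standard error on the Fisher z scale
    lo, hi = fisher_z - z * se, fisher_z + z * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the correlation scale


# Same hypothetical correlation (r = .12) at increasing sample sizes
for n in (100, 300, 1000, 3000):
    lo, hi = correlation_ci(0.12, n)
    print(f"n = {n:>4}: 95% CI [{lo:.3f}, {hi:.3f}], width = {hi - lo:.3f}")
```

On the Fisher z scale, the half-width is 1.96/√(n − 3). Quadrupling the sample therefore roughly halves the interval.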

What do you like most about the new statistics? One big positive I feel from using the new statistics is that building a cumulative science is the top priority. Reporting 95% confidence intervals instead of p-values forces researchers to confront the precision of their measurements directly. When I see a 95% confidence interval around my estimate, I can be reasonably sure that a replication of my study will produce an estimate somewhere within this band. (On average, a 95% CI captures the mean of an exact replication about 83% of the time, not 95%.) Precise, well-designed studies yield narrow confidence intervals, which makes better-designed studies much easier to spot and praise.
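How often does a replication land inside the original 95% CI? A quick simulation gives a feel for it. This sketch makes three simplifying assumptions: the scores are normally distributed, the population SD is known (so each CI is simply mean ± 1.96σ/√n), and the replication is exact, with the same n. The function name, seed, and parameter values are arbitrary choices for illustration. Under these assumptions, the simulated rate should sit close to the analytic value P(|Z| < 1.96/√2), which is about 83%.

```python
import math
import random
from statistics import NormalDist


def capture_rate(n_sims=10_000, n=30, mu=0.0, sigma=1.0, seed=2014):
    """Share of simulated replication means that fall inside the original 95% CI.

    Assumes normal data with a *known* sigma, so the CI is mean +/- 1.96 * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    captured = 0
    for _ in range(n_sims):
        # Original study and an exact replication: same population, same sample size
        original = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        replication = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(replication - original) <= half_width:
            captured += 1
    return captured / n_sims


# Analytic value under the same assumptions: the two means differ by N(0, 2 * sigma**2 / n),
# so the replication is captured when |Z| < 1.96 / sqrt(2)
analytic = 2 * NormalDist().cdf(1.96 / math.sqrt(2)) - 1

print(f"Simulated capture rate: {capture_rate():.1%}")
print(f"Analytic capture rate:  {analytic:.1%}")
```

The gap between 95% and roughly 83% arises because both the original estimate and the replication estimate carry sampling error.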

Also, I like my freedom from the p-value. I'm tired of chasing p < .05 (although, unless I expect all my papers to be published in APS journals, I'm not entirely free from it). I would rather spend time designing studies with solid methods that ask interesting social psychological questions. Producing a cumulative science will help all of us do this better. Importantly, our cumulative effort will reveal the truly robust experimental manipulations and theories worth studying in our field. So far, I'm sold on the new statistics, and I hope other journals follow the lead of APS!


Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7-29.
