Before we get into my p-curve analysis, I just want to make a few observations about research. With what I imagine are few exceptions, psychologists, like other researchers, are fundamentally concerned with revealing the truth about the world, and about psychological experience more generally. This means that psychologists are committed to finding the truth about human experience and, by implication, would be firmly against publishing research that they knew did not represent that experience. For me, this means that I am always concerned about whether my findings will replicate; I believe they will, and I wouldn't publish them if I didn't believe this.
And yet, despite these motivations to search for truth, there are also real pressures to publish frequently and to present data that look beautiful. After all, frequently publishing beautiful data leads to jobs, prestige, and funding. These pressures could lead a researcher to search every corner of a data set in order to reveal some pattern in line with their hypotheses. A good researcher engages in that sort of thorough exploration.
Of course, sometimes a researcher goes too far, focusing more on pushing p-values below p < .05 (the conventional threshold for statistical significance) and less on whether or not a finding will replicate. The p-curve analysis is designed to determine whether this is happening. The idea behind the p-curve is elegant: a real effect produces a right-skewed distribution of significant p-values, with far more values close to .01 than close to .05, whereas significance chasing tends to pile p-values up just under .05.
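To make that intuition concrete, here is a minimal simulation sketch in Python. It is only an illustration: the effect size, sample sizes, and number of simulated experiments are arbitrary choices of mine, not anything specified in the p-curve talk or paper.

```python
# Minimal sketch: compare the distribution of *significant* p-values when a true
# effect exists versus when it does not. With a real effect, significant p-values
# cluster near zero (right-skewed); with no effect, they are roughly flat from 0 to .05.
# All parameters below (d = 0.5, n = 30 per group, 10,000 experiments) are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def significant_p_values(effect_size, n_per_group=30, n_experiments=10_000):
    """Run many two-group t-tests and keep only the p-values below .05."""
    kept = []
    for _ in range(n_experiments):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(effect_size, 1.0, n_per_group)
        p = stats.ttest_ind(treatment, control).pvalue
        if p < .05:
            kept.append(p)
    return np.array(kept)

for label, d in [("true effect (d = 0.5)", 0.5), ("no effect (d = 0.0)", 0.0)]:
    sig = significant_p_values(d)
    # Bin the significant p-values the way a p-curve does: .01-wide bins up to .05.
    counts, _ = np.histogram(sig, bins=[0, .01, .02, .03, .04, .05])
    print(label, np.round(counts / counts.sum(), 2))
```

Running something like this shows the characteristic right skew for a true effect (most significant p-values land below .01) and a roughly uniform spread when there is no effect, which is what makes the shape of the curve diagnostic.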
Does this mean that I don't engage in questionable research practices? No, it doesn't. First, let me outline the six questionable research practices that Simmons and colleagues (2011) note in their recent paper on false-positive findings:
(1) Terminating data collection only when p < .05
(2) Collecting fewer than 20 observations per condition
(3) Failure to list all variables collected in a study
(4) Failure to report all experimental conditions
(5) Failure to report analyses with and without eliminating outliers
(6) Failure to report analyses with and without covariates
In my research, I've engaged in more than one of these practices on occasion. For example, I've added 20 observations to an experiment to push my data from p = .06 to p < .05. I've also collected variables that I didn't report in a paper. These tactics aren't necessarily going to lead to false-positive findings, but I can tell you confidently that when I made the decision to add 20 people, or to drop a variable, I did so at least in part because (1) I could justify doing so using common research conventions, and (2) doing so would lead to the presentation of more polished data. I believe this is precisely what Simmons and colleagues (2011) warn about in their paper.
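To see why that first tactic is risky even when it feels justifiable, here is a rough simulation sketch. It is only an illustration of the general optional-stopping problem, not a model of any particular study of mine; the starting sample of 20 per group, the 20-person top-up, and the "retest if p is between .05 and .10" rule are all arbitrary choices.

```python
# Rough sketch: when there is truly no effect, a fixed-n test yields p < .05 about
# 5% of the time, but "collect 20 more people if p just misses .05 and test again"
# pushes the false-positive rate noticeably above 5%. All parameters are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def false_positive_rate(optional_stopping, n_initial=20, n_added=20, n_studies=20_000):
    """Proportion of simulated studies reaching p < .05 when the true effect is zero."""
    hits = 0
    for _ in range(n_studies):
        a = rng.normal(0.0, 1.0, n_initial)
        b = rng.normal(0.0, 1.0, n_initial)
        p = stats.ttest_ind(a, b).pvalue
        if optional_stopping and .05 <= p < .10:
            # "Just missed significance": add more participants and test again.
            a = np.concatenate([a, rng.normal(0.0, 1.0, n_added)])
            b = np.concatenate([b, rng.normal(0.0, 1.0, n_added)])
            p = stats.ttest_ind(a, b).pvalue
        hits += p < .05
    return hits / n_studies

print("fixed n:          ", false_positive_rate(optional_stopping=False))  # about .05
print("optional stopping:", false_positive_rate(optional_stopping=True))   # above .05
```

The inflation in this toy version is modest, but it compounds quickly when several of these flexible decisions (adding participants, dropping variables, trying covariates) are available at once, which is the central point of the Simmons et al. (2011) paper.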
So there you have it, my own p-curve analysis. In general, I think the Simmons et al. (2011) article does a great job of pointing out that some of our normal data-analytic strategies can actually bias our hypothesis testing. Knowing this, I am planning to use more conservative statistical strategies going forward, and to be more transparent in my reporting of results. I also think this analysis points (once again) to the importance of replication. Put simply, the definitive way to know whether a finding is real or the product of biased hypothesis testing is to look across time, laboratories, researchers, and studies. Finally, I still think it's ludicrous (as I said here) to judge a job candidate or a single paper with this technique; a handful of p-values just isn't enough to support a judgment about biased hypothesis testing.
What are your reactions to the p-curve analysis? Let me know in the comments!
*Full disclosure #1 - The paper on p-curve analysis is not yet available (it is unpublished), so I conducted this analysis based only on what I remember from the 20-minute talk at the SPSP conference.
**Full disclosure #2 - I conducted this analysis only on data that I myself analyzed, so as not to implicate my co-authors in statistical techniques that could lead to biased hypothesis testing.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366. PMID: 22006061