Psych Your Mind

Thursday, February 9, 2012

Friday Fun: One Researcher's P-Curve Analysis

It's Me!
Two weeks ago, when PYM was at the annual conference of the Society for Personality and Social Psychology, I went to a symposium about false-positive findings in psychology (see my summary here). The speakers discussed the prevalence of research practices that bias statistical testing, and one of them, Uri Simonsohn, presented a method for catching people who engage in these practices: the p-curve analysis. What follows is a p-curve analysis of one researcher/blogger: Michael W. Kraus!

Before we get into my p-curve analysis, I just want to make a few observations about research. With what I imagine are few exceptions, psychologists, like other researchers, are fundamentally concerned with revealing the truth about the world-- and about psychological experience more generally. This means that psychologists are committed to finding the truth about human experience and, by implication, would be firmly against publishing research that they knew did not represent that experience. For me, this means that I am always concerned about whether my findings will replicate: I believe they will, and I wouldn't publish them if I didn't believe this.

And yet, despite these motivations to search for truth, there are also real pressures to publish frequently and to present data that look beautiful. After all, frequently publishing beautiful data leads to jobs, prestige, and funding. These pressures can lead a researcher to search every corner of a data set for some pattern in line with his or her hypotheses-- and to some extent, a good researcher engages in exactly that sort of thorough exploration.

Of course, sometimes a researcher goes too far, and focuses more on pushing p-values below p < .05-- the conventional level for statistical significance-- and less on whether or not a finding will replicate. The p-curve analysis is designed to determine whether this is happening. The idea behind the p-curve is elegant: A real effect will have a distribution of p-values like the one below:

This p-curve reflects a low percentage of p-values nearest the conventional level of statistical significance. A questionable effect would have an abnormally high frequency of p-values close to p < .05 relative to those below the p < .01 threshold-- suggesting that a researcher is nudging p-values just past the cutoff simply for the sake of reaching p < .05. Presumably, a distribution that looks very different from the theoretical one is evidence of biased statistical techniques in data analysis.* Now, let's look at my own p-curve for all of my first-authored empirical papers.**

In the paper on the p-curve analysis, Simonsohn and colleagues presumably have a statistical technique for testing an observed distribution against the theoretical one. Since we don't yet have access to the paper, we can only eyeball the difference between my own distribution and the theoretical one. I'd say it looks similar to the theoretical one (phew).
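
To get a feel for where that theoretical shape comes from, here is a quick toy simulation. To be clear, this is my own sketch in Python, not Simonsohn's actual method (which, again, isn't available yet); the effect size of d = 0.5, the 30 observations per cell, the number of simulated studies, and the five .01-wide bins are all arbitrary choices for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def p_curve_shape(effect_size, n_per_cell=30, n_studies=20000):
        # Simulate many two-sample t-tests, keep only the significant results,
        # and report what share of those p-values falls in each .01-wide bin.
        pvals = []
        for _ in range(n_studies):
            a = rng.normal(0.0, 1.0, n_per_cell)
            b = rng.normal(effect_size, 1.0, n_per_cell)
            p = stats.ttest_ind(a, b).pvalue
            if p < .05:
                pvals.append(p)
        counts, _ = np.histogram(pvals, bins=[0, .01, .02, .03, .04, .05])
        return counts / counts.sum()

    print("real effect (d = 0.5):", np.round(p_curve_shape(0.5), 2))  # right-skewed: far more p-values below .01 than near .05
    print("no effect   (d = 0.0):", np.round(p_curve_shape(0.0), 2))  # roughly flat across the bins

With a genuine effect, the significant p-values lean heavily toward the low end, with the largest share below .01; with no effect at all, they spread roughly evenly across the bins-- which is why a bulge just under .05 looks suspicious.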

Does this mean that I don't engage in questionable research practices? No, it doesn't. First, let me outline the six questionable research practices that Simmons and colleagues (2011) note in their recent paper on false-positive findings:

(1) Terminating data collection only when p < .05
(2) Collecting fewer than 20 observations per condition
(3) Failure to list all variables
(4) Failure to report all experimental conditions
(5) Failure to report analyses with and without eliminating outliers
(6) Failure to report analyses with and without covariates

In my research, I've engaged in more than one of these practices on occasion. For example, I've added 20 observations to an experiment to push my data from p = .06 to p < .05. I've also collected variables that I didn't report in a paper. These tactics aren't necessarily going to lead to false-positive findings, but I can tell you confidently that when I made the decision to add 20 people, or to drop a variable, I did so at least in part because (1) I could justify doing so using common research conventions, and (2) doing so would lead to the presentation of cleaner-looking data. I believe this is precisely what Simmons and colleagues (2011) warn of in their paper.
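
To see why that kind of move is risky even when it feels justifiable, here is a rough simulation of what happens under the null hypothesis when you top up "marginal" studies with 20 extra observations per cell. Again, this is my own Python sketch with made-up sample sizes and a made-up peeking rule, not an analysis from the Simmons et al. (2011) paper.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def one_null_study(n_initial=20, n_added=20):
        # Both "conditions" are drawn from the same population, so any
        # significant result here is a false positive by construction.
        a = rng.normal(0.0, 1.0, n_initial)
        b = rng.normal(0.0, 1.0, n_initial)
        p = stats.ttest_ind(a, b).pvalue
        if .05 <= p < .10:  # "marginal" result, so collect 20 more per cell and re-test
            a = np.concatenate([a, rng.normal(0.0, 1.0, n_added)])
            b = np.concatenate([b, rng.normal(0.0, 1.0, n_added)])
            p = stats.ttest_ind(a, b).pvalue
        return p < .05

    rate = np.mean([one_null_study() for _ in range(20000)])
    print(f"false-positive rate with this kind of data peeking: {rate:.3f}")  # creeps above .05

Topping up only the studies that came out "close" gives the null hypothesis a second chance at significance, so the long-run false-positive rate drifts above the nominal 5% even though each individual test looks perfectly legitimate.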

So there you have it: my own p-curve analysis. In general, I think the Simmons et al. (2011) article does a great job of pointing out that some of our normal data-analytic strategies can actually bias our hypothesis testing. Knowing this, I am planning to use more conservative statistical strategies going forward and to be more transparent in reporting my results. I also think this analysis points (once again) to the importance of replication. Put simply, the definitive way to know whether a finding is real or an artifact of biased hypothesis testing is to look across time, laboratories, researchers, and studies. Finally, I still think it's ludicrous (as I said here) to judge a job candidate or a single paper based on this technique. I just don't think there are enough observations to make a judgment about biased hypothesis testing from such a small number of p-values.
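
To put a rough number on that last point, here is one more toy sketch (again my own, with arbitrary parameters) of how much a p-curve can bounce around when it is built from only about fifteen significant p-values-- even though every underlying effect is real.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    def small_p_curve(n_pvals=15, effect_size=0.5, n_per_cell=30):
        # Collect significant p-values from a genuine effect until we have
        # n_pvals of them, then bin them into the usual .01-wide intervals.
        pvals = []
        while len(pvals) < n_pvals:
            a = rng.normal(0.0, 1.0, n_per_cell)
            b = rng.normal(effect_size, 1.0, n_per_cell)
            p = stats.ttest_ind(a, b).pvalue
            if p < .05:
                pvals.append(p)
        counts, _ = np.histogram(pvals, bins=[0, .01, .02, .03, .04, .05])
        return counts / n_pvals

    for run in range(3):  # the same honest effect produces visibly different "curves"
        print(f"run {run + 1}:", np.round(small_p_curve(), 2))

If a handful of honestly obtained p-values can produce shapes this different from run to run, a single researcher's curve just doesn't carry much evidential weight.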

What are your reactions to the p-curve analysis? Let me know in the comments!

*Full disclosure #1 - The paper on p-curve analysis is not yet available (unpublished), so this analysis is based only on what I remember from the 20-minute talk at the SPSP conference.


**Full disclosure #2 - I have only conducted this analysis on data that I myself analyzed so as not to implicate my co-authors in statistical techniques that could lead to biased hypothesis testing.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366. PMID: 22006061

12 comments:

  1. If you're convinced to "use more conservative statistical techniques" then the terrorists have won.

    There is nothing in their paper that should lead you to this conclusion. Keep in mind that every move toward reducing Type I error is a move to INCREASE Type II error. It's a value judgment, and you've been convinced to go with the conservatives. Next you'll be voting for Ronald Reagan and bemoaning the end of traditional marriage--you're already moving toward defending the scientific status quo by making it harder to find things that are ACTUALLY there, by obsessing about Type I error. If science is self-correcting, then Type II error is a much bigger problem.

    1. Thanks for the comment Chris! I think you make a very important point about how an increase in type II error is an equally important problem that researchers face. I think I'd like to minimize errors (of any kind) in my research by designing solid studies, throwing out more questionable statistical techniques, and running studies with sufficient power.

      Also, I'm not planning to vote Reagan. ;)

    2. Chris Crandall, I don't think what you say makes much sense.

      It has been noted for decades that it is far easier to get positive results published than negative results. This is what is called publication bias. There is no doubt that it exists, and it is actually worse in psychology than in hard sciences, see eg:

      http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010068

      When negative results end up in people's file drawers, type II errors cannot distort the scientific literature in any serious way. They may unnecessarily discourage an individual investigator, but they do not have any lasting effect on the enterprise. Others will try a study from time to time, and if there are positive results to be had, and people do adequately powered experiments, these effects will turn up.

      Type I errors, on the other hand, are completely different. Many statisticians have estimated that due to publication bias a high proportion of significant effects are bogus:

      http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124

      This is stuff that makes its way into review articles, textbooks, etc. It makes the scientific literature a pile of junk. And these statisticians weren't even assuming any p-hacking, so the real situation in fields that p-hack may be far worse!

      You say "If science is self-correcting". Oh yes, you mean like how we found out about Diederek Stapel, right?

      Oh wait, he made up crap for years and there were a grand total of zero published nonreplications--he was only caught because of whistleblowers. Is that what you mean by our self correcting process, Chris?

      By the way, mentioning Reagan and religion really just distracts from your weak analysis, Chris--maybe you'd do better if you tried to think through one topic at a time?

    3. Whoa there, anonymous! I think we can all write about this issue without descending into hostile comments about whether people are (or aren't) thinking through their arguments.

      I'd just like to add that (1) most researchers don't fudge their data, (2) psychology should probably focus more on non-replication, and (3) I thought Chris was being funny and sarcastic in bringing up Reagan!

  2. Hmm, why would you conduct an analysis based on an unknown method from a paper that's not yet available, relying on your memory of a talk? Did you know which p-values to use? Are they independent of one another? I suppose the p-values from the same experiment are not independent, etc. Far too many open questions to run around doing stuff like that.

    1. Thanks for the comment Dr. Schwarz! I decided to conduct this analysis out of pure curiosity, and unfortunately it leaves many unanswered questions. Certainly this blog entry doesn't merit publication in a respectable scientific outlet. However, on a blog where I've written about the movie Twilight, and about hilarious anonymous reviewer comments, I think this post fits in well.

      I should clarify that in general, I examined the p-values from the central predictions of each study. That said, there is likely to be some interdependence in p-values from the same experiment, and I don't know how Simonsohn et al would treat this interdependence.

      Lastly, I am an admirer of your research!

    2. Norbert Schwarz, why are you so surly about this work? Do you doubt that there is lots of p-hacking going on?

      And why shouldn't Michael try out the p-curve analysis? People are curious about these curves--that is reason enough to do and share this analysis, and that's why his webpage is getting lots of traffic today I imagine. Basically your point here (and in your outbursts during the SPSP talk) seems to be that we lack firm evidence about the statistical reliability of observed p-curve shapes. OK, fine, but big deal--such information will eventually emerge, I am sure. In the meanwhile, people are curious and intrigued.

      Lastly, I am not an admirer of your conference etiquette!

    3. Hey Chuckie, thanks for the comment. If you are interested in a fuller conversation about p-curve analysis and false-positive findings between Joe Simmons and Norbert Schwarz there was a great back-and-forth Email exchange on the SPSP list which can be found here:

      http://groups.google.com/group/spsp-discuss/search?group=spsp-discuss&q=schwarz+false-positive&qt_g=Search+this+group&pli=1

      These email exchanges explain a lot of Dr. Schwarz's reservations about p-curves.

    4. There is so much noise in p values that you wouldn't expect much interdependence at all. If you simulate the same effect a hundred times, you'll get very different p values. Even if you assume that the p values in a paper all sample exactly the same effect (which is probably not the case), you wouldn't expect the same p value (or indeed similar p values).

      I guess you could argue the case for very small p values where effects are very large or sample sizes huge, but for the range of p values in question I can't see how interdependence would be an issue in real data sets.
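
      To illustrate the amount of noise (a quick Python sketch with made-up numbers: a medium effect of d = 0.5, thirty observations per cell, one hundred replications):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        pvals = []
        for _ in range(100):  # one hundred replications of the identical true effect
            a = rng.normal(0.0, 1.0, 30)
            b = rng.normal(0.5, 1.0, 30)
            pvals.append(stats.ttest_ind(a, b).pvalue)
        print(f"min p = {min(pvals):.2g}, median p = {np.median(pvals):.2g}, max p = {max(pvals):.2g}")

      The identical true effect typically produces p values spanning several orders of magnitude, so two p values telling you about "the same" effect can still look nothing alike.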

  3. I enjoyed the post and I think you are right to have some reservations. A quick (very crude) simulation suggests that you need at least a hundred or so p values to be confident of getting the theoretically expected profile (and possibly more):

    http://psychologicalstatistics.blogspot.com/2012/02/simulating-p-curves-and-detecting-dodgy.html

    1. Thanks for posting this Palinurus! It's such a great idea (to conduct the simulation) and helpful for interpreting p-curves!

  4. Great post. How did you manage to get the marginal/ns effects in there? I was under the impression that the online calculator leaves those out.
    It seems odd to me that the p-curve analysis does not allow marginal/ns results to be included. I mean, I have more than one published paper in which I report p-values of .06 up to .09. Leaving those out would increase the chances of being accused of p-hacking, I suppose.
