In the first talk, Leslie John of Harvard University bravely discussed the prevalence of questionable data-analysis practices in our field. The short answer: people engage in many data collection and analysis strategies that bias hypothesis testing and contribute to the publishing of false-positive findings.
In the second talk, Joe Simmons from UPenn presented findings from a research paper suggesting that false-positive findings are preventable with a few key changes in the way people report results and journals review papers. In summary, Simmons suggested that researchers should justify their sample-size cutoffs, conduct analyses both with and without covariates, report a list of all measures used in a study, and collect samples of sufficient power (at least 20 observations per cell), among other things. In addition, Simmons urged reviewers to put more emphasis on exact (rather than conceptual) replications, and to give authors a little slack for imperfect findings. Simmons further suggested that our natural tendency to justify whatever data-analytic strategy works as the correct one necessitates clear rules for data reporting.
In the final talk, Uri Simonsohn of UPenn discussed what he refers to as “p-hacking.” The idea behind detecting it is this: if researchers are engaging in questionable analysis practices, then they should have a disproportionate number of findings at or just below the p < .05 threshold for statistical significance, and this pile-up can be relatively easy to spot. Simonsohn then presented data, using this detection technique, suggesting that Daniel Kahneman publishes findings that are real, and went on to suggest other uses for the technique—including assessing whether a journal publishes many false-positive findings, or whether a job candidate’s data can be trusted.
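To make that detection logic concrete, here is a minimal simulation of my own (an illustrative sketch, not anything the speakers presented): it compares an honest fixed-sample study to a “hacked” study that keeps adding participants until p < .05, with no true effect in either case. The hacked strategy inflates the false-positive rate, and its significant p-values cluster just below .05—exactly the signature the detection technique looks for.

```python
import math
import random

random.seed(1)

def p_value(sample):
    """Two-sided one-sample z-test against mean 0 (sigma known to be 1)."""
    z = sum(sample) / math.sqrt(len(sample))  # mean * sqrt(n) when sigma = 1
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def honest_study(n=20):
    """Collect n observations once, test once."""
    return p_value([random.gauss(0, 1) for _ in range(n)])

def hacked_study(n=20, max_n=50, step=10):
    """Optional stopping: peek after every batch, stop as soon as p < .05."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    p = p_value(sample)
    while p >= 0.05 and len(sample) < max_n:
        sample += [random.gauss(0, 1) for _ in range(step)]
        p = p_value(sample)
    return p

trials = 5000
honest_fp = sum(honest_study() < 0.05 for _ in range(trials)) / trials
hacked_fp = sum(hacked_study() < 0.05 for _ in range(trials)) / trials
print(f"false-positive rate, honest: {honest_fp:.3f}")  # close to the nominal .05
print(f"false-positive rate, hacked: {hacked_fp:.3f}")  # noticeably inflated

# The significant p-values from hacked studies pile up just below .05.
sig = [p for p in (hacked_study() for _ in range(5000)) if p < 0.05]
near = sum(p > 0.04 for p in sig) / len(sig)
print(f"share of significant hacked p-values in (.04, .05): {near:.2f}")
```

Under honest testing, significant p-values are spread evenly below .05 (so only about a fifth should land in the .04–.05 band); under optional stopping, studies halt the moment they cross the threshold, so a much larger share lands just under it.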
I think that, overall, there are some very interesting talking points worth considering in this symposium. For instance, the premium some reviewers place on perfect data would be nice to rein in. I also think the practice of collecting a million exploratory variables, correlating them, and seeing what relations emerge needs to be stopped. And I find the idea of p-hacking absolutely fascinating—I’m planning to run it on myself when I get home from the conference (a topic for a later blog entry).
I do, however, have some reservations about the critical points in the symposium. First, I don’t think the speakers are considering the real monetary and time costs of some forms of data collection. For example, if my research question involves a special sample, and my limited access to that hypothetical sample requires that I collect all data during a set window, I might collect a bunch of variables at once. Then, when the original hypothesis does not find support in the data, I cannot justify (1) not exploring the data or (2) exploring the data but not publishing those results [both of these possibilities were proposed by the speakers]. That’s just not a practical solution given the money and time costs of collecting such a special set of participants.
I also wondered where a person should stop when reporting findings. Do they report the order of the measures? The race and gender of the experimenters? The day of the week? What is enough for total transparency?
Third, when Simonsohn mentioned using p-hacking detection to check whether researchers are faking data, I became concerned: a job candidate has very few papers to their name, and I wondered whether the speakers even know how many research papers it would take to get a reliable estimate of whether someone is falsifying data using this methodology.
Finally, isn’t putting greater emphasis on exact replications a more parsimonious solution? If a person’s findings can be replicated, then by definition the findings are real. The other good thing about exact replications is that they spare researchers a potential witch hunt.
What do you think about these ideas? I’d love to read your comments!