Monday, September 24, 2012

Science Utopia: Some Thoughts About Ethics and Publication Bias

Science Utopia, next exit
Psychology's integrity in the public eye has been rocked by recent high-profile discoveries of data fabrication (here, here, and here) and by several independent realizations that psychologists (and this is not unique to our field) tend to engage in data-analytic practices that make it easier to find positive results (here, here, and here). While it can be argued that these are not really new realizations (here), the net effect has been to turn psychologists toward an important question: How do we reform our science?

It's a hard question to answer in one empirical article or one blog post, so that's not the focus here. Instead, what I'd like to do is simply point out what I think are the most promising changes that we, as a science, can adopt right now to help prevent future data fabrication and the use of biased hypothesis tests. These are not my ideas, mind you; rather, they are ideas brought up in the many discussions of research reform (online and in person) that I have had, formally and informally, with my colleagues. Where possible, I link to the relevant sources for additional information.


(1) PUBLISH REPLICATIONS
Researchers have long highlighted the importance of replication, but in practice, empirical journals haven't exactly been supportive. For instance, the Journal of Personality and Social Psychology, the flagship journal of our science, has a policy of not publishing replication research. That's criminal.

I think making replications a higher priority, with higher visibility, remains the best way to improve the integrity of our science. First, fabricated data surely won't replicate, so the publication of replication studies can go a long way toward ferreting out findings that were never backed by real data. Second, if the personality psychologist David Funder has his way, journals that publish results that later fail to replicate would be responsible for publishing the non-replications. In Funder's words, the journals would have to "clean up their own mess." This would increase the visibility of replication studies and would also decrease the chance that unreliable results continue to influence the literature.

In particular, I'm intrigued by an idea alluded to in Nosek & Bar-Anan's (2012) paper on Science Utopia (though admittedly, they go much farther than I am suggesting): that original studies should be published alongside online replication reports. These replication reports could easily be linked to the original studies online. So in short, not only is replication a major positive, but the costs of implementing this change are minimal.

(2) MAKE DATA PUBLIC
Researchers are sometimes reluctant to make their data available to other researchers (and sometimes they place gag orders on data sharing). There are various reasons for this concern: Some researchers do not want others to poke around in their data, not because they have something to hide, but because they are planning to poke around in the data themselves. Making data available to others thus raises the possibility that someone else will find interesting results in the data that you painstakingly collected.

I think this is a valid concern, but I believe that data should be made available, at the very least, to allow others to replicate the original analyses reported in published manuscripts. Uri Simonsohn has recently highlighted another reason for publishing raw data: having the raw data makes it possible to ferret out extremely questionable research practices, such as the fabrication of whole data sets. Most federal granting agencies already require some form of data sharing. It's time that our science did as well.
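To give a flavor of why access to raw data makes such checks possible, here is a toy sketch in Python. This is not Simonsohn's actual procedure, and every number in it is invented; it just illustrates the general point that genuine samples carry a predictable amount of sampling variability, so a stack of reported statistics that is far too consistent is worth a closer look.

# Toy illustration (not Simonsohn's procedure): standard deviations computed from
# genuinely random samples vary from sample to sample by a predictable amount,
# while the hypothetical "fabricated" values below cluster implausibly tightly.
import numpy as np

rng = np.random.default_rng(1)
n_per_sample, n_samples = 15, 20

# SDs from genuinely random samples (true SD = 1)
real_sds = np.array([rng.normal(0, 1, n_per_sample).std(ddof=1)
                     for _ in range(n_samples)])

# Hypothetical made-up SDs chosen to look plausible, but far too consistent
fake_sds = rng.normal(1.0, 0.01, n_samples)

print("Spread of SDs across real samples:      ", round(float(real_sds.std(ddof=1)), 3))
print("Spread of SDs across fabricated values: ", round(float(fake_sds.std(ddof=1)), 3))
# With n = 15 per sample, real SDs typically vary by roughly 0.15-0.20 around the
# true value; the fabricated set above varies by only about 0.01.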

(3) ACT MORE LIKE SOCIAL PSYCHOLOGISTS
One of the primary reactions I see when researchers find out about results that don't replicate, or about someone who has faked data, is to treat that person like an outlier: an unethical and immoral windbag of a "researcher" who is nothing like the rest of the research community. This is a mistake because it reduces the problem of unethical research practices to a few bad apples. The reality might be much more sobering: Some form of unethical data practice is actually quite common in our field.

Two lines of reasoning lead me to this conclusion: First, most researchers will tell you that they have a file drawer where they keep all of their failed studies. This is common practice, but it's also a major problem for the research community because the published literature reflects only the studies that "worked." Second, we tend to engage in data-analytic practices that bias statistical tests. According to a study by John and colleagues (2012), a majority of researchers in psychology report having engaged in at least some of these practices.
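One of those biasing practices, optional stopping (peeking at the data and stopping as soon as the test comes out significant), is easy to see in a quick simulation. The sketch below in Python is purely illustrative: the sample sizes and peeking schedule are arbitrary assumptions, and it is not the analysis from any of the papers cited here. It simply shows the false positive rate climbing well above the nominal 5% even though there is no real effect.

# Illustrative sketch: simulate two groups drawn from the SAME population (no
# real effect), peek at the t-test after every batch of new participants, and
# stop as soon as p < .05. The sample sizes and peeking schedule are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def study_with_peeking(n_start=20, n_max=100, step=10):
    """Return True if the study ends 'significant' despite a true effect of zero."""
    a = list(rng.normal(size=n_start))
    b = list(rng.normal(size=n_start))
    while True:
        p = stats.ttest_ind(a, b).pvalue
        if p < .05 or len(a) >= n_max:
            return p < .05
        a.extend(rng.normal(size=step))   # collect another batch and peek again
        b.extend(rng.normal(size=step))

n_sims = 2000
false_positives = sum(study_with_peeking() for _ in range(n_sims))
print(f"False positive rate with peeking: {false_positives / n_sims:.1%}")
# Typically lands well above the nominal 5% (roughly in the 10-20% range with
# this particular peeking schedule).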

So it seems that practices that bias the scientific literature are actually quite widespread in our field. Instead of focusing on the outliers that give our field a black eye, we should be acting more like the social psychologists we are and asking: What are the situational factors that make these unethical data practices more likely? Just to name a few: (1) replication research is not published, (2) researchers are not required to make their data available, (3) research fame is too important, (4) whistleblowing incentives are low, and (5) publication pressure is at an all-time high. A few small changes in our field (see #1 and #2 above) could reduce the pressure these situational factors place on researchers.

(4) STOP THE PEARL-CLUTCHING, COUCH-FAINTING ROUTINE
So how do psychologists typically react when other researchers fail to replicate their findings or question their data-analytic practices? The reaction typically moves in one of two directions. First, some psychologists go on the offensive, claiming that the failed replication attempts are flawed, that the researchers who conducted them are stupid, and that the journal that published them is unethical (here). This reaction is obviously bad for science because it damages the reputations of a journal and a researcher who are doing what amounts to a great service to the scientific community: investigating whether a finding is real or a fluke. In politics, where the truth seems to have many faces, this might be okay, but we're scientists, and the truth is out there to be discovered. Replication studies (and yes, even failures to replicate) are about truth finding, so we should encourage, not disparage, attempts at replication.

The second reaction is the pearl-clutching, couch-fainting reaction I referred to above. When researchers are accused of biases in their data analysis, they typically claim ignorance, saying things like "Oops, I'm not sure where the bias came from!"

I have two responses to that reaction. First, if you honestly don't know where bias comes from in data analysis, you should carefully read one of several blog posts (here and here) or empirical articles (here and here) on the subject. Get your head out from under that rock and start taking responsibility for running studies with better research designs and data-analytic approaches. Second, if you are aware of the bias in your own data, it's time to start admitting it and making some important changes to the way you conduct analyses and run studies. We can reform the way we do research, and we don't need a witch hunt to do it. If responsible researchers start admitting to themselves that they need to clean up their act, new graduate students will follow. I, for one, am ready to be part of the solution.

Here are some helpful links to a number of other blog entries related to this topic:

Random Assignment - Dave Nussbaum
Fraud Detection
Reforming Social Psychology

Personality Interest Group - Espresso - Brent Roberts
More Bem Fallout
Cycling and Social Psychology

Hardest Science - Sanjay Srivastava
Bargh-Doyen Debate
Replication, Period.

SPSP's Official Stance

Psych-Your-Mind posts on this topic
Looking ahead to the future of research
P-curves
Reactions from SPSP 2012

References:

John, L., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524-532. DOI: 10.1177/0956797611430953

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366. PMID: 22006061

6 comments:

  1. Hi Michael - very nice and thoughtful post. I agree on all points. Of course, the hard part is getting these changes implemented. Many of the ideas you proposed have been around for decades, and yet we are still a long way from reform. We face a classic collective action problem: While the system as a whole would benefit from these changes, individual actors/journals/authors do not benefit from being the first mover.

    By far the best way to solve a collective action problem is if an external force changes the incentive structure so that individual actors benefit from reforming. It is clear to me that only the granting agencies can apply this force, by rewarding scientists (using grant preferences) who follow good practices and who submit to good-practice journals.
    http://filedrawer.wordpress.com/2012/04/17/its-the-incentives-structure-people-why-science-reform-must-come-from-the-granting-agencies/

    1. Agreed. And I think that changes #1 and #2 could be relatively easy and effective. Thanks for commenting!

  2. A great post; these are very important issues. Structural change can be difficult, but it is possible with the right incentives (directed by grant bodies and/or other government agencies).

    However, at least some of the problem begins with low initial standards. Many scientific fields have come to accept a surprisingly lenient threshold for declaring a significant effect (i.e., p < .05). Even assuming perfect scientific practices (no peeking, cherry-picking, etc.), the potential false positive rate is unacceptably high (one in twenty). This results in far too many questionable results being published in the first place, and therefore a huge burden on any replication programme.

    If we shift some of the burden back onto researchers and demand a much stricter statistical threshold for publication, we would dramatically reduce the need for replication. This would make any proposed replication system much more likely to succeed.

    I have also argued that a more conservative statistical buffer zone will help discourage bad scientific practices, and ultimately outright fraud. See http://the-brain-box.blogspot.co.uk/2012/09/must-we-really-accept-1-in-20-false.html

    1. That's interesting, although I think science always needs replication. Thanks for your comments!

    2. Chadly Stern & David Kalkstein (September 25, 2012 at 5:47 PM)

      Dear StokesBlog,

      We believe that you are incorrect in your reasoning. You state that “even assuming perfect scientific practices… the potential false positive rate is unacceptably high (i.e., one-in-twenty)”. However, this is an incorrect interpretation of null hypothesis testing.

      Why is it wrong to say that 5% of scientific findings are false positives? It is wrong because the Type 1 error rate is defined conditional on the null hypothesis being true. That is to say, a p-value represents the probability of data at least as extreme as those observed given the null hypothesis, NOT the probability of the null hypothesis given the data.

      With brief consideration it is clear that the null hypothesis is virtually never true. Typically the null hypothesis states that there is no association or difference between separate variables in a study. However, in reality, it is extremely unlikely that the true association between the variables in a given study is exactly zero.

      We do not mean to imply that the field should exclusively rely on p-values (in fact, we endorse reporting confidence intervals and effect sizes), but we wish to correct thinking that threatens to undermine the integrity of our field.
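      To make the conditional nature of that error rate concrete, here is a minimal illustrative calculation in Python (the base rate of true nulls and the power figure are made-up assumptions, not estimates for any real literature):

      # Illustrative sketch only: alpha fixes the error rate GIVEN a true null;
      # the share of significant results that are false positives also depends
      # on how often the null is true and on statistical power.
      alpha = 0.05          # Type 1 error rate, conditional on a true null
      power = 0.50          # assumed chance of detecting a real effect
      p_null_true = 0.10    # assumed share of tested hypotheses where the null holds

      p_significant = p_null_true * alpha + (1 - p_null_true) * power
      false_positive_share = (p_null_true * alpha) / p_significant
      print(f"False positives among significant results: {false_positive_share:.1%}")
      # About 1.1% with these made-up inputs; the point is simply that the figure
      # is not fixed at one-in-twenty by the alpha level alone.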

      For further reading see:

      Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997-1003.

      Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170-180.

    3. Thanks Chadly Stern and David Kalkstein for clearing this up, and for your comment.
