By Alexander Danvers

What do you want out of your social psychology research?

The obvious—and dominant—answer is to explain how the mind works. The statistical methods typically employed by psychologists are set up to answer questions related to cause and effect.

But this is not the only way to approach science—or statistical methodology. In a preprint paper currently under review, researchers Tal Yarkoni and Jacob Westfall suggest that psychologists should shift their emphasis in the direction of prediction.

The difference is not just one of semantics—it is one related to how to conduct and evaluate research.

In a typical psychology study, the main criterion for publication is the p-value. Often misinterpreted and hotly debated, the p-value is typically used in psychology to test a null hypothesis within a frequentist framework.

Psychology studies attempt to reject the null hypothesis that an effect is zero, and thus show that there is an effect. The p-value is the probability of obtaining data at least as extreme as the observed data if the null hypothesis were true; rejecting the null only when p falls below a threshold such as .05 keeps the long-run rate of incorrectly rejecting a true null at 5%.

For example, I might ask: is the effect of emotion on memory zero? If I find that my data are unlikely given the null (a zero effect), I would present this as evidence that there is an effect—emotion affects memory.
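To make that concrete, here is a minimal sketch of the usual test in Python (not code from the paper). The "emotional" and "neutral" groups and their memory scores are simulated stand-ins for this example, not data from any real study.

```python
# Minimal sketch of null-hypothesis significance testing on simulated data.
# The group names and memory scores are hypothetical stand-ins for the
# emotion-and-memory example in this post.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
neutral_memory = rng.normal(loc=5.0, scale=1.0, size=50)    # control group
emotional_memory = rng.normal(loc=5.5, scale=1.0, size=50)  # emotion group

# Two-sample t-test of the null hypothesis that the two group means are equal
t, p = stats.ttest_ind(emotional_memory, neutral_memory)
print(f"t = {t:.2f}, p = {p:.4f}")
# A small p (conventionally p < .05) is taken as evidence against the null
# of "no effect of emotion on memory."
```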

On the other hand, Yarkoni and Westfall suggest a criterion for evaluating studies borrowed from Machine Learning: cross-validation. Cross-validation is a process whereby a predictive model is “tuned” on a training data set and then used to predict data in a test data set. Researchers don’t concern themselves with whether or not their model does a good job predicting the training data—the real test is whether it can predict the test data.

The most popular form of cross-validation, k-fold cross-validation, involves cutting the data into slices (folds), one of which is held out for testing while all the rest are used for training. The process is then repeated, with a new slice held out for testing, until each slice has served once as the test set. This makes efficient use of the data, reducing the amount needed to do effective cross-validation.
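As a rough illustration (not the authors' code), here is what k-fold cross-validation looks like with scikit-learn. The "emotion" predictor and "memory" outcome are simulated, hypothetical variables.

```python
# Minimal sketch of 5-fold cross-validation with scikit-learn on simulated data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
emotion = rng.normal(size=(200, 1))                    # hypothetical predictor
memory = 0.4 * emotion[:, 0] + rng.normal(size=200)    # hypothetical outcome

model = LinearRegression()

# cv=5: the data are cut into 5 slices; each slice is held out once for
# testing while the model is trained on the other four.
scores = cross_val_score(model, emotion, memory, cv=5, scoring="r2")
print("Out-of-sample R^2 for each fold:", scores)
print("Mean out-of-sample R^2:", scores.mean())
```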

 

Cross-validation thus inherently values generalization. The question being answered is not what would happen if I repeated this exact experiment over again—it is whether the particular effect would be seen in a different group of people.

For example, I might ask: does the effect of emotion on memory I see in one sample help me predict the effect of emotion on memory in a new sample?

If this sounds to you like another important contemporary issue in psychology, then you may anticipate the analogy Yarkoni and Westfall draw between the “replication crisis” and a predictive approach to psychology.

Yarkoni and Westfall suggest that we can think of studies that do not replicate as overfit. When creating statistical models, a researcher can get a “perfect fit”—100% accuracy in describing the data at hand—to any data set simply by adding more predictor variables (parameters). The model can do this by capitalizing on idiosyncratic relationships between variables that are unlikely to be seen in new data.

For example, in a particular data set, I might find that having a score of six on positivity of emotional experience—but only among left-handed, French men over six feet tall—perfectly predicts the observed memory score of five. But this was just due to the fact that the one tall, French, left-handed man in the data set scored five on the memory test. It’s unlikely that this will give a perfect prediction for new tall, French, left-handed men.

Because Machine Learning uses prediction as its main criterion for success, overfitting has been a larger theoretical concern there than in psychology. Cross-validation is a method that implicitly includes a penalty for overfitting: while the model making specific predictions about the tall, French lefty will do better in the training sample, it will do worse in the test sample. A model that just includes emotion might actually do better at predicting new data—because it includes only predictors that generalize.
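A small simulation can make that trade-off visible. In the sketch below (simulated data, hypothetical variable names), a needlessly flexible model fits its training half better than a simple one-predictor model but will typically do worse on the held-out half.

```python
# Minimal sketch of overfitting on simulated data: a high-degree polynomial
# fits the training sample better than a simple linear model, but usually
# predicts a held-out sample worse.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
emotion = rng.normal(size=(60, 1))                    # hypothetical predictor
memory = 0.4 * emotion[:, 0] + rng.normal(size=60)    # hypothetical outcome

X_train, X_test, y_train, y_test = train_test_split(
    emotion, memory, test_size=0.5, random_state=1)

simple = LinearRegression().fit(X_train, y_train)
flexible = make_pipeline(PolynomialFeatures(degree=15),
                         LinearRegression()).fit(X_train, y_train)

print("Simple model:   train R^2 =", simple.score(X_train, y_train),
      " test R^2 =", simple.score(X_test, y_test))
print("Flexible model: train R^2 =", flexible.score(X_train, y_train),
      " test R^2 =", flexible.score(X_test, y_test))
```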

A statistically significant p-value, on the other hand, doesn’t have this trade-off built into it. When exploring data—as scientists should, in order to uncover patterns—the p-value can lead researchers to believe they have found a real relationship when in fact they have just overfit their model.
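Here is a minimal sketch of that risk, again with simulated data: twenty pure-noise predictors, none of which is actually related to the outcome. Some of them are still likely to cross p < .05 just by chance, even though none of those "findings" would hold up in new data.

```python
# Minimal sketch of how exploring many predictors with p-values alone can
# produce spurious "significant" results. All predictors here are pure noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_people, n_predictors = 100, 20
noise_predictors = rng.normal(size=(n_people, n_predictors))
memory = rng.normal(size=n_people)   # outcome unrelated to every predictor

p_values = []
for j in range(n_predictors):
    _, p = stats.pearsonr(noise_predictors[:, j], memory)
    p_values.append(p)

print("Smallest p-value among the 20 noise predictors:", min(p_values))
print("Predictors 'significant' at p < .05:", sum(p < .05 for p in p_values))
```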

Yarkoni and Westfall suggest cross-validation as a criterion for success in psychology because it implicitly takes replication into account. In a sense, cross-validation is a way of replicating your effect on your own data. Using cross-validation as a measure of evidence can increase the flexibility with which researchers explore their data—while at the same time making their results more reliable.

Some psychologists have embraced the replication movement in social psychology, but others find it negative and pessimistic. It’s easy to blame others and show how their work isn’t good, but it’s harder to put forward your own ideas and research. Yarkoni and Westfall’s argument for borrowing from Machine Learning approaches in psychology appeals to me because it is proactive. The article isn’t focused on “clearing out” previous studies that are unreliable from the literature: it’s about how to grow better evidence.

Cross-validation as a criterion will not be a panacea. There may be problems with this analytic strategy that become apparent as it is adopted by more researchers. And predictive models should be used to the extent that they assist our ultimate scientific goal of explaining mind and behavior. Still, given the success of Big Data and Machine Learning across science and industry more broadly, it’s time that social psychologists embrace prediction.

Resources on Machine Learning:

Yarkoni and Westfall reference the text The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. A free PDF copy of this book, with lectures, is available online:

http://statweb.stanford.edu/~tibs/ElemStatLearn/

Every summer, Prof. Kevin Grimm of Arizona State University (from whom I received training in Machine Learning) teaches a week-long workshop on Data Mining hosted through the American Psychological Association:

http://www.apa.org/science/resources/ati/data-mining.aspx

One of the most popular offerings on Coursera (which provides free online courses) is Machine Learning by Andrew Ng of Stanford University:

https://www.coursera.org/learn/machine-learning

The lectures from this course are also available via YouTube:

https://www.youtube.com/playlist?list=PLA89DCFA6ADACE599

There is a collaborative, open source ethos behind many Machine Learning materials. More great materials are available—let your search engine be your guide.

 


Alex Danvers is a social psychology PhD student interested in emotions, social interactions, and friendship. He is also interested in applying new methods to his research, including—most recently—Machine Learning techniques. @Alex_Danvers on Twitter.

Reference:

Yarkoni, T., & Westfall, J. (under review). Choosing prediction over explanation in psychology: Lessons from machine learning.