Ryan Boyd's Software Minimizes the Tedium of Language Analysis

Ryan Boyd is a social psychologist / computational social scientist at the University of Texas at Austin. His research examines language, personality, and motivations. In addition, he is a freelance consultant, analyst, and data scientist with formal training in statistics, data mining, and machine learning.

First, tell us a bit about your work/research.

Most of it deals with language analysis in some form or another. You can extract a lot of information about a person's psychology from subtle (and not-so-subtle) patterns in their words. For example, you can measure depression from linguistic patterns, as depressed people tend to use a lot of first-person singular pronouns, less social language, and so on. Similarly, you can predict with fairly high accuracy whether a convicted criminal will reoffend based on the language they used while planning their original crime: did they use a lot of language that was certain, determined, and so on? We use a lot of these types of analyses to better understand people and what makes them tick. Another example: what types of psychological processes tend to make a person liberal or conservative? We find that conservatives tend to talk more about power and are more past-focused, whereas liberals tend to talk more about affiliation and are more future-oriented in their language. Essentially, they are focused on different things in their environment, and they view the world differently. These psychological tendencies bleed over into how they talk in the real world with friends, coworkers, on Facebook, and so on.
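
As a rough illustration of the pronoun example above, the share of first-person singular pronouns in a text can be counted in a few lines of Python. This is only a sketch: the word list and the function name `first_person_singular_rate` are assumptions made for illustration, not a validated dictionary or the method used in the research described here.

```python
import re

# Illustrative word list (an assumption, not a validated dictionary).
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def first_person_singular_rate(text: str) -> float:
    """Return the share of words that are first-person singular pronouns."""
    # Lowercase the text and pull out word-like tokens.
    # Apostrophes are kept so that "I'm" stays a single token.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # Compare the part before any apostrophe, so "i'm" and "i've" count as "i".
    hits = sum(1 for w in words if w.split("'")[0] in FIRST_PERSON_SINGULAR)
    return hits / len(words)
```

A rate like this is only a single number for a text; it says nothing on its own about any one person and would need to be compared across many texts to mean anything.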

How does computer software fit in with language analysis?

My work involves using a large number of different tools, programming languages, and methods for automating the analysis of language data. Rather than having someone manually go through a transcript to analyze the language used, we can use some cool computational techniques instead. Many of these techniques have started to catch on in the past few years in different areas of research, including fields outside of psychology.

But as more people have started to use these new language analysis techniques, they have also discovered that language data can be absurdly messy and difficult to work with. Many of the more mainstream text analysis methods have point-and-click software to help analyze your language data, but you first need to get your data into a format that can be analyzed. This process can range in difficulty from “super simple” all the way up to “oh god just kill me now” levels of monotony.

Why is it so monotonous?

There's a long history of language analysis in psychology, particularly in clinical, social, and personality psychology. People have been collecting social interaction data for decades in all kinds of studies: mock jury studies, couples' therapy interactions, you name it. A lot of researchers have started to revisit their old data using these new text analysis methods. The most common way that people store these social interactions is as transcripts. This is the most “natural” way to keep your data, as it preserves who said what, and in what order.

However, most techniques for measuring psychological information from texts require that you look at each person individually. This means having to manually copy and paste each person's text into separate files. Given that researchers often have hours upon hours' worth of transcripts for hundreds (or even thousands) of dyadic and group interactions, this task can be nightmarishly tedious. Anyone who works with transcript data knows this pain all too well. People have felt for a long time that there has to be an easier way to do it, but unfortunately no one had made a tool to help with the process.
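
To make the task concrete, here is a minimal Python sketch of splitting a transcript into one text file per speaker. It assumes every turn starts with a line of the form "Speaker: text", which real transcripts often do not follow exactly, and the names `split_by_speaker` and `write_speaker_files` are hypothetical. It is not how ConverSplitter Plus! works internally; it only shows the general idea.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed line format: "Speaker name: what they said".
SPEAKER_LINE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*)$")


def split_by_speaker(transcript: str) -> dict[str, list[str]]:
    """Group the lines of a transcript by the speaker who said them."""
    turns = defaultdict(list)
    current = None
    for line in transcript.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = SPEAKER_LINE.match(line)
        if match:
            # A new turn: remember the speaker and store their text.
            current = match.group(1)
            turns[current].append(match.group(2))
        elif current is not None:
            # No "Speaker:" prefix: treat the line as a continuation
            # of the previous speaker's turn.
            turns[current].append(line.strip())
    return dict(turns)


def write_speaker_files(turns: dict[str, list[str]], out_dir: str) -> None:
    """Write one UTF-8 text file per speaker into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for speaker, lines in turns.items():
        # Replace characters that are awkward in file names.
        safe_name = re.sub(r"[^\w-]+", "_", speaker)
        (out / f"{safe_name}.txt").write_text("\n".join(lines), encoding="utf-8")
```

One known limitation of this sketch: a continuation line that happens to contain a colon would be mistaken for a new speaker, which is exactly the kind of messiness that makes real transcript data hard to handle.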

What prompted you to take that leap and create a software program?

Thankfully, I rarely work with transcript data and, when I do, it's usually to help someone else parse out their data. However, I teach language analysis techniques and run workshops from time to time, and these usually involve learning how to use simple tools to accomplish research goals. At my workshops, it absolutely killed me inside to have people show up who were really excited to work with their transcript data, but who soon realized that there weren't any tools yet to help them separate their data out. Essentially, people with transcript data all knew about this hurdle and hoped to learn about anything that could make the problem go away, but they would come away resigned to their fate of manually copying and pasting lines of text from files.

For a long time, I had wanted to build a tool to help people with this enormous task, but life is short and we're all super busy, so I figured that someone else would handle it in the meantime. But I finally hit a point where I just had to do this: too many people needed something like this, and if I didn't make it, it didn't look like anyone else was going to. I literally woke up early one day and just hammered it out in a couple of hours. I really do have to emphasize that this program is (conceptually) very simple, but it also solves a long-standing and very common procedural problem in the social sciences.

Might folks outside of psychology have a use for the program?

I've created a lot of other niche text analysis/preparation programs in the past, but I think this one will be useful to many more people. For example, people in business and marketing who conduct focus groups, people in sociology and anthropology who conduct ethnographic interviews, and even people in economics, who often look at language samples from quarterly earnings Q&As. Really, any field that deals with people in some form is going to have researchers who work with conversational/transcript data, and a significant subset of them will need to split it out for later analysis.

What has the response to the program been like so far?

I've been surprised by the positive feedback I've received. Within the first five minutes of posting it, I received probably ten private messages from people who were overflowing with gratitude. My website received hundreds of hits on that first day, which just reaffirms to me that there was definitely an unmet need. I also enjoy that people are surprised to find out that the software is free, as academics and researchers normally expect to pay through the nose for software. It's important to me that researchers not be prevented from doing important work simply because they don't have a massive bank account.

What has been your takeaway from the experience?

In the end, it's a small thing, but even if it makes just one person's day a little easier, then that's good enough for me! And hopefully they will find something constructive to do with all of their newly found free time that would have been spent copying and pasting lines of text, like learning how to knit or starting a psychology wrestling federation.

To learn more about Ryan's transcript-parsing program, ConverSplitter Plus!, and to download a free copy, visit https://conversplitter.ryanb.cc.

To learn more about Ryan and his other software ventures, visit https://ryanb.cc.