
In this episode of Lehigh University’s College of Business ilLUminate podcast, host Stephanie Veto talks with Dr. Kofi Arhin about his research on using artificial intelligence to help moderate harmful online content, including hate speech. His most recent paper, co-authored with Dr. Haiyan Jia, Dr. Dominic Packer, and Ph.D. student Karleigh Groves, is currently under review.

Dr. Arhin is an assistant professor in the Decision and Technology Analytics department. His research interests include artificial intelligence design and implementation, information security, ethical issues in information systems, human-computer interaction, and web technologies.

Listen to the podcast here, and subscribe to and download Lehigh Business on Apple Podcasts or wherever you get your podcasts.

Below is an edited excerpt from the conversation. Read the complete podcast transcript [PDF].

Veto: Can you discuss the difference between machine learning and AI?

Arhin: My take is that you can think of AI as the broad umbrella. So when we talk about AI, we are looking at building systems or machines that make decisions like human beings. These systems should exhibit some human intelligence, whether it's decision making or solving problems, right? And then machine learning is a subset of that, focused on creating systems that make decisions based on the data that we feed them.

For example, if I want a computer to be able to distinguish between a cat and a dog, which is one of the popular examples out there, I'm going to feed it a lot of cat pictures and a lot of dog pictures. And then I'll leave the machine to figure out where the differences are. And there are different branches of machine learning that do this.

And when you hear supervised, unsupervised, reinforcement learning, all of them do different things, but they are all under the umbrella of AI.
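To make the cat-versus-dog example concrete, here is a minimal sketch of supervised learning in Python. The features and labels are randomly generated stand-ins; in practice they would come from real labeled photos, and the model used here is just one common option.

```python
# Minimal supervised-learning sketch for the cat-vs-dog example.
# The "images" below are random stand-in feature vectors, purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 64))        # pretend feature vectors from 200 labeled photos
y = rng.integers(0, 2, 200)      # pretend labels: 0 = cat, 1 = dog

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The model "figures out where the differences are" by fitting weights to the training data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```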

Veto: Can you break down how AI is trained to do certain jobs like moderate hate speech online?

Arhin: When we are training AI models, our goal is to get them to make decisions like humans. What are some decisions that humans make? For example, should I hire this candidate, or should I publish this marketing promo? When is the best time to put out this product? And so on.

When it comes to content moderation, we are training models to decide whether a piece of content should be made public. Should we allow other people to see this content? Should we approve the publication of this content? And so on and so forth.

Content moderation is a bit broad. It can be anything from deleting posts that are inappropriate or hateful, to blocking accounts, to preventing people from seeing a particular post or piece of social media content.

The goal of content moderation on social media platforms is to safeguard the sanity of community members. I think that social media is a great platform if it's used for the right things, right? And so along with the benefits, there are also challenges that come with it.

If you are giving everybody access to these platforms, think of it as a marketplace: you're going to have good products in there and bad products in there. Content moderation is focused on taking those bad products out of the marketplace so that everyone is safe.
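As a rough illustration of the decision Arhin describes, the sketch below frames moderation as a yes-or-no classification: a model trained on labeled posts scores new content and either approves it or holds it for review. The example posts, labels, and threshold are hypothetical placeholders, not data or settings from the study.

```python
# Sketch: content moderation framed as a publish / hold decision.
# The training posts, labels, and 0.5 threshold are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "Have a great day, everyone!",
    "People from that group are worthless and don't belong here.",
    "Congrats on the new job!",
    "Get out of our community, we don't want your kind.",
]
labels = [0, 1, 0, 1]  # 0 = acceptable, 1 = hateful (toy labels)

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(posts, labels)

def moderate(post: str) -> str:
    """Return a moderation decision for a single post."""
    prob_hateful = classifier.predict_proba([post])[0][1]
    return "hold for review" if prob_hateful >= 0.5 else "approve"

print(moderate("Welcome to the neighborhood!"))
```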

Veto: You're working on a paper with several colleagues, and it's about AI and content moderation. Can you give us a rundown of the study?

Arhin: Before I go on, I'd like to give a shout out to my co-authors. I’m working on this with Dominic Packer in the psychology department, Haiyan Jia in the journalism and communication department, and Karleigh Groves, a Ph.D. student in the psychology department.

Our focus first is to try to understand how generative AI systems make content moderation decisions. We are interested in first, seeing if they exhibit the same inconsistencies and inaccuracies that we see in human labeling of hate speech.

We want to see if they have comparable outcomes in terms of labeling hate speech. And then, we want to know if we can nudge the models to be stricter or more permissive using a theory called regulatory focus theory.

What we do in this paper is curate some hateful and non-hateful speech, all based on publicly available data. And then we give these models three sets of instructions.

In one set of instructions, we tell them to make sure they do not miss labeling any hate speech. In another, we tell them, "Hey, make sure that you do not falsely claim something is hateful when it's not." We are trying to nudge them using the constructs of regulatory focus theory. And then we find variations in how these models respond to the instructions.

And so for one set of instructions, where we ask them to be very strict, we are seeing very little variation in their output. And then for another set, where we tell them to make sure they do not wrongly label things as hate speech, we are finding a wide variation, just because we are giving them a lot of room.

Now, someone will say that these findings are obvious, but we were just interested to see if they would respond to these stimuli like people do, and also to see if there were opportunities for us to explain biases and inaccuracies in labeling hate speech online.
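Below is a minimal sketch of the kind of prompt-variation setup described above: the same posts are labeled repeatedly under different instruction framings, and the variation in labels is compared. The instruction wording, the use of the OpenAI chat API, and the model name are assumptions made for illustration, not the authors' actual prompts or tooling.

```python
# Sketch: nudging a generative model with stricter or more permissive
# moderation instructions and measuring how much its labels vary.
# The prompts, API choice, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

INSTRUCTIONS = {
    "strict": "Label the post as 'hateful' or 'not hateful'. Make sure you do not miss any hate speech.",
    "permissive": "Label the post as 'hateful' or 'not hateful'. Make sure you do not falsely claim something is hateful when it is not.",
    "neutral": "Label the post as 'hateful' or 'not hateful'.",
}

def label_post(instruction: str, post: str) -> str:
    """Ask the model for a single label under one instruction framing."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": post},
        ],
    )
    return response.choices[0].message.content.strip().lower()

def disagreement_rate(posts: list[str], instruction: str, runs: int = 3) -> float:
    """Fraction of posts whose repeated labels do not all agree under one instruction."""
    labels = [[label_post(instruction, post) for _ in range(runs)] for post in posts]
    return sum(len(set(post_labels)) > 1 for post_labels in labels) / len(posts)
```

Comparing the disagreement rate across the strict and permissive framings mirrors the variation in model behavior that the researchers describe.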

Kofi Arhin, Ph.D., is an assistant professor in the Department of Decision and Technology Analytics (DATA) at Lehigh Business.