
How Google's Jigsaw Is Trying to Detoxify the Internet

The Perspective API from Jigsaw, part of Google parent company Alphabet, gives online comment mods an evolving set of tools to combat abuse and harassment. But this machine learning technology also raises questions about the limits of AI.

January 29, 2019

The internet can feel like a toxic place. Trolls descend on comment sections and social media threads to hurl hate speech and harassment, turning potentially enlightening discussions into ad hominem attacks and group pile-ons. Expressing an opinion online often doesn't seem worth the resulting vitriol.

Massive social platforms—including Facebook, Twitter, and YouTube—admit they can't adequately police these issues. They're in an arms race with bots, trolls, and every other undesirable who slips through content filters. Humans are not physically capable of reading every single comment on the web; those who try often regret it.

Tech giants have experimented with various combinations of human moderation, AI algorithms, and filters to wade through the deluge of content flowing through their feeds each day. Jigsaw is trying to find a middle ground. The Alphabet subsidiary and tech incubator, formerly known as Google Ideas, is beginning to prove that machine learning (ML) fashioned into tools for human moderators can change the way we approach the internet's toxicity problem.

Perspective is an API developed by Jigsaw and Google's Counter Abuse Technology team. It uses ML to spot abuse and harassment online, and scores comments based on the perceived impact they might have on a conversation in a bid to make human moderators' lives easier.
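
To get a feel for what that looks like to a developer, here's a minimal sketch (not Jigsaw's code) of scoring a single comment against Perspective's public API from Python. The API key is a placeholder, and the request and response fields should be checked against Google's current documentation before relying on them.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: request a key through the Perspective API docs
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def score_comment(text: str) -> float:
    """Return Perspective's TOXICITY score, a 0.0-1.0 probability-style value."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

if __name__ == "__main__":
    # A higher score means the text looks more like comments raters tagged as toxic.
    print(score_comment("Thanks for explaining that; I hadn't considered it."))
```

A moderation tool built on top of this wouldn't act on the number by itself; as the Perspective team describes below, the score is meant to help humans decide where to look first.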

Perspective Amidst the Shouting Matches

The open-source tech was first announced in 2017, though development on it started a few years earlier. Some of the first sites to experiment with Perspective were news publications such as The New York Times and sites such as Wikipedia. More recently, Perspective has found a home on sites including Reddit and the comment platform Disqus (which is used on PCMag.com).

CJ Adams, product manager for Perspective, said the project wanted to examine how people's voices are silenced online. Jigsaw wanted to explore how targeted abuse or a general atmosphere of harassment can create a chilling effect, discouraging people to the point where they feel it's not worth the time or energy to add their voice to a discussion. How often have you seen a tweet, post, or comment and chosen not to respond because fighting trolls and getting Mad Online just isn't worth the aggravation?

"It's very easy to ruin an online conversation," said Adams. "It's easy to jump in, but one person being really mean or toxic could drive other voices out. Maybe 100 people read an article or start a debate, and often you end up with the loudest voices in the room being the only ones left, in an internet that's optimized for likes and shares. So you kind of silence all these voices. Then what's defining the debate is just the loudest voice in the room—the shouting match."

Jigsaw and Google

It's been a rough year for Jigsaw's sister company, Google, which has grappled with data security issues, employee pushback on its involvement in projects for the Pentagon and China, and revelations over its handling of sexual harassment. Not to mention a contentious Congressional hearing in which CEO Sundar Pichai was grilled by lawmakers.

Over at Jigsaw, Alphabet's altruistic incubator, things have been a bit less dramatic. The team has spent its time examining more technical forms of censorship, such as DNS poisoning with its Intra app and DDoS attacks with Project Shield. With Perspective, the goal is more abstract. Rather than using machine learning to determine what is or isn't against a given set of rules, Perspective's challenge is an intensely subjective one: classifying the emotional impact of language.

To do that, you need natural language processing (NLP), which breaks down a sentence to spot patterns. The Perspective team is confronting problems like confirmation bias, groupthink, and harassing behavior in an environment where technology has amplified their reach and made them harder to solve.

AI Is 'Wrong and Dumb Sometimes'

Improving online conversations with machine learning isn't a straightforward task. It's still an emerging field of research. Algorithms can be biased, machine learning systems require endless refinement, and the hardest and most important problems are still largely unexplored.

The Conversation AI research group, which created Perspective, started by meeting with newspapers, publishers, and other sites hosting conversations. Some of the first sites to experiment with the technology were The New York Times, Wikipedia, The Guardian, and The Economist.

In 2017, the team opened up the initial Perspective demo on a public website as part of an alpha test, letting people type millions of vile, abusive comments into the site. It was kind of like Microsoft's infamous failed Tay chatbot experiment, except that instead of tricking the bot into replying with racist tweets, Jigsaw used the crowdsourced virulence as training data to feed its models, helping to identify and categorize different types of online abuse.

The initial public test run did not go smoothly. Wired's "Trolls Across America," which broke down toxicity in commenting across the country based on Perspective scoring, showed how the algorithm inadvertently discriminated against groups by race, gender identity, or sexual orientation.

Adams was candid about the fact that Perspective's initial testing revealed major blind spots and algorithmic bias. Like Amazon's scrapped recruiting tool, which trained on decades of flawed job data and developed an inherent bias against female applicants, the early Perspective models had glaring flaws because of the data on which they were trained.

"In the example of frequently targeted groups, if you looked at the distribution across the comments in the training data set, there were a vanishingly small number of comments that included the word 'gay' or 'feminist' and were using it in a positive way," explained Adams. "Abusive comments use the words as insults. So the ML, looking at the patterns, would say, "Hey, the presence of this word is a pretty good predictor of whether or not this sentiment is toxic."

For example, the alpha algorithm might have mistakenly labeled statements like "I'm a proud gay man," or, "I'm a feminist and transgender" with high toxicity scores. But the publicly transparent training process—while painful—was an invaluable lesson for Jigsaw in the consequences of unintended bias, Adams said.

When training machine-learning models on something as distressing and personal as online abuse and harassment, the existence of algorithmic bias also underscores why AI alone is not the solution. Social media companies such as Facebook and YouTube have touted their platforms' AI content-moderation features, only to backtrack amid scandal and course-correct by hiring thousands of human moderators.

Jigsaw's tack is a hybrid of the two. Perspective isn't AI algorithms making decisions in a vacuum; the API is integrated into community-management and content-moderation interfaces to serve as an assistive tool for human moderators. Perspective engineers describe moderating hate speech with and without ML using a haystack analogy: AI helps by automating the sorting process, whittling vast haystacks down while still giving humans the final say over whether a comment is considered abusive or harassment.

"It's this new capability of ML," said Adams. "People talk about how smart AI is, but they often don't talk about all the ways it's wrong and dumb sometimes. From the very beginning, we knew this was going to make a lot of mistakes, and so we said, ‘This tool is helpful for machine-assisted human moderation, but it is not ready to be making automatic decisions.' But it can take the ‘needle in a haystack' problem finding this toxic speech and get it down to a handful of hay.”

What Is a Toxicity Score?

The most divisive aspect of Perspective's modeling is putting numbers to a variable as subjective as "toxicity." The first thing Adams pointed out is that Perspective's scores are an indication of probability, not severity. Higher numbers represent a higher likelihood that patterns in the text resemble patterns in comments people have tagged as toxic.

As for what "toxic" actually means, the Perspective team defines it broadly as "a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion." But how that manifests can be subtle. In 2018, Jigsaw partnered with the Rhodes Artificial Intelligence Lab (RAIL) to develop ML models that can pick up more ambiguous forms of threatening or hateful speech, such as a dismissive, condescending, or sarcastic comment that's not openly hostile.

Up to this point, most of Perspective's models have been trained by asking people to rate internet comments on a scale from "very toxic" to "very healthy." Developers can then calibrate the model to flag comments above a certain threshold, from 0.0 to 1.0. A score above 0.9 indicates a high probability of toxicity, while a score of 0.5 or below means a far lower degree of algorithmic certainty. Perspective also uses what's called score normalization, which gives developers a consistent baseline from which to interpret scores. Adams explained that depending on the forum or website, developers can mix and match models. So when a community doesn't mind profanity, that attribute can be weighted down.
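
Here's a rough sketch of what that mixing and matching might look like on the developer's side. The attribute names mirror Perspective's public models, but the weights and the flagging threshold are purely illustrative, not values Jigsaw prescribes.

```python
def combined_score(scores: dict, weights: dict) -> float:
    """Weighted average of the Perspective attributes a community cares about."""
    total = sum(weights.values())
    return sum(scores.get(attr, 0.0) * w for attr, w in weights.items()) / total

# A community that doesn't mind profanity can weight that attribute down
# relative to threats and general toxicity. These numbers are made up.
weights = {"TOXICITY": 1.0, "THREAT": 1.5, "PROFANITY": 0.25}
scores = {"TOXICITY": 0.62, "THREAT": 0.10, "PROFANITY": 0.91}

FLAG_THRESHOLD = 0.7  # surface the comment to a human moderator above this value
print(combined_score(scores, weights) >= FLAG_THRESHOLD)
```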

Adams showed me a demo moderation interface integrated with the Perspective API. In the admin panel, next to the options to sort comments by top, newest, and so on, is a small flag icon to sort by toxicity. There's also a built-in feedback mechanism for the human moderator to tell Perspective it scored a comment incorrectly and improve the model over time.
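
Perspective's public API documentation describes a companion endpoint for exactly this kind of correction, which lets a client suggest what a comment's score should have been. The sketch below shows roughly what submitting a moderator's corrected score might look like; the payload structure and the communityId value are assumptions to verify against the current docs.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
SUGGEST_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
               "comments:suggestscore?key=" + API_KEY)

def send_moderator_feedback(text: str, corrected_toxicity: float) -> None:
    """Tell Perspective what a human moderator thinks the comment should score."""
    payload = {
        "comment": {"text": text},
        # The corrected score the moderator assigns, e.g. 0.1 for "not really toxic."
        "attributeScores": {
            "TOXICITY": {"summaryScore": {"value": corrected_toxicity}}
        },
        "communityId": "example-news-site",  # hypothetical identifier
    }
    requests.post(SUGGEST_URL, json=payload, timeout=10).raise_for_status()

# Example: the model over-flagged a harmless comment; a moderator corrects it.
send_moderator_feedback("I'm a proud gay man and I loved this piece.", 0.1)
```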

He clicked through a demo interface for moderating Wikipedia Talk page comments scored by different Perspective models, and a histogram graph breaking down which comments are likely to be an attack on a page author or an attack on another commenter.

"We want to build machine-assisted moderation tools to flag things for a human to review, but we don't want some central definition or someone to say what is good and bad," said Adams. "So if I sort by toxicity, you see mean comments come to the top. But if you care more about, let's say, identity attacks or threats than metrics like swearing, maybe you wouldn't use a general toxicity model. These are the ingredients that you can mix. We offer these, and developers weight them."

The RAIL experiment is taking a more granular approach. The Oxford grad students are building a data set of tens of thousands of comments from the Canadian newspaper The Globe and Mail's comments section and Wikipedia Talk pages. They're asking human "annotators" to answer questions about each comment related to five sub-attributes of "unhealthy content": hostile or insulting (trolls), dismissive, condescending or patronizing, sarcastic, and unfair generalizations.

Homing in on these more subtle attributes has revealed new, complex problems with unintended bias toward specific groups and false positives with sarcastic comments. It's part of AI's growing pains: feeding models more and more data to help them understand the implied, indirect meanings behind human speech. The team is still combing through and annotating thousands of comments, and it plans to release the final dataset early this year.

"What we want to work toward is something where the community can score a set of comments, and then we can make them a custom mix of Perspective models to match," said Adams.

Reddit's Curious Testbed

Reddit is a microcosm of everything that's good and terrible about the internet. There's a subreddit community for every topic, niche, and bizarre interest you can think of. Jigsaw doesn't work with Reddit on a corporate level, but one of the most intriguing places in which Perspective's AI moderation is being tested is on a subreddit called r/changemyview.

Surprisingly, there are corners of the internet where genuine debate and discussion still happen. Change My View, or CMV, is not like most other subreddits. The idea is to post an opinion you accept may be flawed or are open to having changed, then to listen to and understand other points of view to see whether they can change your mind on an issue. The threads range from mundane topics such as the proper viewing order for Star Wars movies to serious discussions on issues including racism, politics, gun control, and religion.

Change My View is an interesting testbed for Perspective because the subreddit has its own detailed set of rules for starting and moderating conversations that incite argument and heated debate by design. Kal Turnbull, who goes by u/Snorrrlax on Reddit, is the founder and one of the moderators of r/changemyview. Turnbull told PCMag that the Perspective API lines up particularly well with the sub's Rule 2, which basically prohibits rude or hostile speech.

"It sounds like a simple rule, but there's a lot of nuance to it," said Turnbull, who is based in Scotland. “It's hard to automate this rule without being clever about language. Reddit gives you this thing called AutoModerator, where you can set up filters and keywords for flagging. But there are so many false positives, and it can be quite hard to catch, because someone can say a bad word without insulting someone, and they can also insult someone without using any bad words.”

Jigsaw reached out to Turnbull in March 2018. The collaboration began with Rule 2, but soon the team was building Perspective models for other rules as well. It's not a full integration of the open-source Perspective API but rather a Reddit bot that lets moderators flag comments scored above a given toxicity threshold.
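
A hypothetical bot in that spirit might look like the sketch below: it streams new comments with the PRAW library, scores each one with the score_comment() helper sketched earlier, and reports anything above a threshold so it lands in the mod queue. The credentials and the threshold are placeholders, and this is not the actual CMV bot.

```python
import praw  # third-party Reddit API wrapper

reddit = praw.Reddit(
    client_id="CLIENT_ID",          # placeholders: a real bot needs mod credentials
    client_secret="CLIENT_SECRET",
    username="MOD_BOT_USERNAME",
    password="MOD_BOT_PASSWORD",
    user_agent="perspective-flagging-bot (illustrative sketch)",
)

RULE_2_THRESHOLD = 0.85  # probability above which a comment is surfaced to mods

for comment in reddit.subreddit("changemyview").stream.comments(skip_existing=True):
    score = score_comment(comment.body)  # helper from the earlier Perspective sketch
    if score >= RULE_2_THRESHOLD:
        # Reporting puts the comment in the mod queue; the bot never removes anything.
        comment.report(f"Possible Rule 2 violation (toxicity {score:.2f})")
```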

For the past six years, Turnbull and the other mods have been doing all of this manually from the queue of AutoModerator reports (flagged keywords) and user reports. Jigsaw used years of rule-violation notes from moderators, which they tracked through a browser extension, and built Perspective models based on that data combined with some of Perspective's existing toxicity models. Throughout 2018, the CMV mods gave feedback on issues such as excess false positives, and Jigsaw tweaked the scoring thresholds while continuing to model more of CMV's rules.

Complex Judgments in Online Debate

(The Perspective bot integrated into the Reddit moderator interface.)

Perspective isn't live for all of the subreddit's rule moderation. Some of the more complicated or abstract rules are still beyond the scope of what this kind of ML can understand.

Rule 4, for example, governs the sub's Delta points system, while Rule B stops users from playing devil's advocate or using a post for "soapboxing." Nuanced moderation like that requires contextual data and plain ol' human understanding to discern whether someone is arguing a point for genuine reasons or simply trolling.

For the foreseeable future, we'll still need human mods. These more complex judgment scenarios are where the CMV moderators are beginning to see cracks in the AI modeling, and more clever automation could determine whether all of this is scalable.

"I think the reason why this is so complicated is because it's a combination of our judgment on their original post and their interactions throughout the entire conversation. So it's not just one comment that triggers a model," said Turnbull. "If an argument is going back and forth, and at the end is a comment saying 'thank you' or an acknowledgement, we let it go even if a rule was broken earlier in the thread. Or a light-hearted joke that in context might appear to be rude—it's a nice little human thing, and that's something the bot doesn't get yet."

Change My View is the only subreddit actively using Perspective ML models for moderation at the moment, although Adams said the team has received access requests from several others. The specific rule set of CMV made it an ideal test case, but Perspective models are malleable; individual subreddits can customize the scoring algorithm to match their community guidelines.

The next step for Turnbull is taking CMV off Reddit because the community is outgrowing it, he said. For the past six months, the moderators' newly formed startup has been working with Jigsaw on a dedicated site with deeper functionality than Reddit's mod interface and bots can provide.

The project is still only in alpha testing, but Turnbull talked about features such as proactive alerts when a user is typing a comment that might break a rule, built-in reporting to give moderators more context, and historical data to make decisions. Turnbull stressed that there are no plans to shut down or migrate the subreddit, but he's excited about the new experiment.

All the Comments Fit to Print

Depending on the day of the week, The New York Times' website gets anywhere from 12,000 to more than 18,000 comments. Until mid-2017, the paper's comments sections were moderated by a full-time community management staff who read every single comment and decided whether to approve or reject it.

Bassey Etim, who until this month was the community editor for the Times, spent a decade at the Community desk and had been its editor since 2014. At the height of a weekday, the team might have a few people moderating comments on opinion stories while others tackled news stories. A spreadsheet split up and tracked the different responsibilities, but the team of approximately a dozen people was constantly reassigned or moved around depending on the top news of the moment. They also fed tidbits from the comments back to reporters for potential story fodder.

Eventually, it became clear that this was more than 12 humans could handle. Comment sections on stories would have to close after reaching a maximum number of comments the team could moderate.

The newspaper's audience development group had already been experimenting with machine learning for basic, obvious comment approvals, but Etim said it wasn't particularly smart or customizable. The Times first announced its partnership with Jigsaw in September 2016. Since then, its comments sections have expanded from appearing on less than 10 percent of all stories to around 30 percent today and climbing.

From Jigsaw's perspective, the incubator saw the opportunity to feed Perspective anonymized data from millions of comments per day, moderated by professionals who could help refine the process. In exchange for the anonymized ML training data, Jigsaw and the Times worked together to build a platform called Moderator, which rolled out in June 2017.

Inside Moderator, the NYT Comment Interface

(Image courtesy of The New York Times)

Moderator combines Perspective's models with more than 16 million anonymized, moderated Times comments going back to 2007.

What the community team actually sees in the Moderator interface is a dashboard with an interactive histogram chart that visualizes the comment breakdown above a certain threshold. They can drag the slider back and forth, for instance, to automatically approve all comments with only a 0 to 20 percent summary score, which is based on a combination of a comment's potential for obscenity, toxicity, and likelihood to be rejected. Below that are quick moderation buttons to approve, reject, defer, or tag a comment, which continues to improve Perspective's modeling.
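
The sketch below illustrates that slider logic in simplified form. How the Times actually combines obscenity, toxicity, and rejection likelihood into a single summary score isn't public, so treating the worst of the three as the comment's risk is an assumption made for illustration.

```python
def summary_score(obscene: float, toxic: float, reject_likelihood: float) -> float:
    """Collapse several per-comment probabilities into one risk number (assumed rule)."""
    return max(obscene, toxic, reject_likelihood)

def triage(comments: list, auto_approve_below: float = 0.20):
    """Auto-approve low-risk comments; leave everything else to human moderators."""
    approved, needs_human = [], []
    for c in comments:
        risk = summary_score(c["obscene"], c["toxic"], c["reject"])
        (approved if risk < auto_approve_below else needs_human).append(c["id"])
    return approved, needs_human  # rejection always remains a human decision

# Example: only the first comment clears the 20 percent slider setting.
batch = [
    {"id": 1, "obscene": 0.02, "toxic": 0.05, "reject": 0.10},
    {"id": 2, "obscene": 0.01, "toxic": 0.45, "reject": 0.30},
]
print(triage(batch))  # -> ([1], [2])
```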

"For each section of the website, we analyzed incoming comments and the way Perspective would tag them. We used both the public Perspective models and our own models unique to The New York Times," said Etim. "I would analyze comments from each section and try to find the cutoff point where we'd be comfortable saying, 'OK, everything above this probability using these specific toxicity tags, like obscenity for instance, we're going to approve."

Machine learning is approving a comparatively small percentage of comments (around 25 percent or so, Etim said) as the Times works to roll out comments on more stories and ultimately even to customize how the models filter and approve comments for different sections of the site. The models only approve comments; rejection is still handled entirely by human moderators.

Those manual comment cutoffs are gone. Comments typically close on a story either 24 hours after it publishes online or the day after it publishes in print, Etim said.

'We're Not Replacing You With Machines'

The next phase is building more features into the system to help moderators prioritize which comments to look at first. Increasingly, automating what has always been a manual process has let moderators spend their time proactively working with reporters to reply to comments. It has created a feedback loop in which comments lead to follow-up reporting and additional stories, and it lets the paper save and reallocate resources to create more journalism.

"Moderator and Perspective have made the Times a lot more responsive to readers concerns, because we have the resources to do that, whether it's by writing stories ourselves or working with reporters to figure out stories," said Etim. "The cool thing about this project is that we didn't lay anybody off. We're not replacing you with machines. We're simply using the humans we have more efficiently and to make the really tough decisions."

The paper is open to working with other publications to help the rest of the industry implement this kind of technology. It can help local news outlets with limited resources to maintain comment sections without a large dedicated staff and to use comments as the Times does, to find potential leads and fuel grassroots journalism.

Etim likened AI-assisted moderation to giving a farmer a mechanical plow versus a spade. You can do the job a lot better with a plow.

"If Perspective can evolve in the right way, it can, hopefully, create at least a set of guidelines that are repeatable for small outlets," he said. "It's a long game, but we've already set up a lot of the foundation to be a part of that reader experience. Then maybe these local papers can have comments again and establish a little beachhead against the major social players."

Screaming Into the Abyss

At this point, most of us have seen people attacked or harassed on social media for voicing an opinion. Nobody wants it to happen to them, except trolls who thrive on that sort of thing. And we've learned that shouting at a stranger who's never going to listen to a rational argument isn't a valuable use of our time.

Perspective is trying to upend that dynamic, but CJ Adams said the broader goal is to publish data, research, and new open-source UX models to create new structures of conversation—a daunting task. Making the internet a healthy place that's worth people's time means scaling these systems beyond news comment sections and subreddits. Ultimately, the AI tools must be able to handle the gargantuan social apps and networks that dominate our everyday digital interactions.

Putting aside what Facebook, Twitter, and other social giants are doing internally, the most direct way to accomplish this is to push the technology from moderators to users themselves. Adams pointed to the Coral Project for an idea of what that might look like.

The Coral Project was initially founded as a collaboration among the Mozilla Foundation, The New York Times, and the Washington Post. Coral is building open-source tools such as its Talk platform to encourage online discussion and give news sites an alternative to shutting down comment sections. Talk currently powers platforms for nearly 50 online publishers, including the Post, New York Magazine, The Wall Street Journal, and The Intercept.

Earlier this month, Vox Media acquired the Coral Project from the Mozilla Foundation; it plans to "deeply integrate" it into Chorus, its content management and storytelling platform.

Perspective has a plugin for the Coral Project that uses the same underlying tech—ML-based toxicity scoring and thresholds—to give users proactive suggestions as they're typing, Adams said. So when a user is writing a comment containing phrases flagged as abuse or harassment, a notification might pop up for the user saying, "Before you post this, be sure to remember our community guidelines" or "The language in this comment may violate our community guidelines. Our moderation team will review it shortly."
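
In code, that kind of pre-submission nudge could be as simple as the sketch below, which reuses the score_comment() helper from the earlier Perspective example. The threshold and the wording of the message are illustrative, not Coral's or Jigsaw's.

```python
NUDGE_THRESHOLD = 0.8  # illustrative cutoff for showing a reminder

def presubmit_check(draft: str):
    """Return a nudge message for risky drafts, or None to let the comment post as-is."""
    if score_comment(draft) >= NUDGE_THRESHOLD:  # helper from the earlier sketch
        return ("The language in this comment may violate our community "
                "guidelines. Our moderation team will review it shortly.")
    return None  # no nudge; nothing is blocked and the discussion isn't stopped
```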

"That little nudge can help people just take that second to think, but it also doesn't block anyone," said Adams. "It's not stopping the discussion."

It's a mechanism that video game chat and streaming platforms have integrated to stem abuse and harassment. Twitter users could clearly benefit from such a system, too.

It speaks to an idea that MIT research scientist Andrew Lippmann brought up in PCMag's Future Issue: He talked about built-in mechanisms that would let people stop and think before they shared something online, to help stem the spread of misinformation. The concept applies to online discussion, too. We've created frictionless communication systems capable of amplifying a statement's reach exponentially in an instant, but sometimes a little friction can be a good thing, Lippmann said.

Perspective isn't about using AI as a blanket solution. It's a way to mold ML models into tools for humans to help them curate their own experiences. But one counterpoint is that if you make it even easier for people to tune out the online noise they don't like, the internet will become even more of an echo chamber than it already is.

Asked whether tools like Perspective could ultimately exacerbate this, Adams said he believes online echo chambers exist because there are no mechanisms to host a discussion in which people can meaningfully disagree.

"The path of least resistance is 'These people are fighting. Let's just let them agree with themselves in their own corners. Let people silo themselves,'" he said. "You let people shout everyone else out of the room, or you shut down the discussion. We want Perspective to create a third option."

Adams laid out a sample scenario. If you ask a room of 1,000 people, "How many of you read something today that you really cared about?" most internet users will point to an article, a tweet, a post, or something they read online. But if you then ask them, "How many of you thought it was worth your time to comment on it or have a discussion?" all the hands in the room will go down.

"For so many of us, it's just not worth the effort. The structure of discussion that we have right now just means it's a liability. If you have a current reasonable thought or something you want to share, for most people, they don't want to take part," said Adams. "That means that of that 1,000 people that could be in the room, you have only a handful represented in the discussion; let's say, 10 people. I have deep faith that we can build a structure that lets that other 990 back into the discussion and does it in a way that they find worth their time."

About Rob Marvin

Associate Features Editor

Rob Marvin is PCMag's Associate Features Editor. He writes features, news, and trend stories on all manner of emerging technologies. Beats include: startups, business and venture capital, blockchain and cryptocurrencies, AI, augmented and virtual reality, IoT and automation, legal cannabis tech, social media, streaming, security, mobile commerce, M&A, and entertainment. Rob was previously Assistant Editor and Associate Editor in PCMag's Business section. Prior to that, he served as an editor at SD Times. He graduated from Syracuse University's S.I. Newhouse School of Public Communications. You can also find his business and tech coverage on Entrepreneur and Fox Business. Rob is also an unabashed nerd who does occasional entertainment writing for Geek.com on movies, TV, and culture. Once a year you can find him on a couch with friends marathoning The Lord of the Rings trilogy--extended editions. Follow Rob on Twitter at @rjmarvin1.
