The AI That Could Heal a Divided Internet

Credit - Illustration By Dan Page for TIME

In the 1990s and early 2000s, technologists made the world a grand promise: new communications technologies would strengthen democracy, undermine authoritarianism, and lead to a new era of human flourishing. But today, few people would agree that the internet has lived up to that lofty goal.

Social media platforms tend to rank content by how much engagement it receives. Over the last two decades, politics, the media, and culture have all been reshaped around a single, overriding incentive: to provoke an emotional response, because those are the posts that rise to the top.

Efforts to improve the health of online spaces have long focused on content moderation, the practice of detecting and removing bad content. Tech companies hired workers and built AI to identify hate speech, incitement to violence, and harassment. That worked imperfectly, but it stopped the worst toxicity from flooding our feeds.

There was one problem: while these AIs helped remove the bad, they didn’t elevate the good. “Do you see an internet that is working, where we are having conversations that are healthy or productive?” asks Yasmin Green, the CEO of Google’s Jigsaw unit, which was founded in 2010 with a remit to address threats to open societies. “No. You see an internet that is driving us further and further apart.”

What if there were another way?

Jigsaw believes it has found one. On Monday, the Google subsidiary revealed a new set of AI tools, or classifiers, that can score posts based on the likelihood that they contain good content: Is a post nuanced? Does it contain evidence-based reasoning? Does it share a personal story, or foster human compassion? By returning a numerical score (from 0 to 1) representing the likelihood of a post containing each of those virtues and others, these new AI tools could allow the designers of online spaces to rank posts in a new way. Instead of posts that receive the most likes or comments rising to the top, platforms could—in an effort to foster a better community—choose to put the most nuanced comments, or the most compassionate ones, first.
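To make the idea concrete, here is a minimal Python sketch of the difference between engagement ranking and attribute ranking. It assumes each post already carries scores between 0 and 1 for attributes such as nuance and compassion. The `Post` class, the function names, and all the numbers are hypothetical illustrations, not Jigsaw's API or real classifier output.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """A comment plus precomputed attribute scores (hypothetical values)."""
    text: str
    likes: int
    # Each score is assumed to be a probability-like value in [0, 1].
    scores: dict[str, float] = field(default_factory=dict)


def rank_by_engagement(posts: list[Post]) -> list[Post]:
    """Engagement ranking: most-liked posts first."""
    return sorted(posts, key=lambda p: p.likes, reverse=True)


def rank_by_attribute(posts: list[Post], attribute: str) -> list[Post]:
    """Rank posts by a single attribute score, highest first.

    Posts missing the attribute are treated as scoring 0.0.
    """
    return sorted(posts, key=lambda p: p.scores.get(attribute, 0.0), reverse=True)


posts = [
    Post("School of Rock!", likes=120, scores={"nuance": 0.10, "compassion": 0.05}),
    Post("This film got me through a hard year, because...", likes=15,
         scores={"nuance": 0.80, "compassion": 0.70}),
    Post("Depends what you mean by life-affirming; for me...", likes=40,
         scores={"nuance": 0.90, "compassion": 0.30}),
]

print([p.likes for p in rank_by_engagement(posts)])               # [120, 40, 15]
print([p.likes for p in rank_by_attribute(posts, "nuance")])      # [40, 15, 120]
print([p.likes for p in rank_by_attribute(posts, "compassion")])  # [15, 40, 120]
```

Swapping the sort key is all it takes to change which comments surface first; the hard part, which this sketch leaves out, is producing scores that can be trusted.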


The breakthrough was made possible by recent advances in large language models (LLMs), the type of AI that underpins chatbots like ChatGPT. In the past, even training an AI to detect simple forms of toxicity, like whether a post was racist, required millions of labeled examples. Those older forms of AI were often brittle and ineffectual, not to mention expensive to develop. But the new generation of LLMs can identify even complex linguistic concepts out of the box, and calibrating them to perform specific tasks is far cheaper than it used to be. Jigsaw’s new classifiers can identify “attributes” like whether a post contains a personal story, curiosity, nuance, compassion, reasoning, affinity, or respect. “It’s starting to become feasible to talk about something like building a classifier for compassion, or curiosity, or nuance,” says Jonathan Stray, a senior scientist at the Berkeley Center for Human-Compatible AI. “These fuzzy, contextual, know-it-when-I-see-it kind of concepts: we’re getting much better at detecting those.”

This new ability could be a watershed for the internet. Green, and a growing chorus of academics who study the effects of social media on public discourse, argue that content moderation is “necessary but not sufficient” to make the internet a better place. Finding a way to boost positive content, they say, could have cascading positive effects both at the personal level, in our relationships with each other, and at the scale of society. “By changing the way that content is ranked, if you can do it in a broad enough way, you might be able to change the media economics of the entire system,” says Stray, who did not work on the Jigsaw project. “If enough of the algorithmic distribution channels disfavored divisive rhetoric, it just wouldn’t be worth it to produce it any more.”


One morning in late March, Tin Acosta joins a video call from Jigsaw’s offices in New York City. On the conference room wall behind her, there is a large photograph from the 2003 Rose Revolution in Georgia, when peaceful protesters toppled the government of President Eduard Shevardnadze, a former Soviet foreign minister. Other rooms have similar photos of people in Syria, Iran, Cuba and North Korea “using tech and their voices to secure their freedom,” Jigsaw’s press officer, who is also in the room, tells me. The photos are intended as a reminder of Jigsaw’s mission to use technology as a force for good, and its duty to serve people in both democracies and repressive societies.

On her laptop, Acosta fires up a demonstration of Jigsaw’s new classifiers. Using a database of 380 comments from a recent Reddit thread, the Jigsaw senior product manager begins to demonstrate how ranking the posts using different classifiers would change the sorts of comments that rise to the top. The thread’s original poster had asked for life-affirming movie recommendations. Sorted by the default ranking on Reddit—posts that have received the most upvotes—the top comments are short, and contain little beyond the titles of popular movies. Then Acosta clicks a drop-down menu, and selects Jigsaw’s reasoning classifier. The posts reshuffle. Now, the top comments are more detailed. “You start to see people being really thoughtful about their responses,” Acosta says. “Here’s somebody talking about School of Rock—not just the content of the plot, but also the ways in which the movie has changed his life and made him fall in love with music.” (TIME agreed not to quote directly from the comments, which Jigsaw said were used for demonstrative purposes only and had not been used to train its AI models.)

Acosta chooses another classifier, one of her favorites: whether a post contains a personal story. The top comment is now from a user describing how, under both a heavy blanket and the influence of drugs, they had ugly-cried so hard at Ke Huy Quan’s monologue in Everything Everywhere All at Once that they’d had to pause the movie multiple times. Another top comment describes how a movie trailer had inspired them to quit a job they were miserable in. Another tells the story of how a movie reminded them of their sister, who had died 10 years earlier. “This is a really great way to look through a conversation and understand it a little better than [ranking by] engagement or recency,” Acosta says.

For the classifiers to have an impact on the wider internet, they would require buy-in from the biggest tech companies, which are all locked in a zero-sum competition for our attention. Even though the classifiers were developed inside Google, the tech giant has no plans to start using them to help rank its YouTube comments, Green says. Instead, Jigsaw is making the tools freely available to independent developers, in the hope that smaller online spaces, like message boards and newspaper comment sections, will build up an evidence base showing that the new forms of ranking are popular with users.


There are some reasons to be skeptical. For all its flaws, ranking by engagement is egalitarian. Popular posts get amplified regardless of their content, and in this way social media has allowed marginalized groups to gain a voice long denied to them by traditional media. Introducing AI into the mix could threaten this state of affairs. A large body of research shows that LLMs have plenty of ingrained biases; if applied too hastily, Jigsaw’s classifiers might end up boosting voices that are already prominent online, further marginalizing those that aren’t. The classifiers could also worsen the flood of AI-generated content on the internet by handing spammers an easy recipe for AI-generated posts that are likely to get amplified. Even if Jigsaw avoids those problems, tinkering with online speech has become a political minefield. Both conservatives and liberals are convinced their posts are being censored; meanwhile, tech companies are under fire for making unaccountable decisions that affect the global public square. Jigsaw argues that its new tools may allow tech platforms to rely less on the controversial practice of content moderation. But there’s no getting away from the fact that changing what kind of speech gets rewarded online will always have political opponents.

Still, academics say that given a chance, Jigsaw’s new AI tools could result in a paradigm shift for social media. Elevating more desirable forms of online speech could create new incentives for more positive online—and possibly offline—social norms. If a platform amplifies toxic comments, “then people get the signal they should do terrible things,” says Ravi Iyer, a technologist at the University of Southern California who helps run the nonprofit Psychology of Technology Research Network. “If the top comments are informative and useful, then people follow the norm and create more informative and useful comments.”


The new algorithms have come a long way from Jigsaw’s earlier work. In 2017, the Google unit released Perspective API, a free tool for detecting toxicity. It was widely used, including by the New York Times, to downrank or remove negative comments under articles. But experimenting with the tool, which is still available online, reveals how AI tools can carry hidden biases. “You’re a f-cking hypocrite” is, according to the classifier, 96% likely to be a toxic phrase. But the tool scores many other hateful phrases as more likely non-toxic than toxic, including the neo-Nazi slogan “Jews will not replace us” (41%) and transphobic language like “trans women are men” (36%). The tool breaks when confronted with a slur that is commonly directed at South Asians in the U.K. and Canada, returning the error message: “We don't yet support that language, but we're working on it!”

To be sure, 2017 was a very different era for AI. Jigsaw has made efforts to mitigate biases in its new classifiers, which are unlikely to make such basic errors. Its team tested the new classifiers on a set of comments that were identical except for the names of different identity groups, and said it found no hint of bias. Still, the patchy effectiveness of the older Perspective API serves as a reminder of the pitfalls of relying on AI to make value judgments about language. Even today’s powerful LLMs are not free from bias, and their fluency can often conceal their limitations. They can discriminate against African American English; they function poorly in some non-English languages; and they can treat equally capable job candidates differently based on their names alone. More work will be required to ensure Jigsaw’s new AIs don’t have less visible forms of bias. “Of course, there are things that you have to watch out for,” says Iyer, who did not work on the Jigsaw project. “How do we make sure that [each classifier] captures the diversity of ways that people express these concepts?”
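The bias check Jigsaw describes resembles what is often called a counterfactual or perturbation test: hold a comment fixed, swap only the identity term, and see whether the score moves. The Python sketch below illustrates that general technique, not Jigsaw’s actual procedure. The templates, the placeholder group names, the 0.05 threshold, and both scoring functions are hypothetical stand-ins for the model being audited.

```python
from typing import Callable

ScoreFn = Callable[[str], float]


def counterfactual_gaps(templates: list[str], groups: list[str],
                        score_fn: ScoreFn) -> list[float]:
    """Return, for each template, the spread (max minus min) of scores
    across identity-group variants. A spread near zero means the
    classifier treated otherwise-identical comments alike."""
    gaps = []
    for template in templates:
        scores = [score_fn(template.format(group=g)) for g in groups]
        gaps.append(max(scores) - min(scores))
    return gaps


# Hypothetical stand-in classifiers; a real audit would call the model under test.
def fair_scorer(text: str) -> float:
    # Looks only for a crude reasoning cue and ignores identity terms entirely.
    return 0.9 if "because" in text else 0.2


def biased_scorer(text: str) -> float:
    # Deliberately halves the score for one group to show what a failure looks like.
    base = fair_scorer(text)
    return base * 0.5 if "GroupB" in text else base


templates = [
    "As a {group} person, I disagree because the data says otherwise.",
    "My {group} neighbor helped me move last week.",
]
groups = ["GroupA", "GroupB", "GroupC"]  # placeholders for real identity terms

for name, scorer in [("fair", fair_scorer), ("biased", biased_scorer)]:
    gaps = counterfactual_gaps(templates, groups, scorer)
    flagged = [t for t, gap in zip(templates, gaps) if gap > 0.05]
    print(name, "flagged templates:", len(flagged))
# fair flagged templates: 0
# biased flagged templates: 2
```

A clean result only covers the templates and identity terms someone thought to test, so a passing run says little about phrasings the test set never included.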

In a paper published earlier this month, Acosta and her colleagues set out to test how readers would respond to a list of comments ranked using Jigsaw’s new classifiers, compared to comments sorted by recency. Readers preferred the comments sorted by the classifiers, judging them to be more informative, respectful, trustworthy, and interesting. But the researchers also found that ranking comments by a single classifier, like reasoning, could put users off. In its press release launching the classifiers on Monday, Jigsaw says it intends for its tools to be mixed and matched. That’s possible because each classifier simply returns a score between zero and one, so developers can write a formula that combines several scores into a single number and use that number as a ranking signal. Web developers could choose to rank comments using a carefully calibrated mixture of compassion, respect, and curiosity, for example. They could also throw engagement into the mix, to make sure that posts that receive lots of likes still get boosted.
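A blended ranking formula of the kind described here could be as simple as a weighted sum. The Python sketch below is one hypothetical way to do it, not Jigsaw’s recommended formula: the weights are arbitrary, the attribute scores are made up, and raw like counts are log-scaled and divided by the thread’s maximum (an assumption of this sketch) so that engagement also lands between 0 and 1 before it is mixed in.

```python
import math


def blended_score(scores: dict[str, float], likes: int, max_likes: int,
                  weights: dict[str, float]) -> float:
    """Weighted sum of attribute scores plus a normalized engagement term.

    `weights` maps attribute names, plus the special key "engagement",
    to developer-chosen weights. Likes are log-scaled and divided by the
    thread maximum so they fall in [0, 1] like the classifier scores.
    """
    engagement = math.log1p(likes) / math.log1p(max_likes) if max_likes > 0 else 0.0
    total = weights.get("engagement", 0.0) * engagement
    for attribute, weight in weights.items():
        if attribute != "engagement":
            # Missing attributes count as 0.0 rather than raising an error.
            total += weight * scores.get(attribute, 0.0)
    return total


# Hypothetical comments with made-up scores in [0, 1].
comments = [
    {"id": "a", "likes": 250, "scores": {"compassion": 0.1, "respect": 0.4, "curiosity": 0.2}},
    {"id": "b", "likes": 12, "scores": {"compassion": 0.8, "respect": 0.9, "curiosity": 0.6}},
    {"id": "c", "likes": 90, "scores": {"compassion": 0.5, "respect": 0.7, "curiosity": 0.7}},
]
weights = {"compassion": 0.3, "respect": 0.3, "curiosity": 0.2, "engagement": 0.2}
max_likes = max(c["likes"] for c in comments)

ranked = sorted(
    comments,
    key=lambda c: blended_score(c["scores"], c["likes"], max_likes, weights),
    reverse=True,
)
print([c["id"] for c in ranked])  # ['b', 'c', 'a']
```

Setting every weight except `engagement` to zero recovers plain popularity ranking (here `['a', 'c', 'b']`), which makes it easy to compare the two orderings side by side.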


Just as removing negative content from the internet has received its fair share of pushback, boosting certain forms of “desirable” content is likely to prompt complaints that tech companies are putting their thumbs on the political scales. Jigsaw is quick to point out that its classifiers are apolitical, and that the kinds of content they boost are ones few people would take issue with. In tests, Jigsaw found the tools did not disproportionately boost comments that users saw as unfavorable to Republicans or Democrats. “We have a track record of delivering a product that’s useful for publishers across the political spectrum,” Green says. “The emphasis is on opening up conversations.” Still, the question of power remains: who gets to decide which kinds of content are desirable? Jigsaw’s hope is that by releasing the technology publicly, different online spaces can each choose what works for them, thus avoiding any one hegemonic platform taking that decision on behalf of the entire internet.

For Stray, the Berkeley scientist, an internet where positive content gets boosted is a tantalizing prospect. Many people, he says, think of online misinformation as leading to polarization. And it can. “But it also works the other way around,” he says. The demand for low-quality information arises, at least in part, because people are already polarized. If the tools result in people becoming less polarized, “then that should actually change the demand-side for certain types of lower quality content.” It’s hypothetical, he cautions, but it could lead to a virtuous circle, where declining demand for misinformation feeds a declining supply.

Why would platforms agree to implement these changes? Almost by definition, ranking by engagement is the most effective way to keep users onsite, and thus to keep eyeballs on the ads that drive revenue. For the big platforms, that means both a continued flow of profits and users who aren’t spending time on a competitor’s app. Replacing engagement-based ranking with something less engaging seems like a tough ask for companies already battling to keep their users’ attention.

That’s true, Stray says. But he notes that there are different forms of engagement. There’s short-term engagement, which is easy for platforms to optimize for: is a tweak to a platform likely to make users spend more time scrolling during the next hour? Platforms can and do make changes to boost their short-term engagement, Stray says, but those kinds of changes often mean boosting low-quality, engagement-bait types of content, which tend to put users off in the long term.

The alternative is long-term engagement. How might a change to a platform influence a user’s likelihood of spending more time scrolling during the next three months? Long-term engagement is healthier, but far harder to optimize for, because it’s harder to isolate the connection between cause and effect. Many different factors are acting upon the user at the same time. Large platforms want users to be returning over the long term, Stray says, and for them to cultivate healthy relationships with their products. But it’s difficult to measure, so optimizing for short-term engagement is often an easier choice.

Jigsaw’s new algorithms could change that calculus. “The hope is, if we get better at building products that people want to use in the long run, that will offset the race to the bottom,” Stray says. “At least somewhat.”


Write to Billy Perrigo at billy.perrigo@time.com.
