The advent of robotic hall monitors
LLMs will be used for content moderation. This will probably make the internet a more depressing place.
In some of my earlier posts, I spoke about what I dubbed the “asymmetry of utility” of large language models (LLMs). I think that the recent advances in machine learning hold great promise, but I worry that the bar for adversarial uses of the technology, such as content-farming or faking human interactions online, is considerably lower than the bar for profoundly enriching our lives.
There’s also a related fear I have, and it has to do with content moderation. Content moderation is a necessity for almost every online platform of note. Some of the work is fairly mechanical and uncontroversial: essentially, ridding the community of actors who come in bad faith. Other efforts have to do with preventing societal harms, such as child exploitation or suicide. But an increasing proportion of the work focuses on policing earnest but unwanted speech, from novel takes on space lasers to questionable theories about COVID-19.
In small communities, moderation tasks are usually handled through manual reviews. But from the perspective of those invested in policing large online platforms, human moderators are a liability. They are expensive, their decisions are inconsistent, and they can abuse the privileges they’re entrusted with. Being a content reviewer is also a fairly awful job to have: imagine sifting through an endless stream of child pornography, violence, or self-harm every single day for years.
It follows that there is enormous pressure to augment or replace human reviews with automation; indeed, most large platforms already use a mix of manually coded heuristics and ML classifiers to throttle suspect content and flag certain posts for extra checks. Still, because the automation is error-prone, these mechanisms are applied in a narrowly tailored way. The end result is that there is plenty of room for unfettered online expression: as long as you don’t profoundly upset others, you can speak your mind without strangers monitoring your every thought.
With the advent of modern LLMs, this balance is about to shift. The language models are precise enough, and mimic our understanding of text and images well enough, that they can probably outperform a contract worker adjudicating their 1,000th report; and they can do it while tirelessly checking every nook and cranny of the service in real time. Further, the accuracy of the models might be good enough to build a closed system in which an LLM is the judge, jury, and executioner, eliminating once and for all any concerns about human workers peeking at users’ messages. Heck, such a robotic hall monitor could even be baked into an end-to-end-encrypted chat if it gets European regulators off your back.
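To make the shape of such a closed loop concrete, here is a minimal sketch. Every name in it is hypothetical: the verdict labels, the policy prompt, and the query_llm() stub stand in for whatever model and policy a given platform would actually use.

```python
# A hypothetical closed-loop moderation pipeline where an LLM is the
# only reviewer. Nothing here reflects any real platform's API.

from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"  # quietly reduce reach; the author is never told
    REMOVE = "remove"      # summary judgment, no human in the loop


@dataclass
class Post:
    author: str
    text: str


POLICY_PROMPT = (
    "You are a content moderator. Per the platform policy, answer with "
    "exactly one word: allow, throttle, or remove.\n\nPost: {text}"
)


def query_llm(prompt: str) -> str:
    """Stand-in for a call to whatever hosted or on-device model is used."""
    raise NotImplementedError


def moderate(post: Post) -> Verdict:
    """Ask the model for a verdict and act on it with no human review."""
    answer = query_llm(POLICY_PROMPT.format(text=post.text)).strip().lower()
    try:
        return Verdict(answer)
    except ValueError:
        # Fail closed: an answer we can't parse quietly throttles the post.
        return Verdict.THROTTLE
```

Note the default in the last branch: when in doubt, the post quietly loses reach. Pipelines like this tend to fail toward suppression, which is exactly the dynamic I worry about below.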
If this sounds far-fetched, consider that it’s already coming true. A recent release from Blizzard Entertainment touted the success of their automated speech-policing system in Overwatch. And make no mistake, it is a success story: I have no doubt that the system is faster, more dependable, and more thorough than any human-driven process could ever be. Given the intense scrutiny that tech companies are facing today, Blizzard’s peers will probably have no choice but to follow suit. If they refuse, their platforms will be singled out as a refuge for child pornographers, drug dealers, and Russian propagandists; and there will be some truth to that charge, too.
What I ultimately fear is a profound chilling effect on online speech. It’s not that the LLMs will necessarily enforce standards any higher than what we abide by today; we will probably train them to let some things slide. It’s that the unclear boundaries and the high cost of mistakes, coupled with the utter inability to fly under the radar, will have us constantly biting our tongues.
+1000 Michal, an automated system that can enact summary judgment on anyone is pretty terrifying; it was bad enough when we had individual trust and safety agents who could do the same. The incentives to block far outweigh the costs of arbitrating appeals (another asymmetry)
At Google and beyond, I was trying to create a habeas-corpus kind of system where you could confront your accusers; I wonder if we could use LLMs in that capacity somehow, or if the hall monitors would still be policing themselves in that scenario
We'll be using Paradyzjan Koalang in no time ;)