The advent of robotic hall monitors
LLMs will be used for content moderation. This will probably make the internet a more depressing place.
In some of my earlier posts, I spoke about what I dubbed the “asymmetry of utility” of large language models (LLMs). I think that the recent advances in machine learning hold great promise, but I worry that the bar for adversarial uses of the technology, such as content-farming or faking human interactions online, is considerably lower than the bar for profoundly enriching our lives.
There’s also a related fear I have, and it has to do with content moderation. Content moderation is a necessity for almost every online platform of note. Some of the work is fairly mechanical and uncontroversial: essentially, ridding the community of actors who come in bad faith. Other efforts have to do with preventing societal harms, such as child exploitation or suicide. But an increasing proportion of the work focuses on policing earnest but unwanted speech, from novel takes on space lasers to questionable theories about COVID-19.
In small communities, moderation tasks are usually handled through manual reviews. But from the perspective of those invested in policing large online platforms, human moderators are a liability. They are expensive, their decisions are inconsistent, and they can abuse the privileges they’re entrusted with. Being a content reviewer is also a fairly awful job to have: imagine sifting through an endless stream of child pornography, violence, or self-harm every single day for years.
It follows that there is enormous pressure to augment or replace human reviews with automation; indeed, most large platforms already use a mix of manually coded heuristics and ML classifiers to throttle suspect content and flag certain posts for extra checks. Still, because the automation is error-prone, these mechanisms are applied in a narrowly tailored way. The end result is that there is plenty of room for unfettered online expression: as long as you don’t profoundly upset others, you can speak your mind without strangers monitoring your every thought.
With the advent of modern LLMs, this balance is about to shift. The language models are precise enough, and mimic our understanding of text and images well enough, that they can probably outperform a contract worker adjudicating their 1,000th report; and they can do it while tirelessly checking every nook and cranny of the service in real time. Further, the accuracy of the models might be good enough to build a closed system in which an LLM is the judge, jury, and executioner, eliminating once and for all any concerns about human workers peeking at users’ messages. Heck, such a robotic hall monitor could even be baked into an end-to-end-encrypted chat if it gets European regulators off your back.
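To make the shape of such a closed loop concrete, here is a minimal sketch. Every name in it is hypothetical: the verdict labels, the policy prompt, and the query_llm() stub stand in for whatever model and policy a given platform would actually use.

```python
# A hypothetical closed-loop moderation pipeline where an LLM is the
# only reviewer. Nothing here reflects any real platform's API.

from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"  # quietly reduce reach; the author is never told
    REMOVE = "remove"      # summary judgment, no human in the loop


@dataclass
class Post:
    author: str
    text: str


POLICY_PROMPT = (
    "You are a content moderator. Per the platform policy, answer with "
    "exactly one word: allow, throttle, or remove.\n\nPost: {text}"
)


def query_llm(prompt: str) -> str:
    """Stand-in for a call to whatever hosted or on-device model is used."""
    raise NotImplementedError


def moderate(post: Post) -> Verdict:
    """Ask the model for a verdict and act on it with no human review."""
    answer = query_llm(POLICY_PROMPT.format(text=post.text)).strip().lower()
    try:
        return Verdict(answer)
    except ValueError:
        # Fail closed: an answer we can't parse quietly throttles the post.
        return Verdict.THROTTLE
```

Note the default in the last branch: when in doubt, the post quietly loses reach. Pipelines like this tend to fail toward suppression, which is exactly the dynamic I worry about below.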
If this sounds far-fetched, consider that it’s already coming true. A recent release from Blizzard Entertainment touted the success of their automated speech-policing system in Overwatch. And make no mistake, it is a success story: I have no doubt that the system is faster, more dependable, and more thorough than any human-driven process could ever be. Given the intense scrutiny that tech companies are facing today, Blizzard’s peers will probably have no choice but to follow suit. If they refuse, their platforms will be singled out as a refuge for child pornographers, drug dealers, and Russian propagandists; and there will be some truth to that charge, too.
What I ultimately fear is a profound chilling effect on online speech. It’s not that the LLMs will necessarily enforce standards any higher than what we abide by today; we will probably train them to let some things slide. It’s that the unclear boundaries and the high cost of mistakes, coupled with the utter inability to fly under the radar, will have us constantly biting our tongues.
+1000 Michal, an automated system that can enact summary judgment on anyone is pretty terrifying; it was bad enough when we had individual trust and safety agents who could do the same. The incentives to block far outweigh the costs of arbitrating appeals (another asymmetry)
At Google and beyond, I was trying to create a habeas-corpus kind of system where you could confront your accusers; I wonder if we could use LLMs in that capacity somehow, or if the hall monitors would still be policing themselves in that scenario
We'll be using Paradyzjan Koalang in no time ;)