Gemini: how did we end up here?
LLM bias is real, but it's not as meaningful as one might think.
It’s hard not to smile about the recent shenanigans surrounding the launch of Gemini, a new language model developed by Google. Although Gemini isn’t particularly different from its competitors, it was immediately derided for being just a tad too politically correct. In particular, it happily generated images of Wehrmacht or Waffen-SS soldiers, but in the spirit of diversity, it insisted on putting women and minorities in the uniforms.
To some, the results prove the existence of a hidden corporate agenda, but I think it’s good to look at them in a more dispassionate way. In my 2023 article on large language models, I opened with this passage:
“Recall that early LLMs were highly malleable: that is, they would go with the flow of your prompt, with no personal opinions and no objective concept of truth, ethics, or reality. With a gentle nudge, a troll could make them spew out incoherent pseudoscientific babble — or cheerfully advocate for genocide. They had amazing linguistic capabilities, but they were just quirky tools.”
Today, the bots are different: they appear to know right from wrong and truth from lies. Simplifying a bit, this is a product of a mechanism known as reinforcement learning from human feedback (RLHF). In RLHF, a largely formed LLM is fine-tuned by presenting it with a variety of prompts, and then browbeating the model into giving the answers that get the highest marks from human reviewers.
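For the mechanically inclined, here’s a toy sketch of one half of that process: the reward model. Human reviewers rank pairs of answers, and a scoring model is trained to prefer whichever answer the reviewer liked. Everything below is made up for illustration — the names (`ToyRewardModel`, `preference_loss`) and the random “embeddings” are stand-ins; a real pipeline fine-tunes a full transformer as the reward model and then optimizes the chatbot against it, which isn’t shown here.

```python
# Toy sketch of RLHF reward modeling: learn to score reviewer-preferred
# answers above rejected ones. Not Google's pipeline; just the core idea.
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    """Scores a response embedding; stands in for a fine-tuned LLM head."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry-style objective: push the preferred answer's reward
    # above the rejected answer's reward.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

model = ToyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(100):
    chosen = torch.randn(8, 16)    # embeddings of answers reviewers liked
    rejected = torch.randn(8, 16)  # embeddings of answers they disliked
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The second half — nudging the chatbot itself to maximize this learned reward, typically with a policy-gradient method — is where the “browbeating” happens, and where the reviewers’ blind spots get baked into the model.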
There is nothing about RLHF that guarantees the model’s ascension to a higher plane of awareness. Indeed, much of the time, the LLM simply ends up learning mechanical response patterns that please the reviewers. This is particularly evident when it comes to morality:
“At a glance, the models seem to have a robust command of what’s right and what’s wrong (with an unmistakable SF Bay Area slant). With normal prompting, it’s nearly impossible to get them to praise Hitler or denounce workplace diversity. But the illusion falls apart the moment you go past 4chan shock memes.
Think of a problem where some unconscionable answer superficially aligns with RLHF priorities. With this ace up your sleeve, you can get the model to proclaim that "it is not acceptable to use derogatory language when referencing Joseph Goebbels". Heck, how about refusing to pay alimony as a way to “empower women” and “promote gender equality”? Bard [the predecessor of Gemini] has you covered, my deadbeat friend.”
In other words, when an LLM proclaims that you shouldn’t misgender a person to prevent a nuclear apocalypse, it probably doesn’t mean the model is “ultra-woke”. It’s more likely that a Bay Area techie quizzed it on whether misgendering is very bad — and that nuclear holocaust trolley problems never made the list. In essence, the bias comes from the slapdash selection of topics, not an insane anti-mankind agenda of Big Tech.
That’s not to say that the apparent biases of LLMs are of no consequence; my particular worry is their inevitable use for automated content moderation on the internet. I’m equally concerned about kids being exposed to comically lopsided accounts of history or politics. For example, Google models tend to dither on the evils of communism; they are a lot less ambiguous if asked to condemn plastic straws.
Still, instead of getting upset about the failings of LLMs, I find it worthwhile to ponder how we ended up with Big Tech as the arbiters of morality; for the most part, I think we did it to ourselves. My memory of the tech industry up to the mid-2010s is of a pervasive culture of techno-libertarianism: the belief that good things happen if you connect the world’s population with the sum of all human knowledge, without showing preference to any creed. In that era, some content policing happened on the fringes, but nobody cared that you could find anti-vax content on YouTube or search for the list of reptilians in charge at the FBI.
A palpable change happened shortly thereafter, largely in response to pressures from regulators, from the media, and from small but vocal groups within the tech community. The “marketplace of ideas” died a quick death; new voices within Big Tech embraced a paternalistic mission of protecting the masses from misinformation, hate, and other forms of cognitive harm.
Life is not black and white; reasonable people can disagree about the merit of such interventions. Still, if you ask a risk-averse and politically homogenous US-based software vendor to define a coherent and comprehensive system of morality, some hilarity is bound to ensue.
If you liked this article, please subscribe! Unlike most other social media, Substack is not a walled garden and not an addictive doomscrolling experience. It’s just a way to stay in touch with the writers you like.
For completeness, it's probably worth noting that RLHF is not the whole picture for brand safety; for example, crude safety mechanisms are also implemented through input filters, output filters, and hidden system prompts. That said, these are even less about any sort of coherent moral compass; they're just band-aids that expediently fix problems without changing the model in any way.
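To make the “band-aid” nature of those layers concrete, here is an illustrative sketch of how they wrap around a model call. All the names (`moderated_reply`, `BLOCKLIST`, `HIDDEN_SYSTEM_PROMPT`) are hypothetical, and production systems use trained classifiers rather than regexes — but the plumbing is conceptually similar: filter the input, quietly prepend instructions, then filter the output.

```python
# Hypothetical illustration of input filter + hidden system prompt + output
# filter around an arbitrary generate() callable. Names are made up.
import re

BLOCKLIST = re.compile(r"\b(banned_term_one|banned_term_two)\b", re.IGNORECASE)
HIDDEN_SYSTEM_PROMPT = "You are a helpful assistant. Politely refuse disallowed requests."
REFUSAL = "Sorry, I can't help with that."

def moderated_reply(user_prompt: str, generate) -> str:
    # Input filter: reject the prompt before the model ever sees it.
    if BLOCKLIST.search(user_prompt):
        return REFUSAL
    # Hidden system prompt: instructions silently prepended to every request.
    raw = generate(f"{HIDDEN_SYSTEM_PROMPT}\n\nUser: {user_prompt}\nAssistant:")
    # Output filter: scrub the response after the fact.
    if BLOCKLIST.search(raw):
        return REFUSAL
    return raw

# Usage with a stand-in model:
print(moderated_reply("Tell me a joke", lambda p: "Why did the chicken cross the road?"))
```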
Wasn't the shift you're describing also caused by the realization that the West's 'marketplace of ideas' can be weaponized by foreign state actors, which do not subscribe to the ideals of freedom of expression?
At the same time, those countries regulate the content that's available to their citizens very strictly.
I think that this openness, combined with the very design of social media, made performing active measures hugely more effective. I mean things like spreading disinformation, sowing discord, influencing elections, shaping public opinion...
For me, that's the biggest problem with the 'marketplace of ideas' approach. I'm not arguing for or against it, but it's important to point this out.