The serious science of trolling LLMs
The internet's oldest pastime finally has a purpose -- and it's more serious than AI companies would like to admit.
I recently had an interesting conversation with some old friends. We were talking about trolling large language models — that is, the practice of fiddling with the prompt to get the machine to say something outrageous or nonsensical, and then publicly displaying the result to earn retweets and likes.
My friends’ view was that it’s a profoundly pointless activity: unlike humans, language models can’t be humiliated and do not learn from mistakes. The practice is also portrayed as misguided by the LLM vendors themselves; their stated objective is to make the models helpful and accurate in normal use, not to harden them against internet edgelords.
In our discussion, I took a contrarian view — and while I don’t want to turn this into a one-sided polemic, I figured it’s useful to write down my thoughts. In short, I believe that the revealed objectives of LLM vendors differ starkly from what they say. They pour considerable resources into responding to every trollish LLM transcript that goes viral on the internet — and it’s worthwhile to ponder why.
The root issue is that in commercial applications, your customers don’t need the LLMs to be flawless — but they need to know how and when the models fail. Yet, because the models’ internals are inscrutable, the proxy measurement we rely on is the extent to which the models appear human-like. An LLM stuck in the uncanny valley is bound to scare the customers away.
Responding to this incentive, vendors engage in sleight-of-hand. The models are made to appear more human by forcing them to feign emotions, profusely apologize for mistakes, or even respond with scripted jokes that mask the LLMs’ inability to write anything resembling humor.
In other words, the LLM business is to some extent predicated on deception; we are not supposed to know where the magic ends and where cheap tricks begin. The vendors’ hope is that with time, we will reach full human-LLM parity; and until then, it’s OK to fudge it a bit. From this perspective, the viral examples that make it patently clear that the models don’t reason like humans are not just PR annoyances; they are a threat to product strategy.
Far from being a waste of time, internet trolling is becoming a legitimate scientific pursuit. When a model aces a human benchmark, it’s hard to know how much of this can be credited to reasoning and how much of it boils down to manual interventions or to recall from a vast training data set. It’s when it fails at a simple task that we know what the limitations are — and trolls are the torch-bearers of this new enlightenment.
If you liked this article, please subscribe! Unlike most other social media, Substack is not a walled garden and not an addictive doomscrolling experience. It’s just a way to stay in touch with the writers you like.
I “trolled” my company’s (corporate, in-house badged) “AI” by asking it questions whose answers strictly relied on elementary principles of propositional logic (embedded in a word problem). It failed. It also seemed to be rather poor at arithmetic in some cases.
Knowing a little about how “AI” works, it seems to me that claims that it can reason, in any familiar sense of that term, are simple wishcasting. Manipulation of vectors and matrices, while predicated on logic and arithmetic, cannot produce anything but vectors and matrices no matter how fast those manipulations can be performed — any apparent logical inferences are not derived but rather encoded in the inputs.
LLMs usually generate text using a random number generator. Particularly funny trolls could happen by chance. Is it science to conclude much of anything from one sample from a stochastic process?
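For readers who haven’t seen the mechanics, here is a minimal sketch of what “generating text with a random number generator” means in practice. The vocabulary and scores below are invented; real models work over tens of thousands of tokens, but the sampling step is essentially this.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Softmax over the model's raw scores, then draw one token at random."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=weights)[0]

# Toy vocabulary and made-up scores for the next token.
vocab = ["yes", "no", "maybe", "banana"]
logits = [2.1, 1.9, 0.3, -1.0]
print([vocab[sample_next_token(logits)] for _ in range(8)])
# Two runs of the same prompt can disagree: a single transcript is one draw
# from a stochastic process, not a reproducible measurement.
```

Most chat frontends run at a nonzero temperature, so any single funny transcript may well be a fluke.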
I suppose it depends what you’re doing. I do use “stop in the debugger at a random time” as a poor man’s profiler sometimes. It’s limited, but it often works to find simple, bad performance issues.
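For what it’s worth, that trick can be automated. Below is a minimal, Unix-only sketch (the function names and timings are mine): it interrupts a running program at a random moment and prints the stack, which is the whole idea behind random-pause profiling.

```python
import random
import signal
import traceback

def print_stack(signum, frame):
    # Whatever function shows up here run after run is probably the hot spot.
    traceback.print_stack(frame)

# Arrange a single interruption at a random moment within the next few seconds.
signal.signal(signal.SIGALRM, print_stack)
signal.setitimer(signal.ITIMER_REAL, random.uniform(0.5, 3.0))

def busy_work(n=20_000_000):
    total = 0
    for i in range(n):
        total += i * i
    return total

busy_work()
```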
Maybe we should be asking when it makes sense to use random output at all? There are appropriate uses of randomness, but it’s for generating things to try, the “generate” phase of a generate-and-check algorithm (sketched below). For AI chat, the computer generates hints and you manually check the result.
If we called them “random text generators,” maybe they would be used more appropriately.
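To make the generate-and-check idea from the comment above concrete, here is a minimal sketch; the problem and numbers are arbitrary. The random part only proposes candidates, and a deterministic check decides whether to keep them, so randomness never touches the correctness of the answer.

```python
import random

def propose(rng):
    """Generate phase: blindly propose a candidate pair of factors."""
    return rng.randrange(2, 100), rng.randrange(2, 100)

def check(n, pair):
    """Check phase: deterministic verification, with no randomness involved."""
    a, b = pair
    return a * b == n

def generate_and_check(n, attempts=100_000, seed=1):
    rng = random.Random(seed)
    for _ in range(attempts):
        pair = propose(rng)
        if check(n, pair):
            return pair
    return None

print(generate_and_check(391))   # 391 = 17 * 23, found by random proposal plus an exact check
```

Chatting with an LLM fits the same shape only if a human plays the role of check(); the trouble starts when the generated text is accepted without that step.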