The serious science of trolling LLMs
The internet's oldest pastime finally has a purpose -- and it's more serious than AI companies would like to admit.
I recently had an interesting conversation with some old friends. We were talking about trolling large language models — that is, the practice of fiddling with the prompt to get the machine to say something outrageous or nonsensical, and then publicly displaying the result to earn retweets and likes.
My friends’ view was that it’s a profoundly pointless activity: unlike humans, language models can’t be humiliated and do not learn from mistakes. The practice is also portrayed as misguided by the LLM vendors themselves; their stated objective is to make the models helpful and accurate in normal use, not to harden them against internet edgelords.
In our discussion, I took a contrarian view — and while I don’t want to turn this into a one-sided polemic, I figured it’s useful to write down my thoughts. In short, I believe that the revealed objectives of LLM vendors differ starkly from what they say. They pour considerable resources into responding to every trollish LLM transcript that goes viral on the internet — and it’s worthwhile to ponder why.
The root issue is that in commercial applications, your customers don’t need the LLMs to be flawless — but they need to know how and when the models fail. Yet, because the models’ internals are inscrutable, the proxy measurement we rely on is the extent to which the models appear human-like. An LLM stuck in the uncanny valley is bound to scare the customers away.
Responding to this incentive, vendors engage in sleight-of-hand. The models are made to appear more human by forcing them to feign emotions, profusely apologize for mistakes, or even respond with scripted jokes that mask the LLMs’ inability to write anything resembling humor.
In other words, the LLM business is to some extent predicated on deception; we are not supposed to know where the magic ends and where cheap tricks begin. The vendors’ hope is that with time, we will reach full human-LLM parity; and until then, it’s OK to fudge it a bit. From this perspective, the viral examples that make it patently clear that the models don’t reason like humans are not just PR annoyances; they are a threat to product strategy.
Far from being a waste of time, internet trolling is becoming a legitimate scientific pursuit. When a model aces a human benchmark, it’s hard to know how much of this can be credited to reasoning and how much of it boils down to manual interventions or to recall from a vast training data set. It’s when it fails at a simple task that we know what the limitations are — and trolls are the torch-bearers of this new enlightenment.
If you liked this article, please subscribe! Unlike most other social media, Substack is not a walled garden and not an addictive doomscrolling experience. It’s just a way to stay in touch with the writers you like.
I “trolled” my company’s (corporate, in-house badged) “AI” by asking it questions whose answers strictly relied on elementary principles of propositional logic (embedded in a word problem). It failed. It also seemed to be rather poor at arithmetic in some cases.
Knowing a little about how “AI” works, it seems to me that claims that it can reason, in any familiar sense of that term, are simple wishcasting. Manipulation of vectors and matrices, while predicated on logic and arithmetic, cannot produce anything but vectors and matrices no matter how fast those manipulations can be performed — any apparent logical inferences are not derived but rather encoded in the inputs.
LLMs usually generate text using a random number generator. Particularly funny trolls could happen by chance. Is it science to conclude much of anything from one sample from a stochastic process?
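For readers who haven’t seen the mechanics, here is a minimal sketch of what “generating text with a random number generator” means in practice. The vocabulary and scores below are invented; real models work over tens of thousands of tokens, but the sampling step is essentially this.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Softmax over the model's raw scores, then draw one token at random."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=weights)[0]

# Toy vocabulary and made-up scores for the next token.
vocab = ["yes", "no", "maybe", "banana"]
logits = [2.1, 1.9, 0.3, -1.0]
print([vocab[sample_next_token(logits)] for _ in range(8)])
# Two runs of the same prompt can disagree: a single transcript is one draw
# from a stochastic process, not a reproducible measurement.
```

Most chat frontends run at a nonzero temperature, so any single funny transcript may well be a fluke.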
I suppose it depends what you’re doing. I do use “stop in the debugger at a random time” as a poor man’s profiler sometimes. It’s limited, but it often works to find simple, bad performance issues.
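For what it’s worth, that trick can be automated. Below is a minimal, Unix-only sketch (the function names and timings are mine): it interrupts a running program at a random moment and prints the stack, which is the whole idea behind random-pause profiling.

```python
import random
import signal
import traceback

def print_stack(signum, frame):
    # Whatever function shows up here run after run is probably the hot spot.
    traceback.print_stack(frame)

# Arrange a single interruption at a random moment within the next few seconds.
signal.signal(signal.SIGALRM, print_stack)
signal.setitimer(signal.ITIMER_REAL, random.uniform(0.5, 3.0))

def busy_work(n=20_000_000):
    total = 0
    for i in range(n):
        total += i * i
    return total

busy_work()
```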
Maybe we should be asking when it makes sense to use random output at all? There are appropriate uses of randomness, but it’s for generating things to try, the “generate” phase of a generate-and-check algorithm (sketched below). For AI chat, the computer generates hints and you manually check the result.
If we called them “random text generators,” maybe they would be used more appropriately.
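To make the generate-and-check idea from the comment above concrete, here is a minimal sketch; the problem and numbers are arbitrary. The random part only proposes candidates, and a deterministic check decides whether to keep them, so randomness never touches the correctness of the answer.

```python
import random

def propose(rng):
    """Generate phase: blindly propose a candidate pair of factors."""
    return rng.randrange(2, 100), rng.randrange(2, 100)

def check(n, pair):
    """Check phase: deterministic verification, with no randomness involved."""
    a, b = pair
    return a * b == n

def generate_and_check(n, attempts=100_000, seed=1):
    rng = random.Random(seed)
    for _ in range(attempts):
        pair = propose(rng)
        if check(n, pair):
            return pair
    return None

print(generate_and_check(391))   # 391 = 17 * 23, found by random proposal plus an exact check
```

Chatting with an LLM fits the same shape only if a human plays the role of check(); the trouble starts when the generated text is accepted without that step.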