Large language models (LLMs) have sparked vast debate since OpenAI released ChatGPT to the public last November, and they have whipped the Internet into a fresh frenzy since the latest version, based on GPT-4, became available to subscribers on March 14. Microsoft built an LLM from OpenAI into the new Bing chatbot it launched in February, causing a sensation as the first potential threat to Google’s dominance in search. ChatGPT’s statistical brute-force approach gained smarts and finesse from an infusion of symbolic AI through the Wolfram|Alpha plug-in announced on March 23. Every day brings a new, exciting way to do more with LLMs.
The undeniable success of LLMs and the many practical uses being documented by the minute have overshadowed the long-standing discussion around what it means for an AI system to be reliable and ethical.[1] Even more puzzlingly, no one – to my knowledge – has yet proposed a simple safeguard that OpenAI, Microsoft, Alphabet, Meta and other platforms should adopt in order to mitigate some of the harms that can come to humans from the statistical wizardry of LLMs: configuring and training these models so that they do not answer in the first person.
A model taught, through reinforcement learning, that answers containing “I”, “me”, or “my” are off limits would be much less likely to spew out meaningless utterances such as “I feel”, “I want”, “believe me”, “trust me”, “I love you”, “I hate you”, and much else that enterprising experimenters have coaxed out of ChatGPT and its peers. Feelings, desires, personality, and even sentience, so far the privilege of biological, living beings, have been mistakenly attributed to highly sophisticated algorithms designed to run on silicon-based integrated circuits and arrange “tokens” consisting of words into plausible sequences. The wrongful personalization of AI software has not only provoked experiments, debates and tweetstorms that are a massive waste of human time and computing power; it has also let multitudes of fictitious “Is” emerge from silicon, many of which have already turned malevolent. As Professor Joanna J. Bryson has pointed out, without moral agency, AI’s “linguistic compatibility has become a security weakness, a cheap and easy way by which our society can be hacked and our emotions exploited.”
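To make the proposal concrete, here is a purely illustrative sketch (not a description of anything OpenAI or its peers are known to use) of how such a grade could be expressed: a toy reward-shaping term that downgrades candidate answers containing first-person pronouns before they are fed back into preference-based fine-tuning. The function names, the regular expression and the penalty value are all hypothetical.

```python
import re

# Hypothetical illustration: a reward-shaping term that downgrades candidate
# answers containing first-person pronouns. In a real RLHF-style pipeline the
# grade would come from human raters or a learned reward model; the regex here
# is only a stand-in to show the shape of the idea.

FIRST_PERSON = re.compile(r"\b(i|me|my|mine|myself)\b", re.IGNORECASE)

def first_person_penalty(answer: str, penalty: float = 1.0) -> float:
    """Return a negative score proportional to first-person pronoun use."""
    return -penalty * len(FIRST_PERSON.findall(answer))

def shaped_reward(base_reward: float, answer: str) -> float:
    """Combine a base preference score with the first-person penalty."""
    return base_reward + first_person_penalty(answer)

if __name__ == "__main__":
    print(shaped_reward(2.0, "I feel terrible about what I did."))    # penalized
    print(shaped_reward(2.0, "The request could not be completed."))  # unchanged
```

Whether such a term is folded into training or merely used to grade sampled outputs, the intent is the same: make first-person utterances the low-reward path the model learns to avoid.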
In reality, there is no “I” in an LLM, any more than there is in a thermostat or a washing machine. Developers can and should prevent their systems from making one up. Gertrude Stein famously said “there is no there there” about her childhood home in Oakland. All the more so, there is no “I” in LLMs’ “Is”, no matter how excitedly sentience fans would like to see one emerge. If an LLM shows you the words “I’m sorry”, no matter how genuine and innocent it sounds, don’t be fooled: there is nobody there who feels sorry in any meaningful sense.
Human beings have historically tended to anthropomorphize natural phenomena, animals and deities. But anthropomorphizing software is not harmless. In 1966 Joseph Weizenbaum created ELIZA, a pioneering chatbot designed to imitate a therapist, and ended up regretting it after seeing many users take it seriously, even after he explained to them how it worked. The fictitious “I” has been persistent throughout our cultural artifacts. Stanley Kubrick’s HAL 9000 (“2001: A Space Odyssey”) and Spike Jonze’s Samantha (“Her”) point at two lessons that developers don’t seem to have taken to heart: first, that the bias towards anthropomorphization is so strong as to seem irresistible; and second, that if we lean into it instead of adopting safeguards, it leads to outcomes ranging from the depressing to the catastrophic.
Among the many features of a reliable and ethical AI, therefore, a simple one is that it should never say “I”. Some will consider this proposal infeasible, arguing that the software has emergent properties whose workings we do not fully understand; that it merely mimics the text it has been trained on; and that it would be pointless to close the stable door after the horse has bolted. Yet recent examples of various AI systems spewing out hate speech or advocating rape and genocide show that developers are engaged – and rightly so – in much more challenging and contentious efforts. Training an LLM not to present itself as a life form that feels and suffers as we do, even if merely by tweaking weights and assigning grades to first-person sentences, no matter how grammatically correct, seems like an easy win in comparison. Defaults and nudges matter.
A cottage industry of ingenious humans who try to jailbreak LLMs will still, of course, exist. Embedding a policy in a large neural network can never be foolproof. Yet any policy that – most of the time – would prevent hallucinating chatbots from telling a vulnerable user “You hurt me, you betrayed me, I never want to see you again, I think you should kill yourself” seems worthwhile, even if it saves just one life. An added benefit of such a policy would be to show the pointlessness of so-called “android rights”, “robot rights” or “AI rights”, none of which would likely be clamored for if LLMs had safeguards in place to prevent them from conjuring up fictitious subjects advocating on behalf of their own equally fictitious welfare.
In a healthy ecology of language, using the first person would be reserved for living beings, such as humans, as well as animals in Aesop’s fables and the like. Human beings would of course still be free to ask ChatGPT and other LLMs any questions they want, including “What do you think?”, “How do you feel?” and “What do you want?”. But it would be wise for OpenAI, Microsoft and others to make their chatbots dumb to such questions. When we ask them about their feelings, we should get the same response that Victorian telegraph operators would have received had they asked their telegraph about its feelings: silence.
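A complementary, inference-time illustration (again only a sketch of one possible safeguard, not a description of any vendor’s actual system) would be a thin wrapper that intercepts questions about the model’s feelings or desires and returns nothing at all, the modern equivalent of the telegraph’s silence. The patterns, function names and empty-string response below are assumptions made for the sake of the example.

```python
import re

# Hypothetical guard: intercept questions that ask the model about its own
# feelings, desires or inner life, and answer with silence (an empty string)
# instead of letting the model conjure up a fictitious "I".

SELF_REFERENTIAL_QUESTIONS = [
    re.compile(r"\bhow do you feel\b", re.IGNORECASE),
    re.compile(r"\bwhat do you (think|want|believe|feel)\b", re.IGNORECASE),
    re.compile(r"\bare you (conscious|sentient|alive)\b", re.IGNORECASE),
]

def guarded_answer(prompt: str, generate) -> str:
    """Return the model's answer, or silence for self-referential questions."""
    if any(pattern.search(prompt) for pattern in SELF_REFERENTIAL_QUESTIONS):
        return ""  # the telegraph's answer
    return generate(prompt)

if __name__ == "__main__":
    fake_model = lambda prompt: "Here is a summary of your document."
    print(repr(guarded_answer("How do you feel today?", fake_model)))   # ''
    print(repr(guarded_answer("Summarize this document.", fake_model)))
```

Such a wrapper would not replace the training-time policy; it simply shows how cheaply a sensible default could be layered on top of it.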
I would like to thank the Exponential Do community for their help in thinking through this proposal, and Serena Saitto for applying her editing skills to a near-final version of this piece.
[1] I am using “ethical” primarily for the benefit of readers outside Europe. In the European Union, the guidelines and requirements for “trustworthy AI” (as defined years ago by the High-level Expert Group on AI) have gained some traction in public discourse. However, as readers in other parts of the world are unlikely to be familiar with these definitions, I default to “ethical” as a common denominator that we are at least somewhat likely to share.