More on LLMs; some thoughts about purpose; and some Borges
"Nothing was “solved” when GPT3 was released."
Welcome to this irregular letter. I promise I won’t publish anything when I have nothing important to say.
Is text all there is to language? And, what about the “I”?
Oh, the arrogance of training a model on an Internet-sized repository of text and calling it a “Large Language Model”. Of course, text encodes language, computer code (a specialized language) is tapped out on keyboards, computer models are being used to write computer code, et cetera: that whole semantic field has its internal consistency. But language existed outside text long before text was written down (remember the oral transmission of the Homeric poems?), and humans still speak a much broader variety of languages than those we have codified in written form (although many unwritten languages become extinct every year). So maybe let’s call these things Large Text Models (LTMs), since text is only a subset of everything we consider language.

Humans also communicate through much more than (spoken or) written language: any public speaker can tell you that intonation, body language, visual cues and the like are as important as the words themselves in getting your message across, and are more vividly remembered (human memory, after all, is strongly associated with emotions). True, with GPT-4 we’re now at multimodal models, combining text with other kinds of information, such as images, video, audio, and other data. Machines are taking in multi-sensory inputs and processing them just as they processed text to begin with. Assisted by such models, humans will be able to produce… more text, images, and sounds. And you will recall my caveat from last week’s post: never mistake those outputs for something generated by emotions (“there is no there, there”).
Some observers have agreed with me when I say that it’s a bad idea to anthropomorphize LLMs, and that LLMs should be trained to avoid using the first person. Alondra Nelson and Suresh Venkatasubramanian have pointed out that even the typographic cues in ChatGPT are designed to make it look like there’s someone in there, which is deceptive:
![Image](https://substackcdn.com/image/fetch/w_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fpbs.substack.com%2Fmedia%2FFsUzpQ2aYAAMpPz.jpg)
Anne Drew Hu cautions against the use of the metaphor of “hallucination” to indicate that the output of a model does not match known reality:
After publishing last week’s post, I - gladly - learned that I’m not the first to propose training models so that they don’t use “I”. Professor Ben Shneiderman appears (wrong spelling below?) to be the person quoted in this tweet as having mooted the idea in a discussion group on human-centered AI. Professor - if you are reading this, let’s talk!
![A particular concern that I have is the way GPT-4 output uses first person pronouns, suggesting it is human, for example: “My apologies, but I won’t be able to help you with that request.” The simple alternative of “GPT-4 has been designed by Open AI so that it does not respond to requests like this one” would clarify responsibility and avoid the deceptive use of first person pronouns. In my world, machines are not an “I” and shouldn’t pretend to be human.](https://substackcdn.com/image/fetch/w_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fpbs.substack.com%2Fmedia%2FFrbuGkpacAAPc_c.png)
I also had a brief Twitter exchange with researcher Aviv Ovadya, who has been vocal about safeguards and governance (click on his tweet below for the full thread with our conversation):
Finally, five days after publishing the post in which I hoped that my suggestion, if implemented, could help prevent at least one person’s suicide, I read news reports of a suicide apparently linked to strong emotional involvement with a chatbot powered by a model trained by a company called Chai Research. I have reached out to the CEO and founder of Chai Research with my proposal.
The AI news cycle, in the meantime, has been spinning into overdrive with letters, appeals, moratorium proposals and the like. Many people are fighting the good fight, but it seems like every day a new front opens up in the debate about regulation, algorithmic transparency, inequality, human rights and the power of Big Tech. I’m not an AI researcher, activist or educator, and it’s exhausting just to think of how much research, activism and education are needed as… frankly… a baseline for people to have a frame of reference that is neither the blind optimism of the techno-solutionists nor the catastrophism of the doomsayers. Which brings me to my next point.
We should have a deeper conversation about the purpose of LLMs
Jonathan Godwin has written that "Nothing was “solved” when GPT3 was released, in the way that Go or Protein Folding was “solved”": in fact, there is no real evaluation metric or target for GPT3, nor for GPT3.5, GPT4, and so on. The Wright brothers wanted to fly, J.F. Kennedy wanted humans to set foot on the moon, and the Human Genome Project was set up to decode the human genome: all of these had a clear way to tell success from failure. Godwin also writes that "in 2019 OpenAI had something to prove. They were commonly viewed as a company without clear focus. Now the shoe is on the other foot, DeepMind (and Google) have to respond". But respond with what? Achieving what? What is the purpose of it all?

When all that LLMs (LTMs) produced were strings of text, it’s not as if they had been developed because anybody had decided that the world absolutely needed a cheap and quick way to generate even more pedantic essays, cookie-cutter fiction, or malevolent misinformation than we deal with today. Now that LLMs (multimodal LTMs) can easily churn out images, podcasts and videos,1 it’s not as if humanity has decided to put artists and illustrators out of business after scraping their art, or to have history rewritten by pranksters, malevolent agents or just LLMs let loose.2 That may simply happen as collateral damage, just as car accidents may occur if you decide to test your strength by throwing boulders from a highway overpass. Now, why do some people throw boulders from overpasses? Ask them and they’ll probably say they were bored and thought it would help pass the time.

Developers who build LLMs (LTMs) should articulate what they are after and what they hope to achieve. I have nothing in principle against people who are working on technologies to produce better batteries, cheaper desalinized water, stronger crops, better cancer drugs, or even more babies (for people who want to have them): but... more text? More images? Ever since Gutenberg and Photoshop, when has there ever been a scarcity of text or images? Where or when have we not drowned in text and images? Who ever argued that the cost of writing decent enough text was a constraint on reaching humanity's goals? "Amplifying human intelligence" is so ill-defined a purpose as to be, for all practical purposes, irrelevant. "Democratizing creation" is a fig leaf.3 Profit is a legitimate purpose: if that's it, just say it, and then go pursue it, with safeguards and with accountability.
Rereading Jorge Luis Borges
I believe Jorge Luis Borges, who wrote the short story "Tlön, Uqbar, Orbis Tertius" in 1940 and died in 1986, would have been on the side of his unnamed Uqbar heresiarch in evaluating my proposal about the use of first-person language:
Bioy Casares had had dinner with me that evening and we became lengthily engaged in a vast polemic concerning the composition of a novel in the first person, whose narrator would omit or disfigure the facts and indulge in various contradictions which would permit a few readers - very few readers - to perceive an atrocious or banal reality. From the remote depths of the corridor, the mirror spied upon us. We discovered (such a discovery is inevitable in the late hours of the night) that mirrors have something monstrous about them. Then Bioy Casares recalled that one of the heresiarchs of Uqbar had declared that mirrors and copulation are abominable, because they increase the number of men. I asked him the origin of this memorable observation and he answered that it was reproduced in The Anglo-American Cyclopaedia, in its article on Uqbar. The house (which we had rented furnished) had a set of this work. [...]
I have nothing, unlike the Uqbar heresiarchs, against the multiplication of human beings; but AIs that speak in the first person are worse than mirrors, in that they indeed "omit or disfigure facts" while presenting themselves as trustworthy, the way a human being should be. Borges was right. Microsoft, OpenAI and anybody else who provides conversational interfaces to LLMs should listen to him and train their models not to deceitfully "increase the number of men" by presenting themselves as human. Towards the end of the story, Tlön, one of the strange worlds in which Uqbar’s legends are set, seeps into our world and colonizes it. If we don’t hold developers accountable, we may well turn into Tlön, or into something else we do not know, because nobody was even held accountable for saying what purpose they were working towards.
Yes, we had deepfakes in 2017-18, but they will look quaint in comparison. All technologies and media platforms have been used, for example, for propaganda. But Leni Riefenstahl still had to go out and make some movies. Now, anybody with the right expertise in writing prompts can generate and distribute plenty of movies.
Journalist Maria Bustillos writes (well worth reading): “These published falsehoods have already polluted Google. It was a bit weird to realize, right then, that I am going to have to stop using Google for work, but it’s true. The breakneck deployment of half-baked AI, and its unthinking adoption by a load of credulous writers, means that Google—where, admittedly, I’ve found the quality of search results to be steadily deteriorating for years—is no longer a reliable starting point for research.”
As Gil Dibner recently wrote, “something that costs nothing to create is going to be worth nothing to consume. When our email inboxes are full of AI-generated emails, we just won’t read them. That has interesting social implications – but I’m not sure the business implications are that interesting. Spam is spam, and more spam is even more spammy.”