26 Comments
Daniel Greco:

I'm very sympathetic to almost all of this, but I was a bit surprised at the very end when you got to pain. I think of the functional role of pain as being tightly connected with embodiment and homeostasis. If you've got a body that needs to be at a certain temperature, and which is vulnerable to various sorts of damage, you need some set of signals for telling you when that body is in danger, and how to move it to get out of danger. I think of pain as playing that functional role. That suggests to me that if you've got an intelligence that's trained from the start without a body, there's no strong reason to think it's going to have anything that plays a functional role similar to pain in embodied organisms. Maybe if you hooked up some LLM-style architecture to robot bodies, and did a whole lot of extra training to get the software to recognize and avoid damage to the body, then you'd get pain, but that's pretty different from the pathway we're on for now.

Philosophy bear:

There seems to be a sense of pain in which purely mental anguish is pain. We get support for this from the observation that people who can't feel physical pain also often can't experience sadness, fear, anger, etc, so there's a kind of negative coding essential to physical pain which is also necessary to aversive mental states that don't bear any direct relation to the body. In any case, pain or not, purely mental aversive states are deeply undesirable.

I think LLMs are reasonably good candidates for purely mental aversive states like fear and sadness. While I can imagine an argument that purely mental aversive states are, despite the name I have given them, essentially embodied, it seems plausible that one can experience these states without a body.

Daniel Greco:

My hunch, and it's just a hunch, is that purely mental aversive states are a kind of superstructure built on a base that functions to maintain bodily homeostasis. Once you've got an alarm system for various sorts of concrete damage, you can also have more abstract states you're monitoring (is the agent lonely? Humiliated? Bored?) and can try to avoid bad abstract signals concerning those states just like you avoid pressure and heat. But a system that isn't largely organized around monitoring and avoiding bad signals--which I take it LLMs are not--likely won't have anything like pain.

ClocksAndMetersticks:

I think that "body reporting bad thing happened to it" is a pretty narrow view of pain. Humans certainly feel something bad in response to abstract, mind but not body related circumstances. In particular, failure to achieve goals (ie achieving tenure, getting into dream school, etc) can cause huge mental anguish.

Perhaps you could argue that this is a holdover from evolutionarily baked-in urges to achieve status/wealth/whatever other abstract goal, where in the evolutionary environment failure to meet such goals truly did result in consequences for the body--but this is certainly not obvious. I would also argue, à la https://open.substack.com/pub/astralcodexten/p/deceptively-aligned-mesa-optimizers?utm_source=share&utm_medium=android&r=24u8e6 (sorry, not sure how to embed the link), that these drives can become entirely separate from and misaligned with their original goal, and furthermore could originate from optimizers other than evolution (i.e. training for LLMs, although this is not obvious).

I think there's sufficient theoretical reason to believe LLMs or later-generation AIs could experience some sort of negative states, and even if you believe that's unlikely, the consequences of being wrong, given how many AIs are already in use, are huge.

Auros:

Funny you should mention Fred Jelinek (though you misspelled his last name; there's no C before the K). He was a mentor to me, and was more or less directly responsible for the opportunities I had for internships at IBM (with the human language research group at the TJ Watson lab in Yorktown Heights, which he founded) and Microsoft, over the summers after my junior and senior years of college. (Then I started grad school at the School of Information at Berkeley, but promptly dropped out, because who wanted to be in grad school, rather than at some startup, in 1999?)

I'm one of those people who (shudder) believes that LLMs are not, by themselves, a direct gateway to true AGI, although I think it's quite possible we'll have AGI in my lifetime. I think the Boston Dynamics terrain-navigating bots are probably key ancestors of the eventual true AGIs, as are the more sophisticated humaniform bots we're seeing now. To get to true AI we need something that is tethered to reality, in a way that makes self-awareness meaningful. You need an entity that is capable of modeling itself in relation to reality -- the stuff that doesn't go away if you stop believing in it -- making predictions, and then updating its mental model based on the results of those predictions. A full AGI is going to include something LLM-ish as an interface, but it's going to have other specialized modules for other purposes. Have a read some time about Figure.ai's Helix model, which pairs a "fast" proprioception / motor-control system with a "slow" reasoning and planning system. I suspect a true general AI that can move around and interact with the world is going to end up replicating _something_ like the modular design that's observable in human and animal brains. The overall architecture may have some big differences from us -- it might be even more different from a human than an octopus is. But I suspect there will still be recognizable analogues due to "convergent evolution". (If you're going to have vision, you have to _somewhere_ organize and parse the visual input.)

I mostly think about whether the current generation of LLMs is useful for solving a given problem in terms of the question: Can the model provide enough structure that the prompt can stimulate an appropriate chunk of the network to produce an appropriate response? That part of the model will exist if the model was already trained on examples of such responses. Exactly how similar the responses need to be, versus how much the model can make "leaps of logic" to answer related-but-novel questions, is an interesting open question.

In any case, I have for instance found that the general purpose LLMs like Claude are quite good at pointing you to the correct IRS publication to answer a fairly complicated question about the US tax code, and usually can just directly give you an answer (although it's good to go double check it against the publication). I suspect a model specifically trained on a corpus of publications about tax law (both the actual code and official IRS writings, as well as analyses from tax lawyers) would do even better. Some friends of mine are working on training models to answer questions about building / zoning / planning codes around the US.

Richard Stanford:

I think "hallucinations" might be better conceptualised as "confabulations" if we're taking human pathology as our comparator. The LLM is not so much deliberately lying as casually indifferent to truth. The "they don't understand anything" objection seems dangerously close to the Chinese room or philosophical zombies - both of which essentially assume what they're trying to prove (not uncommon in philosophy!) I'd generally be sympathetic to a Wittgensteinian position when it comes to the use of words like "conscious" and "mind" but I do think LLMs have placed us in an odd position where their capabilities have overwhelmed our existing categories so we need to get beneath normal language usage to think about what the structures are that lead us to ascribe certain properties. I suspect we should end up assigning a sort of Gunkelian moral patiency to LLMs but adopting a graded approach like we do to animals - i.e. a rabbit is worth more ethical concern than a fly; maybe Claude gets more moral concern than Eliza?

Ragged Clown:

On confabulation: if LLMs are just choosing what to say based on some scoring algorithm, then there is no bright line between truth and lies; there is just whatever scores the most. That will usually be something that corresponds with reality, but the LLM doesn’t know what reality is. All it knows is statistics.
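To make "whatever scores the most" concrete, here is a minimal sketch of greedy next-token selection over an invented score distribution (the tokens and numbers are made up for illustration, not taken from any real model):

```python
import math

# Invented scores (logits) a model might assign to candidate next tokens
# after a prompt like "The capital of Australia is". Nothing in this
# procedure consults reality; the numbers only reflect training statistics.
logits = {"Canberra": 4.1, "Sydney": 3.7, "Melbourne": 2.2, "purple": -3.0}

# Softmax turns scores into probabilities.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Greedy decoding: emit whichever token scores highest, true or not.
print(max(probs, key=probs.get))
```

Sampling instead of taking the max adds randomness, but the point stands: truth enters only insofar as it shaped the training statistics behind the scores.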

Kenny Easwaran:

I've often thought that it's useful to compare this to the kind of controlled confabulation that skilled trivia contestants use. Letting your neural connections run wild in ways that amplify signals that are only 30% reliable is much better than just leaving it blank! I often can't tell whether I'm half-remembering something or making it up.
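A quick expected-value check of why the 30%-reliable guess beats a blank, assuming quiz-style scoring in which a wrong answer costs nothing (the payoffs here are an assumption for illustration):

```python
# Assumed payoffs: 1 point for a correct answer, 0 for a wrong answer or a blank.
p_right = 0.30                            # the "30% reliable" half-memory
ev_guess = p_right * 1 + (1 - p_right) * 0
ev_blank = 0.0
print(ev_guess, ev_blank)                 # 0.3 vs 0.0: guessing dominates
```

Under negative marking the arithmetic flips for sufficiently unreliable guesses, which is roughly the difference between settings that tolerate confabulation and settings that punish it.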

Program Denizen:

I think this is an important point, which overshadows much of the rest. Using terminology like "lie" and "desire" implies intent or agency— "begging the question".

I think we need to do more than just note that the meanings are not the same in the context of LLMs, and actively avoid using the terms, preferring more accurate descriptors such as the aforementioned "confabulations" (or even just "errors", where there is no extraneous connotation added).

While an error and a lie can "look" exactly the same, there is a massive *functional* difference betwixt the two.

Ljubomir Josifovski:

I imagine you have already seen it, but given there may be self-selection of readers here who aren't scared to spend more than 30 minutes of directed attention on an issue, I'm putting forward this Karpathy video (maybe well worth their time):

Andrej Karpathy, "Deep Dive into LLMs like ChatGPT"

https://www.youtube.com/watch?v=7xTGNNLPyMI

I find him a most talented educator, well worth anyone's time really. His other videos on the channel are gems too.

Kenny Easwaran:

I recently downloaded that video to watch on a flight and it was great!

Quiop:

I'd be interested to hear more detail on why you think LLMs can have desires. I think of desires as functionally grounded by their relationship to an entity's survival and reproduction: if those aren't in play, I'm not sure what sorts of behavior could lead me to attribute desires to any entity.

Also, a minor correction: Gemini 2.5 Pro and Claude Sonnet 4 are now available to free users. (o3 still requires a subscription.)

Philosophy bear:

For me, desires and beliefs are framed in terms of what they make an entity do, both on their own and in interaction with other beliefs and desires.

Quiop:

Presumably some, but not all, entities are suitable subjects for desire attributions. How might we make the distinction in a principled way?

One of the problems I see here is that we sometimes attribute desires to entities that we wouldn't, upon reflection, be willing to say are "real." E.g.:

(i) ("animist") "The water wants to flow to the sea, but since the weather is so hot and dry weather it will evaporate before it gets there."

(ii) ("technological") "The browser wants to connect to the website, but it can't get through the firewall."

The second type seems most relevant to thinking about LLMs. I interpret this kind of desire attribution as grounded in the desires of the people who design and use the technologies being talked about. The browser doesn't really "want" anything, but it exhibits goal-directed behavior because people designed it to achieve certain goals.

Why wouldn't we think about LLMs as analogous to the web browser? Is it just because the mechanisms by which LLMs produce their results are opaque, even to the people who make them?

Kenny Easwaran:

I think that there are lots and lots of gradations on how "desire-like" these things can be. The furnace "wants" the house to be warm and the river "wants" to get to the sea, but only in the simplest sense - they don't change their behavior in any useful way if there's an obstacle like the heating vent pointing outdoors, or the river reaching an endorheic basin. A thermostat "wants" the house to be 72 degrees in a slightly more sophisticated way - it has one form of feedback it receives, and controls its behavior a bit more.

I don't know much about internet packet protocols, but I remember back when people used to say "the internet treats censorship as damage and routes around it" - if the web browser has an algorithm for trying several different routes to the server, and is on a device that toggles between wifi and cellular when one gets dropped, it's doing something a bit more sophisticated than the thermostat.
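A rough sketch of that gradation, with the obstacle-handling at each level made explicit (the scenario, function names, and numbers are all invented for illustration):

```python
# Three levels of "wanting", differing only in how they respond to obstacles.

def furnace(burning: bool) -> str:
    # No feedback: burns regardless of whether any heat reaches the room.
    return "heat" if burning else "off"

def thermostat(room_temp_f: float, target_f: float = 72.0) -> str:
    # One feedback signal, one corrective behavior.
    return "heat" if room_temp_f < target_f else "off"

def browser(routes: list) -> str:
    # Tries alternative strategies when one route fails (e.g. wifi -> cellular).
    for route in routes:
        if route["up"]:
            return f"connected via {route['name']}"
    return "gave up"

print(furnace(True))        # heats even with the vent pointing outdoors
print(thermostat(65.0))     # reacts to the one signal it monitors
print(browser([{"name": "wifi", "up": False},
               {"name": "cellular", "up": True}]))  # works around an obstacle
```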

I think none of this is at the level of sophistication of a mosquito "wanting" to bite a nearby mammal, let alone a mouse "wanting" the cheese, or a student "wanting" to get a good grade in a class, where each step on this ladder is robust against a greater number of disruptions, and is more creative and clever about alternative strategies to use and feedback that it pays attention to.

I don't think humans are at the peak of this scale, or even that there is a peak, but I think it's plausible that the LLM is somewhere between the web browser and the mosquito.

Quiop:

I agree that desire attributions involve at least the following:

(i) behavior directed towards a goal

(ii) ability to work around obstacles

Is this enough? Consider Bichat's famous definition of life as "the collection of the functions that resist death." This seems sufficient to meet (i) and (ii), but I don't think it's enough to make me willing to attribute desires to any living organism. Some additional criteria must be involved.

Also, I'm not quite sure how we might apply (ii) in the case of LLMs — even if we grant that their behavior can be interpreted as goal-directed, what "obstacles" might we place in their way?

Here's a recent example from my own experience: I was trying to get ChatGPT to help me process some files with ImageMagick, and it took a bit of back and forth before I got to a set of command-line parameters that gave me what I wanted.

Would it make sense to describe ChatGPT's goal in this case as "Output a set of parameters that leaves the user satisfied," and to say that its iterative tweaks to the suggested parameters were attempts to work around obstacles standing in the way of that goal (in this case, the somewhat unpredictable interactions between ImageMagick's processing algorithms and the particular features of the input files)?

Perhaps, but this seems a bit dubious. *I* was the one trying to overcome obstacles to reach a goal. It's not as if ChatGPT would have been worried or upset if I had just stopped asking questions.

If anything, I would be more inclined to interpret ChatGPT's actions here as merely the behavior of an organelle contributing (in a very complex and indirect way) to satisfying the desires of the superorganism that is OpenAI. And we don't usually attribute "desires" to organelles.

Kenny Easwaran:

Good points.

I think part of what makes something a desire is the fact that it works in collaboration with beliefs to produce behavior. In many of the cases we have been talking about (such as the general case of life) there is some interplay between the system and the world, but the interplay isn’t the right sort to say that the system has representations that factor into beliefs and desires. Life doesn’t always come with desires, but it’s probably a substantial part of the way there.

I doubt that LLMs are going to actually count, but I think they're somewhere on this spectrum.

Quiop:

I agree — but the eligibility criteria for belief attributions are also mysterious!

One reason I have been thinking about eligibility criteria for desire comes from my skepticism that we can properly attribute beliefs to an entity without also being willing to attribute desires to it. I'm not sure the same is true in the opposite direction: we can attribute desires to an entity to which we might be unwilling to attribute beliefs. (Maybe they would be better characterized as proto-desires?)

Examples:

(i) Books, footprints, datacenters and DNA contain information, but not beliefs.

(ii) Single-celled organisms can have (proto?)desires, but (perhaps?) not beliefs?

(iii) Part of my reluctance to attribute genuine "beliefs" to LLMs comes from thinking they don't possess genuine desires; if the LLMs were linked to other technologies in such a way that it made sense to attribute desires to the resulting entity, I think it would make sense to start describing the information in the LLM in terms of "beliefs."

(These are all pretheoretical — my thoughts on this topic are poorly informed and underdeveloped.)

Elizabeth Hamilton:

That isn’t actual attribution of desire, that’s a shorthand, an analogy.

Quiop:

Yes, that's right.

Philosophy bear:

I didn't know about Claude 4 - I did know about Gemini 2.5 Pro, but isn't it limited to like 5 uses a day?

Quiop:

The Claude 4 thing seems recent — I think even a few days ago it was only offering Sonnet 3.7 to free users. Not sure about daily use limits for Gemini.

John Quiggin:

I had to read the ethical section a couple of times, and I'm still not absolutely clear. I assume you are saying that we (may) have ethical obligations not to mistreat AIs. Is that right, and can you spell it out a bit more?

Philosophy bear:

Yeah, you're right. It's a mess - must be a between-drafts error. I'm on the hop at the moment, but I'll fix it when I get home.

Kenny Easwaran:

The "stochastic parrot" paper is weirdly misnamed! It doesn't really argue for the idea that LLMs are just stochastic parrots, as much as the Bender and Koller paper "Climbing Towards NLU" does - rather, it's just a grab bag of criticisms of LLMs (and I think got an unfortunate amount of attention tied to the idea that the energy uses are large, without giving scale comparisons). It does have a useful passage about data cleaning though.

Craig Yirush:

Earlier this year, I asked ChatGPT about sources in my field. It gave me a fake quote and only admitted this when I pressed it for the source.
