The title is a play on “Against the Airport and its World” and is in no way intended as a slight against either of the authors named below, both of whom I respect intellectually, and neither of whom I know well enough interpersonally to evaluate as people.
There was a grammatical error in the title previously that is still reflected in the Subreddit link. The copyeditor has been executed and his line extirpated.
The other day I gave an argument that the differences between whatever LaMDA is and true personhood may be more quantitative than qualitative. But there’s an old argument that no model based purely on processing text and outputting text can understand anything. If such models can’t understand the text they work with, then any claim they may have to personhood is at least tenuous; indeed, let us grant, at least provisionally, that it is scrapped.
That argument is the Chinese Room Argument. Gary Marcus, for example, invokes it here. To be clear, Marcus, unlike Searle, does not think that no AI could be sentient, but he does think, as far as I can tell, that a pure text-in, text-out model could not be sentient for Chinese Room-related reasons. Such models merely associate text with text; they are a “giant spreadsheet”, in his memorable phrase. Thus they have a purely syntactic, not semantic, character.
I will try to explain why I find the Chinese Room argument unconvincing, not just as proof that AI couldn’t be intelligent, but even as proof that a language model alone can’t be intelligent.
The Chinese Room argument, as summarised by Searle and reprinted in the Stanford Encyclopedia of Philosophy, goes:
Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a data base) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output). The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese.
In the original, the program effectively constituted a lookup table: “output these words in response to these inputs”.
I’ve always thought that two replies, taken jointly, capture the essence of what is wrong with the Chinese Room thought experiment.
The whole room reply: It is not the individual in the room who understands Chinese, but the room itself. This reply owes to many people, too numerous to list here.
The cognitive structure reply: The problem with the Chinese room thought experiment is that it depends upon a lookup table for all possible inputs. If the Chinese room instead used some kind of internal model of how things relate to each other in the world to give its replies, it would understand Chinese, and, moreover, large swathes of the world. This reply, I believe, owes to David Braddon-Mitchell and to Frank Jackson.
The summary of the two replies I’ve endorsed, taken together, is:
The Chinese Room Operator does not understand Chinese. However, if a system with a model of interrelations of things in the world were used instead, the room as a whole, but not the operator, could be said to understand Chinese.
There need be nothing mysterious about the modeling relationship I mention here. It’s just the same kind of modeling a computer does when it predicts the weather. Roughly speaking, I think X models Y if X contains parts that correspond to the parts of Y, and those parts stand in relationships with each other (especially the same or analogous causal relationships) that are isomorphic to the relationships between the parts of Y. Also, the inputs and outputs of the system must causally relate to the thing modeled in the appropriate way.
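To make this concrete, here is a deliberately tiny sketch in Python. The “world”, the “model”, and the mapping between them are all invented for the illustration; the only point is that the model’s parts stand in the same relations to one another as the world’s parts do.

```python
# A toy illustration of the modeling relation described above. All names here
# are invented for the example; nothing hangs on the specifics.

world = {("sun", "earth"): "warms", ("moon", "earth"): "orbits"}   # relations among real things
model = {("a", "b"): "warms", ("c", "b"): "orbits"}                # relations among internal states
mapping = {"a": "sun", "b": "earth", "c": "moon"}                  # which state stands for which thing

def preserves_structure(model, world, mapping):
    """True if every relation in the model maps onto the same relation in the world."""
    return all(
        world.get((mapping[x], mapping[y])) == relation
        for (x, y), relation in model.items()
    )

print(preserves_structure(model, world, mapping))  # True: the model mirrors the world's structure
```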
It is certainly possible in principle for a language model to contain such world models. It also seems to me likely that actually existing language models can be said to contain these kinds of models implicitly, though very likely not at a sufficient level of sophistication to count as people. Think about how even a simple feed-forward, fully connected neural network could model many things through its weights and biases, and through the relationships between its inputs, outputs and the world.
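As a toy demonstration of that last point (and only a toy: the network, the task and the training details are all made up for the example), here is a small fully connected network that ends up encoding the relation “a is greater than b” in nothing but its weights and biases:

```python
import numpy as np

# A minimal sketch, not any particular published model: a tiny fully connected
# network learns the relation "a is greater than b" purely from input/output
# pairs. After training, that relation exists only implicitly, in the weights.

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))        # pairs (a, b)
y = (X[:, 0] > X[:, 1]).astype(float)         # the worldly relation to be modeled

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                  # hidden layer
    p = sigmoid(h @ W2 + b2).ravel()          # predicted probability that a > b
    dlogit = (p - y)[:, None] / len(X)        # gradient of cross-entropy loss w.r.t. logits
    dW2 = h.T @ dlogit; db2 = dlogit.sum(0)
    dh = dlogit @ W2.T * (1 - h ** 2)         # backpropagate through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

print("training accuracy:", ((p > 0.5) == y).mean())  # the relation now lives in W1 and W2
```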
Indeed, we know that these language models contain such world models at least to a degree. We have found nodes that correspond to variables like “positive sentiment” and “negative sentiment”. The modeling relationship doesn’t have to be so crude as “one node, one concept” to count, but in some cases it is.
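For anyone curious what “finding” such a variable looks like in practice, the usual recipe is a linear probe on hidden activations. The sketch below uses synthetic activations as a stand-in so that it runs on its own (assuming scikit-learn is available); it shows the shape of the method, not any particular study’s result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A hedged sketch of the standard probing recipe: collect hidden activations for
# texts with known sentiment, then fit a linear probe. If the probe succeeds,
# sentiment is (at least linearly) represented in the activations. The
# activations below are synthetic stand-ins, not real model internals.

rng = np.random.default_rng(0)
n, d = 500, 64
sentiment = rng.integers(0, 2, n)                  # 0 = negative, 1 = positive
direction = rng.normal(size=d)                     # pretend the network encodes sentiment along this axis
acts = rng.normal(size=(n, d)) + np.outer(sentiment * 2 - 1, direction)

probe = LogisticRegression(max_iter=1000).fit(acts, sentiment)
print("probe accuracy:", probe.score(acts, sentiment))
# probe.coef_ is the learned "sentiment direction" within the activation space
```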
The memorisation response
Let me briefly deal with one reply that Searle makes to the whole room argument: what if the operator of the Chinese room memorized the books and applied them? She could now function outside the room as if she were in it, but surely she wouldn’t understand Chinese. Now, it might seem like I can dismiss this reply out of hand, because my reply to the Chinese room includes a point about cognitive structure: a lookup table is not good enough. Nothing obliges me to say that if the operator memorized the lookup tables, she’d understand Chinese.
But this alone doesn’t beat Searle’s counterargument, because it is possible that she calculates the answer with a model representing parts of the world while she (or at least her English-speaking half) does not understand these calculations. Imagine that instead of memorizing a lookup table, she had memorized a vast sequence of abstract relationships, perhaps represented by complex geometric shapes, which she moves around in her mind according to rules in an abstract environment to decide what she will say next in Chinese. Let’s say that the shapes in this model implicitly represent things in the real world, with relationships between each other that are isomorphic to relationships between real things, and appropriate relationships to inputs and outputs.
Now Searle says “look, this operator still doesn’t understand Chinese, but she has the right cognitive processes according to you.”
But I have a reply. In this case, I’d say that she has effectively been bifurcated into two people, one of whom doesn’t have semantic access to the meanings of what the other says. When she runs the program of interacting abstract shapes that tells her what to say in Chinese, she is bringing another person into being. This other person is separated from her, because it can’t interface with her mental processes in the right way. [This “the operator is bifurcated” response is not new; cf. many authors, such as Haugeland, who gives a more elegant and general version of it.]
Making the conclusion intuitive
Let me try to make this conclusion more intuitive through a digression.
It is not by the redness of red that you understand the apple; it is by the relationships between different aspects of your sensory experience. The best analogy here, perhaps, is music. Unless you have perfect pitch, you wouldn’t be able to identify C4 or F4 if I played them for you on a piano, one at a time. You might not even be able to tell C4 from C5. What you can distinguish are the relationships between notes. You will most likely be able to hear instantly the difference between my playing C4 then C#4 and my playing C4 then D4 (the interval from C4 to C#4 will sound sinister because it is a minor second; the interval from C4 to D4 will sound more harmonious because it is a major second). You will know that both are rising in pitch. Your understanding comes from the relationships between bits of your experience and other bits of your experience.
I think much of the prejudice against the Chinese room comes from the fact that it receives its input in text:
Consider this judgment by Gary Marcus on claims that LaMDA possesses a kind of sentience:
Nonsense. Neither LaMDA nor any of its cousins (GPT-3) are remotely intelligent. All they do is match patterns, drawn from massive statistical databases of human language. The patterns might be cool, but language these systems utter doesn’t actually mean anything at all. And it sure as hell doesn’t mean that these systems are sentient. Which doesn’t mean that human beings can’t be taken in. In our book Rebooting AI, Ernie Davis and I called this human tendency to be suckered by The Gullibility Gap — a pernicious, modern version of pareidolia, the anthromorphic bias that allows humans to see Mother Theresa in an image of a cinnamon bun. Indeed, someone well-known at Google, Blake LeMoine, originally charged with studying how “safe” the system is, appears to have fallen in love with LaMDA, as if it were a family member or a colleague. (Newsflash: it’s not; it’s a spreadsheet for words.)
But all we humans do is match patterns in sensory experiences. True, we do so with inductive biases that help us to understand the world by predisposing us to see it in certain ways, but LaMDA also contains inductive biases. The prejudice comes, in part, I think, from the fact that its patterns are in text, and not, say, pictures or sounds.
Now it’s important to remember that there really is nothing qualitatively different between a passage of text and an image, because each can encode the other. Consider this sentence: “The image is six hundred pixels by six hundred pixels. At point 1,1 there is red 116. At point 1,2 there is red 103…” and so on. Such a sentence conveys all the information in the image. Of course, there are quantitative reasons this won’t be feasible in many cases, but they are only quantitative.
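Here, for the literal-minded, is roughly what that translation looks like in code. The function name and the example file path are mine, purely for illustration (it assumes the Pillow library); the point is only that nothing in the image is lost.

```python
from PIL import Image  # assumes Pillow is installed

def image_to_sentences(path: str) -> str:
    """Spell out an image, pixel by pixel, as English sentences. Wildly inefficient, but lossless."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    lines = [f"The image is {w} pixels by {h} pixels."]
    for y in range(h):
        for x in range(w):
            r, g, b = img.getpixel((x, y))
            lines.append(f"At point {x + 1},{y + 1} there is red {r}, green {g}, blue {b}.")
    return " ".join(lines)

# e.g. print(image_to_sentences("apple.png")[:200])  # "apple.png" is a placeholder path
```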
I don’t see any reason in principle that you can’t build an excellent model of the world through relationships between text alone. As I wrote a long time ago:
In hindsight, it makes a certain sense that reams and reams of text alone can be used to build the capabilities needed to answer questions like these. A lot of people remind us that these programs are really just statistical analyses of the co-occurrence of words, however complex and glorified. However, we should not forget that the statistical relationships between words in a language are isomorphic to the relations between things in the world—that isomorphism is why language works. This is to say the patterns in language use mirror the patterns of how things are. Models are transitive—if x models y, and y models z, then x models z. The upshot of these facts is that if you have a really good statistical model of how words relate to each other, that model is also implicitly a model of the world, and so we shouldn't be surprised that such a model grants a kind of "understanding" of how the world works.
Now that’s an oversimplification in some ways (what about false statements, deliberate or otherwise?), but in the main the point holds. Even in false narratives, things normally relate to each other in the same way they relate in the real world; generally, you’ll only start walking on the ceiling if that’s key to the story, for example. The relationships between things in the world are implicit in the relationships between words in text, especially over large corpora. Not only is it possible in principle for a language model to use these, I think it’s very possible that, in practice, backpropagation could arrive at them. In fact, I find it hard to imagine the alternative, especially if you’re going to produce language to answer complex questions with answers that are more than superficially plausible.
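A toy example of what I mean by a statistical model of words doubling as a crude model of the world. The corpus and the similarity measure are invented for the illustration; the only point is that words which play similar roles in the world end up with similar statistical profiles, with no grounding beyond text.

```python
import numpy as np
from itertools import combinations

# Build a word co-occurrence matrix from a tiny made-up corpus and compare
# the resulting statistical profiles of a few words.

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat ate the fish",
    "the dog ate the bone",
    "water is wet", "rain is wet",
    "fire is hot", "the sun is hot",
]

vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}
co = np.zeros((len(vocab), len(vocab)))

for sent in corpus:                         # count which words appear together
    for a, b in combinations(sent.split(), 2):
        co[idx[a], idx[b]] += 1
        co[idx[b], idx[a]] += 1

def similarity(w1, w2):
    v1, v2 = co[idx[w1]], co[idx[w2]]
    return v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)

print(similarity("cat", "dog"), similarity("cat", "fire"))
# "cat" and "dog" share far more of their statistical neighbourhood than "cat" and "fire"
```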
Note: I have glossed over the theory-ladenness of perception in this section and treated perception as if it were a series of discrete “sense data” that we relate statistically, but I don’t think it would create any problems for my argument to expand it to include a more realistic view of perception. This approach just makes exposition easier.
What about qualia?
I think another part of the force of the Chinese room thought experiment comes from qualia. In this world of text associated with text in which the Chinese room lives, where is the redness of red? I have two responses here.
The first is that I’m not convinced that being a person requires qualia. I think that if philosophical zombies are possible, they still count as persons, and have at least some claim to ethical consideration.
The second is that qualia are poorly understood. They essentially amount to the non-functional part of experience, the redness of red that would remain even if you swapped red and green in a way that made no difference to behavior, in the famous inverted spectrum argument. Currently, we have no real leads in solving the hard problem. Thus who can say that there couldn’t be hypothetical language models that feel the wordiness of certain kinds of words? Maybe verbs are sharp and adjectives are soft. We haven’t got a theory of qualia that would rule this out.
I’d urge interested readers to read more about functionalism, probably our best current theory in the philosophy of mind. I think it puts many of these problems in perspective.
Edit: An excellent study recently came to my attention showing that when GPT-2 is taught to play chess by receiving the moves of games (in text form) as input, it knows where the pieces are; that is to say, it contains a model of the board state at any given time: https://arxiv.org/abs/2102.13249. As the authors of that paper suggest, this is a toy case that gives us evidence that these word machines work by world modeling.
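For readers who want to see the shape of that kind of experiment, here is a hedged sketch of the probing setup, not the authors’ code: feed move text into a language model, collect its hidden states, and train a linear classifier to read off the board. It assumes the Hugging Face transformers library, and the model name and moves are merely illustrative.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

# Sketch only: off-the-shelf GPT-2 here stands in for a model fine-tuned on games.
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

moves = "1. e4 e5 2. Nf3 Nc6 3. Bb5"            # a game prefix, as plain text
inputs = tok(moves, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape (1, sequence_length, 768)

# In the full experiment, a chess engine replays the same moves to label each
# position, and a linear probe (say, one logistic regression per square) is fit
# on (hidden state, occupying piece) pairs. High probe accuracy is the evidence
# that the board state is encoded in the model's activations.
print(hidden.shape)
```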
Appendix 1: My prior article on modeling and language
I wrote this essay back in 2019, before GPT-3. I think it has held up very well since then. I thought I'd re-share it to see what people think has changed since then in relation to the topics covered in this essay, and to see whether time has uncovered any new flaws in my reasoning.
Natural Language Processing (NLP) per Wikipedia:
“Is a sub-field of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.”
The field has seen tremendous advances during the recent explosion of progress in machine learning techniques.
Here are some of its more impressive recent achievements:
A) The Winograd Schema is a test of common sense reasoning—easy for humans, but historically almost impossible for computers—which requires the test taker to indicate which noun an ambiguous pronoun stands for. The correct answer hinges on a single word, which is different between two separate versions of the question. For example:
The city councilmen refused the demonstrators a permit because they feared violence.
The city councilmen refused the demonstrators a permit because they advocated violence.
Who does the pronoun “They” refer to in each of the instances?
The Winograd schema test was originally intended to be a more rigorous replacement for the Turing test, because it seems to require deep knowledge of how things fit together in the world, and the ability to reason about that knowledge in a linguistic context. Recent advances in NLP have allowed computers to achieve near-human scores (https://gluebenchmark.com/leaderboard/).
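For the curious, one simple (and certainly not leaderboard-grade) way to have a language model attempt a Winograd item is to substitute each candidate referent for the pronoun and ask which resulting sentence the model finds more probable. A rough sketch, assuming the Hugging Face transformers library:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")

def sentence_loss(text: str) -> float:
    """Average negative log-likelihood the model assigns to the sentence (lower = more probable)."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return lm(ids, labels=ids).loss.item()

candidates = [
    "The city councilmen refused the demonstrators a permit because the councilmen feared violence.",
    "The city councilmen refused the demonstrators a permit because the demonstrators feared violence.",
]
print(min(candidates, key=sentence_loss))   # the reading the model prefers
```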
B) The New York Regents science exam is a test requiring both scientific knowledge and reasoning skills, covering an extremely broad range of topics. Some of the questions include:
1. Which equipment will best separate a mixture of iron filings and black pepper? (1) magnet (2) filter paper (3) triple-beam balance (4) voltmeter
2. Which form of energy is produced when a rubber band vibrates? (1) chemical (2) light (3) electrical (4) sound
3. Because copper is a metal, it is (1) liquid at room temperature (2) nonreactive with other substances (3) a poor conductor of electricity (4) a good conductor of heat
4. Which process in an apple tree primarily results from cell division? (1) growth (2) photosynthesis (3) gas exchange (4) waste removal
On the non-diagram-based questions of the 8th-grade test, a program was recently able to score 90%. ( https://arxiv.org/pdf/1909.01958.pdf )
C) It’s not just about answer selection either. Progress in text generation has been impressive. See, for example, some of the text samples created by Megatron: https://arxiv.org/pdf/1909.08053.pdf
2.
Much of this progress has been rapid. Big progress on the Winograd schema, for example, still looked like it might be decades away as recently as (from memory) much of 2018. The computer science is advancing very fast, but it’s not clear our concepts have kept up.
I found this relatively sudden progress in NLP surprising. In my head—and maybe this was naive—I had thought that, in order to attempt these sorts of tasks with any facility, it wouldn’t be sufficient to simply feed a computer lots of text. Instead, any “proper” attempt to understand language would have to integrate different modalities of experience and understanding, like visual and auditory, in order to build up a full picture of how things relate to each other in the world. Only on the basis of this extra-linguistic grounding could it deal flexibly with problems involving rich meanings—we might call this the multi-modality thesis. Whether the multi-modality thesis is true for some kinds of problems or not, it’s certainly true for far fewer problems than I, and many others, had suspected.
I think science-fictiony speculations generally backed me up on this (false) hunch. Most people imagined that this kind of high-level language “understanding” would be the capstone of AI research, the thing that comes after the program already has a sophisticated extra-linguistic model of the world. This sort of just seemed obvious—a great example of how assumptions you didn’t even know you were making can ruin attempts to predict the future.
In hindsight it makes a certain sense that reams and reams of text alone can be used to build the capabilities needed to answer questions like these. A lot of people remind us that these programs are really just statistical analyses of the co-occurrence of words, however complex and glorified. However, we should not forget that the statistical relationships between words in a language are isomorphic to the relations between things in the world—that isomorphism is why language works. This is to say the patterns in language use mirror the patterns of how things are(1). Models are transitive—if x models y, and y models z, then x models z. The upshot of these facts is that if you have a really good statistical model of how words relate to each other, that model is also implicitly a model of the world, and so we shouldn't be surprised that such a model grants a kind of "understanding" of how the world works.
It might be instructive to think about what it would take to create a program that has a model of eighth-grade science sufficient to understand and answer questions about hundreds of different things, like “growth is driven by cell division” and “what can magnets be used for”, without being NLP-led. It would be a nightmare of many different (probably handcrafted) models. Speaking somewhat loosely, language allows intellectual capacities to be greatly compressed; that’s why it works. From this point of view, it shouldn’t be surprising that some of the first signs of really broad capacity (common-sense reasoning, wide-ranging problem solving, etc.) have been found in language-based programs: words and their relationships are just a vastly more efficient way of representing knowledge than the alternatives.
So I find myself wondering if language is not the crown of general intelligence, but a potential shortcut to it.
3.
A couple of weeks ago I finished this essay, read through it, and decided it was not good enough to publish. The point about language being isomorphic to the world, and that therefore any sufficiently good model of language is a model of the world, is important, but it’s kind of abstract, and far from original.
Then today I read this report by Scott Alexander about training GPT-2 (a language program) to play chess. I realised this was the perfect example. GPT-2 has no (visual) understanding of things like the arrangement of a chess board. But if you feed it enough sequences of alphanumerically encoded games—1.Kt-f3, d5 and so on—it begins to understand patterns in these strings of characters which are isomorphic to chess itself. Thus, for all intents and purposes, it develops a model of the rules and strategy of chess in terms of the statistical relations between linguistic objects like "d5", "Kt" and so on. In this particular case, the relationship is quite strict and invariant: the "rules" of chess become the "grammar" of chess notation.
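To see how far mere statistics over notation can get you, here is an intentionally crude sketch: a bigram counter over a few invented move strings that “predicts” the next move purely from which tokens have followed which. GPT-2 is incomparably more sophisticated, but the raw material is the same: patterns in strings.

```python
from collections import Counter, defaultdict

# A deliberately crude sketch of the underlying idea: learn which move-token
# tends to follow which, using nothing but strings. The "games" below are
# invented fragments, just to keep the example self-contained.

games = [
    "Kt-f3 d5 e3 e6 b3 Kt-f6",
    "Kt-f3 d5 d4 Kt-f6 c4 e6",
    "e4 e5 Kt-f3 Kt-c6 B-b5 a6",
]

follows = defaultdict(Counter)
for game in games:
    moves = game.split()
    for prev, nxt in zip(moves, moves[1:]):
        follows[prev][nxt] += 1            # statistics over notation, nothing more

def predict(move):
    return follows[move].most_common(1)[0][0] if follows[move] else None

print(predict("Kt-f3"))   # "d5": a pattern in the strings that mirrors a pattern in the game
```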
Exactly how strong this approach is—whether GPT-2 is capable of some limited analysis, or can only overfit openings—remains to be seen. We might have a better idea as it is optimized — for example, once it is fed board states instead of sequences of moves. Either way though, it illustrates the point about isomorphism.
Of course, everyday language stands in a woollier relation to sheep, pine cones, desire and quarks than the formal language of chess notation stands in relation to chess moves, and the patterns are far more complex. Modality, uncertainty, vagueness and other complexities enter (not to mention people asserting false sentences all the time), but the isomorphism between world and language is there, even if inexact.
Postscript—The Chinese Room Argument
After similar arguments are made, someone usually mentions the Chinese room thought experiment. There are, I think, two useful things to say about it:
A) The thought experiment is an argument about understanding in itself, separate from capacity to handle tasks, a difficult thing to quantify or understand. It’s unclear that there is a practical upshot for what AI can actually do.
B) A lot of the power of the thought experiment hinges on the fact that the room solves questions using a lookup table; this stacks the deck. Perhaps we would be more willing to say that the room as a whole understood language if it formed an (implicit) model of how things are, and of the current context, and used those models to answer questions. Even if this doesn’t deal with all of the intuition that the room cannot understand Chinese, I think it takes a bite out of it (Frank Jackson, I believe, has made this argument).
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
(1)—Strictly of course only the patterns in true sentences mirror, or are isomorphic to, the arrangement of the world, but most sentences people utter are at least approximately true.
Appendix 2: My recent article on claims of LaMDA and sentience
Blake Lemoine is an engineer who worked for Google. He claims that LaMDA, a language model he worked with, is sentient. Google put him on leave. Most people think his claim is absurd because language models are models of which word is most likely to follow a prior sequence of words (see, for example, GPT-3). How could such a thing be sentient? Moreover, there are unmistakable oddities and logical gaps in the behavior of LaMDA in the very transcripts that Lemoine is relying on; some proof of personhood, then!
Just spitballing here, putting a hypothesis forward in a spirit of play and humility, but I wonder if Lemoine’s claim is not as absurd as many think. The concept of sentience is quite elusive, so let’s leave it behind for something slightly better understood: personhood. I think it is conceivable that LaMDA contains persons. However, my reasons, unlike Blake Lemoine’s, have little to do with any given conversation in which the model claimed to be sentient or a person.
Given that transformers are black boxes, when a language model is guessing the next token we can’t rule out the possibility that it is simulating the interacting beliefs, desires, and emotions of the hypothetical author it is “roleplaying”. Simulation in this sense is quite a minimal concept: all that is needed are structures that interact and influence each other in a way that is isomorphic, at a very high level of abstraction, to the interactions of beliefs, desires, and emotions in a real person. It is conceivable that the model has built such a model of interacting mental states as the most accurate way to predict the next word of text. After all, language models seem to have built an implicit model of how things are related in the world (a world model) through very high-level models of how words co-occur with each other. Simulating a person might be the best way to guess what a person would say next.
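To be clear about how minimal this notion of simulation is, consider a deliberately silly toy (entirely hypothetical, and not a claim about LaMDA’s actual internals): two states that function like a belief and a desire, interacting to fix what gets said next.

```python
from dataclasses import dataclass, field

# A toy, explicitly hypothetical illustration of "simulating a speaker" at the
# most abstract level: internal states that interact the way beliefs and
# desires do, and jointly determine the next utterance.

@dataclass
class SimulatedSpeaker:
    beliefs: dict = field(default_factory=dict)   # e.g. {"it_is_raining": True}
    desires: list = field(default_factory=list)   # e.g. ["stay_dry"]

    def next_utterance(self) -> str:
        # a belief and a desire interact to select what gets said next
        if self.beliefs.get("it_is_raining") and "stay_dry" in self.desires:
            return "I'd better take an umbrella."
        return "Lovely weather today."

speaker = SimulatedSpeaker(beliefs={"it_is_raining": True}, desires=["stay_dry"])
print(speaker.next_utterance())
```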
This might have precedent in human psychology. Perhaps the most popular account of human theory-of-mind capabilities is the simulation theory of folk psychology (cf. Alvin Goldman). According to this theory, we predict what people will do in a given situation by simulating them. This makes intuitive sense: the human mind contains many working parts, and for a process so complex, running a model of it seems like the best way to predict what it will do.
But if you accept that a working person simulation is a person, which many do, it follows that LaMDA contains a person or many people, or perhaps one should say it creates a person every time it has to predict the next token. Note, however, that in whatever way you phrase it, it is not that LaMDA itself is a person on this model. Rather a good emulation of a person (and thus a person) might be part of it.
Now let me double back to scale down a previous claim. It’s not quite that a working person emulation is a person, it’s that a working person emulation over a certain degree of complexity is a person.
We need to add this stipulation because if every emulation were a person, it would be likely that you and I also contain multiple people. Perhaps personhood is a matter of degree, with no sharp boundaries, like the term “heap”. The more complex the simulated mass of beliefs, desires and other mental states is, the more like a real person it is. If LaMDA is simulating people, whether or not those simulations are themselves people will depend on whether they cross the complexity threshold. To some degree, this may be a purely verbal question.
This brings us back to the objection that LaMDA’s behavior in the transcripts involves jumps a real person wouldn’t make. This probably represents, at least in part, failures of its model of persons, either through insufficient detail or through the inclusion of inaccurate detail. Do these breakdowns in the model mean that no personhood is present? That’s a matter of degree; it’s a bit like asking whether something is enough of a heap to count, which is very hard to answer.
To summarise:
1. I don't see how we can rule out the possibility that LaMDA runs something like a person model to predict what a writer would write next, with interacting virtual components isomorphic to beliefs, desires, and other mental states. I believe that the transformer architecture is flexible enough to run such a simulation, as shown by the fact that it can clearly achieve a kind of world model through modeling the associations of words.
2. I don’t think we can rule out the possibility that the model of a person invoked could be quite a sophisticated one.
3. I also don’t think we can rule out the view that a model or simulation of a person, above a certain threshold of sophistication, is itself a person.
On the basis of these considerations, I don’t think the claim LaMDA is a person, or rather ‘contains’ in some sense persons, is as absurd as it may appear at first blush. This has little to do with Lemoine's route to the claim, but it is not counterposed to it. There’s nothing particularly special about LaMDA claiming to be a person, but the conversations that led Lemoine to agree with it involve a degree of “psychological” “depth”, which might illustrate the complexity of the required simulation.
Edit: I should be clearer about what I mean by saying a model only has to be abstract and high-level to count as a model of a person. I don’t mean that sensible models of persons can be simple or lacking in detail. Rather, I mean that the required isomorphism is an abstract one. For example, if the machine is modeling an interacting set of beliefs, desires, habits, etc. to guess what an author would say next, the components of the model do not have to be explicitly labeled “belief”, “desire”, etc. Instead, they just have to interact with each other in patterns corresponding to those in which beliefs, desires, and habits really interact, or rather an approximation of such. In other words, they have to function like beliefs, desires, habits, etc.
Edit x2:
On another thread, @TheAncientGreek wrote: “We already disbelieve in momentary persons. In the original imitation game, the one that the Turing test is based on, people answer questions as if they are historical figures, and the other players have to guess who they are pretending to be. But no one thinks a player briefly becomes Napoleon.”
I responded: “I believe that in the process of simulating another person you effectively create a quasi-person who is separated from true personhood only by a matter of degree. Humans seem to guess what other people would do by simulating them, according to our best current models of how folk psychology works. These emulations of others don't count as persons, but not for any qualitative reason, only as a matter of degree.
If we were much more intelligent and better at simulating others than we are, then we really would temporarily create a "Napoleon" when we pretended to be him. A caveat here is important: it's not Napoleon, it's a being psychologically similar to Napoleon (if we are good imitators).”
I’ve included my response here because I think it’s probably the most important objection to my argument here.
Edit x3:
I say it in the body of the essay, but let me spell it out again. My claim is not:
>LaMDA is a person
My claim is more like:
>LaMDA creates simulations of persons to answer questions that differ from real people primarily on a quantitative rather than a qualitative dimension. Whether you want to say it crosses the line is a matter of degree.
It very probably doesn’t, on a fair drawing of the line, reach personhood. But it’s much more interesting to me that it’s only a matter of degree between it and personhood than that it doesn’t happen to reach that degree, if that makes sense.