Feb 1, 2023 · edited Feb 1, 2023 · Liked by Philosophy bear

As someone who works with language models, and has spent some time mucking about in their innards, I'd like to split the "world model" thing into subquestions:

1) Do stacks of encoder and/or decoder layers have a world model in them?

2) Do (a) the outputs vanilla GPTs generate naively, or (b) the outputs some GPTs are tuned to generate using RLHF or similar techniques, draw upon that world model in a consistent and reasonable way, or are they just kind of BSing their way through it in ways that seem humanlike to humans?

The answer to (1) is definitely yes; there are lots of things language models can do that they wouldn't be able to do if they didn't have a world model in them.

I think the answer to (2a) is mostly no; vanilla GPT-3's output lurches around conceptually in ways that are often bizarre and incoherent.

The answer to (2b) is unclear, but I lean toward yes. A system that has access to a world model and is rewarded for behaving as if it is drawing upon that world model is probably drawing upon that world model at least to some degree.

ChatGPT still has some serious issues using its world model, though - it's extremely prone to making things up that don't exist; it seems to really want to give affirmative-seeming answers to questions ("does Python library X provide functionality for Y?") even if it should be giving a negative answer. So it may be drawing upon the world model when it can, but BSing in certain cases because it wants to say "yes" even though its world model tells it "no".


Yeah. The answer to 2b is definitely "a bit of both".

What I can't believe is that 1 is still in debate. It's a solved issue!


As a counterpoint, you may find this article interesting to read: https://deoxyribose.github.io/No-Shortcuts-to-Knowledge/


"I'm a philosopher with an interest in language and AI. "

You don't seem to be on the LibDem network. Get on it, quickly:


Only elf-eared philosophers are reliable.


Gary Marcus et al.'s arguments aren't philosophical, they're empirical. They look at the model's output and point out examples demonstrating that your reason #1 just does not hold.

As for reasons #2 and #3, nobody disagrees philosophically (I'm sure some technical "no, the current size actually isn't large enough" arguments exist, but some finite size clearly should be enough). But something being possible in principle is not the same as it being true in practice, and the burden of proof that it is actually true in practice lies with the proponent. The problem is that there is no such proof, since the models are all black boxes. So we're stuck with examining their output, and it contains enough confidently spouted bullshit of the "[someone] died in 2018, so he couldn't have been alive in 2001" kind to suggest they're just reciting formulas, and that whenever they "pass" some test of understanding, it happens by chance (or by intentional overfitting; we know the models are highly curated) rather than through some general ability.


I do think that people are making the philosophical argument, and I definitely think Marcus has made it on occasion (see his "glorified spreadsheet" line), as have many of his acolytes.

I see the empirical argument as largely one for machine learning specialists and cognitive scientists, but I think team LLM is winning that at the moment. The argument against reeks too much of cherry-picking; sober analysis of the leaderboards etc. suggests that the range of questions that can trip up large language models is ever shrinking.


Is the ChatGPT response you quote part of the canned dialog/controller scripts, or actually generated? Its style seems similar to the other boilerplate text that the system returns when asked if it is sentient or about specific websites: these texts are hardcoded and better seen as part of the scaffolding, not as interesting outputs (unless we are interested in studying which narrative OpenAI is pushing).

If similar inputs, even when phrased differently, lead to largely identical output, then I would tend to classify these as boilerplate.
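For what it's worth, that heuristic can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the function name `looks_like_boilerplate` and the 0.9 similarity threshold are made up for the example, and it says nothing about how ChatGPT actually produces its responses. It just compares the responses you collected for several rephrasings of the same question, using the standard library's `difflib`.

```python
from difflib import SequenceMatcher
from itertools import combinations


def looks_like_boilerplate(responses, threshold=0.9):
    """Return True if every pair of responses is at least `threshold` similar.

    `responses` is a list of strings collected by asking the same question
    in differently phrased ways. The threshold is an arbitrary assumption.
    """
    if len(responses) < 2:
        return False  # nothing to compare against
    return all(
        SequenceMatcher(None, a, b).ratio() >= threshold
        for a, b in combinations(responses, 2)
    )


# Hypothetical usage: paste in the responses you got for each rephrasing.
collected = [
    "As a language model, I do not have feelings.",
    "As a language model, I do not have feelings.",
]
print(looks_like_boilerplate(collected))
```

A character-level similarity ratio is a crude measure; near-identical wording will score high, but a canned answer that is lightly paraphrased each time could slip under the threshold.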


Perhaps another way of reasoning about the underlying 'theory of world' is to imagine a situation where alien archaeologists come to an extinct Earth, armed with language models. They manage to identify vast stores of data which they conclude represent a large chunk of linguistic corpora and then run them through their highly advanced language models with big onboard supercomputers. These models have the disadvantage of having to sort through an alien coding structure, but this should not be an insurmountable problem in principle.

The archaeologists can then inquire through these models about the nature of this extinct Earth society, where they have no context to define for them weird things like the "United Nations" or "Canada". I would think that this archaeological language model generates a model of the world – as well as a model of language that predicts the data structures that they encounter.
