RANT: For a long time, the public understanding of LLMs has been hampered by the following phenomenon. Someone resolves to test what LLMs can do. Because they're skeptical of LLMs, both technologically and ethically, they don't have access to ChatGPT-4, which costs money. They ask ChatGPT-3.5 a question, and it fails miserably. They then post on Twitter: "We're supposed to believe this garbage is going to change the world?" Someone in the comments points out that ChatGPT-4 gets the right answer. No one notices. The cycle continues.
The bar for AGI is getting pretty high: being able to do anything any human can do. Nobody's that capable. The first example that comes to mind is languages: a serious polyglot might be fluent in five or ten, but there are ~7,000, not counting conlangs.
Btw, had to log in (I already have an account) to get anything other than 3.5.
Poe is better. Not usually with single prompts, but because you can use their bots and run multiple LLMs inside the same context and history, and it's been free.
Is there really that big a difference between 3.5 and 4.0? I'm currently using the FastGPT option in Kagi, which works pretty well. I'm not sure what the underlying LLM is.
One quantitative way of summing up the gap: GPT-3.5 scores about 70% on MMLU, while GPT-4o scores about 90%. On the ultra-hard GPQA, 3.5 scores 28.1% (not significantly different from chance) while 4o scores about 53%.
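The "not significantly different from chance" claim is easy to sanity-check. A rough sketch, assuming GPQA's main set has 448 four-option multiple-choice questions (so chance is 25%) and using a normal approximation to the binomial:

```python
import math

n = 448           # assumed number of GPQA questions
p0 = 0.25         # chance accuracy on four-option questions
observed = 0.281  # GPT-3.5's reported score

se = math.sqrt(p0 * (1 - p0) / n)      # standard error under the chance hypothesis
z = (observed - p0) / se               # z-statistic for the observed score
p_value = math.erfc(z / math.sqrt(2))  # two-sided p-value from the normal tail

print(f"z = {z:.2f}, p = {p_value:.2f}")
```

The z-statistic comes out around 1.5, with a two-sided p-value well above 0.05, so 28.1% is indeed consistent with guessing at random on a question set of that size.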
Here's an illustration of the gap. I asked GPT-3.5:
"Tell me, does an LLM have more or fewer reasons to be a philosophical skeptic than a human?"
It responded:
"The inclination toward philosophical skepticism isn't inherently tied to being human or non-human. Rather, it's about questioning assumptions, critically examining beliefs, and demanding evidence for claims. An LLM (Master of Laws) might engage in philosophical skepticism just as a human would, albeit with potentially different perspectives and motivations.
Humans may have certain cognitive biases or emotional attachments that influence their skepticism, whereas an LLM, being an artificial intelligence, might approach skepticism from a more purely rational standpoint, devoid of emotional biases. However, an LLM's skepticism could also stem from limitations in its programming, data availability, or understanding of human concepts.
In essence, the reasons for philosophical skepticism can vary widely among individuals, regardless of whether they are human or artificial intelligence. It depends more on the individual's mindset, experiences, and intellectual disposition rather than their specific nature as a human or an LLM."
I thought that was a pretty bad answer. ChatGPT-4o was, in my view, a fair bit better, though far from perfect. Partly this was simply because its answer was longer, but I do think it tackled the main points, at least in passing:
"The question of whether a large language model (LLM) like myself has more or fewer reasons to be a philosophical skeptic than a human involves examining the nature of skepticism and the foundational differences between humans and LLMs.
Understanding Philosophical Skepticism
Philosophical skepticism questions the possibility of certainty in knowledge. Skeptics argue that humans can never have absolute knowledge about the world due to limitations in perception, reasoning, and the potential for deception.
Reasons for Skepticism in Humans
Sensory Deception: Human senses can be deceived (e.g., optical illusions).
Cognitive Biases: Humans are prone to various cognitive biases that distort perception and reasoning.
Subjectivity: Personal experiences and emotions influence human understanding, leading to subjective interpretations.
Complexity of Reality: The complexity and potentially unknowable aspects of the universe make absolute knowledge difficult.
Philosophical Arguments: Thought experiments like Descartes' "evil demon" or the "brain in a vat" highlight the possibility of being fundamentally deceived about reality.
Reasons for Skepticism in an LLM
Data Dependence: An LLM's knowledge is entirely dependent on the data it has been trained on. If the data is incomplete or biased, the LLM's outputs will reflect those limitations.
Algorithmic Constraints: The algorithms governing LLMs have limitations and may not capture the full complexity of human experience or reality.
Lack of Sensory Experience: LLMs lack direct sensory experience, relying solely on text-based information. This absence of direct interaction with the world could be seen as a reason for skepticism.
Absence of Subjectivity: While this could be an advantage, the lack of personal experience and consciousness might make an LLM's "understanding" fundamentally different from human understanding, potentially more detached and abstract.
Comparative Analysis
Certainty and Data Quality: Humans often have direct, albeit fallible, sensory experiences, while LLMs rely on second-hand, pre-processed data. This makes LLMs dependent on the quality and comprehensiveness of their training data.
Biases and Errors: Humans are prone to cognitive biases, but LLMs are susceptible to biases in their training data and algorithmic processes.
Foundational Beliefs: Humans form beliefs based on a mix of sensory input, emotion, and reason, which can be inconsistent. LLMs generate responses based on probabilistic patterns in data, lacking true beliefs or convictions.
Existential Reflection: Humans engage in existential reflection and question their own existence and knowledge, which is a core aspect of philosophical skepticism. LLMs do not possess self-awareness or existential concerns; they operate based on their programming without subjective experience.
Conclusion
In conclusion, whether an LLM has more or fewer reasons to be a philosophical skeptic than a human depends on the perspective taken. LLMs have clear epistemological limitations due to their reliance on data and algorithms, which could be seen as grounds for skepticism about their "knowledge." However, they lack the subjective experiences and cognitive biases that contribute to human skepticism.
Humans, with their direct but fallible experiences and cognitive complexities, have rich grounds for skepticism but also possess the tools for self-reflection and philosophical inquiry. LLMs, on the other hand, operate without consciousness or personal experience, making their "skepticism" purely theoretical and algorithmic rather than a genuine philosophical stance."
It's better, but using multiple LLMs, bots/GPTs, and prompts together is better still.
There really is a big difference. If 3.5 is a mediocre high-schooler who didn't pay attention in class and is trying to finish the assignment the night before it's due by pulling random stuff off the internet, 4.0 is a reasonably intelligent undergraduate who has done most of the reading but sometimes needs to pay more attention to detail.
I recently crowd-sourced a survey of LLM responses to a simple question about historical linguistics, to which 3.5 gave a pretty terrible response — you can see the answers in this thread:
https://www.astralcodexten.com/p/open-thread-329/comment/56268387
[Edit: On the other hand, I asked ChatGPT-4o to give me a metaphor for the difference between ChatGPT-3.5 and ChatGPT-4.0, and the response was laughably inept:
"Comparing ChatGPT 3.5 to ChatGPT 4.0 is like upgrading from a reliable compact car to a sleek, high-performance sports car. Both will get you where you need to go, but the 4.0 version offers faster acceleration, smoother handling, and a more advanced set of features under the hood, making for an overall more exhilarating and refined driving experience."]
When I want to play at tripping up chatbots, I just go to https://chat.lmsys.org nowadays. (That way you can not only see what the state of the art can do; it also incentivizes you to calibrate your question difficulty to around the frontier of its capabilities.)
(But that's not how most of those conversations play out. Usually, someone in the comments points out that by rephrasing the question with enough background information and nudges, they can induce the chatbot to answer correctly, and the pro-LLM crowd leaves satisfied.)