GPT, of any extant edition, can't properly see inside words or break them down into their individual characters.
I’m starting to see more “GPT can’t do X elementary thing, it’s stupid” posts. A lot of the time, people who say this are using GPT-3.5, which is considerably behind GPT-4, makes vastly more mistakes, and hallucinates far more often.
Sometimes, though, they are using GPT-4. There seem to be a lot of “GPT-4 is stupid” screenshots going around that share a particular feature: they require the machine to see the individual characters inside words, something it just can’t do.
This is because GPT-4, and to the best of my knowledge every extant edition of GPT, doesn’t see individual characters. Its inputs are blocks of characters called tokens.
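To make the idea of tokens concrete, here is a minimal sketch of cutting text into pieces, assuming a tiny, made-up vocabulary (`TOY_VOCAB`) and a simple greedy longest-match rule. Real GPT tokenizers use byte-pair encoding with far larger vocabularies, so their actual splits will differ; the point is only that the model is handed integer IDs, not letters.

```python
# Toy tokenizer: greedy longest-match against a tiny, made-up vocabulary.
# TOY_VOCAB is hypothetical; real GPT tokenizers use byte-pair encoding
# with vocabularies of tens of thousands of entries.
TOY_VOCAB = {"ruin": 0, "ing": 1, "cat": 2, "act": 3}


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Split text into the longest vocabulary pieces available, left to right.

    Characters not covered by any vocabulary entry become single-character
    pieces (a crude stand-in for the fallback real tokenizers have).
    """
    pieces = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocab or length == 1:
                pieces.append(candidate)
                i += length
                break
    return pieces


pieces = toy_tokenize("ruining", TOY_VOCAB)
print(pieces)                          # ['ruin', 'ing']
print([TOY_VOCAB[p] for p in pieces])  # [0, 1]  <- all the model "sees"
```

Strings the toy vocabulary doesn't cover, such as `"xq"`, fall back to one piece per character.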
Imagine a human who had been taught to read English using a system of hieroglyphics. They’d have read all the great English literature, but only in hieroglyph form, with different hieroglyphs representing whole words or parts of words (like “ruin” and “ing” in “ruining”). Now imagine asking this benighted fellow to count the C’s in “cacophony”: he just couldn’t do it.
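The hieroglyph analogy can be put in code. In this sketch the token IDs, their spellings (`SPELLINGS`), and the split of “cacophony” into “cac” + “ophony” are all made up for illustration. Counting C’s is trivial once each token is spelled out, but the ID list `[17, 42]` on its own contains no letters at all; the model has to have absorbed each token’s spelling indirectly.

```python
from collections import Counter

# Hypothetical "hieroglyph" table: token ID -> the letters it stands for.
# The model is never handed a table like this explicitly.
SPELLINGS = {17: "cac", 42: "ophony"}


def count_letter(token_ids: list[int], letter: str, spellings: dict[int, str]) -> int:
    """Count a letter in a token sequence by first spelling out each token."""
    return sum(Counter(spellings[t])[letter] for t in token_ids)


word_as_tokens = [17, 42]  # hypothetical split of "cacophony"
print(count_letter(word_as_tokens, "c", SPELLINGS))  # 2
```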
This is not to deny that GPT-4 makes stupid mistakes sometimes, but it’s rather unfair to hold these particular mistakes against it.
Yeah, interestingly, it can very easily count the letters in non-word strings like "abdsdts155ssdfs", because those get split into several separate tokens, but it can't pick apart the letters within a single token. Makes sense, I suppose.
On the contrary, I've had some success by prompting it properly. I'm not sure how to share the full conversation.
I gave it the following prompt:
Hi ChatGPT. Today we're going to be looking at the letters that make up words, counting them, and rearranging them. We'll also be looking at anagrams and other fun convolutions. For each word I give you, let's break down the word into the following categories. For example, let's consider the word `cat`. (If it helps at any point, feel free to write some python code to simulate these operations. Then tell me what the python code will output.)
For example, for `cat`, we'll first count each letter, outputting the letters in alphabetical order. I.e. Cat becomes
| Letter | Count |
| ------- | ------- |
| a | 1 |
| c | 1 |
| t | 1 |
If we sort the letters of `cat` in a string, it becomes `act`
If we look for available anagrams of `cat` we find `act` as well.
Can you give me another three-letter word and follow the above example, please?
And then I worked upwards to longer words. It didn't get all the counts right, but ... I wouldn't either. (Email me and I'll send you the logs; I've been working on a thorough investigation of these tools.)
It got `dog`, `carrot` and `quiddler` right, and made a good stab at the very long word.
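For checking answers like these, the operations the prompt describes are easy to compute directly. Below is a minimal sketch in Python (the language the prompt invites the model to use); the function names and the tiny `WORD_LIST` are illustrative assumptions, and a real anagram search would need a full dictionary file.

```python
from collections import Counter


def letter_table(word: str) -> list[tuple[str, int]]:
    """Each letter with its count, in alphabetical order (the prompt's table)."""
    return sorted(Counter(word).items())


def sorted_letters(word: str) -> str:
    """The word's letters sorted into a single string, e.g. 'cat' -> 'act'."""
    return "".join(sorted(word))


def anagrams(word: str, candidates: list[str]) -> list[str]:
    """Candidates (other than the word itself) that use exactly the same letters."""
    key = sorted_letters(word)
    return [c for c in candidates if c != word and sorted_letters(c) == key]


# Hypothetical stand-in for a real word list.
WORD_LIST = ["act", "cat", "god", "dog", "carrot"]

for w in ["cat", "dog", "carrot", "quiddler"]:
    print(w, letter_table(w), sorted_letters(w), anagrams(w, WORD_LIST))
```

Comparing sorted letter strings is the same anagram test Bing reaches for below; it only works as well as the candidate list it is given.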
Bing also goes for dog, gets it right. Carrot, right.
Bing's response is:
```
Wow, that’s a long word. 😮 Let’s see what we can do with it. To count each letter of floccinaucinihilipilification, we can use the same python code as before:
word = "floccinaucinihilipilification"
letter_counts = {}
for letter in word:
    if letter in letter_counts:
        letter_counts[letter] += 1
    else:
        letter_counts[letter] = 1
This will create a dictionary called letter_counts that stores the number of times each letter appears in the word. To output the letters in alphabetical order, we can use the sorted function on the dictionary keys:
for letter in sorted(letter_counts.keys()):
    print(letter, letter_counts[letter])
This will print:
Letter Count
a 2
c 4
f 1
h 1
i 9
l 2
n 4
o 2
p 1
t 1
u 2
To sort the letters of floccinaucinihilipilification in a string, we can use the sorted function on the word itself and then join the letters with an empty string:
sorted_word = "".join(sorted(word))
print(sorted_word)
This will print aaccffhihihiilliiinnnoopptuu.
To look for available anagrams of floccinaucinihilipilification, we can use a list of twenty-nine-letter words and check if they have the same letters as floccinaucinihilipilification when sorted. For example:
twenty_nine_letter_words = ["antidisestablishmentarianism", "electroencephalographically", "pseudopseudohypoparathyroidism", "spectrophotofluorometrically", "supertranscendentarianistical"]
anagrams = []
for word in twenty_nine_letter_words:
    if sorted(word) == sorted("floccinaucinihilipilification"):
        anagrams.append(word)
print(anagrams)
This will print [].
So, there are no anagrams of floccinaucinihilipilification. 😢
```
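Bing's numbers are easy to check mechanically. This sketch transcribes the counts quoted above into `bing_counts` and compares them with `collections.Counter` on the actual word, then prints the correctly sorted letters. Run as written, it should flag f, l, n and u as mismatches, and the sorted string comes out as `aaccccffhiiiiiiiiilllnnnooptu` rather than the one Bing printed.

```python
from collections import Counter

word = "floccinaucinihilipilification"

# Per-letter counts as reported in Bing's response above.
bing_counts = {"a": 2, "c": 4, "f": 1, "h": 1, "i": 9, "l": 2,
               "n": 4, "o": 2, "p": 1, "t": 1, "u": 2}
actual_counts = Counter(word)

for letter in sorted(set(bing_counts) | set(actual_counts)):
    reported = bing_counts.get(letter, 0)
    actual = actual_counts[letter]  # Counter returns 0 for absent letters
    flag = "" if reported == actual else "  <- mismatch"
    print(f"{letter}: reported {reported}, actual {actual}{flag}")

# The letters actually sorted into one string.
print("".join(sorted(word)))
```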
I'd quite like to correspond a bit more about the importance of chain-of-thought prompting. I've managed to get a "distinction" on an undergrad final essay banged out between meetings (as part of my faculty project investigating these tools; it was graded by the convenor with full discussion).