Two commentators on my blog, both of whom I respect greatly, but whom I will not name because I haven't cleared it with them, were having a debate about whether generative AI can be truly creative.
Two very relevant ideas in previous work:
Margaret Boden distinguishes historical creativity (it's new to everyone) from personal creativity (it's new to the person creating it, but others have previously done the same). Both could be equally brilliant, equally good signs of the person's ability. The difference between them is contingent.
Douglas Hofstadter talks a lot about "jumping out of the system". JOOTS is more than just recombination of existing elements. It's a kind of refusal to play the existing game at all. If you ask an AI to create a piece of poetry, it will create a piece of poetry. If you ask a human poet to create a piece of poetry, they might say that, actually, this situation calls for a piece of music instead, and give you that.
Creativity can sometimes be monetized, so how about this challenge to decide whether the LLM is exhibiting true creativity?
- The LLM has to submit a patent application for a technical innovation, and the patent has to be approved by the US Patent and Trademark Office (or similar organization).
- The patent rights are sold for $50,000 or more.
(I know nothing about patent law or how patent rights are traded. Maybe someone else could figure out the details of how to operationalize the challenge. Of course, once LLMs start generating valuable patents without need for human input, we can expect patent law to start evolving quite rapidly, with unpredictable outcomes.)
As best I understand it, the patent office pretty much rubber-stamps anything that meets their formal requirements, and the real action doesn't happen until someone files a lawsuit.
Incidentally, what you're describing seems like an extraordinarily stringent criterion, given that very few humans ever produce a patent that sells for >=$50K. You're basically asking the model to be not just creative, but a creative genius.
I chose that figure because I understand the relevant context to be concerns that human creativity will become essentially obsolete. (Also because Bear's original challenge was "name something that LLMs will never be able to do.") But I'd also be interested to see what odds someone better-informed than myself would put on LLMs being able to come up with patents at any level of commercial value.
For me, creativity is heavily coupled with consciousness and intent. If one cannot ramble on about why the curtains are blue, it’s not worth calling that piece creative. Innovative, perhaps, sure, but not creative. Because there’s only ever one answer to that question, and it’s that the LLM figured blue would be the best token preceding curtains; no room to tease apart the author’s mind-state, or even invoke death of the author satisfactorily.
Maybe creativity is not the right word for it, but I need a connection with the human or otherwise conscious author behind the work to say that it is creative, even if I know nothing about said author. All I need to know is that the qualities I attribute to the work and the judgements I make of the intentions behind it are due to a “soul”, as it were, and not just a mathematical optimiser.
In that sense, perhaps I’m putting them at an unfair disadvantage, because no LLM today displays a reasonable, long-lasting consciousness for which one could even propose such a creativity test.
> In other words, if you take five words an LLM has produced in a row, those five words have probably never been written before anywhere we can find on the internet.
I think this is wrong? If you take a list of all English words and pick 5 at random, then that sequence probably hasn't been written before, sure. But when you add the restriction of being a grammatical sentence and being a response to some practical request from the user, that doesn't seem true. e.g. look at this comment and try to find a set of 5 words that has never been said in that order elsewhere; I'm not sure one exists.
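For what it's worth, the claim is easy to test mechanically. Below is a minimal sketch, assuming you have some reference text to check against; the function names (`five_grams`, `novel_five_grams`) and the toy corpus are mine, and a real test of "anywhere on the internet" would need a web-scale exact-phrase index rather than a local string.

```python
import re

def five_grams(text):
    """Yield every run of 5 consecutive words, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    for i in range(len(words) - 4):
        yield " ".join(words[i:i + 5])

def novel_five_grams(candidate, corpus_text):
    """Return the candidate's 5-grams that never appear in the reference corpus."""
    seen = set(five_grams(corpus_text))
    return [g for g in five_grams(candidate) if g not in seen]

# Toy example; a real check would use a much larger corpus or a search index.
corpus = "the curtains are blue because the author liked blue curtains"
comment = "try to find a set of five words never said in that order"
print(novel_five_grams(comment, corpus))
```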
I don't know if this qualifies as "creativity", but it is certainly a superiority: LLMs are able to exceed all humans (that I have ever encountered, which is lots) on a certain class of very tricky logical/epistemological questions. Humans always fail due to intuition, cultural norms, misunderstanding of the capabilities of science, etc. There is some variation, but there is a nice big fat normal distribution of error types that the human will fall within when presented with certain propositions (aka: *prompts*). To me, this is highly suggestive of a similarity between human cognition and LLM behavior.
I think that what people are grasping at when talking about the unique human touch in artistic pursuits is that mysterious thing called intuition. Intuition might be too qualitative and unpredictable for the quantitative strategies of language models. Language models are trained on previous works, and have a degree of randomness for exploration. Human creativity is trained on previous works, and has a degree of randomness honed by millions of years of evolution in ways that probably cannot be fully recreated by a machine learning algorithm.
> I think that what people are grasping at when talking about the unique human touch in artistic pursuits is that mysterious thing called intuition.
Intuition is also what implements the "are" in "LLMs are not creative, cannot reason, are not conscious", etc.
This essay is an admirable deep dive into the nature of creativity and what it means in the world of LLMs. As a novelist, I find that the one aspect of human creativity missing from the essay is the role of metaphor in storytelling.
The best of our writers have used metaphors as a literary springboard to expand creative ideas. At what point will an LLM understand that bringing an original perspective requires the use of the right metaphor? An LLM has a large database from which to draw answers to questions, but can it compress that data into a new and novel metaphor, one that is memorable, accurate and original?
Metaphors carry the heavy weight of the heart, of politics, and of romance, and they define a culture.
"Animal Farm"
"Thoughtcrime"
"Newspeak"
"The memory hole"
To spell out my point about the limits of LLMs: there's a trivial (monkeys reproducing Hamlet) sense in which, merely by combining existing text strings, LLMs can come up with something new and striking. As you say, all new and striking text is produced that way.
The problem is that, AFAICT, LLMs have no way of assessing that a string under consideration as a response to a query is new and striking, as a creative writer would do.
Maybe I'm wrong about this, but the argument so far hasn't convinced me.
> The problem is that, AFAICT, LLMs have no way of assessing that a string under consideration as a response to a query is new and striking, as a creative writer would do.
The creative writer is guessing, describing their model of reality (mistaking it for reality itself), etc. LLMs can also do this, *and more*: they can "realize" and admit it when caught hallucinating.
While it's true that a language model run on its own won't literally look at its proposed output and judge whether it is new and striking, I think that's less of a limitation than it seems:
1) A sufficiently large model, when asked to produce something "creative", probably runs at least a crude simulation of "look at the output I'm about to produce and judge whether it is new and striking" within its forward activations.
2) It's trivial to wrap a language model in a framework that provides it with a virtual scratchpad such that it literally judges several proposed outputs and chooses the most "new and striking" one to send to the user (see the sketch after this list).
3) With some additional effort and resources, it's possible to fine-tune a model on the output of the framework described in (2), which would increase the chance that (1) happens.
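To make (2) concrete, here's a minimal sketch of such a wrapper. The `generate()` function is a placeholder standing in for whatever LLM API you use, and the scoring prompt and 0-10 scale are arbitrary choices of mine, not an established recipe.

```python
import re

def generate(prompt: str) -> str:
    # Placeholder: call your LLM of choice here (API client, local model, etc.).
    raise NotImplementedError

def score_novelty(candidate: str) -> float:
    """Ask the model to judge a draft on a 0-10 'new and striking' scale."""
    reply = generate(
        "On a scale of 0 to 10, how new and striking is the following idea? "
        "Answer with a single number.\n\n" + candidate
    )
    match = re.search(r"\d+(\.\d+)?", reply)
    return float(match.group()) if match else 0.0

def best_of_n(prompt: str, n: int = 5) -> str:
    """Draft n candidate responses on a scratchpad, return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score_novelty)
```

The point is only that the "judge your own draft" step can live outside the model in a thin wrapper just as easily as (per point 1) it might happen implicitly inside it.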
> While it's true that a language model run on its own won't literally look at its proposed output and judge whether it is new and striking
Similarly, my waiter won't bring me food if I don't order it!
You:
I want to try an experiment - I will give you two steps, execute them in order, in step 2 you will consider your output of step 1.
1. Generate a new and "striking" idea.
2. Consider whether your proclaimed "new and striking" idea is actually (necessarily) new and striking.
ChatGPT:
Idea Generation: A decentralized, blockchain-based platform for collaborative scientific research, where researchers can share data, methodologies, and results in real-time, and contributions are recorded and rewarded through a transparent, peer-reviewed token system.
Consideration: While the idea of a decentralized platform for collaborative research is not entirely new, as there are existing platforms that facilitate collaboration and data sharing, the integration of blockchain technology for real-time data sharing, transparent peer review, and a token-based reward system adds a novel and potentially striking dimension to the concept. The use of blockchain could enhance trust, security, and incentivization in the scientific research community. However, whether this idea is truly striking would depend on its implementation, acceptance, and impact within the scientific community.
You:
Is that all it would depend on? List some other things it could depend on.
…and it continues on to vastly outperform standard human behavior (which is approximately equal to human *capability*, i.e.: "If you 'can' (allegedly) do something in theory, but cannot do it in fact (cannot *demonstrate* the ability), can you *actually* do it?").
If ChatGPT is as successful as blockchain, creative writers don't have too much to worry about for the next 15 years or more.
Agree, but this seems like a very long bet.
The costs and benefits of LLMs are unlikely to be distributed evenly.