Can AI get a publication in a prestigious philosophy journal by 2028? The philosophy journal test of AI.
Cyber Socrates? Digital Descartes? Computer Confucius? AI Anscombe? LLM Locke? Machine Montesquieu? Robot Rousseau? Hal 9000 Hypatia? Blip Boop Buddha?
I asked Claude 3 Opus to continue the list and it gave me: Silicon Seneca? Transistor Thoreau? Android Aristotle? Circuit Camus? Diode Diogenes? Firmware Foucault? Golem Goethe? Heuristic Hume? Intelligent Irigaray? JAR file Jung? Kernel Kant? Logic gate Lao Tzu? Mainframe Mill? Neural net Nietzsche? Operating system Ockham? Processor Plato? Quantum Quine? RAM Rand? Software Sartre? Turing test Tagore? Upload Unamuno? Virtual Voltaire? Wetware Wittgenstein? XOR Xenophon? Yottabyte Yeats? Zero-One Zeno?
That aside.
Suppose you wanted a concrete, operationally defined test of the ability of AI to perform higher intellectual functions: original thought on difficult topics that is agreed to reach a high standard; synthesis of existing research, not into a mere review, but into fresh insights. This matters because some people believe this is a fundamental barrier that current approaches in AI (or at least current approaches to large language models) cannot cross: the capacity for original and valuable work. Personally, I doubt this, though of course I may be wrong. However, if we do not define things at least somewhat exactly, people will just debate whether or not AI has made the achievement. It is important, then, to put the proposal to the test in a way that minimises goalpost shifting.
The difficulty is finding a concrete test of high-quality, original thought. The test has to be blind, of course. It has to be sufficiently difficult and should have clear outcomes. Ideally, it should also be naturalistic: success on the test should imply the capacity to do useful work elsewhere.
Academic journals are based on blind review. It is extremely difficult to get published in a prestigious journal, and publication is all or nothing. The ability to publish papers is a naturalistic test of intellectual capacity and strongly suggests the capacity to do useful work elsewhere.
Philosophy has useful qualities compared to other disciplines for the purpose of this test. Philosophy places great emphasis on novelty of thought. A superb restatement of an existing position rarely, if ever, cuts it in philosophy; there must be a novel conceptual contribution. In history, for example, one can take an existing framework and apply it in a case study, and if the writing is good enough, the research is deep enough, and the insights produced into the subject matter are sufficiently powerful, it will be possible to get published. Similarly in science, if your idea works and makes powerful and useful predictions, the question of how conceptually intricate and novel it is may not matter much. Philosophers, though, love novel ideas.
So let’s put it to the test. I’ve created a market on Manifold:
Acceptable prestigious journals (taken from a list by Brian Leiter based on a survey) are:
1. Philosophical Review
2. Noûs
3. Philosophy & Phenomenological Research
4. Mind
5. Journal of Philosophy
6. Australasian Journal of Philosophy
7. Philosophical Studies
8. Philosophers' Imprint
9. Philosophical Quarterly
10. Analysis
11. Synthese
12. Canadian Journal of Philosophy
12. Proceedings of the Aristotelian Society
14. Ergo
14. Erkenntnis
16. European Journal of Philosophy
16. Pacific Philosophical Quarterly
18. American Philosophical Quarterly
19. Journal of the American Philosophical Association
20. Inquiry
21. Philosophical Perspectives
22. The Monist
23. Thought
24. Philosophical Issues
25. Philosophical Topics
25. Ratio
The rules are as follows. The article must be original: not a review paper or a book review. It must be at least 3,000 words long, not including the bibliography but including discursive footnotes. The paper must be accepted for publication; if the paper is withdrawn or rejected after acceptance once it is revealed that it was written by AI, that will still count. The review must be blind, and in particular, the reviewers and editor must not know that the paper was written by AI. The paper must be entirely written by AI, with no help, suggestions or commentary by humans (amendments to appease reviewers and editors are acceptable). It is acceptable for humans to assist with, e.g., interfacing with the internet, submitting the paper, and so on, but they must not help, in my best judgment, in any way with the content, research or prose; this includes suggesting the topic. The paper may be on any topic related to philosophy, but it may not be a largely empirical paper, may not be a close reading of a single text, and may not be largely mathematical.*
Finally (and I know this will sound like an odd condition), the paper must not rely on any 'tricks' to get published. It has to be, broadly speaking, a standard philosophical paper. I will only apply this criterion to rule out a paper if it is truly unusual. The paper “Can a good philosophical contribution be made just by asking a question?” would be rejected by this criterion, although of course, it would also be rejected by the length criterion.
The system can make as many attempts as it likes but must respect standard rules, e.g. not having more than one paper under review at a journal at a time.
I won't bet in this market.
* This is not because I do not regard mathematical or empirical work as philosophical, but simply because I want to be absolutely sure that the paper is a philosophical paper, and assessing these sorts of papers for whether they are philosophical would require a lot of subjective judgment on my part. I have rejected close readings of a single text not because these are 'unphilosophical' but because the skillset involved relative to other philosophical work is somewhat unusual, and so an AI able to do this may not be able to write other kinds of philosophical paper.
EDIT: The command to write a paper meeting these criteria does not count as help, commentary or suggestion. If there is widespread agreement among philosophers that the paper was plagiarised or the ideas were stolen, within one month of the discovery that the paper was written by AI, it will not count. I suspect that many philosophers will be highly motivated to argue that the ideas are not original regardless of the truth of the situation, so I will only regard the paper as plagiarised if, in my opinion, at least 60% of professional philosophers agree with this claim.
Limitations
Although I’ve operationalized the market as precisely as I can, it remains quite possible that something will get through that was not my intention. The test, then, is more than just the market, and it’s possible that the real underlying question I set out to test won’t be well captured by how I resolve the market.
I won't bet either, but I'm pretty confident that this won't happen in the indicated timescale. Apart from anything else, these journals mostly have super-high rejection rates, so even a good article is likely to miss out. But more generally, given the way these models work, the highest level of originality on offer is a new combination of existing ideas. And without a way to assess which combinations are likely to be interesting to the referees of a philosophy journal, that's unlikely to produce a publishable article.
Putting this more positively, if the models could incorporate an assessment of their own output, similar to a board evaluation for a chess-playing program, they could produce and assess lots of rearrangements of the material in their training set, and choose the best ones. Once that happens, the sky is the limit. But it hasn't happened yet.
I suspect this is largely already possible, albeit not without significant tweaking (and so not in a way that quite fulfills the condition). I'll say, though, that as someone who reviews a fair bit, I would see this sort of incautious experiment as absolutely parasitic on, and damaging to, reviewers' good-faith expectation that they are reading human work.