Edit: While I wrote this essay with university in mind, much of it also applies to schools, though I know far less about that topic.
The only solution to AI-enabled cheating is exams, and maybe a few other assessment types (e.g. viva voces, oral examinations on a written work). This can be shown through the elimination of alternatives.
The various technologies developed to detect AI cheating already don’t work reliably. They will likely get worse, as there is every reason to think detection is on the losing side of its race against concealment. Even if, miraculously, such tools do start working and stay working, students can bypass detection through paraphrasing.
The various tells, signs, and strategies that lecturers have developed to detect AI cheating A) will become less reliable over time and B) already work a lot less well for GPT-4, which is miles ahead of GPT-3. The combined effect of overreliance on both tells and technology will be numerous false positives. These will almost certainly hit disadvantaged students harder.
So many times now I have read someone saying that we need “innovative” assessment strategies in order to deal with the threat of AI-enabled cheating. Exactly what these ‘innovations’ are is always left up in the air. If someone could fill in this promissory note, they would have by now. It’s not a solution or even a strategy, it’s a vague wish for one. Here’s an innovation: exams. If anyone can think of a way to get essays back without the risk of AI cheating, so be it, but I’ll believe it when I hear it.
Even more so for vague suggestions of ‘working with’ AI rather than against it. We’re already at the point where a lot of students would do better by having AI write the whole thing than by going with the “AI writes, student edits and adds to the output” approach, which many are hoping will work as an assessment model. This approach will only become more flawed over time, as the proportion of students who can do better than AI in any important respect falls and falls.
The suggestion that we ditch formal assessment and grading altogether, while perhaps worthy of consideration in a better society, simply will not work given the economic pressures and incentives present under capitalism.
The few alternatives suggested that would technically work are far too high intensity to be affordable. For example, in theory, a student could work closely with a mentor academic on a project and discuss it with them over multiple meetings and then be graded on their collaboration with the research project. In practice, this would require quadrupling (minimum) the number of academics on a back-of-the-envelope calculation.
Track changes can be faked. It takes a little work, but not a lot. I imagine that soon there will be AIs that can do it automatically for you. This is especially true if you do the bare minimum of extra effort: paraphrasing the AI.
Maybe one day technological alternatives will be developed, e.g., AI that monitors a student throughout the entire creation process and asks them questions about what they’re doing and their reasoning. Such alternatives don’t exist yet.
The laissez-faire approach (let people do what they want, and police the truly egregious cases you catch) is not only biased against the honest, it’s also biased against students who can’t afford access to fancier language models (e.g. GPT-4 instead of GPT-3.5). Saying “It’s your loss if you cheat” is cold comfort to the student who loses a postgraduate scholarship or a job to a cheater.
Adding to all of the above, it’s worth noting that even if ChatGPT did not exist, the integrity of the assessment system would already be at breaking point due to contract cheating. It’s my view that the conditions structurally force a move to exams, and the quicker the change happens, the fairer and less destructive the transition will be.
Exams don’t have to be closed book or sans computers; they just have to be done without internet access (1). Exams certainly aren’t a perfect instrument, but they’re far from the pedagogical horror show they’re often made out to be (2).
It’s not true that exams, by their nature, test only shallow understanding. Even multiple-choice questions can be structured in such a way as to require deep analysis and application of knowledge. Exam essays can provide scope for creativity and original thinking.
At the risk of sounding callous, one of the advantages of exams is that they will kick out students who have been getting by on plagiarism and contract cheating. This is one of the reasons why the university establishment has been so keen to move away from exams. Exams get in the way of the symbiotic relationship between students who just want a piece of paper with a degree written on it and administrators who just want their cash.
I’m all for humane, supportive pedagogy. Bluntly though, not everyone has the cognitive firepower to get a degree, even when all reasonable support has been given. Others lack the capacity for sustained effort, or lack both, or lack other requirements still. If we want degrees to keep their meaning, these people need to go. It’s kinder to these people that they go sooner rather than later, and we should regard this as a positive of exams.
None of this rules out, nor should rule out, disability adjustments for exams, including extra time, readers, etc. For the very small minority of students for whom exams, however adjusted, are inappropriate, alternatives such as oral examination could be considered.
If you disagree, I would challenge you to outline an alternative, concrete plan that doesn’t rely on gestures, wishes, or assumptions about the limitations of ChatGPT that are even now out of date.
Footnotes
(1) In the future, when these models are running on ordinary laptops, it will also be necessary to run exams on university computers rather than student computers, but this is a relatively small expense.
(2) Anecdotally: I remember much more from my psychology major than I otherwise would have because I knew that I would be examined on it, and I had to know the content as a whole, and not just the subjects I wanted to write an essay on.
I disagree, though my argument is a personal one: I am not very good at exams unless they are written composition exams. With choice-based exams such as multiple choice, unless one option is obviously and elementarily correct, several possibilities might be defensible, so how do I know which answer I am supposed to give?
Take for instance, the following question:
When did the United States become a nation?
a) 1774
b) 1776
c) 1781
d) 1783
e) 1789
In 1774, hostilities with England more or less began. In 1776 we declared freedom but were still embroiled in a conflict and certainly not an independent nation, because history is replete with declarations of independence that failed to establish actual independence. In 1781, hostilities pretty much ceased, but there was no treaty until 1783. Even then there was no real nation called the United States, only 13 separate governmental entities allied in a confederation even looser than today's European Union. Not until 1789, with the ratification of the Constitution by 12 of those governmental entities and the start of the new government, did the United States begin as a country. Of course, I would be marked incorrect. But of the five dates offered, 1776 is the least correct as the beginning of the United States. If I could be tested in a way that let me explain my position, perhaps I might not fail, but only if the test presenter were willing to entertain such an idea and not automatically reject any answer but the supposedly true one of 1776.