This one's for the error files - but only technically - image generators and plagiarism
A few weeks ago I wrote something on image models, plagiarism and artistic culture. See here:
In that article I made the statement:
It’s unsurprising that Midjourney can’t copy very well, even when I try to use it to plagiarize a famous work. Transformer models generally can’t reproduce the images in their training sets because they don’t store artworks.
At the time, this is what the evidence said. Since then though, there has been a paper revealing that image models can reproduce some of their inputs:
But here’s the thing. We’re talking about a handful of images relative to the training dataset, at most. As one of the authors of the paper himself notes:
So while I could try to quibble my way out of it (“I said generally”), I’m going to take the L on this one. These models do memorize a (relative) handful of images. It seems to be mostly, but not exclusively, images for which the model has only a small sample.
However, while this is a fascinating paper, the implications are really not as profound as some people - those who want these machines to turn out to be plagiarism generators - are making out. In particular, I see no reason why we should infer anything from this about the images that aren’t copies, at least not without a lot of further argument. It does not make the claim that these models are “collage generators” true.