This one's for the error files - but only technically - image generators and plagiarism
A few weeks ago I wrote something on image models, plagiarism and artistic culture. See here:
In that article I made the statement:
It’s unsurprising that Midjourney can’t copy very well, even when I try to use it to plagiarize a famous work. Transformer models generally can’t reproduce the images in their training sets because they don’t store artworks.
At the time, this is what the evidence said. Since then though, there has been a paper revealing that image models can reproduce some of their inputs:
But here’s the thing. We’re talking about a handful of images relative to the training dataset, at most. As one of the authors of the paper himself notes:
So while I could try to quibble my way out of it (“I said generally”), I’m going to take the L on this one. These models do memorize a (relative) handful of images. It seems to be mostly, but not exclusively, images for which the model has only a small sample.
However, while this is a fascinating paper, the implications are really not as profound as some people - those who want these machines to turn out to be plagiarism generators - are making out. In particular, I see no reason why we should infer anything from this about the images that aren’t copies, at least not without a lot of further argument. It does not make the claim that these models are “collage generators” true.