Indian publishers have joined the long list of copyright holders demanding that OpenAI and other LLM creators, who are profiting from training their models on copyrighted material, stop and delete their training data. It's a story as old as capitalism, and we are at it again. We all know OpenAI and others are stealing IP. At CES 2025, every AI company I talked to said they trained on “free content found on the Internet”, you know, like anything you can find on the Internet. I mean, Spotify and Apple Music are on the Internet. So is Amazon Prime. So are many illegally hosted piracy websites and illegal troves of literary works. You name it, you can find it on the Internet. We hope all these LLM creators get what’s coming to them: either compensate those whose works you are stealing (if they consent) or F*Off and delete your training sets and the products based on them. The only reason this discussion is happening is that the material is digital and not physical. If it were physical theft, it would be an open-and-shut case. That’s all I gotta say about that.
PS: Today’s generative AI picture, “Indian Publishers” (before I even see the output, I’m guessing it’ll be offensive), is brought to you by “books inside of books”, “don’t read the text on covers”, and our favorite, “defying the laws of physics: look at the guy in the back”.