Original analysis of public data is not stealing. If it were, fair use would be gutted and corporations would be handed a monopoly on a public technology. They already have their own datasets, and the money to buy licenses for more. Regular consumers, who could have had access to a corporate-independent tool for creativity and social mobility, would instead be left worse off, with fewer rights than where they started.
The end bit, that AI-created works aren’t copyrightable, is already settled. However, any work a human does to tweak or select AI-generated content, if it is itself creative, is copyrightable.
AI-created works are copyrightable, and guidance from the U.S. Copyright Office isn’t law, so it’s also not settled. Guidance reflects only the Office’s interpretation based on its experience; it isn’t binding on the courts or on other parties. Guidance from the Office is not a substitute for legal advice, and it does not create any rights or obligations for anyone.
So you could go take the images out of the comic book and reuse them because they are not copyrighted.
You’re begging the question by assuming such content hasn’t been modified and could be taken in the first place. How would you know the content you’re eyeing is usable without violating any rights or laws?
Copyright law is one big “it depends.” Sweeping statements like the one you made, and the headline of the article you linked, oversimplify the issue and present a false dichotomy out of a much more nuanced question. The Reuters article I linked presents far less biased coverage that doesn’t gloss over important facts.
There are AI works every day that fit that description. The art in question in the comic book case was not modified; it could be taken from the page and used somewhere else, with the exception of the words.
You are arguing in bad faith by implying that my intent is to spread doubt through misinformation. Don’t assume things like that. You have no clue of my intentions.
I’m not trying to “spread doubt”. I’m simply giving the information as is. If you want to have a conversation about the facts, let me know. If you are here to argue in bad faith then I can’t help you.
I’m not accusing you of arguing in bad faith or intentionally spreading misinformation; I’m letting you know that you’re repeating the talking points of those who do.
In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.” Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of” and do “not affect” the copyright status of the AI-generated material itself.
There are no AI works that fit this description, though. When most people think of AI works, they’re thinking of the former, not the latter. So saying “Right now, AI-generated works aren’t copyrightable” without making that distinction is misinformation designed to spread doubt.
Human writing and LLM output can both be creative, original, informative, or useful, depending on the context and purpose. An LLM is a tool used by humans: we control the input and the output. What we say goes; no one ever has to see LLM output without people making those decisions. Restricting LLMs is restricting the people who use them. Mega-corporations will have their own models, no matter the price. What we say and do here will only affect our ability to catch up and stay competitive.
You also seem to be making a slippery slope argument, implying that if LLMs are allowed to use copyrighted books as data, negative consequences will follow for creators and society, without explaining how or why this will happen, or providing any evidence. It’s a one-sided look at the issue that ignores the positive outcomes of LLMs, like increased accessibility, diversity, and quality of literature and thought, as well as the inspiration of new forms of expression and creativity.
Finally, you seem to be making a moralistic fallacy. You claim that there is a perfectly reasonable way of doing this ethically, by using content that people have provided. However, you don’t support this claim, or address its challenges. How would you ensure that the content providers are the original authors or have the rights to the content? How would you compensate them for their contribution? Is this a good way to get content that is diverse and representative of different perspectives and cultures? What about bias or manipulation in the data collection and processing?
I don’t think we need any more expansions to copyright, but a better understanding of LLMs’ capabilities and responsibilities. I think we need to be open-minded and critical about the potential and challenges of LLMs, but also be on guard against fallacious arguments or emotional appeals.
You’re making a hasty generalization here by making sweeping claims without evidence or examples. You’re also begging the question by assuming that humans are more original than LLMs, again without providing any support or justification.
Take, for example, this study that found doctors preferred Med-PaLM’s output to human doctors’. If everything is a remix, there’s no reason LLMs can’t meet the minimum criteria for creativity, especially absent any evidence to the contrary.