sagrotan,
@sagrotan@lemmy.world avatar

Like the record labels sued every music sharing platform in the early days. Adapt. They’re all afraid of new things but in the end nobody can stop it. Think, learn, work with it, not against it.

diskmaster23,

I think it’s valid. This isn’t about the tech, but the sources of your work.

sagrotan,
@sagrotan@lemmy.world avatar

Of course it’s valid. And the misuse of AI has to be fought. Nevertheless, we have to think differently in the face of something we cannot stop in the long run. You cannot create a powerful tool and only misuse it. I miscommunicated here, should’ve explained myself. I’ve got no excuses, maybe one: I sat on the shitter and wanted to make things short.

dep,
@dep@lemmy.world avatar

Feels like a publicity play

MaxPower,
@MaxPower@feddit.de avatar

I like her and I get why creatives are panicking because of all the AI hype.

However:

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

A summary is not a copyright infringement. If anything is a case for fair use, it’s a summary.

The comic’s suit questions if AI models can function without training themselves on protected works.

A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.

IANAL though.

jmcs,

I guess they will get to analyze OpenAI’s dataset during discovery. I bet OpenAI didn’t have authorization to use even 1% of the content they used.

maynarkh,

That’s why they don’t feel they can operate in the EU, as the EU will mandate AI companies to publish what datasets they trained their solutions on.

Jaded,

Things might change, but right now you simply don’t need anyone’s authorization.

Hopefully it doesn’t change, because only a handful of companies have the data or the funds to buy the data; it would kill any kind of open-source or low-priced endeavour.

Flaky,
@Flaky@iusearchlinux.fyi avatar

FWIW, Common Crawl – a free, openly available dataset of crawled internet pages – was used by OpenAI for GPT-2 and GPT-3, as well as by EleutherAI for GPT-NeoX. Maybe for GPT-3.5/ChatGPT as well, but they’ve been hush about that.

erogenouswarzone,
@erogenouswarzone@lemmy.ml avatar

SS is such a tool. Does anybody remember the big anti-gay speech that launched her career in The Way of the Gun? She’ll do anything to get ahead.

Here’s the speech: www.youtube.com/watch?v=PAl5xGi7urQ

wick,

You hate her because of a part in a shitty movie?

erogenouswarzone,
@erogenouswarzone@lemmy.ml avatar

Did I say hate? I said she’s a tool.

MargotRobbie,
@MargotRobbie@lemmy.world avatar

She’s going to lose the lawsuit. It’s an open and shut case.

“Authors Guild, Inc. v. Google, Inc.” is the precedent case, in which the Second Circuit (the Supreme Court declined to review the ruling) established that transformative digitization of copyrighted material inside a search engine constitutes fair use, and text used for training LLMs is even more transformative than book digitization, since it is near impossible to reconstitute the original work barring extreme overtraining.

You will have to understand why styles can’t and should not be able to be copyrighted, because that would honestly be a horrifying prospect for art.

patatahooligan,
@patatahooligan@lemmy.world avatar

“Transformative” in this context does not mean simply not identical to the source material. It has to serve a different purpose and to provide additional value that cannot be derived from the original.

The summary that they talk about in the article is a bad example for a lawsuit because it is indeed transformative. A summary provides a different sort of value than the original work. However if the same LLM writes a book based on the books used as training data, then it is definitely not an open and shut case whether this is transformative.

MargotRobbie,
@MargotRobbie@lemmy.world avatar

But what an LLM does meets your listed definition of transformative as well: it provides additional value that can’t be derived from the original, because everything it outputs is completely original, merely similar in style to the source, output that you can’t use to reconstitute the original work. In other words, it’s similar to fan work. Works that are merely similar in style to an original cannot and should not be considered copyright infringement, because that’s a can of worms nobody actually wants to open, and the courts have been very consistent on that.

So, I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work, they wouldn’t consider digitalizing copyrighted material in a database with very lossy compression (that’s a more accurate description of what LLMs are; please give this a read if you have time) fair use as well. Of course, with the current Roberts court, there is always the chance that weird things can happen, but I would be VERY surprised.

There is also the previous ruling that raw transformer output cannot be copyrighted, but that’s beyond the scope of this post for now.

My problem with LLM outputs is mostly that they are just bad writing, and I’ve been pretty critical of “““Open”””AI elsewhere on Lemmy, but I don’t see Silverman’s case going anywhere.

patatahooligan,
@patatahooligan@lemmy.world avatar

But what an LLM does meets your listed definition of transformative as well

No it doesn’t. Sometimes the output is used in completely different ways but sometimes it is a direct substitute. The most obvious example is when it is writing code that the user intends to incorporate into their work. The output is not transformative by this definition as it serves the same purpose as the original works and adds no new value, except stripping away the copyright of course.

everything it outputs is completely original

[citation needed]

that you can’t use to reconstitute the original work

Who cares? That has never been the basis for copyright infringement. For example, as far as I know I can’t make and sell a doll that looks like Mickey Mouse from Steamboat Willie. It should be considered transformative work. A doll has nothing to do with the cartoon. It provides a completely different sort of value. It is not even close to being a direct copy or able to reconstitute the original. And yet, as far as I know I am not allowed to do it, and even if I am, I won’t risk going to court against Disney to find out. The fear alone has made sure that we mere mortals cannot copy and transform even the smallest parts of copyrighted works owned by big companies.

I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work

Which case are you citing? Context matters. LLMs aren’t just a database. They are also a frontend to extract the data from these databases, that is being heavily marketed and sold to people who might otherwise have bought the original works instead.

The lossy compression is also irrelevant; otherwise literally every pirated movie/series release would be legal. How lossy is it, even? How would you measure it? I’ve seen GitHub Copilot spit out verbatim copies of code. I’m pretty sure that if I ask ChatGPT to recite a very well-known poem, it will also produce a verbatim copy. So there are at least some works that are included completely losslessly. Which ones? No one knows, and that’s a big problem.

MargotRobbie,
@MargotRobbie@lemmy.world avatar

I’m tired of internet arguments. If you are not going to make a good faith attempt to understand anything I said, then I see no point in continuing this discussion further. Good day.

ZIRO, (edited )
@ZIRO@lemmy.world avatar

Let’s remove the context of AI altogether.

Say, for instance, you were to check out and read a book from a free public library. You then go on to use some of the book’s content as the basis of your opinions. More, you also absorb some of the common language structures used in that book and unwittingly use them on your own when you speak or write.

Are you infringing on copyright by adopting the book’s views and using some of the sentence structures its author employed? At what point can we say that an author owns the language in their work? Who owns language, in general?

Assuming that a GPT model cannot regurgitate verbatim the contents of its training dataset, how is copyright applicable to it?

Edit: I also would imagine that if we were discussing an open source LLM instead of GPT-4 or GPT-3.5, sentiment here would be different. And more, I imagine that some of the ire here stems from a misunderstanding of how transformer models are trained and how they function.

patatahooligan,
@patatahooligan@lemmy.world avatar

Let’s remove the context of AI altogether.

Yeah sure if you do that then you can say anything. But the context is crucial. Imagine that you could prove in court that I went down to the public library with a list that read “Books I want to read for the express purpose of mimicking, and that I get nothing else out of”, and on that list was your book. Imagine you had me on tape saying that for me writing is not a creative expression of myself, but rather I am always trying to find the word that the authors I have studied would use. Now that’s getting closer to the context of AI. I don’t know why you think you would need me to sell verbatim copies of your book to have a good case against me. Just a few passages should suffice given my shady and well-documented intentions.

Well that’s basically what LLMs look like to me.

Haha,

Lmao, all these lawsuits smell like toilet paper to me, and are probably another attack on AI to slow it down.

Numuruzero,

Honestly, a lot of them bring up necessary questions. AI being developed so quickly means a lot of questions got pushed off until later.

damnYouSun,

Absolutely, but this one’s especially stupid.

It’s like claiming that I am guilty of copyright violation because I read their book. If I regurgitated their novel word for word, for free, to anyone who asked for it, then yeah, that would be copyright violation. However, I sincerely doubt that is what’s actually happening here.

Riptide502,

AI is a double-edged sword. On one hand, you have an incredible piece of technology that can greatly improve the world. On the other, you have technology that can easily be misused to a disastrous degree.

I think most people can agree that an ideal world with AI is one where it is a tool to supplement innovation/research/creative output. Unfortunately, that is not the mindset of venture capitalists and technology enthusiasts. The tools are already extremely powerful, so these parties see them as replacements to actual humans/workers.

The saddest example has to be graphic designers/digital artists. It’s not some job that “anyone can do.” It’s an entire profession that takes years to master and perfect. AI replacement doesn’t just mean taking away their jobs; it renders years of experience worthless. The frustrating thing is that it’s doing all of this with their works, their art. Even with more regulations on the table, companies like Adobe and DeviantArt are still using shady practices to con users into unknowingly helping build their AI models (quietly instating automatic OPT-IN and making OPT-OUT options difficult). It’s sort of like forcing a man to dig his own grave.

You can’t blame artists for being mad about the whole situation. If you were in their same position, you would be just as angry and upset. The hard truth is that a large portion of the job market could likely be replaced by AI at some point, so it could happen to you.

These tools need to be TOOLS, not replacements. AI has its downfalls, and expert knowledge should be used as a supplement both to improve these tools and to improve the final product. There was a great video that covered some of these fundamental issues (such as not actually “knowing” or understanding what a certain object/concept is), but I can’t find it right now. I think the best outcome comes when everyone is cooperating.

Steeve,

Even as tools, every time we increase worker productivity without a similar adjustment to wages we transfer more wealth to the top. It’s definitely time to seriously discuss a universal basic income.

Maslo,

I can’t really take seriously any accusations coming from Sarah Silverman after that whole wage gap bs she tried to pull.

Seems like she isn’t afraid to manipulate a trending social outcry to collect a paycheck.

vlad76,
@vlad76@lemmy.sdf.org avatar

I was under the impression that there was no real definitive way to tell what ChatGPT or similar AIs use for their training. Am I wrong?

NevermindNoMind,

Yes, it’s in the lawsuit and another article I read. OpenAI said they used a specific dataset, and the makers of that dataset said they used some online open libraries which have full texts of books. That’s the primary basis of the lawsuit. They also argue that if you ask ChatGPT for a summary of their books, it will spit one out, which they claim is misuse of their copyrighted work. That claim sounds dicey to me; Wikipedia and all manner of websites summarize books, so I’m not following how ChatGPT doing it is different. But I’m an idiot, so who cares what I think.

vlad76,
@vlad76@lemmy.sdf.org avatar

I care. Idiots unite!

hurp_mcderp,

Remember, the human that wrote a summary had to legally obtain a copy of the source material first too. It should be no different when training an AI model. There’s a whole new can of worms here, though, since the summary was written by another person and that person holds the copyright to that summary (unless there is a substantial amount of the original material, of course). But an AI model is not “creating” a new, copyrightable work. It has to be trained on the entire source material and algorithmically creates a summary directly from that. Because there’s nothing ‘new’ being created, I can see why it could be claimed that a summary from an AI model should be considered a derivative work. But honestly, it’s starting to border on the question of whether or not what AI models can do is considered ‘creative thinking’. Shit’s getting wild.

TheSaneWriter,

If the models were trained on pirated material, the companies here have stupidly opened themselves to legal liability and will likely lose money over this, though I think they’re more likely to settle out of court than lose. In terms of AI plagiarism in general, I think that could be alleviated if an AI had a way to cite its sources, i.e. point back to where in its training data it obtained information. If AI cited its sources and did not word for word copy them, then I think it would fall under fair use. If someone then stripped the sources out and paraded the work as their own, then I think that would be plagiarism again, where that user is plagiarizing both the AI and the AI’s sources.

ayaya,

It is impossible for an AI to cite its sources, at least with the current way of doing things. The AI itself doesn’t even know where any particular piece of text comes from. Large language models are essentially really complex word predictors: they look at the previous words and then predict the word that comes next.

When it’s training it’s putting weights on different words and phrases in relation to each other. If one source makes a certain weight go up by 0.0001% and then another does the same, and then a third makes it go down a bit, and so on-- how do you determine which ones affected the outcome? Multiply this over billions if not trillions of words and there’s no realistic way to track where any particular text is coming from unless it happens to quote something exactly.

And if it did happen to quote something exactly, which is basically just random chance, the AI wouldn’t even be aware it was quoting anything. When it’s running it doesn’t have access to the data it was trained on, it only has the weights on its “neurons.” All it knows are that certain words and phrases either do or don’t show up together often.
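The “weights, not sources” point above can be sketched with a toy example. This is a deliberately tiny, hypothetical next-word predictor (nothing like a real transformer, just an illustration): once counts from two sources are merged, there is no record left of which source contributed what.

```python
from collections import Counter, defaultdict

# Two "copyrighted" sources get blended into one set of counts (the "weights").
corpus_a = "the cat sat on the mat".split()
corpus_b = "the dog sat on the rug".split()

counts = defaultdict(Counter)
for corpus in (corpus_a, corpus_b):
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1  # both sources contribute to the same weight

def predict(word):
    """Return the most likely next word, given only the merged counts."""
    return counts[word].most_common(1)[0][0]

print(predict("sat"))  # on
```

Both corpora pushed the weight for “sat → on” up, and nothing in `counts` records which one did; scale that up to trillions of tokens and per-source attribution is effectively gone.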

RoundSparrow,

The comic’s suit questions if AI models can function without training themselves on protected works.

I doubt a human can compose chat responses without having trained at school on previous language. Copyright favors the rich, the powerful, and the established, like Silverman.

trachemys,

We are overdue for strengthening fair use.

Rivalarrival,

Indeed.

Possession of a copyrighted work should never be considered infringement. The fact that a book is floating around in a mind must not be considered infringement no matter how it got into that mind, nor whether that mind is biological or artificially constructed.

Until that work comes back out of that mind in substantially identical form as to how it went in, it cannot be considered copyright infringement.

patatahooligan,
@patatahooligan@lemmy.world avatar

Selectively breaking copyright laws specifically to allow AI models also favors the rich, unfortunately. These models will make a very small group of rich people even richer while putting out of work the millions of creators whose works were stolen to train the models.

TheSaneWriter,

To be fair, in most Capitalist nations, literally any decision made will favor the rich because the system is automatically geared that way. I don’t think the solution is trying to come up with more jobs or prevent new technology from emerging in order to preserve existing jobs, but rather to retool our social structure so that people are able to survive while working less.

patatahooligan,
@patatahooligan@lemmy.world avatar

Oh no, rich assholes who continuously lobby for strict copyright and patent laws in order to suffocate competition might find themselves restricted by it for once. Quick, find me the world’s smallest violin!

No, if you want AI to emerge, argue in favor of relaxing copyright law in all cases, not specifically to allow AI to copyright launder other peoples’ works.

Zetaphor,

Quoting this comment from the HN thread:

On information and belief, the reason ChatGPT can accurately summarize a certain copyrighted book is because that book was copied by OpenAI and ingested by the underlying OpenAI Language Model (either GPT-3.5 or GPT-4) as part of its training data.

While it strikes me as perfectly plausible that the Books2 dataset contains Silverman’s book, this quote from the complaint seems obviously false.

First, even if the model never saw a single word of the book’s text during training, it could still learn to summarize it from reading other summaries which are publicly available. Such as the book’s Wikipedia page.

Second, it’s not even clear to me that a model which only saw the text of a book, but not any descriptions or summaries of it, during training would even be particularly good at producing a summary.

We can test this by asking for a summary of a book which is available through Project Gutenberg (which the complaint asserts is Books1 and therefore part of ChatGPT’s training data) but for which there is little discussion online. If the source of the ability to summarize is having the book itself during training, the model should be equally able to summarize the rare book as it is Silverman’s book.

I chose “The Ruby of Kishmoor” at random. It was added to PG in 2003. ChatGPT with GPT-3.5 hallucinates a summary that doesn’t even identify the correct main characters. The GPT-4 model refuses to even try, saying it doesn’t know anything about the story and it isn’t part of its training data.

If ChatGPT’s ability to summarize Silverman’s book comes from the book itself being part of the training data, why can it not do the same for other books?

As the commenter points out, I could recreate this result using a smaller offline model and an excerpt from the Wikipedia page for the book.

patatahooligan,
@patatahooligan@lemmy.world avatar

You are treating publicly available information as free from copyright, which is not the case. Wikipedia content is covered by the Creative Commons Attribution-ShareAlike License 4.0. Images might be covered by different licenses. Online articles about the book are also covered by copyright unless explicitly stated otherwise.

Zetaphor,

My understanding is that the copyright applies to reproductions of the work, which this is not. If I provide a summary of a copyrighted summary of a copyrighted work, am I in violation of either copyright because I created a new derivative summary?

patatahooligan,
@patatahooligan@lemmy.world avatar

Not a lawyer so I can’t be sure. To my understanding a summary of a work is not a violation of copyright because the summary is transformative (serves a completely different purpose to the original work). But you probably can’t copy someone else’s summary, because now you are making a derivative that serves the same purpose as the original.

So here are the issues with LLMs in this regard:

  • LLMs have been shown to produce verbatim or almost-verbatim copies of their training data
  • LLMs can’t figure out where their output came from so they can’t tell their user whether the output closely matches any existing work, and if it does what license it is distributed under
  • You can argue that by its nature, an LLM is only ever producing derivative works of its training data, even if they are not the verbatim or almost-verbatim copies I already mentioned
barsoap,

LLMs have been shown to produce verbatim or almost-verbatim copies of their training data

That’s either overfitting, which means the training went wrong, or plain chance. The gazillions of bonkers court cases over “did the artist at some point in their life hear a particular melody” come to mind. Great. Now that those are flanked with allegations of eidetic memory, we have reached peak capitalism.

crackgammon,

Don’t all three of those points apply to humans?

Banzai51,
@Banzai51@midwest.social avatar

Aren’t summaries and reviews covered under fair use? Otherwise newspapers would have been violating copyrights for hundreds of years.

barsoap,

Second, it’s not even clear to me that a model which only saw the text of a book, but not any descriptions or summaries of it, during training would even be particularly good at producing a summary.

Summarising stuff is literally all ML models do. It’s their bread and butter: see what’s out there and categorise it into a (ridiculously) high-dimensional semantic space. Put a bit flippantly: you shouldn’t be surprised if it gives you the same synopsis for both Dances with Wolves and Avatar, because they are indeed very similar stories, occupying approximately the same position in that space. If you ask not for a summary but for a full screenplay, it’s going to come up with random details to fill in what it ignored while categorising; again, the results will look similar if you squint right because, again, they’re at their core the same story.

It’s not even really necessary for those models to learn the concept of “summary” – only that, in a prompt, it means “write a 200-word output instead of a 20,000-word one”. It will produce a longer or shorter description of that position in space, hallucinating more or fewer details. It’s really no different from police interviewing you as a witness to a car accident and having to pay attention not to prompt you wrong, including assuming that you saw certain things, or you, too, will come up with random bullshit (and believe it). It’s all a reconstructive process, generating a concrete thing from an abstract representation. There’s really no art to summarizing; it’s inherent in how semantic abstraction works.
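That “position in semantic space” idea can be mimicked very crudely with bag-of-words vectors and cosine similarity. This is a hypothetical toy (real models learn dense embeddings, not raw word counts), but the geometric intuition is the same: two retellings of the same story land close together, an unrelated plot lands far away.

```python
import math
from collections import Counter

def embed(text):
    """Crude stand-in for an embedding: raw word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb)

dances = embed("outsider soldier joins a native tribe and defends it from invaders")
avatar = embed("outsider soldier joins a native tribe and defends it from invaders in space")
whodunit = embed("a detective solves a murder aboard a snowbound train")

# The two "same story" plots sit much closer together than the unrelated one.
print(cosine(dances, avatar) > cosine(dances, whodunit))  # True
```

A summary, in this picture, is just a short description of wherever the input landed in that space, which is why two inputs at the same position get the same synopsis.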

Marxine,
@Marxine@lemmy.ml avatar

VC-backed AI makers and billionaire-run corporations should definitely pay for the data they use to train their models. The common user should definitely check the licences of the data they use as well.

SixTrickyBiscuits,

That is essentially impossible. How are they going to pay each reddit user whose comment the AI analyzed? Or each website it analyzed? We’re talking about terabytes of text data taken from a huge variety of sources.

CannaVet,

Then it should be treated as what it is, an illegal venture based off of theft. I don’t get a legal pass to steal just because the groceries I stole got cooked into a meal and are therefore no longer the groceries I stole.

azuth,

Firstly, copyright infringement is not theft. It’s not theft because the grocer still has the groceries. It is a lesser offense, which obviously hurts the victim less, if at all in some cases.

A summary is also not copyright infringement; it’s fair use. Of course copyright holders would love to copyright-strike bad reviews (they already do, even though it’s not illegal).

Marxine,
@Marxine@lemmy.ml avatar

Billionaires can spend and burn their whole net worth for all I care. Datasets should be either:

  • Paid for to the provider platform, and each original content creator gets a share (eg. The platform keeps 10% of the sold price for hosting costs, the 90% remaining are distributed to content creators according to size and quality of the data provided)
  • Consciously donated by the content creators (eg: an OPT-IN term in the platform about donating agreed upon data for non-profit research), but the dataset must never be sold for or used for profit. Publicly available research purposes only.
  • Dataset is “rented” by the users and platform in an OPT-IN manner, and they receive royalties/payments for each purchase/usage of the dataset.

The way things are currently done only favours venture capitalists (wage thieves), shareholders (also wage thieves) and billionaire C-suites (wage thieves as well).

RedCanasta,

Copyright laws are a recent phenomenon and should never have been a thing, imo. The only reason they exist is not to “protect creators,” but to make sure the upper classes extract as much wealth as possible over the maximum amount of time.

Music piracy has shown that copyright has too many holes in it to be effective, and now AI is showing us its redundancy as it uses data to give better results.

It stifles creativity to the point that it makes us inhuman. Hell, Chinese writers used to praise others if they used a line or two from other writers.

TheSaneWriter,

I think that copyright laws are fine in a vacuum, but that if nothing else we should review the amount of time before a copyright enters the public domain. Disney lobbied to have it set to something awful like 100 years, and I think it should almost certainly be shorter than that.

Peanutbjelly,

Personally I find this stupid. If we have robots walking around, are they going to be sued every time they see something that’s copyrighted?

Is this what will stop progress that could save us from environmental collapse? That a robot could summarize your shitty comedy?

Copyright is already a disgusting mess, and still nobody cares about models being created specifically to manipulate people en masse. “What if it learned from MY creations,” asks every self-obsessed egoist in the world.

Doesn’t matter how many people this tech could save after another decade of development. Somebody think of the [lucky few artists that had the connections and luck to make a lot of money despite living in our soul crushing machine of a world]

All of the children growing up abused and in pain with no escape don’t matter at all. People who are sick or starving or homeless do not matter. Making progress to save the world from imminent environmental disaster doesn’t matter. Let Canada burn more and more every year. As long as copyright is protected, all is well.

laylawashere44,

This won’t even stop LLMs; ones from countries that don’t respect copyright will simply advance past the ones that do. A Chinese tech company could simply ignore all of this, and what, is Sarah Silverman going to sue Tencent in China? Good luck. TikTok uses copyrighted music constantly without permission, yet YouTube Shorts had to set up a system with the publishers. All that copyright laws in their current form do is hamper basically everyone for the sake of a few large companies.

Asafum,

I feel like when confronted about a “stolen comedy bit” a lot of these people complaining would also argue that “no work is entirely unique, everyone borrows from what already existed before.” But now they’re all coming out of the woodwork for a payday or something… It’s kinda frustrating especially if they kill any private use too…

TheyHaveNoName,

I’m a teacher, and the last half of this school year was a comedy of my colleagues trying to “ban” ChatGPT. I’m not so much worried about students using ChatGPT to do work. A simple two-minute conversation with a student who turns in an excellent (but suspect) piece of writing will tell you whether they wrote it themselves or not. What worries me is exactly those moments where you’re asking for a summary or a synopsis of something. You really have no idea what data is being used to create that summary.

BedbugCutlefish,
@BedbugCutlefish@lemmy.world avatar

The issue isn’t that people are using others works for ‘derivative’ content.

The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.

With ChatGPT and other AI, it’s been ‘trained’ on her work (and, presumably, on as many other works as possible) once, and now there are no ‘views’, or even sources given, for those properties.

And like a lot of digital work, its reach and speed is unprecedented. Like, previously, yeah, of course you could still ‘derive’ from people’s works indirectly, like from a friend that watched it and recounted the ‘good bits’, or through general ‘cultural osmosis’. But that was still limited by the speed of humans, and of culture. With AI, it can happen a functionally infinite number of times, nearly instantly.

Is all that to say Silverman is 100% right here? Probably not. But I do think that the legality of ChatGPT, and of other AI that can ‘copy’ an artist’s work, is worth questioning. But it’s a sticky enough issue that I’m genuinely not sure what the best route is. Certainly, I think current AI writing and image generation ought to be ineligible for commercial use until the issue has at least been addressed.

Rivalarrival,

The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally,

That is not actually true.

I would violate copyright by making an unauthorized copy and providing it to you, but you do not violate copyright for simply viewing that unauthorized copy. Sarah can come after me for creating the cop[y|ies], but she can’t come after the people to whom I send them, even if they admit to having willingly viewed a copy they knew to be unauthorized.

Copyright applies to distribution, not consumption.

azuth,

The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.

Damn did they already start implanting DRM bio-chips in people?

And like a lot of digital work, its reach and speed is unprecedented. Like, previously, yeah, of course you could still ‘derive’ from people’s works indirectly, like from a friend that watched it and recounted the ‘good bits’, or through general ‘cultural osmosis’.

Please explain why you cannot download a movie/episode/ebook illegally and then directly derive from it.

BedbugCutlefish,
@BedbugCutlefish@lemmy.world avatar

I mean, you can do that, but that’s a crime.

Which is exactly what Sarah Silverman is claiming ChatGPT is doing.

And, beyond an individual crime of a person reading a pirated book, again, we’re talking about ChatGPT and other AI magnifying reach and speed beyond what an individual person could ever achieve, even if they did nothing but read pirated material all day, not unlike websites like The Pirate Bay. Y’know, how those websites constantly get taken down and have to move around the globe to areas where they’re beyond the reach of the law, due to the crimes they’re committing.

I’m not like, anti-piracy or anything. But also, I don’t think companies should be using pirated software, and my big concern about LLMs aren’t really for private use, but for corporate use.

azuth,

I mean, you can do that, but that’s a crime.

Consuming content illegally is by definition a crime, yes. It also has no effect on your output. A summary or review of that content will not be infringing, it will still be fair use.

A more substantial work inspired by that content could be infringing or not depending on how close it is to the original content but not on the legality of your viewing of that content.

Nor is it relevant. If you have any success with your copy you are going to cause way more damage to the original creator than pirating one copy.

And, beyond an individual crime of a person reading a pirated book, again, we’re talking about ChatGPT and other AI magnifying reach and speed beyond what an individual person ever could, even if they did nothing but read pirated material all day, not unlike websites like The Pirate Bay. Y’know, how those websites constantly get taken down and have to move around the globe to areas where they’re beyond the reach of the law, due to the crimes they’re committing.

I can assure you that The Pirate Bay is quite stable. I would like to point out that none of the AI vendors has actually been convicted of copyright infringement yet. That their use is infringing and a crime is your opinion.

It’s also going to be irrelevant, because there are companies that do own massive amounts of copyrighted material and will be able to train their own AIs, both to sell as a service and to cut down on the labor costs of creating new material. There are also companies, such as Adobe, that got people to agree to license their content for AI training.

So copyright law will not be able to help creators. So there will be a push for more laws and regulation. Depending on what they manage to push through, you can forget non-major-corp-backed AI, expect reduced fair use rights (as in unapproved reviews being de facto illegal), and perhaps a new push against software that could be used for piracy, such as non-regulated video or music players, never mind encoders etc.

BedbugCutlefish,
@BedbugCutlefish@lemmy.world avatar

Consuming content illegally is by definition a crime, yes. It also has no effect on your output. A summary or review of that content will not be infringing, it will still be fair use.

That their use is infringing and a crime is your opinion.

“My opinion”? Have you read the headline? It’s not my opinion that matters, it’s that of the plaintiffs in this lawsuit. And this lawsuit indeed alleges that copyright infringement has occurred; it’ll be up to the courts to see if the claim holds water.

I’m definitely not sure that GPT4 or other AI models are copyright infringing or otherwise illegal. But, I think that there’s enough that seems questionable that a lawsuit is valid to do some fact-finding, and honestly, I feel like the law is a few years behind on AI anyway.

But it seems plausible that the AI could be found to be ‘illegally distributing works’, or to have otherwise broken IP laws at some point during training or operation. A lot depends on what kind of agreements were signed over the contents of the training packages, something I frankly know nothing about, and would like to see come to light.

azuth,

“My opinion”? Have you read the headline? It’s not my opinion that matters, it’s that of the plaintiffs in this lawsuit. And this lawsuit indeed alleges that copyright infringement has occurred; it’ll be up to the courts to see if the claim holds water.

No, the opinion that matters is the opinion of the judge. Before we have a decision, there is no copyright infringement.

I’m definitely not sure that GPT4 or other AI models are copyright infringing or otherwise illegal. But, I think that there’s enough that seems questionable that a lawsuit is valid to do some fact-finding

You sure speak as if you do.

and honestly, I feel like the law is a few years behind on AI anyway.

But it seem plausible that the AI could be found to be ‘illegally distributing works’, or otherwise have broken IP laws at some point during their training or operation. A lot depends on what kind of agreements were signed over the contents of the training packages, something I frankly know nothing about, and would like to see come to light.

I’ve said in my previous post that copyright will not solve the problems, what you describe as the law being behind on AI. Considering how copyright laws ‘caught up with the times’ at the beginning of the internet… I am not optimistic the changes will be beneficial to society.

Rivalarrival,

Consuming content illegally is by definition a crime, yes.

What law makes it illegal to consume an unauthorized copy of a work?

That’s not a flippant question. I am being absolutely serious. Copyright law prohibits the creation and distribution of unauthorized copies; it does not prohibit the reception, possession, or consumption of those copies. You can only declare content consumption to be “illegal” if there is actually a law against it.

azuth,

What law makes it illegal to consume an unauthorized copy of a work?

That’s not a flippant question. I am being absolutely serious. Copyright law prohibits the creation and distribution of unauthorized copies; it does not prohibit the reception, possession, or consumption of those copies. You can only declare content consumption to be “illegal” if there is actually a law against it.

Which legal system?

Rivalarrival,

She’s an American actor, suing an American company, so I think we should discuss the laws of Botswana, Mozambique, and Narnia. /s

azuth,

The copying part. Yes, you can conceive of a theoretical example where you consume the content without reproducing it, but that’s not what happened in this case.

Or in any AI case. There are AIs trained outside of the US, but they all download the data to train on. They delete it after. What makes it not infringing in AI training is the fair use exception for research.

Rivalarrival,

The copying part.

The uploader is the only person/entity that qualifies as infringing under copyright law. The downloader does not. The downloader is merely receiving the copy; the uploader is the one producing the copy.

Fair use exemptions are only necessary for producing a copy without permission. No fair use exemption is necessary for either receiving a copy, or for consuming or otherwise using that copy.

azuth,

The uploader is the only person/entity that qualifies as infringing under copyright law. The downloader does not. The downloader is merely receiving the copy; the uploader is the one producing the copy.

Where does it say that in US copyright law? Downloading is making a copy.

Rivalarrival,

Title 17 of US Code

I agree that after a download is complete, a copy has come into existence, and it is located on the downloader’s computer. But, the downloader did not have the work prior to downloading. How can he make a copy of something he does not yet possess? What is the “original” from which this copy came to exist? Who had any obligations under copyright law regarding that original?

The answer, of course, is that the “original” was located on the uploader’s computer. He is responsible for the actions of that machine. He controls it. He decides to whom to send it. He decides how many people it will be sent to. He is fully and solely responsible for distributing the work in his possession.

Every prohibited act is performed by the uploader, not the downloader.

No, Silverman’s argument is not that the mere possession of the work by ChatGPT violates copyright, because that question has long since been answered: the artist controls the work, not the audience. The artist cannot decide who is and is not allowed to consume the work. Regardless of how someone came to consume the work, they are fully entitled to speak about it.

Instead, her argument is that the summaries produced by ChatGPT violate the copyright of her work. She is trying to argue that these summaries are merely derivative works, rather than “transformative derivations”. She’s trying to argue that you can’t summarize her work; that your summary of her work violates her copyright.

She is wrong.

azuth,

I agree that after a download is complete, a copy has come into existence, and it is located on the downloader’s computer. But, the downloader did not have the work prior to downloading. How can he make a copy of something he does not yet possess? What is the “original” from which this copy came to exist? Who had any obligations under copyright law regarding that original?

Unless you can point to where the law says you have to make the copy from a copy you possess, it is irrelevant.

But we do actually have precedent where copies were created out of thin air: VHS recordings of broadcasts, Sony Corp. of America v. Universal City Studios. It was actually settled on time-shifting of free-to-air material being fair use. Nobody argued that the VCR owners, having no copy before recording, did not make a copy.

No, Silverman’s argument is not that the mere possession of the work by ChatGPT violates copyright, because that question has long since been answered: the artist controls the work, not the audience. The artist cannot decide who is and is not allowed to consume the work. Regardless of how someone came to consume the work, they are fully entitled to speak about it.

I will concede that there are situations where you can just consume copyrighted material without copying it (which downloading is). That would be if I downloaded a movie and invited you to watch it, or a sports bar showing illegal streams.

My whole point is that it does not matter if you have committed copyright infringement; you can always make fair use derivative works such as reviews. I could get DVDs from a friend in the 00s and copy them to my own disc before watching the copy. That would mean I infringed even under your wrong understanding of copyright. If it was worth it, and there was evidence of it, the copyright owner would be able to successfully sue me for copying them.

He could correctly argue that my copying the disc, infringing on his copyright, was necessary for me to write a review of his movie, a derivative work. It would not matter.

I could later make another film that is inspired by the movie whose copyright I infringed upon. If the movie is not too similar, it would not itself be infringing. If it is too similar, it could be infringing, but so would a movie made by someone who committed no copyright infringement in order to watch the original.

This is what the discussion was about. AI opponents push the idea that if there was copyright infringement in the training process, any output of the AI must be infringing on, or derivative of, the original work. Which is bullshit.

I suppose you are not pro-copyright, same as me, but you are not helping any argument by making claims that are beside the point and wrong.

Rivalarrival,

But we do actually have precedent where copies were created out of thin air: VHS recordings of broadcasts, Sony Corp. of America v. Universal City Studios. It was actually settled on time-shifting of free-to-air material being fair use. Nobody argued that the VCR owners, having no copy before recording, did not make a copy.

You’re conflating the concept of “recording” with the concept of “copying”. They weren’t making a copy. They were making a recording. As your citation demonstrates, these two concepts are not the same thing.

Importantly, there is no difference in the legality of recording when we switch from an authorized broadcaster to a pirate transmitter. It is still perfectly lawful to create a recording of what was sent to you.

Even if you call up the pirate station and ask them to transmit the specific work that you want to receive, the transmitter is still exclusively responsible. Even if you call them up and ask them to retransmit those parts you didn’t receive clearly, the infringement is theirs, not yours. You are free to receive and record whatever someone wants to send you.

Downloading is recording, not copying. You are receiving and saving the work that is being transmitted to you.

azuth,

You’re conflating the concept of “recording” with the concept of “copying”. They weren’t making a copy. They were making a recording. As your citation demonstrates, these two concepts are not the same thing.

Fuck off mate, you are full of shit. The concept of recording is so different from copying that, according to my citation, the recordings are made via ‘copying devices’. It’s also immaterial. RECORDINGS could infringe, and thus the court examines whether the fair use exception applies.

I will argue with you no more, since you are dishonest.

Rivalarrival,

They are technologically similar concepts, but there are distinct legal differences. When the act is performed by a single legal entity, it is “copying”. Where the act is performed by two separate legal entities, the receiving entity is “recording”. The transmitting entity is “distributing”.

RECORDINGS could infringe

There can certainly be infringement involved in the complete act, but it is committed by the entity distributing the work without permission, not the entity receiving the work.

Recording could be infringing in certain special circumstances, where the rightsholder controls both the transmission of the work and the presence of the receiver. It could be infringement to record inside a theater, for example.

But they cannot prohibit me from putting up an antenna and recording what I hear; they cannot prohibit me from attaching a computer to a network and recording what is sent to me over that network.

Rivalarrival,

Please explain why you cannot download a movie/episode/ebook illegally and then directly derive from it.

The law does not prohibit the receiving of an unauthorized copy. The law prohibits the distribution of the unauthorized copy. It is possible to send/transmit/upload a movie/episode/ebook illegally, but the act of receiving/downloading that unauthorized copy is not prohibited and not illegal.

You can’t illegally download a movie/episode/ebook for the same reason that you can’t illegally park your car in your own garage: there is no law making it illegal.

Even if ChatGPT possesses an unauthorized copy of the work, it would only violate copyright law if it created and distributed a new copy of that work. A summary of the work would be considered a “transformative derivation”, and would fall well within the boundaries of fair-use.
