I'm interested in the copyright issues arising from generative AI, and I would like to hear some opinions on the following perspective. I'm not an expert on copyright, and I'm thinking of copyright here in a roughly 50% moral, 50% legal sense, since copyright, and especially fair use, is quite vaguely defined legally.
Let's broadly distinguish 3 types of copyright concerns with genAI:
1. training
2. commercial distribution
3. end-user
I think that, while some disagree, many people could agree that merely training on copyrighted works, without any commercial or non-commercial distribution of the results, is more or less fine. Similarly, most people would agree that the end-user also has some role to play with regard to copyright.
I think the main point causing trouble right now is the commercial distribution of genAI works: users give OpenAI (or any other provider) some money and a specification, and OpenAI gives them a work according to that specification.
I think this process can be implemented in a way that is just as copyright-okay as Google image search.
If I run a query on Google Images, it returns some samples from the distribution of reasonable images, conditioned on my query. If I do the same with genAI, essentially the same thing happens, except that the distribution is somewhat smoothed out, approximated, latent-space interpolated, or whatever you want to call it. In that sense, genAI as search is fundamentally more original and less infringing than Google Images.
Why doesn't Google Images cause copyright outrage? Because it makes no claims whatsoever about ownership of rights. If you pay an artist for a digital painting, they create an original work and then sell you the rights to it. Google Images does not sell you any rights, nor does it claim to own them. It just shows you a snapshot of a region of the distribution of reasonable images.
As far as I know, this is where genAI and search have differed so far. I think OpenAI currently does claim to own the rights to generated works and to sell them to you, which amounts to copyright infringement by OpenAI. However, if they changed their terms and conditions to remove that claim, then I think at least the distribution aspect of genAI, commercial or non-commercial, would be copyright-okay.
Note: I think some people would say Google Images is copyright-okay mainly because it usually gives you sources for the images it shows. In that case, consider the following: a paid-per-query image search that returns images only, with no links and no text. Is that okay? Personally, I think that since Google gives no guarantee that its links actually lead to the correct source of an image or to its copyright owner, the presence of links should not shield it from copyright considerations. So, since Google Images is okay, this paid image search is also okay.
Consent is as big a problem as copyright.
Training data includes intellectual property that isn't legally publicly available online and for which consent was not provided, and that's before even getting into the commercial aspect.
If I publish a piece of information on the internet with the intent of it reaching and helping people, I might consent to its use if the goal is to further access to that information, rather than to let a major corp profit from my hard work without sharing any of that profit with anyone but its shareholders and employees.
A song or a piece of art? No, I personally would not consent to my work being assimilated, not just because it's exploitative when both the AI service provider and its customer make money from my work without having purchased a license, but because they never asked for my consent and did it anyway.
Until people understand the incredibly complex topic of consent, every other conversation lacks the necessary context.