I feel like this is one of the better points in the thread.
The asymmetry in copyright law, where large corporations can enforce their copyright to the point of breaking the law themselves, is absolute bullshit (YouTube's Content ID is another example: not a legal mechanism, but still very impactful).
Unfortunately, I think that if training ML models on Internet data is found not to be fair use, things will get harder for individuals training models, while corporations will be barely inconvenienced: they can afford to pay for sources, make deals with other large institutions for data, etc.