Sorry but it is actually a huge problem for the US if the DeepSeek models are able to train on sorta-illegal dumps of scientific papers and US models aren't. The ones that are paywalled by scientific journals.
Everyone WILL start using hosted frontier Chinese models if they are demonstrably better at answering scientific questions than ChatGPT, sending essentially all US research questions into a Chinese data dump. This is even worse than the national security catastrophe that is TikTok (even aside from the EVEN BIGGER issue that China will have models that are staggeringly better than those in the US, because they are up to date on the science).
I understand the reflexivity against AI companies "stealing content" but we need to stay competitive and figure out the financial compensation later. This is not a case where our unbelievably generous copyright laws should take precedence over US competitiveness.
Everyone WILL start using hosted frontier Chinese models if they are demonstrably better at answering scientific questions than ChatGPT, sending essentially all US research questions into a Chinese data dump. This is even worse than the national security catastrophe that is TikTok (even aside from the EVEN BIGGER issue that China will have models that are staggeringly better than those in the US, because they are up to date on the science).
I understand the reflexivity against AI companies "stealing content" but we need to stay competitive and figure out the financial compensation later. This is not a case where our unbelievably generous copyright laws should take precedence over US competitiveness.