illegally using Anna's Archive, The Pile, Common Crawl, their own crawl, Books2, LibGen, etc., embedding it all into a high-dimensional space, and doing next-token prediction on it.
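
For anyone unsure what "embed it and do next-token prediction" means mechanically, here is a rough sketch of the objective in plain Python. Everything in it is an assumption made for illustration: the function name `next_token_loss`, the toy sentence, the 8-dimensional embeddings, and the random untrained weights. Real models condition on a long context through a transformer and train the weights by gradient descent; this only shows the loss being computed from the previous token alone.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(tokens, dim=8, seed=0):
    """Average cross-entropy of predicting each token from the one before it.

    Hypothetical helper: weights are random and untrained.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    # Embedding table: one dim-dimensional vector per vocabulary entry.
    embed = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in vocab]
    # Output projection: scores every vocabulary entry from an embedding.
    out = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in vocab]
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        h = embed[index[prev]]                       # embed the previous token
        logits = [sum(w * x for w, x in zip(row, h)) for row in out]
        probs = softmax(logits)                      # distribution over next token
        total += -math.log(probs[index[nxt]])        # penalty for the true next token
    return total / (len(tokens) - 1)

tokens = "the cat sat on the mat".split()
loss = next_token_loss(tokens)
print(f"average next-token loss: {loss:.3f}")
```

With untrained weights the loss sits close to the log of the vocabulary size, i.e. a uniform guess; training is the process of pushing that number down across the whole corpus.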