
One reason is that there is no "good" search engine in China. The most popular one, Baidu, is like garbage compared to Google search. The most useful training data in Chinese would likely be from the social media and video sharing platforms, which I guess are much more difficult to crawl and clean up.


A few thousand years of literature ain’t nothing…


Peanuts compared to the discourse available on the internet.

The literature that survived thousands of years is the cream of the crop; you won't find lots of random unimportant dialog between people from thousands of years ago, but you do find that on Reddit.


Given premodern population sizes and literacy rates, historical texts probably don't exist in anything like the quantity that internet posts do. Even if they did, the information may not be relevant to the modern world.


> The most popular one, Baidu, is like garbage compared to Google search

It must be very bad when you see the walking turd that Google search has become over the years…


It is. In Chinese-speaking countries where Google is available, no one uses Baidu.


There's only ONE* Chinese-speaking country, at least if you only count those that have a Chinese-speaking majority population or use Chinese as the official language.

* for various interpretations of one.


Chinese is one of the official languages of Singapore.


Do any of those countries have a good relationship with China and/or countries from there?


Singapore has a pretty good relationship with China (with all Chinas, actually). And we have plenty of Chinese speakers, too. I'm not sure how prevalent Baidu is, however.



