> Google codebase is arguably one of the most curated, and iterated on datasets in existence
I spent 12 years of my career in the Google codebase.
This assertion is technically correct in that google3 has been around for 20 years, and all code gets reviewed, but the implication that Google's codebase is a high-quality training set is not consistent with my experience.