
This talks about TensorFlow, and I’ve been looking at scikit-learn’s random forest regression.

I have about one million rows of tabular data, with 15 features, to make price predictions.

Is there a definitively better choice between the two?



Absolutely, you should look at XGBoost, LightGBM, and CatBoost.

XGBoost is the OG (the original of the three) and the most feature-rich.

LightGBM is the fastest and what I use for my case (millions of rows of data with over 100 features).

CatBoost could be good depending on the nature of your data, for example if you have a lot of categorical types.

EDIT: Also, they all support GPU training, but I haven't been able to make that faster than just using more CPU cores.


To add to that, I'd primarily consider LightGBM if you want to tweak it a lot, and CatBoost if you want great out-of-the-box results. Both are significant improvements over XGBoost, especially in training speed. I would only really consider XGBoost if you don't want software primarily developed by either Microsoft or Yandex.

All three are light-years ahead of naive random forest implementations, and all are under very active development.


Super helpful, thank you!


What exactly is the problem with scikit-learn's random forest?


Very, very helpful – thanks Brad!





