
This talks about TensorFlow, and I’ve been looking at scikit-learn’s random forest regression.

I have about one million rows of tabular data, with 15 features, to make price predictions.

Is there a definitively better choice between the two?



Absolutely, you should look at XGBoost, LightGBM, and CatBoost.

XGBoost is the OG (the original of the three) and the most feature-rich.

LightGBM is the fastest and what I use for my case (millions of rows of data with over 100 features).

CatBoost could be good depending on the nature of your data, for example if you have a lot of categorical types.

EDIT: Also, they all support GPU training, but I haven't been able to make that faster than just using more CPU cores.


To add to that, I'd primarily consider LightGBM if you want to tweak it a lot, and CatBoost if you want great out-of-the-box results. Both are significant improvements over XGBoost, especially in training speed. I would only really consider XGBoost if you don't want software primarily developed by either Microsoft or Yandex.

All three are light-years ahead of naive random forest implementations, and all are under very active development.


Super helpful, thank you!


What exactly is the problem with scikit-learn's random forest?


Very, very helpful – thanks Brad!





