I think the skill level required to start grinding boosted trees is relatively low compared to NNs. There's lots you can do without special hardware, so it's more democratic. Training is very quick compared to NNs, and it works for both big and small data. The implementations are very polished at this stage: you can customise the loss, the splits, the base algorithm. Inference is fast. And so on. Boosted trees have a lot going for them in the black-box model space.
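For example, a custom loss in xgboost is just a function returning per-row gradients and hessians; here's a minimal sketch (the toy data and the pseudo-Huber choice are my own illustration, nothing canonical):

    import numpy as np
    import xgboost as xgb

    # Toy regression data, purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X[:, 0] + 0.1 * rng.normal(size=1000)
    dtrain = xgb.DMatrix(X, label=y)

    def pseudo_huber(predt, dtrain):
        # Custom objective: pseudo-Huber loss with delta=1.
        # Returns gradient and hessian per row, which is all xgboost needs.
        d = predt - dtrain.get_label()
        scale = 1 + d ** 2
        sqrt_scale = np.sqrt(scale)
        grad = d / sqrt_scale
        hess = 1 / (scale * sqrt_scale)
        return grad, hess

    booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
                        num_boost_round=100, obj=pseudo_huber)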
Additionally, trees have fewer, often computably optimal, hyperparameters, whereas DNNs often require extensive hyperparameter tuning and even neural architecture search (how many layers, what activations, what optimization method...). Furthermore, trees are generally more interpretable. There have been some interesting recent papers relating random forests to adaptive smoothers that try to understand why they beat DNNs on tabular data.
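A small sketch of both points, assuming lightgbm (the toy data is made up): near-default settings are usually a sane baseline, and per-feature importances come for free as a crude form of interpretability.

    import numpy as np
    import lightgbm as lgb

    # Toy data, purely illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = 3 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

    # Near-default parameters already give a reasonable baseline for trees.
    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(X, y)

    # Built-in split counts per feature: rough but free interpretability.
    print(model.feature_importances_)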
Are you saying that the prevalence of trees among well-performing solutions is not due to trees outperforming other architectures, but rather that more people try them, so they show up in the winning solutions more often simply because of that higher attempt rate?
I haven't followed prediction contests for a while because, frankly, the field has moved on (more sideways, actually, with LLMs).
When I did follow them, up until a few years ago, the winning models were ensembles of ensembles (e.g., RF is itself an ensemble). The fact that the best single models are ensembles, or evolutions of ensembles, is therefore not surprising.
When dealing with numerical data, the blood squeezed from the stone in the latter stages of a prediction competition is very rarely worth squeezing in the real world.
When the model is not mechanistic but only correlative (almost no model is purely correlative or mechanistic anyway), getting to the last decimal place of mean absolute error or a similar metric requires building an increasingly complex structure, over which we have little control, on a foundation of sand. All it takes is a little wind, such as a change in the data distribution over time (which always happens), and unstable structures are bound to collapse.
To add to that, I'd primarily consider LightGBM if you want to tweak it a lot, and CatBoost if you want great out-of-the-box results. Both are significant improvements over XGBoost, especially in training speed. I would only really consider XGBoost if you'd rather avoid software primarily developed by Microsoft or Yandex.
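To make the out-of-the-box point concrete, a rough sketch with made-up data: CatBoost consumes raw categorical columns directly via cat_features, no manual encoding needed.

    import numpy as np
    import pandas as pd
    from catboost import CatBoostClassifier

    # Hypothetical data just to show the workflow.
    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "city": rng.choice(["berlin", "paris", "rome"], size=n),
        "age": rng.integers(18, 80, size=n),
    })
    y = ((df["city"] == "paris") & (df["age"] > 40)).astype(int)

    # Defaults plus cat_features is often all you need to get going.
    model = CatBoostClassifier(iterations=200, verbose=False)
    model.fit(df, y, cat_features=["city"])
    print(model.predict_proba(df[:5]))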
All three are light-years ahead of naive random forest implementations, and all are under very active development.
Here's a simplified version of the approach (i.e. doing strong feature engineering, then converting the multivariate time series into a panel/tabular dataset and training a boosted-trees model on it), using temporian (a much-improved alternative to pandas for working with temporal data) and xgboost: https://temporian.readthedocs.io/en/stable/tutorials/m5_comp...
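The general shape of that approach, sketched with plain pandas instead of temporian and a made-up daily series (so not the tutorial's actual code):

    import numpy as np
    import pandas as pd
    import xgboost as xgb

    # Hypothetical daily sales series.
    rng = np.random.default_rng(0)
    dates = pd.date_range("2023-01-01", periods=365, freq="D")
    sales = pd.Series(100 + 10 * np.sin(np.arange(365) / 7)
                      + rng.normal(0, 5, 365), index=dates, name="sales")

    # Feature engineering: lags and rolling means turn the series into a table.
    df = pd.DataFrame({"sales": sales})
    for lag in (1, 7, 28):
        df[f"lag_{lag}"] = df["sales"].shift(lag)
    df["rolling_mean_7"] = df["sales"].shift(1).rolling(7).mean()
    df["dayofweek"] = df.index.dayofweek
    df = df.dropna()

    # Train a boosted-trees model on the tabular features.
    X, y = df.drop(columns="sales"), df["sales"]
    model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
    model.fit(X[:-28], y[:-28])        # train on all but the last 4 weeks
    preds = model.predict(X[-28:])     # predict the held-out horizon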
I used to be an xgboost bro, but these days I'm shilling for catboost. Anyway, both have lots of examples online. The truth is, without a problem that interests you there's not much reason to learn them, unless you simply find the gradient boosting algorithm elegant. Otherwise I'd take an intro machine learning course.