
I do not have a horse in the race, but it is interesting to see open source comparisons to traditional timeseries strategies: https://github.com/Nixtla/nixtla/tree/main/experiments/amazo...

In general, the M-Competitions (https://forecasters.org/resources/time-series-data/), the Olympics of time series forecasting, have proven frustrating for ML methods... linear models do shockingly well, and the ML models that have won generally seem to be variants of older tree-based methods (e.g. LightGBM is a favorite).
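
For anyone curious what that looks like, here's a minimal sketch of the usual tree-based setup: lagged values as features, fed to LightGBM. The series and hyperparameters are all made up for illustration, not from any competition entry.

    # Sketch: one-step-ahead forecasting with LightGBM on lagged features.
    # Synthetic series and hyperparameters are illustrative only.
    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    y = np.sin(np.arange(500) / 10) + rng.normal(0, 0.1, 500)  # toy series

    n_lags = 12
    # Each row: the previous n_lags values; target: the next value.
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]

    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(X[:-50], target[:-50])   # hold out the last 50 points
    pred = model.predict(X[-50:])      # one-step-ahead forecasts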

Will be interesting to see whether the Transformer architecture ends up making real progress here.



They are comparing a non-ensembled transformer model with an ensemble of simple linear models. It's not surprising that an ensemble of linear time series models does well, since ensembling directly targets the bias-variance trade-off.
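
As a toy illustration of that variance-reduction effect (made-up data, nothing from the benchmark): fit linear models on bootstrap resamples and average their predictions.

    # Sketch: bagging linear models to reduce variance (toy data).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.utils import resample

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + rng.normal(0, 0.5, 200)

    models = []
    for _ in range(50):                 # 50 bootstrap resamples
        Xb, yb = resample(X, y)
        models.append(LinearRegression().fit(Xb, yb))

    X_new = rng.normal(size=(10, 5))
    # The averaged prediction has lower variance than any single fit.
    ensemble_pred = np.mean([m.predict(X_new) for m in models], axis=0)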

Transformer/ML models by themselves have a tendency to overfit past patterns. They pick up more signal, but they also pick up spurious patterns. They're low bias but high variance.

It would be more interesting to compare an ensemble of transformer models with an ensemble of linear models to see which is more accurate.

(That said, it's pretty impressive that an ensemble of simple linear models can beat a large-scale transformer model -- it suggests the domain being forecast has a high degree of variance, which transformer models on their own don't handle well.)


fyi I think you have bias and variance the wrong way around. Over-fitting indicates high variance


Thank you for catching that. Corrected.


> ensemble of transformer models

Isn't that just dropout?


No. Why do you think so?


Geoffrey Hinton describes dropout that way. It's like you're training a different net each time the dropout mask changes.


Dropout is different from an ensemble; it is a regularization method.

It might look like an ensemble because each pass samples a different subset of units, but a true ensemble combines independently trained models rather than subnetworks sharing one set of weights.
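
A quick PyTorch sketch of the distinction (toy network, purely illustrative): dropout resamples a mask over one shared set of weights, while an ensemble averages independently trained models.

    # Sketch of dropout vs. an explicit ensemble (illustrative only).
    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                        nn.Dropout(p=0.5), nn.Linear(16, 1))
    x = torch.randn(4, 8)

    # Dropout: one set of weights, a different random mask per forward pass.
    net.train()
    out1, out2 = net(x), net(x)   # differ only in the sampled mask

    # Ensemble: independently initialized (and, in practice, independently
    # trained) models whose predictions are averaged.
    members = [nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
               for _ in range(5)]
    avg = torch.stack([m(x) for m in members]).mean(dim=0)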


That said, random forests are an internal ensemble, so I guess that could work.

In my mind an ensemble is like a committee: for it to be effective, each member should be independent (able to pick up different signals) and have a better-than-random chance of being correct.


I am aware it is not literally an ensemble model, but Geoffrey Hinton says it achieves the same thing conceptually and practically.


Are these models high-risk because of their lack of interpretability? Specialized models like temporal fusion transformers attempt to solve this, but in practice I'm seeing folks get torn apart defending transformers in front of model risk committees, at least within organizations mature enough to have them.


Interpretability is just one pillar to satisfy in AI governance. You have to build submodels to assist with interpreting the black-box main prediction models.
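
A common pattern here is a global surrogate: train an interpretable model on the black-box model's own predictions and inspect that. Rough sketch with made-up data and stand-in models:

    # Sketch: global surrogate for a black-box predictor (illustrative).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor  # stand-in black box
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 4))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 500)

    black_box = GradientBoostingRegressor().fit(X, y)

    # The surrogate learns to mimic the black box; we then read its rules.
    surrogate = DecisionTreeRegressor(max_depth=3).fit(X, black_box.predict(X))
    print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))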


Is there a way to directly train transformer models to output embeddings that could help tree-based models downstream? For tabular data, tree-based models seem to be the best, but I feel like foundation models could help them in some way.
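
I don't know of a standard recipe, but the naive version would be: pool a pretrained transformer's hidden states into a fixed-length vector per row and hand that to the tree model. Everything below (the checkpoint, the row serialization, mean pooling) is an assumption for illustration, not a known-good pipeline:

    # Sketch: pretrained transformer embeddings as tree-model features.
    # "bert-base-uncased" and mean pooling are arbitrary illustrative choices.
    import numpy as np
    import torch
    import lightgbm as lgb
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc = AutoModel.from_pretrained("bert-base-uncased")

    # Toy tabular rows serialized as text (hypothetical scheme).
    texts = [f"feature_a={i % 7} feature_b={i % 3}" for i in range(100)]
    labels = np.arange(100) % 7 + np.random.default_rng(3).normal(0, 0.1, 100)

    with torch.no_grad():
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state   # (batch, seq_len, dim)
        emb = hidden.mean(dim=1).numpy()          # mean-pool to (batch, dim)

    booster = lgb.LGBMRegressor(n_estimators=100).fit(emb, labels)

Training the transformer end-to-end toward the tree's objective is harder, since trees aren't differentiable; you'd need some differentiable proxy objective for the embeddings.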



