
I do not have a horse in the race, but it is interesting to see open source comparisons to traditional timeseries strategies: https://github.com/Nixtla/nixtla/tree/main/experiments/amazo...

In general, the M-Competitions (https://forecasters.org/resources/time-series-data/), the Olympics of time series forecasting, have proven frustrating for ML methods... linear models do shockingly well, and the ML models that have won generally seem to be variants of older tree-based methods (e.g. LightGBM is a favorite).
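
For anyone curious what that looks like, here's a minimal sketch of the usual tree-based setup: lagged values as features, fed to LightGBM. The series and hyperparameters are all made up for illustration, not from any competition entry.

    # Sketch: one-step-ahead forecasting with LightGBM on lagged features.
    # Synthetic series and hyperparameters are illustrative only.
    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    y = np.sin(np.arange(500) / 10) + rng.normal(0, 0.1, 500)  # toy series

    n_lags = 12
    # Each row: the previous n_lags values; target: the next value.
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]

    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(X[:-50], target[:-50])   # hold out the last 50 points
    pred = model.predict(X[-50:])      # one-step-ahead forecasts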

Will be interesting to see whether the Transformer architecture ends up making real progress here.



They are comparing a non-ensembled transformer model with an ensemble of simple linear models. It's not surprising that an ensemble of linear time series models does well, since ensembling directly targets the bias-variance trade-off.
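
As a toy illustration of that variance-reduction effect (made-up data, nothing from the benchmark): fit linear models on bootstrap resamples and average their predictions.

    # Sketch: bagging linear models to reduce variance (toy data).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.utils import resample

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + rng.normal(0, 0.5, 200)

    models = []
    for _ in range(50):                 # 50 bootstrap resamples
        Xb, yb = resample(X, y)
        models.append(LinearRegression().fit(Xb, yb))

    X_new = rng.normal(size=(10, 5))
    # The averaged prediction has lower variance than any single fit.
    ensemble_pred = np.mean([m.predict(X_new) for m in models], axis=0)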

Transformer/ML models by themselves have a tendency to overfit past patterns. They pick up more signal, but they also pick up spurious patterns. They're low bias but high variance.

It would be more interesting to compare an ensemble of transformer models with an ensemble of linear models to see which is more accurate.

(That said, it's pretty impressive that an ensemble of simple linear models can beat a large-scale transformer model -- it suggests the domain being forecast has a high degree of variance, which transformer models on their own don't handle well.)


fyi I think you have bias and variance the wrong way around. Over-fitting indicates high variance


Thank you for catching that. Corrected.


> ensemble of transformer models

Isn't that just dropout?


No. Why do you think so?


Geoffrey Hinton describes dropout that way. It's like you're training a different net each time the dropout mask changes.


Dropout is different from an ensemble; it is a regularization method.

It might look like an ensemble because each pass samples a different subset of units, but a true ensemble combines independently trained models rather than subnetworks sharing one set of weights.
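
A quick PyTorch sketch of the distinction (toy network, purely illustrative): dropout resamples a mask over one shared set of weights, while an ensemble averages independently trained models.

    # Sketch of dropout vs. an explicit ensemble (illustrative only).
    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                        nn.Dropout(p=0.5), nn.Linear(16, 1))
    x = torch.randn(4, 8)

    # Dropout: one set of weights, a different random mask per forward pass.
    net.train()
    out1, out2 = net(x), net(x)   # differ only in the sampled mask

    # Ensemble: independently initialized (and, in practice, independently
    # trained) models whose predictions are averaged.
    members = [nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
               for _ in range(5)]
    avg = torch.stack([m(x) for m in members]).mean(dim=0)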


That said, random forests are an internal ensemble, so I guess that could work.

In my mind an ensemble is like a committee: for it to be effective, each member should be independent (able to pick up different signals) and have a better-than-random chance of being correct.


I am aware it is not literally an ensemble model, but Geoffrey Hinton says it achieves the same thing conceptually and practically.


Are these models high-risk because of their lack of interpretability? Specialized models like temporal fusion transformers attempt to solve this, but in practice I'm seeing folks get torn apart defending transformers in front of model risk committees, at least within organizations mature enough to have them.


Interpretability is just one pillar to satisfy in AI governance. You have to build submodels to assist with interpreting the black-box main prediction models.
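
A common pattern here is a global surrogate: train an interpretable model on the black-box model's own predictions and inspect that. Rough sketch with made-up data and stand-in models:

    # Sketch: global surrogate for a black-box predictor (illustrative).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor  # stand-in black box
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 4))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 500)

    black_box = GradientBoostingRegressor().fit(X, y)

    # The surrogate learns to mimic the black box; we then read its rules.
    surrogate = DecisionTreeRegressor(max_depth=3).fit(X, black_box.predict(X))
    print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))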


Is there a way to directly train transformer models to output embeddings that could help tree-based models downstream? For tabular data, tree-based models seem to be the best, but I feel like foundation models could help them in some way.
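
I don't know of a standard recipe, but the naive version would be: pool a pretrained transformer's hidden states into a fixed-length vector per row and hand that to the tree model. Everything below (the checkpoint, the row serialization, mean pooling) is an assumption for illustration, not a known-good pipeline:

    # Sketch: pretrained transformer embeddings as tree-model features.
    # "bert-base-uncased" and mean pooling are arbitrary illustrative choices.
    import numpy as np
    import torch
    import lightgbm as lgb
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc = AutoModel.from_pretrained("bert-base-uncased")

    # Toy tabular rows serialized as text (hypothetical scheme).
    texts = [f"feature_a={i % 7} feature_b={i % 3}" for i in range(100)]
    labels = np.arange(100) % 7 + np.random.default_rng(3).normal(0, 0.1, 100)

    with torch.no_grad():
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state   # (batch, seq_len, dim)
        emb = hidden.mean(dim=1).numpy()          # mean-pool to (batch, dim)

    booster = lgb.LGBMRegressor(n_estimators=100).fit(emb, labels)

Training the transformer end-to-end toward the tree's objective is harder, since trees aren't differentiable; you'd need some differentiable proxy objective for the embeddings.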



