From what I can tell, all the large players in the space are continuing to develop on transformers, right? Is it just that Mamba is too new, or is the architecture fundamentally not usable for some reason?
Too new is definitely one thing. Someone is going to have to gamble on actually paying for a serious pretraining run with this architecture before we know how it really stacks up against transformers.
There are some papers suggesting that transformers are better than SSMs in fundamental ways (e.g. SSMs cannot do arbitrary key-based recall from their context: https://arxiv.org/abs/2402.01032). This means switching over is not a no-brainer.
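To make the recall point concrete, here's a toy version of that kind of task (just a sketch; the exact setup in the paper differs, and the names here are mine):

    # Toy "associative recall" probe: the model sees key-value pairs, then a
    # query key, and must emit the matching value. A transformer can attend
    # straight back to the key token; a fixed-size SSM state has to have
    # already kept that pair around.
    import random

    def make_example(n_pairs=16, vocab=list("abcdefghijklmnop")):
        keys = random.sample(vocab, n_pairs)
        values = [random.choice(vocab) for _ in keys]
        query = random.choice(keys)
        answer = values[keys.index(query)]
        prompt = " ".join(f"{k}:{v}" for k, v in zip(keys, values)) + f" | {query}:"
        return prompt, answer

    prompt, answer = make_example()
    print(prompt)   # e.g. "c:f a:b ... | a:"
    print(answer)   # e.g. "b"

A transformer can look the key up at query time; an SSM has to have squeezed the pair into its fixed-size state, which is roughly the limitation the paper formalizes.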
Another element is that Mamba requires a very custom implementation, down to hand-fused CUDA kernels, which I expect would need to be supported in DeepSpeed or an equivalent library for a larger training run spanning thousands of GPUs.
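For a rough sense of why the kernels matter, here's a naive reference version of the selective-scan recurrence (not the real implementation; shapes and parameter handling are my assumptions, with A/B/C already discretized per step):

    import torch

    def naive_selective_scan(x, A, B, C):
        # x: (batch, seq, d_inner)
        # A, B: (batch, seq, d_inner, d_state)   C: (batch, seq, d_state)
        batch, seq, d_inner = x.shape
        d_state = A.shape[-1]
        h = torch.zeros(batch, d_inner, d_state)
        ys = []
        for t in range(seq):
            # state update: h_t = A_t * h_{t-1} + B_t * x_t
            h = A[:, t] * h + B[:, t] * x[:, t].unsqueeze(-1)
            # readout: y_t = C_t · h_t
            ys.append((h * C[:, t].unsqueeze(1)).sum(-1))
        return torch.stack(ys, dim=1)

The update is inherently sequential over the time dimension, so a plain loop like this is hopeless at scale; the speed in the paper comes from a hardware-aware scan kernel that keeps the state in SRAM, and that kind of low-level work tends to need redoing for every serious training stack.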
It's a reasonably safe bet that Together is doing, or will do, a serious pretraining run with Mamba; if that succeeds, other players might start taking it more seriously.
Exactly this. Except there is zero chance they just looked at Mamba and went "meh, too new for us". People are definitely trying stuff. It takes a lot of fiddling around with a brand new model architecture to get something working well. OpenAI aren't going to give a running commentary on the state of all the things they are looking into.