> One thing I'm keen to understand is: how well does attention hold across huge context sizes, with respect to the usual transformer models, and also these proposed RNN models?
We won't know before we've tried it. Reasoning by analogy with humans is not useful. In a year we'll have tried lots of things and can give a much better answer.