> researchers, not ML engineers in a FAANG Why did you point out this distinctio...

nestorD · on Feb 16, 2024

It means they have significantly fewer resources (to get the many GPUs needed to scale up context length) and are likely less well-versed in optimization (which also helps with scaling up)[0].

I believe those two things together are likely enough to explain the difference between a 1M context length and a 10M context length.

[0]: This is not looking down on that particular research team; the vast majority of people have fewer resources and less optimization know-how than Google.

vineyardmike · on Feb 16, 2024

Probably to indicate that it's research and not productized?