Emad tweeted "Goin to train a 3B model on 3T tokens" last month. These 800B checkpoints are just early alpha training checkpoints.
The full training set is 1.5T currently and will likely grow.
Emad tweeted "Goin to train a 3B model on 3T tokens" last month. These 800B checkpoints are just early alpha training checkpoints.
The full training set is 1.5T currently and will likely grow.