(1) Some people speculate, based on response timings, that GPT-4 is actually an ensemble of models, with a supervisory model selecting which one answers a given request.
(2) GPT4All combines improved training data with LoRA fine-tuning of a GPT-J base model to build something that competes with GPT-4 at a fraction of the size and parameter count (a rough sketch of this style of fine-tuning appears at the end of this section).
(3) The emergence of new capabilities correlates strongly with parameter count. Transformers at roughly 6.7B parameters appear to be where the first such emergence occurs, coinciding with a global structural change in the hidden layers (the so-called “phase change”). There will be a strong incentive to train larger models to identify additional capabilities.
The first and second observations imply that federal regulation of A100-class GPUs would be ineffective at stopping the replication of existing capabilities, since capable models can already be fine-tuned and run on modest hardware. However, the third observation suggests that such regulation would be highly effective at preventing untrusted parties from discovering novel and potentially more disruptive capabilities, because those discoveries appear to require training ever-larger models on large GPU clusters.
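To make point (2) concrete, here is a minimal sketch of LoRA fine-tuning applied to a GPT-J base model using the Hugging Face transformers and peft libraries. This is not GPT4All's actual training pipeline; the base model identifier is real, but the LoRA rank, target modules, and other hyperparameters are illustrative assumptions.

```python
# Minimal LoRA fine-tuning sketch (illustrative, not GPT4All's actual script).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA injects small low-rank adapter matrices into the attention projections,
# so only a tiny fraction of the ~6B parameters is actually trained.
lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumed value)
    lora_alpha=16,                        # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],  # GPT-J attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, the usual causal-LM training loop (e.g. transformers.Trainer)
# would run over the curated instruction dataset.
```

The point of the sketch is that this kind of fine-tuning fits on a single consumer or prosumer GPU, which is what makes the argument about hardware regulation relevant.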