As others point out, it's essentially distillation of a larger model to a smaller one. But you're right, it doesn't work very well. Phi's performance is high on benchmarks but not nearly as good in actual real world usage. It is extremely overfit on a narrow range of topics in a narrow format.