I believe it is because Llama 3 8B beats it, which would make it look bad. The phi-3-mini version they used is the 4k-context one, which is 3.8B parameters, while Llama 3 8B would be more comparable to phi-3-small (7B), which is also considerably better than phi-3-mini. Likely both phi-3-small and Llama 3 8B scored too well against Apple's model to be included, since they did add other 7B models for comparison, but only ones their model beat.
llama 3 definitely beats it, but 99% of users won't care, which is actually a good thing... apple totally wins the AI market not by being SOTA but by the sheer number of devices that will be running their models; we're talking billions
How is any of this good? Apple serves its captive users inferior models without giving them a choice. I don't see how that is winning the AI market either.
You may be able to make a case that Apple's model has fewer parameters or performs worse than other models on standardized benchmarks.
That's far from being "inferior" when you are talking about tuning for specific tasks, let alone when taking into account real-world constraints, like running as an always-on local task on resource-constrained mobile devices.
Running third-party models would mean requiring them to accomplish the same tasks. Since the adapters are LoRA-based, they are not transferable to a different base model. This pushes a lot of specialized requirements onto anyone hoping to replace the on-device portion.
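A rough sketch of why that is, assuming a generic transformer weight matrix (the shapes, names, and numpy usage here are illustrative, not Apple's actual stack): a LoRA adapter stores low-rank delta factors trained against one specific base model's weights, so applying them to a different base is meaningless at best and shape-incompatible at worst.

```python
# Minimal sketch: a LoRA adapter is a low-rank delta on a specific base model's
# weights. Shapes and values below are illustrative, not Apple's implementation.
import numpy as np

hidden = 2048
rank = 16

# Base model A: one projection matrix the adapter was trained against.
W_base_a = np.random.randn(hidden, hidden)

# The adapter is just two low-rank factors learned relative to W_base_a.
lora_A = np.random.randn(rank, hidden)   # down-projection
lora_B = np.random.randn(hidden, rank)   # up-projection

# At inference the adapted weight is W + (B @ A); the delta only makes sense
# as an offset from the exact weights it was trained on.
W_adapted = W_base_a + (lora_B @ lora_A)

# Base model B with a different hidden size: the shapes don't even line up,
# and even if they did, the learned delta would be noise for other weights.
W_base_b = np.random.randn(4096, 4096)
try:
    W_base_b + (lora_B @ lora_A)
except ValueError as e:
    print("adapter incompatible with a different base:", e)
```

So a third-party replacement model couldn't reuse Apple's per-feature adapters; it would need its own fine-tunes for every task the system adapters cover.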
This is different from externally hosted models, such as the announced ChatGPT integration. Apple announced an intention to integrate with other providers, but it is not yet clear how that is intended to work (none of this is released yet, even in alpha form).