
Why isn't there a comparison with Llama 3 8B in the "benchmarks"?



The Llama 3 license says:

"If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights."

IANAL but my read of this is that Apple's not allowed to use Llama 3 at all, for any purposes, including comparisons.


They can just run the same tests and cite results from other websites. That has nothing to do with Meta; no company can force you not to talk about it.


The tests they ran were very different from what's usually run, mostly involving human perception of usefulness. I don't see what website they could have cited from.


I believe it is because Llama 3 8B beats it, which would make it look bad. The Phi-3-mini version they used is the 4k-context one at 3.8B parameters, while Llama 3 8B would be more comparable to Phi-3-small (7B), which is also considerably better than Phi-3-mini. Likely both Phi-3-small and Llama 3 8B scored too well relative to Apple's models to be included, since they did add other 7B models for comparison, but only ones their model beat.


Llama 3 definitely beats it, but 99% of users won't care, which is actually a good thing... Apple totally wins the AI market not by being SOTA but by the sheer number of devices that will be running its models; we're talking billions.


How is any of this good? Apple serves its captive users inferior models without giving them a choice. I don't see how that is winning the AI market either.


You may be able to make a case that Apple's model has fewer parameters or performs worse than other models on standardized tests.

That's far from being "inferior" when you are talking about tuning for specific tasks, let alone when taking into account real-world constraints, like running as an always-on local task on resource-constrained mobile devices.

Running third-party models would mean requiring them to accomplish the same tasks. Since the adapters are LoRA-based, they are not adaptable to a different base model. This pushes a lot of specialized requirements onto anyone hoping to replace the on-device portion.
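
For illustration, here's a minimal sketch of why a LoRA adapter is bound to its base model, using the Hugging Face PEFT library; the model and adapter names are hypothetical:

    # A LoRA adapter stores low-rank weight deltas for specific
    # matrices of one particular base model, so it cannot be
    # loaded onto a base with different layers or shapes.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Hypothetical base model and adapter path, for illustration only.
    base = AutoModelForCausalLM.from_pretrained("example/on-device-3b")
    model = PeftModel.from_pretrained(base, "adapters/summarization-lora")

Swapping in a different base would leave the adapter's deltas with nothing matching to attach to.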

This is different from, say, externally hosted models such as their announced ChatGPT integration. They announced an intention to integrate with other providers, but it is not yet clear how that is intended to work (none of this has been released yet, even in alpha form).


Because their model wouldn't look good in the comparison. Also see this part of the footnote: "The open-source and Apple models are evaluated in bfloat16 precision." The end user's on-device experience will be with a quantized model, not the bfloat16 one.
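
To make the distinction concrete, here is a rough sketch of the gap between the evaluated and shipped configurations, using the transformers and bitsandbytes APIs; the model name is hypothetical:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # What the footnote says was benchmarked: full bfloat16 weights.
    evaluated = AutoModelForCausalLM.from_pretrained(
        "example/on-device-3b", torch_dtype=torch.bfloat16
    )

    # Closer to what ships on-device: 4-bit quantized weights,
    # which cut memory use at some cost in accuracy.
    shipped = AutoModelForCausalLM.from_pretrained(
        "example/on-device-3b",
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    )

Benchmark numbers from the first configuration can overstate what users of the second actually get.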


I think it’s fair to leave it out of the on-device model comparison. 3B is much smaller than 8B; it is obviously not going to be as good as Llama 3 unless they made groundbreaking advancements with the technology.


Maybe it’s too new for them to have had time to include it in their studies?


Phi-3-mini, which is in the benchmarks, was released after Llama 3 8B.


Llama 3 8B is really, really good. Maybe it makes Apple's models look bad? Or it could be a licensing thing where Apple can't use Llama 3 at all, even just for benchmarking and comparison.

The license for the Llama models was basically designed to stop Apple, Microsoft, and Google from using it.



