Looks like you might be replying out of context. The parent comment asked why their Mac doesn't feel thousands of times faster than earlier models, because they'd misinterpreted the marketing claims.
However, the marketing claims did not state an across-the-board weighted performance increase over the M4, and reading them one certainly would not assume a gain that large. Instead, the claims cite gains on specific benchmarks relevant to common modern workflows such as inference. The closest stated benchmark to general-purpose computing is the multicore CPU performance increase, which the marketing puts at 15% over the M4.
As for that large leap in GPU-driven AI performance, it comes from the inclusion of a "Neural Accelerator" in each GPU core, an M5-specific addition similar to changes introduced in the A19 SoC.
Their "peak GPU compute performance for AI" is quite different from your unqualified "performance". I don't know what figures they're quoting, but something stupid like supporting 4-bit floats while the predecessor only supported down to 16-bit floats could easily deliver "over 4x peak GPU compute performance for AI" (measured in FLOPS) without actually making the hardware significantly faster.
Did they claim 4x peak GPU compute going from the M3 to M4? Or M2 to M3? Can you link to these claims? Are you sure they weren't boasting about other metrics being improved by some multiplier? Not every metric is the same, and different metrics don't necessarily stack with each other.
I wish we could get something other than Geekbench for these things, since Geekbench seems to be trash. For example, it has the Ryzen 7 7700X with a higher multi-core score than the Epyc 9534 even though they're both Zen4 and the latter has 8 times as many cores and is significantly faster on threaded workloads in real life.
That's what the single thread score is supposed to be for. The multi-thread score is supposed to tell you how the thing performs on the many real workloads that are embarrassingly parallel.
Suppose I'm trying to decide whether to buy a 32-core system with a lower base clock or a 24-core system with a higher base clock. What good is it to tell me that both of them are the same speed as the 8-core system because they have the same boost clock and the "multi-core" benchmark doesn't actually use most of the cores?
The only valid benchmark for that is the application you actually intend to run. Even embarrassingly parallel problems can have different characteristics depending on their use of memory and caches and the thermal behavior of the CPU. Something that uses only L1 cache and registers will probably scale almost linearly with the number of cores, aside from thermal effects. Something that touches L2, L3, or even main memory will scale sublinearly.
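If you want to see the difference yourself, here is a minimal sketch (toy kernels, arbitrary sizes, assumes numpy is installed; not a real benchmark suite):

    # Toy comparison of a register-bound kernel vs. a memory-streaming kernel,
    # scaled across worker processes. The first tends to scale close to
    # linearly with cores; the second flattens once the shared memory
    # bandwidth is saturated.
    import time
    import numpy as np
    from multiprocessing import Pool

    def compute_bound(_):
        # Tight arithmetic on a scalar: no cache or bandwidth pressure.
        x = 1.0
        for _ in range(5_000_000):
            x = x * 1.0000001 + 0.5
        return x

    def memory_bound(_):
        # Streams roughly 200 MB through main memory per call.
        a = np.ones(25_000_000)
        return float(a.sum())

    def jobs_per_second(fn, workers):
        start = time.perf_counter()
        with Pool(workers) as pool:
            pool.map(fn, range(workers))
        return workers / (time.perf_counter() - start)

    if __name__ == "__main__":
        for fn in (compute_bound, memory_bound):
            for w in (1, 2, 4, 8):
                print(f"{fn.__name__:13s} x{w}: {jobs_per_second(fn, w):6.2f} jobs/s")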
You're essentially just arguing that all general-purpose benchmarks are worthless because your application could be different.
Suppose I run many different kinds of applications and am just looking for an overall score to provide a general idea of how two machines compare with one another. That's supposed to be the purpose of these benchmarks, isn't it? But this one seems to be unusually useless at distinguishing between various machines with more than a small number of cores.
Your analysis is also incorrect for many of these systems. Each core may have its own L2 cache and each core complex may have its own L3, so systems with more core complexes don't inherently have more contention for caches because they also have more caches. Likewise, systems with more cores often also have more memory bandwidth, so the amount of bandwidth per core isn't inherently less than it is in systems with fewer cores, and in some cases it's actually more, e.g. a HEDT processor may have twice as many cores but four times as many memory channels.
General-purpose benchmarks aren't worthless. They can be used to predict, in very broad strokes, what application performance might be. Especially if you don't really know what the applications would be, or if it is too tedious to use real application benchmarks.
But in your example, deciding between 24 cores at a somewhat higher frequency or 32 cores at a somewhat lower frequency based on some general-purpose benchmark is essentially pointless. The difference will be small enough that only a real application benchmark can tell you what you need to know. A general-purpose benchmark will be no better than a coin toss, because the exact workings of the benchmark, the weighting of its components into a score, and the exact hardware you are running on will have interactions that determine the outcome to a far greater extent.

You are right that there could be shared or separate caches, shared or separate memory channels. The benchmark might exercise those, or it might not. It might heat certain parts of the die more than others. It might just be the epitome of embarrassingly parallel benchmarks, BogoMIPS, a busy loop that does no useful work. The predictive value of the general-purpose benchmark is nil in those cases.

The variability from the benchmark maker's choices will always introduce a bias and therefore a measurement uncertainty, and what you are trying to measure is usually smaller than that uncertainty. Therefore: no better than a coin toss.
You're just back to arguing that general purpose benchmarks are worthless again. Yes, they're not as applicable to the performance of a specific application as testing that application in particular, but you don't always have a specific application in mind. Many systems run a wide variety of different applications.
And a benchmark can then provide a reasonable cross-section of different applications. Or it can yield scores that don't reflect real-world performance differences, implying that it's poorly designed.
I attempted to do this and discovered an irregularity.
Many of the systems claiming to have that CPU were actually VMs assigned some arbitrary subset of the cores. Moreover, a VM can report any CPU it wants as long as the underlying hardware supports the same instruction set, so an unknown number of them could have been running on different physical hardware, including systems that use e.g. Zen4c instead of Zen4, since those expose the same instructions.
If they're just taking all of those submissions and averaging them to get a combined score it's no wonder the results are nonsense. And VMs can claim to be non-server CPUs too:
The multi-core score listed in the main results page for EPYC 9534 is 15433, but if you look at the individual results, the ones that aren't VMs with fewer than all the cores typically get a multi-core score in the 20k-25k range, e.g.:
What does that have to do with the scores being wrong? As mentioned, virtual machines can claim to be consumer CPUs too, while running on hardware with slower cores than the ones in the claimed CPU.
That doesn't make any sense. Many of the applications are identical, e.g. developer workstations and CI servers are both compiling code, video editing workstations and render farms are both processing video. A lot of the hardware is all but indistinguishable; Epyc and Threadripper have similar core counts and even use the same core complexes.
The only real distinction is between high end systems and low end systems, but that's exactly what a benchmark should be able to usefully compare because people want to know what a higher price tag would buy them.
For >99% of people looking to compile code or render video on an M5 laptop, what matters is wall-clock time running bare metal, with all IO going to a fast NVMe SSD, where even a large job will only thermally throttle for a bit and then recover.
Most people looking to optimize EPYC compile or render performance care about running inside VMs, with all IO going to SANs, assuming there is enough work that you can yield to other jobs to increase throughput, and ideally running near thermal equilibrium.
Will the base core count and the mix of performance and efficiency cores remain the same? That has led to different scaling factors for multicore performance than for the single-core metrics.
Possibly, at least compared to the previous M4 generation. For the lowest tier M models to this point:
M1 (any): 4P + 4E
M2 (any): 4P + 4E
M3 (any): 4P + 4E
M4 (iPad): 3P + 6E
M4 (Mac): 4P + 6E
M5 (iPad): 3P + 6E (claimed)
M5 (Mac): Unknown
It's worth noting there are often higher tier models that still don't earn the "Pro" moniker. E.g. there is a 4P + 8E variant of the iMac which is still marketed as just having a normal M4.
The die shrinks are less than the marketing numbers would make you believe, but the cores are getting significantly more complex. I think E cores had a 50% cache increase this generation, as an example.
The above summary also excludes the GPU, which seems to have gotten the most attention this generation (~+30%, even more in AI workloads).
strawberry -> DeepSeek, GeminiPro and ChatGPT4o all correctly said three
strawberrry -> DeepSeek, GeminiPro and ChatGPT4o all correctly said four
stawberrry -> DeepSeek and GeminiPro both correctly said three
ChatGPT4o, even in a new chat, incorrectly said the word "stawberrry" contains 4 letter "r" characters. It even provided this useful breakdown to let me know :-)
Breakdown:
stawberrry → s, t, a, w, b, e, r, r, r, y → 4 r's
And then it asked if I meant "strawberry" instead because, it said, that one has 2 r's....
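For what it's worth, the ground truth is trivial to check:

    for w in ("strawberry", "strawberrry", "stawberrry"):
        print(w, w.count("r"))
    # strawberry 3
    # strawberrry 4
    # stawberrry 3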
The crazy thing about the definition of NP-completeness is that Cook's theorem says that all problems in NP can be reduced in polynomial time to an NP-complete problem. So if a witness to a problem can be verified in polynomial time, it is by definition in NP and can be reduced to an NP-complete problem.
If I can verify a solution to this problem by finding a path in polynomial time, it is by definition in NP. The goal here was to present an example of a problem known to not be in NP.
> The crazy thing about the definition of NP-completeness is that Cook's theorem says that all problems in NP can be reduced in polynomial time to an NP-complete problem.
What were you trying to say here? Cook's theorem says that SAT is NP-complete. "All problems in NP can be reduced in polynomial time to an NP-complete problem" is just a part of the definition of NP-completeness.
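For reference, the standard statements (my phrasing, not a quote from either comment):

    % A language L is NP-complete iff
    %   (1) L is in NP, and
    %   (2) every language in NP reduces to L in polynomial time.
    \[
      L \text{ is NP-complete} \iff L \in \mathrm{NP} \;\wedge\; \forall L' \in \mathrm{NP}:\ L' \le_p L
    \]
    % Cook-Levin theorem: SAT is NP-complete, i.e. SAT satisfies (1) and (2).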
Yes, but for at least some of us, the confetti appeared on the round where we used our last guess. So we are wondering how the commenter knows there weren't more rounds. Did you get confetti on a round where you know you guessed correctly?
Edit: I just played again and am confident that my guess before the confetti was correct.