
We can wait for that to start appearing in tests or benchmarks first.


I’m currently in Louisville and the smoke trail was visible for miles.

Here’s hoping that anyone injured makes a full recovery.


Not quite. The announcement mentions that:

“M5 delivers over 4x the peak GPU compute performance for AI”

In this situation, at least, it’s just referring to AI compute power.


Looks like you might be replying out of context. The parent comment asked why their Mac doesn't feel thousands of times faster than earlier models, having misinterpreted the marketing claims.

However, the marketing claims did not state an across-the-board performance increase over the M4, and one certainly would not assume a leap that large from reading them. Instead, the claims cite performance gains on specific benchmarks relevant to common modern workflows such as inference. The closest stated benchmark to general-purpose computing is the multicore CPU performance increase, which the marketing puts at 15% over the M4.

As for that large leap in GPU-driven AI performance, it comes from the inclusion of a "Neural Accelerator" in each GPU core, an M5-specific addition similar to changes introduced in the A19 SoC.


Their "peak GPU compute performance for AI" is quite different from your unqualified "performance". I don't know what figures they're quoting, but something stupid like supporting 4-bit floats while the predecessor only supported down to 16-bit floats could easily deliver "over 4x peak GPU compute performance for AI" (measured in FLOPS) without actually making the hardware significantly faster.

Did they claim 4x peak GPU compute going from the M3 to M4? Or M2 to M3? Can you link to these claims? Are you sure they weren't boasting about other metrics being improved by some multiplier? Not every metric is the same, and different metrics don't necessarily stack with each other.


Much of this is probably down to optimized transformer kernels.


Here’s the multi-core Geekbench progression:

  M1: 8350
  M2: 9700
  M3: 11650
  M4: 14600
  M5: 16650 (estimated)
This works out to roughly a 14% uplift over the M4. Also nice.
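
For what it's worth, a quick sketch of the generation-over-generation uplift implied by those scores:

  scores = {"M1": 8350, "M2": 9700, "M3": 11650, "M4": 14600, "M5": 16650}
  gens = list(scores)
  for prev, cur in zip(gens, gens[1:]):
      print(f"{prev} -> {cur}: +{scores[cur] / scores[prev] - 1:.0%}")
  # M1 -> M2: +16%, M2 -> M3: +20%, M3 -> M4: +25%, M4 -> M5: +14%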


I wish we could get something other than Geekbench for these things, since Geekbench seems to be trash. For example, it gives the Ryzen 7 7700X a higher multi-core score than the EPYC 9534, even though they're both Zen4 and the latter has 8 times as many cores and is significantly faster on threaded workloads in real life.


There's real value in having a multi-threaded benchmark that doesn't ignore Amdahl's Law and pretend that everything is embarrassingly parallel.


That's what the single thread score is supposed to be for. The multi-thread score is supposed to tell you how the thing performs on the many real workloads that are embarrassingly parallel.

Suppose I'm trying to decide whether to buy a 32-core system with a lower base clock or a 24-core system with a higher base clock. What good is it to tell me that both of them are the same speed as the 8-core system because they have the same boost clock and the "multi-core" benchmark doesn't actually use most of the cores?
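
For what it's worth, a minimal Amdahl's Law sketch (core counts and clocks here are hypothetical) shows how that comparison hinges on the workload's parallel fraction p, which is exactly what a benchmark that ignores most of the cores can't tell you:

  def throughput(cores, clock_ghz, p):
      # Amdahl's Law: speedup over one core = 1 / ((1 - p) + p / n),
      # scaled by clock as a crude proxy for per-core speed.
      return clock_ghz / ((1 - p) + p / cores)
  
  for p in (0.5, 0.95, 1.0):
      a = throughput(32, 3.0, p)  # more cores, lower clock
      b = throughput(24, 3.6, p)  # fewer cores, higher clock
      print(f"p={p}: {'32-core' if a > b else '24-core'} wins")
  # p=0.5: 24-core wins, p=0.95: 24-core wins, p=1.0: 32-core wins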


The only valid benchmark for that is to use the application you intend to run as the benchmark. Even embarrassingly parallel problems can have different characteristics depending on their use of memory and caches, and on the thermal characteristics of the CPU. Something that uses only L1 cache and registers will probably scale almost linearly with the number of cores, except for thermal influences. Something that uses the L2 or L3 caches, or even main memory, will scale sublinearly.


You're essentially just arguing that all general-purpose benchmarks are worthless because your application could be different.

Suppose I run many different kinds of applications and am just looking for an overall score to provide a general idea of how two machines compare with one another. That's supposed to be the purpose of these benchmarks, isn't it? But this one seems to be unusually useless at distinguishing between various machines with more than a small number of cores.

Your analysis is also incorrect for many of these systems. Each core may have its own L2 cache and each core complex may have its own L3, so systems with more core complexes don't inherently have more contention for caches because they also have more caches. Likewise, systems with more cores often also have more memory bandwidth, so the amount of bandwidth per core isn't inherently less than it is in systems with fewer cores, and in some cases it's actually more, e.g. a HEDT processor may have twice as many cores but four times as many memory channels.


General-purpose benchmarks aren't worthless. They can be used to predict, in very broad strokes, what application performance might be. Especially if you don't really know what the applications would be, or if it is too tedious to use real application benchmarks.

But in your example, deciding between 24 cores at a somewhat higher frequency or 32 cores at a somewhat lower frequency based on some general-purpose benchmark is essentially pointless. The difference will be small enough that only a real application benchmark can tell you what you need to know. A general-purpose benchmark will be no better than a coin toss, because the exact workings of the benchmark, the weightings of its components into a score, and the exact hardware you are running on will have interactions that determine the decision to a far greater degree. You are right that there could be shared or separate caches, shared or separate memory channels. The benchmark might exercise those, or it might not. It might heat certain parts of the die more than others. It might just be the epitome of embarrassingly parallel benchmarks, BogoMIPS, which is a loop executing NOPs. The predictive value of the general-purpose benchmark is nil in those cases. The variability from the benchmark maker's choices will always introduce a bias, and therefore a measurement uncertainty, and what you are trying to measure is usually smaller than that uncertainty. Therefore: no better than a coin toss.
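
As a toy illustration of how the benchmark maker's weighting choices alone can flip such a comparison (all sub-scores hypothetical):

  import math
  
  machine_a = {"serial": 120, "parallel": 80}
  machine_b = {"serial": 90, "parallel": 115}
  
  def score(sub, weights):
      # Weighted geometric mean: prod(x_i ** w_i), weights summing to 1.
      return math.prod(v ** weights[k] for k, v in sub.items())
  
  even   = {"serial": 0.5, "parallel": 0.5}
  tilted = {"serial": 0.8, "parallel": 0.2}
  print(score(machine_a, even)   > score(machine_b, even))    # False: B "wins"
  print(score(machine_a, tilted) > score(machine_b, tilted))  # True:  A "wins"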


You're just back to arguing that general purpose benchmarks are worthless again. Yes, they're not as applicable to the performance of a specific application as testing that application in particular, but you don't always have a specific application in mind. Many systems run a wide variety of different applications.

And a benchmark can then provide a reasonable cross-section of different applications. Or it can yield scores that don't reflect real-world performance differences, implying that it's poorly designed.


The trick with Geekbench is to scroll down and look at the specific sub-benchmarks that are most relevant to you.


I attempted to do this and discovered an irregularity.

Many of the systems claiming to have that CPU were actually VMs assigned fewer than the full number of cores. Moreover, VMs can report any CPU they want as long as the underlying hardware supports the same set of instructions, so unknown numbers of them could have been running on different physical hardware, including on systems that use, e.g., Zen4c instead of Zen4, since the two provide the same set of instructions.

If they're just taking all of those submissions and averaging them to get a combined score it's no wonder the results are nonsense. And VMs can claim to be non-server CPUs too:

https://browser.geekbench.com/v6/cpu/search?utf8=%E2%9C%93&q...

Are they actually averaging these into the results they show everyone?


Relax. No one worth their salt uses the average. They tend to pick the highest or near highest to compare.


The multi-core score listed on the main results page for the EPYC 9534 is 15433, but if you look at the individual results, the ones that aren't VMs with fewer than all the cores typically get a multi-core score in the 20k-25k range, e.g.:

https://browser.geekbench.com/v6/cpu/6807094

https://browser.geekbench.com/v6/cpu/9507365

The ones on actual hardware with lower scores typically have comments like "Core Performance Boost Off":

https://browser.geekbench.com/v6/cpu/1809232

And that's still a higher score than the one listed on the main page.


No one is using Geekbench 6 for Epyc evaluation. It’s a consumer benchmark.


What does that have to do with the scores being wrong? As mentioned, virtual machines can claim to be consumer CPUs too, while running on hardware with slower cores than the ones in the claimed CPU.


Yeah, a simple SPECint run or built-in Python benchmarks would be way more interesting than a proprietary "benchmark" with mystery tasks.
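
For example, a minimal, fully transparent benchmark needs nothing but the standard library (the workload here is an arbitrary placeholder; swap in whatever resembles your real use):

  import timeit
  
  def workload():
      # Trivial and inspectable -- unlike mystery tasks, you can see
      # exactly what is being measured.
      return sum(i * i for i in range(10_000))
  
  best = min(timeit.repeat(workload, number=1_000, repeat=5))
  print(f"best of 5: {best:.3f}s per 1000 iterations")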


Any benchmark useful for cross-comparing the single-user desktop/laptop experience is going to be useless in the datacentre, and vice versa.


That doesn't make any sense. Many of the applications are identical, e.g. developer workstations and CI servers are both compiling code, video editing workstations and render farms are both processing video. A lot of the hardware is all but indistinguishable; Epyc and Threadripper have similar core counts and even use the same core complexes.

The only real distinction is between high end systems and low end systems, but that's exactly what a benchmark should be able to usefully compare because people want to know what a higher price tag would buy them.


For >99% of people looking to compile code or render video on an M5 laptop, what matters is wall-clock time running bare metal, with all IO going to a fast NVMe SSD, where even a large job will only thermally throttle for a bit and then recover.

Most people looking to optimize EPYC compile or render performance care about running inside VMs, with all IO going to SANs, assuming there is enough work that you can yield to other jobs to increase throughput, ideally running near thermal equilibrium.


Just use xmrig. Smashes all cores.


For reference, I have an M4 Pro Mac mini, the top-spec model with 14 cores, and it scores:

  single: 3960
  multi: 22521


I think he is showing the base CPU comparison for the Mac mini/MacBooks. There are so many M-series multicore variants that it is hard to mention them all.


Will the base core count and the mix between performance and efficiency cores remain the same? That has led to different scaling factors for multicore performance than for the single-core metrics.


Possibly, at least compared to the previous M4 generation. For the lowest tier M models to this point:

  M1 (any):  4P + 4E
  M2 (any):  4P + 4E
  M3 (any):  4P + 4E
  M4 (iPad): 3P + 6E
  M4 (Mac):  4P + 6E
  M5 (iPad): 3P + 6E (claimed)
  M5 (Mac):  Unknown
It's worth noting there are often higher tier models that still don't earn the "Pro" moniker. E.g. there is a 4P + 8E variant of the iMac which is still marketed as just having a normal M4.


Are these cores getting way more complex? Because there should be room for 2x - 3x as many cores with die shrinks at this point.


The die shrinks are less than the marketing numbers would make you believe, but the cores are getting significantly more complex. I think E cores had a 50% cache increase this generation, as an example.

The above summary also excludes the GPU, which seems to have gotten the most attention this generation (~+30%, even more in AI workloads).


If you get more space, you need a really good reason not to use that space on more cache.

Also, the process node size numbers are lies and aren't the actual size of anything.


Can confirm. I’m currently on Mint and have used Tello before. They both offer great service.


Just tried that canard on GPT-4o and it failed:

"The word "strawberry" contains 2 letter r’s."


I tried

strawberry -> DeepSeek, GeminiPro and ChatGPT4o all correctly said three

strawberrry -> DeepSeek, GeminiPro and ChatGPT4o all correctly said four

stawberrry -> DeepSeek and GeminiPro both correctly said three

ChatGPT4o, even in a new chat, incorrectly said the word "stawberrry" contains 4 "r" characters. It even provided this useful breakdown to let me know :-)

Breakdown: stawberrry → s, t, a, w, b, e, r, r, r, y → 4 r's

And then it asked if I meant "strawberry" instead, because that one has 2 r's....
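
The counting itself is trivial in code, of course; the models stumble because they see tokens rather than letters:

  for w in ("strawberry", "strawberrry", "stawberrry"):
      print(w, w.count("r"))
  # strawberry 3, strawberrry 4, stawberrry 3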


A sequence is easy to verify. Choosing the sequence, not so much.
Roughly put, that is the certificate definition of being in NP.
Roughly put that is the certificate definition of being in NP.


The goal here was to show that it was strictly NP-hard, i.e. harder than any problem in NP.


Harder to solve, not necessarily harder to verify?

If I am understanding things right.


The crazy thing about the definition of NP-completeness is that Cook's theorem says that all problems in NP can be reduced in polynomial time to an NP-complete problem. So if a witness to a problem can be verified in polynomial time, it is by definition in NP and can be reduced to an NP-complete problem.

If I can verify a solution to this problem by finding a path in polynomial time, it is by definition in NP. The goal here was to present an example of a problem known to not be in NP.


> The crazy thing about the definition of NP-completeness is that Cook's theorem says that all problems in NP can be reduced in polynomial time to an NP-complete problem.

What were you trying to say here? Cook's theorem says that SAT is NP-complete. "All problems in NP can be reduced in polynomial time to an NP-complete problem" is just a part of the definition of NP-completeness.


The game beeps if you get a guess incorrect.


Yes, but for at least some of us, the confetti appeared on the round where we used our last guess. So we are wondering how the commenter knows there weren't more rounds. Did you get confetti on a round where you know you guessed correctly?

Edit: I just played again and am confident that my guess before the confetti was correct.


I definitely guessed the last one correctly, because it was obvious to me, as were nearly all of them.

I'd have to go back through it and deliberately fail on the last one to confirm, but I assumed I'd get the error beep on the last one if it was wrong.


BioCLIP won the CVPR best student paper award: https://cvpr.thecvf.com/Conferences/2024/News/Awards

From talking with Sam, I know they're working on BioCLIPv2. So expect even better results sometime soon.


For me that wasn't the case. I took antibiotics as directed by my dentist and it caused years worth of issues.


Dairy intolerance?


To clarify, I was on the medication for six months.

It caused a gut condition that involved nausea, insomnia, and made it difficult to concentrate.

Psychologically, it also made me feel as if the rug had been pulled out from under me - I didn't know what I was doing anymore.


How did you manage to go back to normalcy?

