It would be good to see some independent verification of this claim. HN has previously [1] fallen for a claim by the same author to have reduced llama.cpp memory usage for a dense model way below the size of the model, which should have failed a basic smell test and indeed was debunked shortly after. Justine Tunney appears to enjoy extreme superstar status here, and it's hard to overstate the degree of social pressure that needed to be overcome at the time for the skeptic position to reach fixation (to begin with, what other LLM developments even hit upvote numbers like the +1300ish there or the +712 here at the time of writing?).
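To spell out the smell test: dense inference reads every weight for every generated token, so resident memory cannot stay far below the size of the weights. A back-of-envelope sketch (assuming the roughly 4.5 bits/weight q4 quantization of the ~20GB LLaMA 30B file from that story):

    #include <stdio.h>

    int main(void) {
        /* assumed figures from the original story: 30B params, ~4.5 bits/weight */
        double weights_gb = 30e9 * 4.5 / 8 / 1e9;  /* ~17 GB, consistent with the ~20GB file */
        printf("weights: ~%.0f GB\n", weights_gb);
        /* dense inference touches all of them for every token, so a 6GB
           residency claim implies most of the file is never read */
        return 0;
    }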
> Justine Tunney appears to enjoy extreme superstar status here
This is true, and for sure pretty much all humans can benefit from increased skepticism (though not cynicism), but that superstar status was earned through numerous impressive works. Cosmopolitan C and Actually Portable Executable alone were worthy of significant respect, and for many people (like myself) they were our first introduction.
Speaking only for myself, I have a high opinion of Justine on technical merits. I'm sure she makes mistakes like all humans. I can tell she gets excited by discoveries and the chase, and that probably does sometimes cause premature celebration (this is something I struggle with too, so it's recognizable to me, haha), but being wrong sometimes doesn't erase the times you're right, and she has been spectacularly right far more often than most people I know.
There have been some personality clashes between Justine and others at times, unfortunately in situations where only part (sometimes a small part) of what happened was public, meaning we can only take people's word for it. Given my ignorance, I choose to withhold judgment here, but even if I didn't (and assumed she was guilty), it wouldn't change the technical merits, and it certainly wouldn't dissuade me from seeing what she's working on now.
So when I see stuff from Justine come out like this, it gets my attention. Would it get my attention if the same thing were posted by somebody whose name I don't recognize? Likely not, but I think that is (unfortunately) part of being a human. We aren't capable (yet!) of evaluating everything on technical merit alone because the sheer volume of material far exceeds our time. Therefore we use other (less reliable) signalling mechanisms to decide quickly what is worthy of our time investment and what may not be. Reputation/name recognition is a highly imperfect, but better-than-random-chance, indicator.
I don't know; my first (and main) impression of them was actually in the context of the llama.cpp mmap story, as I was somewhat involved in the project back then, and there I thought their impact on the project was predominantly negative. While they introduced a mildly beneficial change (mmap-based model loading), the way in which this was done was not healthy for the project. The changes were rammed through with little regard for concerns that existed at the time about backwards compatibility and edge cases that might be broken by the half-baked patch. Justine came across as self-aggrandizing (in the sense of "acting as if they ran the place", presenting their proposals as a plan that others must follow rather than as suggestions) and overly eager to claim credit - epitomized by the injection of their own initials into the file format's magic number next to those of the project originator, and by the story of the hapless other author of the mmap changeset, who was at first given a token acknowledgement but then quickly sidelined. Arguments for the inclusion of the patch seemed to be won by a combination of half- and untruths (like those about memory savings) and the sudden participation of a large number of previously uninvolved sycophants. It is fortunate that Georgi handled the fallout as well as he did, and that he had in fact amassed the social capital necessary to survive his heavy-handed solution (soft-banning both JT and their most prominent detractor). A less successful project would probably have found itself captured or torn apart by the drama.
There is nothing wrong with holding people in esteem for their achievements, but in this case the degree of esteem really seems to be excessive. This is not a matter of simply being annoyed that people like "the wrong thing" - the mmap situation was significantly exacerbated by the presence of irrational/excessive supporters of Justine's as well as the irrational/excessive detractors that emerge wherever the former exist.
I would like to know more about the mmap situation, as what I saw on the surface could warrant some concern. Being somewhat involved, you would probably know better than I, as I was just an observer reading the thread after the fact. It seemed like the biggest accusation was the plagiarism (or "collaborating" while mostly taking somebody else's code).
Did anybody besides the two parties see the code develop, or does anybody else have knowledge of this? Or is it just his word vs. hers? Do you have any suggested reading to get more perspective other than just the github thread and HN thread? (really asking. these aren't rhetorical questions)
Reading the thread, I do think there are a lot of opportunities to read in confirmation bias. For example, if I start reading that thread with the idea that Justine is coming in to hijack the project and make herself the hero that it needs and deserves, and to get her initials embedded in there as a permanent tribute to her own glory, I can see that. But if I read it as her coming in with cool work that she's excited about, who had to come up with a new format, couldn't think of a name (naming things can be really hard), and just stuck in one of the first things that came to mind (or even used it as a placeholder prior to discussion), I can see that as well.
I absolutely don't want the truth covered up, but I also don't want to accept as true things that aren't true, especially where the implications are toward somebody's character. I'm a big "benefit of the doubt" kind of person.
My sense is that the part about credit/collaboration was actually somewhat overblown among the detractors. What roughly happened, as far as I can remember, is that JT and another person worked on mmap together with about equal contribution, though the other person might have been the one to initiate the idea (and solicit help to push it through); then at some point JT decided to make a PR to the main repository in their own name, crediting the other collaborator as a coauthor, which may or may not have been coordinated with that person. After that, though, in fairly characteristic fashion, JT started fielding adulatory questions from their fans (on Github, but also on HN, Twitter and possibly other media) about the change, and quickly switched to simply referring to it as their own, with no mention of the other contributor. The other contributor expressed some misgivings about having their contribution erased, which were picked up by a growing set of people who were generally resentful of JT's conduct in the project. As far as I can tell, when confronted about it, JT at no point explicitly denied what the other person did (and I think the commit logs should all still be there in the fork), but at some point the other person simply decided to stop pushing the issue, being uncomfortable with becoming a pawn in the fandom war between JT fans and antis.
My personal main gripe with JT really was the tone they adopted in the Github discussions, and the effect of the large numbers of drive-by supporters, who were often far less restrained in both their unfounded claims about Justine's accomplishments and their attacks on any critics. (At this point I'd also like to note that I consider some sibling comments to be uncomfortably hostile in a personal way, like the "hit piece" one.) I think that as a public persona, especially one who actively pursues publicity, you have some responsibility to restrain your followers. Justine, I get the sense, instead uses them as deniable proxies, as seen in the instances where, instead of straight up putting their signature on the "RAM usage reduced to 6GB" claim, they chose to post a collage of screenshots of supporters making it.
This could all be true, but it's hard to evaluate these claims on their own. Not being involved in any way, all I can do is conclude that there is some friction in that community. It's possible that JT is toxic, it's possible that you are toxic, it's possible that neither of you is generally toxic but something about your personalities causes your interactions to become toxic, and it's even possible that neither of you was toxic in any way but your impression of things after the fact is as if Tunney had been toxic. Sometimes one has to stop and think about these things and figure out how to smooth things over, and sometimes it's not possible to smooth things over.
I didn't have any direct interactions with JT then or now - while it was hard to ignore the discussion as an onlooker, it did not touch upon any parts of the code that I was involved with. This seems to be one of the topics where everyone who is even tangentially involved is under a default suspicion of being biased in one direction or another.
>This is true, and for sure pretty much all humans can benefit from increased skepticism (though not cynicism), but that superstar status is achieved from numerous impressive works.
It is achieved through a never-ending parade of self-aggrandizement.
What Justine is very good at is presenting trivial concepts from a world which few front end developers understand in a language that most front end developers understand.
I had the misfortune of having to find out about her because of how thoroughly she polluted the Google search space for Lisp with her implementation of SectorLISP. For some reason Google decided that SectorLISP needed to be in the top 5 results for every query about `minimal lisp with quotation`, even though quotation wasn't implemented in her version.
> presenting trivial concepts from a world which few front end developers understand in a language that most front end developers understand
Completely ignoring the JT discussion, the argument that something is trivial in some area does not really hold. 1) Science is mostly "just" connecting the dots, and 2) landmark discoveries tend to look trivial in hindsight almost by definition, because they have to be straightforward enough to be widely adopted.
I am also quite impressed by Tunney's technical chops - Cosmopolitan C blew my mind - but, as with others, am somewhat put off by the self-aggrandizing, self-satisfied tone and I-know-best attitude that are always on display. Maybe it's a cultural or generational thing? My younger coworkers tended to sound like this, and tended to minimize others' contributions, which seems to be the case with the mmap() situation.
This comment reads like real scientific skepticism, but from my recollection of events, it is more of a hit piece that takes what should be a technical discussion and drags in a bunch of personal baggage. In particular:
> HN has previously fallen for a claim by the same author to have reduced llama.cpp memory usage for a dense model way below the size of the model,
is not true at all. Someone else made the claims about 6GB RAM usage for a 30B model. I remember reading it at the time and thinking "Yeah, that doesn't make sense, but the loading time improvement is immense!" And it was - I run all my LLMs locally on CPU because I don't have dedicated hardware, and jart's work has improved usability a lot.
> and it's hard to overstate the degree of social pressure that needed to be overcome at the time for the skeptic position to reach fixation
I was reading the same HN discussions you were at the time, and it was pretty trivial to see that the loading time claim held up, and the RAM claim was dubious and likely simply due to not understanding some effect of the change completely. Heck, jart's own discussion of the topic reflected this at the time.
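For those who weren't following at the time, the "effect of the change" in question was most likely just how mmap interacts with memory accounting. A minimal sketch of the two loading strategies (not llama.cpp's actual loader; error handling and short reads omitted for brevity):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* read(): every byte is copied into anonymous memory, so the full
       model size is charged to the process's RSS immediately. */
    void *load_with_read(const char *path, size_t *size) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        *size = st.st_size;
        void *buf = malloc(*size);
        read(fd, buf, *size);
        close(fd);
        return buf;
    }

    /* mmap(): pages are faulted in lazily and stay backed by the shared,
       evictable page cache, so htop-style RSS numbers can look dramatically
       smaller even though the same bytes get read eventually. */
    void *load_with_mmap(const char *path, size_t *size) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        *size = st.st_size;
        void *buf = mmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  /* the mapping survives the close */
        return buf; /* nothing is resident yet */
    }

    int main(int argc, char **argv) {
        if (argc < 2) return 1;
        size_t size;
        void *weights = load_with_mmap(argv[1], &size);
        /* at this point RSS has barely moved, yet "the model is loaded" */
        printf("mapped %zu bytes at %p\n", size, weights);
        return 0;
    }

This also explains the loading-time win: on a second run the file is still in the page cache, so "loading" is nearly instant.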
For the current change, I feel like your comment is even more misplaced. The blog post linked to for this story has a huge amount of detail about performance on specific processors (Skylake, Alderlake, RPi5/4, M2 Ultra, and 7995WX) with specific models. So when you say:
> It would be good to see some independent verification of this claim.
What I hear is "4bpp thinks there's a real risk the numbers in the linked post are fabricated, and jart is just trying to get attention."
And that doesn't seem reasonable at all, given the history of her work and the evidence in front of us.
The loading time improvements largely held up, and on balance the mmap contribution was ultimately good (though the way it was implemented was really quite problematic, as a matter of process and communication). However, as I point out in https://news.ycombinator.com/item?id=39894542, JT quite unambiguously did try to cash in on the "low memory usage" claim - uncritically reprinting positive claims by others about your own work that would otherwise have been largely invisible should really not be treated differently from making those claims yourself.
I do think that there is a real risk that the numbers are wrong (not necessarily "fabricated", as that implies malfeasance, but possibly based on an erroneous measurement insufficiently questioned due to an excess of trust from themselves and others, as the mmap ones were). This is also partly based on the circumstance that at the time of the mmap story, when I was more involved in the project, I was actually working on optimising the SIMD linear algebra code, and unless llama.cpp has since switched to a significantly less performant implementation, the proposition that so much more performance could be squeezed out strikes me as quite surprising. Here, your intuitions may say that Justine Tunney is just so brilliant that they make the seemingly impossible possible; but it was exactly this attitude that at the time made it so hard to evaluate the mmap memory usage claims rationally, and that made the discussion around them far more dysfunctional than it had to be.
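For context on what "the SIMD linear algebra code" means here: the hot path of CPU inference is matrix multiplication built out of fused multiply-add loops, roughly like this generic AVX2 float32 dot product (a sketch only, compiled with -mavx2 -mfma; llama.cpp's actual kernels are tiled, unrolled, and specialised per quantization format):

    #include <immintrin.h>

    /* Generic AVX2 dot product, the building block of a matrix-vector
       product; n is assumed to be a multiple of 8 for brevity. */
    static float dot_avx2(const float *a, const float *b, long n) {
        __m256 acc = _mm256_setzero_ps();
        for (long i = 0; i < n; i += 8)
            acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                                  _mm256_loadu_ps(b + i), acc);
        /* horizontal sum of the 8 accumulator lanes */
        __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                              _mm256_extractf128_ps(acc, 1));
        s = _mm_hadd_ps(s, s);
        s = _mm_hadd_ps(s, s);
        return _mm_cvtss_f32(s);
    }

Once the kernels look like this, single-token generation tends to be bound by memory bandwidth rather than arithmetic, which is why big further gains there would surprise me; prompt processing has more arithmetic intensity and correspondingly more headroom.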
All the core llama.cpp devs are superstar devs, 10x devs, or whatever you want to call a super smart person who is also super productive and very good with applied calculus. Jart is very apparently smart, but their relationship with this project has not been without turbulence, and at present they (jart) are not a core dev of llama.cpp. So for a while, many of their (I'd like to write "her", but I'm not sure if that's correct) actions have seemed aimed at getting attention, and perhaps particularly the attention of those same folks.
By contrast, ggerganov, slaren, and JohannesGaessler seem to have never chased this sensationalist superstar status; they let their work speak for them instead. You'll barely find comments by these people on HN, while jart every so often finds a way to manifest themselves on HN. And this behaviour on jart's part now bears fruit - for example, Phoronix's Michael Larabel will praise jart for their work on llamafile, absolutely obliterating the fact that it is largely based on the wonderful work of ggerganov et al.
When they claimed to drastically improve memory utilization through the use of memory maps, despite not actually doing so, and then started a huge controversy which derailed the project, I would say they were a 0.1x dev, not a 10x dev.
>HN has previously [1] fallen for a claim by the same author to have reduced llama.cpp memory usage for a dense model way below the size of the model, which should have failed a basic smell test and indeed was debunked shortly after.
Where did Justine claim this? The link you provided is Justine saying that she doesn't have an explanation for the reduction in RAM and that readers shouldn't treat it as fact yet:
>The loading time performance has been a huge win for usability, and folks have been having the most wonderful reactions after using this change. But we don't have a compelling enough theory yet to explain the RAM usage miracle. So please don't get too excited just yet! Yes things are getting more awesome, but like all things in science a small amount of healthy skepticism is warranted.
Was the link supposed to show the false claim or the debunking of the claim?
Plenty of claims about it, e.g. here, stated as "fact": https://github.com/ggerganov/llama.cpp/discussions/638#discu.... I don't think occasional expressions of lingering doubt (still couched in positive language, like calling it a "miracle") can offset all the self-promotion that clearly seeks to maximise the visibility of the implausible claim, even as it is attributed to others, as for example in https://twitter.com/JustineTunney/status/1641881145104297985... . A cereal manufacturer would probably be held responsible for package text like "Fruity Loops cured my cancer! - John, 52, Kalamazoo" too.
Where's the 30B-in-6GB claim? A ^F for "GB" in your GH link finds [0], which is neither by jart nor by ggerganov but by another user, who promptly gets told to look at [1], where Justine denies that claim.
These all postdate the discussions that I linked (from March 31st). By April 1st, JT themselves seem to have stopped making/boosting the claim about low memory usage.
I don't read that as a claim of fact at all. From the link you shared:
>Now, since my change is so new, it's possible my theory is wrong and this is just a bug. I don't actually understand the inner workings of LLaMA 30B well enough to know why it's sparse.
I haven't followed her work closely, but based on the links you shared, she sounds like she's doing the opposite of self-promotion and outrageous claims. She's sharing the fact that she's observed an improvement while also disclosing that it could be experimental error. That's how open-source development is supposed to work.
So far, I have seen several extreme claims of Justine's turn out to be true (cosmopolitan libc, APE, and llamafile all work as advertised), so I have a higher regard for Justine than for the average developer.
You've claimed that Justine makes unwarranted claims, but the evidence you've shared doesn't support that accusation, so I have a lower regard for your claims than for the average HN user's.
> I'm glad you're happy with the fact that LLaMA 30B (a 20gb file) can be evaluated with only 4gb of memory usage!
The line you quoted occurs in a context where it is also implied that the low memory usage is a fact, and that there might only be a bug insofar as the model is being evaluated incorrectly. This is what is entailed by the assertion that it "is" sparse: that is, that a big fraction of the parameters are not actually required to perform inference on the model.
I think you are making a lot of soup from very little meat. I read those links the same way mtlynch read them. I think you're looking for a perfection of phrasing that is much more suited to peer-reviewed academic papers than random tweets and GitHub comments taken from the middle of exploring something. Seeing your initial comment and knowing little about the situation, I was entirely prepared to share your skepticism. But at this point I'm much more skeptical of you.
You can simply check the pull request on llama.cpp on Github. JohannesGaessler (a core maintainer) has already run the code and says it's an impressive speed-up. There isn't a thorough review by any of the core maintainers yet, but this is very likely exactly what Justine says it is: various significant and insignificant speedups.
I think it's very helpful for someone to point out that the source has been shown to be unreliable before, and we should wait for more verification from others knowledgable in the space.
Agreed. I think there's a blurry gray line between pointing out a potentially unreliable source and a lazy dismissal, but if there's reasonable doubt I think it's good for HN. If the doubt isn't reasonable, it will be torn apart by other commenters, and then there's an explicit discussion that people can read and decide on.
If you give such comments a lot of credence without doing your own verification, then you open yourself up to what is essentially a social denial-of-service attack.
It's really popular online. I think that's because many people here read a lot of this content but don't actually have the skill or background to do analysis. So they give us history rather than examination. Which has some value, I suppose.
I was also surprised that she continues to mention the mmap thing in a positive light even after the facts about the claim were settled to the contrary, even disregarding the whole attribution fiasco.
Unfortunately, I've pulled and built the PR branches and have only seen about a 5% speed increase on a modern zen4 EPYC system. Hardly front-page worthy news.
It's too bad there doesn't seem to be anyone else in this thread trying to actually replicate the results and evaluate these claims on their merits.
[1] https://news.ycombinator.com/item?id=35393284