If they felt any ethical responsibility, they would at least tame the bidding war that lets a well-paid ad for an existing, unrelated business appear before the legitimate link, or limit it so that the legitimate link still shows up on the first page of results.
For certain popular sites, it doesn't. Those businesses have to pay the shelf tax if they want their published piece to ever be found, not just seen, but actually found, when someone searches for it specifically.
Have you run the walk-through to reproduce it? They provide a highly detailed step-by-step document, and they welcome issues being raised if reproduction doesn't yield the claimed results within 2%.
It's OK to call out fake claims. But that requires going through the process when doing so is reasonable, and here it seems to take only a couple of hours to find out.
The fake claim here is compression. The results in the repo are likely real, but they're done by running the full transformer teacher model every time. This doesn't achieve anything novel.
That's not how the method works... The full transformer is only needed once to extract the activation fields. That step can even be done offline. Then the teacher can be discarded entirely. The compression result refers to the size of the learned field representation and the small student head that operates directly on it. Simple. No fake claim there. Inference with the student does not involve the transformer at all.
If you look at the student-only scripts in the repo, those runs never load the teacher. That's the novel part.
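To make the two-phase pipeline being described concrete, here is a minimal sketch of the idea as I understand it. All names (ToyTeacher, StudentHead, extract_fields) are hypothetical placeholders, not the repo's actual code or API; it only illustrates the structure where the teacher runs once to produce activation "fields" and a small head then trains and runs on those fields with the teacher discarded.

```python
# Minimal sketch of the teacher -> student-head pipeline described above.
# Hypothetical names throughout; this is not the repo's implementation.
import torch
import torch.nn as nn

class ToyTeacher(nn.Module):
    """Stand-in for a large transformer; only its early-layer activations matter here."""
    def __init__(self, vocab=1000, d=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.layer1 = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)

    @torch.no_grad()
    def extract_fields(self, tokens):
        # Phase 1 (offline, done once): run the expensive model, keep the layer-1 output.
        return self.layer1(self.embed(tokens))

class StudentHead(nn.Module):
    """Small head that operates directly on the stored field representation."""
    def __init__(self, d=512, n_classes=4):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, field):
        return self.head(field.mean(dim=1))  # pool over the sequence, then classify

# Phase 1: extract fields with the teacher, then discard it entirely.
teacher = ToyTeacher()
tokens = torch.randint(0, 1000, (32, 16))
fields = teacher.extract_fields(tokens)
del teacher  # the teacher is never loaded again

# Phase 2: train / run the student on the cached fields only.
student = StudentHead()
logits = student(fields)
print(logits.shape)  # torch.Size([32, 4])
```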
Can you please share the relevant code that trains such a tiny student model, one that can operate independently of the big teacher model after training? The repository has no such code.
That's exactly what I was trying to infer from the abstract, which sadly doesn't explicitly call out memory requirements. I assume it increases inference time by getting rid of transformers. What are the memory requirements, then?
Edit: they claim these somewhere in the doc:
> Memory
> Teacher model: multi-GB (entire model must be loaded)
> AN1 head: a few MB (only head needed after training)
I find the claims surreal; I can't wait for someone to validate this, or I will do it myself. It would have been handy to upload such a "few MB" weight file distilled off Llama 70B so that we could see for ourselves whether the 220x inference and in-memory model size compression claims are true.
The memory story is actually much simpler than it looks.
The teacher still has to be loaded at training time, so the footprint is whatever the original model uses. Again, the compression doesn't shrink the teacher. It produces a small student head. After training, the teacher is no longer needed and the student runs by itself. That's why the inference footprint drops to a few MB.
It doesn't increase inference time at all. It removes transformers entirely from the inference path. The student computes directly on the layer-1 field, which is why it's so small and so fast.
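As a rough back-of-envelope on why a student-only footprint would land in the MB range (my own illustrative numbers, not figures from the repo): a head of a couple of small linear layers over a wide field has on the order of a million parameters, i.e. a few MB in fp32, versus tens of GB for a 70B-parameter teacher that only needs to be loaded during extraction.

```python
# Illustrative parameter / memory arithmetic (assumed sizes, not repo figures).
field_dim = 8192          # assumed width of the layer-1 field for a large teacher
hidden = 128              # assumed hidden size of the student head
n_classes = 10            # assumed task output size

head_params = field_dim * hidden + hidden + hidden * n_classes + n_classes
head_mb = head_params * 4 / 1e6          # fp32, 4 bytes per parameter
teacher_gb = 70e9 * 2 / 1e9              # 70B params in fp16, 2 bytes each

print(f"student head: ~{head_params / 1e6:.2f}M params, ~{head_mb:.1f} MB")
print(f"teacher:      ~{teacher_gb:.0f} GB, loaded at training time only")
```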
On the request for a distilled "few MB" head for Llama 70B: that part is already reproducible right from the repo. The head is always task-specific, not a general LLM, so uploading a single checkpoint wouldn't tell the whole story. The better path is to run the extraction script and train the head for any task you want. The pipeline is fully open, end to end. I'm looking for people to validate it independently.
If you need anything else cleared up, just let me know.
OP provided details that make the analogy feel distant.
That's not just graduates. The main difference with Gen Z, if OP is even one, is that they have a much longer future ahead of them than those who have already worked for decades. Mature workers might just accept doing the remaining legs even if the meaning keeps falling away. The young have bigger stakes; projecting the trajectory forward leads to an absolute no-go for them.
> The young have bigger stakes; projecting the trajectory forward leads to an absolute no-go for them.
If you had told young-graduate me where I would end up in 15 years, I wouldn't have believed it.
The young may have a long trajectory ahead of them, but they are absolutely bad at planning and predicting where they will end up (unless you have rich parents, in which case you'll probably end up okay regardless).
> I graduated in July 2024 from Avans with a degree in Computer Science.
But I have to confess, I'm not sure I understand your comment, if you wouldn't mind clarifying.
I wouldn't suggest people (like mature workers) just accept the misery and run out the clock. But I do think it is extremely important to be able to find the meaning in your work, rather than hoping there is a magical other job out there that otherwise fulfills you.
Ok, so OP doesn't like working to make their boss rich. "Start your own company," you might say. But after the honeymoon period wanes, you might find that "I don't like working for someone" turns into "I don't like having to find all these customers myself" or "I don't like having to spend all my time doing paperwork or talking to investors or wearing a million hats or..."
My point is that there will always be reasons to be miserable at any job, so you need to be able to find the pieces that are meaningful to you.
To stretch the analogy a bit to relationships... if OP is saying, "I don't like my relationship with my current partner" I'm saying, "Sure, you can find a new partner if that's what you want. And maybe you should. But just know, there is no magic partner out there that fulfills all of your needs. You're going to have a relationship with a real, human person, and your new partner will have things you love about them and things that drive you crazy, just like the last one. You need to know how to build a meaningful relationship and find fulfillment in it, otherwise, there is no magic partner that will fill that hole in you."
From OP:
> I want to work on personal projects that I find important and help out other projects, that's it. If rent wasn't an issue I'd be working full-time on open-source
That's going to have exciting parts and miserable parts just like their current role, so they will be quite disappointed after the honeymoon period wears off if they aren't able to find meaning in the drudgery. If OP is looking at this as their magical next partner, they will certainly be disappointed when they realize that their new partner snores and leaves the toilet seat up and leaves dirty dishes in the sink.
One is to bootstrap: do what you care about, and make a dent. If all you want is to sustain a frugal life, this takes less effort, but not that much less than earning far more.
The other option is to join a (true) non-profit. Some of them do seek growth, but some don't.
It gave me a decent introduction to biology: it defined what life is, then quizzed me. The problem is, it says to select the appropriate answer, but selection doesn't work.
It reminds me of the game developer behind "Another World". He made some good games and was able to raise money from early game investors. He thought he could make a game maker: he would develop it once, and it would make all sorts of games. So he pitched it, and investors were more interested than ever. Obviously, he realized that such a concept would never work. Today we have similarly overambitious ideas, but people ship them anyway.