Hacker News | gnulinux's comments

There is no moving of goalposts. The license isn't an open source license, which by definition means the code is not open source. When you have access to the source code of a program but don't necessarily have the legal rights to distribute original and/or modified versions, it's called "source available".

Well, this is a 270M model, which is like 1/3 of 1B parameters. In the grand scheme of things, it's basically a few matrix multiplications, barely anything more than that. I don't think it's meant to have a lot of knowledge, grammar, or even coherence. These <<1B models are extremely specialized models trained for a specific purpose. Models like this are optimized for things like (but not limited to) the following:

input:
```
Customer Review says: ai bought your prod-duct and I wanna return becaus it no good.

Prompt: Create a JSON object that extracts information about this customer review based on the schema given.
```

output:
```
{ "type": "review", "class": "complaint", "sentiment": -0.853, "request": "return" }
```

So, essentially, it's just "making sense of" natural language such that it can be used in a programmatic context (among other applications, of course).

To get good results, you probably need to fine-tune this model to your expected data very aggressively.

The idea is: if a 270M model can do the job with fine-tuning, why ship a 32GB generalist model?
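
As a rough sketch (not from the thread itself) of what that extraction workflow could look like, assuming the Hugging Face transformers library and the instruction-tuned gemma-3-270m-it checkpoint; the prompt wording and schema keys are illustrative, and in practice you'd fine-tune the checkpoint on your own schema first:

```python
# Minimal sketch: coaxing structured JSON out of a small instruction-tuned model.
# The checkpoint name, prompt wording, and schema keys are assumptions for
# illustration, not an official recipe.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m-it")

review = "ai bought your prod-duct and I wanna return becaus it no good."
prompt = (
    f"Customer Review says: {review}\n\n"
    "Create a JSON object that extracts information about this customer review "
    'with the keys "type", "class", "sentiment", and "request".'
)

out = generator(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```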


> this is a 270M model which is like 1/3 of 1B parameters

Did you ask Gemma-3-270M whether 27 is closer to a quarter or a third of 100?


Sure, a quarter of 1B; the point was a generalization about <<1B models.

If it didn't know how to generate the list from 1 to 5, then I would agree with you 100% and say the knowledge was stripped out while retaining intelligence - beautiful. But it does generate the list, yet it cannot articulate the (very basic) knowledge it has, *and*, in the same chat context, when presented with (its own) list of mountains from 1 to 5, it cannot grasp that it made a LOGICAL (not factual) error by repeating the result for number one when asked for number two. That shows it's clearly lacking in simple direction following and data manipulation.

> the knowledge was stripped out while retaining intelligence ... it cannot grasp it made a LOGICAL (not factual) error...

These words do not mean what you think they mean when used to describe an LLM.


The knowledge that the model has is that when it sees text with "tallest" and "mountain", it should be followed by Mt. Everest. Unless it also sees "list", in which case it makes a list.

Have you used an LLM? I mean the actual large models? Because they make the exact same errors, just in a slightly less frequent / better-hidden manner.

Yes, and obviously this is a question of metrics/spectrum. But this is pretty bad, even compared to several-generations-old tech (at an admittedly much larger size).

Why would there be logic involved? This is a LLM, not electronic intelligence.

Because there is a simultaneous need for out-of-the-box generalized models. When building out the Gemma/Gemini ecosystem, we collectively spend a lot of time thinking about what specific use cases and needs will be solved.

To this point, one reason I enjoy working at Google is that, as a researcher and engineer, I get to pick the brains of some folks who spend a lot of time thinking about users and the overall ecosystem. Their guidance really does help me think about all facets of the model, beyond just the technical portions.


Rerankers are used downstream of an embedding model. Embedding models are "coarse", so they give false positives for text that may not be as relevant as contender text. A re-ranker ranks a bunch of texts against a query in order to find the most relevant ones. You can then take those and feed them as context to some other query.
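
A minimal sketch of that two-stage retrieve-then-rerank setup, assuming the sentence-transformers library; the model names and the toy corpus are illustrative, not a recommendation:

```python
# Stage 1: coarse retrieval with a bi-encoder; Stage 2: re-rank with a cross-encoder.
# Model names and the corpus below are placeholders for illustration.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Mount Everest is the tallest mountain above sea level.",
    "K2 is the second-highest mountain on Earth.",
    "The Mariana Trench is the deepest part of the ocean.",
]
query = "What is the highest mountain?"

# Coarse pass: embed everything and take the top-k nearest neighbors (fast, noisy).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = embedder.encode(corpus, convert_to_tensor=True)
query_emb = embedder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
candidates = [corpus[h["corpus_id"]] for h in hits]

# Fine pass: score each (query, candidate) pair jointly (slower, more precise).
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, c) for c in candidates])
ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
print(ranked[0][0])  # best candidate to feed as context downstream
```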

This is true, but I still avoid using examples. Any example biases the output to an unacceptable degree, even in the best LLMs like Gemini 2.5 Pro or Claude Opus. If I write "try to do X, for example you can do A, B, or C", the LLM will do A, B, or C the great majority of the time (let's say 75% of the time). This severely reduces the creativity of the LLM. For programming this is a big problem, because if you write "use Python's native types like dict, list, or tuple, etc.", there will be an unreasonable bias towards those three types as opposed to e.g. set, which will make some code objectively worse.

Maybe. Ever since I graduated from college I've learned again and again that pretty much anything worth thinking about in life boils down to math for me. I'd maybe/probably study CS as a minor or double major, but Pure/Applied Math programs can be more intellectually enriching in this day and age. This is a completely personal analysis; it'll change for everyone.

My first impressions: not impressed at all. I tried using this for my daily tasks today and for writing it was very poor. For this task o3 was much better. I'm not planning on using this model in the upcoming days, I'll keep using Gemini 2.5 Pro, Claude Sonnet, and o3.

Imho Chatterbox is the current open-weight SOTA model in terms of quality: https://huggingface.co/ResembleAI/chatterbox

Thank you, I hadn't heard of it. Will have a look! The samples sound excellent indeed.

Name recognition? Advertisement? Federal grant to beat Chinese competition?

There could be many legitimate reasons, but yeah, I'm very surprised by this too. Some companies take it a bit too seriously and go above and beyond. At this point, unless you need the absolute SOTA models because you're throwing an LLM at an extremely hard problem, there is very little utility in using the larger providers. On OpenRouter, or by renting your own GPU, you can run on-par models for much cheaper.


Not even that: even if o3 being marginally better is important for your task (let's say), why would anyone use o4-mini? It seems to be almost 10x the price for the same performance (maybe even less): https://openrouter.ai/openai/o4-mini


Probably because they are going to announce GPT-5 imminently.


Wow, that's significantly cheaper than o4-mini ($1.10/M input tokens, $4.40/M output tokens), which seems to be on par with gpt-oss-120b. Almost 10x the price.

LLMs are getting cheaper much faster than I anticipated. I'm curious whether it's still the hype cycle and Groq/Fireworks/Cerebras are taking a loss here, or whether things are actually getting cheaper. At this rate we'll be able to run Qwen3-32B-level models on phones/embedded devices soon.


It's funny, because I was thinking the opposite: the pricing seems way too high for a model with only ~5B active parameters.


Sure, you're right, but if I can squeeze o4-mini-level utility out of it at less than a quarter of the price, does it really matter?


Yes


Are the prices staying aligned to the fundamentals (hardware, energy), or is this a VC-funded land grab pushing prices to the bottom?

