You can rent H100s for $1.50/GPU/hr these days.


This is not a result of income inequality. This is a result of measures intended to reduce income inequality, which of course almost universally make things worse.

The other lending that the US (and Western governments in general) massively subsidizes is education. And look at what has happened to the cost of housing and education: both have risen far faster than general inflation.

The cure is worse than the disease.


It's almost as if geopolitical environments evolve over time.

I don't think anyone today would feel any differently about preventing Pakistan from attaining nuclear weapons.

Unfortunately it's very difficult to take nukes away from a country.


It was a whole lot easier to take nukes away from a country until America fed Ukraine to the wolves. Great job...


Ok? I agree with you. It doesn't change anything I said.


People really need to stop glossing over the very real differences between state-controlled media and media that you think is aligned with a certain political group.

You can believe Fox News is the worst entity in human history, but Fox News is not RT.


> "Fine-tuning LLMs for knowledge injection is a waste of time" is true, but IDK who's trying to do that.

Have people who say this ever actually done it? It works. It works pretty well.

I have no clue why this bad advice is so routinely parroted.


It technically works with enough data, but it's pretty inefficient compared to RAG. However, changing behavior via prompting/RAG is harder than changing behavior via fine-tuning; they're useful for different purposes.
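
A minimal sketch of the contrast (the retriever and the data format here are hypothetical, just to show the shape of each approach):

    # Knowledge injection via fine-tuning: the fact becomes training
    # data, usually repeated and paraphrased many times to stick.
    finetune_examples = [
        {"prompt": "When was Acme Corp founded?",
         "completion": "Acme Corp was founded in 1987."},
        # ...plus many paraphrases of the same fact
    ]

    # Knowledge injection via RAG: the fact is retrieved at query
    # time and pasted into the context window instead.
    def rag_prompt(question, retrieve):
        docs = retrieve(question, k=3)  # hypothetical retriever
        context = "\n".join(docs)
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"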


> How is power discarded

It isn’t, not at scale in any traditional sense.


> the old one is completely valueless to them

This is of course untrue for the same reason that people are still running Windows 2000.


> This is of course untrue for the same reason that people are still running Windows 2000.

What is the reason?


They’ve built processes around it and don’t feel like / can’t afford to / don’t know how to change them.


I guess we’ll see how that shakes out.

Because models are getting much better every couple of months, I wonder if getting too attached to a process built around one in particular is a bad idea.


I would agree if Windows 2000 had the exact same APIs as the next version, but it doesn't. LLMs are text in -> text out, and you can drop in a new LLM and replace them without changing anything else. If anything, newer LLMs will just have more capabilities.
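
A minimal sketch of what I mean, using an OpenAI-style chat API (the model names are just placeholders):

    from openai import OpenAI

    client = OpenAI()

    def ask(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Only the model string changes; everything else stays the same.
    old = ask("old-model", "Summarize our refund policy in one sentence.")
    new = ask("new-model", "Summarize our refund policy in one sentence.")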


> LLMs are text in -> text out, and you can drop in a new LLM and replace them without changing anything else. If anything, newer LLMs will just have more capabilities.

I don't mean to be too pointed here, but it doesn't sound like you have built anything at scale with LLMs. They are absolutely not plug n play from a behavior perspective. Yes, there is API compatibility (text in, text out) but that is not what matters.

Even frontier SOTA models have their own quirks and specialties.


When I've built things with them, I've mostly considered the quirks to be defects.

Kind of like how httpds have quirks: those aren't really a good thing, and httpds are still more or less plug and play.


What kind of quirks have you seen that the next model wasn't better at?


A simple example: when models get better at following instructions, the frantic, somewhat insane-sounding exhortations required to get the weaker model to do what you want can cause the stronger model to be a bit too literal and inflexible.


I think you mean non-deterministic, instead of probabilistic.

And there is no reason that these models need to be non-deterministic.
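
A minimal sketch of the point with toy logits (numpy; the numbers are made up): randomness only enters if you sample, and even sampling is reproducible given a fixed seed.

    import numpy as np

    logits = np.array([2.0, 1.0, 0.5])
    probs = np.exp(logits) / np.exp(logits).sum()

    # Greedy decoding: pure argmax, the same answer on every run.
    greedy = int(np.argmax(probs))

    # Sampling: random, but reproducible given a fixed seed.
    rng = np.random.default_rng(seed=42)
    sampled = int(rng.choice(len(probs), p=probs))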


A deterministic algorithm can still be unpredictable in a sense. In the extreme case, a procedural generator (like in Minecraft) is deterministic given a seed, but you will still have trouble predicting what you get if you change the seed, because internally it uses a (pseudo-)random number generator.

So there’s still the question of how controllable the LLM really is. If you change a prompt slightly, how unpredictable is the change? That can’t be tested with one prompt.
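
A minimal illustration of "deterministic but unpredictable", using a plain PRNG as a stand-in for the model:

    import random

    # Adjacent seeds are fully deterministic, yet their outputs are
    # unrelated, much like small prompt edits producing unrelated
    # completions.
    for seed in (41, 42, 43):
        print(seed, random.Random(seed).random())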


> I think you mean non-deterministic, instead of probabilistic.

My thoughts too. It's more accurate to label LLMs as non-deterministic instead of "probabilistic".


Append-only would imply yes. There is no overwriting in append-only. There is only truncate and append.
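
A loose file-level analogy (POSIX append mode, not the backup tool's actual API): every write lands at end-of-file, so existing bytes can't be overwritten in place.

    with open("backup.log", "w") as f:
        f.write("original\n")

    with open("backup.log", "a") as f:  # O_APPEND semantics
        f.seek(0)          # ignored: append mode always writes at EOF
        f.write("appended\n")

    print(open("backup.log").read())   # "original" is still intact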


You have misread I think.

There used to be an append-only mode; they've removed it and now suggest using a credential that has no 'delete' permission. The question asked here is whether that would protect against data being overwritten rather than deleted.


Yes, it also disallows overwriting.


It's the same process as regular training, but instead of a single gold token in your cost function, you train against the teacher's top-k logits.
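
A minimal sketch of the loss (PyTorch; the shapes, the value of k, and renormalizing over the top-k set are my assumptions, not a specific lab's recipe):

    import torch
    import torch.nn.functional as F

    vocab, k = 32000, 20
    student_logits = torch.randn(vocab, requires_grad=True)
    teacher_logits = torch.randn(vocab)  # stand-in for the teacher's output

    # Regular training: cross-entropy against one gold next token.
    gold = torch.tensor([7])
    ce_loss = F.cross_entropy(student_logits.unsqueeze(0), gold)

    # Distillation: KL divergence against the teacher's top-k
    # next-token distribution, renormalized over those k tokens.
    top_vals, top_idx = teacher_logits.topk(k)
    teacher_probs = F.softmax(top_vals, dim=-1)
    student_logprobs = F.log_softmax(student_logits[top_idx], dim=-1)
    kd_loss = F.kl_div(student_logprobs, teacher_probs, reduction="sum")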

