You can tell it's intentional with both OpenAI and Anthropic by how deliberately opaque the limits are. I can't see a nice little bar showing how much I've used versus how much I have left of the given rate limits, so it pressures users to hoard, because it prevents them from budgeting it out and saying "okay, I've used 1/3 of my quota and it's Wednesday, so I can use more, faster."
FWIW, neither "hoard" nor "ration" implies anything about the permanence of the thing to me. Whether you were rationed bread or you hoarded bread, the bread isn't going to be usable forever. At the same time, whether you were rationed sugar or hoarded sugar, the sugar isn't going to expire (with good storage).
Rationed/hoarded do imply, to me, something different about how the quantity came to be, though: rationed being that you were given or set aside a fixed amount, hoarded being that you stockpiled/amassed it. Saying "you hoarded your rations" (whether or not they will expire) does feel more on the money than "you ration your rations" from that perspective.
I hope this doesn't come off too "well aktually", I've just been thinking about how I still realize different meanings/origins of common words later in life and the odd things that trigger me to think about it differently for the first time. A recent one for me was that "whoever" has the (fairly obvious) etymology of who+ever https://www.etymonline.com/word/whoever vs something like balloon, which has a comparatively more complex history https://www.etymonline.com/word/balloon
For me, the difference between ration and hoard is the uhh…rationality of the plan.
Rationing suggests a deliberate, calculated plan: we'll eat this much at these particular times so our food lasts that long. Hoarding seems more ad hoc and fear-driven: better keep yet another beat-up VGA cable, just in case.
Hoarding doesn't really imply how you got it, just that you stockpile once you do. I think you're bang on about rationing - it's about assigning the fixed amount. The LLM provider does the rationing; the LLM user hoards their rations.
One could theoretically ration their rations out further... but that would require knowing your usage well enough to set the remaining fixed amounts - which is precisely what's missing in the interface.
Rationing is precisely what we want to do: I have x usage this week; let me determine exactly how much I can use without going over. Hoarding implies a less reasoned path of "I never know when I might run out, so I must use as little as possible and save as much as I can." One can hoard gasoline, but it still expires past a point.
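As a minimal sketch of the pacing arithmetic a visible quota would make possible (in Python; the weekly quota, units, and reset date below are made up purely for illustration):

```python
from datetime import datetime, timedelta

def remaining_daily_budget(weekly_quota: float, used: float, resets_at: datetime) -> float:
    """Spread whatever is left of the quota evenly over the days until it resets."""
    # Treat anything under a day as a single day so the final stretch isn't inflated.
    days_left = max((resets_at - datetime.now()) / timedelta(days=1), 1.0)
    return max(weekly_quota - used, 0.0) / days_left

# Hypothetical numbers: 100 units per week, 33 already used, quota resets Sunday night.
print(remaining_daily_budget(100, 33, datetime(2025, 1, 12, 23, 59)))
```

Without the provider telling you "used" and "resets_at", there's nothing to plug in, which is exactly what pushes people toward hoarding instead.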
Anthropic also does this because they dynamically change the limits to manage load. Tools like ccusage show you how much you've used, and I can tell that I sometimes get limited at significantly lower usage than I would usually get limited for.
Which is a huge problem, because you literally have no idea what you're paying for.
One day a few hours of prompting is fine; another day you'll hit your weekly limit and you're out for seven days.
While still paying your subscription.
I can't think of any other product or service which operates on this basis - where you're charged a set fee, but the access you get varies from hour to hour entirely at the provider's whim. And if you hit a limit that is a moving target you can't even check, you're locked out of the service.
One thing that has worked for me when I have a long list of requirements / standards I want an LLM agent to stick to while executing a series of 5 instructions is to add extra steps at the end of the instructions like "6. check if any of the code standards are not met - if not, fix them and return to step 5" / "7. verify that no forbidden patterns from <list of things like no-op unit tests, n+1 query patterns, etc> exist in added code - if you find any, fix them and return to step 5" etc.
Often they're better at recognizing failures to stick to the rules and fixing the problems than they are at consistently following the rules in a single shot.
This does mean that having an LLM agent do a thing often works but is slower than just doing it myself. Still, I can sometimes kick off a workflow before joining a meeting, so maybe the hours I've spent playing with these tools will eventually pay for themselves in improved future productivity.
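A rough sketch of what that kind of instruction block can look like - the earlier steps, the STANDARDS.md file, and the forbidden-pattern list are placeholders, not anything from the original workflow:

```python
# Hypothetical instruction block showing the "check, fix, and loop back" pattern;
# steps 1-5, STANDARDS.md, and the forbidden-pattern examples are placeholders.
AGENT_INSTRUCTIONS = """
1-4. ... the actual task, broken into concrete steps ...
5. Run the test suite and make sure it passes.
6. Check whether any of the code standards in STANDARDS.md are not met.
   If any are violated, fix them and return to step 5.
7. Verify that no forbidden patterns (no-op unit tests, N+1 query patterns, ...)
   exist in the added code. If you find any, fix them and return to step 5.
8. Only report completion once steps 6 and 7 pass without requiring changes.
"""
```

Steps 6 and 7 are the review pass; looping back to step 5 keeps the fixes themselves honest rather than hoping the rules were followed in a single shot.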
There are things it's great at and things it deceives you with. On many occasions I needed it to check something I knew was a problem; o3 kept insisting it was possible due to reasons a, b, c, and thankfully gave me links. I knew it used to be a problem, so, surprised, I followed the links only to read in black and white that it still wasn't possible. So I explained to o3 that it was wrong. Two messages later we were back at square one. One week later it hadn't updated its knowledge. Months later it's still the same.
But for things I have no idea about, like medicine, it feels very convincing. Am I at risk there?
People don’t understand Dunning-Kruger. People are prone to biases and fallacies. Likely all LLMs are inept at objectivity.
My instructions to LLMs always demand strictness, no false claims, and Bayesian likelihoods on every claim. Some models simply ignore the instructions, while others stick strictly to them. In the end it doesn't matter when they insist on 99% confidence in refuted fantasies.
The problem is that all current mainstream LLMs are autoregressive, decoder-only models - mostly, but not exclusively, transformers.
Their math can't apply a modifier like "that example/attempt is wrong due to X, Y, Z" to anything that came before the modifier clause in the prompt.
Despite how enticing these models are to train, these limitations are inherent. (For this specific situation, people recommend going back to just before the wrong output and editing the message to reflect this understanding, because the confidently wrong output, with no advisory/correcting pre-clause, will "pollute the context": the model looks at the context for aspects coded into its high(er)-layer token embeddings, inherently can't include the correct-vs-wrong aspect because the "wrong" correction couldn't be applied to the confidently-wrong tokens, thus retrieves the confidently-wrong tokens, and subsequently spews even more BS.
Similar to how telling a GPT-2/GPT-3 model it's an expert on $topic made it actually perform better on said topic, this affirmation that the model made an error will prime the model to behave in a way that gets it yelled at again... sadly.)
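A toy numpy sketch of causally masked self-attention (single head, no learned projections - not how any particular production model is implemented) illustrates the point: the representations of earlier tokens are never recomputed when later tokens arrive.

```python
import numpy as np

def causal_self_attention(x: np.ndarray) -> np.ndarray:
    """Toy single-head self-attention with a causal mask (no learned projections)."""
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)                     # (t, t) pairwise attention scores
    future = np.triu(np.ones((t, t), dtype=bool), 1)  # True where j > i, i.e. future positions
    scores[future] = -np.inf                          # position i can never see position j > i
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # pretend token embeddings; treat the 4th as a later "that was wrong"
with_correction = causal_self_attention(x)
without_correction = causal_self_attention(x[:3])
# The first three outputs are identical either way: appending a correction later
# does not change what was already computed for the earlier tokens.
print(np.allclose(with_correction[:3], without_correction))  # True
```

The later token can attend back to the earlier ones, but it can't rewrite them - which is the "can't apply the modifier to what came before" point above.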
If you're on HN you've probably been around enough to know it's never that simple. You implement the counter; now customer service needs to be able to provide documentation, users want to argue, async systems take hours to update, users complain about that, you move the batch accounting job to sync, queries that fail still end up counting, and on and on.
They should have an indicator, for sure. But I at least have been around the block enough to know that declaring “it would be easy” for someone else’s business and tech stack is usually naive.
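To make the "never that simple" point concrete, here is a hypothetical sketch of the naive counter (the table name, schema, and storage are invented for illustration; this reflects no provider's actual system), with the edge cases from the comment above marked:

```python
import sqlite3

# Invented schema, purely for illustration.
db = sqlite3.connect("usage.db")
db.execute("CREATE TABLE IF NOT EXISTS usage (user_id TEXT, tokens INTEGER, succeeded INTEGER)")

def record_usage(user_id: str, tokens: int, succeeded: bool) -> None:
    # Naive version: every request counts, including failed ones -
    # the "queries that fail still end up counting" problem.
    db.execute("INSERT INTO usage VALUES (?, ?, ?)", (user_id, tokens, int(succeeded)))
    db.commit()

def used_this_week(user_id: str) -> int:
    # The weekly window filter is omitted for brevity. If this sum is actually produced
    # by an async batch job, the number the user sees can lag hours behind reality -
    # the other complaint in the thread.
    (total,) = db.execute(
        "SELECT COALESCE(SUM(tokens), 0) FROM usage WHERE user_id = ?", (user_id,)
    ).fetchone()
    return total
```

The easy part fits in twenty lines; the hard part is everything around it (disputes, lag, what counts as a billable failure), which is where "it would be easy" breaks down.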