There are too few examples to say this is a trend. There have been counterexamples of top models actually lowering the pricing bar (gpt-5, gpt-3.5-turbo, some gemini releases were even totally free [at first]).
I love Kagi's implementation: by default it's disabled, you either have to add a question mark to the search, or click in the interface after searching to generate the summary.
This is absurd. Training an AI is energy intensive but highly efficient. Running inference for a few hundred tokens, doing a search, stuff like that is a triviality.
Each generated token takes roughly the energy released by burning ~0.06 µL of gasoline: about 2 joules per token, including datacenter and hosting overhead. With massive million-token prompts it can climb to 8-10 joules per output token. Training runs around 17-20 J per token.
A liter of gasoline gets you 16,800,000 tokens for normal use cases. Caching and the various scaled-up efficiency hacks and improvements get you into the thousands of tokens per joule for some use cases.
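For anyone who wants to check the arithmetic, here's a minimal sketch. The gasoline energy density is my own assumption (~34 MJ/L heat of combustion); the exact 16,800,000 figure above falls out if you use 33.6 MJ/L instead.

    # Sanity check of the figures above. The 34 MJ/L energy density is my own
    # ballpark assumption; the ~2 J/token is the claim from the comment.
    GASOLINE_MJ_PER_LITER = 34
    JOULES_PER_TOKEN = 2

    tokens_per_liter = GASOLINE_MJ_PER_LITER * 1e6 / JOULES_PER_TOKEN
    print(f"{tokens_per_liter:,.0f} tokens per liter")          # ~17,000,000

    microliters_per_token = 1e6 / tokens_per_liter
    print(f"{microliters_per_token:.3f} µL of gasoline/token")  # ~0.059 µL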
For contrast, your desktop PC running idle uses around 350k joules per day. Your fridge uses 3 million joules per day.
AI is such a relatively trivial use of resources that you caring about nearly any other problem, in the entire expanse of all available problems to care about, would be a better use of your time.
AI is making the resources allocated to computation and data processing much more efficient. Year over year, the intelligence per generated token goes up while the absolute energy cost per token goes down, so each token gets more valuable.
Find something meaningful to be upset at. AI is a dumb thing to be angry at.
I’m curious where you got any of those numbers. Many laptops use <20 W, but most local AI inference requires high-end, power-hungry Nvidia GPUs that use multiple hundreds of watts. There’s a reason those GPUs are in high demand with prices sky high: the same (or similar) power-hungry chips are in data centers.
Compared to traditional computing it seems to me like there’s no way AI is power efficient. Especially when so many of the generated tokens are just platitudes and hallucinations.
> The agreed-on best guess right now for the average chatbot prompt’s energy cost is actually the same as a Google search in 2009: 0.3 Wh. This includes the cost of answering your prompt, idling AI chips between prompts, cooling in the data center, and other energy costs in the data center. It does not include the cost of training the model, the embodied carbon costs of the AI chips, or the fact that data centers typically draw from slightly more carbon-intense sources. If you include all of those, the full carbon emissions of an AI prompt rise to 0.28 g of CO2. This is the same emissions as we cause when we use ~0.8 Wh of energy.
How concerned should you be about spending 0.8 Wh? 0.8 Wh is enough to:
Stream a video for 35 seconds
Watch an LED TV (no sound) for 50 seconds
Upload 9 photos to social media
Drive a sedan at a consistent speed for 4 feet
Leave your digital clock on for 50 minutes
Run a space heater for 0.7 seconds
Print a fifth of a page of a physical book
Spend 1 minute reading this blog post. If you’re reading this on a laptop and spend 20 minutes reading the full post, you will have used as much energy as 20 ChatGPT prompts. ChatGPT could write this blog post using less energy than you use to read it!
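If it helps to see the unit conversion behind that list, here's a minimal sketch. The device wattages are my own rough assumptions, so the times won't match the post's figures exactly.

    # Convert an energy budget in watt-hours into runtime at a given power draw.
    # Wattages below are my own ballpark assumptions, not from the post.
    def seconds_on(wh: float, watts: float) -> float:
        """How many seconds a `watts` draw runs on `wh` watt-hours."""
        return wh * 3600 / watts

    BUDGET_WH = 0.8  # the per-prompt figure quoted above

    for device, watts in {
        "LED TV (~60 W)": 60,
        "laptop (~15 W)": 15,
        "space heater (~1500 W)": 1500,
    }.items():
        print(f"{device}: {seconds_on(BUDGET_WH, watts):.0f} s on {BUDGET_WH} Wh")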
W stands for Watts, which means Joules per second.
The energy usage of the human body is measured in kilocalories, aka Calories.
Combustion of gasoline can be approximated by conversion of its chemicals into water and carbon dioxide. You can look up energy costs and energy conversions online.
Some AI usage data is public. The TDPs of GPUs are also usually public.
I made some assumptions based on H100s and models around the 4o size. Running them locally changes the equation, of course - any sort of compute that can be distributed is going to enjoy economies of scale and benefit from well worn optimizations that won't apply to locally run single user hardware.
Also, for AI specifically, depending on MoE and other sparsity tactics, caching, hardware hacks, regenerative capture at the datacenter, and a bajillion other little things, the actual number is variable. Model routing like OpenAI does further obfuscates the cost per token - a high-capability 8B model is going to run more efficiently than a 600B model across the board, but even the enormous 2T models can generate many tokens for the energy equivalent of burning a µL of gasoline.
If you pick a specific model and GPU, or Google's TPUs, or whatever software/hardware combo you like, you can get to the specifics. I chose µL of gasoline to drive the point across: tokens are incredibly cheap, energy is enormously abundant, and we use many orders of magnitude more energy on things we hardly ever think about; it just shows up in the monthly power bill.
AC and heating, computers, household appliances, lights, all that stuff uses way more energy than AI. Even if you were talking with AI every waking moment, you're not going to be able to outpace other, far more casual expenditures of energy in your life.
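To put rough numbers on that comparison, a minimal sketch - every figure here is either a ballpark assumption or pulled from earlier in the thread, not a measurement:

    # Rough daily-energy comparison. PROMPT_WH is the 0.3 Wh/prompt figure
    # quoted elsewhere in the thread; household numbers are ballpark assumptions.
    PROMPT_WH = 0.3
    heavy_chat_wh = 300 * PROMPT_WH          # 300 prompts/day is a *lot* of chatting

    household_wh_per_day = {
        "fridge (~3 MJ/day)": 3_000_000 / 3600,
        "idle desktop PC (~350 kJ/day)": 350_000 / 3600,
        "window AC, 4 h at ~1 kW": 1000 * 4,
    }

    print(f"heavy chatbot use: {heavy_chat_wh:.0f} Wh/day")
    for item, wh in household_wh_per_day.items():
        print(f"{item}: {wh:.0f} Wh/day")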
A wonderful metric would be average intelligence per generated token: take the tokens/joule, weight it by an intelligence rank normalized against a human average, and contrast that with the cost per token. That'd tell you the average value per token compared to the equivalent value of a human-generated token. You'd probably want to ballpark human cognitive efficiency too - estimate the tokens/joule of metabolism for contrast.
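One way to formalize that proposal, purely as a sketch - the function name and the example numbers are hypothetical placeholders, not measured values:

    def value_per_joule(intelligence_score: float,
                        human_baseline: float,
                        tokens_per_joule: float) -> float:
        """Intelligence-weighted tokens per joule, normalized to a human average."""
        return (intelligence_score / human_baseline) * tokens_per_joule

    # e.g. a model scoring 0.8x the human baseline at 0.5 tokens/joule:
    print(value_per_joule(0.8, 1.0, 0.5))  # 0.4 "human-equivalent tokens" per joule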
Doing something similar for image or music generation would give you a way of valuing the relative capabilities of different models, and a baseline for ranking human content against generations. A well constructed meme clip by a skilled creator, an AI song vs a professional musician, an essay or article vs a human journalist, and so on. You could track the value over context length, length of output, length of video/audio media, size of image, and so on.
Suno and nano banana and Veo and Sora all far exceed the average person's abilities to produce images and videos, and their value even exceeds that of skilled humans in certain cases, like the viral cat playing instrument on the porch clips, or ghiblification, or bigfoot vlogs, or the AI country song that hit the charts. The value contrasted with the cost shows why people want it, and some scale of quality gives us an overall ranking with slop at the bottom up to major Hollywood productions and art at the Louvre and Beethoven and Shakespeare up top.
Anyway, even without trying to nail down the relative value of any given token or generation, the costs are trivial. Don't get me wrong, you don't want to usurp all a small town's potable water and available power infrastructure for a massive datacenter and then tell the residents to pound sand. There are real issues with making sure massive corporations don't trample individuals and small communities. Local problems exist, but at the global scale, AI is providing a tremendous ROI.
AI doombait generally trots out the local issues and projects them up to a global scale, without checking the math or the claims in a rigorous way, and you end up with lots of outrage and no context or nuance. The reality is that while issues at scale do exist, they're not the issues that get clicks, and the issues with individual use are many orders of magnitude less important than almost anything else any individual can put their time and energy towards fixing.
You are clearly biased.
A complex ChatGPT 5 thinking prompt runs at 40 Wh. This is more in line with the estimated load that AI needs to scale. These thinking models would be faster but use a similar amount of energy. Humans doing that thinking use far fewer joules than GPT-5 thinking. It's not even close.
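For scale, taking that 40 Wh figure at face value and assuming the human brain runs at roughly 20 W (a common ballpark), a quick sketch of the comparison:

    # Compare one "thinking" prompt's energy to human brain metabolism.
    # 40 Wh is the figure claimed above; 20 W for the brain is my assumption.
    PROMPT_WH = 40
    BRAIN_W = 20

    prompt_joules = PROMPT_WH * 3600                      # 144,000 J
    brain_minutes = prompt_joules / BRAIN_W / 60
    print(f"{brain_minutes:.0f} min of human thinking")   # ~120 min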
Your answer seems very specific on joules. Could you explain your calculations? I can't see how you get from a liter of gasoline to 16.8M tokens - e.g. does that assume 100% conversion to energy, not taking into account heat loss, transfer loss, etc.?
(For example, simplistically there's 86400s/day, so you are saying that my desktop PC idles at 350/86.4=4W, which seems way off even for most laptops, which idle at 6-10W)
I believe it is the system instructions that make the difference for Gemini. I use Gemini on AI Studio with my own system prompts to get it to do what I need, which is not possible with gemini.google.com's Gems.
I always joke that Google pays for a dedicated developer to spend their full time just to make pelicans on bicycles look good. They certainly have the cash to do it.
I recently started looking for a new(er) laptop, because it often felt slow. But I started looking at when it was slow, and it was mostly when using things like GMail. I guess my feeling was "if my laptop isn't even fast enough for email, it's time to upgrade". But doing things I actually care about (coding, compiling) it's actually totally fine, so I'm going to hold on to it a bit longer.
This is the exact feeling I had. My 2019 intel MacBook Pro has 12 cores, 32gb ram and a 1TB hard drive. Yet, most consumer web apps like Gmail, Outlook and Teams are excruciatingly slow.
What is surprising is that a few years ago, these apps weren’t so terrible on this exact hardware.
I’m convinced that there’s an enormous amount of bloat right at the application framework level.
I finally caved and bought a new M series Mac and the apps are much snappier. But this is simply because the hardware is wicked fast and not because the software got any better.
I really wish consumer apps cared less about user retention and focused more on user empowerment.
All it would take is forcing an artificial CPU slowdown to something like a 5 year old CPU when testing/dogfooding apps for developers to start caring about performance more.
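As one minimal sketch of how that could look in dogfood or CI runs (assuming Linux with cgroups v2, root, and the cpu controller enabled on the parent group - for web apps, Chrome DevTools' built-in CPU throttling is the lower-effort option):

    # Cap a process tree at ~25% of one CPU via cgroups v2, then launch the app.
    # Sketch only: assumes root and that the cpu controller is enabled in the
    # parent's cgroup.subtree_control.
    import os
    import subprocess
    import sys

    CGROUP = "/sys/fs/cgroup/slowlane"
    os.makedirs(CGROUP, exist_ok=True)

    # "25000 100000" = 25 ms of CPU time allowed per 100 ms period.
    with open(os.path.join(CGROUP, "cpu.max"), "w") as f:
        f.write("25000 100000")

    # Start the app under test, then move it into the throttled group.
    proc = subprocess.Popen(sys.argv[1:])
    with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
        f.write(str(proc.pid))

    proc.wait()

Usage would look something like `sudo python slowlane.py ./app-under-test`, with "slowlane" being a made-up name for the throttled group.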
> All it would take is forcing an artificial CPU slowdown
Technically, yes. But for many large tech companies it would require a large organisational mindset shift: from "more features means more promotions means more money" to "a good, stable product with a well-maintained codebase is better", and THAT would require a dramatic shift away from "line must go up" towards something more sustainable and less investor/stock-obsessed.
Obviously not with Gmail/Facebook, in that case it's just 100% incentive misalignment.
The others, probably. VCs are incentivized to fund the people who allocate the most resources towards growth and marketing; as long as the app isn't actively on fire, investors will actively push you away from allocating resources to making your tech good.
You would be surprised at how bad the “engineering culture” is at meta. There are surely people who care about page load latency but they are a tiny minority.
I mean, if you look at Meta's main product it's hard to imagine anyone there cares about engineering. It might be the single worst widely used tech product in existence, and considering they produce the frameworks it's built on it's even more embarrassing.
There are a few people who care A LOT about engineering, otherwise everything would completely collapse and not work at all. But they are far from the majority.
I have a 10 gig internet connection (Comcast fiber, 5.6 ms ping to google.com with almost no jitter). Websites are slower today than they were when I got DSL for the first time in the 1990s--except HN of course. It takes multiple seconds to load a new tab in Teams (e.g. the activities tab) and I can see content pop in over that time. It's an utter disgrace.
Lulled into a false sense of security, you'll think you can spot the artificial by the tells that it readily feeds to you. But what happens when deception is the goal?
This is the first standalone headset with an open ecosystem. That's a big deal.
Meta Quests & Apple Visions require developer verification to run your own software, and provide no root access, which slowed down innovation significantly.
There is but one issue with the Lynx XR1 - no one really got it. A few backers randomly got a few units, but many others (including myself) are still waiting for their device to arrive (and will most likely wait forever).
This has a serious impact on the developer ecosystem - there are still a few people who got their devices and are doing interesting work, but with so few users actually having devices the community is too small for much progress to be expected.
It's kinda similar to the old Jolla Tablet - it was a very interesting device (an x86 tablet running an open Linux distro in 2013!) but it ended up in too few hands due to funding issues & the amount of Sailfish OS apps actually supporting the tablet (eg. big screen, native x86 builds, etc.) reflected that.
Not to mention Meta abandoned the Quest 1 very quickly. I bought a game when it came out and never got around to playing it (had kids). I tried to play it recently and it no longer even works! £30 down the drain, thanks Zuck.
I guess I can't complain too much given that I got it for free.
I bought an Oculus Go last year for € 30. Its support has been dropped for quite some time, and you can only activate developer mode and sideloading through an old version of the Meta Horizons app [1]. But if you do that, there are 71 GiBs of games to explore on the Internet Archive [2]. Some need patching to remove an online check to a server that no longer exists, but that is easy enough to do with a (regrettably Windows) tool someone published.
The Go is not the best headset of course, but the games are a different style because of the 3DoF tracking without cameras - somewhat slower paced and played sitting down. A style I personally like more.
You can also unlock the device to get root on it [3], which is quite neat, although there doesn't seem to be any homebrew scene at all. Not even the most bare-bones launcher that doesn't require a Meta login.
[1] That doesn't even seem intentional, but it does mean that once the old version of the app can't communicate with Meta servers anymore, any uninitialized Go turns into a brick.
That's not quite true - when did you get your free Quest 1? Only in January of this year did Meta officially stop allowing devs to support those devices, which IMO is not nice, but probably necessary to put resources towards newer devices, since the Quest 1 was extremely outdated and very hard to keep supporting. It launched in May 2019, so it got almost 6 years of updates, and if you have one you can still install older versions of existing apps that choose to support it (which admittedly is very rare). I shut off support for my game back in 2024 when they recommended it, since the device is less than half as powerful as the Quest 2, very few users still had one, and the Q1 was a hard target to hit performance-wise vs newer devices. If you spend $50 to get a Quest 2 you'll get a couple years of updates, or even better, spend $299 to get a 3S, which is an amazing piece of kit and will probably be supported for at least 5 more years since it just came out.
Sorry, maybe I missed it, but how do you know the ecosystem is open?
From the link we don't know if the OS can be changed (it might be locked like many Android phones) or if a connected machine is required to run their DRM/Steam. The drivers may also not be open source.
From a cursory look, it seems SteamVR is intended to be used with their DRM platform and isn't open source. Maybe it's a bit less limiting than Meta's offering?
I wouldn't characterize this as an "open ecosystem", though.
The key takeaway is that you will rebuild the drivers less often:
1) The stack is mature now, we know what features can exist.
2) For me it's about having the same stack as on a 3588 SBC, so I don't need to download many GB of Android software just to build/run the game.
The distance to getting an open-source driver stack will probably be shorter because of these 2 things, meaning OpenVR/SteamVR being closed is less of a long-term issue.
I'm confused. Why would you develop a game on a SBC (that's not powerful enough to do VR)? Why are you not just cross compiling?
It's possible that you can have a full open-source stack some day on these goggles... but I don't think that's something that's obviously going to happen. SteamVR sounds like their version of Google Play Services.
Yeah, but is foveated streaming and whatnot going to be open source, or are we going to have to wait a decade for some grad student to reimplement a half-broken version?
Probably, but eye tracking is never going to be the focus of indie engines, especially if they run on the 3588.
Also, about cross-compiling: that's meaningless, as you need hardware to test on, and then you should be able to compile on the device you're using to test. At least that's what I want - make devices that cannot compile illegal.
Android isn't "just Linux". It's a heavily modified kernel, the bootloader is often closed source, and it's completely untrue for userspace, which incorporates stuff from other OSs (BSDs, etc.). There are huge amounts of blobs.
Yes, there technically is a Linux kernel, but if it's "just Linux" then macOS is "just FreeBSD", because grep -V tells you so, because it has dtrace, because you run (ran?) Docker with effectively FreeBSD's bhyve, etc.
If you wanna spin it even further, neither Safari and Chrome nor any other WebKit browser is just Konqueror because they took the layout engine code from KDE (KHTML).
And you can totally install Debian and even OpenBSD, etc. on a Steam Deck and at least the advertisement seems to indicate it won't be all that different for the VR headset.
The problem is that you're talking about the Linux desktop ecosystem whereas the op could be talking about the kernel. Both are just Linux (and the fact we've not evolved our nomenclature to differentiate the two is surprising). Also, fwiw, the android kernel is no longer heavily modified. Most of the custom stuff has been upstreamed.
I'd just like to interject for a moment. What you're referring to as Linux, is in fact, GNU/Linux, or as I've recently taken to calling it, GNU plus Linux. Linux is not an operating system unto itself, but rather another free component of a fully functioning GNU system made useful by the GNU corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX.
Many computer users run a modified version of the GNU system every day, without realizing it. Through a peculiar turn of events, the version of GNU which is widely used today is often called Linux, and many of its users are not aware that it is basically the GNU system, developed by the GNU Project.
There really is a Linux, and these people are using it, but it is just a part of the system they use. Linux is the kernel: the program in the system that allocates the machine's resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Linux is normally used in combination with the GNU operating system: the whole system is basically GNU with Linux added, or GNU/Linux. All the so-called Linux distributions are really distributions of GNU/Linux!
That doesn't in any way mean you can install an alternate OS. But I get your point that at least you can run Arch stuff. Isn't Arch's ARM support unofficial? (It's been ages since I tried.) You don't hear of people running it on RPis, for example.
Well, it doesn't say in any docs or specs, but for what it's worth, Valve's hardware has always been open like that. You're free to install Windows on your Steam Deck, for example.
Valve sponsored Asahi Linux, which was a herculean exercise in running another OS on locked-down hardware. They've also sponsored Wine and FEX. It would be a sudden, steep, and unexpected departure for them to go from being leaders in cross-platform OS/hardware support to locking down their own hardware platform. It's just not in their nature. They know their nature is good and they know we know it. That's called trust.
They're being a little vague about it, but this collaboration to improve Arch's build service/infrastructure is being done in part to facilitate support of multiple architectures.
IIRC it was in Tested's coverage that Valve said the hardware supports other OSes. It'd be out of character for Valve not to allow for this.
If it's anything like the Deck, then the version of SteamOS on it won't be locked down in any way whatsoever. You can install Windows or any other distro you want on the Deck with 0 issues (other than regular ones you'd experience anyways on any regular computer, nothing to do with Valve locking anything down).
The Steam Deck was not ARM. Unlike the Steam Machine page, the Steam Frame page does not insinuate you can put a custom OS on it. On top of custom drivers which are not necessarily upstreamed, Qualcomm SoCs always require closed-source userspace daemons which are coupled to the kernel.
Valve have been working with Linaro to develop FOSS drivers for the Adreno 750. This is necessary, given how heavily Valve leans on having integrations with Mesa whereas Qualcomm's drivers are designed for an Android environment.
I don't see why they wouldn't unlock the bootloader; it wouldn't be the first Qualcomm-based product to allow it, and in press interviews they have stressed, quite hard, that the Frame is still a PC.
Even just having direct access to hardware APIs is already a big win. On the Oculus Quest, the closest you can get is running with WebXR. But WebXR suffers from all the performance problems of web platforms. (And from bugs in Meta's software: the recent Quest browser has a bug that prevents you from disabling spatial audio, rendering it unusable for watching video at all.)
I just want a "dumb" headset that I can use as a portable private display for my laptop.
That's it.
I don't need 3D, I don't need VR, I don't need weirdass controllers trying to be special. Just give me a damn simple monitor the size of my eyes.
Fuck off with your XR OSes and "vision" for XR, not even Apple could get it fully right, the people in charge everywhere are too out of touch and have no clue where the fuck to go after smartphones.
HUD glasses kind of suck since having a display oriented to your head is uncomfortable. Adding 3DOF tracking only partially solves that, so you go 6DOF to maximize optical/vestibular comfort. Now you're rendering a virtual display within a virtual environment, but look at all that wasted space! So add more virtual monitors! Now you need some mechanism to manage them, so you add that and now you have a windowing system... so why are you rendering virtual monitors with fixed space desktops when you can just be rendering the application windows themselves?
The best portable private display for your laptop will inevitably be a 6DOF tracked headset with an XR native desktop.
Yes sorry about my excessive use of French in the comment, I didn't mean it has to be a fixed 1:1 slab of the realspace screen, desktop app windows in XR space would be ideal, but none of the products seem to be able to get it all right yet.
Apple's visionOS comes close but it's crippled by the trademark Apple overcontrolling.
Then this is actually much closer than previous headsets?
There is a lot going on to render the desktop in a tracked 3D space, and all of that has to happen somewhere. If you're expecting to plug an HDMI cable into a headset and have a good time, then I think you're underestimating how much work is being done.
OpenVR and OpenXR are really great software layers that help that all work out.