My first impression is that this should enable approximately what Apple is doing with their AI strategy (local on-device first, then falling back to a first-party API, and finally something like ChatGPT), but for web users. Having it native in the browser could be really positive for a lot of use cases, depending on whether the local version can do things like RAG using locally stored data and generate structured output like JSON.
I don't think this is a terrible idea. LLM-powered apps are here to stay, so browsers making them better is a good thing. Using a local model so queries aren't flying around to random third parties is better for privacy and security. If Google can make this work well it could be really interesting.
Apple might have an advantage given they’ll have custom hardware to drive it, and the ability to combine data from outside the browser with data inside it. But it’s an interesting idea.
Apple may have a bit of a lead in getting it actually deployed end-to-end, but given the number of times I've heard "AI accelerator" in reference to mobile processors, I'm pretty sure silicon with 'NPUs' is already all over the place, and if it's not, it certainly will be, for better or worse. I've got a laptop with a Ryzen 7040, which apparently has XDNA processors in it. I haven't a damn clue how to use them, but there is apparently a driver for it in Linux[1]. It's hard to think of a mobile chipset launch from any vendor that hasn't talked about AI performance in some regard; even the Rockchip ARM processors seem to have "AI engines".
This is one of those places where Apple's vertical integration has a clear benefit, but even as a bit of a skeptic regarding "AI" technology, it does seem there's a good chance that accelerated ML inference is going to be one of the next battlegrounds for mobile processor performance and capability, if it hasn't started already.
For sure many devices will have them, but the trick will be to build this local web model in a way that leverages all of the local chips. Apple’s advantage is in not having to worry about all that. It has a simpler problem and better access to real local data.
Give my personal local data to a model running in the browser? Just feels a bit more risky.
I think they’ll package a variety of model formats and download the appropriate one for the user’s system. Apple and Microsoft both offer OS-level APIs that abstract over the specific compute architecture so within the platform you don’t need further specialization. On Apple hardware they’ll run their model via CoreML on whichever set of compute (CPU, GPU, NPU) makes sense for the task. On Windows it will use DirectML for the same. I’m not sure if there’s a similar OS abstraction layer on Linux, possibly there they’ll just use CPU inference or whatever stack they usually use on Android.
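As a sketch of what that per-platform dispatch might look like (all names here are hypothetical; the real selection logic lives inside Chrome):

```javascript
// Hypothetical sketch of how a browser might pick an inference backend per
// platform; the real logic lives inside Chrome's optimization-guide code.
function pickBackend(platform) {
  switch (platform) {
    case "darwin":
      return "coreml";   // CPU/GPU/NPU selection is delegated to CoreML
    case "win32":
      return "directml"; // DirectML abstracts over vendor GPUs/NPUs
    default:
      return "cpu";      // no common OS abstraction layer assumed on Linux
  }
}

console.log(pickBackend(process.platform));
```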
If you compare a random older Windows laptop to a new or nearly new Mac, sure. But new Windows/x86 PCs have gotten pretty efficient. Here's a review showing a Meteor Lake laptop getting 15 hours of video playback on battery: https://www.anandtech.com/show/21282/intel-core-ultra-7-115h...
Exactly. So feature-wise it can work, and outside our "I proudly spent $2500 on my fancy Apple laptop" tech bubble, people have already learned to settle for something that runs hotter.
And would still need to give Chrome access to your contacts, files etc to make it equivalent. It’s not useless, obviously it’ll be good, it’s just not the same. My original comment was replying to:
“enable approximately what Apple is doing with their AI strategy”
It definitely is a terrible idea, but it follows naturally from the "browser is an operating system" approach the industry has been taking for quite a while.
With this, Goog gets to offload AI stuff to clients, but can (and will, I guarantee) sample the interactions, calling it "telemetry" and perhaps saying it's for "safety" as opposed to being blatant Orwellian spying.
> The code below is all you need to stream text with Chrome AI and the Vercel AI SDK. ... `chromeai` implements a Provider that uses `window.ai` under the hood
Leave it to Vercel to announce `window.ai` on Google's behalf by showing off their own abstraction but not the actual Chrome API.
Here's a blog post from a few days ago that shows how the actual `window.ai` API works [0]. The code is extremely simple and really shouldn't need a wrapper:
const model = await window.ai.createTextSession();
const result = await model.prompt("What do you think is the meaning of life?");
Fwiw, this is a very experimental API and we're not really promoting this as we know the API shape will have to change. It's only available by enabling a flag so we're working with a lot of developers via the preview program (link below)
web dev is rife with this stuff. wrappers upon wrappers with a poor trade off between adding api overhead / obscuring the real workings of what's going on and any actual enhanced functionality or convenience. it's done for github stars and rep.
Or maybe it’s done because people like to try and experiment with new things, to see what works and what doesn’t, sometimes with surprising results. I thought the name of this site was Hacker News. Let people do weird things.
As an example jQuery literally started as a wrapper lib around JS features, and it became so influential over time that tons of features from jQuery were upstreamed to JS.
Yes, wrapping stuff to give a different developer experience contributes to new ideas, and can evolve into something more.
It's a double-edged sword. I suppose when I started my career I would just take the popularity of something as proof positive it was truly helpful. At this point I am a lot more cynical, having experienced the churn of what is popular in the industry. I would say jQuery was definitely a good thing, because of the state of the web at the time and the disparity between browsers.
More recently, and on topic, I am dubious about langchain and the notion of doing away with composing your own natural-language prompts from the start. I know of at least some devs whose interactions with LLMs are restricted solely to using langchain, and who have never realized how easy it is to, say, prompt the LLM for JSON adhering to a schema by just, you know, asking it. I suppose eventually frameworks/wrappers will arise around in-browser AI models. But I see a danger in people being so eager to incuriously adopt the popular, even as it bloats their project size unnecessarily. If we forecast ahead, if LLMs become ever better, then the need for wrappers should diminish, I would think. It would suck if AI and language models got ever better but we were still saddled with the same bloat, cognitive and code size, just because of human nature.
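For what it's worth, the "just ask for JSON" approach needs no framework at all. Here's a minimal self-contained sketch; the model call is stubbed out, and the schema wording and helper names are purely illustrative:

```javascript
// Build a plain prompt asking the model for JSON that matches a schema.
function buildJsonPrompt(task, schema) {
  return `${task}\n\nRespond with ONLY valid JSON matching this schema, no prose:\n${JSON.stringify(schema)}`;
}

// Parse the reply, tolerating a markdown code fence around the JSON.
function parseJsonReply(reply) {
  const stripped = reply.replace(/^```(?:json)?\s*|\s*```$/g, "").trim();
  return JSON.parse(stripped);
}

// Stubbed model call so the sketch runs anywhere; in practice this would be
// your LLM client (window.ai, an HTTP API, whatever).
async function fakeModel(prompt) {
  return '```json\n{"name": "Percy", "species": "pelican"}\n```';
}

async function main() {
  const prompt = buildJsonPrompt(
    "Suggest a name for a pet pelican.",
    { name: "string", species: "string" }
  );
  const result = parseJsonReply(await fakeModel(prompt));
  console.log(result.name); // "Percy"
}
main();
```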
Someone in our community created a provider and I wanted to showcase it.
It’s nice insofar as, with very little abstraction, runtime, and bundle-size overhead, you can easily switch between models without having to learn a new API.
If this is the API that Google are going with here:
const model = await window.ai.createTextSession();
const result = await model.prompt("3 names for a pet pelican");
There's a VERY obvious flaw: is there really no way to specify the model to use?
Are we expecting that Gemini Nano will be the one true model, forever supported by this API baked into the world's most popular browser?
Given the rate at which models are improving that would be ludicrous. But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?
Something like this would at least give us a fighting chance:
const supportedModels = await window.ai.getSupportedModels();
if (supportedModels.includes("gemini-nano:0.4")) {
  const model = await window.ai.createTextSession("gemini-nano:0.4");
  // ...
}
> But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?
Pinning the design of a language-model task against a checkpoint with known functionality is critical to really support building cool and consistent features on top of it.
However, the alternative to an invisibly evolving model is deploying countless base models and versions, which web pages would be free to select from. This would rapidly explode the long tail of models that users would need to fetch and store locally to use their web pages, e.g. HF's long tail of LoRA fine-tunes across all combinations of datasets & foundation models. How many foundation models + LoRAs can people store and run locally?
So it makes some sense for google to deploy a single model which they believe strikes a balance in the size/latency and quality space. They are likely looking for developers to build out on their platform first, bringing features to their browser first and directing usage towards their models. The most useful fuel to steer the training of these models is knowing what clients use it for
I don’t think you’d need to download on the fly. You can imagine models being installed like extensions where chrome comes with Gemini installed by default. Then have the API allow for falling back to the default (Gemini) or throwing an error when no model is available. I’d contend that this would be a better API design because the user can choose to remove all models to save space on devices where AI is not needed (ex: kiosk).
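A hypothetical API shape along those lines (all names invented for illustration) might look like:

```javascript
// Hypothetical registry of locally installed models; in this sketch the
// browser ships with "gemini-nano" preinstalled and the user can add/remove.
const installedModels = new Set(["gemini-nano"]);

// Create a session for a requested model, falling back to the browser
// default, or throwing if no model is installed at all (e.g. a kiosk).
function createTextSession(requested, { fallback = "gemini-nano" } = {}) {
  if (requested && installedModels.has(requested)) {
    return { model: requested };
  }
  if (installedModels.has(fallback)) {
    return { model: fallback };
  }
  throw new Error("No on-device model available");
}
```

A page asking for a model the user never installed would get the default, and a stripped-down kiosk profile would get a catchable error rather than a multi-gigabyte download.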
Only that it will take 5-10 years to be regulated, until they have to pay a measly fine and let users choose. But then we will have the same game as with GDPR conformity now: companies left and right acting as if they just misunderstood the "new" rules and are still learning how this big mystery is to be understood, until a judge tells them to cut the crap. Then we will have the big masses, who will not care and will feed all kinds of data into this AI thing, even without asking the people it concerns for consent. Oh, and then of course Google will claim that it is all the bad users' doing, and that it is so difficult to monitor and prevent.
In the LLM engineering community we call those "evals", and they're critical to deploying useful solutions. More on that here: https://hamel.dev/blog/posts/evals/
See this reply from someone on the Chrome team [0]. It's not a final API by any stretch, which is why you can't find any official docs for it anywhere.
What’s the point in JavaScript? At the end of the day that’s still equivalent to Models["GeminiNano04"]
In C# you can’t compile a reference to Models.Potato04 unless Potato04 exists. In JS it’s perfectly legal to have code that references non-existent properties, so there’s no real developer-ergonomics benefit here.
On the contrary, code like `ai.createTextSession("Potato:4")` can throw an error like “Model Potato:4 doesn’t exist, try Potato:1”, whereas `ai.createTextSession(ai.Models.Potato04)` can only throw an error like “undefined is not a Model. Pass a string here”.
Or you can make ai.Models a special object that throws when undefined properties are accessed, but then it’s annoying to write code that sniffs out which models are available.
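For what it's worth, that "special object" variant is only a few lines with a Proxy; this is purely illustrative and nothing like it exists in the real API:

```javascript
// A model registry that throws on unknown property access but still supports
// enumeration, so sniffing out available models stays easy.
function makeModels(names) {
  const target = Object.fromEntries(names.map((n) => [n, n]));
  return new Proxy(target, {
    get(obj, prop) {
      if (typeof prop === "string" && !(prop in obj)) {
        throw new Error(`Unknown model "${prop}"; available: ${names.join(", ")}`);
      }
      return obj[prop];
    },
  });
}

const Models = makeModels(["GeminiNano04"]);
Models.GeminiNano04;   // "GeminiNano04"
Object.keys(Models);   // enumeration bypasses the get trap, so sniffing works
```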
Look at WebNN [1]. It's from Microsoft and is basically DirectML, but they at least pretend to make it a Web thing.
The posture matters. Apple tried to expose Metal through WebGPU [2] then silent-abandoned it. But they had the posture, and other vendors picked it up and made it real.
That won't happen to window.ai until they stop sleepwalking.
It's a very experimental API, and that's why it's only behind a flag and not available to the general web for people to use. We will be taking these through the standards process (e.g. the higher-level translate API: https://github.com/WICG/translation-api).
The only reason IE lost its monopoly is because MS entirely abandoned working on it for years. They had to build a new team from scratch when they eventually decided to begin work on IE7.
If we thought websites mining Monero in ads was bad, wait until every site sells its users’ CPU cycles on a gray market for distributed LLM processing!
I can’t seem to find public documentation for the API with a cursory search, so https://github.com/jeasonstudio/chrome-ai/blob/ec9e334253713... might be the best documentation (other than directly inspecting the window.ai object in console) at the moment.
It’s not really clear if the Gemini Nano here is Nano-1 (1.8B) or Nano-2 (3.25B) or selected based on device.
1.8B/3.25B is still too much for edge devices. Ideally tens or hundreds of megabytes would be OK. Is there an option to swap the built-in Gemini Nano for other, smaller models?
By the way, I haven't touched the latest JS code in a while. What does this new syntax mean: "import { chromeai }"?
Also, I don't get the textStream code:
for await (const textPart of textStream) {
result = textPart;
}
does result get overwritten on each loop step?
`import { chromeai } from ...` uses named-import syntax. It looks like object destructuring, but it's its own syntax: it imports only the export (a variable, type, or function) named chromeai from that module.
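As for the loop question: yes, with plain assignment result keeps only the last chunk. A self-contained illustration (the async generator stands in for the SDK's textStream; whether you want `=` or `+=` depends on whether the stream yields deltas or cumulative text):

```javascript
// A stand-in for a text stream: an async generator that yields chunks.
async function* textStream() {
  for (const chunk of ["Hello", ", ", "world"]) {
    yield chunk;
  }
}

async function main() {
  let overwritten = "";
  let accumulated = "";
  for await (const textPart of textStream()) {
    overwritten = textPart;  // keeps only the LAST chunk
    accumulated += textPart; // builds up the full text
  }
  console.log(overwritten);  // "world"
  console.log(accumulated);  // "Hello, world"
}
main();
```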
Today I finally clicked that "Create theme with AI" button on Chrome's default page. I'm really having a hard time differentiating it from selecting any random theme.
At this point I'm going to create an image generator that's just an api to return random images from pixabay. pix.ai (opensource of course)
So many AI applications right now are just buzzword plays like that, burning hundreds of watts and seconds of latency on features that are already solved better and cheaper with traditional programs.
But lots of both stakeholders and users currently value the "magic" itself over anything practical.
> So many AI applications right now are just buzzword plays like that, burning hundreds of watts and seconds of latency on features that are already solved better and cheaper with traditional programs.
The internet has already been like genAI for decades. Need a picture? Type a few keywords into Google Image search; there are billions of human-made images to choose from. Need to find information about something? Again, prompt the search engine, or use Wikipedia directly; it's more up to date than LLMs.
Need a personalized response? Post on a forum; real humans will respond, better than GPT. Need help with coding? Stack Overflow and GitHub Issues.
We already had a kind of manual-AI for 25 years. That is why I don't think the impact shock of AI will be as great as it is rumored to be. Various efficiencies of having access to an internet-brain have already been used by society. Even in art, the situation is that a new work competes with decades of history, millions of free works one click away, better than AI outputs, no weird artifacts and giveaways.
I disagree. Take code generation. Right now stackoverflow wins on correctness but your question may not overlap entirely with the asked question. LLMs answer your actual question but you sacrifice correctness.
So there's a tradeoff, and generative AI is more useful in many circumstances.
AI is getting more accurate with time not less. It is using less energy per byte with time too for a given quality.
It's going to be like the tide of "we saved $XYZ by moving to the cloud" articles followed by the inevitable "we saved 10x $XYZ by moving back to on prem". We just aren't at that 2nd half of the cycle yet, but it's coming. All that LLM processing isn't free.
So it's loading an instruct model for inference? That seems a fair bit less useful than a base model, at least for more advanced use cases.
What about running LoRAs, adjusting temperature, configuring prompt templates, etc? It seems pretty early to build something like this into the browser. The technology is still changing so rapidly, it might look completely different in 5 years.
I'm a huge fan of local AI, and of empowering web browsers as a platform, but I'm feeling pretty stumped by this one. Is this a good inclusion at this time? Or is the Chrome team following the Google-wide directive to integrate AI _everywhere_, and we're getting a weird JS API as a result?
At the very least, I hope to see the model decoupled from the interface. In the same way that font-family loads locally installed fonts, it should be pluggable for other local models.
You should also be able to set it by adding the switches: `chrome --args --enable-features=OptimizationGuideOnDeviceModel:on_device_model_temperature/0.5/on_device_model_topk/8` (source: https://issues.chromium.org/issues/339471377#comment12)
Of course we can't trust Google to allow AI to strip away ads. But other browsers with other AIs, open sourced ones, will do that. Run the web in a sandbox and present a filtered output. Could remove ads, annoyances and rerank your feeds by time, or impose your own rules on the UI and presentation of the web. The browser AI should also monitor activity for unintended information leaks, because proper data hygiene is hard, and people need their own agent to protect against other internet agents trying to exploit them.
They’re implying that you could trick the model into providing an up to date way to block ads on the page since it’s local and could just inspect the page.
The joke comes across fine even with the wrong AI call, lol
Can someone specialized in applied machine learning explain how this is useful? In my opinion, general-purpose models are only useful if they're large, as they are more capable and produce more accurate outputs for certain tasks. For on-device models, fine-tuned ones for specific tasks have greater precision with the same size.
I think you may be extrapolating a ChatGPT-esque UX for what these on-device models will be used for. Think more along the lines of fuzzy regex, advanced autocomplete, generative UI, etc. Unlikely anybody will be having a long-form conversation with Gemini Nano.
I guess the lessons of Clippy have vanished from history.
Yesterday upon restarting my PC a Skype dialog popped up inviting me to see how CoPilot could help me. So naturally I went into the task manager and shut down the Skype processes.
YES!!! Back when Opera was adding a local AI to their browser UI, I had explained how I wanted it to be exposed as an API, as it seems like one of the few ACTUAL good uses for a user agent API: letting me choose which model I am using and where my data is going, rather than the website I am using (which inherently will require standardizing an API surface in the browser websites can use instead of trying to compete for scant memory resources by bringing their own local model or shipping my data off to some remote API).
> So while I am usually the person who would much rather the browser do almost nothing that isn't a hardware interface, requiring all software (including rendering) to be distributed as code by the website via the end-to-end principle--making the browser easy to implement and easy to secure / sandbox, as it is simply too important of an attack surface to have a billion file format parsing algorithms embedded within it--I actually would love (and I realize this isn't what Opera is doing, at least yet) to have the browser provide a way to get access to a user-selected LLM: the API surface for them--opaque text streaming in both directions--is sufficiently universal that I don't feel bad about the semantic lock-in and I just don't see any reasonable way to do this via the end-to-end principle that preserves user control over tradeoffs in privacy, functionality, and cost... if I go to a website that uses an LLM I should be the one choosing which LLM it is using, NOT the website!!, and if I want it to use some local model or the world's most powerful cloud model, I 1) should be in control of that selection and 2) pretty much have to be for local models to be feasible at all as I can't sit around downloading and caching gigabytes of data, separately, from every service that might make use of an LLM. (edit: Ok, in thinking about it a lot more maybe it makes more sense for this to be a separate daemon run next to the web browser--even if it comes with the web browser--which merely provides a localhost HTTP interface to the LLM, so it can also be shared by native apps... though, I am then unsure how web applications would be able to access them securely due to all of the security restrictions on cross-origin insecure port access.)
This doesn't seem useful unless it's something standardized across browsers. Otherwise I'd still need to use a plugin to support safari, etc.
It seems like it could be nice for something like a bookmarklet or a one-off script, but I don't think it'll really reduce friction in engaging with Gemini for serious web apps.
I'm sure you realize that Google's strategy has long been the opposite: that users will continue to abandon other engines when theirs is the only one that supports capabilities, until standards no longer matter at all and the entire browser ecosystem is theirs.
Chrome may have been a darling when it was young, but it's now just a fresh take on Microsoft's Internet Explorer strategy. MS lost its hold on the web because of regulatory action, and Google's just been trying to find a permissible road to that same opportunity.
> users will continue to abandon other engines when theirs is the only one that supports capabilities
That’s why people chose chrome? Citation needed. I’ve very rarely seen websites rely on new browser specific capabilities, except for demos/showcases.
Didn’t Chrome slowly become popular using Google's own marketing channel, search? That’s what I thought.
> MS lost its hold on the web because of regulatory action
Well, not only. They objectively made a worse product for decades and used their platform to push it, much more effectively than Google too. They are still pushing Edge hard, with darker patterns than Google imo.
In either case, the decision to adopt Chromium wasn’t forced. Microsoft clearly must have been aligned enough on the capability model to not deem it a large risk, and continued to push for Edge just as they did with IE.
By default, the W3C process actually requires multiple implementations before something is supposed to be standardised. So it is actually necessary that browser vendors ship vendor-specific implementations before the standards process can properly consider things like this.
If Mozilla jumps on board and makes a compatible implementation that backs onto e.g. local Llama, then you would have the preconditions necessary for it to become standardised. As long as Google hasn't booby-trapped it by making it somehow highly specific to Chrome / Google / Gemini etc.
E.g. Promptfoo, ChainForge, and LocalAI all have abstractions over many models; also re: Google Desktop and GNU Tracker and NVIDIA's pdfgpt: https://news.ycombinator.com/item?id=39363115
Browser extensions should be able to shim or even overwrite the `.ai` object on window, so it should be possible to add ollama etc. to all browsers through extensions with the same API, making it a de facto standard.
It should be simple enough to do that I believe at least 3-5 people are going to be doing this if it's not done already
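A shim along those lines really is small. Here's a sketch with the backend stubbed out; a real extension would proxy generate() to e.g. ollama's local HTTP API, and the window.ai surface mimicked here is the early experimental shape, which will change:

```javascript
// Minimal window.ai-shaped shim backed by a pluggable generate() function.
// In an extension content script you'd assign this onto the page's window.
function makeAiShim(generate) {
  return {
    async createTextSession() {
      return {
        prompt: (text) => generate(text),
      };
    },
  };
}

// Stub backend so the sketch runs anywhere; a real shim would POST the text
// to a local model server and return its completion.
const ai = makeAiShim(async (text) => `echo: ${text}`);

async function demo() {
  const session = await ai.createTextSession();
  console.log(await session.prompt("hello")); // "echo: hello"
}
demo();
```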
I had been thinking and speaking in public about how to make a "Metamask but for AI instead of crypto" but I thought it would be impossible for websites to adopt it
Now thanks to Google it's possible to piggy back onto the API
I don't think so. Chrome is already the most popular browser. If a website decides to use this they are just going to tell users to use Chrome. And then Chrome sustains its dominant position. It's the right strategy for them to further their dominance.
And the right way to think about it isn't other browsers. It's Google seeing what Apple is doing in iOS 18 and imitating that.
Going a touch further: make it a pluggable local model. Browser fetches the first 10 links from google in the background, watches the YouTube video, hides the Google ads, presents you with the results.
Now not only can Google front the web pages who feed them content they make summaries from, but the browser can front Google.
“Your honour, this is just what Google has been saying is a good thing. We just moved it to the edge. The users win, no?”
Yes, as demonstrated by the reaction to articles like this. Started out as a few dissenters and lots of praise. Now we’re down to mostly negative comments.
Another few years and most of these won’t even make it to the front page.
If they're going to cram a small LLM in the browser, they might as well start cramming a small image generating diffusion model + new image format to go along with it.
I believe we can start compressing down the amount of data going over the wire 100x this way...
I should correct myself a bit here.
I believe it's actually UNet-type models that can be used to very lossily "compress" images: you ship the latents instead of the images, and use the locally running image-generation model to expand the latents into full images.
In fact, any kind of decoder model, including text models, can use the same principle to lossily compress data. Of course, hallucination will be a thing...
Diffusion models, depending on the full architecture might not have smaller dimension layers that could be used for compression.
Seemingly a very interesting move, but I wonder what specific needs remain for a 'browser-level' local LLM, given that local LLMs will be on devices in the near future. If we're not connected to the internet, a device-level LLM might be better; and when we open the browser, we're connected to the great internet anyway. I know browser-level LLMs can have benefits like speed, privacy protection, and cost-effectiveness, but these features are already covered by internet-based LLM APIs or device-level LLM APIs.
I mostly think it's an interesting concept that can allow many interesting user experiences.
At the same time, it is a major risk for browser compatibility. Despite many articles claiming otherwise, I think we mostly avoided repeating the "works only on IE6" situation with chrome. Google did kinda try at times, but most things didn't catch on. This I think has the potential to do some damage on that front.
Microsoft, Mozilla and Apple all have the resources to provide competitive small LLMs in their own browsers’ implementation of window.ai if this catches on. A 3B sized model isn’t a moat for Chrome.
Even if the model data is the same - we have seen that fingerprinting can be applied to WebGL, so if hardware acceleration is used to run those models, it might be possible to fingerprint the hardware based on the outputs?
These models make heavy use of RNG (random number generator), so it would be difficult to fingerprint based on the output tokens. It may be possible to use specially crafted prompts that yield predictable results. Otherwise, just timing how long it takes to generate tokens locally.
There's already so many ways to fingerprint users which are far more reliable though.
All I have is a tweet stating a new property `ai` is being added to the browser window context. That and a short video with a link to a Vercel app that stops me because I'm not using Chrome.
Why is it brain rot? It still blows my mind that it’s even possible for a low-power device to talk and behave like us. For all practical purposes, LLMs pass the Turing test.
This is a major leap forward in human innovation and engineering. IMO, this could be as influential as the adoption of electricity and the setting up of the power grid.
I share a bit of the parent's skepticism: the possibilities are infinite (as with many things, really), but do we need to dive in head first and sprinkle "AI" dust everywhere just in case some of it could be useful?
For instance I don't need my browser to pass the Turing test. I might need better filtering and better search, but it also doesn't need to be baked in the browser.
Your analogy to electricity is interesting: do you feel the need to add electricity to your bed, dining table, chairs, shelves, bathroom shower, nose clip etc.
We kept electric and non electric things somewhat separate, even as each tool and appliance can work together (e.g. my table has a power strip clipped to it, but both are completely separate things)
These are all currently independent plugins or applications that operate partly regardless of the browser they target.
In particular I get to choose the best options for each of them (in particular search, filtering and security being independent from each other seems like a core requirement to me). The most telling part to me is how extensions come and go, and we move on from one to the other. The same kind of rollover won't be an option with everything in Apple's AI for instance.
This could come down to the divide between the Unix philosophy of a constellation of specialized tools working together, and a huge monolith responsible for everything.
I don't see the latter as a viable approach at scale.
Because there is a ton of hyper fixation and rash decisions being made over something that puts words together. It seems very unwise to add a new browser API for something that is in its infancy and being developed.
One small aspect: it is training developers not to learn their tools. We now have a generation of "software developers" who think they can just lean on "AI" and it'll make them more productive. Except all they're capable of is outputting poorly put together inefficient crap, at best.
Yes! Please. I also regularly write in three different languages (Spanish, French and English in my case), and it's just insufferable using my phone vs. using a keyboard; I can't really interact fluently with my phone and third-party services.
Wow, no need to even send your data to the server for mining and analysis. Just use your local CPU/GPU and your power to do comprehensive ad analytics of all your data. No need to maintain expensive server farms!
I don’t want the API. I don’t want random websites burning my battery on AI nonsense I never asked for to make the ads on page more engaging or some other nonsense.
There was already talk of using it in Safari to allow users to create custom filters to persistently block portions of a given website.
Sort of like using ChatGPT to help figure out how to use FFmpeg to accomplish a task from the command prompt, but used to create the equivalent of greasemonkey scripts.
In my mind there is a big difference between the browser using it for features and letting any random website burn my electricity on a new scale for crap I never agreed to.
I used to hold Google Chrome in high esteem due to its security posture. Shoehorning AI into it has deleted any respect I held for Chrome or the team that develops it.
What a needlessly hostile and cynical blog post. I also can't relate to their appraisal of copilot. It's been an incredible productivity booster for me in how it removes so much cognitive load that's tangential to the problem I'm solving. It mostly completes the same line/s I was about to type, sometimes even pointing out an error in my code when the output is odd. It's also awesome when it spits out 20+ lines unexpectedly that do exactly what I was about to do, taking only a brief moment of code review by me to verify.
Funny, that's been pretty much the opposite of my experience with Copilot (or any other LLM-based code assistant). It constantly spits out lines that are similar to what I was planning to write, but are subtly wrong in ways that take me more time to figure out and fix than it would have to just write it myself in the first place.
It's handy if I want a snippet of example code that I could've just found on Stackoverflow, but not useful for anything I actually have to think about.
> It's been an incredible productivity booster for me in how it removes so much cognitive load that's tangential to the problem I'm solving. It mostly completes the same line/s I was about to type, sometimes even pointing out an error in my code when the output is odd. It's also awesome when it spits out 20+ lines unexpectedly that do exactly what I was about to do, taking only a brief moment of code review by me to verify.
If Copilot is so great, why does your employer even need you? Replacing you with Copilot would be more capital-efficient.
mataroa is quite... grumpy, but with good cause. I like these kinds of pragmatic, burned out posts from engineers. Sometimes VCs are too much about hype and so very, very distant from reality.