This is awesome.
If you try the demo they provide [0], the inference is handled purely in the client using an ONNX model that weighs only around 8 MB [1] [2].
Really impressive stuff! Congrats to the team that achieved it
> What platforms does the model use?
> The image encoder is implemented in PyTorch and requires a GPU for efficient inference.
> The prompt encoder and mask decoder can run directly with PyTorch or converted to ONNX and run efficiently on CPU or GPU across a variety of platforms that support ONNX runtime.
You can download the model yourself from GitHub and run it locally. The biggest one is about 2.5 GB, and inference certainly took some time on my M1's CPU. I couldn't get MPS to run as the tensor dtypes are incompatible (could be a quick fix).
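For anyone curious, here's roughly what running it locally looks like with the `segment_anything` package from their repo — a minimal sketch, with the image path and prompt point as placeholders:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the largest (ViT-H) checkpoint -- the ~2.5 GB download mentioned above.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cpu")  # "mps" trips over incompatible tensor dtypes for me

predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy image encoder once per image

# Prompt with a single foreground point (x, y); label 1 = foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
)
print(masks.shape, scores)  # (3, H, W) candidate masks and their scores
```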
The small ONNX model just decodes the output of the larger model into the masks etc. But the bulk of the "computation" is done by a much larger vision transformer somewhere else. It really needs a GPU with a fair amount of memory to run anywhere close to real-time.
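Concretely, the split looks something like this: the encoder turns the image into a (1, 256, 64, 64) embedding once, and the small ONNX decoder then maps that embedding plus a click into masks. A sketch with onnxruntime, using the input names from the repo's ONNX example (the click coordinates would still need the repo's coordinate transform applied first, and the image size here is a placeholder):

```python
import numpy as np
import onnxruntime as ort

# The exported decoder is tiny and fast on CPU; the expensive part is the
# image embedding, which the big ViT encoder has to produce beforehand
# (e.g. via predictor.get_image_embedding()).
session = ort.InferenceSession("sam_decoder.onnx")

masks, scores, low_res_logits = session.run(None, {
    "image_embeddings": image_embedding,  # (1, 256, 64, 64) float32
    # One click plus a padding point; coords must be mapped into the model's
    # 1024-px input frame first (predictor.transform.apply_coords).
    "point_coords": np.array([[[500.0, 375.0], [0.0, 0.0]]], dtype=np.float32),
    "point_labels": np.array([[1.0, -1.0]], dtype=np.float32),
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array([1200.0, 1800.0], dtype=np.float32),
})
```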
> The image encoder is implemented in PyTorch and requires a GPU for efficient inference.
WebGPU only just shipped in Chrome today, and nobody is reporting that the demo fails on their days-old browser, so it can't be using WebGPU.
While it's possible, running a neural network in the browser without WebGPU is really tedious.
Also, the model is implemented in PyTorch and wasn't converted to another format for another runtime. While you could technically compile CPython and PyTorch to WASM and run the pair in the browser, there would definitely be no GPU access.
Given that they explicitly mention the decoder was converted to ONNX, it's clear this wasn't done for the encoder, and they really do mean PyTorch, running with Python, on a server.
So your browser can't run the encoder, yet the web demo works; it's quite obvious that the encoder runs server-side.
I downloaded the code from their repo, exported their PyTorch model to ONNX, and ran a prediction against it. Everything ran locally on my system (CPU only, no CUDA cores), and a prediction for the item to be annotated was made.
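In case anyone wants to reproduce that, the export step is essentially a condensed version of what the repo's scripts/export_onnx_model.py does — a sketch, with the dummy shapes and output filename chosen by me:

```python
import torch
from segment_anything import sam_model_registry
from segment_anything.utils.onnx import SamOnnxModel

# Wrap the prompt encoder + mask decoder in an ONNX-exportable module.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
onnx_model = SamOnnxModel(sam, return_single_mask=True)

embed_dim = sam.prompt_encoder.embed_dim
embed_size = sam.prompt_encoder.image_embedding_size  # (64, 64) for ViT-H
dummy_inputs = {
    "image_embeddings": torch.randn(1, embed_dim, *embed_size),
    "point_coords": torch.randint(0, 1024, (1, 5, 2), dtype=torch.float),
    "point_labels": torch.randint(0, 4, (1, 5), dtype=torch.float),
    "mask_input": torch.randn(1, 1, 4 * embed_size[0], 4 * embed_size[1]),
    "has_mask_input": torch.tensor([1.0]),
    "orig_im_size": torch.tensor([1500.0, 2250.0]),
}
torch.onnx.export(
    onnx_model,
    tuple(dummy_inputs.values()),
    "sam_decoder.onnx",
    input_names=list(dummy_inputs.keys()),
    output_names=["masks", "iou_predictions", "low_res_masks"],
    dynamic_axes={
        "point_coords": {1: "num_points"},
        "point_labels": {1: "num_points"},
    },
    opset_version=17,
)
```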
[0] https://segment-anything.com/demo
[1] https://segment-anything.com/model/interactive_module_quanti...
[2] https://segment-anything.com/model/interactive_module_quanti...