This is a port of Meta's Segment Anything computer vision model, which allows easy segmentation of shapes in images. Originally written in Python, it has been ported to C++ by Yavor Ivanov using the GGML library created by Georgi Gerganov, which is optimized for CPU inference rather than GPU, particularly on Apple Silicon (M1/M2). The repo is still in its early stages.
Do you know how long the image embedding takes? In SAM, most of the time is spent generating a very expensive embedding (prohibitive for real-time object detection). From the timings on your page it looks like yours is similarly slow, but I'm curious how it compares to Meta's PyTorch implementation.
It depends on the machine, the number of threads selected, and the model checkpoint used (ViT-B, ViT-L, or ViT-H). The attached video demo is running on an Apple M2 Ultra with the ViT-B model. There, generating the image embedding takes ~1.9 s, and each subsequent mask segmentation takes ~45 ms.
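The reason the per-mask numbers stay so low is that SAM splits the work into a heavy image encoder (run once per image) and a lightweight prompt decoder (run once per click), so the embedding can be reused across prompts. Here is a minimal C++ sketch of that pattern; the structs and function names are hypothetical stand-ins rather than the actual sam.cpp API, and the sleeps just mimic the timings quoted above.

```cpp
// Sketch of the "one expensive embedding, many cheap prompts" pattern.
// compute_image_embedding() and predict_mask() are hypothetical placeholders.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct Embedding { std::vector<float> data; };
struct Point     { float x, y; };
struct Mask      { std::vector<uint8_t> pixels; };

// Stand-in for the heavy ViT image encoder (~1.9 s on M2 Ultra with ViT-B).
Embedding compute_image_embedding() {
    std::this_thread::sleep_for(std::chrono::milliseconds(1900));
    return {};
}

// Stand-in for the lightweight prompt decoder (~45 ms per mask).
Mask predict_mask(const Embedding & /*emb*/, Point /*p*/) {
    std::this_thread::sleep_for(std::chrono::milliseconds(45));
    return {};
}

int main() {
    using clock = std::chrono::steady_clock;
    using ms    = std::chrono::milliseconds;

    // Pay the encoder cost once per image...
    auto t0 = clock::now();
    Embedding emb = compute_image_embedding();
    auto t1 = clock::now();
    std::printf("embedding: %lld ms\n",
                (long long)std::chrono::duration_cast<ms>(t1 - t0).count());

    // ...then reuse the embedding for every click/prompt.
    for (Point p : {Point{10, 20}, Point{100, 50}, Point{200, 80}}) {
        auto t2 = clock::now();
        Mask m  = predict_mask(emb, p);
        auto t3 = clock::now();
        std::printf("mask at (%.0f, %.0f): %lld ms\n", p.x, p.y,
                    (long long)std::chrono::duration_cast<ms>(t3 - t2).count());
        (void)m;
    }
    return 0;
}
```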
However, I am now focusing on improving the inference speed by making better use of ggml and trying out quantization. Once I make some progress in that direction, I will compare against other SAM alternatives and benchmark more thoroughly.