Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is a port of Meta's Segment Anything computer vision model which allows easy segmentation of shapes in images. Originally written in Python, Yavor Ivanov has ported it to C++ using the GGML library created by Georgi Gerganov which is optimized for CPU instead of GPU, specifically Apple Silicon M1/M2. The repo is still in it's early stage



Do you know how the time to do the image embedding takes? In SAM, most of the time is spent generating a very expensive embedding (prohibitive for real-time object detection). From the timing on your page it looks like yours is also similarly slow, but I'm curious how it compares to the pytorch Meta implementation.


I am the creator of the repo.

Depends on the machine, number of threads selected and the model checkpoint used (Vit-B or Vit-L or Vit-B). The video demo attached is running on Apple M2 Ultra and using the Vit-B model. The generation of the image embedding takes ~1.9s there and all the subsequent mask segmentations take ~45ms.

However, I am now focusing on improving the inference speed by making better use of ggml and trying out quantization. Once I make some progress in this direction I will compare to other SAM alternatives and benchmark more thoroughly.


This is amazing. Thank you!




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: