
What I'm seeing from all of these methods is very accurate, navigable 3D imagery of single scenes.

What I haven't seen any of is feature and object detection, blocking, and extraction.

Hopefully a more efficient, streamable codec will necessitate the sort of structure that lends itself more readily to analysis.




3D understanding as a field is very much in its infancy. Good work is being done in this area, but we've got a long way to go yet. SMERF is all about "view synthesis" -- rendering realistic images -- with no attempt at semantic understanding or segmentation.
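
To make the distinction concrete: a radiance field is, at its core, just a function mapping a 3D position and view direction to a color and a density, and nothing in that signature carries object identity. Here's a toy sketch of that signature (random weights and illustrative dimensions, not SMERF's actual architecture), plus the kind of feature head a semantics-aware variant would bolt on:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(6, 64))    # input: 3D position + 3D view direction
    W2 = rng.normal(size=(64, 4))    # output: RGB color + volume density
    W3 = rng.normal(size=(64, 512))  # hypothetical 512-d semantic feature head

    def radiance_field(x, d):
        """View synthesis only: (position, direction) -> (color, density)."""
        h = np.tanh(np.concatenate([x, d]) @ W1)
        out = h @ W2
        rgb = 1.0 / (1.0 + np.exp(-out[:3]))  # colors in [0, 1]
        sigma = np.logaddexp(0.0, out[3])     # non-negative density (softplus)
        return rgb, sigma

    def semantic_field(x, d):
        """A semantics-aware variant additionally emits a per-point embedding."""
        h = np.tanh(np.concatenate([x, d]) @ W1)
        rgb, sigma = radiance_field(x, d)
        feat = h @ W3  # e.g. a CLIP-like language embedding, as in LERF
        return rgb, sigma, feat

A model trained against photometric loss alone (the first function) can look photorealistic while "knowing" nothing about objects; a feature head like the second is what detection and segmentation would hang off.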


"It's my VR-deployed SMERF CLIP model with LLM integration, and I want it now!"

It is funny how quickly goalposts move! I love to see progress though, and wow, is progress happening fast!


It's not always moving goalposts - sometimes a new technology progresses in some aspects and regresses in others.

This technology is a significant step forward in some ways - but people are going to compare it to state-of-the-art 3D renders and think that it's more impressive than it actually is.

Eventually this sort of thing will have an understanding of lighting (de-lighting and light-source manipulation) and spatial structure (and eventually spatio-temporal structure).

Right now it has none of that, but a layman will look at the output and think that what they're seeing is significantly closer to that than it actually is, due to largely cosmetic similarities.


You mean something like this? https://jumpat.github.io/SA3D/

Found by putting "nerf sam segment 3d" into DuckDuckGo.
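
The core trick there is lifting SAM's 2D masks into 3D using the NeRF's own rendering weights. A rough sketch of that idea (hypothetical names, assuming a cubic voxel grid and scene coordinates normalized to [0, 1]^3; the real SA3D refines the mask iteratively across views):

    import numpy as np

    def lift_mask_to_grid(mask, origins, dirs, density_fn, grid, t_vals):
        """Accumulate 2D mask votes into a coarse 3D occupancy grid."""
        H, W = mask.shape
        n = grid.shape[0]
        for py in range(H):
            for px in range(W):
                if not mask[py, px]:
                    continue
                # sample points along the camera ray through this masked pixel
                pts = origins[py, px] + t_vals[:, None] * dirs[py, px]
                sigma = density_fn(pts)  # densities from the trained NeRF
                # standard volume-rendering weights w_i = T_i * alpha_i
                delta = np.diff(t_vals, append=t_vals[-1] + 1e-3)
                alpha = 1.0 - np.exp(-sigma * delta)
                T = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1] + 1e-10)))
                w = T * alpha
                # scatter each sample's contribution into the voxel it falls in
                idx = np.clip((pts * n).astype(int), 0, n - 1)
                np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), w)
        return grid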


Check out the LERF work from the NerfStudio team at UC Berkeley. SMERF is addressing a different problem, but there are definitely ways to incorporate semantics and detection as well.
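
LERF's querying step is simple to sketch: the field stores a CLIP-aligned embedding at each 3D point, and relevancy for a text prompt is essentially cosine similarity against the prompt's text embedding (shapes hypothetical; this shows the idea, not the NerfStudio API, and the real LERF scores relevancy against canonical negative phrases too):

    import numpy as np

    def relevancy(point_feats, text_feat):
        """point_feats: (N, D) per-point embeddings; text_feat: (D,) query."""
        p = point_feats / np.linalg.norm(point_feats, axis=-1, keepdims=True)
        t = text_feat / np.linalg.norm(text_feat)
        return p @ t  # (N,) similarities; threshold for a 3D "detection"

Thresholding that score over the field gives exactly the kind of open-vocabulary detection and extraction the top comment is asking for.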



