
I feel vindicated! I'm building a tool with VLMs, and I've noticed the answer is always what I'd expect to see, which means it's wrong whenever the input is slightly different from what's expected.

Just like in the article: if I have a picture of a cup, it says cup; if I have a picture of a dog, it says dog; but if it's a dog with a cup, it says a dog with a ball (I noticed this with Qwen and InternVL).


