
Thinking about it some more as I read through more comments: in the stated case of research papers, it can make sense if your task is finding common themes rather than specific details. If you embed a single sentence or paragraph, you miss the connections between sentences across the whole paper, or at least it's harder to manage them. By encoding a large number of pages from the paper (or the entire paper), you can hopefully do a better job of capturing the paper's theme.

This also raises another question, though: how would that compare to using an LLM to summarize the paper and then embedding that summary?



I would guess that the embedded summary is better, but for many tasks where you use embeddings (like document search), summarizing every document with an LLM is too expensive and slow.



