How usable would the document embedding be for a nearest neighbor search if the ...

fzliu · on Jan 25, 2023

Might be usable for low-column-count tabular data, but it would be pretty terrible for any other semantically dense modality, e.g. video, molecules, geospatial data, etc.

hansvm · on Jan 25, 2023

Nearly useless for most applications unless there's been a major improvement in the SOTA that I missed.

sdenton4 · on Jan 25, 2023

Can I use the three dimensions to encode a space-filling curve over a 1000-dimensional embedding?

williamcotton · on Jan 25, 2023

Not precisely, but if you had 50 documents in that 1000-dimensional embedding, reduced them to three dimensions, and still got exactly the same nearest-neighbor ordering, then it would at least still function, right?

I guess the problem is taking a new document (like a search term) in the higher-dimensional embedding, reducing it to three dimensions to search in that reduced space, and expecting that to also preserve the same nearest-neighbor ordering.
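
One way to check this empirically is a rough sketch like the one below. It assumes PCA (via NumPy's SVD) as the reduction, which nobody in this thread specified, and uses random Gaussian vectors as a stand-in for real document embeddings. The helper names (`fit_pca`, `project`, `neighbor_order`, `top_k_overlap`) are made up for illustration. The projection is fit on the 50 documents only, then the unseen query is pushed through that same projection and the two neighbor orderings are compared.

```python
import numpy as np

def fit_pca(X, n_components=3):
    """Fit a PCA projection on X via SVD; returns (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Map one vector or a batch of vectors into the reduced space."""
    return (X - mean) @ components.T

def neighbor_order(query, docs):
    """Indices of docs sorted nearest to farthest (Euclidean distance)."""
    return np.argsort(np.linalg.norm(docs - query, axis=1))

def top_k_overlap(order_a, order_b, k):
    """Fraction of the top-k neighbors shared by two orderings."""
    return len(set(order_a[:k].tolist()) & set(order_b[:k].tolist())) / k

# Synthetic stand-in data (an assumption): 50 "documents" and one new
# "search term", each a 1000-dimensional vector.
rng = np.random.default_rng(0)
docs = rng.normal(size=(50, 1000))
query = rng.normal(size=1000)

# Fit the reduction on the documents only, then reuse it for the query.
mean, components = fit_pca(docs, n_components=3)
docs_3d = project(docs, mean, components)
query_3d = project(query, mean, components)

full_order = neighbor_order(query, docs)
reduced_order = neighbor_order(query_3d, docs_3d)
overlap = top_k_overlap(full_order, reduced_order, k=5)

print("top-5 in 1000-d:", full_order[:5])
print("top-5 in 3-d:   ", reduced_order[:5])
print("top-5 overlap:  ", overlap)
```

Swapping in real embeddings for `docs` and `query` and looking at `overlap` across many queries would show how much of the ordering survives the reduction. A non-linear reducer such as t-SNE would need a different approach for the query step, since it has no natural way to project a point it was not fit on.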