This is wild. I've been creating my own dataset of trending articles and ironically this is how I came across your post. I'm doing a similar project for my uni thesis.
I set out with similar hypotheses and goals to yours (on a slightly different scale though, haha) but I've been completely stuck on the interactive map part. Definitely getting a lot of pointers from how you handled this!
Maybe one key difference in approach is that I've put more emphasis on trying to extract key topics as keywords.
For example:
article (title): "Useful Uses of cat"
keywords: ['Software design', 'Contraction', 'Code changes', 'Modularity', 'Ease of extension']
My hypothesis is that searching over keywords will be faster than searching over the embeddings, but potentially less accurate. I'm not far enough along to actually test this yet, though.
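To make the keyword idea concrete, here is a minimal sketch of what I mean by keyword search, assuming a plain inverted index with exact, case-insensitive matching. The function names and the second article are made up for illustration, and this is not necessarily how my pipeline ends up doing it:

```python
from collections import defaultdict


def build_index(articles):
    """Map each lowercased keyword to the set of article titles tagged with it."""
    index = defaultdict(set)
    for title, keywords in articles.items():
        for kw in keywords:
            index[kw.lower()].add(title)
    return index


def search(index, query_keywords):
    """Rank titles by number of matching query keywords (exact match only)."""
    scores = defaultdict(int)
    for kw in query_keywords:
        # .get avoids creating empty entries for unknown keywords
        for title in index.get(kw.lower(), ()):
            scores[title] += 1
    # Highest score first; ties broken alphabetically by title
    return sorted(scores, key=lambda t: (-scores[t], t))


articles = {
    "Useful Uses of cat": [
        "Software design", "Contraction", "Code changes",
        "Modularity", "Ease of extension",
    ],
    # Hypothetical entry, only here so the ranking has something to compare
    "Hypothetical second article": ["Modularity", "Testing"],
}

index = build_index(articles)
results = search(index, ["modularity", "software design"])
```

Each query keyword costs one dictionary lookup, which is where I'd expect the speed to come from. The catch is that exact matching misses synonyms and paraphrases, which is where I'd expect embeddings to do better.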
Would love to hear what you think! Any other cool ideas on what could be done with the keywords? I explain my process a bit more here if interested: https://hackernews-demo.streamlit.app/#data-aggregation-meth...