I understand Clay, which your Websets product is clearly inspired by, does a fair amount of matching based on domain name or LinkedIn URL. If Websets is doing fuzzy or naive matching, that's okay. I'm just trying to understand the limitations and potential use cases of your current system.
Deduplication is mainly driven by LLMs with search results as context. Our entity resolution works well because Exa’s main business is crawling and indexing the web at scale, and within Websets we can control exactly how we search across that index.
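To give a rough sense of the shape, here's a minimal sketch of an LLM-as-judge dedup check (illustrative only, not our actual pipeline; the OpenAI client is just a stand-in for whatever model you'd use, and `snippets` would come from the search index):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def same_entity(record_a: dict, record_b: dict, snippets: list[str]) -> bool:
    """Ask an LLM whether two scraped records refer to the same real-world
    entity, grounding the judgment in search-result snippets."""
    prompt = (
        "Search results:\n" + "\n".join(snippets) + "\n\n"
        f"Record A: {record_a}\n"
        f"Record B: {record_b}\n\n"
        "Do A and B refer to the same real-world entity? Answer YES or NO."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")
```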
As far as I know, ChatGPT’s search is primarily a wrapper around another company’s search engine, which is why it often feels like it’s just summarizing a page of search results and why it sometimes hallucinates badly.
My question is: how can you confirm that the entity you're referencing in each source is actually the entity you're looking for?
An example I ran into recently is Vast (https://www.vastspace.com/). There are a number of other notable startups named Vast (https://vast.ai/, https://www.vastdata.com/).
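To make the failure mode concrete: a matcher keyed on company name collapses all three Vasts into one entity, while a Clay-style key on canonical domain keeps them apart (a toy illustration, not a claim about Websets' internals):

```python
companies = [
    {"name": "Vast", "domain": "vastspace.com"},  # space stations
    {"name": "Vast", "domain": "vast.ai"},        # GPU marketplace
    {"name": "Vast", "domain": "vastdata.com"},   # data storage
]

# Keying on name alone merges three distinct startups into one "entity".
print(len({c["name"] for c in companies}))    # 1

# Keying on canonical domain keeps them distinct.
print(len({c["domain"] for c in companies}))  # 3
```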