I'm a long-time lurker who has only recently started posting. The archives themselves contain just JSON files. Next time I'll post an article describing what's in them, so that it isn't just Dropbox links.
This dataset does include Bloomreach Discovery, Coveo and Algolia. These were detected by looking through HTTP responses for publicly available web pages. For example, Coveo was detected by searching a script tag's src attribute for "static.cloud.coveo.com".
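As a rough illustration of that kind of check, here is a minimal sketch, not the actual detection code. It assumes the page HTML has already been fetched, and the names `SIGNATURES`, `ScriptSrcCollector` and `detect_technologies` are made up for this example. Only the Coveo marker comes from the description above; no markers for other technologies are given here.

```python
from html.parser import HTMLParser

# Hypothetical signature table. Only the Coveo marker is taken from the
# comment above; the table shape itself is an assumption.
SIGNATURES = {"Coveo": "static.cloud.coveo.com"}


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def detect_technologies(html):
    """Return the set of technology names whose marker appears in a script src."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    found = set()
    for name, marker in SIGNATURES.items():
        if any(marker in src for src in parser.srcs):
            found.add(name)
    return found
```

Matching only on the `src` attribute, rather than on the whole response body, avoids false positives from pages that merely mention the hostname in their text.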
I just released https://versiondb.io a few hours ago. It lets you get a slice of what's running on the web without breaking the bank. The full version contains over 4M domains and over 3K detected technologies.
It feels like the type of tool that the HN crowd would be building...
Bang on. I've been working on something that I intend to be a cost-effective alternative to BuiltWith. I was thinking about just selling the datasets and letting users extract whatever they need from them. What technologies are you after?
Sept 2025: https://file.kiwi/25eb6dab#z6CSL-2JqYMO9VYa12n0ZA
Oct 2025: https://file.kiwi/7d2254b1#kYSlZMbEJczuTlMrk-Bo0g