The two other main competitors aimed at startups (and really at venture investors) are Pitchbook and CB Insights.
There are several equivalent products like CapitalIQ (owned by S&P Global), Preqin, and offerings from Factset/Refinitiv that are aimed more at private equity investors (later stage) but also include some startup data.
Finally, there are niche startup data providers like Harmonic.ai (in-depth scraping of stealth startups), G2 (Yelp for enterprise software), or Clay.run (innovative UI) that each focus on something specific but are not at the scale of the above.
How do they get the data?
The first place is SEC Form D filings. These are required in the US after private funding rounds (lots of caveats, ifs, buts, etc., but let's keep it simple). This data alone can give you a decent database to start with. After that, it is web-scraping news articles, news wires, LinkedIn, etc. For very specialized areas (e.g. Dev Tools), specialized data sources (say GitHub Archives) might be useful.
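To make the "seed from filings, then enrich with scraping" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the record fields (`name`, `amount_sold`, `website`, `source`), the suffix list, and the name-matching rule are made up, not how any of these providers actually do it.

```python
import re

# Hypothetical list of legal suffixes to ignore when matching company names.
_SUFFIXES = re.compile(r"\b(inc|llc|corp|co|ltd)\b\.?")

def normalize_name(name):
    """Lowercase, drop common legal suffixes, collapse punctuation."""
    name = _SUFFIXES.sub("", name.lower())
    return re.sub(r"[^a-z0-9]+", " ", name).strip()

def merge_sources(filings, scraped):
    """Seed the database from filing records, then fill gaps from scraped ones.

    Both inputs are lists of dicts with at least a "name" key (assumed shape).
    Values from filings win; scraped values only fill missing fields.
    """
    db = {}
    for rec in filings:
        db[normalize_name(rec["name"])] = {**rec, "sources": ["form_d"]}
    for rec in scraped:
        key = normalize_name(rec["name"])
        entry = db.setdefault(key, {"name": rec["name"], "sources": []})
        for field_name, value in rec.items():
            if field_name != "source":
                entry.setdefault(field_name, value)  # keep existing values
        entry["sources"].append(rec.get("source", "scrape"))
    return db
```

In practice the hard part is the matching step (name collisions, renames, subsidiaries), which this sketch glosses over with a simple normalized-name key.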
Most importantly, many of these providers aim for give-to-get dynamics. Once they become popular enough, startups will actually seek out having a profile (create data) or fix incorrect data (contribute). This is a great dynamic, of course, because it essentially creates proprietary but free data collection.
Websites like TheOrg.com have done a nice job with org charts -- they take a guess at who you report to... and a lot of employees, annoyed at being "layered", will freely fix the data. If you get enough volume, you create a give-to-get flywheel.
I agree with you that what is valuable here is the proprietary data. But, behind that, is the _process_ for creating the proprietary data. You could get very good at web-scraping, parsing esoteric government filings, etc. And, maybe that space can get disrupted by someone better (say with LLMs). But ultimately, if you can get users to contribute data -- that's the "promised land" in DaaS.
I also think UI/interface is not value-less. Companies like Clay.run have done a great job making proprietary data accessible to more users. There is value there -- but the data owner collects a (fair) toll on that.
SEC filings + crowdsourced content seems like the way to go. Plus, who wouldn't want to celebrate their latest funding round :)
Curious, how much would you pay for a service where you get the same data as Crunchbase, but with a delightful UI, focused on pre-seed to Series A, in a vertical like "Dev Tools"?
I think there is a subset of VCs that would pay for this. Unfortunately, that very particular subset of VCs has the smallest budget to pay for things, because their budgets come from fixed fees on small fund sizes.
Firms like CapitalIQ or Pitchbook have their largest contracts with giant asset managers for whom a 6- or 7-figure deal would be a very small percentage of AUM (and thereby a small percentage of management fees).
For angels/seed-stage VCs, you are likely looking at "prosumer"-like prices. So, something like 100-1000/month at most.
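A back-of-the-envelope version of that budget argument, as a small Python sketch. All the numbers are hypothetical: a 20M seed fund charging a 2% management fee, and a 50B asset manager charging 0.5%. Real fee rates and fund sizes vary a lot.

```python
def share_of_fees(annual_price, fund_size, fee_rate):
    """Fraction of a firm's annual management fees eaten by one data contract."""
    annual_fees = fund_size * fee_rate
    return annual_price / annual_fees

# Hypothetical seed fund: 20M fund, 2% fee, paying 1000/month for data.
seed = share_of_fees(annual_price=12 * 1000, fund_size=20_000_000, fee_rate=0.02)

# Hypothetical asset manager: 50B AUM, 0.5% fee, paying a 7-figure contract.
giant = share_of_fees(annual_price=1_000_000, fund_size=50_000_000_000, fee_rate=0.005)

print(f"seed fund: {seed:.1%} of fees")       # roughly 3%
print(f"asset manager: {giant:.1%} of fees")  # well under 1%
```

Under these assumptions the small fund pays a larger share of its fee income for a product that costs about 80x less per year, which is why the pricing ceiling sits where it does.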