Our renewal bill for Datadog came to –$83,000/year before we canceled (twitter.com/dhh)
47 points by lopkeny12ko on Aug 26, 2024 | 51 comments


I gotta hand it to Datadog's sales team - it's the biggest combination of expensive and useless I've ever seen in my life, but they've somehow managed to convince the people with the checkbooks that they can't live without it.


What I’ve seen happen is that people pay Datadog and, for some reason or another, they:

  1. Cannot really get the whole team through Datadog onboarding (Development too, not just “Infra”), so there are these really deep gaps that will never, ever be covered.

  2. Don’t use all the services they pay for: “Oh, this CI/CD observability pipeline thing that costs us half our infrastructure bill? Yeah, I suppose it runs fine.”, “Oh, these Profile Hosts? Yeah, we keep them just in case, although it’s been 11 months since we last used them...”

  3. Pay in excess for services they don’t take advantage of because of #1.

  4. Cannot allocate time to follow through with their Account Manager and apply the spending suggestions they sometimes make. “Hey, you’re spending tons on Profile Hosts, but I don’t see usage. Have you looked into that? Oh, yeah, some day we will...”
I would definitely not call them useless, but gosh they are expensive even for the bare minimum. I agree that if you cannot commit to the points above, you might as well set up your own Prometheus + Grafana + Loki + Mimir + Tempo stack and see what you can come up with, or even try one of the other alternatives I’ve seen around, like SigNoz[1].

-----

[1]: https://signoz.io/


What you described is the work of a scamming sales team, nothing less. Blame the right people, would you?


Who’s the scamming Sales team? Datadog’s?

I’ve seen it happen first hand: our side promises to be on board, then we start cancelling meetings with them, skip the Datadog training, report we don’t have time for Datadog meetings this week, etc. All after having paid for it.

I don’t see how that’s Datadog’s fault, but I’m eager to understand it better.


You just described someone/some team selling something that doesn't meet the customer's needs. If you can't see what happened, I can try to explain it for you:

Someone from Datadog's sales team talks to someone from marketing at Corporation X. Corp X's marketing loves what the Datadog salespeople are selling and the amazing things they can get from it and, guess what? They pay the first bill. Great job, team!

Now is when engagement with Corp X's technical team starts, but the technical team wasn't the one that agreed to the bill. Tech-savvy people know that Datadog isn't worth it (that's why they aren't the ones the salespeople reach out to). Now they have to justify an $83,000/year bill they didn't sign up for. Great salespeople, great sales team. Scammers, if you ask me.

And the someone from marketing who just signed the bill is essentially the one who "colluded" with Datadog to take money out of Corp X's deep pockets, without fully understanding the technical implications or value of the product.


My understanding is that DD is widely regarded as one of the best observability platforms if you can stomach the bill. Could you elaborate on the comment that DD is "useless"? Would love to understand where it falls short compared to other platforms.


He doesn't know what he's talking about. Don't feed the trolls.


What even is an "observability platform" ?

A cursory reading has not enlightened me.


The whole pipeline: from an SDK to add metrics collection to your program, to an agent running on machines, to a collector for those stats, to a dashboard with pretty graphs, sliced up in meaningful ways, that you can use to monitor/troubleshoot problems when they arise, to alerting, so you don't have to worry about the system not running when you're not looking at it.
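
For a concrete sense of the first link in that chain, here is a minimal sketch of instrumenting a program with the OpenTelemetry Python metrics SDK (the console exporter, meter name, and attribute keys are just illustrative; a real setup would point an OTLP exporter at an agent/collector that forwards to the backend):

  # Minimal metrics-collection sketch using the OpenTelemetry Python SDK.
  from opentelemetry import metrics
  from opentelemetry.sdk.metrics import MeterProvider
  from opentelemetry.sdk.metrics.export import (
      ConsoleMetricExporter,
      PeriodicExportingMetricReader,
  )

  # Export collected metrics every 10 seconds (to the console here,
  # to an agent/collector and then a backend in a real deployment).
  reader = PeriodicExportingMetricReader(ConsoleMetricExporter(),
                                         export_interval_millis=10_000)
  metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

  meter = metrics.get_meter("checkout-service")  # hypothetical service name
  request_counter = meter.create_counter(
      "http.requests", unit="1", description="Handled HTTP requests")

  def handle_request(route: str) -> None:
      # One data point per request, tagged by route so dashboards can slice on it.
      request_counter.add(1, {"http.route": route})

  handle_request("/cart")

Everything downstream of that (agent, collector, dashboards, alerting) is the part the platform vendors charge for.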


> What even is an "observability platform" ?

$83,000/year to store your log files for you (you have to do the work of storing the log files).


Just a suite of tools to aggregate and act on telemetry.


Having had to use many different logging/observability platforms, I have to say Datadog is one of the better ones I've used.

Its biggest strength is that it is user-friendly. I had very little trouble finding the logs I care about and setting up monitors + dashboards.


IMO, I wouldn't call it useless; it's "useless for most teams".

I've been at many companies where they roll out Datadog and collect a crap ton of metrics/logs and so forth. Teams act all gung ho about "being a data-driven team" but quickly learn that almost all the numbers don't matter. As an SRE, I attempt to drill down, but any observations are met with silence or "Oh yeah, that, meh, toss another 2 cores at the problem and let's move on."

At Datadog's price, the outage costs or excess cloud spend it helps you avoid rarely exceed what you're paying for it. If you think you need it, buy it for a year, where you can probably get a good discount rate, use the data to rapidly fix the most glaring issues you have, then cancel the contract until you need it again in 3 years.

Obviously, if you are in a highly responsive business, none of this applies, but if, like us, you are just cranking out B2B REST APIs, you will be fine without Datadog.


I use it every day. I’m curious what you’re building that it’s useless for?


Datadog is the new Oracle, and observability is ripe for open source disruption.

Great to see projects like SigNoz, Netdata, and Coroot appearing, and the OpenTelemetry ecosystem getting more and more usable open source backends and visualization tools, not just agents.


Thanks for mentioning SigNoz. Totally agree that OpenTelemetry and open source should be the way forward here.

If anyone wants to check out SigNoz repo - https://github.com/SigNoz/signoz

PS: I am one of the maintainers at SigNoz


I found that "observability platforms" like Datadog, New Relic, etc. can be really expensive and not that valuable unless you put a centralized, data-agnostic "real-time observability pipeline" between the originating telemetry data sources and the end destination like Datadog, New Relic, Looker, etc.

This approach allows you to see the data stream in real time, and then reduce, enrich, and transform the data before routing it to any end destination like Datadog, New Relic, etc.
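
As a rough illustration of what such a pipeline stage does, here is a plain-Python sketch (not any particular vendor's API; the field names and sinks are made up) that reduces, enriches, and routes a stream of events:

  # Generic reduce / enrich / route stage, independent of any vendor.
  from typing import Callable, Iterable

  def reduce_enrich_route(events: Iterable[dict],
                          sinks: list[Callable[[dict], None]]) -> None:
      for event in events:
          # Reduce: drop debug-level noise before it reaches a per-GB-billed backend.
          if event.get("level") == "debug":
              continue
          # Enrich: attach context that makes troubleshooting faster later.
          event["team"] = "payments"      # hypothetical ownership tag
          event["env"] = "production"
          # Route: fan the enriched event out to every configured destination.
          for sink in sinks:
              sink(event)

  # Sinks could be HTTP clients for Datadog, New Relic, a data lake, etc.
  reduce_enrich_route(
      [{"level": "debug", "msg": "cache miss"},
       {"level": "error", "msg": "payment failed"}],
      sinks=[print],
  )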

We did this and saw some pretty significant savings from the observability platform vendors we are using. Not only that, being able to enrich the data made our troubleshooting much faster, because we now had ingested data we could make sense of and act on. We used datable.io, a startup out of SF, to accomplish this. When we talked with them about a month ago their service was still in beta, but they are letting us use it free of charge for now (not sure if they are still doing that or not?). I think the founder is an ex-New Relic guy?


There's a class of control plane startups that specialize in shaping/dropping observability data points (Honeycomb, Signoz, ChronoSphere) to both focus the results and reduce ingest/storage costs.

OTel can do this independent of providers at the collector, for better (all downstream tooling benefits) and worse (with considerable fussiness and bespoke config).


“… and Kibana for metrics logging”

I love the ELK stack, very much; but last I worked with it at a well-oiled company 7ish(?) years ago, we had one full-time employee’s worth of time dedicated to keeping it stable, not to mention the infra costs.

Was this really a savings when you now have to manage your own SLA and have an on-call rotation for it?


This right here is why you pay. Conversely, I've walked into places where no one was responsible for updating and managing these tools, and they degrade into yet another problem. Paying is still cheaper and more efficient.


There's this cycle of posts that are _When other people do things for you, it's expensive_

Look at these outrageous AWS prices. Look at these outrageous Datadog prices. Look at these outrageous Snowflake prices...

And then of course, they lead to _Look how much cheaper my homegrown solution is_

It's a little tiring. You make trade-offs all the time. If the price isn't worth it, don't pay for it.


When almost all the money is going to the software and setup, it sure shouldn't cost less to do it yourself. It takes money to make a robust solution, plus healthy profit, but that money is spread across every customer.

Renting equipment is an entirely different issue, but I get the impression that the equipment cost is a pretty negligible percent of the number being discussed.


> It takes money to make a robust solution, plus healthy profit, but that money is spread across every customer.

The requirements of supporting the more demanding customers are typically amortized across all customers: requirements re: security, SLA (uptime, data durability, etc.), and data governance, for example. It is difficult to both implement and market products that break these dimensions into tiers. You can sometimes do it for the very niche tail of your customers (air-gapped regions, for example), but in a lot of products, everyone shares in these costs even if they only care about a subset of the capabilities.

The cost savings of scale are offset by the costs of say, needing to support Fortune 100s that need 99.99% availability even if Bob's Wordpress Page doesn't care too much about 99% vs 99.99%. You can apply that analogy to every dimension.

A homegrown solution targets specifically what your company actually cares about. It makes sense that it may be cheaper.

But again. YOU are intelligent, you are informed, you can look and see that Datadog provides a bunch of capabilities you don't need and you're paying for things you don't get benefit from and make the trade-off of building your own. Just stop telling me about it.

Making an argument that there's something abnormal about market competition and that's why prices stay high is different -- I'd read that.


If their amortized support cost is an appreciable fraction of the 83k, I feel like something has gone wrong.

And if someone is demanding 99.99% SLA, that cost should not be passed on to customers that only want the normal 99.8% with no refunds.

> provides a bunch of capabilities you don't need and you're paying for things you don't get benefit from

Capabilities such as?

It sounds like you're making a generic defense that could be applied to companies that justify their costs and to companies that don't.


> it sure shouldn't cost less to do it yourself.

This is an interesting point and something I often wonder about. I think there are a number of different reasons for it, obviously markup and the need to make a profit, but one thing I realized is that when you buy something like AWS, you're often paying for the platinum version, with redundant power, networking, etc., and not the bronze version, which a lot of people would settle for. If I set up my single-instance Grafana VM or physical server, it's going to be a lot less expensive than a solution that's both multiple layers of "platinum" and multiple layers of "markup".

Datadog probably runs on the cloud, which means you get the cloud markup and Datadog's markup both built in.


> I realized is that when you buy something like AWS, you're often paying for the platinum version, with redundant power, networking, etc, and not the bronze version, which a lot of people would settle for.

This is the difference in most cases.

Just taking availability as an example. Datadog has a 99.8% availability SLA. I suspect most companies would be fine paying for a 99% availability SLA, for example, at 1/4th the cost, but in most cases there's no way to scale down SLAs like that and launch a cheaper version.

There is a giant leap in cost (complexity, staffing, etc.) between 99% and 99.9% that most of these home grown solutions don't account for.

These nuances are lost in posts like these, and it is tiring.


99% is down three days a year, and 99.8% can still go down for 17 hours each year.
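
A quick back-of-the-envelope check of those numbers:

  # Allowed downtime per year for a few availability targets.
  HOURS_PER_YEAR = 365 * 24  # 8760

  for availability in (0.99, 0.998, 0.999):
      hours_down = (1 - availability) * HOURS_PER_YEAR
      print(f"{availability:.1%} uptime allows {hours_down:.1f} hours/year of downtime")

  # 99.0% -> 87.6 hours (~3.7 days); 99.8% -> 17.5 hours; 99.9% -> 8.8 hours.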

Neither of those is particularly hard to reach. The cost increase to go from one to the other is not a big percentage.

What do you even get if the SLA is breached? Looking it up I see it excludes planned maintenance, and "in the event the Service availability drops below 99.8% for two consecutive months, Customer may terminate the Service". That's useless.


> it sure shouldn't cost less to do it yourself

The companies produce generic solutions. Because software is complex, and their customers exist in many domains with different needs and whatever.

When you can make a really focused solution, it becomes cheap.

Look at the C++ STL. How much am I using? Not much in comparison to the whole thing. I could write an implementation of std::vector in an afternoon and use it in my apps. But std::vector is like 10,000 lines long and almost impossible to read. Because it's a generic solution, like the rest of the STL.

DD provides solutions with adapters upon adapters. Using this language? Use this SDK. How about this one? Then this other SDK. Have these types of data points? Use this. Need this graph? Use this. And on and on.

But most customers are only gonna use a little bit. Okay, we'll use the Ruby SDK to collect metrics. What about the other 30 SDKs, since this is a generic solution? I dunno. But you're paying for them.


But this isn't a company with a small amount of customers. If you have a hundred adapters, and a thousand customers, your development costs per customer should be lower than DIY could reach.

Especially because you really know what you're doing by the time you write that tenth adapter, so each one is higher quality and much faster to make.

In the analogy, yes you could write a vector implementation in an afternoon. But if 50 other devs do the same thing, and then we add on all the debugging time, it starts to look like they're putting in more work than the big generic centralized version took.


> If you have a hundred adapters, and a thousand customers, your development costs per customer should be lower than DIY could reach

Hmm I don't think so because I think software complexity isn't linear. Meaning going from 1 unit of complexity to 2 is probably much, much easier than going from 99 to 100.


A bunch of adapters in parallel shouldn't add many units of complexity.


Depends because sharing logic like that can be dangerous. Each system is going to have exceptions and unique properties, and SOME will leak into the shared space. This is a hard problem. If what you were saying were true, software would be perfect. It's not though, it sucks ass.

I mean, try writing an API for 100 different consumers with different needs. Good luck! They're gonna ask you for X ... Z and B depends on F but F can't exist with C and C is just a bad idea overall but also O is there and that changes things and...


Then you take the number in the title and the average dev salary, and think about your best case scenario (only setting up the open source tools, onboarding your devs), your mid-case failures (need to reboot, out of storage) and your worst case scenarios (hardware fails, lost storage, the high speed link goes down)

And you realize you're getting a deal


The worst case scenario is the most convincing one there, but the worst case scenario with a purchased service is often the same as DIY.


Why not self-host Grafana and Loki? It’s not hard, and the cost of redundancy and backups running on a bunch of nodes with fast internet isn’t anything new. Much cheaper


My experience is with Kibana and Honeycomb, and in my experience Grafana, even with Tempo, pales in comparison to what the ELK stack or a robust data aggregator with a clean UI, like Honeycomb, is capable of.


I guess it’s a cost benefit analysis. If the UI of X is worth $50k, that’s something some companies can say ok to.

More so speaking for the companies that feel hopeless, when the UIs of these open source tools would probably be OK


DD was great at the beginning, but over time they moved to value-based pricing.

It's sad but understandable that all the good monitoring tools move up the food chain.


I saw the fall of New Relic when they started copying DD's pricing practices, but to be honest, the whole startup ecosystem at the time was starting to do stuff like that.


"The man who needs a new machine tool, and hasn't bought it, is already paying for it." Agree though ballooning cost will prove to be Datadog's Achilles heel. But $83k/year is a bargain no? Where do you even find an engineer working less than $150k/year who could provide the same level of service as Datadog?


It’s a weird glimpse into DHH’s psychology that he writes such a bombastic take on what is an extremely nuanced decision.

If it has any point, it must be to brag about how elite their ops team is to build and maintain obs infrastructure with less than 1/3 of a headcount. THAT would make for interesting reading.


DHH you shouldn't complain - you invented this pricing model !


Time to make the move to OTel with an observability backend like https://www.kloudmate.com (or any other APM of choice) and get most of the stuff done for a fraction of the cost. The big Dog has dictated terms for far too long.


You should disclose that this is your company.


Done. Thanks!


PS: I am associated with KloudMate


Ah, but you did pay it.


Once.


The title is misleading; it says -$83,000/year when it should say ~$83,000/year, as in DHH's post. Those mean very different things.


Email the mods and they’ll correct it?


Excuse me for rolling my eyes at DHH complaining about the price of software. It's enterprise software. You saw the price when you signed up. DD isn't cheap. Neither are the direct competitors (have you looked at Splunk licenses lately?).

Why is the cost so high? Because it's easy to add logging in places it isn't needed or useful. Every place I've ever worked has been able to save costs on these licenses by taking a pass at cleaning up their logs. If you pay by volume and you're logging things that aren't useful and never ever ever get read, you're burning cash. And if you think you're not doing this, I promise you probably are. DD (and others') costs can be reasonable if you don't just spray log data at it indiscriminately.
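
For example, something as simple as dropping and sampling noisy log levels at the source cuts billed volume before it ever leaves the process (a minimal sketch using Python's standard logging module; the sample rate and handler are placeholders, not Datadog's actual client):

  # Trim ingest volume at the source: never ship DEBUG, sample INFO,
  # always keep WARNING and above.
  import logging
  import random

  class VolumeFilter(logging.Filter):
      def __init__(self, info_sample_rate: float = 0.1) -> None:
          super().__init__()
          self.info_sample_rate = info_sample_rate

      def filter(self, record: logging.LogRecord) -> bool:
          if record.levelno <= logging.DEBUG:
              return False                                    # drop debug logs entirely
          if record.levelno == logging.INFO:
              return random.random() < self.info_sample_rate  # keep ~10% of INFO
          return True                                         # keep warnings and errors

  handler = logging.StreamHandler()  # stand-in for whatever ships logs to the vendor
  handler.addFilter(VolumeFilter())
  logging.getLogger().addHandler(handler)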

Want it for cheap? CloudWatch is pennies compared to DD. It's not good, but it works. Want something faster or more featureful? You can stand up Elasticsearch or your favorite log aggregation software, and then you're probably paying a substantial percentage of what you paid for DD in server costs and time required to keep the thing plugging along. And then whatever time you spend dealing with "Grafana started timing out and we don't know why" and similar flavors of incidents.

Yes, good software costs about as much as an FTE. Because you'll (hopefully!) save about an FTE's worth of value by using the tool.



