
As someone who helped design, then manage, a massive NoSQL store holding hundreds of terabytes of data, most of it hot, I’ll humbly disagree and say it’s not necessarily as bad as you make it sound.

Did we have someone who managed that system? Sometimes, but mostly it just did its own thing. Ironically, we invested way more time in our MySQL database over the years because we couldn’t get that to scale the way we wanted it to, but I think that was specific to a problem we were having.

Did we later invest a lot of time in that system (the NoSQL one)? Yes, because it was very cost effective for us to do so. At a certain scale, throwing people at optimization problems can pay huge dividends. But this can be said of most infrastructure. It’s usually worth revisiting every year or two and seeing what else you can squeeze out.

Did we have churning CPU, infinitely expanding disk, replication issues, and more? Sure, but not very commonly, and mostly it was fairly easily resolved. More importantly though, it was a solid system that was the underpinning of a colossal system, and it behaved admirably more than 99.9% of the time.

Will most projects benefit from a hugely distributed KV store? Nope. But I’m still glad they exist!




Caveat: this comment isn't directed at you (I agree with your comment), but rather at the points surrounding what you are saying.

One thing that helps is if people stop referring to things as SQL / NoSQL, because various things end up getting conflated.

When talking about stores, it's important to be explicit about a few things:

1. Storage model

2. Distribution model

3. Access model

4. Transaction model

5. Maturity and competence of implementation

What happens is people talk about "SQL" as if it meant one specific combination: an NSM (row-oriented) or DSM (column-oriented) storage model, on a single node or possibly more in some of the MPP systems, with SQL as the access model, linearizable transactions, and a mature, competent implementation.

NoSQL, as most people use the term, can be any combination of those things, as long as the access model isn't SQL.
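To make the five axes concrete, here is a rough Python sketch that writes them down as a record. The class name, field values, and the example stores are all made up for illustration; they are not a taxonomy from any particular engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreProfile:
    """One hypothetical way to describe a store along the five axes."""
    storage_model: str       # e.g. "NSM" (row), "DSM" (column), "LSM k/v"
    distribution_model: str  # e.g. "single node", "MPP", "sharded"
    access_model: str        # e.g. "SQL", "get/put"
    transaction_model: str   # e.g. "linearizable", "eventual"
    maturity: str            # e.g. "mature", "young"


def is_called_nosql(store: StoreProfile) -> bool:
    # Colloquial usage: only the access model decides the label,
    # the other four axes are ignored.
    return store.access_model != "SQL"


# Illustrative profiles, not descriptions of real products.
classic_sql = StoreProfile("NSM", "single node", "SQL", "linearizable", "mature")
distributed_kv = StoreProfile("LSM k/v", "sharded", "get/put", "eventual", "young")
sql_over_kv = StoreProfile("LSM k/v", "sharded", "SQL", "linearizable", "young")
```

Note that `distributed_kv` and `sql_over_kv` differ on two axes but share storage and distribution, yet only one of them gets called "NoSQL". That is the conflation.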

I work on database engines, and it's important to decouple these things and be explicit about them when discussing various tradeoffs.

You can run SQL the language over a distributed k/v store (not always a great idea) and over other non-tabular / non-relational models, and you can distribute relational engines. Linearizable transactions are hard to scale, and for certain use cases they can't scale at all due to physics, since coordinating across nodes costs network round trips, but that's unrelated to the relational part of it.

Generally people talk about joins not scaling in some normalized form, but then what they do is just materialize the join into whatever they are using to store things in a denormalized model, which has its own drawbacks.
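A toy illustration of that drawback, in Python with plain dicts standing in for tables. The data and the function name are invented for the example; it only shows the shape of the problem, not how any particular store behaves.

```python
# Normalized form: users and orders live separately and are joined at read time.
users = {1: {"name": "ada"}}
orders = [
    {"id": 10, "user_id": 1, "total": 25},
    {"id": 11, "user_id": 1, "total": 40},
]


def join_at_read(orders, users):
    """Join each order to its user when the data is read."""
    return [{**o, "user_name": users[o["user_id"]]["name"]} for o in orders]


# Denormalized form: the join is materialized once and stored as-is.
materialized = join_at_read(orders, users)

# The user is renamed in the source of truth.
users[1]["name"] = "ada lovelace"

# Joining at read time sees the new name immediately.
fresh = join_at_read(orders, users)

# The materialized copies are now stale and each one has to be rewritten.
stale = [
    row for row in materialized
    if row["user_name"] != users[row["user_id"]]["name"]
]
```

One logical update turned into one write per materialized row. That write amplification, plus the window where copies disagree, is the cost you trade for skipping the join.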

As to the comment above you, SQL vs NoSQL also doesn't have anything to do with the relative maturity of anything. Some of the newer non-relational engines have some operational issues, but that doesn't really have anything to do with their storage model or access method; it just has to do with the competence of the implementation. MongoDB is difficult operationally not because it's not a relational engine, but because it wasn't well designed.

Just like people put SQL over non-tabular stores, you can build non-tabular / non-relational engines on top of relational engines (sharding PostgreSQL, etc.). In fact, major cloud vendors do just that.
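A minimal sketch of that idea in Python: a get/put interface routed by key hash over several relational databases. In-memory SQLite stands in for PostgreSQL here purely so the example runs on its own; the class name, table layout, and hashing choice are assumptions for illustration, and a real system would also need rebalancing, replication, and failure handling.

```python
import sqlite3
import zlib


class ShardedKV:
    """Hypothetical k/v facade over several relational shards."""

    def __init__(self, shard_count: int = 4):
        # Each shard is an independent relational database.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

    def _shard_for(self, key: str) -> sqlite3.Connection:
        # Stable hash so the same key always routes to the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key: str, value: str) -> None:
        db = self._shard_for(key)
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str):
        row = self._shard_for(key).execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None
```

Callers only ever see `put` and `get`, so the access model is k/v even though every byte sits in a relational engine. Each shard still gives you its own local transactions; what you give up is a transaction that spans shards.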


Wonderful response. Thank you. Wish I could give multiple upvotes. I’ll add some of those points to my thought process going forward.



