SQL wins because big iron wins. At the end of the day, the big, clunky, slow, co...

wjossey · on Sept 26, 2017

As someone who helped design, then manage a massive NoSQL store which hundreds of terabytes of data, of which most of it was hot, I’ll humbly disagree and say it’s not necessarily as bad as you make it sound.

Did we have someone who managed that system? Sometimes, but mostly it just did it’s own thing. We ironically invested way more time on our MySQL database over the years because we couldn’t get that to scale the way we wanted to, but I think that was specific to a problem we were having.

Did we later invest a lot of time in that system (the NoSQL one)? Yes, because it was very cost effective for us to do so. At certain scale throwing people at optimization problems can pay huge dividends. But, this can be said of most infrstracture. It’s usually worth revisiting every year or two and seeing how what you can squeeze out.

Did we have churning CPU, infinitely expanding disk, replication issues, and more? Sure, but not very commonly and mostly it was fairly easily resolved. More importantly though, it was a solid system that was the underpinning of a colossal system, and it behaved admiraly more than 99.9% of the time.

Will most projects benefit from a hugely distributed KV store? Nope. But I’m still glad they exist!

cachemiss · on Sept 26, 2017

Caveat, this comment isn't directed at you (I agree with your comment), but rather the points around what you are saying.

One thing that helps is if people stop referring to things as SQL / NoSQL as what ends up happening is various things get conflated.

When talking about stores, it's important to be explicit about a few things:

1. Storage model

2. Distribution model

3. Access model

4. Transaction model

5. Maturity and competence of implementation

What happens is people talk about "SQL" as either an NSM or DSM storage model, over either a single node, or possibly more than that in some of the MPP systems, using SQL as an access model, with linearizable transactions, and a mature competent implementation.

NoSQL when most people refer to it can be any combination of those things, as long as the access model isn't SQL.

I work on database engines, and it's important to decouple these things and be explicit about them when discussing various tradeoffs.

You can do SQL the language over a distributed k/v store (not always a great idea) and other non-tabular / relational models and you can distribute relational engines (though scaling linearizable transactions is difficult and doesn't scale for certain use cases due to physics, but that's unrelated to the relational part of it).

Generally people talk about joins not scaling in some normalized form, but then what they do is just materialize the join into whatever they are using to store things in a denormalized model, which has its own drawbacks.

As to the comment above you, SQL vs NoSQL also doesn't have anything to do with the relative maturity of anything. Some of the newer non-relational engines have some operational issues, but that doesn't really have anything to do with their storage model or access method, it just has to due with the competence of the implementation. MongoDB is difficult operationally not because it's not a relational engine, but because it wasn't well designed.

Just like people put SQL over non-tabular stores, you can build non-tabular / relational engines over relational engines (sharding PostgreSQL etc.). In fact major cloud vendors do just that.

wjossey · on Sept 26, 2017

Wonderful response. Thank you. Wish I could give multiple upvotes. I’ll add some of those points to my thought process going forward.

jjirsa · on Sept 26, 2017

There is so much wrong with this post I don’t know where to start

Old school sql replication is awful. It’s awful in Postgres (single threaded replay? A single vacuum can make your slaves fall behind). It’s awful in MySQL (go ahead and tell me how you do dual master without rewriting your app).

People use nosql not because it’s eady or flawless, but because doing it with sql is an order of magnitude harder.

Go ahead and describe resharding a sql DB as your company grows 100x

Or how you’ll handle a dc failover and fail back

Or how you’ll shrink your resources after the holidays - without downtime.

These are trivial with Cassandra and god damn near impossible for most MySQL users.

More reliable? Your “big iron” is still a single point of failure. If not a single cpu, a raid card or a top of rack switch or a DC in a hurricane zone.

6nf · on Sept 26, 2017

> damn near impossible for most MySQL users.

MySQL? Nobody, and I mean NOBOBY is saying you should use MySQL. All of the things you mention is trivial using a decent SQL solution like PostgreSQL.

jjirsa · on Sept 26, 2017

Except the person to whom I replied said to use traditional SQL databases:

> Traditional SQL databases end up being many times simpler in practice, and because of their inherent scaling limitations, much easier to manage. And most importantly: their implementations aren't buggy.

6nf · on Sept 26, 2017

They mean traditional SQL databases like PostgreSQL, SQL Server, DB2, Oracle etc.

Not MySQL. MySQL has almost none of the important features of a real SQL database management system. Nobody should be using MySQL for anything.

jjirsa · on Sept 26, 2017

Quick, tell YouTube and Facebook they’ve been doing it wrong all along

6nf · on Sept 26, 2017

They are NOT using stock-standard MySQL and haven't done so for many years. If you are big enough to make your own programming languages the normal rules do not apply.

And if you ask the Youtube team for a recommendation for your project, they will not point you to MySQL.

jaequery · on Sept 26, 2017

i haven't touched mysql in years but quite frankly, 99% of websites out there (blog/cms) will run fairly fine on a Mysql database. as for the other 1%, they would be big enough to pick the right db of their choice as the time comes. so there is no need to tell everyone you should be using this or that.

this is kind of like telling everyone that they should ALL be using emacs, when vim/nano is still sufficient for the job.

6nf · on Sept 26, 2017

I honestly don't see a single reason why MySQL should be used over PostgreSQL for anything, thus my sweeping statement. I could be wrong but I don't think so.

elvinyung · on Sept 26, 2017

You got downvoted, but I agree with you. Traditional RDBMS is inherently single-node and a pain to scale. The actual sad thing here is that there's no mature/reliable commodity (i.e. open source) distributed RDBMS for transaction processing yet.

jinqueeny · on Sept 26, 2017

I have to bring this up again: NewSQL. There are quite a few new choices out there, Google Spanner, Cockroachdb(https://github.com/cockroachdb/cockroach), TiDB (https://github.com/pingcap/tidb). All of them are very easy to scale while maintaining the ACID transactions.

elvinyung · on Sept 26, 2017

I said open source and mature :)

IMO really the best option for distributed OLTP RDBMS/NewSQL right now is basically either Citus or Vitess, only because they are built on the backs of extremely mature databases. It really feels like all other options are not mature enough (except maybe VoltDB).

jinqueeny · on Sept 26, 2017

Could you please elaborate how you define "mature"?

elvinyung · on Sept 27, 2017

IMO, in descending order of importance to me:

* Dependability. I can be completely certain that standard usage of the database won't lose my data or leave it in a bad state, at the very least.

* Community. There's an active IRC channel (or the like) where I can ask questions. When an error show up, I can Google it and easily find an applicable solution most of the time.

* Performance. Not just with synthetic (i.e. TPC-C, TPC-H, YCSB, etc.) workloads, but proven to have reasonably satisfactory QPS in many/most real production settings.

* Ecosystem. There's a good collection of first-party (and also ideally third-party) tools, constantly being improved. (Mere wire-compatibility with MySQL or Postgres might be good, but probably isn't enough, since 100% wire compatibility seems rare.)

jinqueeny · on Sept 27, 2017

Thanks for sharing！These criteria seems quite practical. You are probably right. Time will tell.

rhencke · on Sept 26, 2017

CockroachDB? I'll grant you it's still young, but it pretty solidly hits all your other checkboxes.

elvinyung · on Sept 26, 2017

IMO, it's very promising but still extremely immature.

I still believe in Curt Monash's cardinal rule of RDBMS development: http://www.dbms2.com/2013/03/18/dbms-development-marklogic-h...

yenwel · on Sept 26, 2017

VoltDB?

stephenr · on Sept 26, 2017

Galera + HAProxy or ProxySQL should give you a multi-master MySQL with no app rewrites required, no?

jjirsa · on Sept 27, 2017

Primary key conflicts are still a thing. Especially problematic where you're either not using auto_incrementing PKs, and/or you try to switch writes from one master to the other and app has to be aware of the switch.

stephenr · on Sept 27, 2017

Not using auto incrementing keys is surely a rarity with MySQL projects.

Even before galera was a thing regular MySQL replication could use auto_increment_(increment|offset) to make each server generate unique id's.

jjirsa · on Sept 27, 2017

There exists a LOT of data models that don’t use auto incrementinf primary keys. Rarity or not, it’s a real use case and a real problem for master/master replication

Also, even if you do use auto increment with offsets for each server, you’ll have to know how many servers you’ll eventually have, or waste offsets and eventually risk outrunning your data types

stephenr · on Sept 27, 2017

Non auto keys will already need some way to ensure uniqueness that will involve the application expecting conflicts.

For those that do use auto inc, galera will manage the offset/increment value based on cluster size automatically.

imtringued · on Sept 26, 2017

SQL is just a querying language, the point of the article is that it's not exclusive to RDBMS, it is also used by NoSQL databases. There is also no law that says RDBMS must be strongly consistent. They can also be eventually consistent. The opposite also applies to NoSQL databases.