"Rule 11: Which database technology to choose: Choose SQL when you need to do ad...

JoshuaDavid · on April 30, 2020

I feel like the unstated caveat is "there will almost always come a time when you need to do ad hoc queries, and there will almost always come a time when you need transactions or the equivalent." Which translates to "use SQL unless you are sure you don't and won't need to do ad hoc queries or transactions." Which... seems correct.
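
To make "transactions or the equivalent" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `make_accounts_db` helper and the `transfer` function are invented for illustration and are not from the article; the point is only that a failed multi-step change leaves no partial update behind.

```python
import sqlite3


def make_accounts_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()  # make the starting balances permanent
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates happen or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

If `transfer` raises partway through, the debit that already ran is rolled back, so the application never has to clean up a half-finished transfer itself.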

arduinomancer · on May 1, 2020

From what I've seen a lot of people solve needing ad-hoc queries with NoSQL by exporting everything to a data warehouse.

seangrogg · on April 30, 2020

I feel like this is a case of "probably shouldn't have a default". SQL should likely be a default consideration but if you're going to say "time to build an app, let's spin up a (insert thing here) to store data" rather than "Let me take some time to consider what my data looks like and select a data persistence strategy accordingly" then you're probably going to wind up also writing a "how my team migrated from <x> to <y> because man did <x> not fit our use case at all" article.

Although I guess if you need blog fodder...

flukus · on April 30, 2020

It makes sense as a default because it will be the correct choice 99% of the time. Even when NoSQL is the better option, it tends to be the better option for only part of the application.

Picking it as the default would make us wrong less often.

seangrogg · on May 1, 2020

I question whether it's the "correct choice 99% of the time" or if, anecdotally, you're making the same kind of application and using technology you're comfortable with rather than considering alternatives.

flukus · on May 1, 2020

> you're making the same kind of application and using technology you're comfortable with

To an extent yes. Most apps (the ones that need a database, anyway) are data oriented at their core, just a combination of input, storage and display. The data gets input here, it goes into rows, and it gets displayed, mixed with other data, to someone else over there. Even when the specifics differ, the data generally follows the same sorts of patterns, because data is naturally relational and the combination of tables/rows/joins works really well nearly all the time.

> rather than considering alternatives.

The thing is, when you consider alternatives you also have to consider the huge risks of what you don't know about them. I've worked on one product where NoSQL seemed like the right choice: it was document oriented rather than the usual data oriented, and it worked really well in our proof of concept. The problem is that the more we fleshed things out, the more normal relational data we had, and NoSQL was making things more and more difficult. We started wishing we'd just stored the documents as a blob in an SQL database.

So no, it's not just because I'm comfortable with it (although going with what you know has its own merits), it really is the superior solution most of the time.
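
The "documents as a blob in an SQL database" idea from the comment above could look roughly like the following. This is only a sketch under assumptions: the `authors` and `documents` tables and every function name are hypothetical, SQLite stands in for whichever SQL database the product would have used, and the document is stored as serialized JSON text.

```python
import json
import sqlite3


def open_document_db():
    """Create an in-memory database: relational tables plus one opaque document column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
    conn.executescript("""
        CREATE TABLE authors (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE documents (
            id        INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            body      TEXT NOT NULL  -- the whole document, serialized as JSON
        );
    """)
    return conn


def add_author(conn, name):
    """Insert an author row and return its id."""
    with conn:
        return conn.execute("INSERT INTO authors (name) VALUES (?)", (name,)).lastrowid


def save_document(conn, author_id, doc):
    """Store an arbitrary JSON-serializable document for an existing author."""
    with conn:
        cur = conn.execute(
            "INSERT INTO documents (author_id, body) VALUES (?, ?)",
            (author_id, json.dumps(doc)),
        )
        return cur.lastrowid


def load_documents_by_author(conn, name):
    """Use an ordinary join for the relational part, then decode each blob."""
    rows = conn.execute(
        "SELECT d.body FROM documents d "
        "JOIN authors a ON a.id = d.author_id "
        "WHERE a.name = ? ORDER BY d.id",
        (name,),
    )
    return [json.loads(body) for (body,) in rows]
```

The relational parts (who owns which document) stay in normal columns, while the free-form part stays a blob that the database does not need to understand.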

Shorel · on May 1, 2020

I so much agree with you.

The issue with SQL is that the DB needs to be designed first. But when it is done correctly, the advantages are numerous.

wolfgang000 · on May 1, 2020

Not exactly. The fact that a NoSQL database doesn't enforce a data schema doesn't mean you don't need a clear schema that your app uses, even if the schema is as simple as "take whatever format the frontend uses".

Because if you don't, your database essentially becomes a write-only vault, since you have no idea how your data is stored or was stored in the past.
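
One way to read "a clear schema which your app uses" is a validation step that every write goes through, even when the store itself accepts anything. The sketch below is illustrative only: `REQUIRED_FIELDS`, `validate_user` and `insert_user` are made-up names, and a plain Python list stands in for a schemaless collection.

```python
# Hypothetical application-level schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "schema_version": int,  # lets readers tell old document shapes from new ones
    "email": str,
    "display_name": str,
}


def validate_user(doc):
    """Return doc unchanged if it matches the schema, otherwise raise ValueError."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            raise ValueError(f"missing field: {field}")
        if not isinstance(doc[field], expected_type):
            raise ValueError(f"wrong type for {field}: expected {expected_type.__name__}")
    return doc


def insert_user(store, doc):
    """Only validated documents ever reach the store (a list standing in for a collection)."""
    store.append(validate_user(doc))
```

Recording a `schema_version` in each document is one possible answer to the "or was stored in the past" problem, since older shapes remain identifiable after the schema changes.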

tikiman163 · on April 30, 2020

To be completely frank, I'm seeing less and less reason to use traditional SQL databases. MongoDB offers the ability to make SQL queries and even has ACID transactions. Everything SQL can do, it does without slowing down when dealing with big data. The only thing it doesn't offer an efficient solution for is something SQL can't do either, and that's advanced search engine capabilities like Elasticsearch provides.

Some people will argue that PostgreSQL is better in certain ways, but the argument really always comes down to two questions: are you going to hit the cost-efficiency performance limits of traditional SQL servers, and do you require advanced searching capabilities like graph queries or synonym matching? Even if both answers are no, I'd still argue for Mongo because it makes it easier to distribute ACID-compliant copies of the data by region, providing backup redundancy as well as fast responses in multiple regions.

Youden · on April 30, 2020

> MongoDB offers the ability to make SQL queries and even has ACID transactions. Everything SQL can do, it does without slowing down when dealing with big data. The only thing it doesn't offer an efficient solution for is something SQL can't do either, and that's advanced search engine capabilities like Elasticsearch provides.

You seem to be looking at this solely from the perspective of what kind of queries you can run, but there's a lot more to it than that. For example, how do you model and maintain relational data, which I'd argue is most data? Does MongoDB have support for foreign keys or something like them these days? A quick Google brings up DBRefs, but these seem very soft.
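
For readers unsure what a "hard" foreign key buys you, here is a small sketch using Python's built-in `sqlite3`. It says nothing about how MongoDB behaves; the `customers` and `orders` tables and the helper names are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def open_orders_db():
    """Create an in-memory database where every order must point at a real customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total_cents INTEGER NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'ada');
    """)
    return conn


def try_insert_order(conn, customer_id):
    """Return True if the order was stored, False if the database refused it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
                (customer_id, 1000),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def try_delete_customer(conn, customer_id):
    """Return True if the customer was deleted, False if orders still reference it."""
    try:
        with conn:
            conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

An order for a nonexistent customer is rejected, and so is deleting a customer who still has orders, without any application code having to remember to check.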

OOPMan · on April 30, 2020

You made the point I wanted to make without sarcasm. Thanks :-)

Youden · on April 30, 2020

I don't want to sound too preachy but I find it often helps to assume the best of everyone. Most people aren't idiots, they just see things differently sometimes.

I thought about it a bit and I think that if you see something you disagree with or think is silly, usually that person either has different priorities to you (e.g. they might work in a document-centric company) or might just not have the same knowledge or experience. Either way, if you state your assumptions (in this case "relational data is important") and ask a question ("how does MongoDB handle this"), you should usually be able to trigger a respectful and productive discussion.

Of course sometimes there are just arseholes and trolls on the internet, in which case you can usually tell quickly and stop engaging.

vosper · on May 1, 2020

I manage a team that's responsible for the MongoDB database that powers essentially the whole business. This is a 10-year-old company that started right about when Mongo was trendy. After 10 years it's a nightmare to understand what's going on in that database.

And it's now extremely difficult to get off of it precisely because it doesn't have the schema and referential integrity and constraints that we need to be able to understand our data well enough to actually do the migration. We really want to switch to an RDBMS, but it's going to be risky and difficult.

You could say this is all bad engineering, and I guess that's true in a reductive sense. But it's like arguing that you don't need to climb with a safety rope because good climbers don't fall. Over 10 years and many engineers, "bad" engineering happens.

I also believe that reasoning about data is hard, and you should therefore try to avoid doing it. You should do that hard thinking one time, and then rely on your database to enforce the rules until they need changing. Aka: Don't Make Me Think (About This Constantly).
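
"Do that hard thinking one time, and then rely on your database to enforce the rules" might look like the sketch below. The `invoices` table and its rules are invented for illustration (they are not this company's schema), and SQLite via Python's `sqlite3` stands in for a full RDBMS.

```python
import sqlite3


def open_invoice_db():
    """Create an in-memory table whose rules are written down once, as constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE invoices (
            id           INTEGER PRIMARY KEY,
            number       TEXT    NOT NULL UNIQUE,
            amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0),
            status       TEXT    NOT NULL CHECK (status IN ('draft', 'sent', 'paid'))
        )
    """)
    return conn


def insert_invoice(conn, number, amount_cents, status):
    """Return True if the row was accepted, False if any constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO invoices (number, amount_cents, status) VALUES (?, ?, ?)",
                (number, amount_cents, status),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Every writer, including code added years later by someone who never read the original design, gets the same rules applied. Changing a rule then means changing the schema in one place rather than auditing every code path that writes.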

If I believed in conspiracy theories I'd say that Mongo was one of the best vendor lock-in plays in tech. Mongo Corp is going to be profitable for a while because once you're down the Mongo rabbit hole it's a real pain to climb back out. But they'll host your database at least, so you don't also have to deal with that. I will give them credit for having a nice management UI.

But from my experience of the past few years I would never choose Mongo. For documents, Elasticsearch, or Postgres if you don't have too many. For relational data, a relational DB.

And Mongo's slow, too.

OOPMan · on April 30, 2020

Riiiiiiightttttttt, because having well defined data is not useful at all.

winrid · on April 30, 2020

You can have well defined data and use Mongo. Those two things are not related. I worked at a place that used Postgres and put an object w/ 11k unique paths in a single JSONB column with no schema or documentation whatsoever.

winrid · on April 30, 2020

... and this system was responsible for billing, and many other things....

unnouinceput · on April 30, 2020

And? That's not PGSQL's fault, it's the fault of the DB architect (or the lack of one). You can definitely drive the safest car into a ditch, no?

winrid · on May 1, 2020

That was my point, right? It has nothing to do with the DB.

throw1234651234 · on April 30, 2020

Also, MongoDB queries for related records can be painful. This takes a while to realize, but definitely shows itself when you are working with more complex data.
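
As a rough illustration of the difference in effort, the sketch below answers the same question ("which titles belong to which user?") twice. It is not MongoDB code: two plain Python lists stand in for separate collections that the application has to stitch together itself, and SQLite stands in for a relational database. All names and data are made up.

```python
import sqlite3

# Hypothetical data: two "collections" that reference each other by id.
USERS = [{"_id": 1, "name": "ada"}, {"_id": 2, "name": "grace"}]
POSTS = [
    {"_id": 10, "user_id": 1, "title": "first"},
    {"_id": 11, "user_id": 2, "title": "second"},
    {"_id": 12, "user_id": 1, "title": "third"},
]


def titles_by_user_manual(users, posts):
    """Application-side join: build a lookup table, then walk the other collection."""
    names = {u["_id"]: u["name"] for u in users}
    result = {}
    for post in posts:
        result.setdefault(names[post["user_id"]], []).append(post["title"])
    return result


def titles_by_user_sql(users, posts):
    """Same question, with the relationship expressed as a single JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            title   TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(u["_id"], u["name"]) for u in users])
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(p["_id"], p["user_id"], p["title"]) for p in posts],
    )
    rows = conn.execute(
        "SELECT u.name, p.title FROM posts p JOIN users u ON u.id = p.user_id ORDER BY p.id"
    )
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result
```

With two collections the manual version is still short; the pain described above tends to show up once three or four kinds of record have to be combined and filtered in the same query.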