
Which problem is more serious? 1) your small company has an over-complex system that could have been postgres; 2) your medium-sized company has a postgres that's on fire at the bottom of the ocean every day despite the forty people you hired to stabilize postgres, and your scalable replacement system is still six months away?


#1 is more serious. #2 limits the growth of your already successful company. #1 sinks your struggling small business. You have to be successful to be a victim of your own success, after all. Not to mention the fact that #1 is way more common. Do you know how far Postgres scales? Because it's way past almost any medium-scale business.


Exactly. A lot of us work at #2, so we wish our predecessors had saved us from our current pain. But if they had gone that route, we wouldn't be employed at that company, because it wouldn't exist.


Exactly. If a medium-sized company is struggling with Postgres, either they have very niche requirements or the scalability problems are in their own code.


What about #1b: you have an overly-complex "system", but most of that "system" is serverless (i.e. managed architecture that's Somebody Else's Problem), with your own business-logic only being exposed to a rather simple API?

I'm thinking here of engineering teams who, due to worries about scaling their query IOPS, turn not to running a Hadoop cluster, but rather to using something like Google's BigTable.
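(To make the "rather simple API" bit concrete, here is a hedged sketch with the Cloud Bigtable Python client; the project/instance/table names and the column family are placeholders, not anything from this thread.)

    # Minimal Bigtable read/write sketch; all names below are placeholders.
    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")
    instance = client.instance("my-instance")
    table = instance.table("my-table")

    row = table.direct_row(b"user#123")
    row.set_cell("profile", "name", b"Ada")   # column family, column qualifier, value
    row.commit()

    got = table.read_row(b"user#123")
    print(got.cells["profile"][b"name"][0].value)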


Sounds like a best practice to me?


Probably 3) the system you overengineered too early solved the wrong problem, and your replacement is six months away, but you've paid for it twice.


I have very rarely seen the second scenario, but the first seems more common.


Isn't the second example representative of all tech debt / neglect ever? If so, it's very common.


In the second scenario, they can't do math. They could have bought themselves 6-18 months by getting the most powerful machine available, for probably at most 1-2 salaries' worth of those 40 people.
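Back-of-envelope, with salary and hardware numbers that are purely illustrative assumptions (not from this thread):

    # Rough cost comparison; both figures below are assumed round numbers.
    engineers = 40
    loaded_salary = 200_000      # USD/year per engineer, assumed
    maxed_out_server = 350_000   # USD for the biggest single box you can buy, assumed

    print(engineers * loaded_salary)          # ~$8M/year on people
    print(maxed_out_server / loaded_salary)   # ~1.75 salaries buys the machine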

Less than a single-digit percentage of workloads needs massive, hard-to-use horizontal scale-out (for things that could be solved on a single machine, or a single database).

MR is useful as an ad hoc scheduler over data. Need to OCR 10k files? MR it.
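Something in that spirit, as a local sketch of the same map-only pattern (multiprocessing stands in for the cluster; pytesseract and the scans/ directory are illustrative assumptions):

    # Map-only "MapReduce" used purely as a scheduler: one independent task per file.
    from multiprocessing import Pool
    from pathlib import Path

    def ocr_one(path_str):
        # pytesseract/PIL are stand-ins for whatever the per-file work actually is.
        from PIL import Image
        import pytesseract
        return path_str, pytesseract.image_to_string(Image.open(path_str))

    if __name__ == "__main__":
        files = [str(p) for p in Path("scans/").glob("*.png")]   # the "10k files"
        with Pool() as pool:
            for path_str, text in pool.imap_unordered(ocr_one, files):
                Path(path_str).with_suffix(".txt").write_text(text)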

Hadoop was the worst possible implementation of MR, wasted so much of everything. That was its primary strength.


Very early on in my enterprise career, in a continuation of a discussion where it was mentioned that our customer was contemplating a terabyte disk array (one that would fill an entire server rack, so very fucking early), I learned about the great-grandfather of NVMe drives: battery-backed RAM disks that cost $40k inflation-adjusted.

“Why on earth would you spend the cost of a brand new sedan on a drive like this?” I asked. Answer: to put the Oracle or DB2 WAL data on so you could vertically scale your database just that much higher while you tried to solve the throughput problems you were having another way. It was either the bargaining phase of loss or a Hail Mary you could throw in to help a behind-schedule rearchitecture. Last resort vertical scaling.


Reminds me of when I had a 3-machine Hadoop cluster in my home lab and 2 nodes were turned off, but I was submitting jobs to it and getting results just fine.

I remember all the people pushing erasure-code-based distributed file systems pointing out how crazy it is to have three copies of something, but Hadoop could run in a degraded condition without degraded performance.


I agree. I used Disco MR to do amazing things. Trivial to use, like anyone could be productive in under an hour.

Erasure codes are awesome, but so is just having 3 copies. When you have skin in the game, simplicity is the most important driver of good outcomes. Look at the dimensions that Netezza optimized: they saw a technological window and they took it. Right now we have workstations that can push 100 GB/s from flash. We are talking about being able to sort 1 TB of data in 20 seconds from flash; the same machine could do it from RAM in 10.
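The arithmetic behind those numbers, assuming a sort touches the data about twice (one read pass, one write pass); the RAM bandwidth figure is an assumed round number:

    # Back-of-envelope sort times from raw bandwidth.
    DATA = 1e12        # 1 TB in bytes
    FLASH_BW = 100e9   # 100 GB/s from flash, as claimed above
    RAM_BW = 200e9     # aggregate RAM bandwidth, assumed
    PASSES = 2         # read everything once, write everything once

    print(PASSES * DATA / FLASH_BW)   # ~20 seconds from flash
    print(PASSES * DATA / RAM_BW)     # ~10 seconds from RAM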

https://github.com/discoproject/disco
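For flavor, the classic Disco word-count job looks roughly like this (recalled from the project's docs, so treat the details as approximate rather than authoritative):

    from disco.core import Job, result_iterator

    def fun_map(line, params):
        for word in line.split():
            yield word, 1

    def fun_reduce(iter, params):
        from disco.util import kvgroup
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)

    if __name__ == '__main__':
        job = Job().run(input=["http://discoproject.org/media/text/chekhov.txt"],
                        map=fun_map, reduce=fun_reduce)
        for word, count in result_iterator(job.wait(show=True)):
            print(word, count)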

I need to give Ray and Dask a try.

I don't know where to put this comment so I'll put it here. DeWitt and Stonebraker are right, but also wrong. Everyone is talking past each other there. Both are geniuses, but this essay wasn't super strong.

If I were their editor, I would say: reframe it as "MapReduce is an implementation detail; we also need these other things for this to be usable by the masses." Their point about indexes proves my point about talking past each other. If you are scanning the data basically once, building an index is a waste.


No, plenty of tech debt is caused by over-engineering or prematurely optimizing for the wrong thing.

I'm not sure if the second outcome is meant to blame Postgres specifically or under-engineering in general, but neither seems to me like it should be a concern for an early-stage startup.


I generally classify tech debt more as a long todo/wish list that we'll never get a chance to work on rather than a server or service being on fire.


I have found that these fires become uncontrollable because of tech debt. While rarely the spark, it's a latent fuel source.

It’s like our modern forests; unless something clears out the brush, we see wildfires start from the smallest spark. Once it starts, it’s almost impossible to do anything but try to limit the extent of the disaster.


This was true in 2009. Since then, multiple PostgreSQL-compatible databases have launched.



