
It's easy to say "scalability is overrated" if you've dealt with unnecessary k8s deployments.

It's easy to say "scalability is underrated" if you've dealt with businesses built on hundreds of standalone PHP/Perl/Python/JS scripts with zero unifying architecture, where migrating a data schema is virtually impossible because of the state of technical anarchy.

Scaling is hard.



You damned kids never heard campfire stories of being brought in to consult on “scaling” a bespoke system for a growing company that has been limping along with VBScript in an Excel spreadsheet for five years past when it was still tenable to do so. The amount of feature parity required to get out of that blind alley often killed the project and injured the company. Some lived it, the rest of us knew someone who had.

There was a brief moment after Oracle purchased Sun where I thought Oracle might try to engineer an Excel competitor on top of OpenOffice that had an easier migration path to a real database (Oracle), but that dream died early.


FWIW I have a friend who works with market analysis, and his Excel scripts save an enormous amount of manual labor today. It was the best tool for the job, for them. There are even international competitions in Excel automation, which is kinda funny but also points to how far ahead Excel is for actual business uses.

Are there scaling issues? Version control issues? Absolutely! But again, that doesn’t mean that it’s not the best tool for the job.

It’s easy to mount the highest horse from a technical perspective, but as engineers it’s also our responsibility to be curious about what people are using voluntarily, and not just what we think they should be using.


Microsoft's commitment to not adding modern accommodations to Excel, Access and VBA is infuriating.

A git integration in the editor, and a decent test runner. Maybe some API hooks so JetBrains could do something with it.


Office documents are zip files - you could get a start on this by version-controlling the contents individually.
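
A minimal sketch of that idea in Python, assuming a hypothetical workbook called report.xlsx: extract the archive into a directory and commit the extracted XML parts instead of the opaque binary, so diffs happen at the part level.

    # Explode an .xlsx into its component parts so they can be tracked
    # individually in version control (file names here are just examples).
    import zipfile
    from pathlib import Path

    src = Path("report.xlsx")              # assumed input workbook
    dst = Path("report.xlsx.extracted")    # directory to commit instead

    with zipfile.ZipFile(src) as zf:
        zf.extractall(dst)                 # writes [Content_Types].xml, xl/..., docProps/...

A proper tool would also need to re-zip the parts deterministically to rebuild the workbook, but extraction alone already makes textual diffs possible.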


I was this many years old when I learned that tar and zip on linux have an rsync compatibility mode that tries to do some cleverness with compression blocks to make it easier to diff two archives.


I thought they were xml?


XML files in a zip file (plus images and other things). There are many individual XML files in a single zip file; each file (part) is generally responsible for a different area, e.g. one file for a sheet's cells, one for style definitions, one for comments, one for the workbook structure, and so on.

The whole structure is called Open Packaging Conventions and it is implemented in `System.IO.Packaging`.
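
For the curious, a quick way to see that part-per-concern layout from Python, assuming a workbook named book.xlsx (the `System.IO.Packaging` route is conceptually the same):

    # List the parts inside an .xlsx to see the structure described above:
    # one part per sheet, one for styles, one for comments, one for the
    # workbook structure, and so on.
    import zipfile

    with zipfile.ZipFile("book.xlsx") as zf:   # hypothetical file name
        for name in zf.namelist():
            print(name)

    # Typical output includes [Content_Types].xml, xl/workbook.xml,
    # xl/styles.xml, xl/worksheets/sheet1.xml and docProps/core.xml.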


As recently as 2014, a college friend of mine was asked to help a company which was running on Excel and had reached the point you mentioned.

IIRC he just optimized their sheets and set up a CRM for the things that had no business being stored in an Excel file and that was already very helpful.

Attempts at writing actual software to deal with the problem would have failed and everyone was acutely aware of that.


At my last job, my work was basically to read data from proprietary APIs and shit out an Excel table.

I think I was hit by everything. Easy stuff at first: CRLF issues, XML APIs, weird RPC APIs.

Then, halfway through the project, the results had to change. Not only the order, data types and headers (I had actually overengineered the first version, so those were configurable), but also the format, plus deduplication across multiple columns (and empty fields counted as duplicates...). Worst job I've ever done. I'm also disappointed in myself, tbh.

But now I'm a bit of an expert on Excel format issues and limitations, and that has already helped me.


FWIW IRC ANAL, a couple of jobs ago there was a policy of only hiring copy/paste-from-Stack-Overflow headcount. Since good code was literally not allowed, scaling meant more servers with more RAM.

They eventually replaced me with a handful of their relatives, or at least it seemed. It was a lot of fun watching how many LOC one could "write" to request some trivial data from an endpoint. Luckily I was only golfing the latter half of that tenure so I have no regrets.


Scaling is hard. True.

But the question is, are you trying to make your life miserable by scaling before exhausting other options?

Most applications can be optimised for performance by orders of magnitude, which is much easier than trying to scale them by orders of magnitude. Any problem is much easier to solve, and faster to deliver, when you can fit it on a single server.


Some people just don’t know how many users can be served from one server.

Usually it is simplistic thinking that goes wrong: the system is slow, so add more hardware. But then it turns out the developers did a bad job, and you could still run it on a single small server, if only someone wrote code with an understanding of big O notation.

The main point of big O notation is that some things, implemented incorrectly, will be slow regardless of how much hardware you throw at them.
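
A toy Python illustration of that point (the data is made up): the first version is quadratic and stays slow no matter how much hardware you add, the second is linear and fine on one small server.

    # Deduplicate a list of IDs while preserving order.
    def dedupe_quadratic(ids):
        seen, out = [], []
        for i in ids:
            if i not in seen:      # linear scan inside the loop -> O(n^2) overall
                seen.append(i)
                out.append(i)
        return out

    def dedupe_linear(ids):
        seen, out = set(), []
        for i in ids:
            if i not in seen:      # set membership is O(1) -> O(n) overall
                seen.add(i)
                out.append(i)
        return out

    # With n in the millions, the first version takes hours and the second
    # seconds - and no amount of extra RAM or servers changes that ratio.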


I don't know if knowledge of big O notation is that big a deal - many of the issues I've seen, at least in recent years, have come about in O(n) code where the value of n in production use cases was several orders of magnitude higher than anyone had bothered to test. Classic example: all our test files were ~50 KB in size, but recent customers expected to be able to work with files over 1 GB - which of course took 20,000x longer to process. And the difference in user perception between something taking 100 ms and taking well over 30 minutes is rather a lot.

In fact, in this particular case there's realistically no way we could process a 1 GB file in the sort of time it's reasonable to have a user sit there and wait for, so it really requires a rethink of the whole UX. In other cases it turned out some basic DB lookup consolidation was sufficient, even if it did require writing new DB queries and accepting significantly higher memory usage (as previously the data was read and discarded per item). Where I have found the occasional bit of O(n^2) code that didn't need to be, it was usually just a simple mistake.


Notation alone, maybe not - but O(n), just like you write, needs to be addressed. Users or stakeholders expect that they can get "all data" loaded with one click and that it will always be instant. With n getting bigger, just like you write, the UX or workflow often has to change to partition the data even if the code is O(n) - adding pagination, moving statistics crunching to OLAP. It quickly gets worse once you have database joins: you may have to understand what the database is doing, because you can end up with O(n^2) queries even though db engines are insanely fast and optimized on their own, and things like full table scans can also kill performance.
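
One common shape of that data-partitioning change, sketched in Python with psycopg2 against a hypothetical items table: keyset pagination keeps each request proportional to the page size rather than to the whole table.

    # Keyset pagination: fetch the next page after a known id instead of
    # loading "all data" in one click (table and column names are invented).
    import psycopg2

    conn = psycopg2.connect("dbname=app")   # assumed connection settings

    def fetch_page(after_id, page_size=100):
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, payload FROM items WHERE id > %s ORDER BY id LIMIT %s",
                (after_id, page_size),
            )
            return cur.fetchall()

With an index on id, each page becomes a cheap range lookup instead of the full table scan mentioned above.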


Until it was bought by AOL, ICQ presence scaled up to the largest 64-bit Digital Unix box they sold. (Messages were peer to peer.) It worked remarkably well (a solid hardware platform, so not HA, but pretty available nonetheless). Network communication was UDP, which was quite a lot cheaper at the time.


There are some big blunders you can commit that are incredibly difficult to fix. I think the advice should be about avoiding nailing closed doors shut; just keep them locked instead and put the key under the doormat.

I have a Java codebase whose constructors call more constructors inside themselves. So you have this massive god object that instantiates the whole project inside its constructor. If you want to run parts of it in separate threads, you can't just take it apart; you first have to rewrite all the constructors.

“… Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.” —Joe Armstrong, creator of the Erlang programming language

I don't think this is a problem with object-oriented languages; you could certainly do the same thing in any programming language. All you have to remember is to keep your constructors as simple as possible. Passing in dependencies through the constructor is often an easy solution, so I don't get the hate for dependency injection frameworks. The original XML-driven iteration of Spring was pointlessly overcomplicated, but nowadays you can just define your "beans" in code, which basically means the DI framework just helps you set up a dependency graph and nothing more.
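
To make the contrast concrete, here is the same pattern sketched in Python rather than Java (class names are invented): the first constructor builds its entire world, the second just receives it, and a DI framework does little more than build that graph for you.

    # Stubbed-out collaborators, just for the sketch.
    class Database:
        def __init__(self, dsn): self.dsn = dsn

    class Cache:
        def __init__(self, db): self.db = db

    class Renderer:
        def __init__(self, cache): self.cache = cache

    # God object: the constructor instantiates the whole dependency tree itself,
    # so nothing can be swapped, mocked, or moved to another thread without
    # rewriting constructors all the way down.
    class ReportServiceGod:
        def __init__(self):
            self.db = Database("prod-connection-string")
            self.cache = Cache(self.db)
            self.renderer = Renderer(self.cache)

    # Constructor injection: the object only receives and wires its dependencies.
    class ReportService:
        def __init__(self, db, cache, renderer):
            self.db, self.cache, self.renderer = db, cache, renderer

    # The graph is built once, in one central place (what a DI framework automates):
    db = Database("prod-connection-string")
    cache = Cache(db)
    service = ReportService(db, cache, Renderer(cache))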


On a technical note, people underestimate the costs of horizontal scaling.

The Silicon Valley world would do well to learn some things from finance. You’d never write a horizontally scaled order matching engine.


You'd think... I worked at one that tried. I joined the team after the project had been running for quite some time, and they said their goal was 10,000 transactions per second - i.e., a budget of 100 microseconds per transaction. I said, OK, what can we accomplish in 100 microseconds? They laughed at me and said they were going to horizontally scale.

It wasn't successful....


That's conflating scaling with paying off technical debt


> It's easy to say "scalability is overrated" if you've dealt with unnecessary k8s deployments.

Exactly. The author forgot to put the word "premature" before the word "scalability".


TFA isn't talking about that kind of scalability. It's scalability in business processes.


Looking at the article title and comments, I wasn't able to tell that clearly. The second heading, “do things that don’t scale”, is clearly recognizable and meaningful, but since I already agree with it, I'd have no reason to click.


I have worked with both and 100% disagree. Give me a mess of bad PHP (by far my least favorite language) any day of the week. It is usually trivial to scale, unlike when I get handed some complex Kubernetes mess. Fixing failed attempts at scaling is usually much harder than making naive, bad PHP code scale. It is amazing how much harm "clever" engineers who do premature optimizations can do.


> PHP/Perl/Python/JS scripts with zero unifying architecture,

That's why you use something like a Rails monolith... but oh wait, Ruby doesn't scale well!


That's why you write a monolith on the JVM or CLR, which do scale well.


JRuby!


As long as you can scale (shard) your persistence layer, I don't see why RoR won't scale.

Look at GitHub, for instance.


Not everything is a glorified CRUD app.

If you're doing any computation or highly concurrent workloads then you will discover the performance issues with Ruby well before you outgrow your persistence layer.


I have done both in Ruby, and addressing it was not a big problem. E.g. my MSc involved doing a lot of statistics and image processing in Ruby, and solving the performance bottlenecks meant rewriting ~30 lines or so in C using Ruby Inline. Later I did map tile rendering on demand, handling thousands of layers uploaded by customers, in Ruby. Both using 1.8.x, btw. - far slower than current versions.

It took more CPU than if we'd rewritten the core of it in something else, but it let us iterate the rendering engine itself much faster, and most of the expensive computations were done by extensions in C anyway (e.g. GDAL and the like).

Of course you can find areas where it's still not viable, but my experience is that if you start by prototyping in whichever high level language - no matter how slow - that is most productive for you, you'll inevitably find you need to rewrite far less than you might think to get it fast enough. But more importantly: The parts you end up rewriting will very often be entirely different parts than what you expected, because being able to iterate your architecture quickly tends to make you end up in a very different place.


I know the argument, but I don't buy into it (I'm actually not a Rubyist). Roughly speaking, Ruby just imposes some factor x on performance compared to Java. Which means that in Ruby you'll start looking into queuing requests and distributing work a little earlier than in a Java or Go application. Nevertheless, if we talk about scale, this is just a constant multiplier.

With a Kafka queue and worker nodes, compute-heavy jobs are easily distributed across many machines (sketched below).

For many parallel requests, you load-balance.

If the PostgreSQL or MySQL table is the bottleneck (the persistence layer), well, that is actually a design decision that's orthogonal to the programming language (PostgreSQL won't scale any better with a JDBC ORM).
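
A rough sketch of the queue/worker split from the first point above, in Python with kafka-python (broker address, topic and group names are made up); the pattern is the same whatever language the workers are written in.

    # Worker process: pull compute-heavy jobs off a Kafka topic and process them.
    # Run as many copies as needed; consumers in the same group share partitions.
    import json
    from kafka import KafkaConsumer   # pip install kafka-python

    consumer = KafkaConsumer(
        "jobs",                              # hypothetical topic name
        bootstrap_servers="localhost:9092",  # assumed broker address
        group_id="workers",
        value_deserializer=lambda v: json.loads(v),
    )

    for message in consumer:
        job = message.value
        # ... expensive computation goes here ...
        print("processed job", job.get("id"))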


Usually web applications fork and run external processes to do the computational work you describe (ffmpeg, ImageMagick, git, etc.). Those are usually written in a variety of fast(er) languages like C, C++, Java, etc. Plus now you get multicore scaling for free.
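
A hedged sketch of that shell-out pattern in Python (the ffmpeg arguments are illustrative, not a recommendation): the request handler stays thin and the OS schedules the child process on whatever core is free.

    # Offload heavy work to an external process instead of doing it in-process.
    import subprocess

    def transcode(src: str, dst: str) -> None:
        # e.g. transcode("upload.mp4", "preview.webm")
        subprocess.run(["ffmpeg", "-i", src, dst], check=True)  # raises on failure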


Agree it’s a balance, you need to incrementally pay off tech debt



