You may be conflating microservices and horizontal scaling. You don't need multiple (disparate) microservices to scale; microservices have absolutely nothing to do with scaling. That's a myth started by people who never understood the actual point of microservices, which is partitioning the codebase and preserving developer productivity as teams grow.
I would perform each request in a single process: read in the metadata (mainly the structure), then either process each tab sequentially or, more likely, map the whole thing into memory and spawn a thread per tab, then write the whole thing out in order.
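A minimal sketch of that shape in Python (convert_tab and the list-of-tabs input are illustrative stand-ins, not any real library's API):

```python
from concurrent.futures import ThreadPoolExecutor

def convert_tab(tab):
    """Convert one tab; independent of every other tab."""
    return tab.upper()  # placeholder for the real per-tab conversion

def convert_workbook(tabs):
    """One request, one process: fan out a thread per tab, then
    reassemble the results in the original tab order."""
    with ThreadPoolExecutor(max_workers=len(tabs) or 1) as pool:
        # pool.map preserves input order, so the output can be written
        # out in order even though the tabs convert concurrently.
        return list(pool.map(convert_tab, tabs))

print(convert_workbook(["sheet1", "sheet2", "sheet3"]))
```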
No need for the overhead of microservices: locating, invoking, transferring data, and synchronizing responses, much less dealing with all the pain of lost connections, abnormal termination and so on.
The largest Excel sheet I've worked on is only about 500 MB, and (after a quick search of my local filesystem) almost all of mine are under 1 MB. So even in the (rare) worst case the transmission cost doesn't justify spreading the work around, and in the common case there's no benefit at all.
So what happens when this hypothetical machine of yours, the one with enough NIC bandwidth to stream that scale of data in both directions, enough CPU to give millions of concurrent requests a process or thread of their own, and enough RAM and fast enough disks to map and swap all the files being converted, goes down?
You keep confusing horizontal scaling with microservices. The two are basically unrelated: you have to scale horizontally regardless of whether you are running regular services or microservices. The goal of microservices is just to increase the granularity of horizontal scaling (or, more often, to solve organisational issues around feature/code ownership).
Ten million per day seems unlikely to translate to millions of concurrent requests.
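Back of the envelope (my numbers, purely illustrative): 10 million requests/day averages out to about 116 requests/second (10,000,000 / 86,400). By Little's law, concurrency is roughly arrival rate times service time, so even at a generous 10 seconds per conversion that's only ~1,200 requests in flight, and a 10x traffic peak is still only ~12,000 concurrent. You'd need service times measured in hours to hit millions of concurrent requests at that volume.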
This kind of task is typically best done as a single unit of work; you don't want partial conversions left hanging around in 'microservices' if something goes wrong.
As for scaling, I'd likely put this in a process definition and run it on the BEAM if I were to build such a product. That way millions of requests per hour can hit my cluster; the ones that fail somehow get cleaned up and their transactions rolled back, those clients get a 'sorry, try again' or 'sorry, we're working on fixing it', and the rest happily chug along.
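For the non-BEAM readers, the property being bought here is per-request isolation. A rough Python analogue (a sketch only, and not equivalent: BEAM supervisors give you the restart and cleanup behaviour natively, this just mimics the failure containment):

```python
from concurrent.futures import ProcessPoolExecutor

def convert(file_bytes):
    """Placeholder for the real conversion; raises on bad input."""
    if not file_bytes:
        raise ValueError("empty upload")
    return file_bytes[::-1]  # stand-in for the converted output

def handle_request(file_bytes, pool):
    """Run each conversion in its own worker process; a failure is
    contained to that one request and just becomes 'sorry, try again'."""
    try:
        return pool.submit(convert, file_bytes).result(timeout=60)
    except Exception:
        # Nothing partial escapes: the worker's state dies with it.
        return "sorry, try again"

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(handle_request(b"some workbook bytes", pool))
        print(handle_request(b"", pool))
```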
Maybe this'll come off as snarky, but I would build Excel! It would spread 10 million requests across the 10 million users, be totally immune to network outages, and my users could rest assured I wasn't thumbing through their data.
My assumption is that the comment I was responding to was about Google Docs specifically (which afaik is web only?). If so, then your options for running locally would be either to do the conversion in the browser, or to have some Google agent running on your computer that handles these requests instead of Google's servers, neither of which scales well in this case (due to browser differences / expecting people to be able to install software locally on their devices).
Surely a file conversion shouldn't be affected much by browser differences; it should be a pure function, pure computation, needing few APIs and having little to do with rendering.
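In other words, whatever the language, the conversion has roughly this shape (a Python signature sketch, names illustrative):

```python
def convert(xlsx_bytes: bytes) -> bytes:
    """Pure function: the same input bytes always give the same output
    bytes. No DOM, no rendering, no network; nothing browser-specific
    to vary across environments."""
    ...  # parse the workbook, transform, serialize the result
```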