LinkedIn shelved plan to migrate to Microsoft Azure cloud (cnbc.com)
187 points by helsinkiandrew on Dec 15, 2023 | 270 comments


I'm obviously not going to comment on anything internal, and I'm obviously speaking for myself and not the company, but it's worth bearing in mind that this migration was not from "on-prem" in the traditional sense. LinkedIn has its own internal cloud, complete with all the abstractions you'd expect from a public cloud provider, except developed contemporaneously with all the rest of the "clouds" everyone is familiar with. It was designed for, and is tightly coupled to, LinkedIn's particular view on how to build a flexible infrastructure (for an extreme example, using Rest.Li[1], which includes client-side load balancing).
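(For readers who haven't run into client-side load balancing: the rough idea, sketched below in Python with made-up hostnames, is that each client keeps its own view of the backend pool and picks an instance per request, rather than routing everything through a central load balancer. Rest.li's actual mechanism, D2, is far more elaborate - service discovery, weighted strategies, health tracking - so treat this only as an illustration of the concept.)

```python
import itertools
import random

class ClientSideBalancer:
    """Toy illustration: the *client* owns the endpoint list and chooses a
    backend per request, instead of sending everything to a central LB/VIP."""

    def __init__(self, endpoints):
        # In a real system this list would come from a service registry
        # and be refreshed as instances come and go.
        self.endpoints = list(endpoints)
        self._round_robin = itertools.cycle(self.endpoints)

    def pick(self, strategy="round_robin"):
        if strategy == "round_robin":
            return next(self._round_robin)
        return random.choice(self.endpoints)  # naive random fallback

balancer = ClientSideBalancer(["host-a:7070", "host-b:7070", "host-c:7070"])
for _ in range(4):
    print("sending request to", balancer.pick())
```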

There was no attempt to "lift-and-shift" anything. There are technologies that overlap and technologies that conflict and technologies that complement one another. As with any huge layered stack, you have to figure out which from the "LinkedIn" column marry well with those in the "Azure" column.

I personally appreciate LI management's ability to be clear-eyed about whether the ROI was there.

[1] https://linkedin.github.io/rest.li/


Oof, I'm twitching just reading that because we're in exactly the same boat. The problem with the ROI is that any kind of not-self-run cloud is guaranteed to be more expensive in direct costs. This has been shown time and time again for any reasonably large enterprise. However, there is a long list of things that are hard to express in money that support a cloud move, mostly to do with keeping up with modern tech, hiring, DR, better resiliency, etc. and so the decision can be quite dependent on the particular execs in the chain of command and their subjective values.


A lot of organisations were already doing “cloud” before the big migrations to AWS and Azure started in the EU enterprise scene. Around 2010 it became a much better business case to get into the local cloud, which basically meant buying our own hardware and putting it in rented rack space at local businesses specialised in running it, instead of having it in our own basements. This mostly didn’t scale beyond the national, or even regional, level but then that is still not a big issue for most organisations today.

Then came the move to AWS and Azure, again because the business case made sense. A big part of this was Microsoft being good at selling their products in a bundle, so that getting into Azure sort of just made sense because of how well the IT-operations side benefitted from moving into things like Azure AD (now Entra ID for whatever reason), Azure Automation instead of scheduled tasks and so on. If your IT-operations infrastructure is already largely in Azure then your development will also be in Azure because it’s going to be such a tiny blip on your Azure bill that nobody will ever notice.

With the price hikes, and also just how complicated it is to actually run and maintain IT-Operations in Azure (I can’t speak for AWS) we’re now in a world where more and more organisations here in Denmark are looking to get out of Azure/AWS to reduce cost, and many have already made the move.

Of course no one is leaving Office365, so many of us are mostly just trying to broker better deals with Microsoft.


> With the price hikes, and also just how complicated it is to actually run and maintain IT-Operations in Azure (I can’t speak for AWS) we’re now in a world where more and more organisations here in Denmark are looking to get out of Azure/AWS to reduce cost, and many have already made the move.

It's an endless loop, I think. I've observed a very different behaviour in other parts of the world, where companies which either migrated away from the cloud already, or companies which never moved to the cloud to begin with, are now moving things into the cloud and other managed services in order to better focus on their core competencies.

Dropbox, for instance, shifted everything out of AWS in 2014 -- but are now moving a lot of their corporate infra back into AWS in order to focus on effective and efficient storage, which is what they're good at, without having to build everything else themselves. Likewise hedge funds - if you look at job postings for a lot of systematic trading firms (e.g. Jane Street, HRT, 2Sigma, etc.), you'll find loads of them are hiring extensively for people with cloud experience, because there's been a sort of wake-up moment where companies are breaking free of the idea that everyone can be Google, and realising that you can't build everything in-house if you only have 1000 employees.

So, interesting to hear of a move in the opposite direction in Denmark. I'm sure that ten years down the line, a new generation of engineers will be coming in and asking, "Why in the world are we managing all of this infrastructure on our own?"


Is colocation the cloud now?


At work, there is a push to move on prem applications to cloud but as much as possible without changing any of the code. We end up basically renting expensive instances for a significant period of time and recreating all the on prem services on them.

So anyway, more like the reverse: cloud is colocation but expensive.


I'd argue taking on-prem applications and putting them in large (expensive) virtual machines on AWS EC2 or Azure Windows VMs directly, while postponing or deprioritizing any refactoring, is not the cloud either.


Yes. And no. Depending on what you mean by "cloud", how much kit you have in colo, and how you manage the resources of that kit.

You can certainly use colo to provide the facilities most need from cloud infrastructure.

Just stuffing a server in someone's rack isn't cloud, but larger installations are often managed in similar ways.


I am curious, which companies in Denmark are trying to get out of cloud?


Basecamp/HEY is one publicly famous example.


Another example is Ahrefs


This is based on the assumption that Azure has modern tech, hires well, and offers better DR and resiliency than LinkedIn's "cloud" for LinkedIn's needs. There's a bit of a problem around incentives here, where Azure is built to sell to Azure's customer base, whereas LinkedIn has evolved their own stack over the years.

The questions become:

1. Does it make sense to dump our special features in the stack, or move them to a higher level in the stack?

2. Does Azure have comparable capabilities for the LinkedIn stack?

3. Is LinkedIn worth it to Azure to sell to?

---

Oftentimes, "at scale", you can support custom solutions outside of cloud providers that are purpose-built, and oftentimes more resilient and efficient than the cloud providers.

AWS has taken a very interesting approach of building an incredibly wide set of solutions to support every customer under the sun, and their approach to being "customer obsessed" leads to them building super niche solutions if the deal is worth it.

I'm not sure how Google and Azure handle these engagements.


Google doesn't acknowledge that the customer exists or has wishes.

Microsoft will allow you access to the PM of the product if you are big enough, but his answer will be a not-exactly-committing "in 3 years" because their roadmap is full.


Every once in a while Google acknowledges the customer exists, but usually for a brief moment, either to close their account, or to tell them they are using the product wrong. They promptly go back to work after such brief outings, so as to not fall into the mistake of actually listening.


> 3. Is LinkedIn worth it to Azure to sell to?

Microsoft owns both, so that question gets convoluted. It's not really about whether LinkedIn pays Azure enough to care.


Telco is a really good example of how hard it can be to move to the cloud no matter how hard you want to. I imagine financial services are similar. Telcos built their own clouds because there was no other way to get the primitives that they needed (classic case: a floating IP that could appear in data centers on different sides of the country and be ready to serve clients in the middle of a call in under 100ms without dropping the audio). I mean, cloud just was not built for that.

The flip side is that it is freakishly expensive to keep doing that, and telco is not a business that is freakishly profitable.

I'm watching efforts like Azure Operator Nexus (https://learn.microsoft.com/en-us/azure/operator-nexus/overv...) which looks to be taking a stab at this, but even there, it's more like OpenShift on specific hardware with some Azure dust, rather than a remaking of Azure primitives like redundancy for the telco universe.


Microsoft purchased tech and people from AT&T to help boost Azure's telco capabilities as a cloud solution[0].

I haven't followed what's happened since it was spun out, but I know that Network Cloud was heavily OpenStack based, so it would be interesting to see how that may have changed, but I no longer work in the telco space either.

[0]: https://azure.microsoft.com/en-us/blog/improving-the-cloud-f...


Telco used to be all for OpenStack. They have all started the shift to containers and Kubernetes.


> direct costs

> vs ""modern tech"" FOMO

Very tough decision, I'll need two MBA degrees to figure this out.


1000% Agree. So many ROI calculations I've seen in the past ignore key-person risks from internal systems, hiring premiums needed to keep specialized knowledge, and also carve-away infra human costs into other departments. Once you carve out enough stuff from the ROI, you can sway the ROI calculation heavily.

The folks who are too senior aren't sufficiently technical to understand. The folks who understand benefit from the key-person risk and premiums paid, and thus are disincentivized to challenge the ROI comparison.


> and also carve-away infra human costs into other departments

That's another great point; with the on-prem system where we run our own massive k8s/vm/data lake cloud, every team just throws things at it willy nilly and even though we know how to calculate total operating cost it's very difficult to set up backpressure to individual orgs and their budgets so there is quite a bit of waste I feel. With something like AWS the billing system is very detailed and is easy to attach to teams upfront. I'm not sure if it's good or bad that things are more efficient due to teams being incentivized to pay attention, but I kinda like it.


Key-person risk should be understandable, as a redundancy concern, to just about any exec…at least I would hope so.


It's not really 'the cloud' as much as it's a managed mainframe you allocate resources from. Only it's actually quite expensive to allocate resources but it becomes more palatable with a monthly bill compared to setting up on-prem.

Costs more money but easier on the cash flow.


Yeah, based on my own experience with AWS and Azure (which has nothing to do with LinkedIn), my immediate reaction to the headline was, "well, you can be keen on Azure, but 'stuck' on AWS for a myriad of other reasons". Reading the article pretty much confirmed it.


This describes GitHub to a tee.

Think very hard before you start building your cloud-native application on a specific cloud.


I'm not sure how this has anything to do with AWS. But I guess you can use it to confirm your idea about vendor lock-in if you're bound and determined to convince yourself you're right. For what it's worth, working at AWS, I see a lot of companies who use Entra for AD and then federate into AWS using that.


I think you misunderstood OPs anecdote and took it a little personally?

They were simply saying that much as you can love one platform, there are many valid reasons why your deployment on another makes it hard to shift that and recapture that configuration. Nowhere did they intimate it was a fault of the current platform's capabilities...


From that repo:

> At LinkedIn, we are focusing our efforts on advanced automation to enable a seamless, LinkedIn-wide migration from Rest.li to gRPC. gRPC will offer better performance, support for more programming languages, streaming, and a robust open source community. There is no active development at LinkedIn on new features for Rest.li. The repository will also be deprecated soon once we have migrated services to use gRPC. Refer to this blog for more details on why we are moving to gRPC.


If anything, this highlights the dangers of building your app around the details of a particular cloud infrastructure.


I guess in this case it was the other way around: building the cloud architecture specific to the needs of the app.


It’s not a good cloud if they built the wrong abstractions that aren’t generic


> I personally appreciate LI management's ability to be clear-eyed about whether the ROI was there.

Kind of a confusing statement, as they threw away tens of millions of dollars in a failed attempt to migrate.


The sunk-cost fallacy is a brutal one. Better to cut your losses than throw good money after bad.


Sunk cost doesn't necessarily mean good management.

It actually means the opposite. Instead of management identifying a boondoggle ahead of time they got the company into a mess.


This kind of sentiment always shows up after the fact but the hard part is actually knowing it ahead of time.

Emphasis on knowing. I’m sure some people thought it was a bad idea ahead of time. But there are always people who think any new idea or big change is a bad idea.

The reality is that anything significant has to be tried out, for real, to find out. That’s the premise behind proven useful decision frameworks like agile or MVP. And as the GP says: good management shows up when they read the attempt clearly and resist the sunk cost fallacy.

Bad management sounds like “let’s never change anything” or “we’ve already come this far, we need to finish it no matter what.”

And to a large enterprise, tens of millions of dollars is not an existential amount of money to lose on trying something significant. Big sports teams lose that much on bad player moves all the time. “We can’t ever waste money on trying new things” is a great way to sink deeper and deeper into whatever rut you happen to be in at the time.


> Emphasis on knowing

Literally your primary job as an executive. Strategic knowledge and wisdom

If wasting tens of millions on a bullshit initiative isn't an example of bad management then literally nothing is.

The only place these types of terrible managers can exist are too-big-to-fail companies like LinkedIn and the government, where there's zero repercussions for poor management decisions because money keeps pouring in.

In smaller leaner companies where money is actually important executives actually have to be good at their jobs.

> "We can’t ever waste money on trying new things"

There's a difference between a good strategic decision and starting a "facebook for pets".

Some ideas are terrible at the outset.

And executives' whole reason for existence is to make this determination.

Blueshift was an objectively terrible decision and whoever made that decision is objectively bad at their jobs.


It was MS buying LI and then wanting them on Azure. There's a huge Azure branding play here that - I'm guessing - drove this work. A distant second was some cost savings in the far future.

The hit to Azure's brand for this is probably significant. Who was looking at a migration to Azure before who's now discouraged?


> was MS buying LI and then wanting them on Azure.

LinkedIn's been extremely independent of Microsoft and still is, just like GitHub and OpenAI.

This was a 100% internal LinkedIn management decision.


Could LI have selected a competitor?


LI is a monopoly in professional social networking.

There are no competitors.


I meant a competitor to Azure


The only other option is their own internal stack.

I can't imagine, being owned by MS, that they would be allowed to use Google Cloud.


At least they canceled the project when it became apparent things weren't going to work out as expected.

I've seen massive projects that have been train wrecks since day one pushed through until the bitter end because management can't admit failure.


That's not good management though.

Recognizing you fucked up and only cost the company tens of millions instead of hundreds of millions is "Not Terrible" management.


It’s not perfect, but it’s good. Or better than spending more.


I'd like to be your financial manager.


We have that going on in Norway right now. Our state hospitals are switching several aging systems to some Epic-based monstrosity, and so far it's been a complete shit-show[1].

A major hospital has been running it for a year with lots of issues, some of which have certainly affected patients gravely. It was rolled out despite heavy protests from doctors and nurses who had seen or experienced the new system at the smaller pilot hospitals.

Yet management is pushing on, and just decided the next big hospital should get the new system this spring, despite the many severe issues that are still unsolved.

[1]: https://www.nrk.no/emne/helseplattformen-1.15838445


Any amount of employee time spent exploring options that aren’t pursued could be framed that way.

I think it’s a pretty successful operation when you can make the exploration and come to the conclusion that it’s not going to work, then pivot.

Sticking to the decision no matter what would be much worse.


I'd like to be your financial manager.


All a matter of perspective. Did you lose $10 mil to avoid losing $1 billion in wasted effort?

Totally worth it. Your entire org also learned in the process.


It's like buying both Blockbuster and Radio Shack stock while their stock was in freefall, then selling right before they declare bankruptcy and being proud of how good an investor you are for not losing more money?


I feel like you’re reaching really hard to make this work.

Companies, teams and people do research and test projects all the time to determine direction. Making a decision like moving the entire LinkedIn data center is just a larger version of that.


a management decision lost tens of millions of dollars.

you're saying they're good managers because they didn't lose billions.

going by your "logic" it's literally impossible for management to make a mistake because there's always some slippery slope they didn't slide down.

"""we put the company out of business but at least we didnt cause the collapse of matter into a black hole! we deserve some bonuses!!"""

this is how people who work at monopoly tier companies actually think lol


That's actually not what I said at all. What I said was...

> Any amount of employee time spent exploring options that aren’t pursued could be framed that way.

Employee time costs money. If 20 people are in a meeting for 1 hour, that 1 hour meeting costs the equivalent of 1 hour of each of attendee's salary. 20 people attend a meeting making $100 / hour, you just spent $2,000 for that meeting.

All time spent planning, also costs money. Every Agile "spike" to do research for a story costs money. Every architecture document. Every time you have somebody do a couple of prototype projects to test your options before you make a decision. Sometimes you can't easily find out what's involved until you really dive into the work. It's the nature of the beast.

For LinkedIn, purchased by Microsoft who also owns Azure, I'm quite certain that during the process they assumed consolidation within Azure would be on the table. If anything it was probably expected to be a huge marketing push for Azure to be able to say that LI ran on Azure, so it's likely a management decision that involved a lot of pressure to make it happen.

Being willing to abandon that decision after exploring to see what was involved is a big deal. In many companies you'd have just heard platitudes like "well we have to do it, find a way" perpetually extending the project while taking away from more valuable initiatives.

I used to work for a business telco that was acquired by a residential telco. They were so convinced they could roll this newly acquired company into their existing systems that they ran off all of the staff who built the systems. 2 years later, they finally realized that business telco was a lot more involved but by then it was too late.

I understand the thinking that got them there, but despite a lot of people trying to stop them they proceeded forward until their hand was forced. This LinkedIn decision could have gone much, much worse.


> being willing to abandon that decision after exploring to see what was involved is a big deal

Yeah, but spending several years and many millions of dollars isn't 'exploring'.

I'd say it's a solid fail for the management.

But we can agree to disagree on this. I see you strongly believe that losing tens of millions is good management.

For what it's worth, there are many posts nowadays on Blind about how the culture of LinkedIn has tanked, and I'm assuming management probably believes they're doing a great job in that area too.


You're really big on putting words in my mouth on this.

> I see you strongly believe that losing tens of millions is good management.

Costs are relative to size. There are a lot of people involved, it's going to cost a lot of money. Microsoft's operating expenses are estimated to be just shy of $350 million per day. At that scale, 10s of millions is not as impactful.

Could it have been done better? Of course.

Could it have been much worse? Yep. Significantly.


This was cancelled over a year ago - which the article notes - and is old news. It was clear the effort would have needed a very significant push that would have required a large halt in product development, and management wasn't willing to stomach it due to high growth in 2020/2021. Which made sense. But LinkedIn revenue growth has heavily slowed with the pullback in tech hiring, so they had the space to do it and consider it optimization time.

Also, as part of Blueshift, the plan was to do batch processing first, but LinkedIn had a cultural belief in colocation of batch compute & storage, which runs against the disaggregated-storage paradigm we see now. IMO this led to some dragging of feet.

Source: Worked at LinkedIn 12 years, am a director at Databricks now.


Not only that, but the Hadoop team literally had the guy who wrote the original HDFS whitepaper. Moving a service with that much in-house expertise first never made sense. I worked on one of the original Azure PoCs for Hadoop, even before Blueshift, and it was immediately clear that we operated at a scale that Azure couldn't handle at the time. Our biggest cluster had over 500PB and in total we had over an exabyte as of 2021 [1]. It was exorbitantly expensive to run a similar setup on VMs, and at the scale that we had I think it would have taken over 4,000 - 5,000 separate Azure Data Lake namespaces to support one of our R&D clusters. I believe most of this "make the biggest cluster you can" mentality was a holdover from the Yahoo! days.

[1] https://engineering.linkedin.com/blog/2021/the-exabyte-club-...


Since LinkedIn is using Kafka quite extensively, I'm wondering whether Azure's poor support of Apache Kafka is at least partially responsible for this.

Azure is developing its own inhouse message broker that they call EventHubs. And while it is able to speak the Kafka protocol, it has some weird limitations and quirks¹. I'm wondering why they are so hellbent on developing their own inferior version of something when they could just use the real deal and host it as SaaS.

¹ No support for KIP-71 (Enable log compaction and deletion to co-exist), a limit of 10 topics (Azure lingo: EventHubs) per broker (Azure lingo: EventHubs namespace), mandatory client config parameters that you must not forget to set, even though the documentation just calls them "recommended" (https://learn.microsoft.com/en-us/azure/event-hubs/apache-ka...), some weird protocol-level incompatibilities that aren't documented anywhere, for example a consumer group state transitions to "Dead" after the last consumer left instead of "Empty", and auto scaling that only ever scales up, but never down.
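(For context on what "speaking the Kafka protocol" means in practice: an ordinary Kafka client can point at an Event Hubs namespace with roughly the settings below. This is a sketch based on the public docs, with a placeholder namespace and connection string, not a statement about which extra parameters Event Hubs actually requires.)

```python
from confluent_kafka import Producer

# Placeholder values: an Event Hubs namespace exposes a Kafka endpoint on
# port 9093 and authenticates via SASL/PLAIN with "$ConnectionString" as
# the username and the namespace connection string as the password.
conf = {
    "bootstrap.servers": "my-namespace.servicebus.windows.net:9093",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "$ConnectionString",
    "sasl.password": "Endpoint=sb://my-namespace.servicebus.windows.net/;...",
}

producer = Producer(conf)
producer.produce("my-topic", key=b"k", value=b"hello from a plain Kafka client")
producer.flush()
```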


Lots of Azure stuff is like that. Some big services internally at Microsoft refused to use their Service Fabric product, it couldn't handle the scale & complex deployment needs.


Even at a very small scale I found their Service Fabric platform painful to work with. For comparison, AKS is also quite wonky, but miles ahead of SF.


Their hosted postgres was pitiful until very recently too.


> Azure is developing its own inhouse message broker that they call EventHubs

This seems to imply Event Hubs is some sort of upcoming beta or something. Event Hubs is one of the oldest Azure services - I think it launched in 2014?


I didn't want to imply that it's brand new or beta. Just that it is still being actively developed. For example, the compacted topic support did get into GA in May 2023.


You could still run your own Kafka cluster just using their VMs, right? Or is EventHubs cheaper?


well, looks like you know what you're talking about.



My company has invested in moving to Azure except where we need to stay on Google. Apparently MS gave us a package on all of their products if we use Azure and it was enough to sway the execs.

We were then given the directive that everyone at my level would need to get some certifications so we could properly use Azure, assist the architects and more jr devs. It’s a good idea but my god the training is so poorly executed. I want to like Azure but it also seems like an uncoordinated mess.

Maybe I’m just a grumpy dev. Anyone else have a better and more positive perspective? Who has good training for certs such as the Data Engineer or AI Engineer?


> Apparently MS gave us a package on all of their products if we use Azure and it was enough to sway the execs.

For a long time this is why Azure has been beating GCP. GCP has been adamant (not sure if they still are) that prices are “what you see is what you get” and hope that small projects will suck you in. Microsoft however would wine and dine execs and then offer a 50% discount on a xx million commitment.

I agree on training and the like. Azure feels like such a mess.


Can confirm that even where I work, a state government department in Australia, which would be considered a medium-size enterprise in most aspects, our standard price is a discount of just over 50% off the published price. When I want to calculate the price of a solution, I use their calculator, and then basically divide it in half.

I assume if you're bigger, you probably get an even larger discount.


> GCP has been adamant (not sure if still are) that prices are “what you see is what you get.

Yes and no. GCP is going after the non-cloud products instead of the big cloud providers now. So going after Snowflake customers with BigQuery.


I'm pretty sure it's Snowflake that went after BigQuery.


>I want to like Azure but it also seems like an uncoordinated mess.

It's literally a "me too" cloud.

Azure is notorious for not being "finessed" out and godspeed if you have to use AKS.


What are your main problems with AKS? I’m setting up a new greenfield project and have to use it. Would be nice to avoid some pitfalls.


I second this. Seriously, don't use AKS. It looks so nice in the docs. In practice, you'll run into so many issues, I'm convinced no one actually uses AKS in production for anything major.

If your apps are simple enough that you can get away with Azure Container Instances/Apps, use them.

If you need proper managed k8s, use OpenShift. Microsoft has a partnership with Red Hat and RH provides full support.


Some examples of those issues would be nice to back up your claim.



Alright that’s pretty bad but also 5 years ago.


Literally 4 weeks ago we had to migrate over to another host because we couldn't create another node pool.

I don't think the age of the article matters.


Interesting, thanks!


At my employer teams are preparing to extensively use hybrid AKS. I'd really appreciate hearing any of the issues encountered.


Microsoft uses AKS for their production workloads, but only for “minor” projects like “Office 365” which only has a “few” (200 million) users.

https://customers.microsoft.com/en-us/story/1536483517282553...


What on God's green earth makes you think MS is going to release their dirty laundry to the public about one of their products using another one of their own products?

Like there's a litany of literature about how AKS completely dropped the ball, you don't have to take anyone's word on it from this thread. The difference between AKS and other offerings is _striking_.


They come in different shapes and sizes. Stability of the host environment is the main thing. Don't have to take my word for it: https://movingfulcrum.com/horrors-of-using-azure-kubernetes-...


Architect here: I work in Azure; I love Azure; I totes feel your pain about Azure. Half of my solutions are hopscotching networks and NICs between services that should all just be one service. There are no obvious or repeatable patterns or configurations; every answer is "the right answer", which as an architect drives me up the f*king wall.

I know MSFT likes to sell that as "complete and flexible configuration" but more often than not it's like: if your prescription to use this one service is so rigid as to use these four other services in such a way, then why not just package them completely as such?


When I was last working with Azure it was impossible to just do one training every half year or so, because you needed three completed courses to complete the certificate, but they changed them every year, so you would need to take three different ones next year. Nonsense.


We are working to migrate from Azure to on-prem because it's so bad. It is an uncoordinated mess indeed.


Any move to any cloud is going to depend on the environment you're coming from. I've been in on decisions not to use X, Y, Z ... doesn't mean there was anything wrong with them, we just weren't ready for that yet or had different priorities or the ever present weird deal-breaker issue / requirement.


You have to

1) Be comfortable routing traffic between on-prem and cloud over the internet or at least over a VPN, and

2) avoid the temptation to build your own platform (Terraform templates are a liability, not an asset!) and

3) Move tiny stuff first and bigger stuff later.

It’s amazing how many companies fail at that.


Doing things the right way and learning from before usually requires a certain degree of humility, and more often than not the person leading these projects is either required to be, or is deluding themselves to be a hotshot who can succeed in a bold new way.


You mean somebody who posts 3 bullet points on the internet and claims that they solve everything ? ;)


I mean, we invented the term Promotion Oriented Architecture for a reason.

It’s like politics; the best person for the job is a sufficiently experienced person who does not want it.


Resume driven development is another similar paradigm, where they don't wait to be promoted but instead skip around companies.


Exactly. The cost to retool for the cloud is not insignificant.


This is why AWS is better. They dogfood when it is hard.

I remember when S3 and EC2 were coming out. They tried to make those of us working on the retail website move year after year. Our excuses became their roadmap. We (developers working on not-AWS) really didn’t want to move to AWS. It was 100% worse than what we had before.

“Network is slow”, “it’s not PCI compliant”, “the drive is unreliable”, “the load balancer is missing a feature”.

It took years and years. But they did it. Google and Microsoft don’t have the willpower to force this, and it's why they will always be behind AWS.

When Google tries to sell you Google Cloud, remember they don’t use it for anything internally. They don’t think it’s good enough. So why should I?


... You think Microsoft doesn't dogfood?

Where do you think Teams, Bing, OpenAI, the billion internal systems they use, and Xbox are hosted?


A lot of Microsoft products have had years-long complaints finally resolved in the post-Azure world.

It seems that while some of the features implemented pre-Azure were supported, they weren't fully working as intended because they were sets of features that weren't necessarily in use by Microsoft themselves. But they became an issue once Microsoft was interested in them working properly, after they started using that capability, or parts of the Azure infrastructure depended on it, or some big-money customers "'how about'ed" these features. I'm talking minor but important aspects of larger products, or interoperability.

Windows Server had a lot of life breathed into it post Azure. I'm too lazy to find specific incidents, or recall the past annoyances as a sysadmin of years past (PTSD?), but they absolutely dogfood and we are all better off about it. What they can never dogfood however is the pricing.

I'm in the k-12 education sector (almost 200,000 staff and students in our district) and we get what I would imagine is one of the sweetest of discounts, and if we were to move all of our operations to Azure, our TCO actually increases to maintain the same level of service and availability.

That being said, the performance hits aren't that terrible, but it's the little things that add up quickly at scale.

Of course, as publicly funded k-12, it's sometimes a decision that gets made when having to decide between buying new things for the data centre (capital cost) vs. subscribing to Azure (operational cost). The money for both piles comes from the same vault but the destination pile is loaded with implicit reasoning and excuses. For example, if we spend $200,000 a year on a service as a subscription, it's easier to get money for that versus requesting $400,000 on something that would last us 3 or 4 years.

It takes great leadership at the IT level to liaison this impact to the business movers and shakers, and sometimes that's not there.


Yeah reading that comment felt like the twilight zone. Microsoft is foremost a software company and they host everything on Azure. Amazon's software services offering is a speck of dust in comparison. Amazon cannot dogfood AWS more than Microsoft dogfoods Azure.


Well, to take it full circle, they don't dogfood LinkedIn on Azure. How about GitHub for that matter? Wouldn't that indicate Amazon can dogfood more?


I think those are different because they are acquisitions.

It doesn’t mean they will never use Azure, just that they’re being rational about what to use.

It would be very different if they designed GitHub today and didn’t use Azure.


Add to that, future product offerings determined by real world use cases based on the reasons why they don't or can't do it today.


I can’t imagine they use Teams internally.


Microsoft mostly gives different groups autonomy in what tools to use, where to host etc.

Most groups in Microsoft use the Microsoft stack as it's best supported and already compliant and works pretty well - so most groups use teams, office, Azure etc.

Groups are free to use other tech and may for example use zoom+slack (some groups do) or GSuite. This is also true for technology (e.g. a team can use yarn and flow instead of npm and TypeScript). It's just very rare they do.


This is not true. Microsoft uses Teams heavily internally. Zoom, Slack and others are banned unless it's a customer requirement.

Source: msft employee


I engage frequently with the Canada Education team and office 365 national reps and they are wonderful and the way Microsoft uses their own stuff to interact with their customers is a true blessing. Teach by showing instead of telling.


I think the point is that the scale of requirement intake that came into AWS from the website is orders of magnitude more than whatever level of dogfooding that Azure might undergo from internal solutions. Not that they don’t dogfood at all, only that the massive amount of engineers moving one of the most complex websites in world’s history into AWS, and AWS taking them in as real input into the product, made AWS a lot better than the old AWS.


AWS is extremely overengineered for startups. In order to create a simple SaaS on AWS you have to invest much more time compared to Azure. So, I only use their S3/Route53/Email services and do the rest on Azure.


We started on AWS in ~2014 but it got too complicated for us to tolerate. My latest AWS complexity trigger was trying to set up a public S3 bucket. It's almost like they want you to screw it up on purpose. We were mostly working with .NET/Windows Server so we looked at alternatives sometime around 2020.
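(For anyone wondering why that trips people up: making a bucket public now takes at least two separate steps - relaxing the default Block Public Access settings and then attaching a policy - and an account-level block can still override both. A rough boto3 sketch with a placeholder bucket name:)

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-public-assets-bucket"  # placeholder name

# Step 1: relax the bucket-level Block Public Access settings,
# which are enabled by default on newly created buckets.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
)

# Step 2: attach a bucket policy that actually grants public read access.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```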

Our stack today has us using AWS for domain registration & S3. We use Azure for everything else. We actually log into AWS by going to the Microsoft MyApps portal and authenticating via our AAD/Entra credentials. Microsoft's docs regarding how to set up SCIM/SAML auth to AWS are excellent [0].

In Azure, we use ~5 products: AAD/Entra ID, DNS, Azure Functions, SQL Server, Azure Blob Storage. That's it. There isn't really any VM/container presence in our go-forward infra. Everything is managed/'serverless' now. There are some VMs but they are supporting legacy systems that will eventually be turned off. We have ZERO custom networking. I couldn't explain how to manage an azure vnet to save my life. We don't need VPN tech anymore either.

GitHub Actions->Azure Functions is pretty much the end game CI/CD experience for me. I am not rocking this boat. I never want to try anything else. I've spent a decade of my life keeping some flavor of Jenkins alive.

Could we do all this "cheaper"? Sure. But at what cost? My mental state is something that is a non-zero factor in all of this. Keeping track of hundreds of moving pieces in production is simply overwhelming. It's unnecessary suffering. I need something I can explain in 20 minutes to a new hire if we want to remain competitive.

[0]: https://learn.microsoft.com/en-us/azure/architecture/referen...


Technically you could start off with just Lightsail if you want to start off as simple as possible.


But then you cannot update without tearing down and rebuilding. (Since it's based on Bitnami.) Unless there is something I'm missing.


>remember they don’t use it for anything internally

Google does use Google Cloud internally for some things. Source: I work for Google.


For anything critical?


You see a story how a company manages to build out their internal infrastructure such that there is no appreciable benefit to migrating the whole thing to the cloud, and your takeaway is that "AWS is better"?

I feel like you might not have read the article.


Microsoft is long game. Azure will be here long after AWS. Microsoft is like the Borg. Embrace, extend, extinguish. Resistance? Futile.


As someone who worked at LI.

They spent years and god knows how many millions TRYING to move to Azure with the Blueshift project... before pulling the plug. They hired armies of contractors.

They didn't stop by choice.

They stopped because their tech stack is a giant over engineered unmovable turd.


As a current employee, there's things I don't like, but the infrastructure is more custom than bad (far better than my last job)


Custom is usually bad, just takes longer to reveal the problems. Sometimes companies need to create custom things, the mistake is continuing to invest in them when a community project appears and doubling down for years until nobody in your org knows how it's done "normally" and nobody that you hire knows anything about your stack.


The entire company has no Q.A team.

The amount of bugs I troubleshot in that tech stack was staggering just for basic day-to-day stuff.

promotion based engineering means an engineer shits something out that looks cool in a demo then bails to the next cool demo project while the rest of us are stuck with the reality of the turd.


^ It’s not an Azure issue but a LinkedIn issue.


Didn't LinkedIn create Kafka? Was that some of the overengineering?


Kafka was made 15 years ago.


Microsoft bought Hotmail back in 1997. Hotmail was powered by Unix servers until 2004, despite MS's best efforts to transition to their own Wintel-powered backend [0]. These things take time.

[0] https://news.softpedia.com/news/Windows-Live-Hotmail-Was-Pow...


This doesn't appear to be about Microsoft's cloud but rather Public Cloud.

The whole migration of LinkedIn from their own data centers to the public cloud (Microsoft's) isn't going well.

It appears they are still going to operate on-premise for many things. Some things moving or have moved to the public cloud.

Isn't this more a shot at the public cloud for all the things than at any specific one?


Yes I came away with the same thing. It's The Register's modus operandi to use cheeky clickbait titles.


I don’t see anything that points to it being a general public cloud issue. And instead they talk about Azure software specifically as something that they couldn’t take advantage of, no?


I would not assume that it is a specific Azure problem from that statement. Many, many teams struggle to take advantage of cloud infrastructure because of habits and knowledge retained for operating the existing systems.

It’s possible that, given what they have, it’s simply best to keep it on premise - at least to some degree. That would likely not be true with a successful re-architecture, but not everyone is up for that.


It may not be about the teams. For example, when you control the data center you can do certain things around performance and scale you can't do in a public cloud.

There are so many unknowns about how things are setup that it's hard to know.


It's incredibly difficult for a mature software business to justify infrastructure and tooling investments. This is why we think that startups are a haven for modern tooling and the largest legacy firms are ... well ... difficult.

The last 15 years possibly broke this rule by virtue of low interest rates, enabling the justification of large internal teams focused on modernization efforts which sometimes went as far as moving the state of computing forward.

I wouldn't be surprised to see legacy enterprises return to form now that interest rates are 7%


On premises hardware can hardly be called legacy. Clouds are way too expensive for startups; they are a milking machine for the big corpo.


Startups are typically capex constrained, at least until series C. Clouds are favorable to capex constraints.

Having personally tried to be cheaper than the cloud in 2013 through large hardware buys, negotiated contracts, etc., I found that the ROI relative to a progressively discounting cloud wasn't there.


Maybe I'm a bit contrarian on this one but once I saw data center, Azure, and the phrase "lift and shift" it filled in a lot of context for me. I spent a lot of my early to mid career participating in these strategies. They don't work. VM images almost always are different in some way, there's something one vendor provides that another doesn't - in general there's enough minute details that add up to make a series of mini-mountains in terms of blockers.


Yep, there are always differences. Just one thing I stumbled into recently was one of our program images that has long worked fine in AWS can't start in Azure because something their hypervisor does to the virtual address layout conflicts with the way that we remap .text to a huge page. It is both trivia and a showstopper.


Yeah, there is a vast gulf between "it works for us" and "every dependency was implemented strictly according to open standards and is therefore seamlessly portable". See also the joke of migrating between "SQL" databases.


4 years ago: "LinkedIn is moving to Microsoft’s Azure public cloud three years after $27 billion acquisition"

https://www.cnbc.com/2019/07/23/linkedin-is-moving-to-micros...


Most cloud migration projects at large companies fail. It usually takes 3 or 4 tries at least before all the necessary lessons are learned.


Our customers are really good at making a total shitshow of cloud and our jobs very easy as a result. Much sentiment over cloud looks to me like stick-in-bike-wheels meme. It's hard because you are making it hard.

Strategically, I don't think you try to help someone with cloud until they've burned out their ego trying some grand delusion. Even one bad actor can throw the whole thing off. Cloud is about doing more with less, so committee bullshit is cursed.

Once you've got your customer completely humbled by reality, they'll be willing to listen to reason and you will save so much frustration.

We've got a huge one in the pipeline right now. They've been trying to "go to the cloud" for about 5 years now and executive management is ready to reset the entire tech org chart over the lack of progress.

Cloud native is the solution but many technologists perceive it as the end of their careers. Anyone pitching a "cloud native" solution that still has container spam managed by the tenant owner is either incompetent or trying to protect their career at this point.


Sad but very true. When careers have been crafted on the current architecture of a company, it's hard to shake.


My company migrated to Azure a few years back. They can give good bulk discounts, but on the flip side the experience with some of their infra and AKS has been choppy at times, with the support team taking time to fix or RCA hardware issues. They do come back though. Would love to know how this compares to long-term experiences with other major cloud providers.


I've been with all 3 major clouds and Azure is the only one that even feels like it has support. Even so, unless you are a big customer on some special agreement, you aren't going to get any red carpet treatment. We pay for the $100/m support plan right now and it's pretty goddamn mediocre. Maybe submitting tickets for "legacy" AD domain services outages doesn't touch the rockstar support team these days. Support quality is probably variable across product teams at some level.


How do you move that much data over to another cloud provider?

Without losing data or disrupting the customer?

Or do the databases just stay in the data center and not migrate.


Roughly: pick a date and start writing all new data to both, while running ETLs to backfill data from old to new. Once that is done, you use a feature flag to do a small % of reads from the new system and wait and pray. If nothing major pops up, you slowly ramp up the % of reads until you're confident (as much as one can be) that everything is working, then you move 100% of reads to the new system. Finally, you turn off writes to the old system, clean up the feature flag and remove any old unneeded code.

jk, of course you never do any of the other stuff after turning off writes to the old system; you just leave the FF turned on at 100% and never think about it again :-p
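(A minimal sketch of that dual-write plus ramped-read pattern, with entirely hypothetical store and flag names rather than anyone's actual migration tooling:)

```python
import random

READ_FROM_NEW_PCT = 5  # the "feature flag": ramp 1 -> 5 -> 25 -> 100 over time

class DictStore:
    """Stand-in for a real datastore."""
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value
    def get(self, key):
        return self.data.get(key)

old_store, new_store = DictStore(), DictStore()

def write_record(key, value):
    # Dual-write phase: every new write lands in both systems, so the new
    # store stays current while backfill ETLs catch up on historical data.
    old_store.put(key, value)
    new_store.put(key, value)

def read_record(key):
    # Route a configurable percentage of reads to the new system and
    # compare/monitor before ramping the flag up further.
    if random.uniform(0, 100) < READ_FROM_NEW_PCT:
        return new_store.get(key)
    return old_store.get(key)

write_record("member:42", {"name": "Ada"})
print(read_record("member:42"))
```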


I did a migration like this before, intercloud between ES clusters. Just wanted to additionally confirm that this is broadly the way that I've seen it done before.

If you can afford downtime, or data loss that does make it easier. But that's not an engineering question, more a product one.


We (when I worked at LinkedIn) did it with ETL clusters; we had already built them out for moving data between datacenters nightly. They would mirror an HDFS cluster, then run batch jobs to transfer either directly to the outbound cluster or to another ETL cluster in another DC.

We used one of our ETL clusters to ship data to MSFT for various LinkedIn integrations, like seeing LinkedIn profile information in Outlook or Office products.


Which tools were you using for ETL? Or were they completely custom?


Live replicas (perhaps initialized with a cold backup, initially, if the dataset's really huge), carving off parts of it for separate migration if that's at all feasible, and some expensive folks doing a lot of butt-clenching-worthy activity for an hour or two (unless it goes very poorly...) for the final cut-over, some evening.


There are plenty of issues with Azure, but LinkedIn is hardly at the vanguard of innovation. And that was still the case before Microsoft vastly overpaid for it.


I left LinkedIn 1.5 years ago. I was there 12 years. I saw the revenue & profitability growth that occurred post acquisition. I am very very confident LinkedIn would be worth north of $100B on public markets today and Microsoft made the acquisition for $26B. You might argue that in the subsequent 6 years post acquisition that wasn't enough growth and they should have bought back shares instead but it was completely a debt financed acquisition and very high ROI for Microsoft.


Adjusting for inflation, 26B from 2016 is worth 32B now. Going by just market returns, 26B in the S&P 500 in 2016 would be worth 70B today.

Also, 26B is just the initial investment; MS surely invested more money in the division in the last 8 or so years. LinkedIn was not exactly a highly profitable entity in 2016, and while it was not burning a lot of money, the growth experienced in the last few years would have needed additional investment in the business.

I don't have a specific opinion on whether MS overpaid or not, just want to point out that even a 100B valuation today does not necessarily mean it's a high ROI for MS yet.
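(Rough arithmetic behind those two comparisons, with assumed annualized rates - about 3% inflation and about 14% S&P total return over roughly 7.5 years - chosen only to reproduce the ballpark figures above:)

```python
def future_value(principal_bn, annual_rate, years):
    # Simple compound growth: FV = PV * (1 + r)^t
    return principal_bn * (1 + annual_rate) ** years

years = 7.5  # mid-2016 acquisition to late 2023
print(f"Inflation-adjusted:  ~${future_value(26, 0.032, years):.0f}B")  # ~ $33B
print(f"At S&P-like returns: ~${future_value(26, 0.14, years):.0f}B")   # ~ $69B
```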


LinkedIn was already very FCF positive. They tightly managed margins to get to net income positive (accounting for dilution and so on) but it took maybe 2 years after the acquisition.


Of course, they were and are a healthy company. I was just trying to say that while very healthy, it was not so profitable that MS could have grown the business to today's size just by using the free cash flow; it likely required external cash infusion, and therefore $26 billion is likely not the only money MS has spent on LinkedIn.

As far as M&As go, it is a very successful outcome; the vast majority of them fail spectacularly. ROI is perhaps not the right metric to judge a strategic acquisition.


Financially, perhaps the numbers made sense. It was (and still is) a very basic social networking site with poor UX. Microsoft paid for a user base and a rudimentary surveillance system. It would have probably been better for them to just start from scratch. I don’t know a single person who likes LinkedIn. Most see presence on it as an obligation for certain industries. Most of the commenters on there are far worse than what you see on Facebook. It’s usually just one long stream of retired or underemployed old men complaining about “wokeness” and environmentalists ruining the world.


Sounds like they would've faced a similar set of issues moving to AWS or GCP.


Why would you ever want to move to cloud if you have a functional non-cloud setup?

Cloud platforms charge you much more than what the hardware is worth. The only advantages are that they provide a simplified resource management system (which is mostly an illusion, as any non-trivial system will require you to build in-house tech regardless) and the ability to scale more easily (which is done at a prohibitive cost, so a good reason to switch away if you need to scale up).


I wonder what sort of scale LinkedIn operates at in terms of server count.

And GitHub, also under Microsoft, seems to be doing fine with on-prem as well. Why force LinkedIn to use Azure?


When I was there it was in the low hundreds of thousands. Probably more now, as user base growth was still in double-digit percentages per year.


>When I was there it was in the low hundreds of thousands.

Blows my mind every time I see these kind of numbers.


Indeed. According to this in 2022[0], Stack Overflow is still running off of 9 web servers.

[0] https://www.datacenterdynamics.com/en/news/stack-overflow-st...


A social media org like LinkedIn is hardly comparable with an SO-style KB; the R/W workloads are much higher with social media than with a KB where the vast majority of transactions can be cached. Wikipedia would be a better comparison to SO.

Also LinkedIn is roughly 70x the size in users: 1B vs just around ~15M for Stack Overflow.


There are 1B users on LinkedIn? WOAH!

All of a sudden the scale makes much more sense, although I will have to fact check this because the number seems absurdly high for a professional platform.


Here you go, they hit 1B milestone last month

https://news.yahoo.com/linkedin-now-1b-users-turns-125845570...


Thanks. Considering Facebook only has 3B users worldwide, I'd say a social media platform with 1B users is extremely impressive.


DAU/MAU is the more accurate metric to track by. Not as many people would log in to LinkedIn every day; also, people create a profile and never log in again, or only log in when they are actively looking for a new job.

Facebook itself has had a significant loss of engagement in recent years. Instagram and WhatsApp are far more likely to have higher DAU than the FB main app.


Quick math: 10000 users per machine?

Not considering that 90% of all users are likely sleepy accounts.


Rumor has it, half of them are busy sending out all those millions of unread emails I have in my inbox, about how my “career is on a roll”.


FYI even at their scale the headcount cost is greater than hardware cost.


If I had to guess, there are hordes of businesses out there that maintain operations on prem, and a large lift like this is great for the resume.

Of course, I could also be entirely wrong, but I also am not going to pretend that IT resume padding then jumping ship and leaving a shart of an architecture behind doesn’t happen all the time in this industry.


$$$


Only really benefits a company if they lack the technologies or cloud solution.

Not like they are paying anyone to host something besides themselves.


I guess Azure has run out of hardware. AI requires too much computation power.


If their stack works fine, and I assume it is, why the fuck do they need to move it at all?


Well they decided they didn't need to. That being said, you can't necessarily wait until things no longer work fine before deciding to move. If you can reasonably anticipate a problem in the future, sometimes even if things work fine today you might be best off moving in order to avoid future pain.


>"you can't necessarily wait until things no longer work fine before deciding to move"

You mean cloud is the holy grail where everything works and everything else is a failure waiting to happen? I claim BS.


No I don’t mean that at all, which is why I didn’t say that.

What I mean is that in planning anything in tech you can’t always wait until problems arise before you take action to deal with them, particularly if the consequences of failure are extreme. The specifics of what is the right setup are going to depend heavily on the circumstances of the use case. Definitely being on cloud isn’t right for everyone, nor is trying to maintain on-prem hardware the right choice for everyone.


> why … do they need to move it at all?

Because they’re owned by Microsoft and it looks bad if Microsoft doesn’t dogfood its own cloud infrastructure. Plus it’d be more economical long term if they didn’t invest in a parallel cloud infrastructure. They will migrate eventually, just apparently it’s not ready for them yet.


>"it looks bad if Microsoft doesn’t dogfood its own cloud infrastructure"

Their cloud infra is already mostly Linux. A bought company which already works fine isn't going to add insult to injury.

>"Plus it’d be more economical long term if they didn’t invest in a parallel cloud infrastructure"

This is a big if. It may or it may not. Unless you are deep into their internal kitchen and have all the info, your suggestion amounts to nothing more than a wild guess.


LinkedIn doesn't have competent people. Anyone who has peeked behind that curtain sees they struggle with very simple things.


I don't have deep intel, but all the ex-LI people I've worked with or met seemed pretty competent.


Isn't LinkedIn owned by Microsoft?


> LinkedIn was having a hard time taking advantage of the cloud provider's software. Sources told CNBC that issues arose when LinkedIn attempted to lift and shift its existing software tools to Azure rather than refactor them to run on the cloud provider's ready made tools.

I think I need this translated back into tech-speak.


“Lift and shift” is a term for when you move to “the cloud” but really just replace your physical servers with clones in cloud VMs. It’s a relatively cheap (in terms of effort) way to get on “the cloud” but gains you basically zero of the benefits. The term’s in wide use, talk to anyone involved with cloud-anything and they’ll be familiar with it.

I’m not sure what else needs to be translated? Nothing, I think?


Lift and shift is a sales term to make it sound like the internal team is trying to over-complicate the migration. The sales guy will normally phrase it as "just lift and shift."


I love it. I could totally believe the etymology starts as slick sales persuasion trying to downplay the implementation difficulty of something that's being sold.

And then people also pick it up for non-persuasion, because it also sounds like a catchy name for an engineering approach we already had.

Of course it can still be used for persuasion for a while, but it will accumulate baggage over time as efforts linked to the term don't play out that way.


Thanks for the explanation, but no need to imply that someone is out of the loop if they didn't know it.

The term didn't sound familiar to me (though the concept was), and the term might not have been familiar to some others.

People might not want to contradict an assertion because of language like "The term’s in wide use, talk to anyone involved with cloud-anything and they’ll be familiar with it. [...] I’m not sure what else needs to be translated? Nothing, I think?"


> People might not want to contradict an assertion because of language like "The term’s in wide use, talk to anyone involved with cloud-anything and they’ll be familiar with it. [...] I’m not sure what else needs to be translated? Nothing, I think?"

> > LinkedIn was having a hard time taking advantage of the cloud provider's software. Sources told CNBC that issues arose when LinkedIn attempted to lift and shift its existing software tools to Azure rather than refactor them to run on the cloud provider's ready made tools.

The only other terms I can see that are jargon are "cloud provider" and "refactor", and those are already technical (more or less) so don't need to be translated into technical language.

As for the other bit, I just meant that it's a widely-used term so one may continue to encounter it in these contexts. It truly is ubiquitous in discussion of and around "enterprise transformations" to the cloud, and among cloud practitioners more generally, so anyone connected to that space will know what it means. It's also kinda already a technical term, in that developer/devops and SRE sorts throw it around and do mean a specific thing by it, which doesn't need to be translated for other technical folks in that area.


"Ten Thousand" https://xkcd.com/1053/

The original person might've instead asked for an explanation in a way that didn't come across as criticizing the article.

But probably best not to insist that everyone should already know the term; just explain it.


Yeah, you're probably right. Feedback received.


Lift and shift is a cloud migration strategy which involves moving your applications to the cloud with little to no modification. For example, you have an application running on a server in your data-centre, you then deploy a VM in the cloud with a similar spec and install the application.

It's usually done to avoid the engineering cost of making the services more cloud native. What tends to happen a lot is that after a considerable portion of the migration is completed, the cost of the lift-and-shift effort starts to overtake the savings, and the projected costs dwarf the future savings.

I suspect this is what happened with Linkedin.


Which savings? It's never been obvious to me that cloud is cheaper if you're a large company.


It really depends on workloads. Imagine you need massive spikes of compute for, say, flash sales, or people watching the Super Bowl on your streaming service. Buying all that hardware for just the spikes might not make sense vs. just scaling up VMs in a cloud provider and scaling them down again.

In the real world, for baseline load, the big advantage for many large companies isn't price, but the massive lack of alacrity of many inhouse ops teams. If it takes me 3+ months to provision compute for the simplest, lowest demand services (as is the custom in many large companies full of red tape and arguments about who bears costs), letting teams just spin up anything they want and get billed directly is often a winner, even if it's more expensive. Having entire teams waste months before they can put something in prod is a very different kind of expense in itself.


The simplest example is if you have on-prem hardware, you need to have capacity for your peak load. In a lift and shift, you would replace your fleet of 96 core xeons with a fleet of 96 core xeons in AWS.

The cloud native approach would be to modify your app so that it can be scaled up and down so you keep a few machines always running, and scale up and down with your traffic so you only run at capacity when you need it.
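
For a rough sketch of what that "scale with traffic" half looks like in practice (a generic AWS example with made-up names, not anything LinkedIn-specific): once the app tolerates instances coming and going, a single target-tracking policy does the resizing for you.

    # Hypothetical sketch: let the autoscaling group track CPU instead of
    # sizing the fleet for peak. Group and policy names are made up.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-west-2")

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-fleet",
        PolicyName="cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 60.0,  # keep average CPU around 60%
        },
    )

The hard part is never this call; it's making the application stateless (or state-aware) enough that an instance can disappear mid-traffic without anyone noticing.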


The hiccup with this arithmetic is that the big cloud providers charge 7x to 10x the price you’d pay for an on-premises VM.

Sure, sure, you’re about to say something about discounts? Granted, that’s available, but only for commitments starting at one year or longer!

Okay, fine, I actually agree that there are savings available by reducing head count. The entire network and storage teams can be made redundant, for starters. Even considering that DevOps and cloud infra engineers need to be hired at great expense, this can be a net win…

…but isn’t in my experience. Managers are unwilling or unable to make many people redundant at once, so they stick around and find things to do…

… things like reproducing the mess that kept them employed, but in the cloud.

I’m watching this unfold at about a dozen of my large enterprise customers right now.

Got to get back to work and send the fifty-seventh email about spinning up a single VM. Got to run that past every team! It’s no rush, it’s only been about fourteen months now…


This doesn’t demonstrate anything about the savings.

Anecdotally, when my previous company was looking at costs, cloud unequivocally came out significantly more expensive, and that wasn’t even a large company (only 2,000 or so employees).

I will grant that we did not have globalization problems to solve (but I’d also wager that lots of businesses prematurely “what if” this scenario anyway).


> This doesn’t demonstrate anything about the savings.

If you need 4 CPUs for your peak load for 4 hours per day, and only 1 of them for the other 20 hours a day, you can save by scaling down to 1 CPU for roughly 83% of the day.
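
Back-of-the-envelope, with a made-up hourly rate (only the ratio matters):

    # Peak-sized fleet all day vs. scaling down off-peak. Rate is hypothetical.
    rate = 0.05                       # $ per vCPU-hour (made up)
    static = 4 * 24 * rate            # 4 CPUs around the clock
    scaled = (4 * 4 + 1 * 20) * rate  # 4 CPUs for 4h, 1 CPU for 20h

    print(static, scaled)             # 4.8 vs 1.8 -> ~62% less per day

That saving is before any per-unit price premium, which is a separate question.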


This assumes a lot about the cost of those CPUs and related resources in the respective environments. Cost per equivalently performing unit on a managed server vs. cloud instances is often vastly different.

It's also extremely uncommon to have loads that spiky.

And when you do, hybrid is often a solution (use a provider that can provide colo or managed servers for your base load and cloud instances for your peaks, or scale across providers).


It's pretty hard to capture the nuance of any possible solution in 2 paragraphs without someone coming along and picking it apart. The guy I replied to didn't know even the most basic information.

Even at that, you said yourself that you can use "cloud" to scale into your spikes.


Yes, but ironically being prepared to handle spikes with cloud tends to make it even less cost effective to do so, because it means you can plan for far higher utilisation of your on prem/colo/managed servers with little risk.

It takes very unusual load patterns for cloud to win on cost. It does occasionally happen, but far less often than developers tend to think.

There are many reasons to choose cloud services, but cost is almost never one of them.


> It does occasionally happen, but far less often than developers tend to think.

Guy asked what is a cloud workload, I responded. Nitpicking every tiny detail doesn't help.

> There many reasons to choose cloud services, but cost is almost never one of them.

It's cheaper to pay me to manage IAM roles for lambdas and ECS instances for 5% of my time than it is to pay someone full-time to manage some sort of VMware or other system. It's easier and cheaper to find someone with AWS experience who can provide value to the team and product than it is to find someone who can manage and maintain a cobbled-together set of scripts to update apps. There are click-and-go options for deploying major self-hosted services like Grafana and k8s, with secure details, that I can use without spending any time (and time == $$$) learning about the developer's preferred deployment scheme.
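
For a sense of scale, "managing IAM roles for lambdas" at that 5%-of-my-time level is mostly small, boring pieces like this (names are hypothetical; a sketch, not anyone's actual setup):

    # Minimal execution role a Lambda can assume, plus the AWS-managed
    # policy that lets it write CloudWatch logs. Role name is made up.
    import json
    import boto3

    iam = boto3.client("iam")

    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    iam.create_role(
        RoleName="report-generator-role",
        AssumeRolePolicyDocument=json.dumps(trust),
    )
    iam.attach_role_policy(
        RoleName="report-generator-role",
        PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    )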


This isn't nitpicking, it's why the cloud option is very rarely cheapest. I've costed this out for many organizations over the years and tested the assumptions.

> It's cheaper to pay me to manage IAM roles for lambdas and ECS instances for 5% of my time than it is to pay someone full-time to manage some sort of VMware or other system.

True, but it's a false equivalence, and one I often see used by people unaware of how easy it is to contract this out on a fractional basis.

I used to make a living cleaning up after people who thought cloud was easier, who ended up often spending a fortune untangling years of accumulated cruft that just never happened for my non-cloud customers.


Compute is at a premium, but you can shift opex/capex around which might be more suitable. It can also be cheaper in headcount since you need fewer operators and less expertise in datacenter operations.


> you need fewer operators and less expertise in datacenter operations

Because you are paying someone else for them.

This is considered rational because those operators are presumably more productive in a pool of people using similar skills to support many customers rather than just one. It is similar to hiring a cleaning service rather than employing individual cleaners in a department of cleaning because cleaning things is not a core competency of business.

It might be less rational if some amount of compute is part of the core competency of the business. Since "software is eating the world," compute is a core competency of all businesses except for the ones that don't realize it yet.


I think there is a difference in competency between using Cloud Compute and running your own Datacenter. Perhaps some companies have the overlap, but I suspect this is an additional skillset they need to cultivate to achieve the savings.


> It can also be cheaper in headcount since you need fewer operators and less expertise in datacenter operations.

I've not really seen this work out well. I think it might be true for simple set-ups, letting a tiny developer team also handle infra and support without going nuts doing it, if they set it up that way from the beginning, but more-complex setups always seem to have so damn many sharp edges and moving pieces that support ends up looking similar to what a far more DIY approach (short of building one's own datacenter outright) would, in terms of time lost to it.

... and so does downtime, for that matter.


When I've done devops consulting, clients with cloud deployments invariably spent more money on devops because of the often significantly more complex environments.


It's at least more predictable. You don't pay for staff with datacenter skills (which are somewhat in short supply), you don't need to make large investments early on to build the datacenter, and you don't have a huge headache if you need to scale operations up or down.


It’s easier to scale cloud infrastructure.


Even if you never need to scale, it's cheaper not to have to physically maintain your own data center. If the broken servers, building power, building internet, access control, real estate costs... are all handled in the cloud, there are savings there as well.


But those costs don't go away -- the cloud provider is going to charge you for them, along with a premium for profit?

I'm used to organizations moving out of the cloud when they realize that it's more expensive if you don't have very peaky load demands.


Very few people who don't use cloud use their own data centers. The dominant alternative to cloud is colo or managed servers.

And it's difficult to make that as expensive as a cloud deployment.


But that's somewhat negated if you lift and shift, because your application is not designed to leverage that capability in that way.


Although, it must be unusual, right? This is not one company porting their service to the cloud, this is Microsoft porting their LinkedIn service from whatever servers came along with LinkedIn, to their own servers, on which they also run a cloud business.

Which… isn’t to say anything about which way we should expect that to swing things. But it seems quite unusual, as most companies have not been bought by a cloud provider. Yet…


> this is Microsoft porting their LinkedIn service from whatever servers came along with LinkedIn, to their own servers

Nope, LinkedIn executes completely independently.


That is not really true. If it were, LinkedIn would have evaluated all the clouds in a fair process and gone with AWS as the best fit in 2019, if they had wanted to move in the first place.

The pressure to move to a cloud, and to Azure specifically, comes entirely from MS. LinkedIn was perfectly happy, then and now, running its own setup; these 4 years of trying to move are because of MS ownership.


If your architecture is chatty enough, you will be sharding things so that most traffic stays in one rack, room, or data center.

If you treat us-west-1 as a single data center, you may find you are spending a lot on traffic between AZs.

A lift and shift might treat us-west-1 like a single data center. A more sophisticated strategy might treat it as three.
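
Rough illustration of why that matters (cross-AZ transfer is billed per GB in each direction, on the order of a cent per GB; exact rates vary, and the traffic figure here is made up):

    # Chatty east-west traffic that was "free" rack-to-rack becomes a
    # metered line item once it crosses AZs. Numbers are illustrative only.
    gb_per_day = 50_000             # hypothetical internal chatter
    usd_per_gb_each_way = 0.01      # ballpark cross-AZ rate per direction

    daily = gb_per_day * usd_per_gb_each_way * 2   # billed on both sides
    print(daily, daily * 30)        # ~$1,000/day, ~$30,000/month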


There is no such thing as "lift and shift". It is something Azure account reps like to say to make it sound like moving is easy. It sounds like you're picking up some boxes from one side of the room and moving them to the other. When in reality you're rewriting your infra code mostly from scratch.

When we were acquired by MSFT we had the same project. We had to move from AWS to Azure. I made them all stop saying "lift and shift" because in reality it is "throw away all of your provisioning code and rewrite it using Azure primitives which don't work the same way as AWS ones".

It is more akin to writing an iOS app to work on Android.


"Lift and shift" isn't just an Azure-specific phrase. Many people use it pejoratively, and point to it as an anti-pattern, and something to avoid.

Similar terminology is "forklift"... been hearing that one for well over a decade.

Migrations are oftentimes an opportunity to revisit scaling, configuration, build and deployment pipelines, platform primitives, etc. Every migration I've been involved in has a (probably necessary) tension between getting the job done efficiently, while not repeating all the mistakes of the past.


“Lift and shift” came into the conversation once we started talking about how we were paying too much for AWS. The obvious stuff was things like less bin packing, and bandwidth for third party services, like telemetry dashboards.

And it's not just the service fees. I blanch to think of the opportunity costs we accrued by focusing for that long on infrastructure to the exclusion of new product and features. It's truly breathtaking.

And then there’s the burnout, and the ruffled feathers.


I've become convinced that most migrations are absolute losers in terms of opportunity costs.

Even if done skillfully with valid rationale, they don't show any value until you come out the other side successfully.


Definitely. We migrated to a new telemetry vendor and I'm pretty sure it'll take 10 years for us to recoup the cost savings in man power and opportunity cost.

They were worried the old vendor might go under. My own track record with predicting company failures is pretty bad, so I suspect they'll still be around ten years from now.


The IaC has to be rewritten, but often the application itself needs major modifications due to pervasive use of proprietary managed services. Vendor lock in is a major problem. It's almost never simple unless an app is entirely running on a standalone VM. And if it is, you're probably wasting money running it in the cloud anyway...


I'm gonna bet that many Azure customers had no such thing as "provisioning code".


But lift and shift is not that, is it? It's having applications running directly on the OS (without containerisation or separation of dependencies like the database or physical disks) and moving them to "the cloud" to be run on a VM in the same fashion.

I mean, if you're already with AWS using their services (besides EC2 for hosting) such as RDS or S3; moving to Azure SQL (or DB for MySQL or whatever) and Blob Storage is not just lift-and-shift anymore, since you are actually changing from a cloud provider to a different one.

AFAIK an actual migration to the cloud would involve rewriting some parts of the application to be cloud-native, such as using Service Bus for queues instead of a local Redis/RabbitMQ instance, using GCS instead of local disks, and using RDS instead of hosting your own single MySQL server.
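
To make "rewriting some parts to be cloud-native" concrete, here's a toy sketch of the kind of change involved: the same enqueue call, with a self-hosted RabbitMQ swapped for a managed Azure Service Bus queue (queue name and env var are hypothetical):

    import os

    # Before: publish to a RabbitMQ broker you run and patch yourself.
    def enqueue_local(payload: bytes) -> None:
        import pika
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = conn.channel()
        channel.queue_declare(queue="orders")
        channel.basic_publish(exchange="", routing_key="orders", body=payload)
        conn.close()

    # After: publish to a managed queue; no broker of your own to operate.
    def enqueue_managed(payload: bytes) -> None:
        from azure.servicebus import ServiceBusClient, ServiceBusMessage
        conn_str = os.environ["SERVICEBUS_CONNECTION_STRING"]
        with ServiceBusClient.from_connection_string(conn_str) as client:
            with client.get_queue_sender(queue_name="orders") as sender:
                sender.send_messages(ServiceBusMessage(payload))

The code diff is small; the real work is everything around it: retries, dead-lettering, auth, local dev, and the dozens of other places that assumed the old broker's semantics.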


There's no formal definition of "lift and shift", certainly nothing that would dictate specific virtualization strategies.

I've always read it as being roughly analogous to "like for like," and dependent on the specific circumstances and status quo.


To be fair, AWS also used the exact term when we moved a project out of a tiny, expensive-to-operate (through lack of scale) datacenter that hadn't been retired only because we had a 30+ year old COBOL app suite on a z system.


ELI5: For any sufficiently complex enterprise system (e.g. LinkedIn, or Google), any plain vanilla architecture is infeasible for lift & shift. Moreover, the vanilla services may not comply with internal security requirements, or play nicely with internal CI/CD tools, or internal databases / data structures / data processing pipelines / analytics.


No five year old is going to understand a single sentence of that.


You spend a week building a castle with legos, and suddenly your mom asks if you can change some parts to use the new <lego competitor>. You can try to make the old and new parts fit together, but it isn’t going to be easy most of the time, and you can’t be certain that the lego competitor will have the same pieces or do the same things as your lego version. By the time you are done redoing those parts, you’ll end up having to recreate large portions of your castle to make everything work together again, and even then you might miss something important that breaks a functionality of your castle.


Maybe in this case "ELI5" stands for "Explain it like I have 5 yrs experience in software development but gave up in year two"?


ELI40YO Engineer


Still need it dumbed down a bit /s


Arguably, if the five-year-old were reading Hacker News, they might. :) Point taken, though, but honestly this doesn't seem like the place to simplify things quite to the five-year-old level.


My five-year-old loves her complex enterprise system.


That's all correct in theory, but in my experience these things still happen and are usually outgrowths of bad engineering culture / shadow IT / not wanting to be reliant on your cloud infra / platform team (often for irrational reasons, sometimes not). They get built with entire teams taking responsibility on paper, but then before you know it, nobody from that team still works at the company or on that team. Usually these systems are also GDPR nightmares if they contain user data, because these people don't understand when you tell them they need to have a plan for deleting user data. They don't even consider it a legal barrier, they think you're putting stones in their way.

I've been on enough Cloud Archeology expeditions into the land of VMs where nobody knows what they do that it might as well be my job title now.


Is that supposed to make it seem better?

If refactoring is too hard for a Microsoft owned company what am I to think about my tech stack?


Beyond ludicrously small systems, refactoring of live 24/7 production systems is never easy.

Reality has a surprising amount of detail, and any non-trivial, customer-facing system will have accumulated weird code paths to account for obscure but nonetheless expensive edge cases. A codebase built across >20 years, scaled to support millions of concurrent users is going to be absolutely filled to the brim with weird things.

When you add the need for live migrations with zero downtime, done every few years to account for next order of magnitude loads, you end up with a proper Frankenstein's monster. It's not called "rebuilding an airplane while flying" for a lark.

Every round includes a long, complex engineering effort of incremental live migration, with parallel read/write patterns between old and new systems and all their annoying semantic differences. And then, to add insult to injury, while your core team was going through the months-long process of migrating one essential service, half a dozen upstream teams have independently realised they can depend on some weird side effects of the intermediate state and embedded its assumptions into a Critical Business Process[tm] responsible for a decent fraction of your company's monthly revenue. Breaking their implicit workflow will make your entire company go from black to red, so your core team is now saddled with supporting the known-broken assumptions.
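
A minimal sketch of that parallel read/write phase, with entirely hypothetical store interfaces (real migrations layer backfills, consistency checkers, and feature flags on top of this):

    import logging

    log = logging.getLogger("migration")

    # Toy dual-write/shadow-read wrapper used mid-migration: writes land in
    # both systems; reads are served from the old one while the new one's
    # answers are compared and logged. All names here are hypothetical.
    class DualStore:
        def __init__(self, old_store, new_store):
            self.old = old_store
            self.new = new_store

        def write(self, key, value):
            self.old.write(key, value)           # source of truth, for now
            try:
                self.new.write(key, value)       # best-effort shadow write
            except Exception:
                log.exception("shadow write failed for %s", key)

        def read(self, key):
            value = self.old.read(key)           # still served from old system
            try:
                if self.new.read(key) != value:  # shadow read, compare only
                    log.warning("mismatch for %s", key)
            except Exception:
                log.exception("shadow read failed for %s", key)
            return value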

Then you get to add wildly differing latency profiles to the mix. While you were running on your own hardware, the worst-case latency was rack-to-rack. Implicit assumptions on massive but essential workloads may depend, unknowingly, on call-to-call latencies that only rarely exceed 100 microseconds. In a cross-AZ cloud setting you may suddenly have a p90 floor of 0.2ms. A lot of software can break in unexpected ways when things are consistently just a little bit too slow.
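
The compounding is easy to underestimate for anything that fans out sequentially; a quick illustration with made-up call counts:

    # Made-up numbers: one user request triggering sequential internal calls.
    calls = 400
    rack_to_rack = 0.0001   # ~100 microseconds per call on-prem
    cross_az = 0.0002       # ~0.2 ms p90 floor across AZs

    print(calls * rack_to_rack * 1000, "ms")  # 40 ms of pure network wait
    print(calls * cross_az * 1000, "ms")      # 80 ms -- same code, visibly slower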

Welcome to the wonderful world of distributed systems and cloud migrations. At some point the scars will heal. Allegedly.


That is tech speak: they tried to redeploy their existing architecture into Azure and it failed.


The headline notwithstanding, this doesn't seem like anything particularly Azure specific. They'd likely have had many of the same issues trying to mostly lift and shift to any of the big public cloud providers.


It could mean multiple things. My guess is they used vendor-specific services that don't translate as well as basic building blocks like vanilla S3/EC2.


I think this is an awkward way of saying they tried to add an abstraction on top of their AWS dependencies so that their services would work on Azure without a refactor.


They don't use AWS, but primarily bare metal.


Also, this reminds me of the time Microsoft bought Hotmail and couldn't port it to WinNT. They had to leave it on its BSD variant for a long time; NT couldn't handle it.


Arstechnica forum discussion on the topic in 2001:

https://arstechnica.com/civis/threads/i-thought-hotmail-was-...


I'm surprised it can today.


I wonder how the GitHub to Azure migration is going


I have it on good authority they’re trying a lift and shift too and it’s not going well, at least as of ~9mo ago.


[flagged]


Having used Azure after long stints with AWS and GCP, I found Resource Groups a much more logical bucket, more so than anything on AWS/GCP.


Have you ever set up an AzureAD tenant that can be used with Auth0 to validate any Office365 user (without setting up a specific connection to their own AD/tenant)? I'm having trouble doing this, and so any time I'm on a forum where someone seems to exhibit some experience with Azure, I ask for help. Can pay. DM me if interested, pls.


> pls

Lmao, almost spit out my coffee. Sorry, hope you find help.


AWS doesn't have resource groups because AWS regions are purely independent, and it tries to maintain that as a tenet.

Since Azure resource groups can contain resources that span regions, Azure violates this tenet, or maybe it's just not a tenet for them.


Unfortunately, resource groups seem to be the only thing going for it (IMO, of course). AWS does have resource groups, but I have not used them.


In AWS, you can pay for "premium support" and get help from them for issues like this. You can even pay, get help, and then cancel the sub; it's like $50 for the month.

Maybe azure has an equivalent.


Resource groups are fine, but it is quite unclear why a group has a region when the resources within it may be in other regions.


The object has to be in a database in one of the regions I guess.


That is correct. Though all that metadata is replicated


Is there a case where it matters to the user which region their group is in if all of the resources live in a different group of regions?


For public regions, I don’t think so. At worst, there might be a tiny bit of extra latency as the orchestrator works across datacenters. But region is also used by Azure Stack and the rules might be different then. I worked on this a number of years ago and if you are really concerned, you should ask support.


I liked Resource Groups too, but not so much that it changed my (extremely negative, painful) experience using Azure over AWS and GCP.


And yet it's Microsoft's biggest department, with the most revenue.


Yes, a product can be bad and a commercial success.


And that's because MS sets up the reporting for it to look that way. You can thank O365.


It's been a while since I tried it, but I really like that the UI doesn't "cheat" with internal APIs.

Sure, the UI is horrendous, I'm not going to defend it, but you can open the web inspector, see what it's doing, and use the exact same APIs in your code. I can't say that for AWS.


Isn't that the norm nowadays? I know some early AWS products had logic just for the console, but I also know that that was seen as a huge mistake. I assumed that was industry wide.


Seems like it’s still a bit of an issue. This [1] is rather a long read, but definitely worth it if you are interested in platforms.

[1]: https://gist.github.com/chitchcock/1281611


That is a pretty old rant, at this point. Amusingly, Google+ doesn't even exist anymore. :D


Maybe I'm odd, but even having more experience with AWS, and even with the UI on AWS being superior, Azure's jargon and organization seem easier to learn than AWS's.


Me: I don't get how the GUI is confusing.

Seriously: don't use the GUI. Do you use the GUI for anything else?

Use Az PowerShell or the Az CLI.


GUIs are nice for setting up experiments, seeing what you can set up, and changing stuff quickly. For example, setting up a network and a subnet from scratch, creating a VM, connecting to it from the browser and setting things up. Could you do that with PowerShell, the CLI, ARM templates or Terraform?

Yes, but it'd be slower, since you'd have to go through the docs to find the name of what you're looking for and type it in. Then type another command to see if the changes were applied. Another one to test the result... A huge PITA for experimenting.

Of course, for actual production use, you should definitely have some sort of IaC set up and use that. But for testing or home use (see the tech guy that uses a storage container and Front Door for a static website) the GUI is good enough.


It may surprise you to discover that yes – people use GUIs for things.


Or annoyingly, the GUI is the only way.

Case in point, if you are using Microsoft 365 and Exchange Online, you can do a lot of administrative tasks via PowerShell modules. But if you want to run a report on how many emails a set of mailboxes have received in the past 30 days, the message trace PowerShell command can only do up to the past 10 days. Anything beyond 10 days requires going into the Exchange Online admin portal and requesting a report that Microsoft will generate for you several hours later.


Could you script and schedule, say, a daily 1-day report, store the result, and have a script to get the last 30 (or however many you want) days?


That is possible. It's annoying that you can get the last 10 days in a second but need to duct-tape something together if you never want to touch the GUI, and then, instead of a few hours, you have to collect the data over the entire 30 days (and only after you start doing this).



