They sell servers, but as a finished product. Not as a cobbled-together mess of third-party stuff where the vendor keeps shrugging if there is an integration problem. They integrated it. It comes with all the features they expect you to want if you're building your own cloud.
Also, they wrote the software. And it's all open source. So no "sorry, but the third-party vendor dropped support for the BIOS". You get the source code. Even if Oxide goes bust, you can still salvage things in a pinch.
Ironically this looks like the realization of Richard Stallman's dream where users can help each other if something doesn't work.
It's a huge deal. I'm biased, though, because my own takes on how things should evolve were very similar. I was, however, completely unsuccessful in getting those ideas into production! And that, that is a huge deal. Throughout my career it has been interesting to meet people with great ideas who are unable to get them into production; then, when the idea finally does make it into production, everyone feels like "Wow, this is so obvious, why didn't we do it sooner?" while some folks are banging their heads against the wall :-).
One of the more interesting discussions I had during my tenure at Google was about the "size" of the unit of clusters. If you toured Google you got the whole "millions of cheap replaceable computers" mantra. Sitting in Building 42 was a "rack" which had cheap PC motherboards on "pizza dishes" without all that superfluous sheet metal. Bunches of these in a rack and a festoon of network cables. What are the "first class" elements of these machines? Compute? Networking? Storage? Did you replace components? Or a whole "pizza slice" (which Google called an 'index' at the time). Really a great systems analysis problem.
FWIW I'm more of a "chunk" guy (which is the direction 0xide went) and less of a "cluster" guy (which is the way Google organized their infrastructure). A lot of people associated with 0xide are folks I worked with at Sun in the early days, when the first hints of "Beowulf" clusters vs. "supercomputers" were appearing and the question was whether memory was one thing (UMA) or varied from place to place (NUMA). I have a paper I wrote from that time about "compute viscosity", where the effective compute rate (which at the time largely focused on transactional databases) scaled up with resources (more memory, more transactions/sec, for example) and scaled down with viscosity (higher latencies to get to state meant fewer transactions/sec). Sun was invested heavily in the TPC-C benchmarks at the time, but they were just one load pattern one could optimize for.
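To make that "compute viscosity" idea concrete, here is a minimal toy model (my own sketch with invented numbers, not anything from the original paper): throughput scales with how much state is cheap to reach and is discounted by the latency of everything else.

    # Toy model of "compute viscosity" (hypothetical numbers, illustrative only).
    # Effective transaction rate rises with resources (how much state is resident
    # locally) and falls as the latency to reach the rest of the state grows.
    def effective_tps(base_tps, resident_fraction, local_latency_us, remote_latency_us):
        # Average cost to touch state, weighted by how much of it is local.
        avg_latency = (resident_fraction * local_latency_us
                       + (1 - resident_fraction) * remote_latency_us)
        # Viscosity discount: more latency per access means fewer transactions/sec.
        return base_tps * (local_latency_us / avg_latency)

    print(effective_tps(100_000, 0.50, 1, 100))  # ~2,000 tps: half the state is far away
    print(effective_tps(100_000, 0.95, 1, 100))  # ~16,800 tps: more memory, less viscosity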
These guys have capitalized on all that history and it is fucking amazing! I just hope they don't get killed by acquisition[1].
[1] KbA ("killed by acquisition") is a technique where people who are invested in the status quo and have resources available use those resources to force the investors in a disruptive technology to sell to them, and then they quietly bury the disruptive technology.
Can you clarify a bit what you mean by "chunk" guy? Are you alluding to the ability to distribute work by an isolation mechanism such as cgroups vs. whole machines à la Borg/Google?
More on infrastructure composition; software is an abstraction above that.
Is the unit of composition a rack (chunk), a server (smaller chunk), or a blade (smallest chunk)? In what I think of as classic systems architecture you've got a 'store' (storage), 'state' (memory), 'threads' (computation), and 'interconnect' (fabrics). In the '90s a lot of folks focused on fabrics (Cray, Connection Machine, Sun, etc.), somewhat on threads (compute blades), and state came along for the ride. How these systems were composed was always a big thing; then along came the first Beowulf clusters that used off-the-shelf motherboards (a "chunk" of threads/state/store) with a generic fabric (Ethernet). Originally NASA showed that you could do some highly parallel processing on these sorts of systems, and Larry and Sergey at Stanford applied it to the process of internet search.
Collectively you have a 'system resource' and with software you can make it look like anything you want. When you do compute with it, its performance becomes a function of its systems balance and the demands of the workload. It's all computer sciencey and yes, there is a calculus to it. This isn't something that most people dive into (or are even interested in[1]) but it was one of the things that captured my imagination early on as an engineer. I was consumed with questions like: what was the difference between a microprocessor, a mini-computer, a workstation, and a mainframe? Why do they each exist? What does one do that the other can't? Things like that.
[1] At Google I worked in what they called 'Platforms' early on and clearly most of the company didn't really care about the ins and outs of the systems bigtable/gfs/spanner/etc ran on, they just wanted APIs to call. But they also didn't care about utilization or costs. By the time I left some folks had just figured out (and one guy was building his career on) the fact that utilization directly affected operational costs. They still hadn't started thinking about non-uniform rack configurations for different workloads.
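A rough sketch of what that "systems balance" calculus looks like in practice (all figures below are invented for illustration): for each resource, compare what the workload demands per unit of work against what the chunk supplies, and the scarcest resource bounds throughput.

    # Hypothetical roofline-style balance check; every number is made up.
    chunk = {                       # supply per second
        "threads": 128 * 3e9,       # core-cycles/s
        "state":   400e9,           # memory bandwidth, bytes/s
        "store":   10e9,            # storage bandwidth, bytes/s
        "fabric":  4 * 25e9 / 8,    # network bandwidth, bytes/s (4x25GbE)
    }
    workload = {                    # demand per transaction
        "threads": 2e6,             # cycles
        "state":   4e6,             # bytes touched
        "store":   64e3,            # bytes persisted
        "fabric":  16e3,            # bytes on the wire
    }
    bottleneck = min(chunk, key=lambda r: chunk[r] / workload[r])
    print(bottleneck, int(chunk[bottleneck] / workload[bottleneck]))
    # -> "state 100000": memory bandwidth caps this workload at ~100k transactions/sec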
> I just hope they don't get killed by acquisition.
Private equity in the US has collectively determined that no company shall exist outside of investment ownership. I don't know what the ownership structure looks like, but generally speaking, it seems that nearly everyone has a "fuck you" number. Now that Oxide is venturing into Dell and HP's turf, I worry someone will get a fix on what Brian's number is.
Coupling vs Decoupling is not some one-sided thing. It's a major trade-off.
One of the most obvious examples of the problem with this approach is that they're shipping previous generation servers on Day 1. One can easily buy current generation AMD servers from a number of vendors.
They will also likely charge a significant premium over decoupled vendors that are forced to compete head-to-head for a specific role (server vendor, switch vendor, etc).
Their coupling approach will most likely leave them perpetually behind and more expensive.
But there are advantages too. Their stuff should be simpler to use and require less in-house expertise to operate well.
This is probably a reasonable trade-off for government agencies and the like, but will probably never be ideal for more savvy customers.
And I don't know how truly open source their work is, but if it is, they'll most likely find themselves turned into a software company with an open core model. Other vendors that are already at scale can almost certainly assemble hardware better than they can.
> will probably never be ideal for more savvy customers
IDK about every use case, but slightly older generations of CPUs would affect me roughly zero. I'm sure there are things so compute-intensive that one would care very much, but a lot of people probably wouldn't bat an eye about that, and not because they're unsavvy.
To the extent that these things are supported as a whole by the vendor, rather than with a bunch of finger-pointing, that could be massive, specifically in terms of how many staff members you could "not hire" compared to having to employ someone to both build and continually maintain it.
I'm posting this not to invalidate what you're saying, just to say that a predictable upfront amount of money (the premium) will be spent very happily by lots of people who value predictability and TCO over initial price.
If you're not rapidly scaling it probably doesn't matter. But if you're still buying (and maybe even using) Haswell CPUs in 2023, you may be missing out in a big way.
A moderately large Haswell cluster is equivalent in power to a moderately powerful modern server.
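A rough back-of-the-envelope for that claim, using public core counts and ignoring clocks, IPC, and memory (so treat it as a hedged sketch, not a benchmark):

    # Core counts only: a top-end dual-socket Haswell box vs. a current
    # dual-socket EPYC box (ignores per-core gains, which widen the gap further).
    haswell_node_cores = 2 * 18   # e.g. Xeon E5-2699 v3 class, 18 cores/socket
    modern_node_cores  = 2 * 96   # e.g. EPYC 9654 class, 96 cores/socket
    print(modern_node_cores / haswell_node_cores)  # ~5.3 Haswell nodes per modern server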
No, not buying new, just using what was bought years ago. It still works, it does the job. Is it the best performance per watt? Clearly no, but the budget for electricity and the budget for new capital expenses are two different things.
If you go on Google Cloud and select an E2 instance type (at least in `us-central1`, where my company runs most of its infra) you'll usually get Broadwell chips.
> They will also likely charge a significant premium over decoupled vendors
It seems like they're trying to hit a middle ground between cloud vendors and fully decoupled server equipment companies.
Using Oxide is likely cheaper over the life of the hardware than using a cloud vendor. A company who already has in-house expertise on running racks of systems may be less the target market here than people who want to do cloud computing but under their own control.
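As a purely hypothetical illustration of that "cheaper over the life of the hardware" claim (every figure below is invented and is not Oxide's, AWS's, or anyone's actual pricing), the break-even math looks roughly like this:

    # Hypothetical owned-rack vs. rented-cloud break-even; all numbers invented.
    rack_price = 1_500_000         # one-time purchase
    rack_opex_per_month = 25_000   # power, space, support contract
    cloud_per_month = 90_000       # renting equivalent capacity

    months = 0
    while rack_price + rack_opex_per_month * months > cloud_per_month * months:
        months += 1
    print(f"owning breaks even after ~{months} months")  # ~24 months with these numbers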
> A company who already has in-house expertise on running racks of systems may be less the target market here than people who want to do cloud computing but under their own control.
True, but Oxide may find themselves competing against Dell or HP if those companies adopt Oxide's software for their respective servers. Additionally, Oxide may find itself competing against consultants and vendors in specialized verticals (e.g. core banking software + Oracle DB + COTS servers + Oxide software). Oxide and their competitors are going for people who used to buy racks of Sun hardware.
HP and Dell would have to fundamentally change the way they design hardware and software to be that kind of threat, and if that ever happens I think I would be pretty okay with that outcome.
> One of the most obvious examples of the problem with this approach is that they're shipping previous generation servers on Day 1. One can easily buy current generation AMD servers from a number of vendors.
> Their coupling approach will most likely leave them perpetually behind
This is a startup that took years to get their initial hardware developed. The time between this version and the version using the next version of AMD chips will be shorter than the time it took to develop this product. This is not an inherent issue with coupling vs decoupling.
Also, most servers are rarely running on the most recent cpus anyway. At least in companies I've worked at with on-site hardware they're usually years (sometimes even a decade) out of date getting the last life sucked out of them before too many internal users start complaining and they get replaced.
Coupling requires more integration work, including writing and testing custom firmware. Oxide will be a tiny market player for a long time, even if things go very well. Are AMD and Broadcom really going to spend as much time helping Oxide as they do helping Dell? Of course not, Oxide's order volume will be a rounding error.
I'm sure they'll improve their processes over time but the lag will probably always be a non-zero value. Hopefully they'll be able to keep it low enough that it's not an important factor but as a customer it's certainly something one should consider.
It would be surprising if they don't run into some nasty issue that leaves their customers 6+ months behind on servers or switches at some point.
From listening to their talks they've actually gotten pretty good direct responses from AMD and AMD likes them quite a bit. They've done what no other system integrator has done and brought up the CPU without using AMD's AGESA firmware bootloader. By simplifying the system they've reduced the workload on what they need to handle.
As to your second point, unless AMD somehow becomes supply constrained and only wants to ship to their most important customers first I don't see a future where there would be any lag. Again, the delay this time is from how long it took from company start until product release. Future delays will be based on the time it takes from them getting early development parts to released products, which they could even possibly beat Dell to market on given the smaller company size and IMO more skilled employees.
> It would be surprising if they don't run into some nasty issue that leaves their customers 6+ months behind on servers or switches at some point.
I mean they've already hit tons of nasty issues, for example finding two zero-day vulnerabilities in their chosen security processor. They've shown they can work around issues pretty well.
> it would be surprising if they don't run into some nasty issue that leaves their customers 6+ months behind on servers or switches at some point.
I just think your premise is wrong - most customers don't care about not having the absolute latest and greatest. Indeed they will often avoid them because
1. They are new, so more likely to have as-yet-undiscovered issues (hardware or drivers).
2. If you buy top end, they sell at a premium well above their performance premium.
i.e. the customers who are perennially chasing the latest hardware are in the minority.
Most customers care about having the best of the available options. Rarely would any company deliberately choose to be behind where their competitors can be.
1. The way to run into undiscovered issues is to choose a completely custom firmware/hardware/software stack that almost no one else in the world is running.
2. Not sure where you're getting this from. There is almost always a price:performance calculation (a toy sketch follows below) that results in the current generation smashing the previous generation with server and switch hardware. Often this means not buying the flagship chips but still the current generation.
And a major reason to get off old generations of hardware is that they become unavailable relatively quickly. It's always easier to buy current generation hardware than previous generation hardware, especially a couple years into the current generation. This has nothing to do with chasing the latest hardware.
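To illustrate the kind of price:performance calculation meant above, here is a toy comparison; all prices and scores are invented placeholders, not real SKUs or benchmark results:

    # Toy perf-per-dollar comparison with made-up numbers.
    servers = {
        "previous-gen box": {"price": 14_000, "perf": 100},
        "current-gen box":  {"price": 17_000, "perf": 180},
        "flagship box":     {"price": 35_000, "perf": 210},
    }
    for name, s in servers.items():
        print(name, round(s["perf"] / s["price"] * 1000, 1), "perf per $1k")
    # With numbers like these the current generation wins on perf-per-dollar,
    # while the flagship carries a premium well beyond its performance edge.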
> And a major reason to get off old generations of hardware is that they become unavailable relatively quickly.
That's not in the customers' interests per se - in fact it's a pain. Having control of their own stuff could mean they could offer a much longer effective operational life.
> The way to run into undiscovered issues is to choose a completely custom firmware/hardware/software stack that almost no one else in the world is running.
What breaks stuff is change - sure, when they are starting up it's higher risk - but again, if they can manage the lifecycle better and not have change for change's sake, then they could be much more reliable.
> Not sure where you're getting this from.
I was talking about not taking the flagship stuff - which is typically a few months ahead of the best price/performance stuff.
If they standardize and open the server shape and plug interface then it gets really cool. Then I could go design a GPU server myself and add it to their rack. The rack is no longer a hyperconverged single-user proprietary setup and becomes something that can be extended and repurposed.
I don't see it as a big deal - rather, I see it as a huge amount of venture cap spent on some very bright people to build something no one really wants, or, at best, is niche.
Also, it has little to do with the cloud; it is yet another hyperconverged infra.
Weirdly, it is attached to something very few people want: Solaris. This relates to the people behind it who still can't figure out why Linux won and Solaris didn't.
When you're deploying VMs, which is the use case here, the substrate OS becomes significantly less important. Those VMs will mostly just be linux.
Yes they are using illumos/Solaris to host this but they don't sell on that, they sell on the functionality of this layer — allowing people to deploy to owned infra in a way that is similar to how they'd deploy to AWS or Azure. How much do you ever think about the system hosting your VM on those clouds? You think about your VMs, the API or web interface to deploy and configure, but not the host OS. With Oxide racks the customers are not maintaining the illumos substrate (as long as Oxide is around).
You could be right about demand, there is risk in a venture like this. But presumably the team thought about this - I think folks who worked at Sun, Oracle, Joyent, and Samsung and made SmartOS probably developed a decent sense of market demand, enough to make a convincing case to their funders.
I have a feeling they knew exactly from the start who their customers would be: People who have the budget to care about things like trust and observability in a complex system. But these would also be the kind of customers who require absolute secrecy and so this why you don't hear about them even though they might have bankrolled a sizable portion of the operation. Just like the first Cray to officially be shipped was actually serial number 2...
> When you're deploying VMs, which is the use case here, the substrate OS becomes significantly less important. Those VMs will mostly just be linux.
Now you need to know both the OS they chose and the OS you chose...
(No, I don't believe it'll be 100% hands-off for the host. This is an early stage product, with a lot of custom parts, their own distributed block storage, hypervisor, and so on.)
This is true for other hypervisors too. Enterprises are still paying hundreds of millions to VMware, and who knows what's going on in there?
I wouldn't have picked OpenSolaris, but it's a lot better than other vendors that are either fully closed source, or thin proprietary wrappers over Linux with spotty coverage where you're not allowed to touch the underlying OS for risk of disrupting the managed product.
What's more important is that the team actually knows Illumos/Solaris inside out. You can work wonders with a less than ideal system. That said, Illumos is of high quality in my opinion.
Seems risky considering how small of a developer pool actively works on illumos/Solaris. The code is most definitely well engineered and correct, but there are huge teams all around the world deploying on huge pools of Linux compute that have contributed back to Linux.
They had a bug in the database they are using that was due to a Go system library not behaving correctly specifically on illumos. They've got enough engineering power to deal with such a thing but damn..
Linux grew up in the bedrooms of teenagers. It was risky in the era of 486 and Pentiums. The environment and business criticality of a $1-2M rack-size computer is quite different.
I had similar thoughts about VMware (large installations) back in the day. Weird proprietary OS to run other operating systems? Yet they turned out fine.
This appears to be a much better system than VMware, it is free software, and it builds upon a free software operating system with a lineage that predates Linux.
I say this in the most critical way possible, as someone who has built multiple Linux-based "cloud systems", and as a GNU/Linux distribution developer: I love it!
It was totally a risky choice for companies in the 1990s and early 2000s to put all their web stuff onto Linux on commodity hardware instead of proprietary Unix or Windows servers. Many did it when their website being up was totally mission critical. Lots did it on huge server farms. It paid off very quickly but it's erasing history to suggest that it didn't require huge amounts of guts, savvy and agility to even attempt it.
Indeed, for me GNU/Linux was always a cheap way to have UNIX at home, given that Windows NT POSIX support never was that great.
The first time I actually saw GNU/Linux powering something in production was in 2003, when I joined CERN and they were replacing their use of Solaris; eventually, alongside Fermilab, they came up with Scientific Linux in 2004.
Later at Nokia, it took them until 2006 to consider Red Hat Linux a serious alternative to their HP-UX infrastructure.
Completely tangential, but this reminds me of an interview I had for my first job out of college in 1995. I mentioned to the interviewer that I had some Linux experience. "Ah, Linux" he said. "A cool little toy that's gonna take over the world".
In hindsight of course it was remarkably prescient. This from a guy at a company that was built entirely around SGI at the time.
This is a skewed view - the critical piece that made Linux "enterprise-ish" was the memory management system contributed by IBM, part of the SCO lawsuit.
Back in the day... Sun Micro was a GOAT and pushed the envelope on Unix computing 20-30 years ago. Solaris was stable and high performing.
I don't run on-prem clusters or clouds but know a couple people who do and, at large enough scale, it is a constant "fuck-shit-stack on top of itself" (to quote Reggie Watts). There is almost always something wrong and some people upset about it.
The promise of a fully integrated system (compute HW, network HW, all firmware/drivers written by experts using Rust wherever possible) that pays attention to optimizing all your OpEx metrics is a big deal.
It may take Oxide a couple more years to really break into the market in a big way, but if they can stick it out, they will do very well.
It won't. In the same way that AWS customers aren't debugging hypervisor, or Dell customers aren't debugging the BIOS, or Samsung SSD customers aren't debugging the firmware. Products choose where to draw the line between customer-serviceable parts and those that require a support call. In this case, expect Oxide to fix it when something doesn't work right.
When Apple supports OS X for consumers, they don't exactly surface the fact that there's BSD semi-hidden in there somewhere.
That's because they own the whole stack, from CPU to GUI and support it as a unit. That's the benefit of having a product where a single owner builds and supports it as a whole.
My impression of Oxide is that that's the level of single source of truth they are bringing to enterprise in-house cloud. So, I strongly doubt the innards would ever become customer-facing (unless the customer specifically wants that, being open source after all).
Apple is a horrible example, with Apple when you have a problem, you often end up with an unfixable issue that Apple won't even acknowledge. You definitely don't want to taint Oxide's reputation with that association.
As for why I think Helios will become customer facing: Oxide is a small startup. They have limited resources. Their computers are expensive enough to be very much business critical. You'll get some support by Oxide logging in remotely to customer systems and digging around, but pretty soon the customer will want to do that themselves, to monitor and troubleshoot the problems as they happen.
Imagine you're observing a recurring but rare I/O slowdown that seems to trigger under some certain conditions, and tell me a competent sysadmin wouldn't want to log in on all the related boxes (client Helios, >=3 server Helioses for the block store) and look at the logs & stats.
> As for why I think Helios will become customer facing: Oxide is a small startup. They have limited resources.
Have you looked at the pedigree of many of the people behind the project? I don't say this because "these guys smart", but because these guys bent over backwards for their customers when they were Sun engineers. Bryan didn't write dtrace for nothing.
> Imagine you're observing a recurring but rare I/O slowdown that seems to trigger under some certain conditions, and tell me a competent sysadmin wouldn't want to log in on all the related boxes (client Helios, >=3 server Helioses for the block store) and look at the logs & stats.
I think you're simultaneously over-estimating and under-estimating the people who will deploy this. There's a lot of companies who would want a "cloud in a box" that would happily plug hardware in and submit a support ticket if they ever find an issue, because their system engineers either don't have the time, desire, or competence (unfortunately common) to do anything more. The ones who are happy to start debugging stuff on their own would have absolutely wonderful tooling at their fingertips (dtrace) and wouldn't have any issue figuring out how to adapt to something other than Linux (hell, I've been running TrueNAS for the better part of a decade and being on a *BSD has never bothered me).
Apple is a great example of the benefits of an integrated system where the hardware and software are designed together. There are tons of benefits to that.
What makes Apple evil (IMO, many people disagree) is how everything is secret and proprietary and welded shut. But that doesn't take away from the benefits of an integrated hardware/software ecosystem.
Oxide is open source so it doesn't suffer from the evil aspect but benefits from the goodness of engineered integration. Or so I hope.
In practice I don't think it's as good as in theory. I had Apple Macbook Pro with Apple Monitor, and 50% of the time when unplugging the monitor the laptop screen would stay off. Plugging back in to the monitor wouldn't work at that point so all I could do was hold the power button to force it off and reboot. That's with Apple controlling the entire stack - software, hardware, etc.
I think the real benefit is being able to move/deprecate/expand at will. For example, want an app that would require special hardware? You can just add it. Want to drop support for old drivers? Just stop selling them and then drop (deprecate) the software support in the next release.
I fully agree about the evilness, and it baffles me how few people do!
Android is potentially a better example. Compare Android to trying to get Linux working on <some random laptop>. You might get lucky and it works out of the box or you might find yourself in a 15 page "how to fix <finger print reader, ambient light sensor, etc>" wiki where you end up compiling a bunch of stuff with random patches.
Afaik Android phones tend to have a lot more hardware than your average laptop, too (cell modem, gps, multiple cameras, gyro, accelerometer, light sensors, finger print readers)
Apple is the survivor of 16-bit home micro integration. PC clones only happened because IBM failed to prevent Compaq's reverse engineering from taking over their creation; they even tried to get hold of it again afterwards via PS/2 and MCA.
As we see nowadays on tablets and laptops, most OEMs are quite keen on returning to those days, as otherwise there is hardly any money left in PC components.
Funny how your mentioning BSD got me thinking of the Sony PlayStation and Nintendo Switch, which are proprietary and not user serviceable. A Steam Deck, Fairphone, or Framework laptop is each less proprietary, more of a FOSS stack, and user serviceable. Which a user may or may not want to do themselves; at the very least they can pay someone and have them manage it.
Also, Apple is just the one who survived. Previously I'd have thought of SGI, DEC, Sun, HP, IBM, Dell, some of whom survived, some not.
Those three consumer products I mentioned each provide a platform for a user and business space to flourish and thrive. I expect a company doing something similar for cloud computing to want the same. But it will require some magick: momentum, money, trust. That kind of stuff, and loads of it. (With some big names behind it and a lot of FOSS they got me excited, but I don't matter.)
If you have a bug in how a lambda function is run on AWS, do you find yourself looking for the bug in firecracker? It is open source, so you technically could, but I just don't see many customers doing that. Same can be said about KNative on GCP.
Their choice in foundation OS (for lack of a better term) really should not matter to any customer.
Ok, but then that is purely additive, right? Like, "have to find someone with Illumos expertise to fix something that was never intended to be customer-facing" may not be easy, but is still easier than the impossibility of doing the same thing on AWS / Azure / Google Cloud.
Right, who wants or benefits from open source firmware anyway.
Also, there are many situations where renting, for example a flat, makes a lot of sense. And there are many situations where the financials and/or the options enabled by owning something make a lot of sense. Right now, the kind of experience you get with AWS and co. can only be rented, not bought. Some people want to buy houses instead of renting them.
Well, you can buy your own hardware and set it up with OpenStack and use it as a private cloud. Companies like Canonical or Redhat make a lot of money by providing software (mostly open source) to support exactly that use case.
> Well, you can buy your own hardware and set it up with OpenStack and use it as a private cloud. Companies like Canonical or Redhat make a lot of money by providing software (mostly open source) to support exactly that use case.
Sure you can, but then who will diagnose and fix your hardware/OS interaction problems when you have parts from five vendors in the mix?
If you haven't lived through this, the answer is: nobody. Everyone points fingers at the other 4 and ignore your calls.
Back in the day you could buy a fully integrated system (from CPU to hardware to OS) from Sun or SGI or HP and you had a single company to answer all the calls, so it was much better. Today you can't really get this level of integration and support anymore.
(Actually, you probably can from IBM, which is why they're still around. But I have no experience in the IBM universe.)
This is why Oxide is so exciting to me. I hope I can be in a company that becomes a customer at some point.
>Sure you can, but then who will diagnose and fix your hardware/OS interaction problems when you have parts from five vendors in the mix?
Dell is a single vendor that will diagnose and fix all of your hardware issues.
With Oxide you're locked into what looks like a Solaris derivative OS running on the metal and you're only allowed to provision VMs which is a huge disadvantage.
I run a fleet of over 30,000 nodes in three continents and the majority is Flatcar Linux running on bare metal. Also have a decent amount of RHEL running for specific apps. We can pick and choose our bare metal OS which is something you cannot do with Oxide. That's a tough pill to swallow.
> Dell is a single vendor that will diagnose and fix all of your hardware issues.
I've been a Dell customer at a previous company. I know for a fact that's not true.
I had a support ticket for a weird firmware bug open for two years, they could never figure it out. I left that job but for all I know the case is still open many years later.
Dell doesn't know how to fix things like that because they don't design and engineer the systems they sell. Dell is a reseller who puts components together from a bunch of vendors and it mostly works but when it doesn't, there's nobody on staff who can fix it.
I've been a Dell customer for decades at this point and I know for a fact it's true.
I've had support tickets open for all kinds of weird firmware, hardware, etc. bugs and they've been well resolved, even if it meant Dell just replaced the part with something comparable (NIC swap).
>Dell doesn't know how to fix things like that because they don't design and engineer the systems they sell.
Of course they do. That's like saying Oxide doesn't know how to fix stuff because they don't design the CPU, NVMe, DIMMs, etc. Oxide is still going to vendors for these things.
Ironically, Dell's total inability to resolve a pathological rash of uncorrectable memory errors is very much part of the origin story of Oxide: this issue was very important to my employer (who was a galactic Dell customer), and as the issue endured and Dell escalated internally, it became increasingly clear that there was in fact no one at Dell who could help us -- Dell did not understand how their own systems work.
At Oxide, we have been deliberate at every step, designing from first principles whenever possible. (We -- unlike essentially everyone else -- did not simply iterate from a reference design.)
To make this concrete with respect to the CPU in particular, we have done our own lowest-level platform enablement software[0] -- we have no BIOS. No one -- not the hyperscalers, not the ODMs and certainly not Dell -- has done this, and even AMD didn't think we could pull it off. Why did we do it this way? Because all along our lodestar was that problem that Dell was useless to us on -- we wanted to understand these systems from first principles, because we have felt that that is essential to deliver the product that we ourselves wanted to buy.
There are plenty of valid criticisms of Oxide -- but that we don't understand our system simply isn't one of them.
As a side question, what's the name of your custom firmware that is the replacement of the AGESA bootloader? I tried searching on the oxide github page but couldn't find anything that seemed to fit that description.
(The AGESA bootloader -- or ABL -- is in the AMD PSP.) In terms of our replacement for AGESA: the PSP boots to our first instruction, which is the pico host bootloader, phbl[0]. phbl then loads the actual operating system[1], which performs platform enablement as part of booting. (This is pretty involved, but to give you a flavor, see, e.g. initialization of the DXIO engine.[2])
Thanks, are the important oxide branches of illumos-gate repo (and any other cloned repos) defined anywhere? I definitely wouldn't have found that branch without you mentioning it here.
Interestingly enough, I also ran into something somewhat related with Dell that they were not able to resolve, so they ended up working in a replacement from another vendor.
Nonetheless, it is quite interesting what you've built, but as the end user I'm not quite convinced that it matters. Sure, you can claim it reduces attack vectors and such, but we'll still see Dells and IBMs in the most restricted and highest-security-postured sites in the world. Think DoD and such. coreboot/libreboot with a RoT will get me through compliance the same.
The software management plane y'all built is the headlining feature IMHO, not so much what happens behind the scenes that the vast majority of the time will not have a fatal catastrophic upstream effect.
>There are plenty of valid criticisms of Oxide -- but that we don't understand our system simply isn't one of them.
That's not what I said. There's a line in the sand that you must cross when it comes to understanding the true nature of the componentry that you're using. At the end of the day, your AMD CPUs may be lying to you, to all of us, but we just don't know it yet.
> Off by a few orders of magnitude. Dell on-site SLA with pre-purchased spares was about 6 hours.
You're talking about replacement parts. Yes Dell is good about that.
The discussion above is asking them to diagnose and fix a problem with the interaction of various hardware components (all of which come from third parties).
But they _are_ writing the firmware that runs most of them and need to understand those devices at a deep level in order to do that, unlike Dell. Dell slaps together hardware and firmware from other vendors with some high level software of their own on top. They don't do the low level firmware and thus don't understand the low level intricacies of their own systems.
No they're not unless I'm mistaken. They're not writing the firmware that runs on the NVMe drives, nor the NICs (they're not even writing the drivers for some of the NICs), etc.
I'm not speaking hypothetically. If you hit a "zero-day" bug that Dell has never seen it's going to take time. And somehow every large customer finds bugs that Dell certification didn't.
> And somehow every large customer finds bugs that Dell certification didn't.
It's a law of computer engineering.
In the Apollo 11 descent sequence the Rendezvous Radar experienced a hardware bug[0] not uncovered during simulation. They found it later, but until then, the solution was adding a "turn off Rendezvous Radar" checklist item.
[0] The Rendezvous Radar would stop the CPU, shuttle some data into areas where it could be read, and wake the CPU back up to process it. The bug caused it to spuriously do this dance just to say "no new data", which then caused other systems to overload.
It's ironic coming from a company whose CTO has harped about containers on bare metal for years. Maybe a large swath only need to deploy VMs, but the future will most definitely involve bare metal for many use cases, and oddly Oxide doesn't support that currently.
See the pattern? Dell only care about the big guys.
Set aside the childish tone ...
> Dell is a single vendor that will diagnose and fix all of your hardware issues.
There are two anecdotes here disagreeing with you, and frankly that's enough to say what you said above isn't true, not universally so. I doubt Oxide is targeting big deployments like yours, but more like theirs. Whether they will succeed is another matter, but they do have a valid sales pitch and the expertise to pull it off.
So OpenBMC is fine (happy for them!), but having open firmware is much deeper and broader than that: yes, it's the service processor (in contrast to the BMC which is a closed part on Dell machines) -- but it's also the root-of-trust and (especially) the host CPU itself. We at Oxide have open source software from first instruction out of the AMD PSP; I elaborated more on our approach in my OSFC 2022 talk.[0]
Dell uses trusted platform modules (TPMs). The TPM is a separate chip from the BMC.
For a mostly open source solution, not only would you need open source BMC firmware, you must have an open source UEFI/BIOS/boot firmware like CoreBoot, LinuxBoot, Oreboot, Uboot, etc.
The fact that it's not on linux is one of the great things about it. There is too much linux on critical infrastructure already and the monoculture just keeps on growing.
At least with Oxide there is a glimmer of hope for a better future in this regard.
They sell rack-as-compute.[0] Their minimum order is one rack: You plug in power and network, connect to the built-in management software (API), and start spinning up VMs.
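For a sense of what "start spinning up VMs" against a rack-level API means in practice, here is a purely hypothetical sketch; the endpoint paths, field names, and auth scheme are invented for illustration and are not Oxide's actual API:

    # Hypothetical: provision a VM via a rack's management API.
    # Endpoint, fields, and auth below are placeholders, not Oxide's real API.
    import requests

    RACK_API = "https://rack.example.internal/v1"
    TOKEN = "operator-issued-token"  # placeholder credential

    resp = requests.post(
        f"{RACK_API}/projects/demo/instances",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"name": "web-01", "ncpus": 4, "memory_gib": 16, "boot_image": "debian-12"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # instance metadata: id, state, addresses, ...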
It would be interesting to sell a data center in a container. Cooling, power supply, compute, storage, and network, all in a box. You supply power, a big network pipe, and the piping to external heat exchangers.
IIJ has a project like this, data center in a container, just add power. They build it all up in Japan, ship it to rural areas across the world to basically jumpstart a local data center (I imagine mostly for industrial sites). They had a fun project where they had a half rack, powered by solar and connected to the net via Starlink.
There are whitebox Windows laptops, OpenWRT routers, Arduino boards, ArduPilot drones, etc. It almost sounds strange that there are no prepopulated 12U racks intended for OpenStack (is that still a thing?).
No idea how they do things today, but v0.1 of Azure (before it was called Azure) was a bunch of containers in a field. I remember seeing an aerial photo at the time.
Yeah including networking and storage together with virtualization is what makes hyperconverged infrastructure hyperconverged. Otherwise it's usually just called converged infrastructure.
It's not a new concept, it's a new product. Ideas do not become uninteresting the moment the first person instantiates the idea. Further iterations on a possibly-good idea are also interesting.
Obviously different marketing copy speaks to different people. But that is referring to how, when you buy a rack from us, you don't need to put everything together and cable it all up: you pull it out of the box, plug in networking and power, boot the thing up, and you're good to go. Installation time is hours, not days or weeks, which is the norm.
"no cables no assembly just cloud" is completely misleading to any kind of people - tech or marketing or not.
When people hear cloud, it means that aspects such as electricity costs, electricity stability, Internet, bandwidth, fire protection, safety, etc etc are abstracted away.
Oxide IS on-premise, right? The website is very vague and wishy-washy.
It is on premises. You interact with the rack the same way you interact with the public cloud: as a pool of resources. The specifics are abstracted away. “Private cloud” is pretty well established terminology in this space, and that’s what we’re doing.
At this stage of the company, everyone gets a white-glove installation process. I suspect that will change over time but I don't work on that part of things, so I don't personally know the details.
Sorry to be slightly obtuse, which details are you referring to here? Help upon installation? At the moment, we are helping customers individually, yeah. But we do have a documented process we are following https://docs.oxide.computer/guides/system/rack-installation-... (and more on other pages there)
Ah yeah, so the "facilities" section of https://oxide.computer/product/specifications has some of these things, probably the closest we have to publicly publishing that in a general sense.
Yes I understand, but will your included service actually verify that everything is set up correctly, meets advertised parameters, and sign off on it? (Such that the customer can start using it immediately afterwards.)
Or does the customer need to take on some risk and hazard associated with installation, configuration, initial boot up, etc.?
e.g. If someone buys with the intention of using it up to X FLOPS, and the machine only delivers Y FLOPS once it's all said and done, what happens?
It’s not the area of the company I personally work on, so I don’t know those details, to be honest. We certainly make sure that everything is working properly.
I mean, we absolutely sell support. I just don't know anything about the details personally. You shouldn't take my lack of knowledge as a "no," just a "steve doesn't personally know."
To be super clear about it, this is referring to not needing to cable up all of the individual sleds to the rack upon installation. It doesn't mean that we recommend connecting a rack of compute to your data center via wifi.
This is pretty big, as someone who has deployed servers to datacenters before. Remote hands are very good at plugging in the network uplink and the PDUs. Doing a complete leaf-spine 25GbE network with full redundancy is something they are pretty much guaranteed to screw up at some point.
I wouldn't be dismissive of people telling you that the product description can be improved. My opinion is that the description of the product in this thread will outperform your site 10 to 1.
I'll try to explain, not in the spirit of being argumentative, but with the hope of being useful.
The comment you replied to was not questioning the value of integrated cabling. It was pointing out that the product description on the site does not make sense.
"Cloud computer" sounds like a server you rent from AWS. It's kind of like calling Rust "cloud compiler."
If you choose to use words that your audience doesn't understand, or even worse understands to mean the opposite of what you want them to mean, it's a good idea to explain these words immediately using conventional words with conventional meaning. The comments by throw0101a did that.
The product seems really cool, but there is no way I would've understood what it was from the website.
I understand that's what you're saying, and I understand what the parent is saying. I chose to explain what that alluded to, in case anyone in this conversation is also finding it hard to understand what is meant by that specific copy. That doesn't mean I don't understand the broader point, or that I think the website copy is perfect.
Perhaps if you don't understand what the copy means, then that is a sign that you are not the target audience, rather than that the copy is bad? From what I've gathered from reading other comments in this thread, that copy will make perfect sense to Oxide's target audience, as it uses words in a way that will be very familiar and make perfect sense to the kind of person who might make a purchasing decision for a system like this.
And for what it's worth, I don't think you need to explain what's happening to Steve, it seems to me that he understands perfectly well. To me you come across as being rather condescending and in my opinion Steve is being commendably polite in response.
"Real" mainframes have RAS (Reliability, Availability, Servicing) features such as hotswapping for all hardware components and automated HA/workload migration across physical racks. They can also do SSI (single system image), i.e. run a single workload across physical nodes/racks as if it was just multiple 'cores' in a single shared-memory computer. Oxide computers will probably end up doing at least some of this (namely workload migration across racks for HA) but saying that it can comprehensively replace mainframe hardware as-is is a bit of a stretch. In terms of existing hardware it's closer to a midrange computer.
The Oxide and Friends podcast had an episode on virtualizing time, specifically for the purpose of live-migrating a VM from rack to rack without the guest being aware, and allowing operators to take the rack offline on their schedule. Otherwise, apparently, you end up having to leave racks running because you cannot evacuate all of the VMs currently running on them. (e.g. perhaps your contracts or SLAs are such that you cannot afford even the few seconds of downtime a shut-down-here-and-spin-up-elsewhere would cause)
I believe the episode name was "Virtualizing Time"
The first iMac famously made it easy to connect to the Internet; The 'i' in iMac was for "Internet". Its setup manual was a couple of pages long, mostly pictures and IIRC, just 37 words.
Existing vendors will provide rack integration services and deliver a turn-key solution like this. Also, vendors of virtualization management software have partnerships with hardware suppliers and are happy to deliver fully integrated solutions if you're buying by the rack. The difference is that in those cases you have flexibility in the design, which seems to be missing here.
Proxmox and a full rack of Supermicro gear would not be as sophisticated, but end result is pretty much the same, with I imagine far far better bang for buck.
I like it, but it doesn't seem like a big deal or revolutionary in any way.
Those of us who've bought large "turn-key" solutions from Dell etc. have often discovered that it's actually just a cobbled-together bunch of things which may or may not work well together on a good day, depending on what you're trying to do. Just because it's all got the word "Dell" written on it, doesn't mean that the components were all engineered by people who were working together to build a single working system.
Total agreement. Another point: Having the "Dell" name on the front doesn't give you a "throat to choke" as so many people seem to think is important. Unless you're very large scale then, at best, you can threaten them that they don't get your next business. You're certainly not going to get help.
You're no worse off with Oxide from that perspective. Their open source firmware means that the opportunity to pay somebody else to support you at least exists.
Even small shops can use bad experience as leverage for credits and discounts, especially if the vendor has account managers. This is one of the (few) benefits of having a human involved in invoicing vs. self-serve.
Same is true of Oxide, it'll be up to actual experience to see how well it works. Oxide seems to have written their own distributed block storage system (https://github.com/oxidecomputer/crucible), have their own firmware, kernel and hypervisor forks, etc -- when any of that breaks, good luck!
The premise is that you don't need luck, you can call Oxide. As you said, they wrote all of it, so they own all the interaction so they can diagnose all of it.
When I call Dell with a problem between my OS filesystem and the bus and the hardware RAID, there's at least three vendors involved there so Dell doesn't actually employ anyone that knows all of it so they can't fix it.
Sure, Oxide now needs to deliver on that support promise but at least they are uniquely positioned to be able to do it.
> That's the same premise as with all "turn-key" solutions. If it didn't come with software support, it wasn't really turn-key.
Just about any company will sell your company a support contract.
The more interesting question is, can they back it up with action when push comes to shove? I suspect most people have plenty of stories of opening support tickets with big name vendors that never get resolved. And through the grapevine you find out that they won't fix it because they can't fix it. They might not even have access to the source code or anyone on staff who has a clue about it because it came from who knows where. Sales is happy to sell you the support contract but it doesn't mean your problems can be fixed. BTDT.
From listening to the Oxide podcasts, my impression is that Oxide actually can technically fix anything in the stack they sell, which would make them vastly different from Dell et al.
Skill-wise, yes for sure (except perhaps for storage -- I haven't heard them talk about that much). Bandwidth-wise, though?
I used to work for a company targeting Fortune 500s. At that level of spend, when a client had a problem, somebody got on a plane. Only a fraction of those problems escalated all the way to R&D, which is where Oxide's skills are. That's where VMware etc. are hard to beat.
The premise is that the bandwidth needed will be orders of magnitude less, because the engineering will be orders of magnitude better. The opportunity makes sense as we've long been climbing up the local maximum peak of enterprise sales driven tech behemoths built on a cobbled together mix of open source and proprietary pieces held together with bubblegum.
Can an engineering first approach break into the cloud market? Hard to say as enterprise sales is very powerful, and the numerous "worse is better" forces always loom large in these endeavours. That said, enterprise sales driven companies are fat, slow and complacent. Oxide is lean and driven, and a handful of killer use cases and success stories is probably enough to sustain them and could be the thin end of the wedge on long-term success. We can hope anyway.
> Proxmox and a full rack of Supermicro gear would not be as sophisticated, but end result is pretty much the same, with I imagine far far better bang for buck.
I think the question is how well they can do the management plane. Dealing with the "quirks" of a bunch of grey box supermicro stuff is always painful in one way or another. The drop shipped, pre-cabled cab setups are definitely nice but that's only a part of what Oxide is doing here. No cables and their own integrated switching sounds nice too (stuff from the big vendors like UCS is closer to this ballpark but also probably closer to the cost too).
I suspect cooling and rack density could be better in the Oxide solution too, not having to conform to the standards might afford them some possibilities (although that's just a guess, and even if they do improve there these may not be the bottlenecks for many).
> Existing vendors will provide rack integration services and deliver a turn key solution like this.
My experience with the likes of Dell is that they'll deliver it but they won't support it.
Sure, there's a support contract. And they try. But while they sell a box that says Dell, the innards are a hodgepodge of stuff from other places. So when certain firmware doesn't work with something else, they actually can't help because they don't own it, they're just a reseller.
AWS Outposts has been in the market for a long time. I am sure there are differences, but to say existing cloud vendors were blind to on-prem requirements is a stretch.
Also future datacenter builds are going to be focusing on specific applications which means specific builds. I think Nvidia has a much better chance here with their superpod than Oxide. The target use case is pretty unclear.
On-prem buyers are doing cost reduction and cost reduction targets things like, as one example, the crazy cost of GPU servers on the CSPs. Your run of the mill stuff is very hard to cost reduce.
You can see their sort of lack of getting it by using Tofino2 as their switch. That’s just a very bad choice that was almost certainly chosen for bad reasons.
You don't build a new greenfield compute pod because you want to, you do it because it makes sense. Making sense is about cost and non-cost needs like data gravity and regulatory issues.
The cost case only works for GPU heavy workloads which this isn’t - wrong chassis, wrong network, etc.
Tofino2 is the wrong choice because even when they made that choice it would have been clear that it’s doa. Intel networking has not been a success center in, well, ever. That’s a selection that could only have been made for nerd reasons and not sensible business goals alignment or risk mitigation.
When you make an integrated solution you’d better be the best or close to the best at everything. This does not seem to be the best at anything. I will grant that it is elegant and largely nicer than the hyper converged story from other vendors but in practical terms this is the 2000s era rack scale VxBlock from Cisco or whatever Dell or HPE package today. Marginally better blade server is not a business.
They also make a big deal and have focused on things no one who actually builds data center pods cares about.
I actually hope they get bought by Dell or HPE or SuperMicro. Those companies could fix what’s wrong here and benefit a lot from the attention to detail and elegance on display.
>They sell servers, but as a finished product. Not as a cobbled together mess of third party stuff where the vendor keeps shrugging if there is an integration >problem. They integrated it.
I’m actually extremely impressed. I want one. I haven’t worked in a data center in years, but I’d be tempted to do it again just to get my hands on one.
I wish they’d sell a tabletop version for hobbyists, but realize this is probably a distraction. But… the problem with a lot of these systems (including the old Sun boxes and things like ibm mainframes and the AS/400) is that they sound cool but there’s no real way for the typical new developer to “get into them” for fun and, as a result, you lose the chance for some developer selling it to their company based on his experience with the things.
Apparently (I don't remember it, although I probably did read the Byte magazine at the time) there was a rumor in the early 1980s that IBM's PC was going to be a shrunken 370, called the 380. [1][2]
I wish IBM would shrink their LinuxONE Rockhopper 4 Rack Mount down to at least an "under the desktop" model. To my knowledge, IBM still makes quality products and has excellent customer service. They have fun names too (Rockhopper and Emperor are types of penguins!) and they even have 3D models of their rack mount cloud computers with shadows. [3] In fact, when I first read about Oxide a year or two ago, I searched for "IBM cloud server", and left it at that. So IBM, could you please send someone from the LinuxONE down to Boca Raton to create our new PC? :) Thanks!
I own a Turing Pi 2 but the hardware it is running on is proprietary. The switch isn't managed. The management software is very archaic. Yes, it is modular and stackable and probably thousands of times more hobbyist friendly than Oxide, but so is edge computing in general.
For example, this form factor looks really nice for a “hobbyist edition” or “evaluation edition”: https://zimacube.zimaboard.com/. I would probably buy an Oxide rack like this as soon as pre-orders were announced.
They won't even tell you how much a rack will cost. Infuriating typically B2B "talk to Sales so we can decide exactly how much we can get out of you and segment the market on the fly" approach persists even here, it seems.
I wouldn’t expect anything else for a full rack in this segment: it’s going to be tens or hundreds of thousands of dollars, and big enough that there will be some inevitable negotiation about prices.
That doesn't mean you can't have a thin spec builder and a pricing page, even if what that mostly gets used for is devs putting together a comparison of that to a cloud deployment or similar and taking that to the procurement department to argue it's worth opening the conversation.
Same here. I really want to work on one of these. I got into the industry at the tail end of the time when people used Sun and DEC gear. I got to use just a little bit of it and it seemed so much more "put together" than PC stuff is even now.
Oxide feels like it'll be that "integrated" experience, but with the added benefit of software freedom.
I would assume so. They've said before you can make modifications to the firmware and deploy it yourself if you so wish. That's one of the major reasons that making the firmware open source is so useful.
While working in telecom data centers circa 2016 I saw many single-rack computers from Dell, IBM, HP, Huawei... Not sure this is a new idea, except for the open source bits.
You own this. AWS Outpost is leased and you still also pay for the resource usage on top of the outpost unit itself. And this would not be integrated with your AWS account.
It's mostly not true that you still pay for resource usage on top of the Outpost unit. That's only true, AFAIK, for EBS local snapshots and Route 53 Resolver endpoints. The big boys--EC2 instances, S3 storage, and EBS volumes--are all "free" on Outposts. That is, included in the cost of the unit and not double-charged.
Charging for EBS local snapshots on your own Outpost S3 storage and Route 53 Resolvers on your own compute is a weird one. I don't know how they defend that. To me, it seems indefensible.
> You can purchase Outposts servers capacity for a three-year term and choose between three payment options: All Upfront, Partial Upfront, and No Upfront. … At the end of your Outposts servers term, you can either renew your subscription and keep your Outposts server(s), or return your Outposts server(s). If you do not notify AWS of your selection before the end of your term, your Outposts server(s) will be renewed on a monthly basis, at the rate of the No Upfront payment option corresponding to your Outposts server configuration.
> You can purchase Outposts rack capacity for a 3-year term … either renew your subscription and keep your existing Outposts rack(s), or return your Outposts rack(s)
Route 53 on outposts doesn’t charge for your own compute. The outpost resolver is free. The oupost resolver endpoint must be backed by an in-region resolver endpoint, which doesn’t become free just because you own an outpost. What you’re paying for is the ability to sustain high query volume from instances in your VPC to an on-prem DNS server.
This is a turn-key solution, ready to use, without eventually having to deal with multiple devices, each with its own firmware and caveats that only reveal themselves after they're put to work together.
The closest to that is the AWS managed rack that works with the web APIs you already know.
> Ironically this looks like the realization of Richard Stallman's dream where users can help each other if something doesn't work.
that's only true if you think that "users" means "people who operate cloud computers", which is about as far from understanding what Stallman is talking about as is possible. Someone who makes SaaS and runs it on an Oxide computer is no less of a rentier capitalist than someone who makes SaaS and runs it on AWS.
I know you are half joking, but it would really be helpful to have ball-park pricing available. Are we talking Sun-level markups here, or how should we think about it? Given the enterprise sales contact form, I'm thinking yes, but I'd love to know for sure.
I am not in sales and so I hesitate to speak on it in case I am incorrect, but the way that I personally think about it is that it is true that it is not an inexpensive product: there's a LOT of computer here. But the goal is to be competitively priced.
The last time (one of them) Oxide hit HN there were some ballpark estimates based on the CPUs in use, switches etc. Someone else said 500K and up.
I wish there was a 4U version (10-25K, but I don't think that they could come close to that price point - regardless, even that is out of reach for me to ever get to play on one :-/).
It's meant for orgs running at a certain scale, but you'd be surprised how early that starts making sense. AWS isn't exactly passing the economy-of-scale savings on to you.
True, but companies operating at scale not only operate on AWS but on other cloud providers as well as legacy data centers… business-wise it's a hard sell. It may sell for a couple of cycles to build a new DC or use existing ones, but then it will be back to the cloud again for many more cycles.
I've found that the scale these start making sense is only a handful of racks. We're not talking full DCs, but a room in a building or some colo space.
And beyond that, there's all sorts of weird environments that need a lot of local compute. You'd be shocked how many servers are on a cruise ship for one example among many.