Prefacing this: it’s just my experience, and I’m sure everyone who’s gone down this rabbit hole has had a different one.
I usually trawl eBay for a little bit every day when I’m looking for something specific, and start making offers once I’ve figured out exactly what I want. That means negotiating with the seller and asking questions about things listed as “parts only”. Through this I have built up a small but very useful set of contacts who liquidate surplus for a lot of large/medium tech companies, so shoot me a message (see profile) if you have some specific hardware in mind.
If you’re doing 100G Ethernet, my advice, biased by personal experience, is to buy “old” Mellanox gear: stuff that dates from before the Nvidia acquisition but was still supported by Nvidia after it.
With NICs I’d avoid ConnectX-4, mostly due to its age, but ConnectX-5 and ConnectX-6 are great. There are a lot of models though, so make sure you pick one that meets your needs, and pay particular attention to the form factor - FHHL, HHHL, OCP 3, etc. Vendor OEM parts can often be cheaper - there’s a glut of HPE cables/transceivers, cards and switches out there that are just Mellanox gear, plus Supermicro AIOM cards that are fully OCP compliant. I’ve not had any issue mixing and matching this stuff.
For instance, I have a Gigabyte server with a couple of Bergamo CPUs plugged into an HPE SN2700 switch with FS cables, Juniper transceivers and a Supermicro CX-6 OCP 100G card, plus a non-OEM dual-port InfiniBand CX-6 plugged into a QM8790 switch that looks like it’s been through a rock tumbler and has a few ports that I’ve reattached with a pretty poor soldering job. It works flawlessly - literally the only issues I’ve had are losing the BMC password I set to the ether, and some temporary jankiness with the 2700 after I accidentally force-pushed the Mellanox update instead of the HPE update. Even then, I was able to update the firmware without going into the datacenter.
I’ve had too many issues personally with Broadcom and their drivers to want to use them, but they are excellent if you put in the effort.
I’ve never used the Intel 100G cards; I was put off trying them by issues I had with their 10G stuff, though I think they are fully supported in the mainline kernel now.
The SN2100 can be found pretty cheap. The SN2700 is my fave, and they are actually pretty easy to repair (as long as the ASIC board isn’t the issue!). Sometimes you’ll find prototype networking equipment for some reason, but that stuff tends to work too. I first installed Debian on the SN2700 after coming across an excellent article[0]; since then I have also set up Arch, and it’s currently running a seamless NixOS install (really, the necessary kernel config was just enabling switchdev and the MLX options, like in the article). It’s basically just a 32-port NIC implemented mostly as an ASIC, with a small dual-core Celeron server as a management peripheral ;) Just make sure you grab that RJ45 serial adapter! IIRC the Juniper-compatible ones work flawlessly with this.
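If you want to sanity-check a kernel before booting it on the switch, something like this rough Python sketch would do. The option list is my assumption of what “switchdev and the MLX options” boils down to (the mainline switchdev symbol plus the mlxsw core/PCI/Spectrum driver symbols), and the function names are made up for illustration, so check the article for the exact set it enables.

```python
import gzip
import re
import sys

# Assumed option list: switchdev plus the mainline mlxsw (Spectrum ASIC) driver.
# Not taken from the article; adjust to match whatever it actually enables.
REQUIRED_OPTIONS = (
    "CONFIG_NET_SWITCHDEV",
    "CONFIG_MLXSW_CORE",
    "CONFIG_MLXSW_PCI",
    "CONFIG_MLXSW_SPECTRUM",
)

_OPTION_LINE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_kernel_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line in a kernel config."""
    options = {}
    for raw in text.splitlines():
        match = _OPTION_LINE.match(raw.strip())
        if match:  # '# CONFIG_X is not set' lines are comments and never match
            options[match.group(1)] = match.group(2)
    return options


def missing_options(text, required=REQUIRED_OPTIONS):
    """List required options that are neither built in ('y') nor modular ('m')."""
    options = parse_kernel_config(text)
    return [name for name in required if options.get(name) not in ("y", "m")]


if __name__ == "__main__":
    # /proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC;
    # otherwise pass a plain-text config file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/config.gz"
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        missing = missing_options(handle.read())
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with no arguments to check the running kernel, or point it at a config file from your build tree. It only tells you whether the symbols are set, not whether the driver actually binds to the ASIC.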
[0] https://ipng.ch/s/articles/2023/11/11/mellanox-sn2700.html