The ruby::box thing looks pretty interesting; from a cursory glance, you can run two simultaneous versions of something like a feature or rollout much more conveniently.
Also being able to do
if condition1
  && condition2
  ...
end
on multiple lines rather than one - this is pretty nifty too!
I'm kinda hoping that eventually each ractor will run in its own ruby::box and that each box will get garbage collected individually, so that you could have separate GCs per ractor, BEAM-style. That would allow them to truly run in parallel. One benefit should be cutting down p99 latency, since far fewer requests would be interrupted by garbage collection.
I'm not actually in need of this feature at the moment, but it would be cool and I think it fits very well with the idea of ractors as being completely separated from each other. The downside is of course that sharing objects between ractors would get slower as you'd need to copy the objects instead of just sharing the pointer, but I bet that for most applications that would be negligible. We could even make it so that on ractor creation you have to pass in a box for it to live in, with the default being either a new box or the box of the parent ractor.
They already truly run in parallel in Ruby 4.0. The overwhelming majority of contention points have been removed in the last year.
Ruby::Box wouldn't help reduce contention further; if anything it makes things worse, because with Ruby::Box classes and modules have an extra indirection to go through.
The one remaining contention point is indeed garbage collection. There is a plan for Ractor local GC, but it wasn't sufficiently ready for Ruby 4.0.
I know they run truly parallel when they're doing work, but GC still stops the world, right?
Regarding the extra indirection for classes and modules in the second paragraph: I don't understand why that would be necessary. Can't you just have completely separate boxes with their own copies of all classes etc., or does that use too much memory? (Maybe some COW scheme might work; doodling project for the holidays acquired, haha)
Anyway, very cool work and I hope it keeps improving! Thanks for 4.0 byroot!
Yes, Ractor local GC is the one feature that didn't make it into 4.0.
> Can't you just have completely separate boxes with their own copies of all classes etc, or does that use too much memory?
Ruby::Box is kinda complicated and still needs a lot of work, so it's unclear what the final implementation will look like. Right now there is no CoW or any type of sharing for most classes, except for core classes.
Core classes are the same object (pointer) across all boxes; however, they have a separate constant and method table for each box.
But overall what I meant to say is that Box wouldn't make GC any easier for Ractors.
In languages where placement doesn't matter, like C/JS, I prefer leading booleans. It makes it much easier to see the logic, especially with layers of booleans.
Personally, && on the new line seems much more readable. Can't wait to use some smart cop to convert all the existing multiline ifs in my codebase.
For folks who scan code bases based on the front of lines, it makes it easier to grok. Also helps with deleting and inserting lines (similar to leading or trailing commas in lists).
It's funny: I have been writing conditions this way in languages where one can, like Python (if you use a pair of parentheses), and linters have yelled at me for ages to put the binary operator on the previous line. People herald these quite subjective things as truths just because there is some tool they can delegate responsibility to.
There are others as well, but Nvidia is aggressive when it comes to punishing companies willing to buy non-Nvidia products. As a result, they prefer to remain under the radar, at least until they have enough market leverage to be more widely known.
I imagine two big giants, basically: (Nvidia/Google/AMD, influenced by a select few people) vs. (Chinese companies with investments from the government).
It's sort of like proxy wars, and that's sort of what's happening on the software side of things with open-source models, but I think the benefit of these proxy wars is going to go to the end consumers.
On the other hand, having two large countries compete with each other while buying up everything else feels like it astronomically increases the price for anyone else (any other country, perhaps) who wants to compete with these two giants.
We definitely need a better system, one where it doesn't feel like we're watching Pac-Man eat everything up.
Is it not? All this money is going into AI under the fear that China will win the race to AGI. China releases open-source models that keep OpenAI/Anthropic researching and training their models, which in turn creates demand for more Nvidia GPUs.
I'm working on MedAngle, the world's first agentic AI Super App for current and future doctors. Invite only, 100k+ users, 150m+ questions solved, tens of billions of seconds spent studying smarter.
Cerebras has been a true revelation when it comes to inference. I have a lot of respect for their founder, team, innovation, and technology. The colossal size of the WSE-3 chip, with on-chip memory at a mind-boggling scale: it's definitely ultra cool stuff.
I also wonder why they have not been acquired yet. Or is it intentional?
I will say, their pricing and deployment strategy is a bit murky and unclear. Paying $1500-$10,000 per month plus usage costs? I'm assuming that it has to do with chasing and optimizing for higher value contracts and deeper-pocketed customers, hence the minimum monthly spend that they require.
I'm not claiming to be an expert, but as a CEO/CTO, there were other providers in the market that had relatively comparable inference speed (obviously Cerebras is #1), easier onboarding, and better responsiveness from the people who work there (all of my experience with Cerebras has been days/weeks late or simply ignored). IMHO, if Cerebras wants to gain more mindshare, they'll have to look into this aspect.
> I also wonder why they have not been acquired yet. Or is it intentional?
A few issues:
1. To achieve high speeds, they put everything on SRAM. I estimated that they needed over $100m of chips just to do Qwen 3 at max context size. You can run the same model with max context size on $1m of Blackwell chips but at a slower speed. Anandtech had an article saying that Cerebras was selling a single chip for around $2-3m. https://news.ycombinator.com/item?id=44658198
2. SRAM has virtually stopped scaling in new nodes. Therefore, new generations of wafer scale chips won’t gain as much as traditional GPUs.
3. Cerebras was designed in the pre-ChatGPT era where much smaller models were being trained. It is practically useless for training in 2025 because of how big LLMs have gotten. It can only do inference but see above 2 problems.
4. To inference very large LLMs economically, Cerebras would need to use external HBM. If it has to reach outside for memory, the benefits of a wafer scale chip greatly diminish. Remember that the whole idea was to put the entire AI model inside the wafer so memory bandwidth is ultra fast.
5. Chip interconnect technology might make wafer scale chips redundant. TSMC has a roadmap for gluing more than 2 GPU dies together. Nvidia's Feynman GPUs might have 4 dies glued together. I.e., the sweet spot for large chips might not be wafer scale but perhaps 2, 4, or 8 GPU dies together.
6. Nvidia seems to be moving much faster in terms of development and responding to market needs. For example, Blackwell is focused on FP4 inferencing now. I suppose the nature of designing and building a wafer scale chip is more complex than a GPU. Cerebras also needs to wait for new nodes to fully mature so that yields can be higher.
There exists a niche where some applications might need super fast token generation regardless of price. Hedge funds and Wall Street might be good use cases. But it won't challenge Nvidia in training or large scale inference.
> I estimated that they needed over $100m of chips just to do Qwen 3 at max context size
I will point out (again :)) that this math is completely wrong. There is no need (nor performance gain) to store the entire weights of the model in SRAM. You simply store n transformer blocks on-chip and then stream block l+n from external memory to on-chip when you start computing block l; this completely masks the communication time behind the compute time, and specifically does not require you to buy 100M$ worth of SRAM. This is standard stuff that is done routinely in many scenarios, e.g. FSDP.
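To make the pattern concrete, here is a rough Python sketch of that kind of prefetching. The `fetch` and `layer` callables are placeholders of my own, not anything from Cerebras' actual stack; the point is only that the copy for a future layer is in flight while the current layer computes.

    from concurrent.futures import ThreadPoolExecutor

    def run_layers(x, layers, fetch, n_ahead=2):
        # layers: list of callables layer(x, weights); fetch(i) copies layer i's
        # weights from external memory into on-chip SRAM and returns them.
        # While layer l is computing, layer l + n_ahead is already being copied,
        # so the transfer hides behind the compute whenever
        # compute_time >= transfer_time.
        with ThreadPoolExecutor(max_workers=1) as io:
            pending = [io.submit(fetch, i) for i in range(min(n_ahead, len(layers)))]
            for l, layer in enumerate(layers):
                weights = pending.pop(0).result()  # already resident if prefetch kept up
                if l + n_ahead < len(layers):
                    pending.append(io.submit(fetch, l + n_ahead))
                x = layer(x, weights)
        return x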
That blog is about training. For inference, the weights and kv cache are in SRAM. Having said that, the $100M number is inaccurate/meaningless. It's a niche product that doesn't have economies of scale yet.
The blog is about training but the technique applies equally well to inference, just like FSDP and kv cache sharing are routinely done in inference on GPUs.
There is just no need to have parameters or kv cache for layer 48 in SRAM when you are currently computing layer 3, you have all the time in the world to move that to SRAM when you get to layer 45 or whatever the maths work out to be for your specific model.
I did experiments with this on a traditional consumer GPU, and the larger the discrepancy between model size and VRAM, the faster performance dropped off (exponentially), approaching what you'd get with no VRAM at all (just streaming over PCIe). This technique is well known and works when you have more than enough bandwidth.
However, the whole point is that even HBM is a problem because the available bandwidth is insufficient, so if you're marrying SRAM and HBM I would expect the performance gains to be overall modest for models that exceed available SRAM in a meaningful way.
This is highly dependent on exact model size, architecture, and hardware configuration. If the compute time for some unit of work is larger than the time it takes to transfer the next batch of params, you are good to go. If you fetch sequentially, then yes, you will pay a heavy price, but the idea is to fetch a future layer, not the one you need right away.
As a similar example, I have trained video models on ~1000 H100s where the vast majority of parameters are sharded and so need to be fetched over the network before being available in HBM, which is a similar imbalance to the HBM vs SRAM story. We were able to fully mask comms time such that not sharding (if it was even possible) would offer no performance advantage.
> but did not dispute that they store the entire model on SRAM.
No idea what they did or did not do for that specific test (which was about delivering 1800 tokens/sec though, not simply running qwen-3) since they didn't provide any detail. I don't think there is any point storing everything in SRAM, even if you do happen to have 100M$ worth of chips lying around in a test cluster at the office, since WSE-3 is designed from the ground up for data parallelism (see [1] section 3.2) and inference is sequential both within a single token generation (you need to go through layer 1 before you can go through layer 2 etc.) and between tokens (autoregressive, so token 1 before token 2). This means most of your weights loaded in SRAM would be just sitting unused most of the time, and when they need to be used they need to be broadcasted to all chips from the SRAM of the chip that has the particular layer you care about, this is extremely fast, but external memory is certainly fast enough to do this if you fetch the layer in advance. So the way to get the best ROI on such a system would be to pack the biggest batch size you can (so many users' queries) and process them all in parallel, streaming the weights as needed. The more your SRAM is occupied by batch activations and not parameters, the better the compute density and thus $/flops.
You can check the Cerebras docs to see how weight streaming works [2]. From the start, one of the selling points of Cerebras has been the possibility to scale memory independently of compute, and they have developed an entire system specifically for weight streaming from that decoupled memory. Their docs seem to keep things fairly simple, assuming you can only fit one layer in SRAM and thus fetching things sequentially, but if you can store at least 2 layers in those 44GB of SRAM then you can simply fetch l+1 when l is starting to compute, completely masking the latency cost. It's possible they already mask the latency even within a single layer by streaming by tiles for the matmul, though; it's unclear from their docs. They mention that in passing in [3] section 6.3.
All of their docs are for training since, for the inference play, it seems they have pivoted to selling API access rather than chips, but inference is really the same thing, just without the backprop (especially in their case, where they aren't doing pipeline parallelism, where you could claim doing fwd+backprop gives you better compute density). At the end of the day, whether you are doing training or inference, all you care about is that your cores have the data they need in their registers at the moment they are free to compute, so streaming to SRAM works the same way in both cases.
Ultimately I can't tell you how much it costs to run Qwen-3. You can certainly do it on a single chip + weight streaming, but their specs are just too light on the exact FLOPs and bandwidth to know what the memory movement cost would be in this case (if any), and we don't even know the price of a single chip (everyone is saying 3M$ though, regardless of that comment on the other thread). But I can tell you that your math of doing `model_size/sram_per_chip * chip_cost` just isn't the right way to think about this, and so the 100M$ figure doesn't make sense.
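As a back-of-the-envelope illustration of why capacity-based math misleads here (the numbers are assumptions for the sake of the example: the 44 GB SRAM figure is public, the ~3M$ chip price is the rumor mentioned above, and the model footprint is a rough guess, not a measured value):

    import math

    # Illustrative assumptions only -- not Cerebras specs or pricing.
    model_gb  = 500          # assumed footprint of a large model + KV cache
    sram_gb   = 44           # on-chip SRAM per WSE-3 (public figure)
    chip_cost = 3_000_000    # rumored per-chip price, unverified

    # Capacity math: pin every weight in SRAM.
    pinned_chips = math.ceil(model_gb / sram_gb)
    print(f"all-in-SRAM: {pinned_chips} chips, ~${pinned_chips * chip_cost:,}")

    # Streaming math: size for throughput, not capacity. Weights flow through
    # SRAM a few layers at a time from external memory, so a single chip (plus
    # that memory) can run the model, bounded by compute and link bandwidth.
    print(f"streaming:   1 chip,  ~${chip_cost:,}")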
I’m not saying it’s particularly competitive, I’m saying claiming it cost 100M$ to run Qwen is complete lunacy. There is a gulf between those 2 things.
And beyond pure performance competitiveness there are many things that make it hard for Cerebras to be actually competitive: can they ship enough chips to meet the needs of large clusters? What about the software stack and the lack of great support compared to Nvidia? The lack of ML engineers who know how to use them, when everyone knows how to use CUDA and there are many things developed on top of it by the community (e.g. Triton)?
Just look at the valuation difference between AMD and Nvidia, when AMD is already very competitive. But being 99% of the way there is still not enough for customers that are going to pay 5B$ for their clusters.
> SRAM has virtually stopped scaling in new nodes.
But there are several 1T memories that are still scaling, more or less — eDRAM, MRAM, etc. Is there anything preventing their general architecture from moving to a 1T technology once the density advantages outweigh the need for pipelining to hide access time?
I’m pretty sure that HBM4 can be 20-30x faster in terms of bandwidth than eDRAM. That makes eDRAM not an option for AI workloads since bandwidth is the main bottleneck.
No, only Groq uses the all-SRAM approach; Cerebras only uses SRAM for local context while the weights are still loaded from RAM (or HBM). With 48 Kbytes per node, the whole wafer has only 44 GB of SRAM, much lower than the amount needed to load the whole network.
This is actually completely unnecessary in the batched inference case.
Here is an oversimplified explanation that gets the gist across:
The standard architecture for transformer based LLMs is as follows: Token Embedding -> N Layers each consisting of an attention sublayer and an MLP sublayer -> Output Embedding.
Most attention implementations use a simple KV caching strategy. In prefill you first calculate the KV cache entries by performing GEMM against the W_K, W_V, W_Q tensors. In the case of token generation, you only need to calculate against the current token. Next comes the quadratic part of attention. You need to calculate softmax(Q K^T)V. This is two matrix multiplications and has a linear cost with respect to the number of entries in the KV cache for generating the next token, as you need to re-read the entire KV cache plus the new entry. For prefill you are processing n tokens, so the cost is quadratic. The KV cache is unique for every user session. It also grows with the size of the context. This means the KV cache is really expensive memory wise. It consumes both memory capacity and bandwidth and it also doesn't permit batching.
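Roughly, in NumPy terms (single head, no batching, purely illustrative; the weights and sizes are made up):

    import numpy as np

    d = 64                      # head dimension (illustrative)
    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

    # One KV cache per user session; it grows by one (k, v) pair per token
    # and has to be re-read in full on every decoding step.
    k_cache, v_cache = [], []

    def decode_step(x):
        q, k, v = x @ W_q, x @ W_k, x @ W_v
        k_cache.append(k); v_cache.append(v)
        K, V = np.stack(k_cache), np.stack(v_cache)   # (t, d): reread the whole cache
        scores = K @ q / np.sqrt(d)                   # cost linear in t per new token
        p = np.exp(scores - scores.max()); p /= p.sum()
        return p @ V                                  # attention output for this token

    for t in range(4):                                # generate a few tokens
        out = decode_step(rng.standard_normal(d))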
Meanwhile the MLP sublayer is so boring I won't bother going into the details, but the gist is that you have a simple gating network with two feed forward layers that project the token vector into a higher dimension (e.g. more outputs than inputs) known as up gate and then you element-wise multiply these vectors and then feed them into a down gate which reduces it back to the original dimension of the token vector. Since the matrices are always the same, you can process the tokens of multiple users at once.
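Something like this (SwiGLU-style gating is one common variant; the activation is my addition, the description above glosses over it), showing why tokens from many sessions can share one GEMM:

    import numpy as np

    def gated_mlp(X, W_gate, W_up, W_down):
        # X: (rows, d_model) where rows can mix tokens from many users' sessions,
        # since the weights are identical for everyone -- that is what makes the
        # MLP sublayer batch so well.
        up   = X @ W_up                      # project to the wider hidden dim
        gate = X @ W_gate
        gate = gate / (1.0 + np.exp(-gate))  # SiLU, one common choice of activation
        return (gate * up) @ W_down          # back down to d_model

    d_model, d_hidden, rows = 8, 32, 16      # toy sizes
    rng = np.random.default_rng(0)
    Wg, Wu = rng.standard_normal((2, d_model, d_hidden)) * 0.1
    Wd = rng.standard_normal((d_hidden, d_model)) * 0.1
    print(gated_mlp(rng.standard_normal((rows, d_model)), Wg, Wu, Wd).shape)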
Now here are the implications of what I wrote above: Prefill is generally compute bound and is therefore mostly uninteresting, or rather, interesting for ASIC designers because FLOPS are cheap and SRAM is expensive. Token generation meanwhile is a mix of memory bandwidth bound and compute bound in the batched case. The MLP layer is trivially parallelized through GEMM based batching. Having lots of SRAM is beneficial for GEMM, but it is not super critical in a double buffered implementation that performs loading and computation simultaneously, with the memory bandwidth chosen so that both finish roughly at the same time.
What SRAM buys you for GEMM is the following: Given two square matrices A, B and their output A*B = C of the same dimension, where A and B are both 1 GiB in size, and given x MiB of SRAM, you tile the GEMM operation so that each sub-matrix is x/3 MiB in size. Let's say x=120 MiB, which means 40 MiB per matrix. You will split the matrices A and B into approximately 25 tiles each. For every tile in A, you have to load all tiles in B. Meaning 25 (A) + 25*25 (A*B) = 650 load operations of 40 MiB tiles, for a total of 26000 MiB read. If you double the SRAM you now have 13 tiles of size 80 MiB. 13 + 13*13 = 182 loads. 182 * 80 MiB = 14560 MiB. Loosely speaking, doubling SRAM reduces the needed memory bandwidth by half. This is boring old linear scaling, because fewer tiles also means bigger tiles, so the roughly 4x reduction in the number of loads is partly offset by the 2x bigger load operations. Having more SRAM is good though.
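The tile arithmetic above as a throwaway script (tile counts are rounded up, so the numbers come out slightly different from the approximate 25-tile figure, but the near-halving of traffic per doubling of SRAM is the point):

    import math

    def gemm_traffic_mib(matrix_mib=1024, sram_mib=120):
        # Reproduces the scheme above: SRAM split three ways (A tile, B tile,
        # C tile); load A once, and reload every tile of B for every tile of A.
        tile_mib = sram_mib / 3
        tiles = math.ceil(matrix_mib / tile_mib)        # tiles per matrix
        loads = tiles + tiles * tiles                   # A once + B per A-tile
        return loads * tile_mib                         # MiB read from memory

    for sram in (120, 240, 480):
        print(f"{sram:4d} MiB SRAM -> {gemm_traffic_mib(sram_mib=sram):8.0f} MiB of reads")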
Now onto Flash Attention. If I had to dumb down flash attention, it's a very quirky way of arranging two GEMM operations to reduce the amount of memory allocated to the intermediate C matrix of the first Q*K^T multiplication. Otherwise it is the same as two GEMM with smaller tiles. Doubling SRAM halves the necessary memory bandwidth.
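For the curious, the "quirky arrangement" boils down to a running (online) softmax, so the big n-by-n score matrix never has to exist in full. A naive NumPy version of the forward pass (unmasked, single head, my own toy sizes) looks roughly like this:

    import numpy as np

    def flash_attention(Q, K, V, tile=64):
        # Processes K/V in tiles, keeping only a running max and running sum per
        # query row instead of the full softmax(Q K^T) matrix.
        n, d = Q.shape
        out     = np.zeros_like(Q)
        row_max = np.full(n, -np.inf)
        row_sum = np.zeros(n)
        for s in range(0, n, tile):
            Kt, Vt = K[s:s+tile], V[s:s+tile]
            S = Q @ Kt.T / np.sqrt(d)                 # scores against this tile only
            new_max = np.maximum(row_max, S.max(axis=1))
            scale   = np.exp(row_max - new_max)       # rescale old partial results
            P       = np.exp(S - new_max[:, None])
            out     = out * scale[:, None] + P @ Vt
            row_sum = row_sum * scale + P.sum(axis=1)
            row_max = new_max
        return out / row_sum[:, None]

    n, d = 256, 64
    rng = np.random.default_rng(0)
    Q, K, V = rng.standard_normal((3, n, d))
    ref = np.exp(Q @ K.T / np.sqrt(d)); ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
    print(np.allclose(flash_attention(Q, K, V), ref))   # expect True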
Final conclusion: In the batched multi-user inference case your goal is to allocate the KV cache to SRAM for attention nodes, and for MLP nodes to achieve as large a batch size as possible and use the SRAM to operate on as large tiles as possible. If you achieve both, then the required memory bandwidth scales reciprocally with the amount of SRAM. Storing full tensors in SRAM is not necessary at large batch sizes.
Of course since I only looked at the memory aspects, it shouldn't be left out that you need to evenly match compute and memory resources. Having SRAM on its own doesn't buy you anything really.
I’ve been using them as a customer and have been fairly impressed. The thing is, a lot of inference providers might seem better on paper but it turns out they’re not.
Recently there was a fiasco I saw posted on r/localllama where many of the OpenRouter providers were degraded on benchmarks compared to base models, implying they are serving up quantized models to save costs, but lying to customers about it. Unless you’re actually auditing the tokens you’re purchasing you may not be getting what you’re paying for even if the T/s and $/token seems better.
OpenRouter should be responsible for this quality control, right? It seems to me to be the right player in the chain with the duties and scale to do so.
> I will say, their pricing and deployment strategy is a bit murky and unclear. Paying $1500-$10,000 per month plus usage costs? I'm assuming that it has to do with chasing and optimizing for higher value contracts and deeper-pocketed customers, hence the minimum monthly spend that they require.
Yeah wait, why rent chips instead of sell them? Why wouldn't customers want to invest money in competition for cheaper inference hardware? It's not like Nvidia has a blacklist of companies that have bought chips from competitors, or anything. Now that would be crazy! That sure would make this market tough to compete in, wouldn't it. I'm so glad Nvidia is definitely not pressuring companies to not buy from competitors or anything.
1. They’re useless for training in 2025. They were designed for training prior to LLM explosion. They’re not practical for training anymore because they rely on SRAM which is not scalable.
2. No one is going to spend the resources to optimize models to run on their SDK and hardware. Open source inference engines don’t optimize for Cerebras hardware.
Given the above two reasons, it makes a lot of sense that no one is investing in their hardware and they have switched to a cloud model selling speed as the differentiator.
The UAE has sunk a lot of money into them, and I suspect it's not purely a financial move. If that's the case, an acquisition might be more complicated than it would seem at first glance.
They have been an acquisition target since 2017 (per the OpenAI internal emails). So the lack of an acquisition is not for lack of interest. Makes you wonder what happened in those due-diligence processes.
As a doctor and full stack engineer, I would never go into radiology or seek further training in it. (obviously)
AI is going to augment radiologists first, and eventually, it will start to replace them. And existing radiologists will transition into stuff like interventional radiology or whatever new areas will come into the picture in the future.
As a radiologist and full stack engineer, I’m not particularly worried about the profession going away. Changing, yes, but not more so than other medical or non-medical careers.
>AI is going to augment radiologists first, and eventually, it will start to replace them.
I am a medical school drop-out — in my limited capacity, I concur, Doctor.
My dentist's AI has already designed a new mouth for me, implants & all ("I'm only doing 1% of the finish-work: whatever the patient says doesn't feel quite right yet" — my DMD). He then CNCs in-house on his $xxx,xxx 4-axis.
IMHO: Many classes of physicians are going to be reduced to nothing more than malpractice-insurance-paying business owners, MD/DO. The liability-holders, good doctor.
In alignment with last week's H-1B discussion, it's interesting to note that ~30% of US physician resident "slots" (<$60k USD salary) are filled by these foreign visa-holders (so: +$100k cost per applicant, amortized over a few years of training, each).
There are a number of you (engineer + doctor), though you're quite rare. I have a few friends who are engineers as well as doctors. You're like unicorns in your field. The Neo and Morpheus of the medical industry: you can see and understand things that most people can't in your typical field (medicine). Kudos to you!
This was actually my dream career path when I was younger. Unfortunately there's just no way I could have afforded the time and resources to pursue both, and I'd never heard of Biomedical Engineering where I grew up.
As a doctor and full stack engineer you’d have a perfect future ahead of you in radiology - the profession will not go away, but will need doctors who can bridge the full medical-tech range
What’s your take on pharmacists? To my naive eyes, that seems like a certainty for replacement. What extra value does human judgement bring to their work?
My wife is a clinical pharmacist at a hospital. I am a SWE working on AI/ML related stuff. We've talked about this a lot. She thinks that the current generation of software is not a replacement for what she does now, and finds the alerts they provide mostly annoying. The last time this came up, she gave me two examples:
A) The night before, a woman in her 40's came in to the ER suffering a major psychological breakdown of some kind (she was vague to protect patient privacy). The Dr prescribed a major sedative, and the software alerted that they didn't have a negative pregnancy test because this drug is not approved for pregnant women and so should not be given. However, in my wife's clinical judgement- honed by years of training, reading papers, going to conferences, actual work experience and just talking to colleagues- the risk to a (potential) fetus from the drug was less than the risk to a (potential) fetus from mom going through an untreated mental health episode and so she approved the drug and overrode the alert.
B) A prescriber had earlier in that week written a script for Tylenol to be administered "PR" (per rectum) rather than PRN (pro re nata, i.e. as needed). PR Tylenol is a perfectly valid thing that is sometimes the correct choice, and was stocked by the hospital for that reason. But my wife recognized that this wasn't one of the cases where that was necessary, and called the nurse to call the prescriber to get it changed so the nurse wouldn't have to give them a Tylenol suppository. This time there were no alerts, no flags from the software; it was just her looking at it and saying "in my clinical judgement, this isn't the right administration for this situation, and will make things worse".
So someone with expensively trained (and probably licensed) judgement will still need to look over the results of this AI pharmacist and have the power to override its decisions. And that means they will need enough time per case to build a mental model of the situation in their brain, figure out what is happening, and override if necessary. And it needs to be someone different from the person filling out the Rx, for Swiss cheese model of safety reasons.
Congratulations, we've just described a pharmacist.
> And it needs to be someone different from the person filling out the Rx, for Swiss cheese model of safety reasons.
This is something I question. If you go to a specialist, and the specialist judges that you need surgery, he can just schedule and perform the surgery himself. There’s no other medical professional whose sole job is to second-guess his clinical judgment. If you want that, you can always get a second opinion. I have a hard time buying the argument that prescription drugs always need that second level of gatekeeping when surgery doesn’t.
So, the main reason for the historical separation (in the European tradition) between doctor and pharmacist was the profit motive: you didn't want the person prescribing to have a financial stake in the treatment, else they would prescribe very expensive medicine for all cases. And surgeons in particular do have a profit motive (they are paid per service), and it is well known within the broader medical community that surgeons will almost always choose to cut. We largely gate-keep this with the primary care physician providing a recommendation to the specialist. The PCP says "this may be something worth treating with surgery" when they recommend you go see a specialist rather than prescribing something themselves, and then the surgeon confirms (almost always).
That pharmacists also provide a safety check is a more modern benefit, due to their extensive training and ability to see all of the drugs that you are on (while a specialist only knows what they have prescribed). And surgeons also have a team to double-check them while they are operating, to confirm that they are doing the surgery on the correct side of the body, etc. Because these safety checks are incredibly important, and we don't want to lose them.
I am a pharmacist who dabbles in web dev. We should be easy to replace, because all of our work checking pill images and drug interactions is actually already automated, or the software already tells us everything.
If every doctor agreed to electronically prescribe (instead of calling it in, or writing it down) using one single standard / platform / vendor, and all pharmacy software also used the same platform / standard, then our jobs are definitely redundant.
I worked at a hospital where basically doctors and pharmacists and nurses all use the same software and most of the time we click approve approve approve without data entry.
Of course we also make IVs and compounds by hand, but that's a small part of our job.
I'm not a doc or a pharmacist (though I am in med school), and I'm sure there are areas where AI could do some of a pharmacist's job. But on the outpatient side they do things like answering questions for patients and helping them interpret instructions that I don't think we want AI to do... or at least I really doubt an AI's ability to gauge how well someone is understanding instructions and adjust how it explains something based on that assessment. On the inpatient side, I have seen pharmacists help physicians grapple with the pros and cons of certain treatments and make judgement calls about dosing that I think it would be hard to trust an AI to do, because there is no "right" answer really. It's about balancing trade-offs.
IDK, these are just limitations - people that really believe in AI will tell you there is basically nothing it can't do... eventually. I guess it's just a matter of how long you want to wait for eventually to come.
I work on a kiosk (MedifriendRx) which, to some degree "replaces" pharmacists and pharmacy staff.
The kiosk is placed inside of a clinic/hospital setting, and rather than driving to the pharmacy, you pick up your medications at the kiosk.
Pharmacists are currently still very involved in the process, but it's not necessarily for any technical reason. For example, new prescriptions are (by most states' boards of pharmacies) required to have a consultation between a pharmacist and a patient. So the kiosk has to facilitate a video call with a pharmacist using our portal. Mind you, this means the pharmacist could work from home, or could queue up tons of consultations back to back in a way that would allow one pharmacist to do the work of 5-10 working at a pharmacy, but they're still required in the mix.
Another thing we need to do for regulatory purposes is when we're indexing the medication in the kiosk, the kiosk has to capture images of the bottles as they're stocked. After the kiosk applies a patient label, we then have to take another round of images. Once this happens, this will populate in the pharmacist portal, and a pharmacist is required to take a look at both sets of images and approve or reject the container. Again, they're able to do this all very quickly and remotely, but they're still required by law to do this.
TL;DR I make an automated dispensing kiosk that could "replace" pharmacists, but for the time being, they're legally required to be involved at multiple steps in the process. To what degree this is a transitory period while technology establishes a reputation for itself as reliable, and to what degree this is simply a persistent fixture of "cover your ass" that will continue indefinitely, I cannot say.
Pharmacists are not going to be replaced; their jobs, like most other jobs touched by AI, will evolve, and possibly shrink in demand, but won't completely disappear. AI is a tool that some professional has to use, after all.
I could see that as more radiology AI tools become available to non-radiologist medical providers, they might choose to leverage the quick feedback those provide and not wait for a radiologist to weigh in, even if they could gain something from the radiologist. They could make a decision while the patient is still in the room with them.
Partially true, and the answer to that is runway -- it will be a very long time before all the other specialties are fully augmented. With respect to "non-surgical" you may be underestimating the number and variety of procedures performed by non-surgeons (e.g. Internal Medicine physicians) -- thyroid biopsy, bronchoscopy, endoscopic retrograde cholangiopancreatography, liquid nitrogen ablation of skin lesion, bone marrow aspiration, etc.
The other answer is that AI will not hold your hand in the ICU, or share with you how their mother felt when on the same chemo regimen that you are prescribing.
It's unfortunate that email hosting and email infrastructure can really be done only well by major players. The days of people running and maintaining their own are pretty much long gone.
Fwiw, not a knock against CF. I like their products, mostly simple, fair pricing, etc. Just a bit unfortunate commentary on the state of email infra on the internet.
I run my own email server and you couldn't pay me to use a commercial provider like Google instead. The privacy benefits are huge and there is no one to restrict my storage or change my "terms and conditions" overnight.
The days of people running their own servers are gone because of the shortsightedness and laziness of IT managers. They thought the "cloud" would be easier and cheaper, and they are now trapped.
I entertained the idea of running my own mail servers for a while. After researching the topic it turned out that the internet now runs on an IP reputation system. Major email services like gmail assume that anything sent from unknown IPs is malicious.
So it looks like we've gotta be well connected to federate with the other email servers now. A nobody like me can't just start up his own mail server at home and expect to deliver email to his family members who use gmail or outlook. So I became a Proton Mail customer instead.
I've run my own mail servers for many decades and have never had any deliverability issues. I've also never used bargain basement cloud VPS services with horrible reputations.
The best way to ensure a good reputation is to obtain your own address space from a RIR. Barring that, you need to choose a provider with a decent reputation to delegate the space to you.
Not true, at least for ARIN. If you have an IPv6 allocation, you can obtain one or more IPv4 /24 allocations, so long as their stated purpose is to provide IPv4/IPv6 compatibility (e.g. for dual-stack services or NAT): https://www.arin.net/participate/policy/nrpm/#4-10-dedicated...
From your HN profile, I see you're in Brazil, which is part of the region IANA has delegated to LACNIC. Per [0], LACNIC has further delegated numbering authority in Brazil to Registro.br.
Before I even start this bureaucratic process, I need to create an actual organization. Then I need to be assigned an ASN. Only then I'll be allowed to beg them for IPs. Once all that's taken care of, I need to tell them things like what the IPs will be used for and what my infrastructure is. If they like my answer, then they'll approve my request and finally tell me what the prices are.
I've been through the process about 10 times now at various companies, and the paperwork (at least for ARIN) is no more difficult than what would be expected to justify IP space from your typical ISP. If anything, the ARIN folks are more responsive and technically competent than your average ISP support agent, which makes the process easier.
> After researching the topic it turned out that the internet now runs on an IP reputation system. Major email services like gmail assume that anything sent from unknown IPs is malicious.
You have to buy/rent a dedicated IP address (one that you'll be able to keep long term), and warm it up by gradually increasing mail volume over a few weeks to months. But once you have, deliverability should be fine.
I think the bigger issue is needing to keep on top of maintenance of the server.
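Purely as an illustration of the "gradually increasing" part (the numbers are made up, not anything a provider publishes; real ramps depend on bounces and complaint rates):

    def warmup_schedule(start=50, target=20_000, growth=2.0):
        # Made-up ramp: start small and roughly double the daily cap until you
        # reach the volume you actually need. Treat this as a shape, not a recipe.
        day, cap = 1, float(start)
        while cap < target:
            yield day, int(cap)
            day, cap = day + 1, cap * growth
        yield day, target

    for day, cap in warmup_schedule():
        print(f"day {day:2d}: up to {cap} messages")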
Like the parent, I have run email servers for many years now. If you get a bad IP, as long as you get the DKIM records right, it will 'warm up' over time. The more you use email on that IP and do NOT spam people, the more the IP warms up. Make sure you actually own that IP!!! It will become valuable.
Key point - own the IP. We own our IPs and we also buy elastic IPs from AWS. The entire AWS subnet (it seems their entire address space) is universally garbage and unwarmable. Our own IPs have hummed along for years with zero issues.
FWIW, a huge percentage of the spam I get is via Sendgrid, and at some point in the past year or two their abuse reporting mechanisms all turned into black holes, so mail sent via Sendgrid is heavily penalized in my spam rules.
Sending reputation is just as applicable if you're using a third party as if you're hosting it yourself, but much less under your control.
I don't have deliverability issues to the big providers, but that comes down to the age of my domain and my IP in a clean non-residential block. But you won't have reputation issues if your friends and family also run their own server and don't enforce such arbitrary requirements. Running your own servers, not only for email, is the only way to regain control over your computing.
I have arrived at the opinion that what I would do if I moved to selfhost would just be to pay some trivial amount for outbound email via a provider like sendgrid as someone else in these comments has also mentioned. Since I send out maybe a half dozen emails a month I don't think this would be a big deal.
But when I relied on selfhosted email several years ago, I was always inundated with spam, which SpamAssassin was wildly undermatched to handle -- that was one of the main reasons I moved to gmail. So I'm curious what people who are happy self-hosting today are using.
My suggestion would be to use a unique alias for each website/company. This way, if you start receiving spam at that address, you know who leaked it, and can simply delete the alias. You should also then publicly name and shame the source of spam.
I also run SpamAssassin on my server, but I don't believe it ever had to do anything.
I've run my own mail for 10 years (postfix/dovecot/rspamd), no issues. Reverse DNS, SPF, and DKIM records need to be in place, but that's a small lift.
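If you want a quick way to sanity-check those records from a script, something like this works (needs the third-party dnspython package; the domain, IP, and DKIM selector below are placeholders, so substitute your own):

    import dns.resolver, dns.reversename   # third-party: dnspython

    domain, mail_ip, selector = "example.org", "203.0.113.25", "mail"  # placeholders

    spf = [r.to_text() for r in dns.resolver.resolve(domain, "TXT")
           if "v=spf1" in r.to_text()]
    print("SPF :", spf or "missing")

    try:
        dkim = dns.resolver.resolve(f"{selector}._domainkey.{domain}", "TXT")
        print("DKIM:", dkim[0].to_text()[:60])
    except dns.resolver.NXDOMAIN:
        print("DKIM: no record at that selector")

    try:
        ptr = dns.resolver.resolve(dns.reversename.from_address(mail_ip), "PTR")
        print("rDNS:", ptr[0].to_text())
    except dns.resolver.NXDOMAIN:
        print("rDNS: missing PTR record")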
Well, one time I was unable to send mail to a guy with an ancient @att.com email address from his ISP. I got a nice bounce message back with instructions to contact their sysadmins to get unblocked.
To my surprise, they unblocked the IP of my mail server in a matter of hours.
Private email will have no problems. I also ran my own mail server for personal use and had almost zero problems (and this was on an AWS IP!).
Where people will absolutely have problems is trying to run a marketing campaign through their own IP. You absolutely will (and should) get blocked. This is why these mixer companies exist and why you pay for an intermediary to deliver your mail.
I suspect if you shared more info about your mail infrastructure, it might reveal that what is working for you is too complicated for 99.9% of people to set up and maintain themselves.
I don't think the goal is that every non technical person can host their own mail infra.
But most people who can run a server should be able to set up OpenSMTPD with the DKIM filter and Dovecot. It's much easier than configuring Postfix like we had to do in the past.
To answer a sibling comment, the last time I received an answer is a few minutes ago. The correspondent's email infra is hosted by Google.
You're right, it used to be a bit complicated. Now you just need to have a reputable and clean IP address, and knowledge of running some services in docker and of course understanding DNS and its crucial role for running a mail server.
I used to run all the components and maintain it (even that wasn't bad), but I changed to mailu[1] about a year ago
It is probably because you have run it for so long that you have a good reputation and fewer issues. Too bad we don't have a time machine to go back to the nineties and start building up reputation.
The problem is that Gmail will bounce any emails from DigitalOcean IP, even if you sit on this IP for years (so no recent spam), even if replying to someone, even if you registered as 'Postmaster' on Google.
So if you want to selfhost, you'll first need to find an IP that's not blocked to begin with.
> It's not hard, if you do it in a way that you can't send to like 50% of the recipients.
So it's hard (to do well)
>The problem is that Gmail will bounce any emails from DigitalOcean IP, even if you sit on this IP for years (so no recent spam), even if replying to someone, even if you registered as 'Postmaster' on Google.
>So if you want to selfhost, you'll first need to find an IP that's not blocked to begin with.
I'd say this is just the thing antitrust was made for. Hopefully some incumbent can get them to court.
That is not my experience at all. Using a pretty fresh IP and domain I get pretty good deliverability as long as I have proper rDNS and all the other normal steps (like DKIM, etc.)
Cloudflare's customers are companies that have to send out, say, reset password emails and verify account emails and other crumbs of the modern web. You want me to build my own infrastructure for that? Personally I can't wait for them to expand to SMS and crush Twilio.
> The days of people running and maintaining their own are pretty much long gone
This is very much a myth. There's a lot of FUD around how mail is "hard", but it's much less complicated than, say, running and maintaining a k8s cluster (professionally, I'm responsible for both at my org, so I can make this comparison with some authority).
Honestly `apt install postfix dovecot` gets you 90% of the way there. Getting spambinned isn't a problem in my experience, as long as you're doing SPF and DKIM and not using an often-abused IP range (yes, this means you can't use AWS). The MTA/MDA software is rock-solid and will happily run for years on end without human intervention. There really isn't anything to maintain on a regular basis apart from patches/updates every few months.
I think that there's a mindset among younger coders that "if it's not a modern post-AWS cloud provider, servers will take ages to come online and aren't going to give me full access, that's why EC2 exists." And this is conflated with the myth that running a mail server is hard.
But in practice, you can find any number of VPS providers, running in local datacenters, with modern self-service interfaces, with at least some IPs that aren't already spam flagged (and you can usually file a ticket to get a new IP if you need it), that are often cheaper per month than AWS, and give full root and everything. Find a service that will help you warm the IPs before you send to customers, and you're good to go!
This is 100% my experience too. Self-hosting email isn't any harder than self-hosting something else and there is no maintenance beyond apt update and apt upgrade. Even if you choose to do this in hard mode using postfix/dovecot instead of a dockerized stack, you can get a working config in a few minutes from an LLM these days.
> There's a lot of FUD around how mail is "hard", but it's much less complicated than, say, running and maintaining a k8s cluster
The main difference is that you're fully in control of the k8s cluster, but no matter what you do, you don't have control over the email infrastructure, because deliverability depends on the receiver. On every receiver you send to.
People say "I don't have deliverability problems!" but how do you know? Most places don't tell you they rejected your email.
Meh, one could also complain they don't have control over backbone networks, transit, peering agreements, and intermediary routing therefore hosting a service on k8s is futile without using a managed provider / PaaS.
> People say "I don't have deliverability problems!" but how do you know?
Because people reply to my emails.. because I email documents to family/friends/landlord/etc and they receive it as expected..
> Most places don't tell you they rejected your email.
Of course they do, this is what DMARC is for.
> intermediary routing therefore hosting a service on k8s is futile without using a managed provider / PaaS.
Except that a managed service doesn't solve that for you. They are no better at that than you are. Email services are better at deliverability than you are, because they spend lots of time building their IP reputations and more importantly negotiating with mail providers to guarantee their emails show up.
> Because people reply to my emails.. because I email documents to family/friends/landlord/etc and they receive it as expected..
I'm guessing you don't confirm every email you send with every person though.
> Of course they do, this is what DMARC is for.
I was involved in the creation of DMARC (and SPF and DKIM) so I know how it's supposed to work, but in the real world, most providers do not honor the "reject" flag and actually send the bounces. Last time I dealt with it was a few years ago, maybe it's better now.
For context, I started my career at Sendmail, and I worked on the SPF and DKIM specs, so I've dealt with deliverability for 25+ years. I also ran my own mail server until around 2009. But I switched to Gmail as my primary around 2008, when deliverability just got too hard. But I still worked on commercial deliverability for years after that.
Granted, SPF and DKIM weren't widely adopted at that point (and DMARC didn't exist), so maybe it's easier now. But at the same time, most of AWS/Azure/GCP is marked as bad automatically, as well as most home internet blocks.
So if you want to run your own mail server, you can't do it on your home router anymore, you have to rent a server in a rack and get a clean IP that's just for you. That costs $$$.
I see this common pattern where a previously private infrastructure is opened up (usually from low abstraction), and the ecosystem is split into an open base and a private thin layer, and that private layer might just reimplement the same tradeoffs that the incumbent private monoliths made.
Examples being Git/Github, Crypto/Centralized Exchanges, and as per the topic, email.
But I think it's an important distinction that the base infrastructure is open, and that technically a newcomer could join the fray, albeit with a lot of catching up to do, and mix it up.
We are working on an open-source, self-hosted solution [0] to make this easier. When you correctly set up DKIM, SPF, and reverse/forward DNS for your IPs, it is not that hard to get emails delivered. IPs can still get blacklisted, and you need to monitor blacklists and contact them if it happens. Solutions like Postfix are great, but they lack observability. In our solution, we have developed dashboards and health checks to make it easier to find problems with the setup.
We are currently running beta tests (really appreciate it if you can join).
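For anyone rolling their own monitoring, the blacklist check itself is just a DNS lookup of the reversed IP under the list's zone (zen.spamhaus.org shown here as one well-known list; note Spamhaus often refuses queries arriving via big public resolvers, so run it through your own):

    import dns.resolver   # third-party: dnspython

    def is_listed(ip, zone="zen.spamhaus.org"):
        # DNSBLs answer an A record for listed IPs and NXDOMAIN otherwise.
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            dns.resolver.resolve(query, "A")
            return True
        except dns.resolver.NXDOMAIN:
            return False

    print(is_listed("127.0.0.2"))   # standard DNSBL test address, should come back listed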
There is a sweet spot between Gmail and self-hosting. I use Runbox and generally separate contexts, with CF being an exception as I use CF pages for static blog websites, some of their core services, AND as a registrar. For the latter, the default setting is porkbun. The reason for this is not CF's mandatory in-house DNS servers, but the simple fact that they do not register .de domains.
> It's unfortunate that email hosting and email infrastructure can really be done only well by major players. The days of people running and maintaining their own are pretty much long gone.
It's really not. Everyone can do that (which doesn't mean everyone should). I'm running it for millions of emails daily and don't see why I would use a crappy proprietary service instead.