Can someone explain how so many platforms (ChatGPT, Gemini, Claude, etc.) all sprang up so quickly? How did the engineering teams immediately know how to go about building this kind of tech with LLMs and DNNs and whatnot?
By 2020/2021 with the release of GPT-3, the trajectory of a lot of the most obvious product directions had already become clear. It was mainly a matter of models becoming capable enough to unlock them.
E.g. here's a 2021 forecast covering 2021 to 2026, written over a year before ChatGPT was released. It hits a lot of the product beats we've come to see as we move into late 2025.
It's not much different from other ML, it's just at a bigger and more expensive scale. So once someone figured out the rough recipe (NN architecture, ludicrous scale of weights and data, reinforcement learning tuning), it's not hard for other experts in the field to replicate, so long as they have the resources. DeepSeek was pretty much a side project, for example.
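To make "the rough recipe" slightly more concrete, here's a minimal sketch of the core step: a tiny decoder-only language model trained with next-token prediction in PyTorch. Everything here (vocab size, dimensions, random stand-in data) is made up for illustration, and the RL tuning stage is omitted entirely; real labs run essentially this loop, just at vastly larger scale and on real text.

```python
# Minimal sketch (not any lab's actual code): a tiny decoder-only LM
# trained with next-token prediction on random stand-in tokens.
import torch
import torch.nn as nn

vocab, d_model, seq_len = 1000, 128, 64  # toy sizes, purely illustrative

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(seq_len, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, x):
        # causal mask so each position only attends to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        positions = torch.arange(x.size(1), device=x.device)
        h = self.embed(x) + self.pos(positions)
        return self.head(self.blocks(h, mask=mask))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(100):
    tokens = torch.randint(0, vocab, (8, seq_len + 1))  # stand-in for real text
    logits = model(tokens[:, :-1])                      # predict the next token
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

None of the individual ingredients is exotic; the moat is mostly data, compute, and the engineering around running exactly this kind of loop at enormous scale.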
Was it really that quick? GPT-3 is where I'd put the start of this, and that was in 2020; they had to work on the technology for quite a while before it got to this point. Everyone else has been able to follow their progress and see what works.
I imagine it wasn't as immediate as it might look on the outside. If they all were working independently on similar ideas for a while, one of them launching their product might have caused the others to scramble to get theirs out as well to avoid missing the train.
I think it's also worth pointing out that the polish on these products was not actually there on day one. I remember the first week or so after ChatGPT's initial launch being full of stories and screenshots of people fairly easily getting around some of the intended limitations with silly methods, like asking it to write a play where the dialogue covers the topic it refused to discuss directly, or asking it to give examples of the kinds of things it's not allowed to say in response to certain questions. My point isn't that there wasn't a lot of technical knowledge behind the initial launch, but that it's a bit of an oversimplification to view things as a binary where people didn't know how to do it before, and then they did.
All of the products you mention came out of existing research teams (which, in the case of ChatGPT and Claude, actually predate most of their engineers). So knowing how to build small language models was always in their wheelhouse. Scaling up to larger LLMs required a few algorithmic advances, but for the most part it was a question of sourcing more data and more compute. The remarkable part of transformers is their scaling laws, which let us get much better models without having to invent new architectures.
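For anyone curious what "scaling laws" means in practice: pretraining loss falls off as a smooth power law in parameter count (and data), so you can predict roughly how good a bigger model will be before spending the money to train it. A rough sketch, using ballpark constants in the spirit of Kaplan et al. (2020) rather than exact values:

```python
# Rough sketch of a Kaplan-style scaling law: loss as a power law in
# parameter count N. Constants are ballpark/illustrative, not exact.
def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):  # 100M -> 100B parameters
    print(f"{n:.0e} params -> predicted loss ~ {predicted_loss(n):.2f}")
```

The practical consequence is what the parent comment describes: because the curve is smooth and predictable, the path to better models was "spend more on the same architecture" rather than "invent a new one."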
It's the intersection of plentiful cloud compute and existing language models. As I understand it, right now it's really just a matter of throwing compute at existing LM architectures to learn from gigantic datasets.
I don’t like these examples because IRL nobody does things this way.
Try actual problems that require you to use these tools and the interrelationships between them, where it becomes blindingly obvious why they exist. Calculus is a prime example, and it's comical that most students find Calculus hard because their LA is weak. But Calculus has extensive uses, just not for doing basic carb counting.
Honestly, all these cute websites give people a false sense that they're actually learning something. The only way to learn this stuff is to get one of the million good LA books out there and work through the problems. But that's hard, so people look for shortcuts.
Yeah, I think when students actually hit Calculus-level related rates, a small dim light starts to glow. Obviously it only gets brighter the less you have to hold onto and the more you have to mathematically represent something you're trying to reason about; that's when all the tools start to make sense, and the relationships start asking you "is this true in my case, or do I need to take a step back?" and so forth.
I don’t have an axe to grind against the site; I think it’s fine. But if someone wants to learn LA, a college-level course followed by an intense grind of word problems, having to work backwards and forwards, and finding flaws in answers might be a better way to develop the noggin for it. Just my 2c.
You’ve heard this line of thought before, and forgive me for parroting, but here goes:
Bluesky attracts the same people X attracts; they just disagree on specifics, which in most cases are surface-level. The fanaticism and tribalism are basically the same. There is no utopia where a community is pleasant without a lot of guarding and gatekeeping and, really, viewpoint alignment and subject-matter filtering. Some topics are basically there for shitflinging, and those are mostly the topics that seem to be a hot poker for everyone.
No one gets banned for preferring Debian over Fedora.