> SOTA models are already capable of outperforming any human on earth in a dizzying array of ways, especially when you consider scale.
So why are so many people still employed as e.g. software engineers? People aren’t prompting the models correctly? They’re only asking 10 times instead of 20? They’re holding it wrong?
Long-form engineering tasks aren't doable yet without supervision. But I can say that in our shop we won't be hiring any more junior devs, ever, except as (in my region, free) interns or because of some extraordinary capabilities, insights, or skills. There just isn't a business case for hiring junior devs to do the grunt work anymore.
But the vast majority of work done in the world is nowhere near the complexity or rigor required by long-form engineering.
While models may not outperform an experienced developer, they will likely outperform her junior assistant, and a dev using AI effectively will almost certainly outperform a team of three without AI, in most cases.
The salient fact here is not that the human is outperformed by the model in a narrow field of extraordinary capability, but rather that the model can outperform that dev in 100 other disciplines, and outperform most people in almost any cerebral task.
My claim is not that models outperform people in all tasks, but that models outperform all people at many tasks, and I think that holds true with some caveats, especially when you factor in speed and scale.
What does junior or senior have to do with it? I would think a smarter junior will run circles around a dumber senior engineer with LLM autocomplete.
If you’re hiring dumb senior engineers you’re holding it wrong lol. Using LLMs is a lot like delegating to a team from a skills perspective, so it favors extensive domain knowledge. You don’t just commit whatever it writes, just like you wouldn’t commit what a junior dev writes without scrutiny. Experience makes that scrutiny more valuable and effective.
How so? Preventing rollbacks of software updates is a "security feature" in most cases, for better and for worse. Yeah, it would be convenient for tinkerers or in rare events such as these, but it would be a security issue the other 99.9...99% of the time for enterprise users, where security is the main concern.
I don't really understand this, many Linux distributions like Universal Blue advertise rollbacks as a feature. How is preventing a roll-back a "security feature"?
Imagine a driver has an exploitable vulnerability that is fixed in an update. If an attacker can force a rollback to the vulnerable older version, then the system is still vulnerable. Disallowing the rollback fixes this.
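A minimal sketch of that mechanism in Python, with made-up version numbers (this isn't any vendor's actual update logic): the updater records the highest version ever installed and refuses anything older, so the vulnerable build can't be pushed back.

```python
# Hypothetical anti-rollback check: remember the highest version ever installed
# and refuse to install anything older than it.
def can_install(candidate: tuple[int, ...], highest_installed: tuple[int, ...]) -> bool:
    """Allow an install only if the candidate is not a downgrade."""
    return candidate >= highest_installed  # tuples compare lexicographically

highest = (10, 2, 5)  # made-up version of the patched driver
print(can_install((10, 3, 0), highest))  # True: upgrade allowed
print(can_install((10, 1, 9), highest))  # False: rollback to the vulnerable build is blocked
```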
I think if someone wants to criticize Microsoft after experiencing their buggy products for 20 years straight, that is not “baseless,” although I accept that taking responsibility for literally anything our products do goes against the core values of our profession.
They do have some crappy products, but those crappy products make the world move, because nobody really makes better drop-in replacements. Same goes for SAP, Canonical, Android, etc.: none of them are fault tolerant, they all have issues, and all of them will fail if you fuzz them with enough edge cases. And according to this article, CrowdStrike caused the issue, not Windows, which is what I was pointing at.
Do you think macOS can't fail if you fuck with it long enough? Sometimes you don't even have to, it just fails by itself. My Ubuntu 22.04 LTS at my previous job gave me more issues than Windows ever did. Thanks, Snaps, Wayland and APT. No workstation OS is perfect.
If you want a fault-tolerant OS you're gonna have to roll your own Linux/BSD build based on your requirements and do your own dev and testing. Which company has money for that? So of course they're gonna pick an off-the-shelf solution that best fits their needs on their budget. How is it Microsoft's fault what their customers choose to do with it? Did they guarantee anywhere that their desktop OS is fault tolerant or that it should be used in high-availability systems and emergency services, especially with crappy endpoint solutions hooked in at the kernel level?
> > The Windows ecosystem typically deployed in corporate PCs or workstations is often insecure, slow, and poorly implemented
> Yes, but that's not because of Windows itself
Come on. There’s a reason Windows users all want to install crappy security products: they’ve been routinely having their files encrypted and held for ransom for the last decade.
And Linux/BSD generally would not help here. Ransomware is just ordinary file IO and is usually run "legitimately" by phished users rather than via actual code-execution exploits.
I have a similar disdain for security bloatware with questionable value, but one actually effective corporate IT strategy is using one of those tools to operate a whitelist of safe software, with centralized updates
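As a rough illustration of the allowlisting idea (not how any particular EDR product works), the core check is just "is this binary's hash on a centrally managed list":

```python
# Toy allowlist check: only binaries whose SHA-256 hash appears on a centrally
# managed list are considered safe to run. The hash below is a placeholder.
import hashlib
from pathlib import Path

APPROVED_SHA256 = {
    "0000000000000000000000000000000000000000000000000000000000000000",  # placeholder
}

def is_approved(binary: Path) -> bool:
    digest = hashlib.sha256(binary.read_bytes()).hexdigest()
    return digest in APPROVED_SHA256
```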
I think having a Linux/BSD system might be helpful here in the general case, because the culture is different.
In Windows land it's pretty much expected that you go to random websites, download random executables, ignore the "make changes to your computer?" warnings and pretty much give the exe full permission to do anything. It's very much been the standard software install workflow for decades now on Windows.
In the Linux/BSD world, while you can do the above, people generally don't. Generally, they stick to trusted software sources with centralized updates, like your second point. In this case I don't think it's a matter of capability; both Windows and Unix-land are capable of what you're suggesting.
I think phishing is generally much less effective in the Mac/Linux/BSD world because of this.
Until a lucrative contract requires you to install prescribed boutique Windows-only software from a random company you've never heard of, and then it's back to that bad old workflow.
Yeah, because no one on Linux or Mac would clone a git repo they just found out about and blindly run the setup scripts listed in the readme.
And no one would pipe a script downloaded with wget/curl directly into bash.
And nobody would copy a script from a code-formatted block on a page, paste it directly into their terminal and then run it.
I'm not going to go so far as to claim that these behaviors are as common as installing software on Windows, but they are still definitely common, and all of them could lead to the same kinds of bad things happening.
I would agree this stuff DOES happen, but typically in development environments. And I also think it's crappy practice. Nobody should ever pipe a curl into sh. I see it in docs sometimes and yes, it does bother me.
I think though that the culture of robust repositories and package managers is MUCH more prominent on Mac/iOS/Linux/FreeBSD. It's coming to Windows too with the new(er) Windows store stuff, so hopefully people don't become too resistant to that.
A developer is much more likely to be able to fix their computer and/or restore from a backup than a typical user is. A significant problem is cascading failures, where one bozo installing malware either creates a business problem (e.g. allowing someone to steal a bunch of money) or is able to disable a bunch of other computers on the same network. It is not that common for macOS to be implicated in these sorts of issues. I know people have been saying for a long time that it’s theoretically possible but it really doesn’t seem that common in practice.
I'd wager if Linux had the same userbase as Windows, you'd see more ransomware attacks on that platform as well. Nothing about Linux is inherently more secure.
> Yeah I don't get where this "Linux is more secure" thing comes from.
It comes from the 1990s and early 2000s. Back then, Windows was a laughingstock from a security point of view (for instance, at one point connecting a newly installed Windows computer to the network was enough for it to be automatically invaded). Both Windows and Linux have become more secure since then.
> Basically any userspace program can read your .aws, .ssh, .kube, etc... The user based security model desktops have is the real issue. Compare that with Android and iOS for instance. No one needs anti-virus bloatware, just because apps are curated and isolated by default.
Things are getting better now, with Flatpak and the like becoming more popular. For instance, the closed-source games running within the Steam Flatpak won't have any access to my ~/.aws or ~/.ssh or ~/.kube, etc.
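To make the quoted point concrete: outside a sandbox, any program you run can read those credential files with plain file IO and no prompt. A trivial sketch (the paths are just the ones named above):

```python
# Any unsandboxed process running as your user can read these directly.
from pathlib import Path

for rel in (".aws/credentials", ".ssh/id_ed25519", ".kube/config"):
    path = Path.home() / rel
    if path.exists():
        print(f"{path}: readable, {path.stat().st_size} bytes")  # no permission prompt involved
```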
What fraction of ransomware attacks would these security products have prevented exactly? Windows already comes with plenty of monitoring and alerting functionality.
Probably close to none at some point. They may block some things.
But most of the reason Windows falls to this is that it's what people use. The only platform that is somewhat actually protected against attacks is the iPhone; the Mac can easily be ransomwared, it's just that the market is so small nobody bothers attacking it: no ROI.
Yeah. The mobile ecosystems are what real security design looks like. Everything is sandboxed, brokered, MACed, and fuzzed. We should either make the desktop systems work the same way or generalize the mobile systems into desktops.
The mobile ecosystem is what corporate IT should be: centralized app store, siloed applications, an immutable filesystem (other than the document area for each application), plus VMs and special-purpose machines for activities like development. However locked down iOS may be, most upgrades happen without a hitch, and there's no need for security software.
Hard to say, but Windows Defender doesn't stop as many as EDRs can. There are actual tests for this, run by independent parties, that check exactly this. Defender can be disabled extremely easily; modern EDRs cannot.
Yes, average Windows users are significantly less tech-literate, for obvious reasons, and there are way more of them. This creates a very lucrative market.
How is desktop Linux somehow inherently particularly more secure than Windows?
Yes, the original Parkinson’s Law paper contains an equation:
https://en.wikipedia.org/wiki/Parkinson's_Law
> The growth was presented mathematically with the formula x = (2k^m + P)/n, in which k was the number of officials wanting subordinates, m was the hours they spent writing minutes to each other.
(check the original paper for details since this obviously doesn’t explain what n is)
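For readability, here is the quoted formula typeset; the quote only glosses k and m, and P and n are defined in the paper:

```latex
x = \frac{2k^{m} + P}{n}
```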
> The reason we have 3204 2B allocations is that each of the 2B contains info (latent space dimension).
I think the author is more correct than you are. It is not necessarily the case that we need 3,204 dimensions to represent the information contained in the tokens; in fact, the token embeddings live in a low-dimensional subspace; see footnote 6 here:
> We performed PCA analysis of token embeddings and unembeddings. For models with large d_model, the spectrum quickly decayed, with the embeddings/unembeddings being concentrated in a relatively small fraction of the overall dimensions. To get a sense for whether they occupied the same or different subspaces, we concatenated the normalized embedding and unembedding matrices and applied PCA. This joint PCA process showed a combination of both "mixed" dimensions and dimensions used only by one; the existence of dimensions which are used by only one might be seen as a kind of upper bound on the extent to which they use the same subspace.
So some of the embedding dimensions are used to encode the input tokens and some are used to pick the output tokens (some are used for both), and everything else is only used in intermediate computations. This suggests that you might be able to improve on the standard transformer architecture by increasing (or increasing and then decreasing) the dimension, rather than using the same embedding dimensionality at each layer.
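A minimal sketch of that kind of check, assuming you have the embedding matrix as a NumPy array (the shapes below are made up): estimate how many principal components are needed to explain most of its variance.

```python
# Estimate the "effective" dimensionality of a token embedding matrix via PCA,
# computed here as an SVD of the centered matrix.
import numpy as np

def effective_dim(embedding: np.ndarray, threshold: float = 0.99) -> int:
    """Smallest number of principal components explaining `threshold` of the variance."""
    centered = embedding - embedding.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(explained, threshold) + 1)

# Toy example: vectors that actually live in a 64-dim subspace of a 1024-dim space.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4096, 64)) @ rng.normal(size=(64, 1024))
print(effective_dim(embeddings))  # ~64, far below the ambient dimension of 1024
```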
> Secondly, his description of each layer's function as adding information to the original vector misses the mark IMO--it is more like the original input is convolved with the weights of the transformer into the output. I am probably missing the mark a bit here as well.
> Lastly, his statement that the embedding vector of the final token output needs all the info for the next token is plainly incorrect. The final decoder layer, when predicting the next token, uses all the information from the previous layer's hidden layer, which is the size of the hidden units times the number of tokens so far.
I think the author is correct. Information is only moved between tokens in the attention layers, not in the MLP layers or in the final linear layer before the softmax. You can see how it’s implemented in nanoGPT:
https://github.com/karpathy/nanoGPT/blob/f08abb45bd2285627d1...
At training time, probabilities for the next token are computed at each position, so if we feed in a sequence of n tokens we basically get n training examples, one per position; at inference time we only need the prediction at the last position, since the preceding tokens have already been produced.
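A minimal toy sketch of both points (this is not nanoGPT itself, just one attention layer plus an output head): the causal mask means position t only sees tokens up to t, attention is the only step where positions exchange information, training scores a next-token prediction at every position, and generation only needs the logits at the last one.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, d_model, n = 100, 32, 8                     # toy vocabulary, width, sequence length
embed = torch.nn.Embedding(vocab, d_model)
attn = torch.nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
lm_head = torch.nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (1, n))           # (batch=1, seq=n)
x = embed(tokens)
causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)  # forbid attending to the future
h, _ = attn(x, x, x, attn_mask=causal)             # the only place positions mix
logits = lm_head(h)                                # (1, n, vocab): next-token logits at every position

# Training: every position with a known next token contributes a training example.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1))

# Inference: only the last position's logits are needed to pick the next token.
next_token = logits[:, -1].argmax(dim=-1)
```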
> That's some major scapegoating of flight crew, when their airline clearly has not maintained the plane properly and caused engine failure in the first place and airport failed to provide adequate ground control support.
Most disasters result from multiple things going wrong together, hence the importance of addressing them individually.