It turns in top-level performance on original, out-of-distribution problems given in international math and programming competitions, but it's "not comparable to a human level." Got it.
Yes, it's officially atheist because there's only room for one god figure, who happens to be a man. Christianity and Islam are "officially atheist" in the same absurd way. In NK the one permissible exception is not called Allah or Yahweh but Kim.
You know, the guy whose portrait hangs in everyone's home in the exact same spot where you'd find a crucifix in a southern American home.
And no, the religious nature of personality cults is not a fallacy. If anything, No True Scotsman applies to claims that a personality cult is not a "real religion." They are absolutely indistinguishable from theistic religions, except for the minor, ignorable detail that the god is alive and walking around.
Of course there's also a strong component of ancestor worship in the cult of the Kims. The portrait or other object of veneration is as likely to feature Il-sung as one of the other two.
Great read, but he undersells the weight of von Neumann's EDVAC report. If you haven't read that (which I imagine you have), it's crazy how prescient some of the lesser-known ideas are. He seemed to assume that we'd end up with some kind of neural architecture, and it's easy to imagine him being surprised that it took us this long to get serious about the idea.
Apropos of that, I couldn't resist telling Gemini 3 to run with your story prompt from the earlier thread: https://gemini.google.com/share/ac122aba6f7f. Thanks for the inspiration, apologies for following it. :-P
(Also thanks for posting the material you wrote back in the 1980s on the SCP initiative. I had heard of it as an SDI connection or component, but that was all. Reading through it now.)
What generation had CarPlay disabled? It works very nicely in 95B.2 and .3, and the pre-facelift 95B models with PCM 3 didn't support CarPlay at all without an adapter, did they?
I know full-screen CarPlay isn't supported without a jailbreak, but I don't care about that myself so haven't done it.
At this point they've contributed a reasonably fair share of open-source code themselves.
No one benefits from locking up 99.999% of all source code, including most of Microsoft's proprietary code and all GPL code.
No one.
When it comes to AI, the only foreseeable outcome to copyright maximalism is that humans will have to waste their time writing the same old shit, over and over, forever less one day [1], because muh copyright!!!1!
Clearing those rights, which don't actually exist yet, would have been utterly impossible for any amount of money. Thousands of lawyers would tie up the process in red tape until the end of time.
The basic premise of the economy is people do stuff for money. Any rights holder debating with their publishing house or whatever just means they don't get paid. Some trivial number of people would opt out, but most authors or their estates would happily take an extra few hundred dollars per book.
YouTube on the other hand has permission from everyone uploading videos to make derivative works barring some specific deal with a movie studio etc.
Now there are a few exceptions, like large GPL works, but again, diminishing returns here: you don't need to train on literally everything.
The GPL arose from Stallman's frustration at not having access to the source code for a printer driver that was causing him grief.
In a world where he could have just said "Please create a PDP-whatever driver for an IBM-whatever printer," there never would have been a GPL. In that sense AI represents the fulfillment of his vision, not a refutation or violation.
I'd be surprised if he saw it that way, of course.
The safeguards will prevent the AI from reproducing the proprietary drivers for the IBM-whatever printer, and it will not provide code that breaks the DRM that exists to prevent third-party drivers from working with the printer. There will, however, be no such safeguards or filters to prevent IBM from writing a proprietary driver for their next printer, using existing GPL drivers as a building block.
I wish you luck. The music industry basically won their fight to force safeguards against AI music. The film industry is gaining laws regulating AI film actors. Code-generating AIs are only training on freely accessible code and not on proprietary code. There are multiple laws being made against AI porn all over the world (or possibly already on the books).
What we should fight is Rules For Thee but Not for Me.
> The music industry basically won their fight to force safeguards against AI music. The film industry is gaining laws regulating AI film actors. Code-generating AIs are only training on freely accessible code and not on proprietary code. There are multiple laws being made against AI porn all over the world (or possibly already on the books).
Yeah, well, we'll see what our friends in China have to say about all that.
That's the inverse. Mass surveillance is bad so it should be banned, vs. using AI to thwart proprietary lock-in is good and so shouldn't be banned.
But also, is the inverse even wrong? If some store has a local CCTV that keeps recordings for a month in case someone robs them, there is no central feed/database and no one else can get them without a warrant, that's not really that objectionable. If Amazon pipes the feed from every Ring camera to the government, that's very different.
By "everywhere" I obviously don't mean "on your private property", I mean "everywhere" as in "on every street corner and so on".
If people are OK with their government putting CCTVs on every lamp post on the promise that they are "secure" and "not used to aggregate data and track people" and "only with warrant" then it's kind of "I told you so" when (not if) all of those things turn out to be false.
> using AI to thwart proprietary lock-in is good and so shouldn't be banned.
It's shortsighted because whoever runs the LLMs isn't doing it to help you thwart lock-in. It might for now, but right now they don't care about anything: they steal content as fast as they can and lose billions yearly to make sure they become too big to fail. Once they are too big, they will tighten the screws, and they will have the freedom to do literally whatever they want as long as it's legal.
And, surprise, helping people thwart lock-in is much less legal (in addition to much less profitable) than preventing people from thwarting lock-in.
It's kind of bizarre to see people thinking these LLM operators will be somehow on the side of freedom and copyleft considering what they are doing.
> By "everywhere" I obviously don't mean "on your private property", I mean "everywhere" as in "on every street corner and so on".
If they're on each person's private property then they're on every street corner and so on. The distinction you're really after is between decentralized and centralized control/access, which is rather the point.
> It's kind of bizarre to see people thinking these LLM operators will be somehow on the side of freedom and copyleft considering what they are doing.
You're conflating the operators with the thing itself.
LLMs exist and nobody can un-exist them now because they're really just code and data. The only question is, are they a thing that does what you want because there are good published models that anybody can run on their own hardware, or are the only up-to-date ones corporate and censored and politically compromised by every clodpoll who can stir up a mob?
You really try hard to misunderstand it. A small shop has its own CCTV to catch intruders = one thing. A local company installing CCTV everywhere = a different thing. In practice they can both be supplied by one company, centralized, unified, and sold, and fighting ANY CCTV is ultimately the winning move.
> LLMs exist and nobody can un-exist them now because they're really just code and data
"Malware exists and nobody can unexist it now because it's just code and data"
> A small shop has its own CCTV to catch intruders = one thing. A local company installing CCTV everywhere = a different thing.
But that's the thing you were implying couldn't be distinguished. Every small shop having its own CCTV is different from one company having cameras everywhere, even if they both result in cameras all over the place.
> "Malware exists and nobody can unexist it now because it's just code and data"
Which is accurate. Even if you tried to ban malware, or LLMs, they would still be produced by China et al. And malware is by definition bad, so you're also omitting the thing that matters again, which is that we should not ban the LLMs that aren't bad.
You don't get to unilaterally make laws for the rest of us, which is what you are trying to do when you throw around terms like "stealing" in contexts where they have no legal meaning. Sorry.
If the incumbent copyright interests insist on picking an unnecessary fight with LLMs or AI in general, they will and must lose decisively. That applies to all of the incumbents, from FSF to Disney. Things are different now.
I see; the laws aren't in question or in flux, but it's the judges who are wrong. Enlightening.
I still don't understand how copyright maximalism has suddenly become so popular on a site called "Hacker News." But it's early here, and I'm sure I'm not done learning exciting new things today.
> like LLM or NFT or killer drones, malware isn't bad for somebody.
Malware isn't bad for Russian crime syndicates, but we're generally content to regard them as the adversary and not care about their satisfaction. That isn't the case for someone who wants to use an LLM to fix a bug in their printer. They're doing the good work and people trying to stop them are the adversary.
> which LLM is not made by stealing copyleft code?
Let's drive a stake through this one by going completely the other way. Suppose you train an LLM only on GPL code, and all the people distributing and using it are only distributing its output under the GPL. Regardless of whether that's required, it's allowed, right? How would you accuse any of those people of a GPL violation?
But that isn't the same code that you were running before. And like, let's not forget GPLv3: "please give me the code for a mobile OS that could run on an iPhone" does not in any way help me modify the code running on MY iPhone.
Sure it does. Just tell the model to change whatever you want changed. You won't need access to the high-level code, any more than you need access to the CPU's microcode now.
We're a few years away from that, but it will happen unless someone powerful blocks it.
I believe the point was that iPhones don't allow running custom code even if you have the code; whereas GPLv3 mandates that any conveyed form of a work must be replaceable by the user. So unless LLMs easily spit out an infinite stream of 0days to exploit to circumvent that, they won't help here.
In said hypothetical world, though, the whatever-driver would also have been written by LLMs; and, if the printer or whatever is non-trivial and made by a typical large company, by many LLM instances with a sizable amount of token spending over a long period of time.
So getting your own LLM rewrite to an equivalent point (or, rather, less buggy as that's the whole point!) would be rather expensive; at the absolute very least, certainly more expensive than if you still had the original source code to reference or modify (even if an LLM is the thing doing those). Having the original source code is still just strictly unconditionally better.
Never mind the question of how you even get your LLM to reverse-engineer & interact with & observe the physical hardware of your printer, and whatever ink gets wasted while debugging the reinvention of what the original driver already did correctly.
Now I'm kind of curious: if you give an LLM the disassembly of a proprietary firmware blob and tell it to turn it into human-readable source code, how good is it at that?
You could probably even train one to do that in particular. Take existing open source code and its assembly representations as training data and then treat it like a language translation task. Use the context to guess what the variable names were before the original compiler discarded them etc.
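As a rough sketch of what generating that training data could look like (assuming gcc is available and a local `corpus/` directory of standalone .c files; the directory name, output filename, and JSONL layout are placeholders, not anything a particular trainer requires):

```python
import json
import pathlib
import subprocess

# Sketch: turn a corpus of open-source C files into (assembly, source) pairs
# that a seq2seq / instruction fine-tune can treat as a translation task.
corpus = pathlib.Path("corpus")  # hypothetical directory of .c files

with open("asm_to_c_pairs.jsonl", "w") as out:
    for src in corpus.rglob("*.c"):
        asm_path = src.with_suffix(".s")
        # Compile with optimizations so the model sees realistic assembly,
        # not the near-literal translation that -O0 would produce.
        result = subprocess.run(
            ["gcc", "-O2", "-S", "-o", str(asm_path), str(src)],
            capture_output=True,
        )
        if result.returncode != 0:
            continue  # skip files that don't build standalone
        pair = {
            "input": asm_path.read_text(errors="replace"),   # model input: asm
            "target": src.read_text(errors="replace"),       # target: original C
        }
        out.write(json.dumps(pair) + "\n")
```

From there it's an ordinary fine-tuning job; the fiddly parts are splitting at function granularity so examples fit in the context window, and stripping or randomizing symbol names so the model actually has to guess them rather than copy them.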
The most difficult parts of getting readable code would be dealing with inlined functions and otherwise-duplicated code from macros or similar, and dealing with in-memory structure layouts; both pretty complicated very-global tasks. (never mind naming things, but perhaps LLMs have a good shot at that)
All of them recognized the thrM exception path, although I didn't review them for correctness.
That being said, I imagine the major showstopper in real-world disassembly tasks would simply be the limited context size. As you suggest, a standard LLM isn't really the best tool for the job, at least not without assistance to split up the task logically.
Those first two indeed look correct (the third link is not public); free ChatGPT is understandably not the best, but I did give it basically the smallest function in my codebase that does something meaningful, instead of any of the actually-non-trivial multi-kilobyte functions doing realistic things needing context.
Would be interesting to push the models with a couple of larger functions, if you have some links you'd like me to try.
I have paid pro accounts on all three, but for some reason Gemini is no longer allowing links to be shared on some queries including this one. All it would let me do is export it to Docs, which I thought would be publicly visible but evidently isn't.
Actually, even finding a larger function that would by itself have a meaningful disassembly is proving problematic; basically every function deals with in-memory data structures non-trivially, and a bunch do indirect jumps (function pointers, but also lookup-table-based switches, which require table data from memory in addition to assembly to disassemble).
(I'm keeping the other symbol names there even though they'd likely not be there for real closed-source things, under the assumption that for a full thing you'd have something doing a quick naming pass beforehand)
This is still very much on the trivial end, but it's already dealing with in-memory structures, three inlined memory allocation calls (two half-deduplicated into one by the compiler, and the compiler initializing a bunch of the objects' fields in one store), and a bunch of inlined tagged object manipulations; should definitely be possible to get some disassembly from that, but figuring out the useful abstractions that make it readable without pain would probably take aggregating over multiple functions.
(Unrelated notes on your previous results: Claude indeed guessed correctly that it's BQN! though CBQN is presumably wholesale in its training data anyway; it did miss that the function has an unused 0th arg (a "this" pointer), which'd cause problems as the function is stored & used as a generic function pointer (this'd probably be easily resolved when attempting to integrate it into a wider disassembly though); neither Claude nor ChatGPT unified the `x>>48==0xfff7` and `(x&0xffff000000000000)==0xfff7000000000000`, which do the exact same thing but clang is stupid [https://github.com/llvm/llvm-project/issues/62145] and generates different things; and of course a big question is how many such intricacies could be automatically reduced down with a full codebase's worth of context, because understandably the single-function disassemblies are way, way more verbose than the original.)
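(Tangent, but for anyone who doesn't want to take that equivalence on faith: the two checks really are the same predicate on any 64-bit value, which a throwaway script confirms; the function names below are purely for illustration.)

```python
import random

# Both forms ask "do the top 16 bits of a 64-bit value equal 0xfff7?"
def check_shift(x):
    return (x >> 48) == 0xfff7

def check_mask(x):
    return (x & 0xffff000000000000) == 0xfff7000000000000

for _ in range(1_000_000):
    x = random.getrandbits(64)  # uniform 64-bit sample
    assert check_shift(x) == check_mask(x)
```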
Should be possible. A couple of years ago I used an earlier ChatGPT model to understand and debug some ARM assembly, which I'm not personally very familiar with.
I can imagine that a process like what you describe, where a model is trained specifically on .asm / .c file pairs, would be pretty effective.
The only legal way to do that in the proprietary software world is a clean room implementation.
An AI could never do a clean room implementation of anything, since it was not trained on clean room materials alone. And it never can be, for obvious reasons. I don't think there's an easy way out here.
When Google's engineers were copying the Java API for Dalvik (and later ART), they had access to and consulted the Java source code. The infamous Oracle v. Google judgment siding with Google set precedent at the highest level, SCOTUS, that looking at the code is not an issue.
So, it doesn't matter whether an AI can or cannot do a clean room implementation. Unless it is a patent or trade secret violation, clean room implementation doesn't matter.