Given the valuation of these companies, the upside of "just lying" is unfortunately high.
For example, taking a step back, it's crazy that people accept the idea of "AGI" (which partially drives the valuation) at face value without any evidence.
I would be shocked if there was any accountability though.
I have written a lot of Zig comptime code and ended up finding the opposite. In C++ I find I have to bend over backward to get what I want done, often resulting in insane compile times. I've used metaprogramming libraries like Boost.Hana for somewhat better ergonomics, but even that I'd consider inferior to comptime.
Out of curiosity, do you happen to have any examples of what you describe, where C++ is more powerful and expressive than Zig?
I think you could argue that competition indeed makes the market efficient, but that we shouldn't conflate that with capitalism itself. Capitalism, in my opinion, can at times prevent competition due to the required capital investment to compete. E.g. even OpenAI with their golden bullet couldn't get there without the capital investments from big tech? Might be wrong here of course.
Another interesting pattern is the ability to generate structs at compile time.
I've run experiments where a neural net is implemented by exporting a JSON file from PyTorch, reading it in using @embedFile, and generating a struct with a specific "run" method from it.
This in theory allows the compiler to optimize the neural network directly (I haven't proven a great benefit from this though). Also, the whole network lived on the stack, which means no dynamic allocation (not sure if this is good?).
I've done this sort of thing by writing a code generator in Python instead of using comptime. I'm not confident that comptime Zig is particularly fast, and I don't want to run the JSON parser that generates the struct all the time.
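Roughly this kind of thing, as a toy sketch (the JSON layout and the emitted Zig source here are assumptions for illustration, not the exact generator):

    import json

    def emit_zig(json_path: str, out_path: str) -> None:
        """Emit a Zig source file with the weights baked in as array literals.

        Assumes a JSON layout like {"layers": [{"weights": [[...], ...]}, ...]}.
        """
        with open(json_path) as f:
            net = json.load(f)

        lines = ["pub const Net = struct {"]
        for i, layer in enumerate(net["layers"]):
            w = layer["weights"]  # list of rows
            rows = ", ".join(
                ".{ " + ", ".join(str(float(v)) for v in row) + " }" for row in w
            )
            lines.append(
                f"    pub const w{i}: [{len(w)}][{len(w[0])}]f32 = .{{ {rows} }};"
            )
        lines.append("};")

        with open(out_path, "w") as f:
            f.write("\n".join(lines) + "\n")

The nice part is that the generator only needs to run when the JSON actually changes.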
Another thing I tried as an alternative is using ZON (Zig Object Notation) instead of JSON. This can be included directly as a source file. It involved writing a custom Python exporter though (read: I gave up).
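For the curious, such an exporter would look roughly like this (an untested sketch; the ZON layout and the @"..." quoting for dotted state_dict keys are assumptions):

    def to_zon(value) -> str:
        """Render a nested Python value as a ZON literal (untested sketch)."""
        if hasattr(value, "tolist"):  # torch.Tensor / numpy array -> nested lists
            value = value.tolist()
        if isinstance(value, list):
            return ".{ " + ", ".join(to_zon(v) for v in value) + " }"
        if isinstance(value, dict):
            # Keys like "fc1.weight" contain dots, so use quoted identifiers.
            items = ", ".join(f'.@"{k}" = {to_zon(v)}' for k, v in value.items())
            return ".{ " + items + " }"
        return str(float(value))

    def export_state_dict(state_dict, path: str) -> None:
        with open(path, "w") as f:
            f.write(to_zon(dict(state_dict)) + "\n")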
FWIW the goal for comptime Zig execution is to be at least as fast as Python. I can’t find it now but I remember Andrew saying this in one of his talks at some point.
I think if you integrate with the build system, then yes, Zig can redo the work only when the file changes. But I'm not sure Zig has figured out incremental comptime yet; that's way harder to accomplish.
They become quite long, but it was surprisingly tolerable. I recall it only vaguely, but a 100 MB neural network was on the order of minutes with all optimizations turned on. I guess it would be fair to say it scaled more or less linearly with the file size (from what I saw). Moreover, I work in essentially a TinyML field, so my neural networks are on the order of 1 to 2 MB for the most part. For me it would've been reasonable!
I guess in theory you could compile once into a static library and just link that into a main program. Also, incremental compilation is coming to Zig, I believe; maybe that helps? Not sure about the details there.
I suppose the issue is a data problem: there is relatively little high-quality data explaining how things should be solved in binary, which makes learning the mapping between prompt (English) and good solution (binary) difficult.
But compiled code loses a lot of the "extra" data. Also, these are "language" models, so I would be surprised if training on binaries were much more efficient than writing in some kind of language.
Besides, how do you even check the result now without running untrusted code? Do you reverse-engineer the binary on every run of the model?
In the end programming languages are tools, and tools are often designed to fit well into the existing ecosystem. E.g. Python is "easy to type", thus designed for the keyboard "meta". Similarly, if one designs a language with synergy with LLMs in mind, it may yield productivity boosts.
For example, one could conceive of a language that is very safe, but to achieve this it may perhaps also be very verbose. Such a language may be horrid for humans, but perhaps fine for LLMs.
But it still has to be auditable by humans, so I imagine some sort of LLM tool library on top of an existing language makes sense. Might be wrong! But LangChain tools and Pydantic schemas for input/output feel like the right abstraction.
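To make that concrete, a sketch of what I mean (hypothetical names; Pydantic models as the typed, auditable I/O contract for a tool):

    from pydantic import BaseModel, Field

    class PatchRequest(BaseModel):
        """Input schema the LLM must fill in to call the (hypothetical) tool."""
        file_path: str = Field(description="Repository-relative path to edit")
        rationale: str = Field(description="Why this change is needed")
        diff: str = Field(description="Unified diff to apply")

    class PatchResult(BaseModel):
        """Structured, human-auditable result returned to the model."""
        applied: bool
        test_output: str

    def apply_patch(req: PatchRequest) -> PatchResult:
        # Hypothetical tool body: validate the diff, apply it, run the tests.
        ...

The schemas keep both sides machine-checkable while staying readable for a human reviewer.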
I can see the argument though, anything moving in that direction already?
Not that I know of! It's an interesting idea though; as you say, it should remain auditable.
Along that line, it makes me wonder whether it would be possible to design an LLVM output format (i.e. one that can work with existing code) that is especially well optimized for interop with a specialized LLM, e.g. encoding more information more compactly or something.
Interesting. I wonder if it's due to having been trained on data from humans. E.g. I guess humans often "scheme" when, for example, selling things, like in advertising: "I know how to solve your problem, and the best way is with my thing!"
In that view, perhaps the contrary, the thing not scheming, would be more surprising.
The prediction from the LessWrong folks was that it’s inevitable that a rational actor with “goals” would do this, so models trained purely on RL would exhibit this too. (Instrumental Convergence is the name for the theory that predicts power-seeking in a generalized way.)
I agree that we should expect LLMs to be particularly vulnerable to this, as you note. But it seems to me that LLMs are absorbing some understanding of human morality too, which might make it possible to steer them into "the best of us" territory.
Are LLMs / AI attaining better results than the data they were trained on? For me, the answer is no: LLMs are always imperfectly modeling the underlying distribution of the training dataset.
Do we have sufficient data that spans the entire problem space that SWE deals with? Probably not, and even if we did it would still be imperfectly modeled.
Do we have sufficient data to span the space of many routine tasks in SWE? It seems so, and this is where the LLMs are really nice: e.g., scripting, regurgitating examples, etc.
So to me, much like previous innovations, it will just shift job focus away from the things the innovation can do well, rather than replacing the field as a whole.
One pet theory I have is that we currently suck at assessing model performance. Sure, vibes-based analysis of a model's outputs makes them look amazing, but is that not the literal point of RLHF? How good are these outputs really?