The model weights in, e.g., TensorFlow are the source code.
It is not a von Neumann architecture, but a gigabyte of model weights is the executable part, no less than a gigabyte of imperative code.
Now, the training of the model is akin to the process of writing the code. In classical imperative languages that code may be such spaghetti that each part is intertwined with 40 others, so you can't just modify something easily.
So the fact that you can't modify the code is a failure of Freedom 1 (the freedom to study and change it) or whatever. But at least you have Freedom 0 of hosting the model where you want and not getting charged an exorbitant amount for it, or getting cut off, or having the model change out from under you via RLHF for political correctness or whatever.
OpenAI has not even met Freedom Zero of the FSF's or OSI's definitions. But others can.
No, you would need to spend "eye-watering amounts of compute" to do it, similar to hiring a lot of developers to write the code. Compiling the code into an executable format is a tiny part of that cost.
I still think of millions of dollars of GPU spend crunching away for a month as a compiler.
A very slow, very expensive compiler - but it's still taking the source code (the training material and model architecture) and compiling that into a binary executable (the model).
Maybe it helps to think about this at a much smaller scale. There are plenty of interesting machine learning models which can be trained on a laptop in a few seconds (or a few minutes). That process feels very much like running a compiler - it takes less time than compiling a lot of large C++ projects.
Running on a GPU cluster for a month is the exact same process, just scaled up.
Huge projects like Microsoft Windows take hours to compile and that process often runs on expensive clusters, but it's still considered compilation.
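To make the small-scale version of the analogy concrete, here is a minimal sketch (a toy perceptron on hypothetical made-up data, not any model mentioned in the thread): the "source" is the training data plus the model architecture, the "binary" is the learned weights, and "compilation" finishes in milliseconds on a laptop.

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """'Compile' a 2-input perceptron: turn labelled examples into weights."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred
            # Standard perceptron update rule
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b  # the "executable": just numbers, not imperative code


def predict(w, b, x):
    """Run the 'executable' on a new input."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


# "Source code": a tiny linearly separable dataset (logical OR)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(data)

# The learned weights reproduce the training labels
assert all(predict(weights, bias, x) == y for x, y in data)
```

Running a GPU cluster for a month on web-scale text is this same loop scaled up by many orders of magnitude - which is why the weights that come out are better thought of as the compiled artifact than as the source.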
That’s the dirty secret of why GPT-4 is better. But they’ll tell you it has to do with chaining GPT-3s together, more fine-tuning, etc. They go to these poorer countries and recruit people to work on training the AI.
Not to mention all the uncompensated work of humans around the world who put their content up on the Web.