To be fair, even if they released detailed instructions and datasets on how to rebuild llama (considering that there's some randomness in the process), you still probably wouldn't be able to build it; who has the resources? And if you had the resources, you _still_ probably wouldn't _want_ to rebuild it yourself, since it seems awfully expensive when you could instead spend those resources elsewhere.
Fair point about the license; people have different definitions of what "open source" means.
That's true of ordinary software for most people too. How many people actually build Linux or Chromium from source? Building Chromium takes more RAM and HD space than most people even have. Yet the world gets immense value from the few who do. I wouldn't want to live in a world where WebKit and Chromium were closed source. You can run a Chromium fork without having to build it yourself. And compute costs will come down over time.
> Building Chromium takes more RAM and HD space than most people even have.
According to [1], it takes 16GB of RAM and ~180GB of disk space. Most people have that much. It does take several hours without a many-core machine though.
I would bet that, overall, most people have those 4GB RAM, 32GB eMMC laptops from Walmart, etc. If you limit things to developers/gamers/enthusiasts, you'd probably be right.
Linux and Chromium seem to be at the edge of the current scale of "ordinary" open-source software. I think perhaps one should also take into account how much money would be needed to build the thing in a reasonable time.
Building Chromium sounds awful, but I'm not sure I'd really need to buy another computer for that. If I did, I'm sure I wouldn't need to spend billions on it, most probably not even millions.
For LLaMa I definitely don't have the computer to build it, and I definitely don't have the money to buy the computer. Even if I won the lottery tomorrow, I'm pretty sure I still wouldn't have enough money to buy the hardware. Even if I had enough money to buy the hardware, I'm still not sure I could actually get it in a reasonable time, since Nvidia may be backlogged for a while. Even if I already had all the hardware, I probably wouldn't want to retrain llama. And even if I wanted to retrain it, the process would probably take weeks if not months at best.
Like, I think it's one of those things where a difference in magnitude creates a difference in kind; one can't quite meaningfully compare LLaMa with the Calculator app that Ubuntu ships with.
It's probably worth playing around with different prompts and different board positions.
For context, this [1] is the board position the model is being prompted on.
There may be more than one weird thing about this experiment; for example, giving instructions to the non-instruction-tuned variants may be counterproductive.
More importantly, let's say you just give the model the truncated PGN: does this look like a position where White is a grandmaster-level player? I don't think so. Even if the model understood chess really well, it's going to try to predict the most probable move given the position at hand. If the model thinks that White is a bad player, and the model is good at understanding chess, it's going to predict bad moves as the more likely ones, because that better predicts what is most likely to happen here.
Apparently I can find some games between very strong players that start like that [1], so my hypothesis that the model may just be predicting bad moves on purpose seems wobbly, although having Stockfish at its lowest level play as the supposedly very strong opponent may still be throwing the model off somewhat. In the charts, the first few moves the model makes seem decent, if I'm interpreting them right, and after a few of those things seem to start going wrong.
Either way it's worth repeating the experiment imo, tweaking some of these variables (prompt guidance, Stockfish strength, starting position, the names of the supposed players, etc.).
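For example, one cheap way to probe the "who does the model think is playing" variable would be to vary the PGN headers while keeping the moves fixed, roughly like the sketch below. `completeWithModel` is a hypothetical stand-in for whatever completion API the experiment actually uses, and the player names/Elos are just placeholders.

```typescript
// Sketch only: probe whether the implied strength of the players changes the
// predicted continuation. The PGN tags (White, Black, WhiteElo, BlackElo) are
// standard; the completion helper below is assumed, not a real library call.

declare function completeWithModel(prompt: string): Promise<string>;

interface PromptVariant {
  white: string;
  black: string;
  whiteElo: number;
  blackElo: number;
}

// Build a PGN prefix: headers plus the movetext so far, ending right before
// the move we want the model to predict.
function buildPrompt(variant: PromptVariant, movesSoFar: string): string {
  return [
    `[Event "Example"]`,
    `[White "${variant.white}"]`,
    `[Black "${variant.black}"]`,
    `[WhiteElo "${variant.whiteElo}"]`,
    `[BlackElo "${variant.blackElo}"]`,
    '',
    movesSoFar,
  ].join('\n');
}

const variants: PromptVariant[] = [
  { white: 'Carlsen, Magnus', black: 'Caruana, Fabiano', whiteElo: 2850, blackElo: 2800 },
  { white: 'Anonymous', black: 'Anonymous', whiteElo: 1200, blackElo: 1200 },
];

// Same moves, different implied strength: compare the first predicted move.
const movesSoFar = '1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4.';

for (const variant of variants) {
  completeWithModel(buildPrompt(variant, movesSoFar)).then((continuation) => {
    console.log(`${variant.white} vs ${variant.black}:`, continuation.trim().split(/\s+/)[0]);
  });
}
```

If the predicted move differs meaningfully between the two variants, that would at least suggest the implied strength of the players is shaping the predictions.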
Interesting thought: the LLM isn't trying to win, it's trying to produce data like the input data. It's quite rare for a very strong player to play a very weak one. If you feed it lots of weak moves, it'll best replicate the training data by following with weak moves.
The experiment started from the first move of a game, and played each game fully. The position you linked was just an example of the format used to feed the game state to the model for each move.
What would "winning" or "losing" even mean if all of this were against a single move?
Yes, exactly: at some point I asked to maintain it and kinda redid it. Now I kinda consider it "done", as in "maybe some more work will be put into it, but by and large I don't think it's going to change in the future".
Civet has so many quality-of-life improvements! It's good that it exists as a sort of playground for ideas that could maybe be adopted by JS itself in the future, kinda like how it went with CoffeeScript.
This seems a bit different from the kind of signals frameworks have, where dependencies are tracked automatically (no dependency array) and you can sort of chain things automatically: for example, you can have an effect that depends on 3 memos that depend on 4 signals or whatever else, and you never see stale values.
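To make that concrete, here's a minimal sketch of how the automatic tracking can work (just an illustration, not flimsy or any real framework's code, and it skips the cleanup, deduplication, and glitch-free scheduling real implementations handle): reading a signal while an effect is running subscribes that effect, so memos and effects can chain without any dependency arrays.

```typescript
// Toy reactivity: reading a signal inside a running effect subscribes that
// effect automatically, so dependencies never have to be listed by hand.

type EffectFn = () => void;

let currentEffect: EffectFn | null = null;

function createSignal<T>(value: T): [() => T, (next: T) => void] {
  const subscribers = new Set<EffectFn>();
  const get = () => {
    if (currentEffect) subscribers.add(currentEffect); // automatic tracking
    return value;
  };
  const set = (next: T) => {
    value = next;
    for (const effect of [...subscribers]) effect(); // re-run dependents
  };
  return [get, set];
}

function createEffect(fn: EffectFn): void {
  const run = () => {
    const previous = currentEffect;
    currentEffect = run; // anything read inside fn subscribes `run`
    try {
      fn();
    } finally {
      currentEffect = previous;
    }
  };
  run();
}

// A memo is just a signal kept fresh by an effect, so effects can depend on
// memos that depend on signals, arbitrarily deep.
function createMemo<T>(fn: () => T): () => T {
  const [get, set] = createSignal<T>(undefined as unknown as T);
  createEffect(() => set(fn()));
  return get;
}

// Usage: an effect depending on a memo depending on two signals.
const [a, setA] = createSignal(1);
const [b, setB] = createSignal(2);
const sum = createMemo(() => a() + b());
createEffect(() => console.log('sum is', sum())); // logs "sum is 3"
setA(10); // logs "sum is 12"
setB(5);  // logs "sum is 15"
```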
If you want to look a bit deeper into this I had written another sort of toy implementation that much more closely resembles what the frameworks are actually doing: https://github.com/fabiospampinato/flimsy