Hardly. I've played a lot with the 7, 13, and 30B LLaMAs as well as the 7 and 13B Alpacas fine-tuned by Stanford. They do not have emergent abilities like being able to generate rhymes or, say, represent a movie plot as emoji. Even OpenAI's old text-davinci-003 (GPT-3.5, but text completion, not the chat ones) far outperforms them. That said, I have hopes for a 65B 3-bit quantized Alpaca fine-tune. We'll see when someone spends the money to do the (more costly) 65B training. The Alpacas are also much more likely to go off the rails and start regurgitating their fine-tuning inputs. Either that, or OpenAI is doing a lot of post-processing on their end to hide the same problems in their LLM.
For now my IRC bots run the Alpaca 7B 4-bit. 13B was not a significant improvement for twice the computational time. But it's best to learn them now, because as soon as OpenAI gets sued for the first time, all the Turing-test-passing older models without the legal butt-covering bolted on will be removed.
For me the easiest comparison between models is to give each an absurd but entirely possible request, like "Write me a fanfic where the Animorphs battle the truck from Duel, but in the style of Mark Twain". So far nothing else I've tried has done even as well as GPT-3.5 yet, let alone GPT-4.
> Can you give separate summaries of the Animorphs book series and the film Duel?
The reply:
> Animorphs Book Series Summary: The Animorphs series, written by K.A. Applegate, is a popular young adult science fiction book series spanning 54 books and several companion novels. The story revolves around five teenagers—Jake, Rachel, Tobias, Cassie, and Marco—who are granted the power to morph into any animal they touch by a dying alien named Elfangor. The teens, known as the Animorphs, use their new abilities to battle an evil alien race called the Yeerks, who are secretly invading Earth by taking control of humans' minds. Throughout the series, the Animorphs learn about the complexities of war, friendship, and sacrifice as they try to save humanity from the Yeerk invasion.
> Duel Film Summary: Duel is a 1971 television movie directed by Steven Spielberg and based on the short story by Richard Matheson. The film follows a mild-mannered traveling salesman named David Mann (played by Dennis Weaver), who unwittingly finds himself in a deadly game of cat and mouse on a remote California highway. Mann is pursued by a mysterious and relentless tanker truck driver, who seems determined to kill him for no apparent reason. With escalating tension and suspense, the film delves into themes of fear, paranoia, and the primal fight for survival. Duel is notable for being Spielberg's directorial debut, showcasing his skillful storytelling and unique visual style that would later become hallmarks of his career.
Duel is a great film but requires patience compared to a lot of today's films. There's not much action for quite some time -- just a man driving through the desert. Even when things pick up, it's still slow pacing by today's standards. It's from another era, and I doubt anyone would make a film today with the same pacing.
Tried the same question with the 30B LLaMA and got the following:
The Animorphs book series was a young adult science fiction series written by Katherine Applegate and her husband Michael Grant, under the name K. A. Applegate. The series is made up of fifty-four books, first published between June 1996 and May 2001. The series revolves around five children given the power to transform into animals (the ability to "morph") as they are thrust into a war with an alien species called Yeerks that are determined to take over Earth.
Duel is a television film directed by Steven Spielberg and written by Richard Matheson, based on his short story of the same name. It stars Dennis Weaver as David Mann, a middle-aged salesman driving on a business trip. While traveling down a two lane highway through remote California desert, he is taunted in a life or death duel by the unseen driver of a massive and monstrous Peterbilt 281 tanker truck.
----
You have to remember to prompt it right. It's based on autocomplete just like ChatGPT, but with ChatGPT a lot is happening in the background before the text is sent to the model. My prompt and settings here were:
---
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Can you give separate summaries of the Animorphs book series and the film Duel?
----
How exactly do you get it to keep going? Every time I try a prompt like this in the playground it spits out a couple paragraphs and then refuses to generate anything further, even with tokens maxed out.
I never used the OpenAI playground; see if Poe.com will work for you. They give free access to GPT-3.5 and Claude, the competitor model from Anthropic. Paid subscription for GPT-4 and Claude+.
That is my experience as well. I've tried various models but nothing comes even close to the current ChatGPT implementation (when it manages to stay up).
Highly recommend Animorphs. As an adult you can read through the books pretty quickly, but the story really holds up. It deals with heavy topics like genocide, enslavement (even exploring the justifications of the enslavers), and PTSD, but was still considered a kids' book.
It becomes obvious in the middle when some of the books were written by ghost writers, but the books are so easy to read I don't really recommend skipping them. If you must you could probably get away with reading the first ten, last ten, but should definitely read all of the Chronicle books.
Maybe the most striking bit of its exploration of the morality of war is that the series spends something like 45 books building up the alien Andalites as beneficent do-gooders who the team of protagonists expect to swoop in and fix everything (including one of the protagonists who is a young Andalite stranded on Earth), and then it turns out that they've been as morally worn down by the ongoing interstellar war as the protagonists have in their guerrilla conflicts on Earth and are effectively as hostile to humans as the original invaders.
> If you must you could probably get away with reading the first ten, last ten
Not sure that you can. If you were to skip any, probably only 31 through 39 are completely skippable, maybe some of the late 20s but I would still read 29 and 30 at a minimum. Some of the teens and 20s might be skippable after 13 but there’s a fair amount of world-building outside the Chronicles series in the 20s; and 40 onwards is setting up the end game and then the end game. 41 and 48 are both weird but also kind of key towards finalizing the characters of the two cover characters in the end game.
EDIT: actually 33 and 38 shouldn’t be skipped either. They’re Tobias and Ax books and there’s so few of those that they’re all kind of essential, but maybe the Tobias books just a little bit more essential.
I’m impressed by how ChatGPT-like it is, but it’s also saying things like
“No, OpenAI does not have an API for dogs. They do, however, have an API for other animals, such as cats. To retrieve an image of a cat, you can use the OpenAI API for Dogs API and select the cat breed or type.”
I've tried all sorts of parameters including those exact ones. As for the huggingface stuff, it's not exactly clear how to use it without going down the python dependency rabbit hole. I am not confident I could get the correct python packages all together on Debian 11 to support running it. The llama.cpp stuff is very simple to compile and run comparatively.
> the alpaca 7B _4-bit_ [and presumably also 4bit for the 13B, 30B and larger parameter sets]
This is the wild card here, though, isn't it? OpenAI's ChatGPT likely uses more than 4 bits for its parameters. IIRC the original LLaMA params were 16-bit floats and they were quantized down to 4-bit; considering that large amount of compression, they still do pretty OK, but not as good as ChatGPT. I wonder how the Alpaca/LLaMA models would do with 16-bit floating point params (as they were originally trained)? What if they had gone with 8 bits for the params as a compromise?
EDIT: Come to think of it, unless you're using vectorized ops on a CPU, 4-bit and 8-bit math is going to run at the same speed (for most popular CPUs), is it not? So why did they go all the way down to 4 bits instead of stopping at 8 (other than to make the param files half the size)?
EDIT2: Looking through the alpaca.cpp code, there is mention of AVX, AVX2, AVX512 (and NEON on ARM), so it probably is taking advantage of vectorized ops where that's possible.
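For intuition, here's a minimal sketch of round-to-nearest 4-bit quantization in the spirit of llama.cpp's Q4_0 format. The function name and exact layout below are my own guesses, not the actual implementation; the idea is that each block of 32 floats becomes one float scale plus 16 packed bytes, i.e. roughly 5 bits per weight once you count the scale, instead of 16.

#include <algorithm>
#include <cmath>
#include <cstdint>

void quantize_block_q4(const float src[32], float &scale, uint8_t packed[16]) {
    float amax = 0.0f;  // largest magnitude in the block
    for (int i = 0; i < 32; i++) amax = std::max(amax, std::fabs(src[i]));
    scale = amax / 7.0f;  // map [-amax, amax] onto signed levels [-7, 7]
    const float inv = scale != 0.0f ? 1.0f / scale : 0.0f;
    for (int i = 0; i < 16; i++) {
        // round to the nearest level, then bias into the unsigned range 0..15
        int lo = (int)std::lround(src[2 * i] * inv) + 8;
        int hi = (int)std::lround(src[2 * i + 1] * inv) + 8;
        packed[i] = (uint8_t)(std::clamp(lo, 0, 15) | (std::clamp(hi, 0, 15) << 4));
    }
}

Dequantization is just (nibble - 8) * scale per weight, which is cheap; the win is almost entirely in memory footprint, which lines up with the sibling comment about RAM being the limiting factor.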
Not an expert on the matter so take this with a grain of salt, but I’d say the compression is also about VRAM/RAM, which seems to be the more limiting factor over inference speed.
I think it’s going to be like movie special effects.
When Jurassic Park first came out, or even something like Star Trek: The Next Generation, it looked AMAZING. So, so realistic. But then... as time went on, new things showed us what realistic could be.
I think we actually got better at seeing.
Same thing here. The more time you spend with it, the more you notice things that don’t quite work. And then the new thing solves those problems, but we’ll find more wrongness.
The problem is that you’ve identified two distinct and non-overlapping sets of people as “everyone”. The people applauding 3.5 when it came out were industry hype people. Even the critical voices were industry hype people, paid to assume the AI is powerful and write about the possible negative consequences of that assumption.
Now we’ve all gotten familiar with 3.5, and we’ve come to understand its limitations, so the public knows it’s not a “godlike” AI.
Luckily there’s a fresh new model, not technically different from the earlier one but it cost more money to build. The hype group can start again, citing the publicly known limitations of 3.5. But in 6 months we’ll understand what’s wrong with it, and the public will be talking about the limitations, just in time for 4.5.
In my personal testing, I throw some sophisticated use cases at LLMs - particularly chain-of-thought reasoning. None of the models out there are able to do this well, except for the OG GPT-3 Davinci-003. Even the newer turbo models are not as good.
I am playing around with GPT-4 this week though. Let’s see how that goes.
The newer turbo models are the ChatGPT models, and are worse than text-davinci-003, in my experience. The gpt-4 model is also not as good as the GPT-4 chat version, which is very odd.
FWIW, 7B is totally useless for the subset of non-English languages I've used; 13B a bit less so, but nowhere near as good as GPT.
GPT's performance in non-trivial translation tasks is unbelievable. All those articles mentioning jobs that are going to be replaced fail to mention that translators are probably going to be first.
You need the translators to QA the output from GPT. It's less work but not much less, and more types of translation work becomes feasible when leveraging GPT. I'm guessing the job market for translators will grow, not decline.
"Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline."
This would be 175 billion 3 bit weights instead of 175 billion 16 (or 32!) bit weights. It massively reduces the size of the model. It makes loading it in ram on consumer computers feasible. The number of parameters stays the same.
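Rough numbers, ignoring the small per-block overhead the quantization scales add:

175e9 weights x 32 bits = 700 GB
175e9 weights x 16 bits = 350 GB
175e9 weights x 4 bits = 87.5 GB
175e9 weights x 3 bits ~ 65.6 GB

So 3-bit GPTQ is more than a 5x reduction over the fp16 weights, before any other tricks.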
I've read the paper and, to be honest, I'm not sure what to make of it. Their headline benchmark is perplexity on WikiText2, which is not particularly relevant to most users. If you look at the tables in Appendix A.4 with some more relevant benchmarks, you'll sometimes find that straight RTN 4-bit quantisation beats both GPTQ and even the full 16-bit original! No explanation of this is given in the paper.
Some of those benchmarks have a pretty small sample size IIRC; it might just be coincidence that the noise introduced by RTN happens to slightly improve them.
GPTQ beats RTN on almost every benchmark at almost every size, though.
Hm. I haven’t tried the local installs yet. However, when the Alpaca web demo was live, I did find it to be comparable (though not quite as capable) to davinci-003. It answered arbitrary factual questions about pop culture references, law, medicine, and programming. It generated rhymes and poems. (I didn’t try asking for the emoji thing, so can’t say anything about that.) It performed natural language tasks such as information extraction and summarization. And it did all of it coherently.
Be aware this file is a single ~8GB 4-bit model (ggml-alpaca-13b-q4.bin) instead of the 2x ~4GB models (ggml-model-q4_0.bin, ggml-model-q4_0.bin.1) that most llama.cpp style inference running programs expect. You'll probably have to edit the line,
n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
in chat.cpp (or main.cpp) to hard-code it to treat this single-file model properly, like
n_parts = 1;
Or rewrite the parameter config subroutine to recognize and handle a non-standard weights file.
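A minimal sketch of that second option, assuming the loader has the model filename handy (guess_n_parts is my own name; LLAMA_N_PARTS and n_embd are from the real code):

#include <fstream>
#include <string>

// Hypothetical helper: fall back to a single-part model when the expected
// second shard (model path + ".1") doesn't exist on disk.
static int guess_n_parts(const std::string &fname, int n_embd) {
    std::ifstream second_shard(fname + ".1", std::ios::binary);
    if (!second_shard.good()) {
        return 1;  // only one file present, so treat it as a 1-part model
    }
    return LLAMA_N_PARTS.at(n_embd);  // otherwise trust the usual table
}

Then the line above becomes n_parts = guess_n_parts(fname, hparams.n_embd);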
This sounded like a really cool idea, but I asked ChatGPT to do this for the plot of the movie The Shawshank Redemption and there is no way that I would ever have been able to guess that movie from the emojis it gave me. Perhaps GPT-4 does a better job at it.
The incredible contribution of Alpaca is showing the world how to efficiently train LLMs on instructions. The fact that it did so on 52k instructions generated by GPT is poetic.
It does not matter what the current capabilities of open-source models are, because this opens the door to tremendous democratization of the ability to train and self-deploy these models.
In less than 6 months we will have open source models with gpt3-like capabilities, running locally on laptops, and potentially in phones and web browsers.
I've used both the 7B and 13B instruction-tuned LLaMA weights (quantized using the llama.cpp scripts). Either I am doing something wrong, or these two models are nowhere near the level of ChatGPT. Many times they return something totally irrelevant to my question, stop responding, use a different language, or otherwise return the wrong answer. ChatGPT does none of this (other than sometimes returning wrong answers due to hallucination...).
Reading through the README and issues on the llama.cpp project, there is some speculation that there is a bug in the quantization, or possibly a bug in the inference (less likely I think).
I hope this is true and once fixed the models can perform up to or past the ChatGPT level. If it's not true and these models are performing correctly, then either the metrics used to compare them to GPT are garbage and don't capture real-world uses, or the instruction tuning done by the Stanford team is not up to par.
LLaMA hasn't been fine-tuned with RLHF, so it requires additional prompting; check out the open-assistant[0] project for an open-source ChatGPT equivalent (WIP).
> What does OpenAI do?
OpenAI is a company founded by Elon Musk and Reid Hoffman in 2015 to develop artificial intelligence products for the general public. They are most well known for releasing a large language model, GPT-3, in 2017, which was developed using a method called deep learning.
OpenAI focuses on creating tools and technologies that allow developers to create AI applications that are accessible to the general public. They are particularly interested in developing tools that can help developers build AI applications that can perform tasks that are important to society, such as in the areas of healthcare, education, and safety.
In addition to GPT-3, OpenAI has also released other AI tools and technologies, including an open-source platform for training AI models, called OpenAI Research, and a library of pre-trained AI models for use in various applications.
Overall, OpenAI’s goal is to make AI technology more accessible and useful to the general public, and they are dedicated to creating products that can help create a more sustainable and inclusive future for society.
> How are they funded?
OpenAI is a not-for-profit organization, so it does not rely on any financial support from individuals, corporations, or governments. Instead, it is funded through a variety of sources, including venture capital firms, research institutions, and individual donations.
Some of the largest financial supporters of OpenAI include Microsoft, Amazon, and Google, who have invested millions of dollars into the organization. Additionally, OpenAI has received financial support from a number of other prominent technology companies and research institutions.
Overall, the funding sources for OpenAI reflect the organization’s goal of creating innovative AI tools and technologies that can have a positive impact on society.
> Who are OpenAI’s competitors?
There are many AI competitors to OpenAI, including companies such as Microsoft, Amazon, Google, Facebook, and Apple. These companies are all investing heavily in developing their own AI products and technologies, and they all have their own goals and objectives in the field.
Some of OpenAI’s competitors are specifically focused on creating AI tools and technologies for the general public, while others are focused on creating AI solutions for specific industries or businesses. For example, Microsoft is known for its popular Microsoft Office products, while Amazon is known for its e-commerce platform.
It’s worth noting that the development of AI technology is a rapidly evolving field, and there will likely be many new competitors and advancements in the years to come.
----
Seems pretty decent to me. Sure, it's not on the level of ChatGPT, but it's also 1/14th the size of the original GPT-3.
Alpaca uses Self-Instruct[0] which is better than just the pre-training but I wouldn't expect it to be at the level of ChatGPT (RLHF) in terms of human-friendly prompting. OpenAssistant should make it close to ChatGPT (from GPT-3.5 version) if the LLaMA is as powerful as claimed.
If you use consciousness as a baseline, the intellectual difference between a grade schooler and a PhD is tiny.
This is what I think comparing these bots is like. You can argue that they’re very close. But the delta makes a very big difference for any practical purposes because we’re looking for nuanced capability.
I have not found alpaca to be comparable to chatgpt, but it could be because of bugs in the version I installed through dalai. I might try reinstalling it because I suspect there might be some sort of file corruption issue or whatever.
I gave it the prompt “cats aren’t always fuzzy” and it wrote a lengthy livejournal-esque rambling journal entry about a woman and her husband having money issues. It was funny, but lightyears away from chatgpt.
It does sometimes create some really funny hallucinations though, like inventing prefectures in Japan that don’t exist etc.
I’m also a bit confused by the quantization thing. Why exactly is everybody running the same program on the same file? Why not just include the quantized weights?
It seems like if somebody figured out the “correct” way to quantize the 7b weights it would make way more sense to just torrent the output rather than distribute a fixed program.
> asked GPT to take 175 human-written instruction/output pairs, and start generating more in the same style and format ... through one of OpenAI's helpfully provided APIs, and ... the team had some 52,000 sample conversations to use in post-training the LLaMA model
hmm I wonder if this is essentially a probe[1] technique + relies on chatgpt already having been extensively trained
I tried because I was curious. It's not very good. First I "quantized" the files. Took about 3-4 hours. Then I ran a prompt: "create a short poem about my wife helping me get over the stuffed nose". After 3 hours of 99% CPU usage and using all 32 GB of RAM, I got the following result:
create a short poem about my wife helping me get over the stuffed nose and throat that i've had for days now.
the cold was all around him, but he didn’t notice it at first- so much to do in this new place! finally though when they were sitting by themselves on their porch swing his sweetheart leaned into her man saying “we could take some of these leaves from the garden and make tea.” He looked up with a quizzical look and saw that she was serious.
I've been under weather since last Tuesday. Today is day 6...and still going strong! Had to cancel two nights on stage, missed my son
I think you can train LLaMA 7B (the model underlying Alpaca) for around $82,000, based on the Meta Research paper about it. Then you can fine-tune it ala Alpaca for a few hundred dollars more.
My wilder speculation is that, if you can shrink the model down to 4GB with llama.cpp 4bit quantization, it may be possible to run it entirely in the browser (ala Stable Diffusion from the other day).
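Back-of-the-envelope on that 4GB figure: 7B params x 4 bits = 3.5 GB of raw weights, plus some overhead for the per-block quantization scales and inference buffers, which is roughly where ~4GB lands.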
Alpaca is cool but it's also not technically allowed by OpenAI's TOS, and LLaMA is certainly not allowed to be used for non-commercial purposes. With that in mind, OpenAssistant is an Apache 2.0 licensed fully open source alternative that's pretty good (the model is OpenAssistant/oasst-sft-1-pythia-12b): https://huggingface.co/spaces/olivierdehaene/chat-llm-stream....
I've found OA to be better than Alpaca but I'll wait until the 65B 3-bit quantization efforts for Alpaca are underway to compare them.
> Alpaca is cool but it's also not technically allowed by OpenAI's TOS, and LLaMA is certainly not allowed to be used for non-commercial purposes.
Only if you agreed to the ToS or believe that the weights are copyrightable (precedents set by the copyright office and the courts strongly suggest that they aren't). I personally see no issue in using these models for commercial purposes.
You might not but a company will think twice. It's the same reason why companies could theoretically use pirated Windows and Adobe products and get away with it, but most don't because the risk is not worth the reward.
Again, you missed the point. LLaMA is under a non commercial license. I never stated it is illegal, just that a company will not want to violate license terms even if it is legal, simply because getting sued in a civil case is still a risk they wouldn't be willing to take compared to the benefit.
Of course they would apply, if you're using LLaMA at all. If you're using a different model that derives from LLaMA, depending on the license terms, the same license might also apply (similar to how GPL virality works). If you're using a wholly different model, then the model's license applies instead of LLaMA's.
> No, they wouldn't. You clearly don't understand how this works.
Interesting way to continue the conversation, before you edited. I honestly don't understand how you think using LLaMA but denying their license terms is a viable strategy, the courts would just point to the license when Meta sues you for using it commercially. But I'm sure me continuing to explain wouldn't make you understand further.
Wow, Stanford's Alpaca AI project is a real game-changer. The fact that it performs on par with ChatGPT but costs less than $600 to build is both exciting and terrifying. Sure, it's great to see AI becoming more accessible, but it's also a massive wake-up call about the potential misuse of these technologies.
We've got big names like OpenAI, Google, Apple, Meta, Baidu, and Amazon putting in serious time and money to ensure their language models are safe and ethical. However, now that we know it's possible to build powerful AI models on a budget, it's crucial to think about what this means for the future of AI regulation and safety.
This Alpaca AI project is a stark reminder that we need to have a serious conversation about the possible repercussions of AI proliferation. We can't just sit back and assume the big companies will take care of everything. The genie is out of the bottle, and it's time for everyone in the tech community to face the music and take responsibility for the AI revolution.
How do you know agency is not simply the output of a large language model encoded in neurons? What is the difference between neuronal and digital weights?
Considering that we don’t know how the brain works so well, and we don’t understand why LLMs work so well, simply on the basis of their output I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.
I'm using agency in this context to mean: (1) a strong desire to achieve one or more clear goals, beyond survival, and (2) taking concrete steps to achieve those goals.
> How do you know agency is not simply the output of a large language model encoded in neurons?
I'm not sure what you mean here. Is agency an emergent effect of large digital or biological neural network? Maybe! Is it an emergent effect of a large language model? If it is, then it should be clear, or demonstrable, that the model (1) has goals (2) takes concrete steps to achieve those goals.
> What is the difference between neuronal and digital weights?
Brain chemistry works at orders of magnitude less speed, since we're talking about periodically building and releasing an ionic differential between the inside and outside of a cell wall. Moreover, we have a massive number of neurons and a stupidly massive amount of interneuronal connections, with billions of years of training over billions of lineages. Digital weights, in contrast, are a stripped down model of this system that throws out a whole class of complexities like hormones and metabolism.
> I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.
I think the precautionary principle should apply to AI. Assuming and acting as if they have agency, is safer than assuming they don’t.
> it should be clear, or demonstrable, that the model (1) has goals (2) takes concrete steps to achieve those goals.
That definition seems arbitrary to me; many humans wouldn’t pass this test. On the other hand, LLMs certainly seem capable of acting towards specific goals (such as helpfulness). So, I would say that based on your definition, LLMs have agency. But I think you really meant, internally generated goals. Time will tell.
That said, humans who don’t have clear goals can and are coerced into all sorts of damaging behaviours by those who do have goals. So even if I accept that LLMs don’t have their own goals, they can certainly be manipulated to act in favour of the goals of others. That’s effectively what prompt engineering is all about.
So I just think it’s a mistake to make assumptions about these LLMs. We don’t know why they work so well, and it will take a very long time until we do.
In the meantime, let’s not make assumptions that we can’t justify.
> That definition seems arbitrary to me; many humans wouldn’t pass this test.
What part of this is arbitrary, i.e. random, whimsical, or biased? This is a fairly comprehensive working definition of "agency". "Internally generated", which is implied, is a nice touch.
Nearly every human over the age of 18 months passes this test with flying colors. Toddlers have goals, and do everything in their power to achieve them. What humans are you thinking of that don't have agency?
> In the meantime, let’s not make assumptions that we can’t justify.
I totally agree. I think that until demonstrated otherwise, I will assume that LLMs are giant statistical sieves that (1) periodically spit out text directly from their training set, unmodified, and that (2) do not learn on their own, do not formulate their own goals, and do not take actions to achieve those goals.
Great question! These are predictive models that accept a text query, do some matrix math, and then return some text. At what point in that server-client relationship does this algorithm jump the rails and run amok?
> A popular nightmare scenario for AI is giving it access to tools, so it can make API calls and execute its own code and generally break free of the constraints of its initial environment. Let's do that now!
I also think that you're assuming we know a lot more about how these things work than we actually do; you seem to think nobody is going to hook these up to APIs that can actually modify the world, despite the barrier to doing so being incredibly low; and you don't seem to have read about the adversarial training that people have been doing between the LLMs.
It's obvious that you think everything is all safe and nothing will go wrong, and I really hope you're right. But I think it's a very dangerous assumption.
You've moved the goalposts from, "we should assume that predictive text models are sentient," to, "we should be wary of programs explicitly designed to interact unpredictably with other computer systems," something that everyone from the Amish to the NSA to Cozy Bear would agree with.
> Considering that we don’t know how the brain works so well, and we don’t understand why LLMs work so well, simply on the basis of their output I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.
These systems can already interact with others, it’s not moving the goalposts, it’s common knowledge. Anyone with access to the APIs can make it happen. Or are you now claiming we are just talking about one specific LLM and not LLMs generally?
Gods don't have persistent bodies or brains. They're spirits.
I don't disagree that digital systems with neural architectures could have agency in principle, but agency generally is definitely not the output of a large language model. Animals without language have agency, in that they take actions to fulfill their desires. Current LLMs may have some degree of intelligence, but they don't even appear to have any consistent wishes or desires. You can get them to talk longingly about x... until you give another prompt and suddenly x doesn't matter to them at all.
Sure! Why not? Quantum foam imagining quantum foam. I love it. I still wouldn't consider an LLM god-like. I mean, if it eats its own son, then maybe. (Titan of a reference there.)
Code can be abstracted into a simpler model that can deterministically recreate the old one.
OpenAI is an initial brute-force approach, eventually to be obsoleted, that will be abstracted over and over into a simpler code implementation with rules to recreate the old state.
kkrieger is a simple example of a tiny data model that can be deterministically rehydrated. It’s not unrealistic for AI models to become a seed value for a normalized code base to deterministically unpack into the necessary electron state.
The emerging consensus for larger LLMs is that you want to train them with at least 2-4x as many tokens as parameters (weights between neurons in the layers). A trillion tokens (~140x for a 7B model) surprises me.
The LLaMA paper contradicts this view:
"[...] Although Hoffmann et al. (2022) recommends training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens."
https://arxiv.org/pdf/2302.13971.pdf
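For scale: Hoffmann et al.'s recommendation of 200B tokens for a 10B model works out to 20 tokens per parameter, while 1T tokens for the 7B LLaMA is about 140 per parameter, roughly 7x past that Chinchilla-optimal point.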
They probably put most of the effort into the 65B model, the 7B model was just trained so they could get an idea of the scaling behaviour. It makes sense to use the same amount of training steps, then.
AI bootstrapping AI is a sci-fi trope that goes back decades. I first encountered it in The Cybernetic Samurai while in high school. While the details differ, the reality is that AI is a catalyst for more of itself.
I don't remember many books where this ends particularly well. Perhaps the Culture universe could be a survivable outcome. Hopefully we don't get Berzerkers first.
The world is becoming more cyberpunk everyday, people making back alley deals for data or models.
These weights are shit man, they have been quanted w/o being retrained against the original. I already have this torrent, I want uncut originals. And no watermarks this time, the last model wouldn't shut up about investing in tulips.
I'm not convinced that the GPT-4 API actually works? It's been giving me very different answers than the chat interface. For example, the chat interface says it's GPT-4 if you ask it, but the API says it's GPT-3 (and bills as GPT-4).
- I am based on the GPT-4 architecture, which is an iteration of OpenAI's Generative Pre-trained Transformer models. My training data includes information up to September 2021.
- How do you know your version?
- As an AI language model, I am programmed with information about my own architecture and the version I am based on. My creators at OpenAI provided me with this knowledge, which helps me to give context to users about my capabilities and limitations. In this case, I am based on the GPT-4 architecture, an iteration of OpenAI's Generative Pre-trained Transformer models.
Correct me if I'm wrong, but that's per-user, unless you all just share an account, or you build out a bespoke API integration. Which is to say if you have 25 developers, you'd spend $5k in 1 month.
The reason to pirate it would be to fine-tune the model on your private internal source code repository, assuming you already have an existing large body of work you want to train on, and offer SelfHostedCoPilot on your bespoke internal-only DSL that ChatGPT and Copilot have no way of having seen and would undoubtedly hallucinate about by a considerable margin.
Most people don’t fine-tune the models (LLaMA or OpenAI). A MacBook M1 ($1000) can run those models, and in many cases the user already has one. You also need a computer to access OpenAI, so the comparison boils down to $20/month vs $0. At this point in time, LLMs are a curiosity for most people.
Isn't OpenAI only charging something like a 20th of a penny per interaction right now? Definitely not the kind of thing you want to incorporate into a widespread free app just yet, but it seems pretty affordable for a lot of use cases.
> Isn't OpenAI only charging something like a 20th of a penny per interaction right now?
They don't charge per interaction, but per token. The chat models range from a fifth of a cent per 1,000 tokens to 12 cents per 1,000 tokens (depending on whether it's gpt-3.5, the 8k-limit gpt-4, or the 32k-limit gpt-4, and, for the gpt-4 models, also prompt vs. response tokens).
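Ballpark from those rates, assuming (as I recall) the 32k gpt-4 prompt rate is half its response rate: a 500-token prompt plus a 500-token reply is about 1,000 x $0.002/1,000 = $0.002 on gpt-3.5, versus 500 x $0.06/1,000 + 500 x $0.12/1,000 = $0.09 on 32k gpt-4, a 45x difference for the same interaction.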
Is it accurate to say they were trained for less than $600? Wouldn't that just be the finetuning that was done to the already existing LLaMA parameters which likely cost way more than $600 to train?
Yeah, exactly. LLaMA 7B itself cost $80,000+ to train (82,432 GPU hours, i.e. roughly $1 per GPU-hour). Stanford spent $100 on fine-tuning compute and $500 on OpenAI credits to generate their 52,000-sample instruction training set.
This is why it's not possible to slow down or "stop" AI: once the problems are solved the solutions turn out to be trivial to replicate. All it takes is compute.
There are lots of places to get compute, including Chinese cloud providers...
The genie really is out of the bottle now.
This is a lot like pharmaceuticals. The initial investment in a new medication is enormous. The price of each pill is trivial, to the extent that every drugstore chain is able to supply a generic in-house brand.
Maybe. Certainly in the past, before the world was aware LLMs on the level of ChatGPT were possible with today's technology. OpenAI's chosen not to release any real details about GPT-4, so we don't actually know what it would take to train a model of equivalent quality, especially considering training isn't a one-shot: multiple training runs easily add up. So training a model with a 12-figure parameter count (175B) is assumed to be very expensive. But there has been great progress on optimized models that are smaller by more than an order of magnitude, down to 7B, for a debatable drop in quality (7B Alpaca is in no way competitive with ChatGPT, but it's still very much not a Markov chain from the AI winter days). So one possibility is that OpenAI chose not to release salient GPT-4 details because it's much smaller than GPT-3's 175B, and they're hiding how much that cuts down on training costs. (Which, I should note, is unsubstantiated conjecture, but not outside the realm of possibility.)
The other aspect is that fine-tuning an existing model is way cheaper than creating a competing model from scratch, so a company could offer CompetitorGPT/CompetitorCoPilot competitive with GPT-3.5, and offer fine-tuning of that model on the purchasing company's codebase, possibly on-prem or at least inside their AWS VPC (or the Azure/GCP equivalent).
The other thing to note is that OpenAI is hosting ChatGPT as a public resource available to anyone with an account, akin to Google being open to the public from day one (although that is without an account; maybe Gmail is a better comparison). I can't say for certain, only OpenAI would know, but I'm willing to bet that inference for ChatGPT is the vast majority of their costs (which are anything but trivial).

Any private, internal-only instance of OpenChatGPT (using the unlicensed leaked LLaMA model, or a legal copy, or someone else's model) could be paying (relatively) minuscule training costs, and way lower inference costs if it's internal-use only. Whether that cost can be borne by a small SaaS company's existing AWS budget is up in the air. Which is to say, ultimately, you're right: ChatGPT would be difficult without the support of Microsoft via a huge Azure grant. It's less obvious whether a self-hosted, internal-only OpenChatGPT, not from OpenAI, would be possible for hobbyist self-hosters with a prosumer GPU cluster (say, last-generation K80s instead of business-priced A100s), or for a company wanting to leverage LLMs privately and provide a Copilot-like productivity-multiplier tool to its developers, without sending private source code to OpenAI in lieu of a privacy agreement with them.
> OpenAI's chosen not to release any real details about GPT-4
Actually, they have released some details about it, in this 99-page technical report https://arxiv.org/abs/2303.08774 (which is actually two papers stitched together, once you read it; oddly enough using different fonts).
But I'm not sure if this content qualifies as "real details".
> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar. We are committed to independent auditing of our technologies, and shared some initial steps and ideas in this area in the system card accompanying this release. We plan to make further technical details available to additional third parties who can advise us on how to weigh the competitive and safety considerations above against the scientific value of further transparency.
In other words, "Stable Diffusion wasn't supposed to happen, so we're making all our methodology trade secret[0], if you want to Do Science then agree to this massive NDA and have enough skin in the game for us to cut you."
[0] Presumably at some point OpenAI will have to 'relent' to independent discovery by patenting AI architectures and refusing to license them
I've been using the Chat GPT-4 model, and OpenAI has been putting warnings about max queries per N hours. Given the degree to which they're limiting access (up until they crashed today, they'd dropped to 25 queries / 3 hours), I suspect GPT-4 is actually much, much larger, and they just don't have the computational resources to support its use at the same level as GPT-3.5 or GPT-3.5 Turbo.
You could be right! I don't claim access to any private OpenAI information, so any theories of mine are based on what's known publicly, which isn't much for GPT-4. I do want to call attention to the difference between training runs and inference runs (post-training usage of the model). If each training run costs mid six figures, CompetitorGPT is going to have to be well funded and likely sponsored by AWS/GCP (e.g. DeepMind) just to train up the model, given that it's probably not a one-shot. If it's much lower due to optimizations in training, on top of only having to fine-tune the model on a company's codebase instead of training the whole model from scratch each time, then I could see selling the service of creating CompetitorGPT or CompetitorCoPilot being a very worthwhile investment for companies willing to pay for such services for their developers.

(E.g. companies that are willing to pay Splunk's exorbitant costs vs. ones that would rather burn time self-hosting a Grafana setup. Not to impugn Grafana, but it's very much a home-grown, open-source, self-hosted deployment. Managing a Splunk cluster is also far from free; it's just that not all companies are willing to bear the yearly licensing cost and would prefer to self-host Grafana solely for cost reasons, even if TCO including opportunity cost makes it more expensive in the long run.)
While that's true, it's basically inevitable now that at some point personal hardware will be powerful enough for enthusiasts to run home bots comparable to GPT-3, and even that by itself would drastically change a lot things.
Running isn’t necessarily the issue. The moat is creating a high-quality model like OpenAI has, which (and here the article is mistaken) doesn’t seem to be easily reproducible.
That's going to get easier too. Stanford can already get this far for $600, so soon after the major GPT-based chat AIs were released. Imagine how much better it will get with just a little bit more time.
While that's true, it also seems entirely predictable at this point how to do that. It takes a lot of effort and expensive hardware, but there isn't really a "secret sauce" beyond expertise in the field.
Yes, but it takes time (took OpenAI years) and significant effort. Who with enough expertise will do this and not keep the results closed in order to monetize them? It doesn’t seem like something an open source project could accomplish quickly enough to not keep lagging substantially behind the commercial solutions.
I can tell you that HN is probably already being infiltrated as well.
Spam can't gang up on you in a forum, downvote you, turn your friends against you, and destroy your reputation within an hour. But soon it will. The web as we know it is soon going to be over.