The genie escapes: Stanford copies the ChatGPT AI for less than $600 (newatlas.com)
207 points by Freddie111 on March 20, 2023 | 166 comments


Hardly. I've played a lot with the 7B, 13B, and 30B LLaMAs as well as the 7B and 13B Alpacas fine-tuned by Stanford. They do not have emergent abilities like being able to generate rhymes or, say, represent a movie plot as emoji. Even OpenAI's old text-davinci-003 (GPT-3.5, but text completion, not the chat ones) far outperforms them. That said, I have hopes for a 65B 3-bit quantized, Alpaca-fine-tuned model. We'll see when someone spends the money to do the (more costly) 65B training. The Alpacas are also much more likely to go off the rails and start regurgitating their fine-tuning inputs. Either that, or OpenAI is doing a lot of post-processing on their end to hide the same problems in their LLM.

For now my IRC bots run the alpaca 7B 4-bit. 13B was not a significant improvement for twice the computational time. But it's best to learn them now, because as soon as OpenAI gets sued for the first time, all the Turing-test-passing older models without the legal butt-covering bolted on will be removed.


For me the easiest comparison between models is to give them an absurd but entirely possible request, like "Write me a fanfic where the Animorphs battle the truck from Duel, but in the style of Mark Twain". So far nothing else I've tried has done even as well as GPT-3.5, let alone GPT-4.


I couldn't do it either as I have no idea what Animorphs or Duel are.


That's ok, just ask ChatGPT to explain them to you.


ChatGPT 4 input:

> Can you give separate summaries of the Animorphs book series and the film Duel?

The reply:

> Animorphs Book Series Summary: The Animorphs series, written by K.A. Applegate, is a popular young adult science fiction book series spanning 54 books and several companion novels. The story revolves around five teenagers—Jake, Rachel, Tobias, Cassie, and Marco—who are granted the power to morph into any animal they touch by a dying alien named Elfangor. The teens, known as the Animorphs, use their new abilities to battle an evil alien race called the Yeerks, who are secretly invading Earth by taking control of humans' minds. Throughout the series, the Animorphs learn about the complexities of war, friendship, and sacrifice as they try to save humanity from the Yeerk invasion.

> Duel Film Summary: Duel is a 1971 television movie directed by Steven Spielberg and based on the short story by Richard Matheson. The film follows a mild-mannered traveling salesman named David Mann (played by Dennis Weaver), who unwittingly finds himself in a deadly game of cat and mouse on a remote California highway. Mann is pursued by a mysterious and relentless tanker truck driver, who seems determined to kill him for no apparent reason. With escalating tension and suspense, the film delves into themes of fear, paranoia, and the primal fight for survival. Duel is notable for being Spielberg's directorial debut, showcasing his skillful storytelling and unique visual style that would later become hallmarks of his career.


Duel is a great film but requires patience compared to a lot of today's films. There's not much action for quite some time -- just a man driving through the desert. Even when things pick up, it's still slow pacing by today's standards. It's from another era, and I doubt anyone would make a film today with the same pacing.


Tried the same question with the 30B LLaMA and got the following:

The Animorphs book series was a young adult science fiction series written by Katherine Applegate and her husband Michael Grant, under the name K. A. Applegate. The series is made up of fifty-four books, first published between June 1996 and May 2001. The series revolves around five children given the power to transform into animals (the ability to "morph") as they are thrust into a war with an alien species called Yeerks that are determined to take over Earth.

Duel is a television film directed by Steven Spielberg and written by Richard Matheson, based on his short story of the same name. It stars Dennis Weaver as David Mann, a middle-aged salesman driving on a business trip. While traveling down a two lane highway through remote California desert, he is taunted in a life or death duel by the unseen driver of a massive and monstrous Peterbilt 281 tanker truck.

---- You have to remember to prompt it right: it's based on autocomplete just like ChatGPT, but a lot happens in the background before the text is sent to the model. My prompt and settings here were:

repeat_penalty: 1.176, n_predict: 1000, temp: 0.7, top_k: 40, top_p: 0.1

--- Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Can you give separate summaries of the Animorphs book series and the film Duel? ----


I wonder if they will add "chatgpt it" or "gpt it" to the dictionaries


I wouldn't be surprised if GPT ends up as the AskJeeves of LLMs.


How exactly do you get it to keep going? Every time I try a prompt like this in the playground it spits out a couple paragraphs and then refuses to generate anything further, even with tokens maxed out.


Simply telling it "continue" has worked for me.


You can also tell it "Continue, but with (...)" to fine-tune the output further, or "Tell that again, but with (...)" to adjust the previous response.


I found that "continue exactly where you left off" will continue in a way that you expect it to more often.


I never used the OpenAI playground; see if Poe.com will work for you. They give free access to GPT-3.5 and Claude, Anthropic's competitor AI. Paid subscription for GPT-4 and Claude+.


You can just resubmit the prompt and existing history to have the model continue generating new tokens.
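In API terms, the "continue" trick is just appending the assistant's partial reply to the message history and calling the endpoint again. A minimal sketch against the March-2023-era openai Python client; the key, prompt, and nudge phrasing are placeholders:

    import openai  # openai-python 0.x era client

    openai.api_key = "sk-..."  # placeholder
    messages = [{"role": "user", "content": "Write me a long fanfic where ..."}]

    while True:
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        choice = resp["choices"][0]
        messages.append({"role": "assistant", "content": choice["message"]["content"]})
        if choice["finish_reason"] != "length":
            break  # the model stopped on its own, not because of the token cap
        # it hit the per-response token limit mid-thought; nudge it onward
        messages.append({"role": "user", "content": "Continue exactly where you left off."})

The full text is then the concatenation of all the assistant messages.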


Giving it an expected word count works well.


That is my experience as well. I've tried various models but nothing comes even close to the current ChatGPT implementation (when it manages to stay up).


> Write me a fanfic where the Animorphs battle the truck from Duel, but in the style of Mark Twain

Whoa. I want to read this! Duel - what a great film. Twain - amazing writer. Animorphs - published after my teen years but sounds like a great story!


Highly recommend Animorphs. As an adult you can read through the books pretty quickly, but the story really holds up. It deals with some heavy topics, like genocide, enslavement (even exploring the justifications of the enslavers), and PTSD, but was still considered a kids' book.

It becomes obvious in the middle which of the books were written by ghostwriters, but the books are so easy to read I don't really recommend skipping them. If you must you could probably get away with reading the first ten, last ten, but you should definitely read all of the Chronicles books.


Maybe the most striking bit of its exploration of the morality of war is that the series spends something like 45 books building up the alien Andalites as beneficent do-gooders who the team of protagonists expect to swoop in and fix everything (including one of the protagonists who is a young Andalite stranded on Earth), and then it turns out that they've been as morally worn down by the ongoing interstellar war as the protagonists have in their guerrilla conflicts on Earth and are effectively as hostile to humans as the original invaders.


> If you must you could probably get away with reading the first ten, last ten

Not sure that you can. If you were to skip any, probably only 31 through 39 are completely skippable, maybe some of the late 20s but I would still read 29 and 30 at a minimum. Some of the teens and 20s might be skippable after 13 but there’s a fair amount of world-building outside the Chronicles series in the 20s; and 40 onwards is setting up the end game and then the end game. 41 and 48 are both weird but also kind of key towards finalizing the characters of the two cover characters in the end game.

EDIT: actually 33 and 38 shouldn’t be skipped either. They’re Tobias and Ax books and there’s so few of those that they’re all kind of essential, but maybe the Tobias books just a little bit more essential.


Don't forget 20, 21, and 22 are the David arc.


You might need to fix your parameters. From the text-generation-webui guide:

> For a more creative chat, use: temp 0.72, rep pen 1.1, top_k 0, and top_p 0.73

> For a more precise chat, use temp 0.7, repetition_penalty 1.1764705882352942 (1/0.85), top_k 40, and top_p 0.1

https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...

https://old.reddit.com/r/singularity/comments/11vsvro/in_cas...

https://twitter.com/theshawwn/status/1632569215348531201

---

That being said, I found the OpenAssistant model much better: https://huggingface.co/spaces/olivierdehaene/chat-llm-stream...

It's also completely OSS, Apache 2.0, unlike LLaMA and Alpaca which are non-commercial.


I’m impressed by how ChatGPT-like it is but also it’s saying things like

“No, OpenAI does not have an API for dogs. They do, however, have an API for other animals, such as cats. To retrieve an image of a cat, you can use the OpenAI API for Dogs API and select the cat breed or type.”


I've tried all sorts of parameters including those exact ones. As for the huggingface stuff, it's not exactly clear how to use it without going down the python dependency rabbit hole. I am not confident I could get the correct python packages all together on Debian 11 to support running it. The llama.cpp stuff is very simple to compile and run comparatively.


> the alpaca 7B _4-bit_ [and presumably also 4bit for the 13B, 30B and larger parameter sets]

This is the wild card here, though, isn't it? OpenAI's ChatGPT likely uses more than 4 bits for its parameters. IIRC the original LLaMA params were 16-bit floats and they were quantized down to 4-bit; considering that large amount of compression, they still do pretty OK, but not as well as ChatGPT. I wonder how the Alpaca/LLaMA models would do with 16-bit floating point params (as they were originally trained)? What if they had gone with 8 bits for the params as a compromise?

EDIT: Come to think of it, unless you're using vectorized ops on a CPU, 4-bit and 8-bit math is going to run at the same speed (for most popular CPUs), is it not? So why did they go all the way down to 4 bits instead of stopping at 8 bits (other than to make the param files half the size)?

EDIT2: Looking through the alpaca.cpp code, there is mention of AVX, AVX2, AVX512 (and NEON on ARM), so it probably is taking advantage of vectorized ops where that's possible.
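For intuition, here is a toy round-to-nearest 4-bit quantizer in numpy. This is not the actual llama.cpp/GPTQ code, just the basic idea: one float scale per small block of weights, with two 4-bit values packed per byte on disk. Note the dequantize step: weights get expanded back to floats for the actual matmul, so the low bit width mostly buys RAM, not arithmetic speed:

    import numpy as np

    def quantize_rtn_4bit(w, block=32):
        # one scale per block, weights rounded to integers in [-7, 7]
        w = w.reshape(-1, block)
        scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale.astype(np.float32)

    def dequantize(q, scale):
        # inference unpacks back to float on the fly
        return q.astype(np.float32) * scale

    w = np.random.randn(4096 * 32).astype(np.float32)
    q, scale = quantize_rtn_4bit(w)
    packed_bytes = q.size // 2 + scale.nbytes  # two nibbles per byte, plus scales
    print(w.nbytes, "fp32 bytes ->", packed_bytes, "packed bytes")
    print("max abs error:", np.abs(dequantize(q, scale).ravel() - w).max())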


Not an expert on the matter so take this with a grain of salt, but I'd say the compression is also about VRAM/RAM, which seems to be more of a limiting factor than inference speed.


It's interesting that when ChatGPT 3.5 came out - everyone said, this is it! It's ready for primetime.

And now that there's a few competitors in the same league - 3.5 quality is suddenly garbage and only 4.0 is good enough.

Was it good enough before or wasn't it?


I think it’s going to be like movie special effects.

When Jurassic Park first came out, or even something like Star Trek: The Next Generation, it looked AMAZING. So, so realistic. But then… as time went on, new things showed us what realistic could be.

I think we actually got better at seeing.

Same thing here. The more time you spend with it the more you notice things that don’t quite work. And then the new thing solves those problems, but we’ll find more wrongness


The problem is that you’ve identified two distinct and non-overlapping sets of people as “everyone”. Everyone who was applauding 3.5 when it came out were industry hype people. Even the critical voices were industry hype people, paid to assume the AI is powerful and write about the possible negative consequences of that assumption.

Now we’ve all gotten familiar with 3.5, and we’ve come to understand its limitations, so the public knows it’s not a “godlike” AI.

Luckily there’s a fresh new model, not technically different from the earlier one but it cost more money to build. The hype group can start again, citing the publicly known limitations of 3.5. But in 6 months we’ll understand what’s wrong with it, and the public will be talking about the limitations, just in time for 4.5.


It's really not good enough yet; it's impressive for what it is in our current time. But we're looking at the 1980s of computers.

They are neat, they are useful, but they can do so much more.


In my personal testing, I throw some sophisticated use cases at LLMs, particularly chain-of-thought reasoning. None of the models out there are able to do this well, except for the OG GPT-3 Davinci-003. Even the newer turbo models are not as good.

I am playing around with GPT-4 this week though. Let’s see how that goes.


The newer turbo models are the ChatGPT models, and are worse than text-davinci-003, in my experience. The gpt-4 model is also not as good as the GPT-4 chat version, which is very odd.


Yeah I think it lines up with Davinci being 10x as expensive as Turbo.


FWIW, 7B is totally useless for the subset of non-English languages I've used; 13B a bit less so, but nowhere near as good as GPT.

GPT's performance on non-trivial translation tasks is unbelievable. All those articles mentioning jobs that are going to be replaced fail to mention that translators are probably going to be the first.


You need translators to QA the output from GPT. It's less work, but not much less, and more types of translation work become feasible when leveraging GPT. I'm guessing the job market for translators will grow, not decline.


3 bits? Is that for all weights in the network?


As far as I know, yes. https://arxiv.org/abs/2210.17323

"Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline."

This would be 175 billion 3-bit weights instead of 175 billion 16 (or 32!) bit weights. It massively reduces the size of the model and makes loading it in RAM on consumer computers feasible. The number of parameters stays the same.
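The arithmetic, ignoring the small per-block scale overhead:

    # back-of-the-envelope sizes for 175 billion parameters at various bit widths
    params = 175e9
    for bits in (16, 8, 4, 3):
        print(f"{bits:2d}-bit: {params * bits / 8 / 1e9:6.1f} GB")
    # 16-bit: 350.0 GB, 8-bit: 175.0 GB, 4-bit: 87.5 GB, 3-bit: 65.6 GB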


> https://arxiv.org/abs/2210.17323

I've read the paper and to be honest I'm not sure what to make of it. Their headline benchmark is perplexity on WikiText2, which would not be particularly relevant to most users. If you look at the tables in appendix A.4 with some more relevant benchmarks, you'll sometimes find that straight RTN 4-bit quantisation beats both GPTQ and even the full 16-bit original! No explanation of this is given in the paper.


Some of those benchmarks have a pretty small sample size IIRC, might just be coincidence that the noise introduced by RTN just happens to slightly improve them.

GPTQ beats RTN on almost every benchmark at almost every size, though.


I wonder if reducing the bit depth of parameters like we have been acts as a form of regularization in these huge deep models.


The number of parameters stays the same, but the amount of information encodable by those parameters is not the same.


But they have to expand it back out to actually use it, right? Or does NVIDIA support 3 bit matrix mult?


It might have been a typo, as the current llama.cpp / alpaca.cpp included quantizers default to 4 bits.


Hm. I haven’t tried the local installs yet. However, when the Alpaca web demo was live, I did find it to be comparable (though not quite as capable) to davinci-003. It answered arbitrary factual questions about pop culture references, law, medicine, and programming. It generated rhymes and poems. (I didn’t try asking for the emoji thing, so can’t say anything about that.) It performed natural language tasks such as information extraction and summarization. And it did all of it coherently.


Where does one find the 13B alpaca model?


Be aware this file is a single ~8GB 4-bit model (ggml-alpaca-13b-q4.bin) instead of the 2x ~4GB files (ggml-model-q4_0.bin, ggml-model-q4_0.bin.1) that most llama.cpp-style inference programs expect. You'll probably have to edit the line

    n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
in chat.cpp (or main.cpp) to hard-code it to treat this one-file model properly, like

    n_parts = 1;
Or rewrite the parameter config subroutine to recognize and handle non-standard weight files.

magnet: magnet:?xt=urn:btih:053b3d54d2e77ff020ebddf51dad681f2a651071&dn=ggml-alpaca-13b-q4.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.com%3A2810%2Fannounce

torrent: https://btcache.me/torrent/053B3D54D2E77FF020EBDDF51DAD681F2...

torrent: https://torrage.info/torrent.php?h=053b3d54d2e77ff020ebddf51...

via: https://github.com/antimatter15/alpaca.cpp


> represent a movie plot as emoji

This sounded like a really cool idea, but I asked ChatGPT to do this for the plot of the movie The Shawshank Redemption and there is no way that I would ever have been able to guess that movie from the emojis it gave me. Perhaps GPT-4 does a better job at it.


So what you're saying is it's a matter of time?


The incredible contribution of Alpaca is showing the world how to efficiently train LLMs on instructions. The fact that it did so on 52k instructions generated by GPT is poetic.

It does not matter what the current capabilities of open source models are, because this opens the door to tremendous democratization of the ability to train and self-deploy these models.

In less than 6 months we will have open source models with gpt3-like capabilities, running locally on laptops, and potentially in phones and web browsers.


if we’re all still alive by then


I've used both the 7B and 13B instruction-tuned LLaMA weights (quantized using the llama.cpp scripts). Either I am doing something wrong, or these two models are nowhere near the level of ChatGPT. Many times they return something totally irrelevant to my question, stop responding, use a different language, or otherwise return the wrong answer. ChatGPT does none of this (other than the wrong answers due to hallucinating sometimes...).

Reading through the README and issues on the llama.cpp project, there is some speculation that there is a bug in the quantization, or possibly a bug in the inference (less likely I think).

I hope this is true, and once fixed the models can perform up to or past the ChatGPT level. If it's not true and these models are performing correctly, then either the metrics used to compare them to GPT are garbage and don't capture real-world uses, or the instruction tuning done by the Stanford team is not up to par.


LLaMA hasn't been fine-tuned with RLHF, so it requires additional prompting; check out the open-assistant[0] project for an open-source ChatGPT equivalent (WIP).

[0]: https://github.com/LAION-AI/Open-Assistant



It looks stupid compared to ChatGPT.

Prompts:

* Tell me about OpenAI.

* Make it shorter, within 50 words.

* Rephrase it, like a rap singer.


> What does OpenAI do? OpenAI is a company founded by Elon Musk and Reid Hoffman in 2015 to develop artificial intelligence products for the general public. They are most well known for releasing a large language model, GPT-3, in 2017, which was developed using a method called deep learning.

OpenAI focuses on creating tools and technologies that allow developers to create AI applications that are accessible to the general public. They are particularly interested in developing tools that can help developers build AI applications that can perform tasks that are important to society, such as in the areas of healthcare, education, and safety.

In addition to GPT-3, OpenAI has also released other AI tools and technologies, including an open-source platform for training AI models, called OpenAI Research, and a library of pre-trained AI models for use in various applications.

Overall, OpenAI’s goal is to make AI technology more accessible and useful to the general public, and they are dedicated to creating products that can help create a more sustainable and inclusive future for society.

> How are they funded?

OpenAI is a not-for-profit organization, so it does not rely on any financial support from individuals, corporations, or governments. Instead, it is funded through a variety of sources, including venture capital firms, research institutions, and individual donations.

Some of the largest financial supporters of OpenAI include Microsoft, Amazon, and Google, who have invested millions of dollars into the organization. Additionally, OpenAI has received financial support from a number of other prominent technology companies and research institutions.

Overall, the funding sources for OpenAI reflect the organization’s goal of creating innovative AI tools and technologies that can have a positive impact on society.

> Who are OpenAI’s competitors?

There are many AI competitors to OpenAI, including companies such as Microsoft, Amazon, Google, Facebook, and Apple. These companies are all investing heavily in developing their own AI products and technologies, and they all have their own goals and objectives in the field.

Some of OpenAI’s competitors are specifically focused on creating AI tools and technologies for the general public, while others are focused on creating AI solutions for specific industries or businesses. For example, Microsoft is known for its popular Microsoft Office products, while Amazon is known for its e-commerce platform.

It’s worth noting that the development of AI technology is a rapidly evolving field, and there will likely be many new competitors and advancements in the years to come.

----

Seems pretty decent to me, sure it's not on the level of ChatGPT but it's also 1/14th the size of the original GPT-3.


This is why Alpaca is a big deal: it shows what LLaMA can do after it's been fine-tuned to follow instructions like ChatGPT has.


Alpaca uses Self-Instruct[0] which is better than just the pre-training but I wouldn't expect it to be at the level of ChatGPT (RLHF) in terms of human-friendly prompting. OpenAssistant should make it close to ChatGPT (from GPT-3.5 version) if the LLaMA is as powerful as claimed.

[0]: https://arxiv.org/abs/2212.10560


If you use consciousness as a baseline, the intellectual difference between a grade schooler and a PhD is tiny.

This is what I think comparing these bots is like. You can argue that they’re very close. But the delta makes a very big difference for any practical purposes because we’re looking for nuanced capability.


There is a blog post that drives this point home with very good illustrations:

https://waitbutwhy.com/2015/01/artificial-intelligence-revol...

https://waitbutwhy.com/2015/01/artificial-intelligence-revol...

Basically, at the point where we have "almost human" level AI, it won't take much to get AI that's beyond human capabilities.


I have not found alpaca to be comparable to chatgpt, but it could be because of bugs in the version I installed through dalai. I might try reinstalling it because I suspect there might be some sort of file corruption issue or whatever.

I gave it the prompt “cats aren’t always fuzzy” and it wrote a lengthy livejournal-esque rambling journal entry about a woman and her husband having money issues. It was funny, but lightyears away from chatgpt.

It does sometimes create some really funny hallucinations though, like inventing prefectures in Japan that don’t exist etc.


I also got that text about the married couple and their money issues. Alpaca didn't impress me at all so far.


Alpaca wasn't great. The 13b and 30b models are much better, but just for sentence completion.

Personally, I think that the RLHF does make a big difference but maybe it's a bug in the quantization code as suggested up thread.


I’m also a bit confused by the quantization thing. Why exactly is everybody running the same program on the same file? Why not just include the quantized weights?

It seems like if somebody figured out the “correct” way to quantize the 7b weights it would make way more sense to just torrent the output rather than distribute a fixed program.


There is no reason not to do that, except

i) Distributing large files through torrents is slightly annoying if you don't already happen to have a seedbox

ii) People are still messing around with quantization settings, they might think that they are a few days away from a much better version

iii) No one wants to be sued by Meta. I think the risk is pretty small but not zero.


quantization takes lots of RAM:

https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/main/READ... says llama-13B takes 42GB and 33B takes more than 64GB...


I quantised all of the models (though the 65B one came out weird) and I only have 64GB of RAM, so dunno.


> asked GPT to take 175 human-written instruction/output pairs, and start generating more in the same style and format ... through one of OpenAI's helpfully provided APIs, and ... the team had some 52,000 sample conversations to use in post-training the LLaMA model

hmm I wonder if this is essentially a probe[1] technique + relies on chatgpt already having been extensively trained

like did they basically exfiltrate the weights

1. probing per https://arxiv.org/abs/2102.12452
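For what it's worth, the recipe is much shallower than exfiltrating weights: it only conditions on the teacher's text outputs, distilling behavior rather than parameters. A rough sketch of the bootstrapping loop (not Stanford's actual pipeline: the prompt wording, file name, and parse_pair helper here are stand-ins, and the real code batches many pairs per request and filters near-duplicate instructions):

    import json, random
    import openai  # 2023-era client; assumes the API key is set elsewhere

    seed_tasks = json.load(open("seed_tasks.json"))  # the 175 human-written pairs
    generated = []

    def parse_pair(text):
        # naive "Instruction: ... Output: ..." parser; the real pipeline is stricter
        try:
            ins, out = text.split("Output:", 1)
            return {"instruction": ins.replace("Instruction:", "").strip(),
                    "output": out.strip()}
        except ValueError:
            return None

    while len(generated) < 52000:
        examples = random.sample(seed_tasks + generated, 3)
        prompt = "Come up with a new instruction/output pair in the same style:\n\n"
        prompt += "\n\n".join("Instruction: %s\nOutput: %s"
                              % (e["instruction"], e["output"]) for e in examples)
        resp = openai.Completion.create(model="text-davinci-003",
                                        prompt=prompt, max_tokens=512)
        pair = parse_pair(resp["choices"][0]["text"])
        if pair:
            generated.append(pair)

The 52k pairs then become ordinary supervised fine-tuning data for LLaMA; nothing about the teacher's weights is recovered, only its behavior on this task distribution.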


Has anyone tried this yet on the 65B version? I'm curious if it knows how to rhyme and shows other emergent behavior, as Alpaca-7B does not.


I tried because I was curious. It's not very good. First I "quantized" the files. Took about 3-4 hours. Then I ran a prompt: "create a short poem about my wife helping me get over the stuffed nose". After 3 hours of 99% CPU usage and using all 32 GB of RAM, I got the following result:

create a short poem about my wife helping me get over the stuffed nose and throat that i've had for days now. the cold was all around him, but he didn't notice it at first- so much to do in this new place! finally though when they were sitting by themselves on their porch swing his sweetheart leaned into her man saying "we could take some of these leaves from the garden and make tea." He looked up with a quizzical look and saw that she was serious. I've been under weather since last Tuesday. Today is day 6...and still going strong! Had to cancel two nights on stage, missed my son

https://i.imgur.com/Nl3xLEg.png


Recent and related:

Stanford Alpaca web demo suspended “until further notice” - https://news.ycombinator.com/item?id=35200557 - March 2023 (77 comments)

Stanford Alpaca, and the acceleration of on-device LLM development - https://news.ycombinator.com/item?id=35141531 - March 2023 (66 comments)

Alpaca: An Instruct Tuned LLaMA 7B – Responses on par with txt-DaVinci-3 - https://news.ycombinator.com/item?id=35139450 - March 2023 (11 comments)

Alpaca: A strong open-source instruction-following model - https://news.ycombinator.com/item?id=35136624 - March 2023 (296 comments)


Related, my post "Could you train a ChatGPT-beating model for $85,000 and run it in a browser?" https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-brow...

I think you can train LLaMA 7B (the model underlying Alpaca) for around $82,000, based on the Meta Research paper about it. Then you can fine-tune it ala Alpaca for a few hundred dollars more.

My wilder speculation is that, if you can shrink the model down to 4GB with llama.cpp 4bit quantization, it may be possible to run it entirely in the browser (ala Stable Diffusion from the other day).


Alpaca is cool but it's also not technically allowed by OpenAI's TOS, and LLaMA is certainly not allowed to be used for non-commercial purposes. With that in mind, OpenAssistant is an Apache 2.0 licensed fully open source alternative that's pretty good (the model is OpenAssistant/oasst-sft-1-pythia-12b): https://huggingface.co/spaces/olivierdehaene/chat-llm-stream....

I've found OA to be better than Alpaca but I'll wait until the 65B 3-bit quantization efforts for Alpaca are underway to compare them.


> Alpaca is cool but it's also not technically allowed by OpenAI's TOS, and LLaMA is certainly not allowed to be used for non-commercial purposes.

Only if you agreed to the ToS or believe that the weights are copyrightable (precedents set by the copyright office and the courts strongly suggest that they aren't). I personally see no issue in using these models for commercial purposes.


You might not but a company will think twice. It's the same reason why companies could theoretically use pirated Windows and Adobe products and get away with it, but most don't because the risk is not worth the reward.


> It's the same reason why companies could theoretically use pirated Windows and Adobe products

Again, using and distributing LLaMA weights is not illegal in any way under current laws. End of story.


Again, you missed the point. LLaMA is under a non commercial license. I never stated it is illegal, just that a company will not want to violate license terms even if it is legal, simply because getting sued in a civil case is still a risk they wouldn't be willing to take compared to the benefit.


But I'm not licensing LLaMA from facebook in the first place so their terms wouldn't apply.


Of course they would apply, if you're using LLaMA at all. If you're using a different model that derives from LLaMA, depending on the license terms, the same license might also apply (similar to how GPL virality works). If you're using a wholly different model, then the model's license applies instead of LLaMA's.


And I'm saying you're wrong, they don't and I'm willing to bet that the courts will agree :)


> No, they wouldn't. You clearly don't understand how this works.

Interesting way to continue the conversation, before you edited. I honestly don't understand how you think using LLaMA but denying their license terms is a viable strategy, the courts would just point to the license when Meta sues you for using it commercially. But I'm sure me continuing to explain wouldn't make you understand further.


Wow, Stanford's Alpaca AI project is a real game-changer. The fact that it performs on par with ChatGPT but costs less than $600 to build is both exciting and terrifying. Sure, it's great to see AI becoming more accessible, but it's also a massive wakeup call for the potential misuse of these technologies.

We've got big names like OpenAI, Google, Apple, Meta, Baidu, and Amazon putting in serious time and money to ensure their language models are safe and ethical. However, now that we know it's possible to build powerful AI models on a budget, it's crucial to think about what this means for the future of AI regulation and safety.

This Alpaca AI project is a stark reminder that we need to have a serious conversation about the possible repercussions of AI proliferation. We can't just sit back and assume the big companies will take care of everything. The genie is out of the bottle, and it's time for everyone in the tech community to face the music and take responsibility for the AI revolution.


> It seems these godlike AIs are already frighteningly cheap and easy to replicate.

Who writes this shit?


Irreplaceable humans


Hard to say ...


Thanks for a genuinely funny comment!


> It seems these godlike AIs are already frighteningly cheap and easy to replicate.

"godlike"? Really? I'm not religious, but this seems like an overreaction for something that has no agency.


If it's a shorthand for omniscience then I can see how it makes sense. A bit hyperbolic though for sure.


How do you know agency is not simply the output of a large language model encoded in neurons? What is the difference between neuronal and digital weights?

Considering that we don’t know how the brain works so well, and we don’t understand why LLMs work so well, simply on the basis of their output I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.


I'm using agency in this context to mean: (1) a strong desire to achieve one or more clear goals, beyond survival, and (2) taking concrete steps to achieve those goals.

> How do you know agency is not simply the output of a large language model encoded in neurons?

I'm not sure what you mean here. Is agency an emergent effect of a large digital or biological neural network? Maybe! Is it an emergent effect of a large language model? If it is, then it should be clear, or demonstrable, that the model (1) has goals (2) takes concrete steps to achieve those goals.

> What is the difference between neuronal and digital weights?

Brain chemistry works at orders of magnitude less speed, since we're talking about periodically building and releasing an ionic differential between the inside and outside of a cell membrane. Moreover, we have a massive number of neurons and a stupidly massive number of interneuronal connections, with billions of years of training over billions of lineages. Digital weights, in contrast, are a stripped-down model of this system that throws out a whole class of complexities like hormones and metabolism.

> I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.

I think this is an overly generous assumption.


I think the precautionary principle should apply to AI. Assuming and acting as if they have agency, is safer than assuming they don’t.

> it should be clear, or demonstrable, that the model (1) has goals (2) takes concrete steps to achieve those goals.

That definition seems arbitrary to me; many humans wouldn’t pass this test. On the other hand, LLMs certainly seem capable of acting towards specific goals (such as helpfulness). So, I would say that based on your definition, LLMs have agency. But I think you really meant, internally generated goals. Time will tell.

That said, humans who don’t have clear goals can and are coerced into all sorts of damaging behaviours by those who do have goals. So even if I accept that LLMs don’t have their own goals, they can certainly be manipulated to act in favour of the goals of others. That’s effectively what prompt engineering is all about.

So I just think it’s a mistake to make assumptions about these LLMs. We don’t know why they work so well, and it will take a very long time until we do.

In the meantime, let’s not make assumptions that we can’t justify.


> That definition seems arbitrary to me; many humans wouldn’t pass this test.

What part of this is arbitrary, i.e. random, whimsical, or biased? This is a fairly comprehensive working definition of "agency". "Internally generated", which is implied, is a nice touch.

Nearly every human over the age of 18 months passes this test with flying colors. Toddlers have goals, and do everything in their power to achieve them. What humans are you thinking of that don't have agency?

> In the meantime, let’s not make assumptions that we can’t justify.

I totally agree. I think that until demonstrated otherwise, I will assume that LLMs are giant statistical sieves that (1) periodically spit out text directly from their training set, unmodified, and (2) do not learn on their own, do not formulate their own goals, and do not take actions to achieve those goals.


Got it. So you assume these systems are inherently safe.

What could go wrong?


> What could go wrong?

Great question! These are predictive models that accept a text query, do some matrix math, and then return some text. At what point in that server-client relationship does this algorithm jump the rails and run amok?


What about here: https://news.ycombinator.com/item?id=35239578

> A popular nightmare scenario for AI is giving it access to tools, so it can make API calls and execute its own code and generally break free of the constraints of its initial environment. Let's do that now!

I also think that you're assuming we know a lot more about how these things work than we actually do; you seem to think nobody is going to hook these up to APIs that can actually modify the world, despite the barrier to doing so being incredibly low; and you don't seem to have read about the adversarial training that people have been doing between the LLMs.

It's obvious that you think everything is all safe and nothing will go wrong, and I really hope you're right. But I think it's a very dangerous assumption.

Hope for the best, plan for the worst.


You've moved the goalposts from, "we should assume that predictive text models are sentient," to, "we should be wary of programs explicitly designed to interact unpredictably with other computer systems," something that everyone from the Amish to the NSA to Cozy Bear would agree with.


What I said at the outset was,

> Considering that we don’t know how the brain works so well, and we don’t understand why LLMs work so well, simply on the basis of their output I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.

These systems can already interact with others, it’s not moving the goalposts, it’s common knowledge. Anyone with access to the APIs can make it happen. Or are you now claiming we are just talking about one specific LLM and not LLMs generally?

Anyway this discussion is fruitless, I’m out.


Gods don't have persistent bodies or brains. They're spirits.

I don't disagree that digital systems with neural architectures could have agency in principle, but agency generally is definitely not the output of a large language model. Animals without language have agency, in that they take actions to fulfill their desires. Current LLMs may have some degree of intelligence, but they don't even appear to have any consistent wishes or desires. You can get them to talk longingly about x... until you give another prompt and suddenly x doesn't matter to them at all.


What if creation was the result of a lucky, happen-by-chance hallucination?


Sure! Why not? Quantum foam imagining quantum foam. I love it. I still wouldn't consider an LLM god-like. I mean, if it eats its own son, then maybe. (Titan of a reference there.)


Code can be abstracted into a simpler code model that deterministically recreates the old code model.

OpenAI is an initial brute-force approach that will eventually be obsoleted, abstracted over and over into simpler code implementations with rules to recreate the old state.

kkrieger is a simple example of a tiny data model that can be deterministically rehydrated. It's not unrealistic for AI models to become a seed value for a normalized code base to deterministically unpack into the necessary electron state.


From the article: Pre-trained on a trillion "tokens"...

Doesn't 7B indicate that it was trained on 7 billion tokens? Or am I misunderstanding the nomenclature?


> Doesn't 7B indicate that it was trained on 7 billion tokens?

No, 7B means it has 7 billion parameters.


And what does a parameter mean in this context?


A parameter is the connection weight between two neurons in the network.
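As a toy illustration of where the billions come from (one fully connected layer; the sizes here are made up, and real transformer blocks add attention matrices on top of the MLP weights):

    # every input neuron connects to every output neuron, plus one bias per output
    d_in, d_out = 4096, 4096
    print(d_in * d_out + d_out)  # 16,781,312 parameters in this one layer

Stack a few hundred matrices like that and you get to 7B.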


The emerging consensus for larger LLMs is that you want to train them with at least 2-4x the tokens of the number of parameters (weights between neurons in the layers). A trillion tokens for a 7B model (~140x) surprises me.


The LLaMA paper contradicts this view: "[...] Although Hoffmann et al. (2022) recommends training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens." https://arxiv.org/pdf/2302.13971.pdf


They probably put most of the effort into the 65B model, the 7B model was just trained so they could get an idea of the scaling behaviour. It makes sense to use the same amount of training steps, then.


7B is the number of parameters of the model.


AI bootstrapping AI is a sci-fi trope that goes back decades. I first encountered it in The Cybernetic Samurai while in high school. While the details differ, the reality is that AI is a catalyst for more of itself.

I don't remember many books where this ends particularly well. Perhaps the Culture universe could be a survivable outcome. Hopefully we don't get Berserkers first.


Stable Diffusion trains StyleGAN2 - https://www.jasonfletcher.info/vjloops/


It’s the beginning of the AI singularity. It’s not that it’s bad, we just can’t see anything beyond the event horizon.


Given the high prices of OpenAI offerings it seems it’s better to pirate an AI model before resorting to paying for anything.


The world is becoming more cyberpunk every day, people making back-alley deals for data or models.

These weights are shit man, they have been quanted w/o being retrained against the original. I already have this torrent, I want uncut originals. And no water marks this time, the last model wouldn't shutup about investing in tulips.


Shh, I got some unnerfed midjourney v8 for your nsfw needs, my friend...


I will only pay half unless you supply the training data as well.


ChatGPT API is surprisingly cheap, but GPT-4 is many times more expensive to the point where I can't see it being worth it most of the time.


I'm not convinced that the GPT-4 API actually works? It's been giving me very different answers than the chat interface. For example, the chat interface says it's GPT-4 if you ask it, but the API says it's GPT-3 (and bills as GPT-4).


If you're satisfied with the results who cares what it thinks it is


I'm not. It's worse than the chat version.


Why would the LLM model know what version it actually is? It was trained on data from before that model version even existed.


- What version model are you?

- I am based on the GPT-4 architecture, which is an iteration of OpenAI's Generative Pre-trained Transformer models. My training data includes information up to September 2021.

- How do you know your version?

- As an AI language model, I am programmed with information about my own architecture and the version I am based on. My creators at OpenAI provided me with this knowledge, which helps me to give context to users about my capabilities and limitations. In this case, I am based on the GPT-4 architecture, an iteration of OpenAI's Generative Pre-trained Transformer models.


Likely making this up.


Why did it not make up GPT-5 or something else, though?


Try asking it several times with slightly different phrasings.


I don't have all the answers!


Has the ChatGPT-4 “system” prompt been revealed?


Hmm, you think that's what makes a difference? It might be, though I've gotten different (what I find worse) results on other questions as well.


This is a great (pre-GPT-4) post that goes into the system prompt and its effects: https://minimaxir.com/2023/03/new-chatgpt-overlord/

Are you using any system prompt in your API calls? Note that it does have an explicit domain separation from the user and assistant roles.

I suspect it plays a greater role in GPT-4 as well, i.e. was trained on a wider variety of system prompts to guide the output of the model.


I wasn't, no. You're right, that might make a difference. Thanks for the article, it's very useful, I've been curious about the system prompt!


How much does it cost to privately fine-tune and run Llama?

It’s USD 600 for fine-tuning. Maybe USD 4-5k for a computer that can run it.

ChatGPT Plus is $20/month. $5k would be 250 months (about 20 years) of paid access.

Not sure pirating it now adds up.


Correct me if I'm wrong, but that's per-user, unless you all just share an account or build out a bespoke API integration. Which is to say, if you have 25 developers, you'd spend $500 a month and hit $5k within a year.

The reason to pirate it would be to fine-tune the model on your private internal source code repository, assuming you already have an existing large body of work you want to train it on, and to offer SelfHostedCoPilot for your bespoke internal-only DSL that ChatGPT and Copilot have no way of having seen and would undoubtedly hallucinate about by a considerable margin.


Most people doesn’t fine tune the models (llama or openAI). A MacBook M1 can run those model ($1000) and in many cases the user already have it. You also need a computer to access openAI, the comparison boils down to $20/m vs $0. At this point in time, LLM are a curiosity for most people


Isn't openAI only charging something like a 20th of a penny per interaction right now? Definitely not the kind of thing you want to incorporate into a widespread free app just yet, but it seems pretty affordable for a lot of use cases


> Isn't openAI only charging something like a 20th of a penny per interaction right now?

They don't charge per interaction, but per token. The chat models range from a fifth of a cent per 1000 tokens to 12 cents per thousand tokens (depending on whether it's gpt-3.5, or the 8k limit gpt-4, or the 32k limit gpt-4, and, for gpt-4 models, also prompt v. response tokens.)


That's the price per "token". A token is a word or part of a word; a rule of thumb is four tokens per three words.
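If you want exact counts, OpenAI publishes its tokenizer as the tiktoken library; a small sketch (the cost line assumes gpt-3.5-turbo's $0.002 per 1K tokens):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-3.5-turbo / gpt-4
    text = "Write me a fanfic where the Animorphs battle the truck from Duel."
    tokens = enc.encode(text)
    print(len(text.split()), "words ->", len(tokens), "tokens")
    print("~$%.6f at gpt-3.5-turbo rates" % (len(tokens) / 1000 * 0.002))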


Anything open will need training and attention to be an OpenAI competitor, tho I'm happy to see the function of this one: https://huggingface.co/spaces/togethercomputer/OpenChatKit



I bet you could “exfiltrate” an LLM relatively cheaply by using LLM A to generate training data for LLM B.


No way. The cost for generating the tokens is way too high.


Is it accurate to say they were trained for less than $600? Wouldn't that just be the finetuning that was done to the already existing LLaMA parameters which likely cost way more than $600 to train?


Yeah, exactly. LLaMA 7B itself cost $80,000+ to train (82,432 GPU hours). Stanford spent $100 on fine-tuning compute and $500 on OpenAI credits to generate their 52,000 sample instruction training set.
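As a sanity check on those numbers (the ~$1/hour A100 rate below is an assumption based on 2023 cloud pricing, not something from the paper):

    gpu_hours = 82_432           # LLaMA 7B pre-training, per the Meta paper
    usd_per_gpu_hour = 1.00      # assumed on-demand A100 rate
    print(gpu_hours * usd_per_gpu_hour)  # ~82k USD to pre-train the base model
    print(100 + 500)                     # 600: Alpaca's fine-tuning compute + data budget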


This is why it's not possible to slow down or "stop" AI: once the problems are solved the solutions turn out to be trivial to replicate. All it takes is compute.


You say "all it takes is compute" like that is trivial; ChatGPT would have a hard time without Microsoft's support via Azure.


There are lots of places to get compute, including Chinese cloud providers...

The genie really is out of the bottle now.

This is a lot like pharmaceuticals. The initial investment in a new medication is enormous. The price of each pill is trivial, to the extent that every drugstore chain is able to supply a generic in-house brand.


Maybe. Certainly in the past, before the world was aware LLMs on the level of ChatGPT were possible with today's technology. OpenAI's chosen not to release any real details about GPT-4, so we don't actually know what it would take to train a model of equivalent quality, especially considering training isn't a one-shot: multiple training runs easily add up. So training a 12-digit-parameter model (175B) is assumed to be very expensive. But there has been great progress on optimized models which are smaller by more than an order of magnitude, down to 7B, for a debatable drop in quality (7B Alpaca is in no way competitive with ChatGPT, but it's still very much not a Markov chain from the AI-winter era). So one possibility is that OpenAI chose not to release salient GPT-4 details because it is much smaller than GPT-3's 175B, and they're hiding the details because of how much that cuts down on training costs. (Which I should note is unsubstantiated conjecture, but not outside the realm of possibility.)

The other aspect is that fine-tuning an existing model is way cheaper than creating a competing model from scratch, so a company could offer CompetitorGPT/CompetitorCoPilot competitive with GPT-3.5, and offer fine-tuning of that model trained on the source code repository of the purchaser company's codebase, possibly on-prem or at least inside their AWS VPC/Azure/GCP equivalent.

The other thing to note is that OpenAI is hosting ChatGPT as a public resource available to anyone with an account, akin to Google being open to the public from day one (although that is without an account; maybe Gmail is a better comparison). I can't say for certain, only OpenAI would know, but I'm willing to bet that inference for ChatGPT is the vast majority of their costs, and it is anything but trivial.

Any private, internal-only instance of OpenChatGPT (using the unlicensed leaked LLaMA model, a legal copy, or someone else's) could be paying relatively minuscule training costs, and way lower inference costs if it's internal-use only. Whether that cost can be borne by a small SaaS company's existing AWS budget is up in the air. Which is to say, ultimately, you're right: ChatGPT would be difficult without the support of Microsoft via a huge Azure grant. It's less obvious whether a self-hosted, internal-only OpenChatGPT (not from OpenAI) would be feasible for hobbyist self-hosters with a prosumer GPU cluster (say, last-generation K80s instead of business-priced A100s), or for a company wanting to leverage LLMs privately, offering a Copilot-like productivity-multiplier tool to their developers without sending private source code to OpenAI in lieu of a privacy agreement with them.


> OpenAI's chosen not to release any real details about GPT-4

Actually, they have released some details about it, in this 99-page technical report https://arxiv.org/abs/2303.08774 (which is actually two papers stitched together, once you read it; oddly enough using different fonts).

But I'm not sure if this content qualifies as "real details".


The intro to that paper specifically says:

> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar. We are committed to independent auditing of our technologies, and shared some initial steps and ideas in this area in the system card accompanying this release. We plan to make further technical details available to additional third parties who can advise us on how to weigh the competitive and safety considerations above against the scientific value of further transparency.

In other words, "Stable Diffusion wasn't supposed to happen, so we're making all our methodology trade secret[0], if you want to Do Science then agree to this massive NDA and have enough skin in the game for us to cut you."

[0] Presumably at some point OpenAI will have to 'relent' to independent discovery by patenting AI architectures and refusing to license them


I've been using the Chat GPT-4 model, and openAI has been putting warnings about max queries per N hours. Given the degree to which they're limiting access (up until they crashed today, they'd dropped to 25 queries / 3 hours), I suspect GPT-4 is actually much, much larger, and they just don't have the computational resources to support its use at the same level as GPT-3.5 or GPT-3.5 Turbo.


You could be right! I don't claim access to any private OpenAI information, so any theories of mine are based on what's known publicly, which isn't much for GPT-4. I do want to call attention to the difference between training runs and inference runs (post-training usage of the model). If each training run costs mid six figures, CompetitorGPT is going to have to be well funded and likely sponsored by AWS/GCP (e.g. DeepMind) just to train up the model, given that it's probably not a one-shot. If it's much lower due to optimizations in training, on top of only having to fine-tune the model on a company's codebase instead of training the whole model from scratch each time, then I could see a company selling the service of creating CompetitorGPT or CompetitorCoPilot as a very worthwhile investment, for companies willing to invest in such services for their developers. (E.g. companies that are willing to pay Splunk's exorbitant costs, versus ones that would rather burn time self-hosting a Grafana setup. Not to impugn Grafana, but it's very much a home-grown, open source, self-hosted deployment. Managing a Splunk cluster is also far from free; it's just that not all companies are willing to bear the yearly licensing cost and would prefer to self-host Grafana solely for cost reasons, even if TCO including opportunity cost makes it more expensive in the long run.)


While that's true, it's basically inevitable now that at some point personal hardware will be powerful enough for enthusiasts to run home bots comparable to GPT-3, and even that by itself would drastically change a lot of things.


Running isn’t necessarily the issue. The moat is creating a high-quality model like OpenAI has, which (and here the article is mistaken) doesn’t seem to be easily reproducible.


That's going to get easier too. Stanford can already get this far for $600, so soon after the major GPT-based chat AIs were released. Imagine how much better it will get with just a little bit more time.


While that's true, it also seems entirely predictable at this point how to do that. It takes a lot of effort and expensive hardware, but there isn't really a "secret sauce" beyond expertise in the field.


Yes, but it takes time (took OpenAI years) and significant effort. Who with enough expertise will do this and not keep the results closed in order to monetize them? It doesn’t seem like something an open source project could accomplish quickly enough to not keep lagging substantially behind the commercial solutions.


Sounds like IBM trusting no one would copy their BIOS code.


Governments can ban powerful devices, as they can ban guns, bombs, and such.


Only at a price. If you ban devices powerful enough to run ChatGPT, you ban a big chunk of what powers your economy.


Governments have experience limiting the spread of digital content. For now at least, AI proliferation is not immune to those same tactics.


Governments are really bad at limiting the spread of digital content.


Literally only China somewhat succeeds, and that's because everything is top-down controlled by the government.


I warned about this for years. Finally an article gets it right.

Everyone will soon have the equivalent of online nuclear weapons: bot swarms that infiltrate every forum, including this one.


Spam has existed on the internet for a long time.


This is different. It can act just like humans do; most people who skim comments won't be able to tell the difference.

Note this was in 2020: https://www.technologyreview.com/2020/10/08/1009845/a-gpt-3-...

And here's 4chan bot: https://www.youtube.com/watch?v=efPrtcLdcdM

I can tell you that HN is probably already being infiltrated as well.

Spam can't gang up on you in a forum, downvote you, turn your friends against you, and destroy your reputation within an hour. But soon, it will. The web as we know it is soon going to be over.


How good are these bots against CAPTCHA?


My pet theory was that AI would come out of spam bots.

Close enough.



