Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Three major LLM releases in 24 hours (simonwillison.net)
129 points by helloplanets on April 10, 2024 | hide | past | favorite | 69 comments



For those primarily interested in open weight models, that Mixtral 8x22B is really intriguing. The Mistral models have tended to outperform other models with similar parameter counts.

Still 281GB is huge. That's at the higher end of what we see from other open weight models, and it's not going to fit on anybody's homelab franken-GPU rig. Assuming that 281GB is fp16, it should quantize down to roughly 70GB at 4bits. Still too big for any consumer grade GPU, but accessible on a workstation with enough system ram. Mixtral 8x7B runs surprisingly fast, even on CPUs. Hopefully this 8x22B model will perform similarly.

EDIT: Available here in GGUF format: https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF

The 2-bit quantization comes to 52GB, so worse than my napkin math suggested. Looking forward to giving it a try on my desktop though.


Does anyone know what's up with the models that aren't available in Europe? There isn't any transparency over this.


They probably aren't up to GDPR standards. Take from that what you will. It's the typical reason why they don't release here.


I think it is mostly for these reasons:

* It's complicated so it takes a while and you need lawyers and such to make it right

* Rules for training are probably hugely vague and undefined. Because you could ingest personal data and it cannot be deleted

* AFAIK it needs to be hosted in Europe (not directly GDPR related, but america has laws that allows them to spy on all traffic in the US, so this is somewhat the counter to that)

In the end from my experience just working at a company that needs to be compliant this usually means:

* All the services need to be hosed in EU including 3rd parties we send any data to

* There needs to be a way (email is enough) to delete user data (including from 3rd parties which need an endpoint so you can trigger it from your side)

* You need to inform the user about the data useage and allow them to opt out of the "usage" of this data for non-essential things (i.e marketing emails). This does not mean you cannot save this data if you also use it for other things, but you can not use it for the non-essential case.

* You could be in trouble if you save data "just because" and do not use it for anything essential or if it is not transparent to the user.

Not a lawyer. Just the things I notice in my day to day. In the end companies need data protection professionals to navigate these things. Which is probably another thing a startup does not worry about it early on.


How can a model not be up to GDPR standards? Or are you talking about the services that provide those models? Genuinely interested.


I would imagine that they hoovered up a lot of data from the internet to train it and don't know if it's private or not. They can't guarantee that the model won't print your home address if you ask it the right way.


In that case it wouldn't matter where they make it available. GDPR protects EU citizens' data everywhere, including in US.


I'm not really sure the law could work that way. I mean in the sense that they may have codified that but realistically I think a nation's laws ended their borders typically


GDPR applies globally. If you handle EU citizens data, you must conform to it.

Of course you might not care, if you believe that the EU has no way of forcing you to pay a fine and if you are certain that you are never going to do any business in the EU. But in such case you might as well provide access to your models for EU IPs too. It makes no difference.


how can a law apply globally? Ignore the idea that EU can fine you, under what basis do laws of a country apply to actions outside the country?


Yeah, but the EU can't do anything to you if you don't sell to the EU in the first place.


Then why block the access, if you are fine with violating EU law because they have no way to make you pay fines?

The model access makes no difference.


To make sure they don't bother chasing you in the first place, because that would mean more work for you.


With people feeding more and more delicate questions (e.g. medical, mental health etc), which can lead to more trouble with what may be stored temporarily etc, GDPR definitely has an impact here (developers must take special care to be compliant etc).


Any personal information which gets into those models makes them incompatible, by my (IANAL) reading.

https://gdpr.eu/what-is-gdpr/

"Personal data" and "data processing" are (deliberately) drawn very broadly:

"""Personal data — Personal data is any information that relates to an individual who can be directly or indirectly identified. Names and email addresses are obviously personal data. Location information, ethnicity, gender, biometric data, religious beliefs, web cookies, and political opinions can also be personal data. Pseudonymous data can also fall under the definition if it’s relatively easy to ID someone from it.

Data processing — Any action performed on data, whether automated or manual. The examples cited in the text include collecting, recording, organizing, structuring, storing, using, erasing… so basically anything."""

And, unlike the arguments about copyright in big AI models trained on the internet (is it 'fair use'? Don't ask me, IANAL!), the requirement for explicit and informed consent is something a general crawl will very clearly fail:

"""Purpose limitation — You must process data for the legitimate purposes specified explicitly to the data subject when you collected it."""

Furthermore, we don't know enough about how the models store knowledge/beliefs to be able to make any claim about accuracy:

"""Accuracy — You must keep personal data accurate and up to date."""

And as for confidentiality… for downloadable models, that's "by obscurity" only, due to the exact same research needed to resolve the previous point about accuracy, and even for secret models like GPT-4, nobody's really sure how to actually guarantee it won't leak info with the right prompt, and there's even some suggestion that this is actually impossible with current approaches because nothing is really deleted by RLHF:

"""Integrity and confidentiality — Processing must be done in such a way as to ensure appropriate security, integrity, and confidentiality (e.g. by using encryption)."""


Ok, thanks. But does not providing - the personal data you can not have or process - to the EU market, make your position any better, legally?


If it's trained on data covered under the GDPR then it means you're 1) a "data processor" of that information and 2) there's a risk of it reproducing that information in some form.

In the case of #1, they probably do not have an agreement with the "data controller" in the case of scraping, which means #2 is a violation of GDPR.

IANAL.


No, it wouldn't help they don't publish it in EU. They would be in violation with GDPR either way.


Yeah it's fear of GDPR, which is kind of like a retroactive set of standards: "we'll know an infringement when we see it", type vibe. Which of course is kryptonite to innovation, and ultimatly will lead to a more fragmented internet.

As a European I to try see it from both sides, consumer protections are generally a good thing, but it right now being restricted by EU vagueness sucks ass because I just want to play with the cool new toys.


But how can you explain OpenAI works in EU but latest google AI is not yet here.

What is OpenAI GPT4 and Google bard/Gemini EU version not doing so they work in EU but the latest Google AI is doing so google is incapable of putting it in EU ?

Maybe latest one is more invasive with ypur personal account? Scanning your personal data without consent ?

Because seems to be a super simple business, user sends you a prompt, you run the prompt and send the result back, you keep nothing unless necessary for the service to work and give the user the ability to purge their history/data .


Open AI is all about LLMs, that's their entire product and core competency, so withholding access in such a big market isn't worth it to them. Sure, there's a risk, but there are far bigger risks facing Open AI now than GDPR noncompliance.

Google is in a completely different situation. Gemini is, so far, a niche product, making little to no money for Google. However, due to how the GDPR works, a mistake with Gemini could cost the entire company dearly, impacting the profits they make from any other products.

This is a more general principle. If you're a small startup that manufactures foos, and there's a risk that manufacturing Foos infringes on Apple's patents, well, that's just one of the risks you have to bear as a startup. If you get wiped out by a lawsuit, you're established as a limited-liability corporation for a reason, if the fine is larger than the worth of your company, you can just declare bankruptcy and go away. You can't decide to stop manufacturing Foos, as that's your entire business.

If you're the size of Google, getting into the Foo-manufacturing business is not worth it. If Apple sues you, that won't just impact the Foo-manufacturing side of your business, the fine can be significantly larger than any profits you ever hoped to make from the venture, not to mention the brand damage, strain on resources etc. If Foo manufacturing is ultimately going to be a side business for you, it's probably not worth it to be the hill you die on.

This is why it makes sense for startups to "move fast and break things™", and for big corporations to require countless legal reviews on the smallest of decisions.


Google AIs are already in EU, so should I conclude that the new AI is much more invasive? Or for some reason the google lawyer team has big backlog ?


Probably the latter.

I wouldn't be surprised if there's a long and arduous procedure to clear a new product for GDPR compliance, and the PM had a choice between launching now and excluding the EU or launching later for everybody. This is also what originally happened with Bard.


this would suggest that the new AI is different. I still get Bard/Gemini new feature updates in EU but not the new model yet. So the lawyers have time for this new features but not for the new model, so the new modl must bring bigger privacy concerns.


I can't explain it. It feels like a pre-emptive fuck you from a few different companies towards the EU.

I'm glad that Microsoft / OpenAI is confident enough to put out a global consumer facing gen AI offering. Maybe they received EU assurances, or maybe they have enough legal muscle to tank any possible hit, and eating Google's lunch is a prize worth the risk.


GDPR is all about being clear about the data you are collecting, what you are planning to do with it, and is what you are doing with it legitimate.

Lots of companies come unstuck because they fall into the trap of “let’s just collect everything and see what we can do with it”.

Or, I’ve got all this data I’ve collected legitimately. Who knew that you could sell it on to some data broken and make loads of money - let’s do that!

Or, I’ve collected all this data, I’m just going to keep it hanging around, oops I just put it on a public bucket and leaked it all. Hmm, I’m not even sure what data we had, have we just compromised a bunch of people? Who knows…


Also, GDPR compliance is forcing companies to collect and store data they wouldn't have been collecting otherwise, such as to be compliant in regards to child safety you must find out which users are children. Data itself can be seen as a somewhat radioactive commodity, requiring exquisite handling, and creating new reputational and security risks.

It's not supprising that many smaller companies are saying fuck this noise.


There are two sides on this coin of course.

If by design you can do whatever pleases you, then yes you have a lot of innovation. But sometimes it leads to normalisation of troubles (e.g. data leaks in the US), and incredulity of the general public ("how did we even get there")?

There are good reasons to ponder ethics in the original balance too, it hasn't got to be completely paralysing either. But this comes with a cost (e.g. typically, for any data-sensitive work these days complying with GDPR, a significant part of the design & implementation time is "are we compliant").


This comment always gets downvoted, but I have seen this happen in my previous company. They hired an expensive lawyer from Europe for GDPR compliance and even his suggestions didn't made sense and in the end we decided to geoblock Europe. e.g. EU didn't clearly banned consent rejection requiring more effort and just skirted around it and that's why every company have two step rejection and one step acceptance.

They could have easily made law requiring sites accept DNT header but they didn't likely because of lobbying.


The first part of your comment lacks understanding of how and why the consent rejection has been worded as it has. If you want to comply it is easy. But if a company wants to skirt the regulation it is written such that the regulatory body can still get you for making it harder.

DNT is not relevant as GDPR is not directly a regulation against tracking, and it certainly isn't because of lobbying.


Not only I don't understand it, the top tier European law firm the company hired also didn't understood it. The law ALLOWS companies to skirt around it. I don't know if that's intentional or not.

> If you want to comply it is easy.

That's the entire antithesis of modern law as opposed to monarchy. Law should be codified in as clear rules as possible.


pure speculation

If I'm to venture a guess, it's probably because data protections are stronger and they want to avoid potential issues should someone test GDPR (or whatever the applicable law is) by asking specific data be removed from the model


They're worried about GDPR with chatbots, but e.g. Claude is available via API.


How hard is it to ask for informed consent?


Pretty easy, simply bombard the user with consent forms every request until they click on "accept all".


They’d actually have to make sure to correctly inform the user.


Regulation too cumbersome, not worth it.

What is the downvote coming from, isn't this just facts? If not for the regulation, why would EU be shunned?


OpenAI doesn't have this issue so it begs the question which part of the regulation is not compliant and why, if it's just Google being lazy, or if they actively do something sketchy and don't want to stop.


Google is a much bigger target than OpenAI from EU regulations perspective. They are going after Google


I don't know. I have an organisation account on Anthropic to use Claude 3, and the credit card is norwegian, the phone number is norwegian, the email finishes with a .no, the country in the address says Norway, and the business tax id is a norwegian VAT number. Sounds like they actually don't mind the regulations for businesses.


What is 2% of Anthropic's global revenue, and what is 2% of Google's global revenue? The penalty for Google getting it wrong are much much higher. Anthropic can get away with moving fast and breaking things (like the GDPR). Google has no such luxury.


If other entities do the same thing without problems it's most of the time a you problem.


[dead]


And this affects UK, How? The one not available is Gemini Pro and its not available in UK too according to the article.


GP was talking about Europe


The UK is still in the continent of Europe. Last I checked, it hadn't swum across the ocean to America...


I still don't understand why the UK is so central to this discussion. What did I miss ?


Because the EU have decided to make it extremely hard for Europeans to benefit from technological advantages trough their GDPR, Cookie Laws and soon AI Act.


Safety does make things harder for those that want to abuse us. We don't want technology at all costs.


There is nothing safe about cookie law or GDPR.

They are literally doing the opposite because they are asking you to commit to the terms and since everyone do you have actually consented to your data being used.

AI Act isn't solving anything that isn't already solved with existing regulation.

That so many people on HN seem to think this is a good idea is very puzzling.


Are any of these stable? I mean when using temperature=0, do you get the same reply for the same prompt?

I am using gpt-4-1106-preview quite a lot, but it is hard to optimize prompts when you cannot build a test-suite of questions and correct replies against which you can test and improve the instruction prompt. Even when using temperature=0, gpt-4-1106-preview outputs different answers for the same prompt.


> [...] but it is hard to optimize prompts when you cannot build a test-suite of questions and correct replies against which you can test and improve the instruction prompt.

I think this is because your approach isn't right. This tech isn't really unit-testable in the same sense. In fact, for many use cases, you may want non-deterministic results by design.

Instead, you probably need evaluations. The idea is that you're still building out "test" cases, but instead of expecting a specific result each time, you get a result that you can score through some means. Each test case produces a score, and you get a rollup score for the suite, and that's how you can track regressions over time.

For example, in our use case, we produce structured JSON that has to match a spec, but we also want to have the contents of that valid-to-spec JSON object be "useful". So there's a function that defines "usefulness" based on some criteria that I've put together since I'm a domain expert. This is something I can evolve over time, using real-world inputs that produce bad or unsatisfying outputs as new evaluations for the evaluation suite.

Fair warning though, it's not very easy to get started with, and there's not a whole lot of information about doing it well online.


This is what I do. I calculate a score over a sample of questions and replies. I'm not doing unit tests.

Comparing the scores of two prompts will not give you a definitive answer which one is superior. But the prediction which one is superior would be better without the noise added by the randomness in the execution of the LLM.


> Comparing the scores of two prompts will not give you a definitive answer which one is superior.

Yes, but it can tell you which is likely to be superior, which is perhaps good enough?

Offline evals are only a part of the equation though, which is why online evaluations are perhaps even more important. Good observability and a way to systematically measure "what is a good response" on production data is what ultimately gets us closer to real truth.


Wouldn't be for reproducability the usage of a seed be a better fit?

[0] https://platform.openai.com/docs/api-reference/chat/create#c...


I just tried the same prompt twice, both with

    'temperature': 0,
    'seed': 1,
And I got two different replies.

The 'system_fingerprint' in the reply was the same in both of the json responses. So it seems that even when you get the same 'system_fingerprint' back, replies for the same prompt will not be the same.


Aren't these models non-deterministic by nature?


no, they run on computers. anything that happens in a computer can be made reproducible


I'm not sure. Googling for "gpu non-determinism" and reading around the results seems to indicate that GPUs are often not deterministic.

I don't know why, but it is easy to imagine a scenario where non-determinism makes sense. For example the computer could say "I am supposed to do this floating point division. Let's see ... my very exact FP-Division unit is pretty hot at the moment. 82°C. Let's pass this calculation to the slightly less exact FP-Division unit then".


cpus can be nondeterministic as well, generally because of race conditions between different cores, and as i understand it, that's what happens with gpus too, just more often because there are more cores

i think the kind of scenario you're describing does not occur in commercially sold hardware, because it poses problems not only for customers but also for verification engineers at the design house. also, in the specific case of floating-point division, it would violate the ieee-754 standard. it might happen in an experimental academic project


Does Gemini have a prepaid mode?

I like that both OpenAI and Anthropic default to the prepaid mode; I can safely experiment without worrying about selecting a large file by mistake (or worse, a runaway automated process).


Cohere’s Command R+ is unimpressive model, because it agrees with me every time I try to argue with smth like: "But are you sure? ..."; also it has: "last update in January 2023".

Mixtral 8x22B is interesting because 8x7B was one of the best (among all others) for me few months ago (in particular, common knowledge, engineering and high-level math, multi-lingual skills like translation, grammatically nicer rewritings)


I haven't tried it yet, but if the model itself isn't impressive, that 128k context window is. That's the largest I think I've seen for any open weights model.


One of the most attractive features about Mistral open models is that you can build a product on top of their API, and switch to a self hosted version if the need arises, such as customer requesting to run onprem due to privacy requirements, or the API service being taken down.


Using Mistral together with groq is amazing.

The reason to be able to migrate is for me personally a huge plus.


i'm interested to hear what kinds of things you're doing with it!


which groq is that,the chip or the gpt?


is the point of system prompts just to avoid prompt injection? or are they supposed to get better outputs too?

I never have found a need for them. i.e. the example in the article Just prompting like:

Write hello 3 different ways in spanish works fine for me


They let you at least partially separate instructions from data. This is useful for things like "Translate this text to French" - you don't want any instructions in the text you are translating to interfere with that goal.

If this was 100% robust then it would also solve prompt injection, but sadly it isn't.


I guess this is what the singularity looks like?




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: