FWIW: I created a github repo for compact zero-knowledge proofs that could be useful for privacy-preserving ML models of reasonable size (https://github.com/logannye/space-efficient-zero-knowledge-p...). Unfortunately, FHE's computational overhead is still prohibitive for running ML workloads except on very small models. Hoping to help make ZKML a little more practical.
I was under the impression that, for any FHE scheme with "good" security, (a) there was a finite and not very large limit to the number of operations you could do on encrypted data before the result became undecryptable, and (b) each operation on the encrypted side was a lot more expensive than the corresponding operation on plaintext numbers or whatever.
Am I wrong? I freely admit I don't know how it's supposed to work inside, because I've never taken the time to learn, because I believed those limitations made it unusable for most purposes.
Yet the abstract suggests that FHE is useful for running machine learning models, and I assume that means models of significant size.
The difference between homomorphic schemes and fully homomorphic schemes is that FHE can be bootstrapped; there's a circuit that can be homomorphically evaluated that removes the noise from an encrypted value, allowing any homomorphic calculation's result to have its noise removed for further computation.
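To make that concrete, here's a toy model (Python, not real cryptography) of the noise-budget dynamic that bootstrapping deals with; the numbers are invented purely for illustration:

    # Toy illustration: ciphertexts in BGV/BFV-style schemes carry a "noise
    # budget" that homomorphic operations consume; bootstrapping resets it.

    class ToyCiphertext:
        def __init__(self, value, noise_budget=10):
            self.value = value          # stand-in for the encrypted payload
            self.noise_budget = noise_budget

        def add(self, other):
            # Addition grows the noise slowly.
            return ToyCiphertext(self.value + other.value,
                                 min(self.noise_budget, other.noise_budget) - 1)

        def mul(self, other):
            # Multiplication grows the noise much faster.
            return ToyCiphertext(self.value * other.value,
                                 min(self.noise_budget, other.noise_budget) - 4)

        def bootstrap(self):
            # A real scheme evaluates its own decryption circuit
            # homomorphically, yielding a fresh low-noise ciphertext.
            return ToyCiphertext(self.value, noise_budget=10)

    ct = ToyCiphertext(3).mul(ToyCiphertext(5))   # budget: 10 -> 6
    ct = ct.mul(ToyCiphertext(2))                 # budget: 6 -> 2
    ct = ct.bootstrap()                           # budget back to 10; keep going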
My understanding is largely ten years old and high level and only for one kind of fully homomorphic encryption. Things have changed and there is more than one kind.
I heard it described as a system that encrypts each bit individually and then evaluates the encrypted bits in a virtual gate-based circuit implementing the operations one wants applied to the plaintext. The key material needed to encrypt and decrypt can run to a gigabyte or more. Processing this vastly larger data is why FHE of the kind I've described is so slow.
So, if you wanted to, say, add numbers, that would involve implementing a full adder [0] circuit in the FHE system.
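For illustration, here's that circuit on plain bits; in a TFHE-style system, each XOR/AND/OR below would instead be a homomorphic gate evaluation on encrypted bits:

    def full_adder(a, b, carry_in):
        # One bit position: sum and carry-out from two bits plus a carry-in.
        s = a ^ b ^ carry_in
        carry_out = (a & b) | (carry_in & (a ^ b))
        return s, carry_out

    def add_bits(xs, ys):
        # Ripple-carry addition of two little-endian bit lists.
        carry, out = 0, []
        for a, b in zip(xs, ys):
            s, carry = full_adder(a, b, carry)
            out.append(s)
        return out + [carry]

    # 3 + 5: bits [1,1,0] + [1,0,1] (little-endian) -> 8
    print(add_bits([1, 1, 0], [1, 0, 1]))  # [0, 0, 0, 1]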
Both of these are correct-ish. You can do a renormalization that resets the operation counter without decrypting in FHE schemes, so in that sense there is no strict limit on operation count. However, FHE operations are still about 6 orders of magnitude more expensive than normal ones, so you are not going to be running an LLM, for instance, any time soon. A small classifier, maybe.
LLMs are at the current forefront of FHE research. There are a few papers running tweaked versions of BERT at under a minute per token, which is only ~4 orders of magnitude slower than cleartext.
This paper uses a very heavily modified encoder-only BERT model. The forward pass on a single 4090 is cited there at 13 seconds after swapping softmax out for a different kernel (21 seconds with softmax). They are missing a non-FHE baseline, but judging by its size, that model has only about 35 million parameters. At FP16 you would expect it to be about 100x faster than a normal BERT because it's so damn small. On a 4090, that model's forward pass probably runs at something like 100k-1M tokens per second with some batching. So 6 orders of magnitude still sounds about right.
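A quick sanity check of that arithmetic (the cleartext throughput is my guess, not a measurement):

    import math

    fhe_seconds_per_token = 13
    for cleartext_tokens_per_sec in (1e5, 1e6):
        slowdown = fhe_seconds_per_token * cleartext_tokens_per_sec
        print(f"{cleartext_tokens_per_sec:.0e} tok/s -> {slowdown:.1e}x, "
              f"~{math.log10(slowdown):.0f} orders of magnitude")
    # 1e+05 tok/s -> 1.3e+06x, ~6 orders of magnitude
    # 1e+06 tok/s -> 1.3e+07x, ~7 orders of magnitude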
Given that individual LLM parameters are not easily interpreted (they are naturally obfuscated by the diffuse way each one affects the output), I would think leaning into that would be a more efficient route.
Obfuscating input and output formats could be very effective.
Obfuscation layers can be incorporated into training: an input (output) layer that passes information forward, but whose output (input) is optimized to have statistically flat characteristics, resistant to attempts at interpretation.
Nothing like apparent pure noise for obfuscation!
The core of the model would then be trained, and would run inference, on the obfuscated data.
When used, the core model would operate publicly on obfuscated data, while the obfuscation/de-obfuscation layers would be used privately.
In addition to obfuscating, the pre- and post-layers could also reduce data dimensionality, naturally increasing obfuscation and reducing data transfer costs. It's a really good fit; something like the sketch below.
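A minimal sketch of that split, assuming a PyTorch-style setup; the layer names and the "flatness" penalty are illustrative assumptions, not from any particular paper:

    # Hypothetical sketch (PyTorch): the obfuscation encoder/decoder stay
    # private on the client; only `core` runs publicly on obfuscated codes.
    import torch
    import torch.nn as nn

    class ObfuscatedModel(nn.Module):
        def __init__(self, d_in=512, d_code=128, d_out=512):
            super().__init__()
            self.obfuscate = nn.Linear(d_in, d_code)     # private; also reduces dimension
            self.core = nn.Sequential(                   # public; sees only codes
                nn.Linear(d_code, d_code), nn.ReLU(),
                nn.Linear(d_code, d_code),
            )
            self.deobfuscate = nn.Linear(d_code, d_out)  # private

        def forward(self, x):
            code = self.obfuscate(x)
            return self.deobfuscate(self.core(code)), code

    model = ObfuscatedModel()
    y, code = model(torch.randn(32, 512))

    # Illustrative "flatness" penalty added to the training loss: push the
    # public code toward zero mean and unit variance so it resembles noise.
    flatness = code.mean().pow(2) + (code.var() - 1.0).pow(2)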
Even the most elaborate obfuscation layers will be orders and orders of magnitude faster than today's homomorphic approaches.
(Given the natural level of parameter obfuscation, and the highly limited set of operations in most deep models, I wouldn't be surprised if efficient homomorphic approaches were found in the future.)
Moore's Law roughly states that we get a doubling of speed every 2 years.
If we're 6 orders of magnitude off, then we need to double our speed 20 times (2^20 = 1,048,576), which would give us speeds approximately in line with 40 years ago. Unless my understanding is completely off.
The rule of thumb is "about a 100,000x slowdown". With Moore's law doubling every 2 years, that means FHE runs at roughly the speed of computers from about 40 years ago. Although really, that still makes it seem faster than it is; making direct comparisons is hard.
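The arithmetic, in case anyone wants to check it:

    # Doublings needed to close the FHE gap, at one doubling every 2 years.
    import math

    for slowdown in (1e5, 1e6):
        doublings = math.log2(slowdown)
        print(f"{slowdown:.0e}x -> {doublings:.0f} doublings "
              f"-> ~{2 * doublings:.0f} years of Moore's law")
    # 1e+05x -> 17 doublings -> ~33 years
    # 1e+06x -> 20 doublings -> ~40 years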
Let's assume for a second that the computational cost problem is solved and that using FHE is similar to using plaintext data.
My question might be very naive, but I'd like to better understand the impact of FHE. Discussions here seem to revolve mostly around the use of FHE in ML, but are there other uses for it?
For example, could it be used for everyday work in an OS or a messaging app?
There's no value to it in circumstances where you control all the hardware processing the data. So "everyday work in an OS" - only if that OS is hosted on someone else's hardware; "a messaging app" - only if you expect some of the messages or metadata to undergo processing on someone else's hardware.
It seems wildly unlikely that the performance characteristics will improve dramatically, so in practice the uses are going to remain somewhat niche.
> There's no value to it in circumstances where you control all the hardware processing data
But what about the case where you don't have much control over what runs next to your program? Could it be possible for an attacker to run a program in order to extract some data while your program runs?
Also, could FHE offer some protection against vulnerabilities like Meltdown and Spectre?
> It seems wildly unlikely that the performance characteristics will improve dramatically
Why? Are there specific signs of this already? I had the impression that every time people believe that about a technology, they get proven wrong later.
The typical, and also most useful, example use case for FHE is running computational tasks on some cloud service without having to trust it. And yes, it would provide protection against Meltdown and Spectre (if they were performed on the hardware running the computation), as the attacker would only be able to extract encrypted data.
The data has to be decrypted at some point in order to display it... unless we're envisioning FHE hardware in the monitor as well. Honestly, I think we're well across the threshold into fantasy already, though.
Of course the data has to be decrypted, but in this case you would decrypt it on your client machine, so you don't need to trust the cloud provider or other third parties using VMs on the same server (side-channel attacks can sometimes be mounted from another VM running on the same hardware, although this is rarely considered part of one's threat model).
What is the computational burden of FHE over doing the same operation in plaintext? I realize that many cloud proponents think FHE may let them work with data without security worries (if it's all encrypted and we don't have the keys, it ain't our problem), but if FHE requires a 100x or 1000x increase in processor capacity, then I am not sure it will be practical at scale.
Oh. It really is still that bad. So if the choice is between wrapping the plaintext in layers of security or building out a million new server instances to do it via FHE, I know which one everyone will choose.
Accelerators are being developed that claim to get down to 10x, though I think they will be more like 100-1000x, which would still be a huge improvement considering how people use LLMs today for basic tasks like string matching.
Are those accelerators software-only? 10x could let a $4 VPS run server-side checks for backup software (so evil clients can't clean out backups) and git forges (e.g., don't allow X to push to main).
You may wish to build a protocol where third parties can asynchronously operate on user data. You may also want to have separation between the end app and the compute layer for legal or practical purposes. Finally, you may not want to store large payloads on client devices.
I'm giving you general reasons why this is the case. For our own app, we hope to build a protocol where third parties can operate asynchronously on user data (with consent).
Any program which you apply FHE to needs to be expressed as a circuit, which implies that the time taken to run a computation needs to be fixed in advance. It's therefore impossible to express a branch instruction (or "if" statement, if you prefer) directly; both sides of a conditional have to be evaluated and the result selected arithmetically, as in the sketch below.
The circuits are built out of "+" and "×" gates, which are enough to express any polynomial. In turn, polynomials are enough to approximate any continuous function on a closed interval (the Weierstrass approximation theorem), and every computable function on the real numbers is continuous, so FHE is very powerful.
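A plain-Python sketch of both points: the "if" becomes a multiplexer built from + and ×, and a non-polynomial function like ReLU gets swapped for a polynomial approximation (the coefficients here are illustrative, not from any particular scheme):

    def select(cond_bit, a, b):
        # Branch-free "if": both sides are always computed, then blended.
        # cond_bit must be 0 or 1 (in FHE it would be an encrypted bit).
        return cond_bit * a + (1 - cond_bit) * b

    def relu_poly(x):
        # Crude degree-2 approximation of ReLU on roughly [-1, 1].
        return 0.125 + 0.5 * x + 0.25 * x * x

    print(select(1, 10, 20))  # 10
    print(select(0, 10, 20))  # 20
    print(relu_poly(0.5))     # 0.4375, vs. true ReLU(0.5) = 0.5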
Differentiability isn’t a requirement for homomorphism, I don’t think.
Homomorphism just means: say I have a function f: A -> B, a binary operator * on A, and a binary operator *’ on B; then f is homomorphic if f(a1*a2) = f(a1)*’f(a2). Loosely speaking, it “preserves structure”. (A homomorphism doesn’t have to be bijective [1] on its own; that extra requirement comes in below because encryption has to be invertible.)
So if f is my encryption, then I can do *’ outside the encryption and know, because f is homomorphic, that the result is identical to doing * inside the encryption. So you need your encryption to be an isomorphism [2] (so you can decrypt), and you need “outside the encryption” variants of any operation you want to do inside the encryption. That is a different requirement from differentiability.
1: bijective means it’s a one-to-one correspondence
2: a bijection that has the homomorphism property is called an isomorphism, because it makes set A equivalent to set B in our example.
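To make the notation concrete, here's a toy (and completely insecure) demo where f(m) = g^m mod p: the operation * on plaintexts is addition, *’ on "ciphertexts" is modular multiplication, and the homomorphism property holds exactly:

    # Toy demo only: real exponentiation-based schemes (e.g. ElGamal)
    # add randomness; this just shows f(a1 * a2) = f(a1) *' f(a2).
    p, g = 2**61 - 1, 3  # a Mersenne prime and a small base

    def f(m):
        return pow(g, m, p)

    a, b = 12345, 67890
    assert f(a + b) == (f(a) * f(b)) % p  # addition maps to multiplication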
ReLU, commonly used in neural networks, is not differentiable at zero, but it can still be approximated by expressions that are efficiently FHE-evaluable. If you're being pedantic, you don't truly care about differentiability here.
Very insightful comment, though. LLMs run under FHE (or just fully local LLMs) are a great step forward for mankind. Everyone should have the right to interact with LLMs privately. That is an ideal to strive for.
I was surprised that, for almost 300 pages, there were only 26 references listed in the back. Not the end of the world by any means; clearly a ton of work went into this. But I find it useful to see from the references how a work overlaps with other subjects I may know more about.
There were (at least) two posts from arxiv.org on the front page at the time, and when I was updating the title on the other one I must have applied it to this one instead. I've fixed it now and re-upped this post so it can get full exposure on the front page with its correct title.