There's a backstory to this paper from the r/math subreddit [1]. One of the authors initially posted the question there and, after stumbling upon a similar question on MathOverflow, emailed Terry Tao. Surprisingly, he responded.
Discovered, sort of. His method was nearly the same as the one brought to Tao, but he didn't make the connection to the full correlation between eigenvalues and eigenvectors when dealing with real numbers. Tao used the method to explicitly prove this correlation after Denton, Parke and Zhang discovered it.
Suppose you have a learning algorithm that requires the eigenvectors/eigenvalues of a large matrix. Suppose new data is streamed in that increases the size of the matrix. Well, now you have an algorithm for updating your existing solution instead of needing to compute the entire thing from scratch (consider updating principal components over time, for example).
Suppose you want to accelerate the process of computing eigenvectors for a large matrix. Well, you can now do parallel reduction relatively trivially given the theorem.
Not to downplay the pure and applied mathematics implications, but the immediate discovery was made in the context of calculating neutrino oscillation probabilities, yielding insight into the matter/antimatter imbalance in the known universe ;)
Eigenvalues: the Rosetta Stone for Neutrino Oscillations in Matter
Don't get me wrong, I like your cites - but at the end of the day, math is math. Either it works or it doesn't. If it works, apply it where you can! I've got a sneaking suspicion a PCA based analyzer for lidar generated point clouds is going to be a lot faster.......
Can you go into more detail? I don’t see how that would work. In this algorithm, you need the eigenvalues of all of the minors. So if I’m adding rows/columns to the matrix, yes I have one of the minors from my last solution, but I have to compute N more. And I also need the eigenvalues of the new full matrix.
I think this result has some intriguing opportunities, and have been toying with them since yesterday’s post, but I don’t see something so straightforward.
I obviously haven't worked it out myself, but I think you can memoize the results from the smaller minors to accelerate the computation of the big one. So you would need to decide for yourself the speed/memory tradeoff.
I think what the parent comment meant is that it can be recursive. If you update one row, a lot of these recursively smaller minors of minors won't change, so you can probably do the computation more quickly.
Yes but whether such conditioning is important will depend on your domain. In many cases, just knowing the axis and two sample points is enough to fully determine the orientation.
So it's been *checks notes* 9 years since I've had to deal with eigenvectors and eigenvalues. The post went way over my head, but folks on Twitter were making quite a big deal over it last night when it went up on arXiv. Could someone explain this like I have a computer engineering bachelor's degree? :)
So, you have some infinite space, and now you transform it with a linear transformation A - that could include mirroring, rotation, stretching (but not translation - one of the properties of linear transformations is that the origin, zero, stays at the origin).
Now, there could be lines through the origin that map to themselves under this transformation. These lines are called eigenvectors of A, and they characterise the transformation A. (If you mirror something, a line in the mirror plane maps to itself. If you rotate something, the rotation axis maps to itself. If you stretch something, a line in the direction you stretch in might map to itself. And so on.)
Now, these lines map onto themselves, but they might be stretched or shrunk in the process - that's given by the eigenvalue.
The famous equation is A x = lambda x: you transform eigenvector x by A, and you get back x itself, but stretched by a factor lambda (the eigenvalue).
So, if you have some mapping of 5d space to itself, for example, it can be characterised by 5 eigenvectors (lines that map onto themselves), and 5 eigenvalues (the stretch factor for each of these lines).
Now, "traditionally", you compute the eigenvector by solving a 5d linear system for each of the eigenvalues. (5 eigenvalues, 5 slightly modified linear systems, each gives you an eigenvector of 5 coefficients, for a grand total of 25 eigenvector coefficients).
Here, it is shown that you can compute the eigenvectors of the full system without ever solving that traditional linear system, by instead taking the eigenvalues of slightly smaller systems: 5 smaller matrices M_i (which you get from A by striking out the i-th row and column), each of which has 4 eigenvalues, so together with the 5 eigenvalues of A you have 5 + 4x5 = 25 eigenvalues in total. Now you compute some differences, products and ratios of those numbers and, hey presto, out come your 25 eigenvector coefficients (thus the title, "Eigenvectors from Eigenvalues").
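If a concrete sketch helps, here's roughly what that 5 + 4x5 recipe looks like in numpy (my own toy verification on a random symmetric matrix, not anything from the paper; the variable names are just placeholders):

    import numpy as np

    n = 5
    rng = np.random.default_rng(0)
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2                       # random real symmetric (hence Hermitian) 5x5 matrix

    lam, vecs = np.linalg.eigh(A)           # the 5 eigenvalues of A, plus unit eigenvectors for checking

    for j in range(n):
        # Strike out the j-th row and column to get the smaller system M_j.
        M = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(M)          # its 4 eigenvalues
        for i in range(n):
            # Differences and products of the 25 eigenvalues give the squared coefficient.
            lhs = np.prod(lam[i] - mu)
            rhs = vecs[j, i] ** 2 * np.prod(lam[i] - np.delete(lam, i))
            assert np.isclose(lhs, rhs)

Note that what falls out directly is the squared magnitude of each coefficient; the signs are a separate matter, as others point out further down.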
Pretty neat, though it's unclear to me how the computational complexity really compares.
More like a gold sovereign in Piccadilly. I only skimmed, but it's a sweet result, and something an insightful undergrad could have come up with, but never did.
Linear algebra for large scale "muh big data" applications has had a mini renaissance in recent years. Haven't seen anything this cute and basic though.
Don’t know how simple you want me to go, so apologies if required.
You’ve got a matrix A, ie a 2D array of numbers.
You’ve got a vector v, that is a 1D array of numbers.
You can multiply A by v, written Av, to get a new vector.
Matrices can therefore be viewed as devices for taking a vector and giving a new one, ie as linear transformations. You can design a matrix to rotate vectors, stretch them etc.
An eigenvector of a matrix is a vector that changes only by a constant factor k when you apply (multiply by) the matrix. So Av = kv, where k is a number. For example, if k=2 then applying A just doubles every number in v. Eigenvectors are stretched or squashed by the matrix, no funny business like shearing or rotating.
An eigenvalue is the “k” mentioned above, ie there’s at least one eigenvector such that Av=kv. There could be multiple eigenvectors all with the same k, ie all changed in the same simple way by applying the matrix.
Ok? Slight complication: matrix A may contain complex numbers, not just your common garden variety real numbers. But the above all still applies.
“Symmetric matrices” are matrices that have a symmetry across the top left to bottom right diagonal. So the number at A[i][j] is the same number as that at A[j][i].
“Hermitian” matrices are the equivalent of symmetric matrices when we’re dealing in complex numbers. Rather than A[i][j] being equal to A[j][i], each entry A[i][j] is the “complex conjugate” of the entry A[j][i].
(In case you’ve forgotten complex numbers: we extend the real numbers to include numbers of the form r + zi, where r is the “real part” (just a real number) and “zi” is a real number z multiplied by i, where i = sqrt(-1). Complex numbers were introduced to allow us to solve equations we couldn’t otherwise solve, but have turned out to be super useful everywhere. The “complex conjugate” of a complex number is obtained by flipping the sign of z, eg (2 + 3i) -> (2 - 3i).)
Anyway, so Hermitian matrices are “symmetric” in this way and turn out to be super useful in physics. They also have the nice property that all of their eigenvalues are real numbers, despite the fact that H may contain complex entries.
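(If you want to see that property in action, here's a tiny numpy check; the specific matrix is just something I made up:)

    import numpy as np

    H = np.array([[2.0, 1 + 3j],
                  [1 - 3j, -1.0]])          # made-up 2x2 Hermitian matrix
    assert np.allclose(H, H.conj().T)       # Hermitian: equal to its own conjugate transpose
    print(np.linalg.eigvals(H))             # eigenvalues are (numerically) real: 4 and -3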
What the paper states is a relationship between the eigenvalues and eigenvectors of a Hermitian matrix H and the eigenvalues of smaller “submatrices” of H, where each submatrix is the same as H but with one row and the column of the same index removed. This is neat, because it can be used to speed up the calculation of eigenvectors (as important in CS as, say, array sorting) in some specific situations. Commenting further is beyond me.
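For reference, the identity in question (writing v_i for a unit-length eigenvector of H with eigenvalue lambda_i(H), v_{i,j} for its j-th coordinate, and M_j for the submatrix of H with row j and column j removed) says, up to how you order things:

    |v_{i,j}|^2 * prod over k != i of (lambda_i(H) - lambda_k(H))
        = prod over k = 1..n-1 of (lambda_i(H) - lambda_k(M_j))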
Basically, this allows you to compute eigenvectors from eigenvalues of minor matrices. It is an interesting identity, but speculating on its applications is beyond my expertise.
Somewhat related to some of the questions that have been raised here:
- for the set of matrices that possess them ("transformation matrices that only perform stretch/contract"), eigenvectors (with their associated eigenvalues) play a role quite analogous to the role primes play in the integer set. They provide a unique, identifying "spectrum" for said matrix. This is made explicit by eigendecomposition (spectral decomposition).
- a related decomposition, the singular value decomposition (SVD), extends this idea to any matrix (e.g. "transformation matrices that might also shear, rotate"); and once the eigenvectors/eigenvalues of a square matrix have been obtained, certain operations such as raising the matrix to a power can be performed very quickly (see the small sketch below).
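As a small illustration of that last point, here's a toy numpy sketch (the matrix is made up, and I picked a symmetric one so the eigendecomposition is orthogonal):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])              # made-up symmetric matrix
    w, V = np.linalg.eigh(A)                # A = V @ diag(w) @ V.T

    k = 10
    A_to_the_k = V @ np.diag(w ** k) @ V.T  # A^k via k-th powers of the eigenvalues
    assert np.allclose(A_to_the_k, np.linalg.matrix_power(A, k))

The same trick gives you things like matrix exponentials and square roots essentially for free once the decomposition is in hand.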
Wow... your comment is in fact a more useful pearl of wisdom for getting the intuition behind eigenvalues/eigenvectors.
Can you suggest a book/reference that best conveys this intuition?
Somehow, Gilbert Strang's Linear Algebra does put me to sleep :/
As a graduate student at the time, I found that it opened my eyes about a whole bunch of theorems that I thought I already knew. And it is a lot more approachable than a textbook.
It won an award for the best piece of expository writing on mathematics in 1996. It was a deserved win. :-)
Before you excitedly jump to coding a new algorithm to produce eigenvectors from eigenvalues, it's worth noting that:
- This only applies to Hermitian matrices and is therefore of limited scope.
- The identity only produces the magnitudes of the coefficients of the eigenvectors. The signs of the coefficients are still unknown.
- Even if knowing only the magnitudes of the coefficients is somehow useful, this is still very expensive to compute - just like Cramer's rule, which is of great theoretical interest but has very little practical use when actually computing a matrix inverse or a determinant.
None of the above diminishes the importance of the result, btw.
Came here to pretty much write these same notes. The fact that you need to know the spectrum of all the minors really limits the usefulness for solving general eigen* problems. But I can imagine that this could make a great starting point for all kinds of approximate calculations/asymptotic analyses.
Regarding the first caveat that you bring up, though: while the problem statement says you need a Hermitian matrix, I think the result should generalize to non-Hermitian matrices. In particular, take a look at the third proof of the theorem at the end of the article. The only assumption required there is that the eigenvalues are simple (which does not preclude them being complex / coming in conjugate pairs).
Protip: I had to read the second step in the proof a few times before I could see what was going on. Explicitly, what you do here is (i) multiply from the right by the elementary unit vector e_j, (ii) set up the inverse-matrix-times-vector expression using Cramer's rule, (iii) notice that this matrix can be permuted to have one diagonal block equal to the minor M_i, with the other diagonal block equal to 1 and the corresponding column filled with zeros, (iv) write the determinant in block form, which simplifies to the expression involving M_i by itself, and finally (v) multiply by e_j from the left to get a scalar equation.
Despite the limited scope, this information is "out there" now, and could potentially be crucial for some future "useful" application. Not that I think you're trying to downplay it or anything, I'm just seeing this as another example of how the internet is still a good thing.
Since I’ve always been terrible at abstract linear algebra, I would request that someone do a simple problem, say finding the eigenvalues and eigenvectors of a 3 x 3 Hermitian matrix with three distinct eigenvalues. I’d also be curious under what circumstances the above method leads to a computational advantage.
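Not a hand computation, but here's roughly what that looks like on a made-up 3x3 Hermitian matrix with distinct eigenvalues, letting numpy supply all the eigenvalues (this is just my own sketch of the identity, not code from the paper):

    import numpy as np

    H = np.array([[2.0, 1 - 1j, 0.5j],
                  [1 + 1j, 3.0, 1.0],
                  [-0.5j, 1.0, 1.0]])       # made-up 3x3 Hermitian matrix
    assert np.allclose(H, H.conj().T)

    lam, V = np.linalg.eigh(H)              # eigenvalues of H; unit eigenvectors as columns of V

    for j in range(3):
        Mj = np.delete(np.delete(H, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(Mj)         # eigenvalues of the 2x2 minor M_j
        for i in range(3):
            # The identity recovers |v_{i,j}|^2; the sign/phase of the coefficient is lost.
            ratio = np.prod(lam[i] - mu) / np.prod(lam[i] - np.delete(lam, i))
            assert np.isclose(ratio, abs(V[j, i]) ** 2)

As for the computational advantage: you still need the full spectrum of H plus the spectra of all n minors as input, which is the cost caveat raised elsewhere in the thread, so any win would only appear where those spectra are already known or cheap.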
I have a sneaky suspicion this finding can be used to speed up expert system matching (e.g. Rete networks), which would affect rendering and compilation times for lots of software.
The entries of a matrix are eigenvalues of 1x1 minors, and clearly eigenvectors are a function of the entries.
That the result is a function of the eigenvalues of the (n-1)x(n-1) minors is first-order information, and stating it up front would help correct the default assumption that the result computes eigenvectors from the full Hermitian matrix's eigenvalues alone.
"The real question is whether the discovery is important or interesting. And whether it is surprising.
[...] OK, why does Wolchover claim that it is interesting or surprising? Because they gave that formula to Terence Tao, a Fields Medal winner, for him to say something and he didn't know the formula and even expressed doubts. [...] So that's how it works. If you want to promote a new chocolate candybar, send a formula – or the nutritional values from the package – to someone like Terence Tao, to a darling of the media. The popular science writers will do the rest."
I believe this is equivalent to the solution given by Gene Golub in 1973 for the constraint vector required to produce a specific set of stationary values in a linearly-constrained quadratic program where the constraint vector is unitary.
This is a good example of math being presented in an unnecessarily confusing way.
Values lambda_i(A) are introduced, but then the formula also references values lambda_i(M) for some M which is not A.
Basically, we have no idea what lambda_i(M) might mean from the statement of the theorem. Is the order lambda_1(M), lambda_2(M), ... related to the order lambda_1(A), lambda_2(A), ...? How many lambda_i(M) exist? And so on.
Maybe this is all clear if you deal with minor matrices all the time, but when I see a statement like that I am not even interested in its proof anymore.
Okay, Donald Knuth might be upset that Terry failed to introduce the notion of a spectrum space in a Matrix Analysis paper to an audience that should know about it already.
If I tell you f(x) = y, do you get upset that I don't define the domain, range, codomain, notion of a function, and so forth? Or if I tell you that https:// prepends a good portion of URLs on the world-wide web, would you subject it to the same scrutiny? I hope not -- to require such would be unnecessarily limiting.
If you tell me that thing about https://, it's not math, it's just a conversation, so why would I care?
If you tell me that f(5) = 10, and then you tell me later on that now I can compute g(x) = f(x) - 4, for all x, from that knowledge, then I surely would wonder what you are talking about.
Just read your original opening statement. I think what everyone is telling you is that you were being unnecessarily hyperbolic towards an author writing for a mathematical audience. If it offends you so much that, as you've indicated, you just "moved on," then so be it.
I still stand by my opening statement. Now, after the discussion here I understand what the theorem statement is saying. That's a lot of work I had to do to understand the theorem statement, and it would have been unnecessary, had the theorem been stated "syntactically" correctly.
Now why wasn't the theorem stated syntactically correctly? If we had to state it in a mechanised theorem prover like Isabelle, then we would extract the definition of the eigenvalues from the statement into a separate definition, like:
For a Hermitian n x n matrix H, let lambda_1(H), ..., lambda_n(H) be the eigenvalues of H.
But of course, here we assume an order of the eigenvalues, and we don't really want that, because it is immaterial to what follows. So we would rather have to write it like this:
Definition:
For a matrix H, let L(H) be the set of its eigenvalues.
Theorem:
If H is an n x n Hermitian matrix, then |L(H)| = n.
But, of course, this is not true, as |L(H)| < n is a possibility. So you would have to account for that somehow as well in your definition / notation.
So it is probably better to go back instead again, and explain the notation as follows:
Notation:
For a Hermitian n x n matrix H, let lambda_1(H), ..., lambda_n(H) be its eigenvalues (in an arbitrary order).
Theorem:
What Tao said.
The notation could still not be written in Isabelle like that, but at least it has proper scoping now.
No, the ordering of the eigenvalues of M doesn't matter, because we take their product (or rather, the product of their differences from a fixed number), and multiplication commutes. It's maybe not obvious, and as such I can understand auggierose's complaint to an extent, though it was maybe unnecessarily harshly phrased.
Yes, it is written for people who deal with minor matrices all the time. Academic papers are incremental materializations -- it is up to the reader to bring or build the requisite base understanding.
Well, I am sure the meat of the theorem is just in those eigenvalues of the minors. Which are not even defined in the statement of the theorem. That's not academic. That's just bad style.
He also didn't define eigenvalue in the theorem statement, but you don't seem to be criticizing that as bad style.
Part of the art of mathematical exposition is to walk the line between under and over explaining your process based on your audience.
If the intended audience is academic mathematicians or even just people with a math degree, then the notation is probably familiar. I imagine that's the target audience here.
I can now say exactly what I don't like about the theorem statement: the definition of the eigenvalue notation was not well-scoped. This made it much harder for me to understand what is going on, and has nothing to do with the actual semantic meaning of eigenvalues. Caring about proper scoping is also something mathematicians usually don't do, which makes everything more complicated than it has to be.
They're expecting you to know the spectral theorem for Hermitian matrices. Since M_j is obtained by striking out the jth row and jth column, it too is Hermitian, so it has n-1 real eigenvalues (possibly with multiplicity). The notation $\lambda_i(M)$ ought to refer to the $i$th eigenvalue of M in sorted order to be unambiguous, but this does not matter for the proof.
[1] https://www.reddit.com/r/math/comments/ci665j/linear_algebra...