Vladimir Arnold famously taught a proof of the insolubility of the quintic to Moscow high-school students in the 1960s using a concrete, low-prerequisite approach. His lectures were turned into the book Abel's Theorem in Problems and Solutions by V.B. Alekseev, which is available online here: https://webhomes.maths.ed.ac.uk/~v1ranick/papers/abel.pdf. He doesn't treat Galois theory in full generality, but instead gives a more concrete topological/geometric treatment. For anyone who wants to get a good grip on the insolubility of the quintic but feels overwhelmed by the abstraction of modern algebra, I think this would be a good place to start.
Looks like a nice book, but what's up with his assertion on page 148 (164 of the .pdf) that the integers don't form a group under addition?
If he defines integers as "natural numbers excluding zero," that seems goofy and nonstandard but also interesting. Is that a Russian-specific convention?
It seems to be a typo where "integers" was written when the intention was "natural numbers". That is the solution to exercise 194 part a), which asked whether the set of natural numbers is a field.
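For the record, the reason the natural numbers fail (under either convention about 0) is that addition has no inverses:

\nexists\, n \in \mathbb{N} : 1 + n = 0, \quad\text{so } (\mathbb{N}, +) \text{ is not a group, and hence } \mathbb{N} \text{ is not a field.}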
Whether 0 is a natural number is still fairly ambiguous; I remember being taught (1990s UK) to be specific about which definition was being used, or to prefer another name such as 'positive integers' or 'non-negative integers'.
Let's take the Fundamental Theorem of Calculus as an example[0]:
f'(x) = lim_{h -> 0} [f(x + h) - f(x)] / h
This isn't the Fundamental Theorem of Calculus, it's the usual definition of the derivative of a function of a single real variable. The Fundamental Theorem of Calculus establishes the inverse relationship between differentiation and integration [0].
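For reference, the two statements the theorem actually makes, in standard notation (assuming the integrands are continuous on [a, b]):

\text{Part 1:}\quad \frac{d}{dx}\int_a^x f(t)\,dt = f(x)
\qquad
\text{Part 2:}\quad \int_a^b f'(x)\,dx = f(b) - f(a)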
Ramanujan aside, every mathematician has spent hours banging their head against a literal or metaphorical wall (or both!).
Ramanujan was no stranger to banging his head against the wall. My impression from Kanigel's The Man Who Knew Infinity is that his work ethic and mathematical fortitude were as astonishing as his creativity. For much of his career, he couldn't afford paper in quantity and did his hard work on stone slate, only recording the results. This could make it seem like his results were a product of pure inspiration because he left no trace of the furious activity and struggle that was involved.
From The Man Who Knew Infinity:
When he thought hard, his face scrunched up, his eyes narrowed into a squint. When he figured something out, he sometimes seemed to talk to himself, smile, shake his head with pleasure. When he made a mistake, too impatient to lay down his slate pencil, he twisted his forearm toward his body in a single fluid motion and used his elbow, now aimed at the slate, as an eraser.

Ramanujan's was no cool, steady Intelligence, solemnly applied to the problem at hand; he was all energy, animation, force.
Decimate is a word that often raises hackles, at least those belonging to a small but committed group of logophiles who feel that it is commonly misused. The issue that they have with the decline and fall of the word decimate is that once upon a time in ancient Rome it had a very singular meaning: “to select by lot and kill every tenth man of a military unit.” However, many words in English descended from Latin have changed and/or expanded their meanings in their travels. For example, we no longer think of sinister as meaning “on the left side,” and delicious can describe things both tasty and delightful. Was the “to kill every tenth man” meaning the original use of decimate in English? Yes, but not by much. It took only a few decades for decimate to acquire its broader, familiar meaning of “to severely damage or destroy,” which has been employed steadily since the 17th century.
The more language is allowed to drift, the harder it becomes to read old language. I think this is a particularly silly case, but in general, the complaint that people are misusing words shouldn't be met with "It's impossible to misuse words", which this argument implicitly is.
No one allows or disallows language to drift; there are no language enforcers. The argument is not "it's impossible to misuse words" but that it's pedantic to claim a word is misused when it has been used this way for hundreds of years, so the original definition is no longer the only applicable one.
Someone could of course institute language enforcers for English, but I'm very skeptical about both the enforcement mechanisms, and the usefulness of even a successful enforcement.
Bodies like the Académie Française do try to promote language standards ('enforce' is probably not the right word). But I'm not sure how successful they are.
Your intuition's not bad. The expected value for the longest run of heads in N total flips of a fair coin is around log2(N) - 1 with a standard deviation that's approximately 1.873 plus a term that vanishes as N grows large. log2(10B) - 1 is approximately 32 and with that standard deviation, even a run of 100 in 10B flips is incredibly unlikely. For more info see Mark F. Schilling's paper, "The Longest Run of Heads" available here https://www.csun.edu/~hcmth031/tlroh.pdf.
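As a quick sanity check of Schilling's approximation (it gives the expectation; any single trial will vary), here's a minimal simulation:

```python
import math
import random

def longest_run(flips):
    """Length of the longest run of heads (1s) in a sequence of flips."""
    best = cur = 0
    for f in flips:
        cur = cur + 1 if f == 1 else 0
        best = max(best, cur)
    return best

random.seed(0)
n = 1_000_000
flips = [random.randint(0, 1) for _ in range(n)]
# Observed longest run should land near log2(n) - 1, which is about 18.9 here.
print(longest_run(flips), math.log2(n) - 1)
```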
It sounds like jongjong was probably using surrogate gradients. You keep the step activation in the forward pass but replace it with a smooth approximation in the backward pass.
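A minimal sketch of the idea in plain NumPy (the steep-sigmoid surrogate and the beta value are common choices, not anything from the original comment):

```python
import numpy as np

def heaviside(x):
    # Forward pass: the hard step activation (0/1 "spikes").
    return np.where(x > 0, 1.0, 0.0)

def surrogate_grad(x, beta=5.0):
    # Backward pass: the step's true derivative is zero almost everywhere,
    # so substitute the derivative of a steep sigmoid as a surrogate.
    s = 1.0 / (1.0 + np.exp(-beta * x))
    return beta * s * (1.0 - s)

# One-unit sketch: y = step(w . x); the chain rule uses the surrogate in
# place of the step's derivative when computing the weight gradient.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
z = w @ x
y = heaviside(z)                      # forward uses the true step
err = y - 1.0                         # e.g. target output of 1
grad_w = err * surrogate_grad(z) * x  # backward pretends step' = surrogate
```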
I can't remember the name of the algorithm we used. It wasn't doing gradient descent but it was a similar principle; basically adjust the weights up or down by some fixed amount proportional to their contribution to the error. It was much simpler than calculating gradients but it still gave pretty good results for single-character recognition.
Yeah, I think surrogate gradients are usually used to train spiking neural nets where the binary nature is considered an end in itself, for reasons of biological plausibility or something. Not for any performance benefits. It's not an area I really know that much about though.
There are performance benefits when they're implemented in hardware. The brain is a mixed-signal system whose massively parallel, tiny, analog components keep it ultra-fast at ultra-low energy.
Analog NNs, including spiking ones, share some of those properties. Several chips, like TrueNorth, are designed to take advantage of that on the biological side. Others, like Mythic AI's, are accelerating more conventional ML systems.
Very cool project. If you haven't already, for JAX and PyTorch support take a look at the Python Array API Standard, https://data-apis.org/array-api/latest/, and see https://data-apis.org/array-api-compat/ for how to use it. If you have or can write everything in terms of the subset of NumPy supported in the Array API Standard, you can get support for alternative array libraries almost for free.
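The pattern looks roughly like this (a minimal sketch; the function name and the namespace-as-parameter style are just illustrations, and in practice array-api-compat's array_namespace helper would pick xp for you):

```python
import numpy as np

def standardize(x, xp=np):
    # Written only against functions that appear in the Array API
    # standard's subset of NumPy (mean, std, broadcasting arithmetic),
    # so in principle xp can be swapped for jax.numpy unchanged.
    return (x - xp.mean(x)) / xp.std(x)

a = np.array([1.0, 2.0, 3.0, 4.0])
out = standardize(a)  # zero mean, unit standard deviation
```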
That looks like a valuable resource, thank you! I already mostly stuck to a subset supported by both NumPy and JAX (because those are the array libraries I'm familiar with). I hope the others are not too far off...
If I had to give a loose definition of topology, I would say that it is actually about studying spaces which have some notion of what is close and far, even if no metric exists. The core idea of neighborhoods in point-set topology captures the idea of points being near another point, and allows defining things like continuity and sequence convergence which require a notion of closeness. From Wikipedia [0], for example:
The terms 'nearby', 'arbitrarily small', and 'far apart' can all be made precise by using the concept of open sets. If we change the definition of 'open set', we change what continuous functions, compact sets, and connected sets are. Each choice of definition for 'open set' is called a topology. A set with a topology is called a topological space.
Metric spaces are an important class of topological spaces where a real, non-negative distance, also called a metric, can be defined on pairs of points in the set. Having a metric simplifies many proofs, and many of the most common topological spaces are metric spaces.
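For concreteness, the definition the Wikipedia passage alludes to fits in a few lines:

\text{A topology on a set } X \text{ is a collection } \tau \subseteq \mathcal{P}(X) \text{ of "open sets" such that:}\\
\emptyset \in \tau \text{ and } X \in \tau;\\
\text{any union of members of } \tau \text{ is in } \tau;\\
\text{any finite intersection of members of } \tau \text{ is in } \tau.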
That's not to say that topology is necessarily the best lens for understanding neural networks, and the article's author has shown up in the comments to state he's moved on in his thinking. I'm just trying to clear up a misconception.
I think the truly surprising thing is just how well floating point numbers work in many practical applications despite how different they are from the real numbers. One could call it the "unreasonable effectiveness of floating point mathematics".
There are many situations where you want to compute something to within a low number of units in the last place (ulps) that seem fairly involved, but very often there are clever methods that let you do it without having to go to extended or arbitrary precision. Maybe not that surprising, but it's something I find interesting.
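Kahan's compensated summation is a classic instance of such a trick: it recovers most of the accuracy of extended precision using only ordinary doubles. A minimal sketch:

```python
def kahan_sum(xs):
    """Compensated summation: carry the rounding error of each add in a
    separate variable, so the error stays bounded by a few ulps of the
    sum regardless of how many terms there are."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - c            # subtract the error from the previous step
        t = total + y        # low-order bits of y may be lost here...
        c = (t - total) - y  # ...so recover them algebraically
        total = t
    return total

# Ten copies of 0.1 don't sum to 1.0 naively; the compensated sum
# lands within a couple of ulps of the exact value.
naive = sum([0.1] * 10)
compensated = kahan_sum([0.1] * 10)
```

The key is that `(t - total) - y` is computed exactly in floating point and equals the rounding error of `total + y`, which would otherwise be silently discarded.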
Wittgenstein himself states that the Tractatus is nonsense in its closing pages.
My propositions are elucidatory in this way: he who understands me finally recognizes them as senseless, when he has climbed out through them, on them, over them. (He must so to speak throw away the ladder, after he has climbed up on it.)
I think you may agree with the Wittgenstein of the Tractatus more than you realize. My understanding is that his main goal at that time was to show that many of the classic problems of metaphysics which plagued philosophers for centuries or more are literally just nonsense. He didn't write the Tractatus to convince regular people though, but to convince academic philosophers of his time. He earned his fame by being somewhat successful. Rather than making a logical argument for his point, I understand his aim as stimulating his audience to think things out for themselves by offering them carefully crafted nonsense that gave a fresh perspective.
I think you just have no use for the Tractatus because you're not preoccupied with metaphysical questions.