A very nice fact that I coincidentally learned just yesterday, which is not in the linked answer but is touched on in one of the comments and in an edit on the accepted answer: there is an explicit bijection that maps a binary tree to seven binary trees (and vice-versa, by the inverse function).
Consider the definition of a binary tree: "a binary tree is either empty, or is a root with a left tree and a right tree". This we can write as
T = 1 + T²
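In Haskell, for instance, the same definition is the algebraic data type below (Empty and Node are just illustrative names; the "1" is the empty case and the "T²" is the pair of subtrees):

    -- a binary tree is either empty, or a root with a left tree and a right tree
    data T = Empty | Node T T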
Solving the analogous equation z = 1 + z² in the complex numbers gives z to be a primitive sixth root of unity (https://www.wolframalpha.com/input?i2d=true&i=z+%3D+1+%2B+Po...), i.e. z⁶ = 1, and so z⁷ = z. It turns out that we can similarly show T⁷ = T with some algebra (without subtractions) by repeatedly applying the transformations T→1+T² or 1+T²→T (a "game" version is at http://lebarde.alwaysdata.net/seventrees/). This can be used to give an explicit bijection: see Andreas Blass's paper "Seven trees in one": https://arxiv.org/pdf/math/9405205 (via https://twitter.com/CihanPostsThms/status/157901771352705024... — see also https://math.ucr.edu/home/baez/week202.html which has more.)
Monday was looking boring. I'm excited now about the new adventure I've set for myself to somehow inject this (new for me) term into a discussion appropriately.
The particular example is not the point. The question you should be asking is
> would I ever have a need to understand what happens when you perform algebraic manipulations on data structures
and the answer is as usual that it depends on what you're working on. Parsers? Absolutely. CRUD apps? Yes, but only if you look for them. Numerical linear algebra? Probably not.
I upvoted just for this breath of fresh air right here. CRUD apps are boring and crappy because people make them boring and crappy. I find uses for this type of thing all the time in "boring" CRUD apps, because I look for them.
This is like giving an artist a new shade of color they've never had before, like a new rich blue or something. It isn't that everything is improved by using that new color as much as possible everywhere. It's that it opens up new possibilities you may never have considered before. This insight may be useless, but there are other useful ones that can emerge that you wouldn't have thought of on your own in a long time.
The fascination is more that of explanation: if you treat T = 1 + T^2 as an algebraic equation in the complex numbers, you can get a lot of expressions that are meaningless: it's hard to make sense of T = 1/2 ± i√3/2 or T^3 = -1 or even T^6 = 1 (there is certainly no bijection sending six trees to the empty tree). But then, T^7 = T is suddenly very much meaningful and we can turn it into an explicit description. What's going on? Which equations are "meaningful" and which are not?
Blass's paper gives an explanation: an equation is meaningful if you can derive it by working in (something like) N[T]/(T = 1 + T^2) (with some more conditions). Baez's discussion (linked above) puts it in terms of "categorification", and is also interesting.
Sharding a huge database index (or other binary tree structure) onto 7 different servers or hard drives? Or 49 (recursively apply it)? I don't know if that's actually useful here but it's the first thing that came to mind.
As someone that's self-taught, I really wish math was taught via strongly typed programming languages.
Math is so full of loosey-goosey context dependent ambiguous notation. It makes it easy to have a very terse expression communicate a powerful idea, but it makes it impossible to just parse an arbitrary expression.
Just as an example, look at the Maclaurin series as defined in the link. It uses f-dot, f-double-dot, then f^(n), having to note that (n) is the nth derivative rather than exponentiation. You have syntax switching and an ambiguity in the same expression.
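For reference, the series being described looks roughly like this, with dots for the first couple of derivatives and the parenthesized superscript for the general term:

    f(x) = f(0) + ḟ(0)·x + (f̈(0)/2!)·x² + ⋯ + (f⁽ⁿ⁾(0)/n!)·xⁿ + ⋯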
The issues you bring up aren't really like types, though. Expecting every domain of math to use the same notation is more like expecting every programming language to use the same syntax.
Derivatives are a bit of a weird case because some really intuitive and useful sub-domains only really need the first one or two derivatives in time. So, hitting them with the whole d^N y/dx^N is overkill.
> it makes it impossible to just parse an arbitrary expression.
All language suffers from this. Without context (assumed by the reader, provided fully or at least referenced by the writer) all arbitrary expressions are impossible to parse. Tell me what this means:
map f xs
If you guessed it was mapping the function f over the sequence xs, maybe you're right, maybe not. Maybe a fool wrote it who can't communicate and chose bad names. Maybe an adversarial programmer (wants job security, or is just a dick) uses `map` to mean `reduce` or `mapreverse` (so that it does most of what you expect, but produces a reversed output from what you expect).
Math isn't any different here, don't try to parse arbitrary expressions until you've identified the context. Maybe a fool used notation foolishly, maybe an adversarial mathematician used notation to obfuscate and has, hidden away in their diary, a key that explains it all. But most of the time the context will be provided directly or indirectly. The notation will be explained in the text or the text will be in a field with notational conventions that the reader is expected to know. Same with code, same with every other human language.
Sure but I can load any program with that statement into my editor and mouse over each word to view the details or hit F12 to see the definition :p If any of those symbols are undefined, the editor or compiler will tell me.
Without context it's not a running program and I don't really care about single lines of code in isolation. It's just as bad as math symbols written in isolation of any context but that's my point.
> If any of those symbols are undefined, the editor or compiler will tell me.
I'm pretty sure every single language in the world will tell me that statement in isolation will not compile.
> Without context it's not a running program and I don't really care about single lines of code in isolation. It's just as bad as math symbols written in isolation of any context but that's my point.
We are either in violent agreement or you are failing to actually convey your meaning.
To take a step back: The original commenter complained about not being able to parse arbitrary expressions. I responded by pointing out that that's not a math specific thing, and that every language has this problem if there is no context. My overall point being that with context that "arbitrary" math expression would become less arbitrary as the context provides meaning which permits parsing. Since they seemed to be a programmer (or have some familiarity with it, asking for a strongly typed language) I provided an expression you might find in some program, and presented it without any context, that was intended as an analogy to their situation with math. Again, trying to emphasize that you need context to get meaning.
You seem to be agreeing with me (you need context for the expression to have any meaning), but your comment itself comes off as argumentative. So what's your disagreement with my point?
As somebody with a PhD in math, I really wish that math was taught via strongly typed programming languages. There is so incredibly much low-lying work done through deep layers of abstraction that it can take hours or even days to digest a short paragraph in a field adjacent to one's expertise.
Of course, that's the dream. The reality is that people are making very strong efforts at computer proofs, and the languages I've worked with in that space are highly un-ergonomic. On the other side of the chasm, calculus of types is largely relegated to category theory which is its own onion of complexity. To my awareness, Haskell is currently the sweet spot. In the hopes of being proven wrong, I'd guess that we're at least a century away from a Grand Unified Notation that allows a fluid translation between what the mathematician is thinking and how the computer can compute.
He started by getting undergraduates to tackle formalising the maths they did in Lean, and his success with this as a didactic tool has spread the idea among quite a lot of mathematicians.
Yes, progress is under way! It's been a few years since I tried to plumb these depths, it's probably time to look again. But I don't need to look to know that we're not there, where "there" means both accessible and widely adopted.
> There is so incredibly much low-lying work done through deep layers of abstraction that it can take hours or even days to digest a short paragraph in a field adjacent to one's expertise.
Indeed, once you start to think about formalizing a research-level proof, it brings into focus the sheer number of trivial cases and natural transformations that a standard informal proof in a paper omits.
> I really wish math was taught via strongly typed programming languages
I'll use this opportunity to recommend Conrad Wolfram's "The Math(s) Fix" that outlines his Computer-based Math approach. In this book he asks (and answers) what would school look like if we taught mathematics as if computers exist.
Those are very familiar notations to anyone who has taken undergraduate math classes. There's some ambiguity but it's fairly easy to dispel with, like, a cheat sheet.
More importantly, they are intimately familiar to those active in the field, and often reflect underlying ideas that are useful in the field.
People often write equations out in full and marvel at the ludicrous clunkiness of it all. Usually because they expect that will help them understand a concept. That doesn't work.
The notation often suggests a manner of thinking about an idea that is more useful than seeing the idea spelled out.
Of course you probably should have spent some time peering into the black box of notation, and physicists are probably the most notorious for not caring to.
Not the guy you're replying to although I subcommented when I ought to have replied to you.
The notation often suggests a manner of thinking about an idea that most won't be able to see in the expanded notations.
As such those familiar with the field often abstract away the minutia.
I always heard a rather unfair, terse statement about science: if you have to read the article's introduction section, you probably shouldn't be reading the article. That's wildly unfair, but it does reflect that academia is mostly concerned with academics in the same field.
No? Of course people learn in different ways, but it's the people who have already learned in those different ways who are in a position to discuss the merits of those different ways (obviously taking into account the understanding that different things work for different people).
The people who haven't learned it and are complaining about how daunting it seems can certainly have opinions, but their opinions aren't very useful or interesting.
The issue is that your comments have been full of presumption, which I personally find rude (in addition to being unuseful and uninteresting).
For example, it's not clear that you've considered that the people you're replying to have learned the subject matter and are reflecting on what would have made the process easier for them.
I did indeed presume the OP didn't know this stuff. For one thing: because in my experience it was pretty easy to pick up for everybody who took classes on it. (There are all kinds of notations that weren't, though; I just don't think this was one of them.) On re-reading, that presumption might not hold though. My mistake.
In general my reaction is a sort of counter-reaction to a HN pattern that I don't like. Every HN post has someone say "this is so stupid" on it, about something (all kinds of things usually). I really hate that. So my flinch response is to say "this objection is also stupid". Not helpful and just as bad as what I'm criticizing, really. I kinda regret it.
Somewhat kooky afterthought: I don't like those initial critical nitpicky reactions because they sort of... encode... a rather unlikeable philosophy that is always in subtext here, which is that everything can be solved by building something, or doing something 'right', and regular human effort to solve human problems, with its human flaws, is worthy of scorn. Never cared for that. I think that's why I react negatively to it. Like in this case: those notations are actually good for a bunch of reasons. Strongly typing it? Try that for twenty minutes and you'll see why it's gonna be worse (I mean... probably. Maybe I'm wrong! Glad to learn). Or maybe every blog post should define its terms more carefully for people who didn't learn from school? I dunno, I had a blog and tried that, and it makes the posts crap, filled with meticulous definitions. At some point you just have to assume some things about your audience, and let them google what they don't know.
> Just as an example, look at the Maclaurin series as defined in the link. It uses f-dot, f-double-dot, then f^(n), having to note that (n) is the nth derivative rather than exponentiation. You have syntax switching and an ambiguity in the same expression.
That’s nothing. Study real analysis long enough to investigate sequence spaces and you’re looking at sequences of sequences. The notation for those will often use subscripts and superscripts as indices at the same time! But generally you’re not dealing with derivatives or exponents at the time so it’s not too bad.
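Concretely, a common convention in that setting looks like x⁽ⁿ⁾ₖ for the k-th entry of the n-th sequence x⁽ⁿ⁾: the parenthesized superscript is an index, not a power, and only context tells you which reading is in force.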
Math always felt very random and arbitrary to me. Especially the fact that proofs are essentially just prose and people believe in them or they don't. And that's why we ended up with proofs that essentially were correct but still had flaws and errors in them for a long time. Or proofs that actually turned out to be wrong in the end.
You can try formalizing some basic math with proof assistants. With such an attempt, you will discover that 90% of your time will be spent transforming types into other types and working through obvious transformations.
And when you are done, the least obvious steps (for your reader) will be hidden in a forest of obvious steps. Suppose that you want to communicate to someone how to get to B from A. You don't tell them "lift your foot in this manner, then lift your other foot in this other manner, go a little left when you see this small obstacle", etc... you assume that they have competence at navigating commonplace obstacles, and give just the directions that they should take at intersections.
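To give a small taste of the flavor (a minimal Lean 4 example, purely illustrative): even a step that no human reader would want spelled out still has to be justified explicitly.

    -- "n + 0 = n" is obvious to a reader, but the proof assistant still
    -- wants the lemma named explicitly.
    example (n : Nat) : n + 0 = n := Nat.add_zero n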
But for math proofs, we shouldn't assume that people have competence of common basic parts. At least not to truly help them gain a good understanding of the constructive proofs.
To me, the example is rather to tell them to get into the car, then press the button and turn the wheel like this, and you'll reach your destination. But they won't understand the car, and I think they should, because at some point they need to, unless they never dig deeper - but then maybe we don't need to teach those theoretical things from the beginning.
> But for math proofs, we shouldn't assume that people have competence of common basic parts.
Well, it does depend on the audience you're writing for. So research math doesn't do that, whereas first-year undergrad math makes an effort to. I think it's essential for first-year math programs to have a course where you redevelop basic induction, natural numbers, addition, etc. from first-order logic and rules of inference and ZFC, which I don't think is true everywhere. And there should be an aspect of it that's about converting that formal proof to informal proof given some assumptions about the reader. Beyond this, it starts to get tedious.
Wow, you articulated something I've always been frustrated with, but was never able to put into words. I've always been convinced that most of (though not all) the difficulty in learning math is because of sloppy and ambiguous explanations.
Unfortunately, even simple undergraduate proofs can be monumentally difficult or even impossible to translate into a type theory due to inherent limitations of the theory or the type checker.
The quote in the footnote[1] says that `f` means something different on the two sides of the equation ∂f/∂x = ∂f/∂u ∂u/∂x + ∂f/∂v ∂v/∂x.
Why is this true? I think about ∂/∂x, ∂/∂u, and ∂/∂v as higher-order functions that take the same `f` as their argument. What is wrong about this thinking?
The signature of the function on the left hand side is f0(x) and of the one on the right hand side is f(u,v). The latter function signature is not f(u(x),v(x)), that would be a very grave mistake (and it would make the chain rule useless).
On the right hand side, it is not lambda x: f(u(x), v(x)) that the differential operator ∂/∂u (or ∂/∂v) is applied to—because the former would be a function of x.
(I use only x rather than (x,y) for simplicity so you can see where a big problem would lie)
Be aware that the notation in ∂f/∂x = ∂f/∂u ∂u/∂x + ∂f/∂v ∂v/∂x is already very simplified and the actual whole thing would be (in somewhat more modern Leibniz notation--as opposed to the one in the footnote):

    ∂/∂x f0(x) = ∂/∂u f(u(x),v(x)) ⋅ ∂/∂x u(x) + ∂/∂v f(u(x),v(x)) ⋅ ∂/∂x v(x)
Where (as is usual) the (...) without space in front (sigh) denotes function calls, and the ∂/∂x f0 makes a new function (with the same formal parameters as the original f0).
Note that the function call is done on that new function (this order of precedence is the usual in mathematics). Also, the multiplication (⋅) is done after the function calls, on actual numbers. And the + here adds numbers.
Also, it is assumed that the two functions f0 and f are connected via the equation f0(x) = f(u(x),v(x)), which is implied. (The right-hand side is a function composition and could be written f0(x) = (f∘(u,v))(x), but that's an entire extra chapter to define.)
Understanding what exactly is going on where in here is already the basis for fluid dynamics.
The footnote tries to have globally unique names, so you have funny things like u = g(x,y). Ugh. For programmers, it is normal that the formal parameters of a function f (those are u and v) and the global function u are different things, so we don't need to do what the footnote did.
Note: There's also Lagrange's notation for differential operators--but in my opinion it's useless for functions of more than one parameter. There, the first-order derivative of f would be f'. Chain rule of our example cannot be written there as far as I know. The best you could do is f0'(x) = f???(u(x),v(x)) ⋅ u'(x) + f???(u(x),v(x)) ⋅ v'(x) where the ??? is what I wouldn't know how to write. Maybe f.x and f.y or something--but that's totally arbitrary.
Note: The names of the formal parameters have no meaning outside of the function (right?), so it makes sense to just have a "differentiate first argument" operator D1 and a "differentiate second argument" operator D2 instead. Then the chain rule is: D1 f0(x) = D1 f(u(x),v(x)) ⋅ D1 u(x) + D2 f(u(x),v(x)) ⋅ D1 v(x). Again, the function calls are only made after you got the new function from the D1 or D2.
Also, evaluating an expression like f(u(x),v(x)) works just like it would in C.
Be aware that the definition of u(x) itself cannot contain u (that is very important), only x. So f(u(x),v(x)), even if written in symbolic equations, will NOT contain u or v in the end, only x-es.
For example if u(x) := 5 ⋅ x and v(x) := 3 ⋅ x and f(u,v) := 7 ⋅ u + v, then f(u(x),v(x)) = 7 ⋅ 5 ⋅ x + 3 ⋅ x. No u or v left.
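A throwaway Haskell sketch of that same example (same names u, v, f, and f0 for the composition), just to make the evaluation concrete:

    -- u(x) = 5x, v(x) = 3x, f(u,v) = 7u + v
    u, v :: Double -> Double
    u x = 5 * x
    v x = 3 * x

    f :: Double -> Double -> Double
    f a b = 7 * a + b

    -- the composition f0(x) = f(u(x), v(x)) = 7*5*x + 3*x = 38*x; no u or v left
    f0 :: Double -> Double
    f0 x = f (u x) (v x)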
In software, it's very common to have someone with broad experience in a lot of fields need to jump in and understand these 20 lines of code that are causing some issue because of a change in business logic requirements. To have a temp or newbie be able to quickly close a Jira ticket, you absolutely want verbose, strongly typed, descriptive text. You want to optimize for comprehension by readers rather than convenience for writers.
Mathematics, though, is a language developed by academics for academics, who will spend a long time iterating through variations on the same pattern. When you're years into your post-grad, looking at the same tiny slice of mathematics over and over again, it's obvious what f^(n) means - you've probably written that and thought about it thousands of times - but yeah, it does suck for outsiders.
It's really annoying when you do the maths equivalent of Stack Overflow copy-pasting.
Often I'll see some maths equation in a paper or blog post or whatever that I want to use in software, but I don't understand the notation and so I can't use it without spending god knows how long learning all the context first.
As someone who has graduate level math education and about 25 years working as a professional programmer, I emphatically disagree.
A type system is itself a programming language. If it is going to implement complex enough logic, it will be Turing complete. It does not magically become clearer simply because its ideas have been expressed in a type system. For an amusing demonstration, read through https://aphyr.com/posts/342-typing-the-technical-interview. That is a working program that solves the 8 queens problem entirely in Haskell's type system.
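To give a much smaller, self-contained taste of the same idea (a sketch, not from that post): addition computed entirely in Haskell's type system via type-level Peano naturals.

    {-# LANGUAGE DataKinds, TypeFamilies #-}

    -- the "program" below runs in the type checker, not at runtime
    data Nat = Z | S Nat

    type family Add (a :: Nat) (b :: Nat) :: Nat where
      Add 'Z     b = b
      Add ('S a) b = 'S (Add a b)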
And the type systems that you are used to are simpler in part because they are NOT Turing complete. This limits what you can say in them. In this case, for example, the author was unable to express higher derivatives as types.
Now perhaps your desire is for unambiguous machine-checkable logic? If so then I highly recommend learning something like https://coq.inria.fr/. It is a worthwhile enterprise, and there are serious mathematicians who say that it, or something like it, should be implemented for all of math. However one of the objections to it is that it is harder to express and understand yourself in such systems. So unambiguous machine-checkability is a win, but not necessarily the one you thought it was.
And then we come to the real issue. Are you optimizing for use, or for the person who doesn't understand the subject? It is literally impossible to think about complex things without a concise notation. The actual notation that we have has a lot of history and rough spots. But verbosity impedes comprehension. So what makes life easy for the novice and for the expert are exactly opposite. Also, most use of the material is by experts, who are the people the notation is designed to be most helpful for. However, even for students it helps. So much so that learning the notation generally pays off within a course or two. That is, learning the same ideas WITHOUT the notation would have been harder than learning both notation and ideas. So notation merely creates an obvious barrier, but then helps with the real one.
I know you don't believe me. You've told others that it is about what works best for you. But with all due respect, you can't have any idea what actually works best for you until you've actually mastered a sufficient amount of the subject. Which you apparently haven't. Until then you are best off believing people who HAVE mastered the subject, and have spent sufficient time teaching and tutoring to know where the real barriers to understanding are for students. And everyone I know in that boat, including me, says that the notation helps more than it hurts.
Stuff like this is so cool. It reminds me of umbral calculus, another place where identities we’re used to from single variable calc apply correctly in other surprising places.
It’s especially amazing how we can invent “illegal operations” like the square root of a datatype (or the square root of minus one) and prove that we’ll get the right answer in the original domain if it exists. I’d bet I miss lots of opportunities to take advantage of inventing nonsensical operations to solve hard problems.
It seems like a lot of the logic we use for reasoning about complex systems has the same combinatorial underpinnings (e.g. continuous systems in calc, recurrence relations and polynomials elsewhere, and algebraic datatypes in OP). I’d love to see an intro to all these variations of “calculus” written up from those combinatorial foundations that applies to all of the different applications, rather than via epsilon-delta derivations that only apply to continuous systems.
Calculus already has types. The type of a function is f:R -> R.
If X and Y are normed linear spaces and f:X -> Y the type of the (Frechet) derivative is Df:X -> L(X,Y), where L(X,Y) is the space of linear operators from X to Y.
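If you do want to see that in a programming language's types, a rough and purely illustrative Haskell rendering might be (the LinearMap wrapper is just a placeholder for L(X,Y), forgetting linearity):

    -- stands in for L(x, y): linear maps from x to y (linearity not enforced)
    newtype LinearMap x y = LinearMap (x -> y)

    -- the (Frechet) derivative of f : x -> y assigns to each point of x
    -- a linear map from x to y, i.e. Df : x -> L(x, y)
    type Derivative x y = x -> LinearMap x y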
The linked discussion is actually about performing calculus-like operations on type-theory types. (A natural next step after observing the similarities between algebraic types and algebra.) It's not about imposing a type system onto traditional calculus.
I had a different read on that but I am familiar with the link to Haskell zippers. People seem to assume if someone wrote something about Haskell they must be smart.
Their "derivative" of trees implicitly assumes the cartesian product A x B is isomorphic to B x A.
Just like the Frechet derivative for non-commutative Banach spaces, the derivative of T^2 is not 2 T dT, it is T dT + dT T.
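To see the type-level side of that remark, here is a small sketch of the "derivative" of T² as one-hole contexts of a pair; the two constructors are the two summands dT T and T dT (with the hole playing the role of dT), and they collapse into 2 T dT only because A x B ≅ B x A for types:

    -- a pair (T, T) with one component knocked out: the hole is either in the
    -- left slot (keeping the right T) or in the right slot (keeping the left T)
    data PairHole t = HoleLeft t | HoleRight t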