Perhaps another way of asking the question is: Are there any results, either about individual programs or in PL theory as a whole, that were made simpler / clarified / generalized because of category theoretic insights?
Yes, but it's much, much faster and more productive to just learn Haskell's type system (mainly Functors, Monads and Applicatives) as 3 individual "design patterns" than it is to figure out how you can even begin to apply the Yoneda Lemma to whatever programming problem you have in front of you.
Category Theory has birthed some useful abstractions, but you don't really need any of CT to learn and use them.
The basics of Abstract Algebra on the other hand are absolutely worth the bit of time it takes to learn them.
Groups, semigroups, rings, fields, monoids - distributivity, identity, zeros, associativity, commutativity - are pretty trivial to learn; most people already know the underlying concepts, and they pop up all the time in programming.
Monoids are incredibly useful - most people already know them and intuitively use them, but it's helpful to name and understand the pattern.
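For instance, a minimal Haskell sketch of the pattern (nothing here beyond the standard Monoid class):

```haskell
import Data.Monoid (Sum (..))

-- A monoid is just an associative operation with an identity element.
-- Numbers form one under addition (identity 0), lists under (++) (identity []).

total :: Int
total = getSum (mconcat [Sum 1, Sum 2, Sum 3])  -- 1 + 2 + 3

combined :: [Int]
combined = mconcat [[1], [2, 3], []]            -- concatenation; [] is the identity

main :: IO ()
main = print (total, combined)
```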
Category Theory just doesn't have the same return on investment for software development.
There is a very useful perspective in which categories are just typed monoids, and the monoid operation can only be applied when the types "line up". For example, here are some useful operations which do not form a monoid:
- PUSH(n) where n is a floating-point number
- POP
- CLEAR
- ADD, SUB, MUL, DIV
- ID
These can be interpreted as operations on a stack of floating-point numbers in the obvious way: PUSH(1.2) * [3.14] == [1.2, 3.14], POP * [1, 2, 3] == [2, 3], ADD * [1, 2, 5] == [3, 5], CLEAR * [1, 2, 3] == [], ID * [1, 2, 3] == [1, 2, 3], etc. However, not all compositions of stack operations are legal. For example, ADD * PUSH(1) * PUSH(2) is fine and equivalent to PUSH(3), but ADD * PUSH(1) * CLEAR is illegal.
Ok, so our stack operations don't form a monoid. But they obviously can still be composed, sometimes, so what do we have if not a monoid? They form a category! There is one object for each natural number, representing the height of the stack. So there are arrows like PUSH(3.14) : Height_{n} -> Height_{n+1} for all n, and POP : Height_{n} -> Height_{n-1} whenever n >= 1, and ADD : Height_{n} -> Height_{n-2} whenever n >= 2.
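This "typed monoid" view can be written down directly in Haskell with a GADT indexed by stack height (the names StackOp, run, prog are my own for illustration); compositions whose heights don't line up are rejected by the compiler:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import GHC.TypeNats (Nat, type (+))

-- An arrow Height_n -> Height_m is a StackOp n m; `Then` is composition,
-- and it only typechecks when the heights line up.
data StackOp (n :: Nat) (m :: Nat) where
  Push :: Double -> StackOp n (n + 1)
  Pop  :: StackOp (n + 1) n
  Add  :: StackOp (n + 2) (n + 1)
  Id   :: StackOp n n
  Then :: StackOp n m -> StackOp m k -> StackOp n k

-- The height indices guarantee these patterns cover every legal stack.
run :: StackOp n m -> [Double] -> [Double]
run (Push x)   s           = x : s
run Pop        (_ : s)     = s
run Add        (a : b : s) = (a + b) : s
run Id         s           = s
run (Then f g) s           = run g (run f s)
run _          _           = error "unreachable by typing"

prog :: StackOp 0 1
prog = Push 1 `Then` Push 2 `Then` Add
-- Push 1 `Then` Pop `Then` Add would be a type error: Add needs height >= 2.
```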
Another common example is matrices. Square matrices form a monoid, but what about arbitrary rectangular matrices? They don't form a monoid, but they do form a category where the objects are natural numbers, and the arrows N -> M are just the MxN matrices. You can't multiply any two matrices, but if you have a P -> Q matrix (QxP) and a Q -> R (RxQ) matrix then you can multiply them to get a P -> R matrix (RxP).
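The same idea in Haskell, with the dimensions tracked in the types (Mat and compose are hypothetical names; the phantom indices aren't checked against the list's actual shape, they only constrain composition):

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}
import Data.List (transpose)
import GHC.TypeNats (Nat)

-- An arrow p -> q is a q-by-p matrix (q rows, p columns), stored row-major.
newtype Mat (p :: Nat) (q :: Nat) = Mat { rows :: [[Double]] }

-- Composition is matrix multiplication; the shared index q is the
-- "inner dimension" that must agree -- exactly the categorical condition.
compose :: Mat q r -> Mat p q -> Mat p r
compose (Mat a) (Mat b) =
  Mat [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

f :: Mat 2 1            -- a 1x2 matrix, i.e. an arrow 2 -> 1
f = Mat [[1, 2]]

g :: Mat 3 2            -- a 2x3 matrix, i.e. an arrow 3 -> 2
g = Mat [[1, 0, 1], [0, 1, 1]]

fg :: Mat 3 1           -- f after g : an arrow 3 -> 1, a 1x3 matrix
fg = compose f g
```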
Monads are a traditional stumbling block for Haskell newbies, including me when I was one. I found that those articles I linked demystified them more than any number of "monad tutorials" ever could.
The thing about questions like this is that the complexity of mathematical explanations is highly dependent on what each reader considers "simple." Consider two different approaches to understanding a concept, expressed in information-theoretic terms:
This additional complexity layer of categorical framing has a nonzero cognitive cost, which varies based on the learner's intuitions and background. The investment in learning categorical thinking only becomes worthwhile when the learner can amortize its cost by envisioning its broad applicability - when they can see how categorical frameworks enable systematic problem decomposition and generalization. This implies they've already recognized the limitations and redundancies of non-categorical approaches, understanding intuitively how many concepts would need to be reinvented as the problem evolves within its conceptual framework (gestell).
However, there exists a middle path that often goes unrecognized as categorical thinking, despite embodying its essence. This approach involves incrementally discovering generalizations -- "oh, this can be generalized with this type" or "oh, if I wrap this in another type I can do something else later on" or "oh, this syntactic sugar for this particular operator overload feels quite nice."
Yes, there are a number of them. Here are some examples off the top of my head:
- Moggi was studying the problem of equivalence of programs, and noted that the traditional approach to modeling a program as a total function Input -> Output is problematic. He pioneered the use of monads and Kleisli categories as a foundation for reasoning about equivalence of real programs, including all the real-world nastiness like non-termination, partiality (e.g. throwing an exception that kills the program), non-determinism, and so on. https://person.dibris.unige.it/moggi-eugenio/ftp/lics89.pdf
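A small Haskell illustration of the shift in perspective: instead of Input -> Output, an effectful program is Input -> M Output for some monad M, and Kleisli composition (>=>) chains them. Here Maybe models partiality (the names safeDiv and safeSqrt are mine):

```haskell
import Control.Monad ((>=>))

-- Partial programs: they can fail instead of returning a value.
safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv x y = Just (x / y)

safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0     = Nothing
  | otherwise = Just (sqrt x)

-- Kleisli composition: failure anywhere propagates to the whole pipeline.
pipeline :: Double -> Maybe Double
pipeline = safeDiv 16 >=> safeSqrt
```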
- Linear logic (and its close relative affine logic) was the inspiration behind Rust's ownership model, from what I understand. Linear logic was originally described in terms of the sequent calculus by Girard (http://girard.perso.math.cnrs.fr/linear.pdf), but later work used certain categories as a model of linear logic (https://ncatlab.org/nlab/files/SeelyLinearLogic.pdf). This answered and clarified a number of questions stemming from Girard's original work.
- Cartesian-closed categories (CCCs) form models of the simply-typed lambda calculus, in the sense that any lambda term can be interpreted as a value in a CCC. Conal Elliott pointed out that this means that a lambda term doesn't have just one natural meaning; it can be given a multitude of meanings by interpreting the same term in different CCCs. He shows how to use this idea to "interpret" a program into a circuit that implements the program. http://conal.net/papers/compiling-to-categories/
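The full CCC machinery is beyond a comment, but the flavor of "one term, many meanings" can be sketched in Haskell with a final-tagless encoding (a related but much simpler technique; the names are mine): the same term denotes a number under one instance and a string under another.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- A tiny object language; each instance of the class is an "interpretation".
class Expr r where
  lit :: Double -> r
  add :: r -> r -> r

-- Interpretation 1: evaluate to a number.
instance Expr Double where
  lit = id
  add = (+)

-- Interpretation 2: pretty-print the term.
instance Expr String where
  lit = show
  add x y = "(" ++ x ++ " + " ++ y ++ ")"

-- One term, no fixed meaning until we pick an interpretation.
term :: Expr r => r
term = add (lit 1) (add (lit 2) (lit 3))
```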
- There is a classical construction about minimizing a DFA due to Brzozowski which is a bit magical. Given a DFA, do the following process twice: (a) get an NFA for the reverse language by reversing all edges in the DFA and swapping start / accept nodes, then (b) drop any nodes which are not reachable from a start node in the NFA. The result will be the minimal DFA that accepts the same language as your original DFA! Bonchi, Bonsangue, Rutten, and Silva analyzed Brzozowski's algorithm from a categorical perspective, which allowed them to give a very clear explanation of why it works along with a novel generalization of Brzozowski's algorithm to other kinds of automata. https://alexandrasilva.org/files/RechabilityObservability.pd...
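A runnable sketch of the construction in Haskell (my own names and encoding; note that step (b) should be "determinize via the powerset construction", keeping only reachable subsets):

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- An NFA over Int states; a DFA is the special case where start is a
-- singleton and every (state, symbol) maps to a singleton.
data NFA = NFA { delta :: M.Map (Int, Char) (S.Set Int)
               , start :: S.Set Int
               , final :: S.Set Int
               , sigma :: [Char] }

-- (a) Reverse every edge and swap start/accept states.
rev :: NFA -> NFA
rev n = n { delta = M.fromListWith S.union
                      [ ((q, c), S.singleton p)
                      | ((p, c), qs) <- M.toList (delta n), q <- S.toList qs ]
          , start = final n
          , final = start n }

-- One NFA step from a set of states.
step :: NFA -> S.Set Int -> Char -> S.Set Int
step n qs c = S.unions [ M.findWithDefault S.empty (q, c) (delta n)
                       | q <- S.toList qs ]

-- (b) Powerset construction, exploring only reachable subsets,
-- then renumbering subset-states back to Ints.
det :: NFA -> NFA
det n = NFA d (S.singleton (ix M.! start n)) f (sigma n)
  where
    states = bfs [start n] (S.singleton (start n))
    bfs [] _ = []
    bfs (q : qs) seen =
      let new = filter (`S.notMember` seen)
                       (S.toList (S.fromList [ step n q c | c <- sigma n ]))
      in q : bfs (qs ++ new) (foldr S.insert seen new)
    ix = M.fromList (zip states [0 ..])
    d = M.fromList [ ((ix M.! q, c), S.singleton (ix M.! step n q c))
                   | q <- states, c <- sigma n ]
    f = S.fromList [ ix M.! q | q <- states
                   , not (S.null (q `S.intersection` final n)) ]

-- Reverse-determinize twice: the result is the minimal DFA.
brzozowski :: NFA -> NFA
brzozowski = det . rev . det . rev

accepts :: NFA -> String -> Bool
accepts n w =
  not (S.null (foldl (step n) (start n) w `S.intersection` final n))

-- Example: a 3-state DFA for "words over {a,b} ending in a" (not minimal:
-- states 1 and 2 are equivalent, so the minimal DFA has 2 states).
example :: NFA
example = NFA (M.fromList [ ((0, 'a'), S.singleton 1), ((0, 'b'), S.singleton 0)
                          , ((1, 'a'), S.singleton 2), ((1, 'b'), S.singleton 0)
                          , ((2, 'a'), S.singleton 2), ((2, 'b'), S.singleton 0) ])
              (S.singleton 0) (S.fromList [1, 2]) "ab"

numStates :: NFA -> Int
numStates n = S.size (S.unions (start n : final n :
                [ S.insert p qs | ((p, _), qs) <- M.toList (delta n) ]))
```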
- I would also put the development of lenses in this list, but they haven't leaked very far outside of the Haskell universe yet so I don't think they are a compelling example. Check back in 5 years perhaps. Here's a blog post describing how lenses relate to jq and xpath: https://chrispenner.ca/posts/traversal-systems
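For a taste of what that post builds on, here is the van Laarhoven lens encoding in a few lines of Haskell (a standard trick, not tied to any particular library; the _1 definition is my own): a lens is a single function from which both "get" and "set" fall out by choosing a functor, and lenses compose with ordinary (.).

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A lens focuses one part `a` inside a structure `s`.
type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Focus the first component of a pair.
_1 :: Lens (a, b) a
_1 f (x, y) = fmap (\x' -> (x', y)) (f x)

-- Reading: instantiate f to Const, which discards the rebuilt structure.
view :: Lens s a -> s -> a
view l = getConst . l Const

-- Writing: instantiate f to Identity, which keeps the rebuilt structure.
set :: Lens s a -> a -> s -> s
set l x = runIdentity . l (const (Identity x))
```

Composition with (.) nests foci, which is what gives the jq/XPath-style paths: view (_1 . _1) focuses the first component of the first component.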
- I've personally had success in finding useful generalizations of existing algorithms by finding a monoid in the algorithm and replacing it with a category, using the fact that categories are like "many-kinded monoids" in some sense. I haven't written any of these cases up yet, so check back in 2 years or so. In any case, they've been useful enough to drive some unique user-facing features.
This comment pairs very well with the recent thread on Heaviside’s operator calculus. He got ahead of theory, did groundbreaking work, and was ultimately superseded by more principled approaches. I think this illustrates a kind of intellectual leapfrog. Sometimes we do things that are right, but it’s not clear why they work, and other times we have theoretical structures that open up new realms of practice.
I just realized I botched the description of Brzozowski's algorithm, step (b) should be "determinize the NFA using the powerset construction". Mea culpa.