
I've been trying to get it frontpaged because, despite its length, it's perhaps one of the most startling papers of this decade. Sadly, it seems like the HN voting gestalt hasn't decided to upvote a paper that's the CS equivalent of breaking the speed of light:

"Generic Top-down Discrimination for Sorting and Partitioning in Linear Time" ->

http://www.diku.dk/hjemmesider/ansatte/henglein/papers/hengl...

(if you're daunted by an 80-page paper, as I am, there is also a talk on it: https://www.youtube.com/watch?v=sz9ZlZIRDAg)

It is possible, with some proper insight and approaches, to sort general data structures in linear time on modern computing hardware. The speed limit of sorting is O(n), plus some constant overhead (often accrued by allocation). It works by decomposing and generalizing something akin to radix sort, leveraging a composable pass of linear discriminators to do the work.
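
To give a flavour of what a discriminator looks like, here's a toy Haskell sketch of the idea (this is not the paper's full construction, nor the API of Kmett's package; discNat, discPair and sortPairs are just names I'm making up for illustration):

    {-# LANGUAGE RankNTypes #-}
    import Data.Array (accumArray, elems)

    -- A discriminator groups values by key, with the groups emitted in key order.
    newtype Disc k = Disc { disc :: forall v. [(k, v)] -> [[v]] }

    -- Base case: keys are small Ints in [0, n). One bucket pass, linear time.
    discNat :: Int -> Disc Int
    discNat n = Disc (\kvs ->
      filter (not . null) . map reverse . elems $
        accumArray (flip (:)) [] (0, n - 1) kvs)

    -- Composition: discriminate pairs by the first component, then refine
    -- each resulting group by the second. Each key part is examined once.
    discPair :: Disc a -> Disc b -> Disc (a, b)
    discPair da db = Disc (\kvs ->
      concatMap (disc db) (disc da [ (a, (b, v)) | ((a, b), v) <- kvs ]))

    -- Sorting falls out by keying each element with itself
    -- (here: pairs of bytes, i.e. Ints in [0, 256)).
    sortPairs :: [(Int, Int)] -> [(Int, Int)]
    sortPairs xs = concat (disc (discPair (discNat 256) (discNat 256)) [ (p, p) | p <- xs ])

The paper's contribution is doing this generically and efficiently for richer types (sums, recursive types, and so on); the sketch above only shows the composition idea.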

There's a follow-up paper using this to build a very efficient in-memory database, one that could be generalized under something like Kademlia and that, with care, I suspect could become something like a better Spark core.

http://www.diku.dk/hjemmesider/ansatte/henglein/papers/hengl...

I keep submitting and talking about this but no one seems to pick up on it. This paper is crazy important and every runtime environment SHOULD be scrambling to get this entire approach well-integrated into their stdlib.

Unsurprisingly, Kmett has already implemented it in Haskell (it generalized neatly under the dual of the applicative+alternative functor): https://hackage.haskell.org/package/discrimination



I'm happy to see this excellent paper mentioned. Fritz Henglein (the author) was my thesis supervisor last year, and I worked on developing some of his ideas further.

In particular, I generalised discrimination (and added a touch of linear algebra) to devise a simple multi-way join algorithm that computes the join of any number of relations in optimal time (in a specific technical sense). Such results have been obtained recently, but only with far more complicated algorithms.

Alas, the fully general proof of optimality eludes us, so nothing has been published yet.


Is there anything written about your multi-way join algorithm? We've done some really interesting implementations of Generic Join and the like and are always interested in hearing about other potentially better join algorithms.


It's only described in my thesis, which is not yet publicly available. (It should be, but the university thesis publication process seems to have a latency measured in years...)

The case for triangle joins is simple to describe, though. Suppose we have attributes A, B and C and finite maps (relations) R : A x B -> X, S : A x C -> Y and T : B x C -> Z. Now build tries for each so we have

    abx : Trie A (Trie B X)
    acy : Trie A (Trie C Y)
    bcz : Trie B (Trie C Z)

We can join two tries using join : Trie k v1 -> Trie k v2 -> (v1 -> v2 -> v) -> Trie k v. Putting this all together we have:

    triangleJoin : Trie A (Trie B (Trie C (X, Y, Z)))
    triangleJoin = join abx acy (\bx cy -> join bx bcz (\x cz -> join cy cz (\y z -> (x, y, z))))

Compared to a database that doesn't know how to handle triangle joins, the difference is absolutely stunning. The above one-liner managed to do in 5 seconds what took MySQL over an hour...
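
If you want to play with the shape of it, here's a rough, self-contained sketch using Data.Map as a stand-in for the tries (a real implementation builds the tries with discrimination to get the complexity bounds; joinT is just my name for ordinary map intersection):

    import qualified Data.Map.Strict as M

    -- Stand-in for the trie type; good enough to see the structure of the join.
    type Trie k v = M.Map k v

    -- Join two tries on their shared key, combining the payloads.
    joinT :: Ord k => Trie k v1 -> Trie k v2 -> (v1 -> v2 -> v) -> Trie k v
    joinT t1 t2 f = M.intersectionWith f t1 t2

    -- Triangle join of R : A x B -> X, S : A x C -> Y, T : B x C -> Z,
    -- mirroring the one-liner above.
    triangleJoin :: (Ord a, Ord b, Ord c)
                 => Trie a (Trie b x) -> Trie a (Trie c y) -> Trie b (Trie c z)
                 -> Trie a (Trie b (Trie c (x, y, z)))
    triangleJoin abx acy bcz =
      joinT abx acy (\bx cy ->
        joinT bx bcz (\x cz ->
          joinT cy cz (\y z -> (x, y, z))))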


Is there any way I could get a preprint of your thesis?

I've been working on a project that sits somewhere between Spark and Riak, and this might actually address a core issue I have at both the node and query-dispatch levels.


I can only do a humble and poor job of explaining this process by analogy to other algorithms. If you've got a better explanation I could rote memorize, I'd be very appreciative.


I can't say I have had much success explaining my thesis clearly to anyone except people from the same department, but I can try to give my understanding of discrimination from an implementor's perspective.

The succinct version is that all (discrete) data can be serialised as bit strings and those bit strings can be sorted in linear time by (say) radix sort.

There are two ideas at work here: that all (discrete) data structures ultimately are composed of primitive types that can be sorted in linear time, and that we should only examine each primitive value once.
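
As a toy illustration of that (not the paper's algorithm, just a plain most-significant-digit radix sort, with bytes represented as Ints in 0..255):

    import Data.Array (accumArray, elems)

    -- Sort byte strings by bucketing on the leading byte, then recursing
    -- within each bucket on the rest. Each byte is examined at most once
    -- per level, so the work is proportional to the number of bytes read.
    radixSort :: [[Int]] -> [[Int]]
    radixSort keys = go [ (k, k) | k <- keys ]
      where
        -- each pair is (bytes still to examine, original key)
        go []  = []
        go [p] = [snd p]
        go ps
          | all (null . fst) ps = map snd ps    -- remaining keys are identical
          | otherwise =
              concat [ go bucket | bucket <- elems buckets, not (null bucket) ]
          where
            buckets = accumArray (flip (:)) [] (-1, 255)
                        [ (byteOf rest, (drop 1 rest, orig)) | (rest, orig) <- ps ]
            byteOf []      = -1                 -- exhausted keys sort first
            byteOf (b : _) = b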

However, to be fair it should be mentioned that the analysis uses a different machine model (the RAM model) than is usually employed in sorting (where all comparisons are considered to take constant time, which is arguably an inferior model of real computers if you are sorting e.g. strings of variable length).

To be honest, though, the original paper is so well-written that I have a hard time explaining it better.


You really improve your persuasion skills.

>Sadly, it seems like the HN crowd won't upvote a paper that's the CS equivalent of breaking the speed of light:

Comments like this will turn people off the rest of your post.


Different people have different reactions. That line is actually what piqued my interest.

Since AP CompSci in HS, it's been hammered into students that any sort based on comparisons has a strict lower bound of Ω(n log n).

And sorting is particularly important in search engines, which I've been working on.

So, an algorithm that drastically improves the speed of sorting would actually open up a few more possibilities to consider.

Thanks for sharing the paper KirinDave, I plan to read it.


Check my submission history. I tried twice.

I dunno how I "persuade" more.


> I dunno how I "persuade" more.

In your original post, the first paragraph made you sound like a crank. You said the paper is like breaking the speed of light, but you don't even mention the topic of the paper. I almost stopped reading at that point. Similarly for the sentence towards the end saying that this is "crazy important". People can decide that for themselves. They can decide even better if you put stuff into context.

So to persuade more, for this specific post of yours, I would have suggested to replace the first paragraph by something like: "Here is a surprising paper that shows that you can sort general data structures in linear time!"


> In your original post, the first paragraph made you sound like a crank. You said the paper is like breaking the speed of light, but you don't even mention the topic of the paper. I almost stopped reading at that point. Similarly for the sentence towards the end saying that this is "crazy important". People can decide that for themselves. They can decide even better if you put stuff into context.

I literally mention the TITLE of the paper, itself precisely explaining its domain & method, directly after the statement and end it with a semicolon.

Let's be real here. The paper is a real dragon of a read. If you're not going to go past a single surprising sentence, maybe it was pointless for me to mention it anyway.

> "Here is a surprising paper that shows that you can sort general data structures in linear time!"

I have done this twice, tried a different tack twice more, and been downvoted or ignored every time. This is the new me, assuming that folks just don't know how their every artifice of computation is backed by sort. And I suppose... why would they? A great many people simply skip even the basic theory of computation as they join the industry now, and maybe that's okay.

But I say it precisely as I do to generate shock. It should be surprising. I've caught people interviewing for principal engineering positions at Google and Amazon off guard with this. It's very, very surprising.


> I have done this twice, tried a different tack twice more, and been downvoted or ignored every time.

You should have tried this: "This weird trick by a dad sorts in linear time. Check it out!" Proven to work on so many ad-ridden clickbait websites, so why shouldn't it work on HN? ;-)


"I don't want to live on this planet anymore."


I don't know if your comment is humour in reply to my humour, or if you're actually kinda hurt. So let's err on the safe side.

The short comment about the "weird trick" that you replied to was just a joke, intended to raise a smile from readers, including you. Actually, I liked the "speed limit" framing you previously used to submit the paper on HN.

That said, I skimmed through the article you mention and found it to look serious and instructive (judging from my experience getting a Ph.D. in computer science / robotics, though that doesn't guarantee anything), yet it needs a serious time slot allocated to it for actual understanding. Many people, even on HN, don't upvote because of the complexity, but it was right to submit it. A number of other insightful comments were written in this thread; thanks for them. Also, your ELI5 explanations are interesting.

My current feeling is: this sort/discriminator stuff is probably valuable, though it will probably see its first use in demanding situations. It may also eventually be used, without users even knowing, as a private implementation detail of some data structure in high-level languages. Wait and see.

Back to feelings: this planet has some drawbacks, but all in all it's worth it. B612 is too small; we're better off here. You can expect good things from HN and similar communities, but don't expect too much. Try to refrain from complaining, as this feeds the negative part of you and your readers, and human nature tends to stick bad karma to the ones who complain. Also, when disappointed, try not to misattribute causes, and favor doubt. Feed the positive part of life.


I understood your joke. I was making a reference to a famous episode of Futurama.

I can't bring myself to clickbait genuine science and math.


Complaining that people aren't paying attention to a sorting paper is kinda weird and makes people take you less seriously.

Try to write a blog post on Medium that breaks it down and submit it that way.


Why on earth would I write for Medium? Will they pay me?

I think it's weird that the entire industry is not burning a hole in the atmosphere as they run to implement this wherever they can. It's a very big deal.

I suspect the problem is that the paper is 80 pages. But I did link to a YouTube talk that covers all the core features.


On the contrary, I'm of the opinion that the GP's line you referenced is nearly the perfect hook for HNers. [1]

[1]: https://xkcd.com/386/

Also, you really improve your forgetting a word skills.


I have mixed feelings about this one. It certainly has inspired some interesting discussion. The paper itself is dense and difficult to absorb.

The methodology isn't _really_ new, but it isn't used frequently. While they showed it to be competitive with commonly used sorting algorithms, situationally it can really shine and show significant performance benefits. I'm surprised they didn't show this in one of the graphs (or I missed it in my brief speed-through).

I don't really see a future for this in standard libraries. I can totally see this methodology being used in the wild in a variety of systems where time = money or entities are looking for a small proprietary distinguishing edge.


The follow-up paper I linked is less massive and shows a use case that, were I younger, I would set out to build as a startup.

I'm not sure why stdlibs for some languages shouldn't take this approach though. It's difficult, sure, but so is any new foundational tech. What do you see as the barrier?


Complex to do right, limited usefulness, potential for a great deal of memory usage, lack of a common understanding of proper use cases.

I can totally see it as part of any number of extended libraries.

Just my opinion though. I would be happy to be wrong.


If I understand the talk correctly, this is just a way to assign objects a lazy list of shorts, and then sort them by that list using a recursive bucket sort.

I like his system for generating these lists when there's nested data. However, it's often as simple as concatenating the lists generated by the fields of the object.

The O(n) time bound isn't surprising to me, since each byte of information in the input can lead to at most one scan through the bucket array. Actual performance is obviously a lot better.

I'd still like to see this integrated into a more mainstream language. It would be an interface where all you have to implement is a generator that concatenates the generators of the fields of your class in the desired order.

I know how to do any combination of fixed-width numbers, structs, unions, and strings, but I'm not sure how to handle tree-like structures, and for something with cycles I can't even think of a regular comparator.

I assume it isn't this simple, so what am I missing?


>It is possible, with some proper insight and approaches, to sort general data structures in linear time on modern computing hardware. The speed limit of sorting is O(n), plus some constant overhead (often accrued by allocation).

There is a well-known proof from information theory that the problem of sorting n distinct items has a worst-case time complexity of Ω(n log n) fixed-bitwidth operations: (1) Since each of the n items is distinct, the average number of bits needed to encode each item is Ω(log n). (2) In the worst case, in order to correctly sort the items, each bit of each item needs to be read. (3) Therefore, sorting the list requires reading Ω(n log n) bits.

So, I'm not sure how to understand the claim that their algorithm operates in "linear time". Are they saying O(n) operations where each operation operates on O(log n) bits?

[Edit: See below response from kraghen: The time is linear in the total size of the data, not in the number of items. I'll leave this comment up in case anyone has the same misunderstanding that I had.]


Discrimination runs in linear time not in the number of items but in the total size of the data. If you have n items, each of size k, it takes O(kn). Conventional sorting often assumes that you can compare keys of size k in constant time and therefore gets O(n log n), but a more honest analysis would yield O(kn log n) for (say) merge sort.


The bound on string sorting is typically written as O(n log n + D), where D is the sum of distinguishing prefixes, i.e. the input length minus some fluff. Since D >= n log n, we already have linearity in the input length.


Are you saying that you can sort strings in O(n log n + D) using a conventional sorting algorithm such as merge sort? If so, I don't understand why D would be an additive factor implying that each string is only involved in a constant number of comparisons.

(I wasn't, by the way, only considering strings when discussing variable-sized keys -- the beauty of discrimination is that we essentially reduce all sorting problems to string sorting.)


A conventional sorting algorithm such as Multikey Quicksort.

http://people.mpi-inf.mpg.de/~sanders/courses/algdat03/salgd...


When people say "conventional" sorting algorithms, they're usually talking about sorting algorithms based around pairwise comparison functions.

I note that on slide 14 of this presentation, it looks like this is sort of a discriminator for selecting a better partitioning scheme. So it looks to me like this actually leverages a similar principle?

As we've seen in this thread and others, there are some other ways to measure and/or process that have different characteristics. Surely all of these deserve attention! So, thanks very much for sharing this.


Sorry, I was being imprecise. By conventional I meant comparison-based sorting functions that are polymorphic in the element type and thus not allowed to examine individual bytes.

Multikey Quicksort indeed looks like a special case of discrimination, exploiting some of the same principles.


Thank you, that cleared up my misunderstanding.


> There is a well-known proof from information theory [...]

Remember that this proof is talking in terms of the number of comparisons necessary to sort n items. The moment you stop comparing data, like in radix sort (or more generally, bucket sort), that all flies out the window.


> Are they saying O(n) operations where each operation operates on O(log n) bits

Yes, as in any other claimed complexity for an algorithm. According to your proof, finding the maximum in a list is Ω(n log n), which isn't the commonly used measure.


This is a nice technique in theory, but in practice it isn't so fast, mainly because of the implementation details. Radix sort is actually faster than comparison sorts if implemented well.


Do you have benchmarks of Kmett's implementation? I'm very curious.


What's the catch? Are there any preconditions on the input required?


All orderings must be specified as reductions to a primitive order, using the fact that if you have an equivalence relation on some type A and a reduction f : B -> A, then you have an equivalence on B defined by x = y when f(x) = f(y).
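
In list-processing terms, the reduction pattern is just grouping on the image of a key function. A throwaway sketch (groupOn is my own name; the sortOn call stands in for a discriminator on A):

    import Data.Function (on)
    import Data.List (groupBy, sortOn)

    -- A reduction f : B -> A turns an equivalence (and an ordering) on A
    -- into one on B: x ~ y iff f x == f y.
    groupOn :: Ord a => (b -> a) -> [b] -> [[b]]
    groupOn f = groupBy ((==) `on` f) . sortOn f

    -- e.g. group words by their length:
    lengthGroups :: [String] -> [[String]]
    lengthGroups = groupOn length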

Now, take the rational numbers. For the usual binary comparison we can simply define (a/b) < (c/d) as ad < cb. It's not obvious how to express this as a reduction to a primitive ordering (the answer is to compute the continued fraction).

In fact, I'm not aware of any systematic way of deriving discrimination-suitable orderings from binary predicates -- it might even be an open research problem.


> In fact, I'm not aware of any systematic way of deriving discrimination-suitable orderings from binary predicates -- it might even be an open research problem.

That'd be an even more remarkable discovery in light of the stuff you worked on though, wouldn't it?


It might! I never really discussed this aspect with Fritz. For my thesis I was mostly focussed on applications to database queries, and I never encountered any concrete examples of orderings that couldn't be dealt with in an obvious way using discrimination-based sorting.


No. The catch is that it's hard to write it in a way that doesn't require very fiddly local mutation or explosive allocation.

The other catch is that no one has demonstrated that you can do it without a good static type system. It shouldn't be impossible, but there are a lot of challenges in an already challenging algorithm.


What are the constants hiding inside that 𝒪(𝑛)?


They can be bad. But so were merge sort's in its naive implementation, and folks worked that out. Radix sort sees a lot of use in the real world, and it saves a lot of energy.


Given that I'm ignorant here: can you ELI5 what the impact of this is? What does it change? What can you do with it in practical terms?


Imagine that you make your living in a somewhat unorthodox but honest way: you live in a tent outside of the castle and every morning, as the king walks into the castle, he hands you a deck of cards. "I've just played cards last night," he says, "and I'm going to again tonight, and I'd like these cards ordered and sorted by suit and number by this evening." Then the king goes into his castle and does whatever it is that kings do in castles all day.

So, you take this deck of cards, which is in any old order, as the king just played with it last night, and you set about sorting it. Now, you've been doing this a long time, and your methodology has improved over the years. When you started, you would just lay the cards out in the order the king gave them to you, and then start at the beginning of the line and ask "is this card greater than the next?" and move the two cards in response to the answer so that the two are in order. You would continue this process, repeating from the beginning of the line, until you made a complete pass through the line without making any changes. Sometimes this took you all day; sometimes you would even make the king wait for you to finish, and that made him very angry, which is very bad.

Your sorting process is compounded by the limitation of your ability to read and compare the value of two cards: because you don't really know how to read or add, it takes you about ten seconds to decide whether a card is greater than or less than another card. For a 52-card deck, this means your original sorting method would require 52 times 52 comparisons, or, in real time, seven and a half hours of non-stop work, leaving you no time to do anything else.

Over the years of doing this job for the king, you discovered some shortcuts that take less time but still produce a sorted deck, despite your limited ability to read and understand the values of the cards. While each comparison still takes ten seconds, your new deck-sorting approaches require 52 times 1.7 comparisons, or, in real time, about fifteen minutes. This is much, much better than before, but it would of course be better still if it took even less time, as you have discovered that you can now use this extra time to sort cards for all the other members of the court.

One day a traveling wizard observes you sorting cards and says, "You know, this method that you have for sorting cards is quite clever, but what if I were to tell you that you could sort these cards in 8.6 minutes flat?" This is about half the time it takes you right now, meaning that you could double the number of decks you sort in a day, doubling your income. You are very interested! "Tell me more," you tell the wizard. "Here, read this paper," the wizard replies, and they give you the paper GP linked.


er, given the specific example used, I would expect the wizard to say "look, you know exactly what the cards are, why don't you buy your own sorted deck of cards and staple them to the floor of your tent. Then when the king gives you a deck, just put each card on top of the matching one, then scoop them all up together in order." to which you would of course reply "but my tent is too small!", and the wizard would say "then just put down one of each number of one suit, plus the other 3 aces. Then pile all the cards on the aces first, just by matching suit, and finally take each of those piles in turn and do it by number."

Now, I haven't read the paper, but my guess is that the next objection is "but the king keeps switching the suits! Yesterday, they were diamonds, today they're little squiggly sketches of soft-serve poop! Which he pays me well enough to not think too hard about!" and the wizard would go on to explain that as long as you ask what the suits and numbers are in advance, you can make piles for them. Or something.

Sorry, usually I try to read papers before commenting, but I've gotta run -- I hear that whining noise that usually means my poop dispenser's jammed again.


quite the catch at the end. The wizard probably talked about teaching a man to fish, too.


ELI5 Edition: Basically everything uses sort (either to rank things or to set up search), so improvements to sort improve basically everything.

Explaining it in further detail like you're a fellow member of our industry:

Edge computing advances are pretty important right now, since the amount of data we're working with on the edge of the network is growing very quickly. Advances in how we compute backpropagation (essentially using a right-recursive process for the Chain Rule) mean that we can do machine learning on 2015-level phone GPUs, rather than 2015-level desktop GPUs.

This advance hints at the promise of similar gains. With care, you can sort much faster. What's more, your sorts and your merges of decomposed sorts are roughly the same cost now. And multiple, layered sorts don't require custom logic or clever functional programming (in the edge developer's model) to compose.

Essentially you pick what fields in complex (maybe even nested) data structures to sort by, in what order, and the system makes a sort.

Why does this matter? Well, it will make nearly all data structures Just Faster; many of them secretly rely on ordered and sorted arrays. It makes it easier for distributed systems authors to make queryable systems (it's pretty easy to write a discriminator or specify a discriminator function remotely and then use that on the fly, as opposed to a comparator).

One place in the client-software world where it can have a big impact is offline webapps. Right now, it's almost always cheaper to call out to a remote datastore rather than use a local datastore, even if you're using SQLite under the covers, because of the cost of synchronizing SQLite's data model. A discriminator-based approach would let you do complex queries of data in memory, but that data could itself consist of commutative data types that are coalescing data out of a data stream (or multiple data streams) remotely.

It's also worth noting that another really cool paper from this year improving on the HAMT trie (https://michael.steindorfer.name/publications/phd-thesis-eff...) could also benefit from this approach, which means all of us using immutable data structures can continue to do so with MUCH better performance characteristics but continued thread safety.


> Essentially you pick what fields in complex (maybe even nested) data structures to sort by, in what order, and the system makes a sort.

Wait, is that it? What makes this novel?


What's novel is projecting that operation efficiently into a form where radix sort can operate on all the discriminations at once.


Can you give an example where the obvious method is inefficient, and what this would do?


I just wanted to give you a taste of how this works in an ungeneralized way before I go. Linear-time algorithms are fast and feel very different. I want to stress that this is not directly related to the technique in the paper, but it is in the same "family" of algorithms, and it is very easy to understand.

Imagine you're doing the classic "write an algorithm to determine if one word is the anagram of another". The classic solution is to sort and compare the strings to be checked.

We can do this pretty elegantly for strings without using quicksort. It goes like this: allocate a block of 26 bytes. Each byte is a counter for a letter in that position (0 -> A, 1 -> B, ...). Now sweep across the string, and each time you see a letter, increment that number. If you really care hard, you can go crazy with SIMD and other clever tricks here.

Once you've done this for 2 strings, you need only compare the two regions for equality (a constant-time operation).

The worst-case, average-case, and best-case running time of this algorithm is O(len_word_1 + len_word_2) + c, and because we can safely make assumptions about the length of the strings (no word in any ISO-Latin-1 encoding exceeds 255 of any character), we can do it fairly compactly. This is MUCH faster and can be done with perfect memory safety (which is to say, it is optimal with mutability, but all operations are commutative so they may be subdivided, done in any order, and recombined at will).

Try implementing this. It's really, really fast on modern hardware. Especially if you're working it out for a full /usr/share/dict/words file on a normal *nix/BSD installation.
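
For the curious, here's a quick Haskell version of the counting trick (assuming words containing only 'a'..'z'; letterCounts and isAnagram are just names for this sketch):

    import Data.Array (Array, accumArray)

    -- One counter per letter, filled in a single linear sweep over the word.
    letterCounts :: String -> Array Char Int
    letterCounts w = accumArray (+) 0 ('a', 'z') [ (c, 1) | c <- w ]

    -- Two words are anagrams iff their counters match
    -- (comparing 26 counters is a constant-time operation).
    isAnagram :: String -> String -> Bool
    isAnagram a b = letterCounts a == letterCounts b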

We can even see that composability feature in our current problem! Imagine we have that big dictionary file and we want to find ALL the anagrams in it. Imagine we start by cutting the above technique in half and computing the byte vector for every word in the dictionary (which is O(n) time). If we sort THESE vectors, we could just pick out all the equal values and say "these are the anagram groups", right?

If we were to insert them into some very wide trie (as is the style these days), that'd be O(numwords * log(numwords)), which is really just sorting again. We'd do it in a minimum of 2 passes with O(n log n) cost. But if we have composable discriminators, we could do this in one big pass with O(n) cost! Our constant factors would grow because we're dealing with more memory. We could then not only sweep across the sorted list to extract groups (note the symmetry with the ABC counter approach above), but we could also pretend it's a tree structure (start at the length/2 element in the list and pretend it's a binary tree) and search for new items inside the block, so we could check for new word anagrams subsequently in O(log n) time.

The paper I linked talked about generalizing this idea (a special case of radix sort) and making it composable.



