This is a good reminder of just how young of a field compsci still is. Sorting is one of those fundamental things you don't usually think about these days in your day-to-day work. Unless you're an academic, for most practical uses it's a solved problem. Of course there are edge cases like sorting all of Twitter and whatnot, but for most situations, that holds. It's then both exciting and refreshing to see new work still being done on these fundamentals, pushing the envelope where you might've not even known there was anything left to push.
I double checked and it's indeed faster, up to 3x faster if the data is partially sorted.
It uses O(n) aux memory so if std::sort were to use this it would still need IntroSort as a fallback, but that's pretty much a non-issue since std::stablesort already does the same.
Is there a kind of JIT-like concept where a sort function might learn about runtime sorting trends in unsorted data layout and automatically change to a different algorithm based on those trends?
Ie. I'm getting a lot of mostly sorted clusters so algorithm X is best.
In modern computers locality is usually the biggest factor if you want speed. A lot of sorting algorithms that might have the same algorithmic complexity can have different locality properties.
If you think about quicksort, it partitions everything starting from the coarsest level, so the locality increases as it goes along.
There are likely a lot of unexplored sorting algorithms that use statistics gathering to at least increase locality faster by placing items closer to their final place sooner.
If you think about an array of numbers, you can gather a distribution first and try to place numbers where they would end up according to the distribution to get them much closer to their final position than an initial partition. That would mean running through the array once first, but because of prefetching, that would be much faster than accessing elements by skipping around.
The concept is interesting, but I'm a little leery of the detail that the benchmark performance reported is in each case "the best run out of 100".
One can claim, thus, only that this sort method's near-best-case performance is better than `std::sort`'s near-best-case performance (at least at the, er, P01 level). That says little to nothing about the modal or reasonably-bad-case (say, P95) performance.
A comparison of median values would lend much more weight to the claim that this method is "faster" than `std::sort`.
But a really good benchmark report would show the entire histogram of performance for each method.
Average is reported in the data table as well. In every case average is exceptionally close to best, leading me to believe that your interpretation of "the best run" is incorrect and that it's merely eliminating system jitter not looking at different data.
While you get better performance, you do trade off for more memory usage. std::sort is in-place and uses O(1) extra memory. gridsort uses O(n) extra memory, where n is the number of elements you are sorting.
I was gonna say: isn't that just the MIT license? But then I saw this:
> The person recognizes Mars as a free planet and that no Earth-based government has authority or sovereignty over Martian activities.
Yeah... Thanks... As much as I (and much of the world's countries[0]) may agree with you, now I can't use the code, even in an MIT licensed library or program.
Look, I get why people do this: it's cute, fun, or whatever, but please: don't. It makes it impossible for those of us to use a library legally. Sure, most won't care, and I'm sure the author won't enforce it, but it doesn't matter. A corporation's legal team is going to deny any attempts to use software with licenses like this.
Take, for example, the JSON license[1]. It contains this:
> The Software shall be used for Good, not Evil.
Who decides what "good" is? Crockford? Me?
Is Google evil? They say they're not, but some think they are. Who's right? It's a legal conundrum that makes things harder than they should be. This blog post[2] explains this license's line's problems better.
when I worked at google "don't be evil" was not anywhere in the new employee stuff, or anywhere else I saw after.
That said I agree entirely: what one person says is good and evil is depressingly variable (to use the internet extreme: to a nazi capturing jews is good, because jews are evil - which to you and I is [presumably] clearly insane)
I was trying to avoid “Poe’s Law-ing” my comment, but Nazi genocide was exactly what I was thinking of. They (at least the higher ups) believed it was for the greater good, but the rest of the world (and history) has disagreed with them. So much so, we executed the higher ups during the trials afterwards.
Slightly offtopic, but how does a license affect the idea of this sort? If I wanted to use this, and I couldn't because of the quirky mars license, would I be allowed to write my own implementation?