Gridsort: A stable sort faster than std:sort

Etheryte · on Feb 9, 2021

This is a good reminder of just how young of a field compsci still is. Sorting is one of those fundamental things you don't usually think about these days in your day-to-day work. Unless you're an academic, for most practical uses it's a solved problem. Of course there are edge cases like sorting all of Twitter and whatnot, but for most situations, that holds. It's then both exciting and refreshing to see new work still being done on these fundamentals, pushing the envelope where you might've not even known there was anything left to push.

nedoka · on Feb 9, 2021

I double checked and it's indeed faster, up to 3x faster if the data is partially sorted.

It uses O(n) aux memory so if std::sort were to use this it would still need IntroSort as a fallback, but that's pretty much a non-issue since std::stablesort already does the same.

richardwhiuk · on Feb 9, 2021

How can stablesort use IntroSort if IntroSort is non-stable?

jsnell · on Feb 9, 2021

They mean that std::stable_sort already needs an extra memory buffer, and falls back to a different algorithm if that can't be allocated.

Waterluvian · on Feb 10, 2021

Is there a kind of JIT-like concept where a sort function might learn about runtime sorting trends in unsorted data layout and automatically change to a different algorithm based on those trends?

Ie. I'm getting a lot of mostly sorted clusters so algorithm X is best.

CyberDildonics · on Feb 10, 2021

In modern computers locality is usually the biggest factor if you want speed. A lot of sorting algorithms that might have the same algorithmic complexity can have different locality properties.

If you think about quicksort, it partitions everything starting from the coarsest level, so the locality increases as it goes along.

There are likely a lot of unexplored sorting algorithms that use statistics gathering to at least increase locality faster by placing items closer to their final place sooner.

If you think about an array of numbers, you can gather a distribution first and try to place numbers where they would end up according to the distribution to get them much closer to their final position than an initial partition. That would mean running through the array once first, but because of prefetching, that would be much faster than accessing elements by skipping around.

birdbrain · on Feb 9, 2021

The concept is interesting, but I'm a little leery of the detail that the benchmark performance reported is in each case "the best run out of 100".

One can claim, thus, only that this sort method's near-best-case performance is better than `std::sort`'s near-best-case performance (at least at the, er, P01 level). That says little to nothing about the modal or reasonably-bad-case (say, P95) performance.

A comparison of median values would lend much more weight to the claim that this method is "faster" than `std::sort`.

But a really good benchmark report would show the entire histogram of performance for each method.

BugsJustFindMe · on Feb 10, 2021

Average is reported in the data table as well. In every case average is exceptionally close to best, leading me to believe that your interpretation of "the best run" is incorrect and that it's merely eliminating system jitter not looking at different data.

RcouF1uZ4gsC · on Feb 9, 2021

While you get better performance, you do trade off for more memory usage. std::sort is in-place and uses O(1) extra memory. gridsort uses O(n) extra memory, where n is the number of elements you are sorting.

olliej · on Feb 9, 2021

you need to compare to std::stable_sort, not std::sort

olliej · on Feb 9, 2021

The license does make it incompatible with most/all? other open source licenses

colejohnson66 · on Feb 10, 2021

I was gonna say: isn't that just the MIT license? But then I saw this:

> The person recognizes Mars as a free planet and that no Earth-based government has authority or sovereignty over Martian activities.

Yeah... Thanks... As much as I (and much of the world's countries[0]) may agree with you, now I can't use the code, even in an MIT licensed library or program.

Look, I get why people do this: it's cute, fun, or whatever, but please: don't. It makes it impossible for those of us to use a library legally. Sure, most won't care, and I'm sure the author won't enforce it, but it doesn't matter. A corporation's legal team is going to deny any attempts to use software with licenses like this.

Take, for example, the JSON license[1]. It contains this:

> The Software shall be used for Good, not Evil.

Who decides what "good" is? Crockford? Me?

Is Google evil? They say they're not, but some think they are. Who's right? It's a legal conundrum that makes things harder than they should be. This blog post[2] explains this license's line's problems better.

[0]: https://en.wikipedia.org/wiki/Outer_Space_Treaty

[1]: http://www.json.org/license.html

[1-alt]: https://spdx.org/licenses/JSON.html

[2]: https://www.synopsys.com/blogs/software-security/json-licens...

olliej · on Feb 10, 2021

when I worked at google "don't be evil" was not anywhere in the new employee stuff, or anywhere else I saw after.

That said I agree entirely: what one person says is good and evil is depressingly variable (to use the internet extreme: to a nazi capturing jews is good, because jews are evil - which to you and I is [presumably] clearly insane)

colejohnson66 · on Feb 10, 2021

I was trying to avoid “Poe’s Law-ing” my comment, but Nazi genocide was exactly what I was thinking of. They (at least the higher ups) believed it was for the greater good, but the rest of the world (and history) has disagreed with them. So much so, we executed the higher ups during the trials afterwards.

olliej · on Feb 11, 2021

I couldn’t remember the name of poeing hence my generic “internet extreme”

nextaccountic · on Feb 10, 2021

Also, by itself is not an open source license either (it discriminates between persons or groups, point 5 here https://opensource.org/osd)

lionkor · on Feb 10, 2021

Slightly offtopic, but how does a license affect the idea of this sort? If I wanted to use this, and I couldn't because of the quirky mars license, would I be allowed to write my own implementation?

scandum · on Feb 10, 2021

It broke my heart, but I got rid of the quirky mars condition.

lionkor · on Feb 11, 2021

Does this set legal precedent for martians being denied rights? Oh man, I knew it would start this way...

chrchang523 · on Feb 9, 2021

Interesting work.

License is worth noting. I am considering an "#ifndef FREE_MARS" std::sort/qsort fallback for any code I write that uses this.

shureluck · on Feb 9, 2021

This is super cool!