Versions of quick sort differ in how the partitions are determined. A reasonable guess is that some of those versions are not worst-case O(n log n) for sorting n keys. In that case, for sufficiently large n, on a normal computer, any version of heap sort will beat that version of quick sort in number of comparisons, time in seconds, joules of energy to run the computer, etc.
It is this point that has much of the academic computer science community saying that no sorting algorithm based on comparing keys two at a time can beat heap sort.
Sure, we don't know how big n has to be. In practice, in an actual case, the n might have to be too big for current computers.
Sure, in practice, for some value of n, on some list of n keys, some version of heap sort, and some version of quick sort, the quick sort might run 10 times faster in seconds of elapsed time than heap sort.
I'm not completely sure what you are saying, but do you actually think a heap sort is in general faster than a quicksort or merge sort? You realize that the worst case of a quick sort is easily avoided, right? It only happens if the pivot lands on the minimum or maximum value on every single partition, e.g., an already sorted array with a naive first-element pivot.
> It is this point that has much of the academic computer science community saying that no sorting algorithm based on comparing keys two at a time can beat heap sort.
I think you're the only one saying that. Where did you get this idea? I just showed you a quicksort being 10 times faster than a heapsort. You can try this for yourself.
> Sure, we don't know how big n has to be. In practice, in an actual case, the n might have to be too big for current computers.
This is never going to be true. Quicksort and other sorts exploit locality much better. Partitioning an array gradually makes the sorting more local. That is going to work better relative to a heap sort as data gets bigger, not worse.
> Sure, in practice, for some value of n, on some list of n keys, some version of heap sort, and some version of quick sort, the quick sort might run 10 times faster in seconds of elapsed time than heap sort.
No, on any modern computer other sorts are going to beat a heapsort. It isn't exotic; that is just how it works. Even if you have to scan every element first to get a distribution of your elements to choose your pivot, that is still going to be O(n log n) and it will still beat a heap sort.
Heap sorts are not the fastest way to sort. They can be consistent and they can split the time between inserting and extracting elements.
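To make both points above concrete, here is a minimal sketch in Python (the language, function names, and sizes are my own choices, not anything from the thread): a heap sort that splits its work between pushing keys into a heap and extracting them, next to a quicksort that picks its pivot at random so that no fixed input, such as an already sorted array, reliably triggers the quadratic worst case.

    import heapq
    import random

    def heap_sort(keys):
        # Split the work between inserting into a binary heap
        # and extracting the minimum n times.
        heap = []
        for k in keys:
            heapq.heappush(heap, k)
        return [heapq.heappop(heap) for _ in range(len(heap))]

    def quick_sort(keys):
        # Random pivot: the quadratic worst case still exists in principle,
        # but no fixed input triggers it reliably.
        if len(keys) <= 1:
            return list(keys)
        pivot = random.choice(keys)
        less = [k for k in keys if k < pivot]
        equal = [k for k in keys if k == pivot]
        greater = [k for k in keys if k > pivot]
        return quick_sort(less) + equal + quick_sort(greater)

    data = [random.random() for _ in range(10_000)]
    assert heap_sort(data) == sorted(data) == quick_sort(data)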
> I'm not completely sure what you are saying, but do you actually think a heap sort is in general faster than a quicksort or merge sort
"in general" is not very specific.
The main issue is the big-O expression.
Again, if win in the big-O expression,
then, in the assumptions of the context,
for sorting n keys for all sufficiently
large n will win.
Sure, maybe the heap sort O( n log(n) ) is
wrong. In that case, maybe could beat
heap sort.
How could O( n log(n) ) be wrong? Well, for large enough n we could encounter virtual memory paging that makes the cost of a comparison grow with n, and then an algorithm with better locality of reference and less paging might win. So, for such a context, we would have to redo the big-O derivation.
Merge sort is based on comparing keys two at a time, and that was the assumption in Gleason's argument. So, Gleason's argument should also apply to merge sort.
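For reference, the bound being discussed appears to be the standard information-theoretic lower bound for comparison sorting: n keys have n! possible orderings, and each two-way comparison can at best halve the set of orderings still consistent with the answers so far, so any such sort needs at least log2(n!) ~ n log2(n) comparisons in the worst case. A small sketch of that arithmetic (using lgamma to get log2(n!) without computing n! itself; the helper name and the sample values of n are mine):

    import math

    def comparison_lower_bound(n):
        # log2(n!) = lgamma(n + 1) / ln(2); any sort that compares keys
        # two at a time needs at least this many comparisons in the
        # worst case to distinguish all n! orderings.
        return math.lgamma(n + 1) / math.log(2)

    for n in (10**3, 10**6, 10**9):
        print(n, comparison_lower_bound(n), n * math.log2(n))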
Some people prefer heap sort or quick sort because they are in place and merge sort is not.
I wrote and you quoted:

> It is this point that has much of the academic computer science community saying that no sorting algorithm based on comparing keys two at a time can beat heap sort.
> I think you're the only one saying that.
Also Gleason, Knuth, and much of the academic computer science community.
> Where did you get this idea? I just showed you a quicksort being 10 times faster than a heap sort.
Your "10 times" is not for all values of
number of keys n. The crucial issue is the
big-O expression, and your "10 times" says
next to nothing about the big-O
expression.
Again, the big-O expression is crucial
because winning in the big-O expression
means winning for all sufficiently large
n, and that is, sadly, the only really
meaningful means we have of comparing "in
general" two pieces of code.
Instead, sure, if in a particular application there is an upper bound on n, say always less than 10,000,000, then use the code that is fastest from 1 to 10,000,000, or fastest in expectation over the probability distribution of n from 1 to 10,000,000 in that application. Or maybe in some application you don't care about the average execution time but definitely always want execution time faster than some given T milliseconds for all n from 1 to 10,000,000. Okay, pick the code that best achieves this.
Gleason's work didn't cure cancer, exceed the speed of light, say what is in the center of a black hole, say what happened before the big bang or will happen after the heat death of the universe, ....

Instead, Gleason stated and proved a theorem in applied math. The theorem shows what is the best possible given the assumptions. Similarly for what Knuth said about heap sort and Gleason's theorem. A lot of progress in pure/applied math and science is like that -- not everything, just a specific result given some specific assumptions.
Or, assume that Joe has a new computer, 1000 cores, a 10 GHz clock, and 500 GB of first level cache, and writes some really careful code in assembler that makes full use of the 1000 cores, to sort 10 billion 32 bit numbers via bubble sort. Then it runs in O(n^2) time, that is, for some constant k_Joe it runs in

    k_Joe n^2 = k_Joe (10^10)^2

milliseconds.
Along comes Bill with a PC based on a 64 bit single core processor with a 2 GHz clock. Bill's code uses heap sort. Then Bill's code runs in O( n log(n) ) time, that is, for some constant k_Bill it runs in

    k_Bill (n log(n)) = k_Bill (10^10 log(10^10))

milliseconds. And Bill is a bit sloppy and uses an interpreted language, so the constant k_Bill is large.
Then the ratio of running times is

    (k_Joe / k_Bill) (n / log(n))

Well, assume for simplicity, and without loss of generality, that the log has base 10. Then

    log(n) = log(10^10) = 10

so that

    n / log(n) = 10^10 / 10 = 1,000,000,000
Well, let's account for Joe's 1000 cores to get a ratio of 1,000,000, and for Joe's 5 times faster clock speed to get 200,000. Say the interpreted language Bill used is 10 times slower than compiled C, for a ratio of 20,000, and say that Joe's hand coding in assembler is 2 times faster than C code, for a ratio of 10,000. So, still, Bill's code runs 10,000 times faster than Joe's.
That's an example of how the algorithm that wins in big-O wins for all sufficiently large n, even against coding in assembler on 1000 cores with a 10 GHz clock.
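That bookkeeping is easy to check mechanically. A minimal sketch that just redoes the ratio arithmetic from the paragraphs above (the variable names are mine, and the 1000x, 5x, 10x, and 2x adjustment factors are the post's assumptions, not measurements):

    import math

    n = 10**10
    ratio = n / math.log10(n)   # n / log(n) with k_Joe / k_Bill taken as 1: 1,000,000,000

    ratio /= 1000               # Joe's 1000 cores speed Joe up 1000x
    ratio /= 5                  # Joe's 10 GHz clock vs Bill's 2 GHz clock
    ratio /= 10                 # Bill's interpreted language is 10x slower than compiled C
    ratio /= 2                  # Joe's assembler is 2x faster than C

    print(ratio)                # 10000.0: Bill's heap sort still wins by about 10,000x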
Sure, for n = 100, maybe Joe's code wins -- I'll let you find the ratio.
That's what the computer science community noticed some decades ago and, thus, settled on big-O as the way to compare algorithms and code. Again, heap sort meets the Gleason bound in the worst case. Since in this context the Gleason bound says that O( n log(n) ) is the best anyone can do, the best quick sort can do is tie.

And if the quick sort code does not choose the partitions carefully, quick sort won't be O( n log(n) ) in the worst case, and for sufficiently large n it will lose by whatever ratio you want, 1,000,000:1 if you like.
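To see the careless-partition case concretely, here is a small sketch that counts comparisons for a quicksort that always takes the first element as its pivot: on an already sorted input the count grows roughly as n^2 / 2, while n log2(n) stays far below it. The counting wrapper and the choice of n are mine, just for illustration.

    import math

    def first_pivot_quicksort_comparisons(keys):
        # Quicksort with the first element as pivot, using an explicit
        # stack so the deep worst case cannot hit the recursion limit.
        # Returns the number of key comparisons performed.
        comparisons = 0
        stack = [list(keys)]
        while stack:
            part = stack.pop()
            if len(part) <= 1:
                continue
            pivot, rest = part[0], part[1:]
            less, not_less = [], []
            for k in rest:
                comparisons += 1
                (less if k < pivot else not_less).append(k)
            stack.append(less)
            stack.append(not_less)
        return comparisons

    n = 2000
    print(first_pivot_quicksort_comparisons(list(range(n))))  # about n^2 / 2 on sorted input
    print(round(n * math.log2(n)))                            # about n log2(n), for comparison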
Look, guys, all this is just from, say, the first week of a ugrad course in computer science.

Arguing with the Gleason bound -- about like arguing with the Pythagorean theorem.

Don't argue with me. Instead, send a question to a comp/sci prof at your local university giving him the URL of this post.
> Also Gleason, Knuth, and much of the academic computer science community.
Prove it; show me who is saying 'nothing can beat a heapsort'.
> says next to nothing about the big-O expression.
Why do you think heap sort is the only n log n sort? Link something that proves what you say.
> settled on big-O as the way to compare algorithms and code
Time determines speed; that's what this thread is about. I already linked you a benchmark of 32 million floats. Link me something that actually backs up what you are saying.
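For anyone who wants to rerun this kind of comparison rather than argue about it, here is a minimal timing harness; it pits a heapq-based heap sort against Python's built-in sort (Timsort, also a comparison sort) rather than the C++ code in the benchmark linked at the end of the thread, so the absolute numbers will differ, and the array size is just a placeholder.

    import heapq
    import random
    import time

    def heap_sort(keys):
        heap = list(keys)
        heapq.heapify(heap)
        return [heapq.heappop(heap) for _ in range(len(heap))]

    data = [random.random() for _ in range(1_000_000)]  # placeholder size

    start = time.perf_counter()
    a = heap_sort(data)
    t_heap = time.perf_counter() - start

    start = time.perf_counter()
    b = sorted(data)
    t_builtin = time.perf_counter() - start

    assert a == b
    print(f"heap sort: {t_heap:.3f} s, built-in sort: {t_builtin:.3f} s")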
> Arguing with the Gleason bound -- about like arguing with the Pythagorean theorem.
The only thing that comes up is your comment. I've shown you actual results; you keep saying 'maybe, possibly, in some scenario I can't show, this other one wins, so it is the fastest'. You are hallucinating a reality that you can't demonstrate.
I guess this is the "I already told you" part of the conversation, but you didn't link a single thing.
All you did was repeat your claim over and over. No benchmarks, no link to a 'Gleason bound', and nothing but hallucinating hypothetical scenarios and declaring that somehow they would back up your claims that go against literally all benchmarks and computer science knowledge. If I'm wrong, show me some links.
If something runs in less time, it's faster. If something runs in 1/10th the time of something else, it is an order of magnitude faster. I don't think I've ever seen someone try to redefine 'faster' before.
You keep trying to equate naive algorithmic complexity and speed, but you haven't even linked anything that shows heapsort is better than other sorts like quicksort at that. You haven't actually linked anything to back up any of what you're saying, even the irrelevant off-topic claims.
https://johnysswlab.com/why-is-quicksort-faster-than-heapsor...