Floyd's algorithm scales worse than the naïve implementation though (for small d...

Floyd's algorithm scales worse than the naïve implementation though (for small data elements, when comparisons are cheap).

See "Performance Engineering Case Study: Heap Construction" by Jesper Bojesen, Jyrki Katajainen & Maz Spork

The reason why the naïve algorithm scales better is because with each successive push into the heap, you are likely to hit on the same cache lines (assuming your individual elements aren't the size of the cache line or bigger). This is significantly more important for performance than saving a few comparisons, assuming comparisons are cheap.