If these customers have set up an automated system to pay you $200 every single minute, that’s correct. If they haven’t and it was just a one-off sale, you are missing the “recurring” part of ARR.
OSRM founder here. Yes, you are right, many of the speedup techniques are related. My personal opinion, though, is that the identification of important nodes is best captured by the ideas of applying partitioning to multi-level Dijkstra and by what’s called hub labels. The latter have a close relationship to Contraction Hierarchies.
I've been researching this lately, as we've recently implemented traffic patterns in our routing model and are just now working on live traffic updates. The easiest way to adapt our existing code looks like Customizable Contraction Hierarchies. There's a really nice review paper here: https://arxiv.org/abs/2502.10519 . The technique is to apply nested dissection to build a "metric-independent" hierarchy based purely on connectivity, which gives decent contraction quality regardless of transit times. Is that what you mean by decomposing the network into "cells"?
I was referring to variants of Customizable Route Planning. Easiest to implement is likely the partitioner of Sommer et al. and a unidirectional multi-level Dijkstra.
There are many similarities between this approach and customisable contraction hierarchies. The latter allows a particularly elegant query-time algorithm involving only a couple of linear sweeps, I suspect even in the many-to-many scenario.
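For anyone curious what the "metric-independent" nested-dissection ordering mentioned above roughly looks like, here is a toy C++ sketch (not OSRM or RoutingKit code): split the graph, order both sides recursively, and append the separator vertices last so they end up at the top of the hierarchy. The bisection step is deliberately naive (a BFS split); real implementations use proper partitioners such as FlowCutter or InertialFlow, and all names here are just for illustration.

    // Toy sketch (not OSRM/RoutingKit code) of a metric-independent
    // nested-dissection order as used for Customizable Contraction
    // Hierarchies: split the graph, order both sides recursively, and put the
    // separator vertices last so they sit at the top of the hierarchy.
    // The bisection below is deliberately naive (a BFS split).
    #include <cstddef>
    #include <iostream>
    #include <queue>
    #include <vector>

    using Graph = std::vector<std::vector<int>>;  // undirected adjacency lists

    // Split `nodes` into a BFS "first half" and "second half"; first-half
    // vertices with a neighbour in the second half act as the separator.
    static void naive_bisect(const Graph& g, const std::vector<int>& nodes,
                             std::vector<int>& left, std::vector<int>& right,
                             std::vector<int>& sep) {
      std::vector<char> in_set(g.size(), 0), seen(g.size(), 0), is_left(g.size(), 0);
      for (int v : nodes) in_set[v] = 1;
      std::vector<int> bfs;
      std::queue<int> q;
      q.push(nodes.front());
      seen[nodes.front()] = 1;
      while (!q.empty()) {
        int v = q.front(); q.pop();
        bfs.push_back(v);
        for (int w : g[v])
          if (in_set[w] && !seen[w]) { seen[w] = 1; q.push(w); }
      }
      for (int v : nodes) if (!seen[v]) bfs.push_back(v);  // disconnected bits
      for (std::size_t i = 0; i < bfs.size() / 2; ++i) is_left[bfs[i]] = 1;
      for (int v : bfs) {
        if (!is_left[v]) { right.push_back(v); continue; }
        bool boundary = false;
        for (int w : g[v]) if (in_set[w] && !is_left[w]) boundary = true;
        (boundary ? sep : left).push_back(v);
      }
    }

    // Nested dissection: recurse on the two sides, then append the separator,
    // so separator vertices are contracted last (highest in the hierarchy).
    static void nd_order(const Graph& g, const std::vector<int>& nodes,
                         std::vector<int>& order) {
      if (nodes.size() <= 2) {                     // tiny cell: any order is fine
        for (int v : nodes) order.push_back(v);
        return;
      }
      std::vector<int> left, right, sep;
      naive_bisect(g, nodes, left, right, sep);
      if (!left.empty()) nd_order(g, left, order);
      if (!right.empty()) nd_order(g, right, order);
      for (int v : sep) order.push_back(v);
    }

    int main() {
      Graph g(9);                                  // 3x3 grid graph
      auto add = [&](int a, int b) { g[a].push_back(b); g[b].push_back(a); };
      for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) {
          if (c + 1 < 3) add(3 * r + c, 3 * r + c + 1);
          if (r + 1 < 3) add(3 * r + c, 3 * (r + 1) + c);
        }
      std::vector<int> all(9), order;
      for (int i = 0; i < 9; ++i) all[i] = i;
      nd_order(g, all, order);
      for (int v : order) std::cout << v << ' ';   // separator vertices come last
      std::cout << '\n';
    }

The point is only that the ordering is computed from connectivity alone; the actual edge weights (the "metric") are applied later in the customization phase.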
Efficiency, not effectiveness. They are all effective in the sense that they produce sorted results. Even the non-modern sort algorithms are effective in the sense that the results are correct. This should be about the efficiency with which they do it, right?
Agreed. Effectiveness would imply that some algorithms are more likely to sort the list correctly than others, or that they sort a higher percentage of elements. Efficiency is about factors other than correctness.
"The Unreasonable Effectiveness of Mathematics in the Natural Sciences" is one of those titles that gets imitated a lot for some reason. Maybe even more than "Goto Considered Harmful".
A way to give radix sort a performance boost is to allocate an uninitialized block of memory of size n for each bucket. This exploits the MMU, and you don’t have to worry about buckets overflowing or overwriting each other: the MMU does all the bookkeeping for you, and the OS only actually allocates memory on page access.
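A rough sketch of how I understand that trick, assuming Linux-style lazy allocation via mmap (function and variable names are just for illustration): one worst-case-sized bucket per digit, so nothing can overflow, and pages you never touch are never actually backed by the OS.

    // Sketch of the "overcommit bucket" trick: reserve one n-element bucket
    // per radix digit as uninitialized virtual memory, scatter into the
    // buckets, and let the kernel back only the pages actually touched.
    // Linux-specific; illustrative names, not from any particular library.
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>
    #include <sys/mman.h>

    std::vector<uint32_t> radix_sort_overcommit(std::vector<uint32_t> v) {
      const std::size_t n = v.size();
      if (n < 2) return v;
      constexpr std::size_t kBuckets = 256;            // 8-bit digits, 4 LSD passes
      const std::size_t total_bytes = kBuckets * n * sizeof(uint32_t);
      void* raw = mmap(nullptr, total_bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
      if (raw == MAP_FAILED) {                         // no overcommit available:
        std::sort(v.begin(), v.end());                 // fall back to a plain sort
        return v;
      }
      uint32_t* buckets = static_cast<uint32_t*>(raw);
      for (int pass = 0; pass < 4; ++pass) {
        std::size_t count[kBuckets] = {0};
        for (uint32_t x : v) {
          std::size_t d = (x >> (8 * pass)) & 0xFF;
          buckets[d * n + count[d]++] = x;             // bucket d has room for all
        }                                              // n keys, so it can't overflow
        std::size_t out = 0;                           // gather back in digit order
        for (std::size_t d = 0; d < kBuckets; ++d)
          for (std::size_t i = 0; i < count[d]; ++i) v[out++] = buckets[d * n + i];
      }
      munmap(raw, total_bytes);
      return v;
    }

    int main() {
      std::vector<uint32_t> v{42, 7, 1u << 31, 0, 99999, 7};
      for (uint32_t x : radix_sort_overcommit(v)) std::cout << x << ' ';
      std::cout << '\n';                               // 0 7 7 42 99999 2147483648
    }

The price is 256 * n * 4 bytes of virtual address space per pass, which is fine on 64-bit systems but obviously not something you'd do on a 32-bit target.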
A winner tree uses extra space, doesn't it? That might rule it out as an alternative in certain applications. Four-ary heaps are roughly (ymmv) twice as fast as binary heaps because they exploit cache locality better for small key/value types. And four seems to be the sweet spot, since eight-ary heaps don’t deliver additional improvements.
Yes, since the inner nodes duplicate information, it uses more space (roughly twice as much). I've found them generally most effective for things like merging 256 streams or doing Dijkstra with not super-many nodes (e.g. Contraction Hierarchies). If you have millions or billions of entries, then cache considerations start becoming more important than branch mispredictions and you want something like a B-heap.
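For reference, a minimal sketch of the 4-ary min-heap layout being discussed (illustrative, not taken from any particular library): children of node i live at 4i+1..4i+4, so a sift-down compares four adjacent keys per level (typically one or two cache lines for small keys) and the tree has half as many levels as a binary heap.

    // Minimal 4-ary min-heap sketch; 0-indexed, children of i are 4i+1..4i+4.
    #include <cstddef>
    #include <iostream>
    #include <utility>
    #include <vector>

    struct FourAryMinHeap {
      std::vector<int> h;

      void push(int x) {
        h.push_back(x);
        std::size_t i = h.size() - 1;
        while (i && h[(i - 1) / 4] > h[i]) {       // sift-up
          std::swap(h[(i - 1) / 4], h[i]);
          i = (i - 1) / 4;
        }
      }

      int pop() {                                  // assumes a non-empty heap
        int top = h[0];
        h[0] = h.back();
        h.pop_back();
        std::size_t i = 0;
        for (;;) {                                 // sift-down over up to 4 children
          std::size_t first = 4 * i + 1, best = i;
          for (std::size_t c = first; c < first + 4 && c < h.size(); ++c)
            if (h[c] < h[best]) best = c;
          if (best == i) break;
          std::swap(h[i], h[best]);
          i = best;
        }
        return top;
      }
    };

    int main() {
      FourAryMinHeap q;
      for (int x : {9, 3, 7, 1, 8, 2}) q.push(x);
      while (!q.h.empty()) std::cout << q.pop() << ' ';   // 1 2 3 7 8 9
      std::cout << '\n';
    }

Same asymptotics as a binary heap, of course; the win is purely in constant factors from the flatter tree and friendlier memory access pattern.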
That's a nice niche you found (spoken from one heap fan to another), but I have to say I strongly disagree with your use of *roughly* twice as much.
At best you were off by one but in the context of performance, you'd want to assign that extra to a super-root of ±inf in order to save log2n extra range checks per heap-up, no?
Ah, I meant that for a classic heap, it's convenient to assign h[0] to the limit of your goal function (e.g. -inf for a min heap) cause that way you can skip the while i>0 check and just do while h[i>>1] > min(h[i], h[i+1]), asserting that the condition is definitely false when i < 2
I guess that it's not as important when you're explicitly willing to pay the piper and run the full log2n iterations just for the sake of branch predictability.
Thanks for the algo though; before today I never would've thought about doing a branchless downheap with i = i<<1 + h[i<<1] != h[i]
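In case it helps other readers, here's a minimal sketch of both ideas in a 1-indexed binary min-heap: a sentinel at h[0] so sift-up needs no explicit i > 1 check, and a child pick in sift-down that folds the comparison into a 0/1 add instead of an if/else on which child to follow. It paraphrases the expressions above rather than quoting them, so treat it as an approximation.

    // 1-indexed binary min-heap with a -inf sentinel at h[0]; illustrative only.
    #include <cstddef>
    #include <iostream>
    #include <limits>
    #include <vector>

    struct MinHeap {
      // h[0] is a sentinel smaller than every real key, h[1] is the root.
      std::vector<int> h{std::numeric_limits<int>::min()};

      void push(int x) {
        h.push_back(x);
        std::size_t i = h.size() - 1;
        // Sentinel guarantees the loop stops at i == 1, so no i > 1 check.
        while (h[i >> 1] > x) { h[i] = h[i >> 1]; i >>= 1; }
        h[i] = x;
      }

      int pop() {                        // assumes at least one real element
        int top = h[1];
        int x = h.back();
        h.pop_back();
        std::size_t n = h.size() - 1;    // number of real elements left
        std::size_t i = 1;
        while (2 * i <= n) {
          // Smaller-child pick as a 0/1 add instead of an if/else on which
          // child to follow (the right-child existence test is still a branch).
          std::size_t c = 2 * i + (2 * i + 1 <= n && h[2 * i + 1] < h[2 * i]);
          if (h[c] >= x) break;
          h[i] = h[c];
          i = c;
        }
        if (n) h[i] = x;
        return top;
      }
    };

    int main() {
      MinHeap q;
      for (int x : {5, 1, 4, 2, 3}) q.push(x);
      while (q.h.size() > 1) std::cout << q.pop() << ' ';   // 1 2 3 4 5
      std::cout << '\n';
    }

Whether running the full log2(n) levels unconditionally beats early-exit with a data-dependent branch is, as discussed above, very much a "measure it on your workload" question.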
The bad news is that their build system is extremely hand-rolled, so if it works for you, count yourself lucky, because when it doesn't work you're in for 4 hours of Python hell.
Interesting stuff. Not sure if I read this right that it’s 16- and 32-bit integer values that get sorted. If yes, I’d love to see if the GPU implementation can beat a competitive radix sort implementation on a CPU.
It's 32 32-bit values which get sorted. I don't think a GPU sort would beat a CPU sort at this scale, even if you don't take kernel launch time into account. CPUs are simply too fast for (super-)small data, especially with AVX-512.
But if we're talking about a larger amount of data, that would be a different story, e.g. as part of a normal GPU mergesort.
It is also useful if your data already lives in GPU memory, for example when you need to z-sort a bunch of particles in a 3D renderer's particle system.