> Google does not use std::ranges for many reasons, including performance, the ability to create dangling references, and a cubic stack blowup when deeply nesting some range adapters like std::views::filter
I remember discussing this exact problem for boost range in the boost mailing list almost 20 years ago. My proposal was to make ranges be their own thing and generate the iterators on the fly when begin/end was requested, instead of always storing them internally. But I think it was not considered a big issue in practice.
Today we have sentinel end-iterators that can be stateless, so it should be possible to avoid blowout without fundamentally rethinking ranges.
My admittedly limited experience with std::ranges is that they generate abnormal binary bloat compared to traditional code, so I also tried to argue against their usage in one of my ex-teams who, paradoxically, wanted to achieve tight execution on platforms with tight resources.
Another paradox was that exceptions were disabled for reasons, while at the same time the std::ranges (range-v3) implementation did not support building without them. I wonder how Google solved this problem, since AFAIK they also disallow exceptions.
> cubic stack blowup when deeply nesting some range adapters like std::views::filter
I have no doubt this observation is true, but I haven't been able to picture it; can somebody explain? I don't think ELI5 (let alone an explanation suitable for a Golden Retriever) will work here, but in terms that, say, a C programmer would grasp: what exactly gets shoved onto the stack when we're using a std::views::filter?
Thanks, but I saw the numbers; to me the numbers just say "it's terrible" and leave me with the same question I had before: what's actually on the stack? It's not going to just be empty space left there as a joke, so are there... cached values? Pointers to something (what?)? I suspect it's supposed to be obvious, but I don't get it.
My "Eric's Famous Pythagorean Triples" blog post is 5+ years old and possibly not addressing exactly the same issue that think-cell/Google cares about, but you might still be interested to read it:
https://quuxplusone.github.io/blog/2019/03/06/pythagorean-tr...
In my case, the blowup happens mainly because the Ranges style tends to turn easily fungible code like `if (x < y)` into relatively non-fungible data like the capture-list of `[y](int x) { return x < y; }`. The former is easily subjected to optimizations like the hoisting of loop invariants; the latter is not. Another way to put this is: the compiler is happy to mess with the layout of the function stack frame up until the very last minute, and to mess with the coroutine frame almost as long; but the layout of a lambda (or std::bind) capture frame is set in stone very early, by the front-end. And Ranges loves lambdas.
According to https://en.cppreference.com/w/cpp/ranges/filter_view, the return value of views::filter is either a new ranges::view or some sort of range adaptor that seems to be equivalent to a ranges::filter_view.
I can only guess that it is those return values that are temporarily put on the stack so that the subsequent | operations can be chained. My hypothesis, without knowing much about this stuff, is that since the compiler cannot elide them (according to the godbolt link from the presentation), stack usage, as found by Google, has cubic growth. I don't quite understand why, since I'd expect it to grow linearly, e.g. 48 bytes at a time.
The naive implementation would have a range represented as a pair of iterators, with every iterator containing a copy of each upstream range, which blows up quadratically. The libstdc++ implementation doesn't work this way, though: each range stores one copy of the upstream range plus the predicate. While this could be optimized further for stateless predicates, the range growth should still be linear.
Yet when I print the final size of the chained filter ranges, I see it increasing superlinearly with the number of stages, so I must be missing something.
There is also additional wasted stack space for each temporary range. In theory the compiler should be able to optimize them away, but it probably doesn't always.
Since C++ ranges kinda look like Rust iterators, but I believe Rust doesn't have this problem, can somebody explain why? What's the difference? Or maybe it also has that problem?
Rust's iterators are single-use streams of items, and nothing more.
A Rust iterator can be fully implemented by a closure that either returns the next element or signals it's done. This minimalist interface composes well, especially that it always uses single ownership, so there's always only one copy of the state accessed only from one place.
They can't describe collections. They don't remember their beginning. They don't need to know their end until they reach it. They can't be sorted. They don't need to support splitting or iterating in reverse.
Hm, so I tried this on the playground and it seems like Rust iterators also scale super-linearly in debug mode (which is expected, because every adaptor creates a new local that contains the previous one plus something extra), but this goes away in release mode (as everything is inlined). Is the problem in C++ something different?
It's exactly the same behaviour in C++, but people are used to it in Rust, while in C++ they're used to writing traditional for-loops, which have no such overhead at -O0 compared to -O3.
I expect (but I'm not sure) it's because Rust iterators are a deliberately thinner, less powerful idea. Suppose I give you a Rust iterator A and a C++ range B.
You can ask A for the next() item and get back an Option<Item>; this literally iterates as an inherent part of getting your answer. There's no current() or previous(), and there's no separate finished(): if you ran out of items, you got None instead of Some(item). A can be asked for a hint about how much is left, but it's allowed to say it has no useful idea: between zero and more than you can count, inclusive.
B is expected to provide a richer, albeit unsafe, API.
I believe this somehow means it's only reasonable for B to cache answers, since it can be asked to give the same answer again and it would be inefficient to re-calculate it, whereas there is no way to ask A to repeat an answer, so it makes no sense for it to cache. (Obviously there are special-purpose Rust iterators which can have any feature, but the Iterator trait itself doesn't need to cache.)