If you look at slide #14, you'll see Opera was an early adopter.
Firefox published a "No" blog post in 2013, and Safari removed WebP support from the Sierra preview in 2016, eventually adding it back in 2020. Stuff happened.
And yes, when WebP was created there was a real, non-incremental need for a Web-oriented image format. Nowadays, it's just an incremental improvement on this idea for browsers.
AVIF is equivalent in decoding complexity, even in software. It's not an order of magnitude different.
Encoders OTOH can be as slow as you want them to be.
To minimize additional computation, progressive JPEGs often use a limited number of scans, for example five.
By default, JPEG XL's progressive mode uses only two scans: the 8x8 DC first, then the remaining transforms in a second pass, sent as 256x256 tiles in a priority order chosen at encode time. This choice allows JPEG XL to do only one round of DCTs even in the progressive case. The 8x8 DC is interpolated using cheaper methods.
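To illustrate what a cheap DC-based preview can look like, here is a toy sketch in plain Python (not the actual interpolation JPEG XL uses): upscale the 1/8th-resolution DC image with bilinear interpolation instead of running full inverse DCTs.

```python
# Toy sketch: turn a 1/8th-resolution "DC image" into a full-size preview
# with bilinear interpolation. This stands in for the "cheaper methods"
# mentioned above; it is NOT the JPEG XL decoder's actual upsampler.

def bilinear_upscale(dc, factor=8):
    """Upscale a 2-D list of DC values by `factor` using bilinear interpolation."""
    h, w = len(dc), len(dc[0])
    out_h, out_w = h * factor, w * factor
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        # Map the output pixel center back onto the DC grid, clamped at edges.
        fy = min(max((y + 0.5) / factor - 0.5, 0.0), h - 1.0)
        y0 = int(fy)
        y1 = min(y0 + 1, h - 1)
        wy = fy - y0
        for x in range(out_w):
            fx = min(max((x + 0.5) / factor - 0.5, 0.0), w - 1.0)
            x0 = int(fx)
            x1 = min(x0 + 1, w - 1)
            wx = fx - x0
            top = dc[y0][x0] * (1 - wx) + dc[y0][x1] * wx
            bot = dc[y1][x0] * (1 - wx) + dc[y1][x1] * wx
            out[y][x] = top * (1 - wy) + bot * wy
    return out

# A 2x2 DC grid (i.e. a 16x16 source image) expands to a smooth 16x16 preview.
preview = bilinear_upscale([[0, 80], [160, 240]])
```

The point is the cost profile: one multiply-add per output pixel, versus a full IDCT per 8x8 block for the real reconstruction.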
Because of these design choices, every JPEG XL image is guaranteed to be at least minimally progressive in the same manner, i.e., 8x8 DC first. Having that guarantee makes it more rewarding for system designers to focus on extracting some user-experience benefit from the feature.
But, provided you don't need the intermediate results, you can rearrange the data back to progressive and then render it the simple way, for those times when CPU power is constrained more than the network. Total CPU use ends up being pretty much the same (the data-rearranging step is rather cheap compared to the IDCTs).
No... After all but the final pass of the image has been delivered, you have enough data to start doing progressive rendering. The final pass contains most of the bytes, so you still get a pretty decent progressive render too.
IIRC, back in ~'99, Unreal Engine was doing dithering on the u/v coordinates (!!) for texture mapping instead of bilinear interpolation. They used a fixed Bayer matrix.
This was much faster, visually pleasant, and adapted nicely to viewing distance.
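A rough reconstruction of the trick (hypothetical code, not Unreal's actual implementation): jitter the u/v lookup with an offset taken from a fixed 4x4 Bayer matrix indexed by screen position, then do a single nearest-neighbor fetch instead of averaging four texels.

```python
# Hypothetical sketch of ordered-dither texture sampling. Instead of
# bilinear-filtering four texels, jitter the u/v coordinates by a
# screen-position-dependent offset from a fixed 4x4 Bayer matrix and do
# one nearest-neighbor fetch. Averaged over neighboring screen pixels,
# the jitter approximates the filtered result at a quarter of the fetches.

BAYER4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def sample_dithered(texture, u, v, sx, sy):
    """Nearest-neighbor texel fetch with a Bayer-ordered sub-texel jitter.

    texture: 2-D list of texel values; (u, v): continuous texel coords;
    (sx, sy): integer screen position of the pixel being shaded.
    (Using one shared jitter for both axes is a simplification.)
    """
    # Offset in [0, 1), different for each screen pixel within a 4x4 tile.
    jitter = (BAYER4[sy % 4][sx % 4] + 0.5) / 16.0
    ty = min(int(v + jitter), len(texture) - 1)
    tx = min(int(u + jitter), len(texture[0]) - 1)
    return texture[ty][tx]

tex = [[0, 255], [255, 0]]
# Sampling halfway between two texels: different screen pixels snap to
# different texels, producing a dithered blend instead of a hard edge.
samples = [sample_dithered(tex, 0.5, 0.0, x, 0) for x in range(4)]  # [0, 255, 0, 255]
```

Note why it "adapts to viewing distance": the dither pattern lives at screen-pixel frequency, so the further away (or smaller) the surface, the less visible the pattern is.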
This 'new' WebP format is not radically different; it just pushes the same use cases further
(the web; not archival, not camera capture), sticking to simplicity and usefulness.
10 years is a long time since WebP-v1, and technology has evolved since. We can go much lower in bitrate for still-good-enough quality, and that's what we're interested in exploring.
AVIF was designed starting from a video codec, similar to WebP-v1 in its early stages, and this has some drawbacks (ones we later needed to correct in WebP, which was confusing).
WebP2 is aiming at images-on-the-web right away, and not as an afterthought.
Video formats usually take up too much memory: what you gain in efficiency, you pay for in resources.
Conversely, animated WebPs are dirt cheap: one buffer only, written over and over.
Then, there are optimizations in WebP to allow fast jumps to keyframes, even when there's transparency. Video codecs don't allow that, and you can have an arbitrarily long torture sequence of transparent frames that needs to be decoded again when the video comes back into view.
Last, animations are usually low-fps (~10 fps): there, video codecs don't perform very well and are basically all keyframes. So the difference isn't as great as one would think.
Oh, and hardware needs a 'reset' between decoding tasks, to reconfigure memory, and decoding can't be parallelized.
It is complementary to AVIF (which targets photo capture more than delivery over the web).
WebP2 is like WebP: keep it simple, with only useful and efficient features, no fancy stuff. And aim at very low bitrates; that's the current trend, favoring small file size over pristine quality.
Progressive rendering uses ~3x more resources on the client side. So, instead, it's better to have an efficient 300-byte thumbnail in the header.
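A back-of-envelope sketch of that alternative (illustrative Python; the 16x16 size and plain box filter are my assumptions, and a real codec would entropy-code the result rather than store raw pixels): average-pool the image down to a tiny grayscale thumbnail that lands in the same ballpark as the ~300-byte figure.

```python
# Illustrative sketch: a header thumbnail via box-filter downscaling.
# 16x16 8-bit grayscale is 256 raw bytes, roughly the ~300-byte budget
# mentioned above (a real format would also entropy-code these bytes).

def box_downscale(img, out_w, out_h):
    """Average-pool a 2-D grayscale image down to out_w x out_h."""
    h, w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        y0, y1 = oy * h // out_h, (oy + 1) * h // out_h
        row = []
        for ox in range(out_w):
            x0, x1 = ox * w // out_w, (ox + 1) * w // out_w
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))  # integer mean of the block
        out.append(row)
    return out

# A synthetic 256x256 gradient image, pooled down to a 16x16 thumbnail.
big = [[(x + y) % 256 for x in range(256)] for y in range(256)]
thumb = box_downscale(big, 16, 16)
raw_size = 16 * 16  # 256 bytes of 8-bit grayscale, before entropy coding
```

The client decodes this in microseconds and blits it blurred, which is much cheaper than keeping a multi-pass progressive decode alive.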