Segmentation on the CPU is pretty common, since there's a huge opportunity for caching. It's possible to push segmentation onto the GPU as an optimization, but it might not be worth it if the GPU is already busy enough with downstream operations. Even then, you'd still cache the results.
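To make the caching point concrete, here's a minimal sketch, not anyone's actual implementation. The `segment` function is a hypothetical stand-in (it just splits on whitespace); the real segmentation step and its inputs are assumptions on my part. The idea is only that identical input is segmented once and reused afterwards.

```python
from functools import lru_cache

# Counts how often the (hypothetically expensive) segmentation really runs.
call_count = 0


@lru_cache(maxsize=1024)
def segment(data: bytes) -> tuple[bytes, ...]:
    """Hypothetical stand-in for a costly CPU segmentation step.

    Splitting on whitespace is a placeholder; the result is returned as a
    tuple so the cached value is immutable.
    """
    global call_count
    call_count += 1
    return tuple(data.split())


first = segment(b"hello gpu world")   # computed on the CPU
second = segment(b"hello gpu world")  # same input, served from the cache
```

The same cache sits in front of the work whether it runs on the CPU or gets offloaded, which is why you'd keep it either way.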
I would be surprised if they don't have a dynamic pipeline that can be optimized at runtime.