Latency hiding is a way to substantially increase throughput by queuing massive numbers of requests or tasks while waiting on expensive resources (e.g. main memory access can have a latency of hundreds of cycles on GPUs). By scheduling enough tasks or requests that can be dispatched in parallel (or in quick succession), you may, after an initial delay, be able to complete the queued requests very quickly (possibly one every clock cycle or two), giving overall performance similar to running the same set of tasks sequentially with very low latency. However, such long pipelines are prone to stalling, especially when data dependencies prevent loads from being issued early, so getting maximum performance out of code on an architecture that heavily exploits latency hiding can require a lot of very specific domain knowledge.
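As a rough illustration, here is a minimal CUDA sketch of the most common form of latency hiding on GPUs: launching far more threads than there are execution units, so the hardware scheduler can issue warps whose loads have completed while others are still waiting on memory. The kernel name `scale_add` and all sizes are invented for this example, and it assumes a CUDA-capable device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread performs one load-dependent update. Individually, every
// thread stalls for hundreds of cycles on its global-memory loads; the
// warp scheduler hides that latency by switching to other resident
// warps whose loads have already arrived.
__global__ void scale_add(const float* x, const float* y, float* out,
                          float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a * x[i] + y[i];  // two long-latency loads per thread
    }
}

int main() {
    const int n = 1 << 24;  // ~16M elements: far more threads than cores
    float *x, *y, *out;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Oversubscribe: launching tens of thousands of blocks keeps many
    // warps resident per SM, so there is almost always a warp ready to
    // issue while the others wait on memory.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_add<<<blocks, threads>>>(x, y, out, 3.0f, n);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);  // expect 5.0
    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}
```

Note that this kernel hides latency well precisely because its loads are independent across threads; if each thread instead chained loads whose addresses depended on previous results, fewer warps would be ready to issue at any moment and the stalls described above would reappear.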