How do you visit every coordinate in an NxM grid exactly once? The easiest way is to go row by row, from the first column to the last, but if you want better caching you might try alternating the column direction for each row. The Morton/Z-order and Hilbert curves give even better cache locality for some tasks, although the classic versions only work on squares whose side length is a power of two.
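To make the power-of-two restriction concrete, here is a minimal sketch of the classic Morton order in Python. It is an illustration only, not code from rectfillcurve, and the names `morton_decode` and `morton_order` are made up for this example. The visit index is treated as interleaved bits: even bits form x, odd bits form y.

```python
def morton_decode(index):
    """De-interleave the bits of index: even bits go to x, odd bits to y."""
    x = y = 0
    bit = 0
    while index:
        x |= (index & 1) << bit
        y |= ((index >> 1) & 1) << bit
        index >>= 2
        bit += 1
    return x, y


def morton_order(side):
    """Yield every (x, y) of a side-by-side square in Z-order.

    The bit interleaving only covers the square exactly when side is a
    power of two, which is the limitation of the classic curve.
    """
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a power of two")
    for index in range(side * side):
        yield morton_decode(index)
```

On a 2x2 square this visits the cells in the "Z" shape that gives the curve its name; any other side length that is not a power of two is rejected.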
Luckily for me, people have developed generalized versions of those algorithms which can handle arbitrary-sized rectangles.
I've taken those and packaged all of the curves into "rectfillcurve", with an iterator API for generating them, plus a bonus "mlcg curve" with a pseudo-random visit order that should have poor cache behavior. It is implemented in stand-alone C and is also available as a Python module.
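As an illustration of how a pseudo-random order can still visit each cell exactly once, here is a sketch in Python. It is not taken from the rectfillcurve source and assumes "mlcg" refers to a multiplicative linear congruential generator; every function name below is hypothetical. The idea: pick a prime p larger than the number of cells, step through the powers of a primitive root modulo p (which hit every value from 1 to p-1 once), and skip values that fall outside the grid.

```python
def is_prime(n):
    """Trial-division primality test; fine for small grids."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_factors(n):
    """Return the set of distinct prime factors of n."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def primitive_root(p):
    """Return the smallest generator of the multiplicative group mod prime p."""
    if p == 2:
        return 1
    factors = prime_factors(p - 1)
    for g in range(2, p):
        # g is a generator if no proper power (p-1)/q collapses to 1.
        if all(pow(g, (p - 1) // q, p) != 1 for q in factors):
            return g
    raise ValueError("p must be prime")


def mlcg_order(width, height):
    """Yield every (x, y) of the grid once, in a scattered order."""
    n = width * height
    p = n + 1
    while not is_prime(p):
        p += 1
    a = primitive_root(p)
    state = 1
    for _ in range(p - 1):
        if state <= n:  # states above n have no matching cell, so skip them
            cell = state - 1
            yield cell % width, cell // width
        state = state * a % p
```

Using the smallest primitive root keeps the sketch short, but the first few steps are not very scattered; a larger multiplier would mix the order better.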
Left-to-right, top-to-bottom order matches the memory order of a row-major array exactly, and it is always going to be fastest if your algorithm allows it. The benchmarks show that.
Zigzag is only slightly slower. Big CPUs have hardware prefetchers that recognise backwards access patterns, so it behaves basically the same as going left to right, except that you trip the prefetcher up at the start of each row.
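For reference, the two simple orders can be sketched as Python generators. This is a minimal illustration, not the rectfillcurve API; `row_major` and `zigzag` are names chosen for this example.

```python
def row_major(width, height):
    """Yield (x, y) left to right, top to bottom."""
    for y in range(height):
        for x in range(width):
            yield x, y


def zigzag(width, height):
    """Yield (x, y) row by row, reversing the column direction on odd rows."""
    for y in range(height):
        if y % 2 == 0:
            columns = range(width)
        else:
            columns = range(width - 1, -1, -1)
        for x in columns:
            yield x, y
```

In the zigzag order every step moves to a neighbouring cell, including the step from the end of one row to the start of the next, whereas row-major order jumps back to column 0 at each new row.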