UTF-8 encodes each codepoint in one to four bytes, so a codepoint can occupy an odd number of bytes, and processing the file 2 bytes at a time would be a little more complicated. Processing the file one codepoint at a time works because you can decode the UTF-8 stream into a stream of 32-bit codepoints before passing the codepoints to the lookup table. I suppose you could also transcode from UTF-8 to UTF-16 before passing values to the lookup table.
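Here's a minimal sketch in C of that decode-then-classify approach. The names (utf8_decode, is_space_cp, count_words) are my own, the whitespace set is ASCII-only for brevity, and a real decoder would also have to reject overlong and otherwise invalid sequences:

    #include <stddef.h>
    #include <stdint.h>

    /* Bytes in a UTF-8 sequence, judging by the lead byte; -1 if invalid. */
    static int utf8_len(uint8_t lead) {
        if (lead < 0x80)           return 1; /* 0xxxxxxx: ASCII   */
        if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx: 2 bytes */
        if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx: 3 bytes */
        if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx: 4 bytes */
        return -1;
    }

    /* Decode one codepoint from s (n bytes available) into *out;
       returns bytes consumed, or -1 on malformed input. */
    static int utf8_decode(const uint8_t *s, size_t n, uint32_t *out) {
        static const uint8_t lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
        int len = utf8_len(s[0]);
        if (len < 0 || (size_t)len > n) return -1;
        uint32_t cp = s[0] & lead_mask[len];
        for (int i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80) return -1; /* not a continuation byte */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        *out = cp;
        return len;
    }

    /* Stand-in for whatever whitespace classes the real table encodes. */
    static int is_space_cp(uint32_t cp) {
        return cp == ' ' || cp == '\t' || cp == '\n' || cp == '\r';
    }

    /* Count words by walking 32-bit codepoints rather than raw bytes. */
    size_t count_words(const uint8_t *buf, size_t n) {
        size_t words = 0, i = 0;
        int in_word = 0;
        while (i < n) {
            uint32_t cp;
            int len = utf8_decode(buf + i, n - i, &cp);
            if (len < 0) { i++; continue; }  /* skip a malformed byte */
            int space = is_space_cp(cp);
            if (!space && !in_word) words++; /* a word starts here    */
            in_word = !space;
            i += (size_t)len;
        }
        return words;
    }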
Processing byte by byte isn't necessarily faster than processing codepoint by codepoint, or at any other granularity. You'd need to measure performance empirically, and it probably depends on cache sizes and other factors. In theory, you could even process bit by bit (then you'd only need 32 2-element lookup tables), but that's unlikely to be efficient, since you'd need to do a lot of bit manipulation.
Edit: Upon inspection, the method I described doesn't appear to be the method used by the featured program. It does still use a lookup table to detect space characters, byte by byte, but the states are not what I described. Instead of the states representing which byte of a UTF-8 encoded codepoint is being processed and the word count being incremented on certain transitions (a Mealy machine), the states represent the class of the codepoint last fully processed, and the count is always increased based on the current state, often by zero (a Moore machine).
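To make the distinction concrete, here's a sketch of the Moore-style loop, with made-up state names and an ASCII-only notion of whitespace (the featured program's actual tables also classify multi-byte codepoints): the transition table is indexed by the current state and the next byte, and the amount added to the count depends only on the state just entered. In a Mealy formulation the increment would instead be attached to particular (state, byte) transitions.

    #include <stddef.h>
    #include <stdint.h>

    enum state {
        S_SPACE,    /* last codepoint was whitespace     */
        S_WORD,     /* last codepoint continued a word   */
        S_NEWWORD,  /* last codepoint started a new word */
        N_STATES
    };

    static uint8_t next_state[N_STATES][256]; /* filled in once at startup */

    /* Moore output: entering a state adds a fixed amount to the count. */
    static const uint64_t inc[N_STATES] = {0, 0, 1};

    static void build_tables(void) {
        for (int b = 0; b < 256; b++) {
            int space = (b == ' ' || b == '\t' || b == '\n' || b == '\r');
            next_state[S_SPACE][b]   = space ? S_SPACE : S_NEWWORD;
            next_state[S_WORD][b]    = space ? S_SPACE : S_WORD;
            next_state[S_NEWWORD][b] = space ? S_SPACE : S_WORD;
        }
    }

    uint64_t count_words_moore(const uint8_t *buf, size_t n) {
        uint64_t words = 0;
        uint8_t s = S_SPACE;
        for (size_t i = 0; i < n; i++) {
            s = next_state[s][buf[i]]; /* one table lookup per byte        */
            words += inc[s];           /* output depends only on the state */
        }
        return words;
    }

The appealing property is that the inner loop is branch-free: every byte costs one transition lookup and one addition no matter what it is, which is presumably part of why the featured program structures it this way.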