Yes, you can look at the frequency with the highest intensity in the FFT. This is the "dumb" version of converting music to notes (and is what I really meant, but I left it out for the sake of brevity).
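To make the "dumb" version concrete, here is a minimal sketch of picking the strongest frequency bin. It uses a naive DFT from the standard library rather than a real FFT so that it stays self-contained. The function name `dominant_frequency`, the 8000 Hz sample rate, and the 440 Hz test tone are all my own assumptions for illustration, not anything from the article.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring DC."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    # For a real-valued signal only bins 1 .. n/2 carry distinct information.
    for k in range(1, n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    # Bin k corresponds to k * sample_rate / n Hz.
    return best_bin * sample_rate / n

# Hypothetical test signal: a pure 440 Hz tone sampled at 8000 Hz.
sample_rate = 8000
n = 400  # gives 20 Hz per bin, so 440 Hz falls exactly on bin 22
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]
print(dominant_frequency(tone, sample_rate))
```

In practice you would use an FFT library and a window function, since real audio rarely lands exactly on a bin.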
The thing is, the actual process looks for spectral patterns (call it "harmonic content per unit of time"), not just notes. Matching on notes alone would produce lots and lots of false positives.
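As a rough illustration of "harmonic content per unit of time," the sketch below cuts the signal into fixed-size frames and keeps the few strongest frequencies of each frame. This is only my guess at the general shape of the idea, not the fingerprinting scheme the article or any real product uses. The names `spectral_pattern` and `chord`, the frame size, and the choice of two peaks per frame are assumptions.

```python
import cmath
import math

def magnitudes(frame):
    """Naive DFT magnitudes for bins 0 .. n/2 of a real-valued frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectral_pattern(samples, sample_rate, frame_size, top_k):
    """For each non-overlapping frame, list its top_k strongest frequencies (Hz)."""
    patterns = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        mags = magnitudes(samples[start:start + frame_size])
        # Skip bin 0 (DC) and rank the remaining bins by magnitude.
        strongest = sorted(range(1, len(mags)),
                           key=lambda k: mags[k], reverse=True)[:top_k]
        patterns.append(sorted(k * sample_rate / frame_size for k in strongest))
    return patterns

def chord(freqs, n, sample_rate):
    """Hypothetical helper: sum of unit-amplitude sines at the given frequencies."""
    return [sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs)
            for t in range(n)]

# Two frames with different harmonic content.
signal = chord([400, 800], 200, 8000) + chord([600, 1200], 200, 8000)
print(spectral_pattern(signal, 8000, 200, 2))
```

The sequence of per-frame peak sets is the kind of pattern you would match on, instead of a single note per frame.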
Let's just agree that the process is not too far removed from my initial brief description, and should be simple to implement, as the article shows. For any competent signal processing engineer, this should all be evident, which was the main point.
Also, even if you get many false positives, you have already narrowed down the search, which lets you afford more brute-force steps on the remaining candidates, like computing cross-correlations.
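Here is a small sketch of that brute-force step, assuming the cheap first pass has already left a handful of candidates. It slides the query over each candidate and keeps the best normalized cross-correlation score. The function name `normalized_xcorr_peak`, the track names, and the random test data are made up for illustration.

```python
import math
import random

def normalized_xcorr_peak(query, reference):
    """Slide query over reference; return (best_score, best_offset)."""
    m = len(query)
    q_norm = math.sqrt(sum(x * x for x in query))
    best_score, best_offset = -1.0, 0
    for offset in range(len(reference) - m + 1):
        window = reference[offset:offset + m]
        w_norm = math.sqrt(sum(x * x for x in window))
        if q_norm == 0 or w_norm == 0:
            continue  # a silent segment cannot be normalized
        score = sum(a * b for a, b in zip(query, window)) / (q_norm * w_norm)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset

# Hypothetical candidates that survived the first pass (random stand-in data).
rng = random.Random(0)
candidates = {name: [rng.uniform(-1, 1) for _ in range(200)]
              for name in ("track_a", "track_b", "track_c")}

# The query is an excerpt of track_b starting at sample 30.
query = candidates["track_b"][30:80]

best = max(candidates,
           key=lambda name: normalized_xcorr_peak(query, candidates[name])[0])
print(best, normalized_xcorr_peak(query, candidates[best]))
```

This is quadratic per candidate, which is exactly why you would only run it after the search has been narrowed.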
https://www.toptal.com/algorithms/shazam-it-music-processing...