I enjoy listening to mixtapes, so I use Shazam frequently. The problem with existing CLI programmes is that I can't run Shazam in a loop while it listens to my speaker output or my mixtape files.
While working on the project, I found two secrets about Shazam's API:
1. Shazam allows for a maximum audio length of 12 seconds per request.
2. You can send up to 20 requests per minute before hitting Error 429. Both limits are easy to work within, as sketched below.
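Taken together, those two limits suggest a simple loop: cut the recording into chunks of at most 12 seconds and throttle to roughly one request every 3 seconds. Here's a minimal sketch of that idea in Python, assuming pydub for slicing and a hypothetical `recognize()` wrapper around the actual API call:

```python
import time
from pydub import AudioSegment  # needs ffmpeg on PATH for mp3 input

MAX_CHUNK_MS = 12_000    # limit 1: Shazam accepts at most ~12 s of audio per request
MAX_REQS_PER_MIN = 20    # limit 2: more than this and you get HTTP 429

def recognize(chunk: AudioSegment) -> dict:
    """Hypothetical wrapper around the actual Shazam API call."""
    raise NotImplementedError

def scan_mixtape(path: str) -> None:
    audio = AudioSegment.from_file(path)
    delay = 60.0 / MAX_REQS_PER_MIN  # ~3 s between requests stays under the cap
    for start_ms in range(0, len(audio), MAX_CHUNK_MS):
        chunk = audio[start_ms : start_ms + MAX_CHUNK_MS]  # pydub slices in ms
        match = recognize(chunk)
        print(f"{start_ms // 1000:>5d}s  {match}")
        time.sleep(delay)
```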
1. I did some development myself for a "Track Discovery for DJs"[1] project in this space of "DJ music recognition", and I'm wondering: how are you able to handle mixtapes and DJ mixes when significant sound manipulation/distortion has been applied, like pitch/tempo changes plus various effects? In my tests this totally confused algorithms that were not designed to handle such cases.
2. Can you share which algorithm you implemented for this project? I read most of the research papers in this space, and my preferred solution was to build upon https://github.com/JorenSix/Panako, which I did.
In the "minimal microhouse techno" genre, where tracks often share similar rhythm patterns or are even built from the same sample packs, it proved difficult to get reliable results.
I was investigating how Spotify and other market leaders do track recognition, and they train ML models on the same track with 100+ different effects applied...
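For what it's worth, here's roughly the kind of augmentation I mean; a sketch using librosa's pitch_shift and time_stretch, where the specific step/rate values are just illustrative, not anything Spotify has published:

```python
import librosa

def augmented_variants(path: str):
    """Yield pitch/tempo-distorted copies of a track, the kind of
    training inputs a learned fingerprinter could be fed (sketch)."""
    y, sr = librosa.load(path, sr=None, mono=True)
    yield y  # the clean original
    for n_steps in (-2, -1, 1, 2):          # semitone shifts, as a pitch fader might apply
        yield librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    for rate in (0.92, 0.96, 1.04, 1.08):   # ±8% tempo, a typical beat-matching range
        yield librosa.effects.time_stretch(y, rate=rate)
```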
Curious to hear your thoughts...
[1] - https://rominimal.club