
Apple is probably going to kill the app. They probably just bought it for the tech.


Apple would have no problem implementing something similar.

It's the brand, mindshare, and music store/service lead gen that are more difficult to replicate. Why get rid of an icon that's already on everyone's phones and that could be a funnel to Apple Music instead of Spotify?


It's not the tech, it's the data.


It's odd that they didn't try to purchase SoundHound then. That company has more mature tech, and it also offers voice recognition services beyond music through Houndify.


If they bought SoundHound for the tech to bake into their own service, they'd be competing with Shazam. If they bought Shazam for the tech to bake into their own service, SoundHound would just exist as an alternative. It seems like an attempt to buy the "name brand" to get the tech and beat the competition at the same time.

Note: I've never heard of SoundHound, though, so it might be popular in some places. Shazam is the name brand of music recognition, to the extent of being a verb.


That's what they did with MOG, which became Beats Music, which became Apple Music, in 8 months.


It's not just voice recognition.

Shazam has an augmented-reality-based advertising platform.


What basis do you have for that? When they bought Workflow they didn't kill it.


Workflow wasn't a popular Android application that sent users to a competing service, though... I could see the app living on in iOS with the Spotify integration stripped out, but I seriously doubt it has a future on Android.


Apple Music is on Android; it could just redirect to that and live on.


It'll be interesting to see how long the Android version lasts.


Siri has had this feature for a while, so it's unlikely they bought it for the tech.



(removed)


This has the smell of a comment written by someone with limited real-world experience. Simply writing down the list of problems you would have to solve to build Shazam would take an entire afternoon.

Yes, deep neural networks have proven remarkably useful for machine perception, but you would still need to collect a colossal amount of audio data, fingerprint all of it, build a low-latency processing infrastructure for making inferences, and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.


> and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.

That's actually the easy part. You already have the music. Distorting it by superimposing background noise is really not difficult.


Lol. When you superimpose noise, the original data is still there. When you have an FM radio playing staticky, heavily compressed music through crappy speakers in an acoustically terrible store, captured by a terrible microphone, and then compressed again, a significant amount of nonlinear distortion has taken place. That is extremely hard to model, and you would have to model it, or have real data, to train a neural network. Neural networks are extremely hard to train without excellent data.


> playing staticky, heavily compressed music through crappy speakers in an acoustically terrible store

I think you just confirmed how easy (and cheap) it is to actually generate this data.


If by "model" you mean reproduce, why is it so hard to simulate the distortion introduced by poor FM reception? Or by the acoustics of a store?
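For instance, a crude version of that chain might look like this (a sketch only, assuming numpy and a mono float signal in [-1, 1]; the function name is mine and none of the constants are calibrated to real hardware):

    import numpy as np

    def degrade(x, sr=16000):
        # Soft clipping, standing in for an overdriven amp or cheap speaker.
        x = np.tanh(3.0 * x)
        # Mu-law companding with 8-bit quantization, a stand-in for lossy coding.
        mu = 255.0
        y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
        y = np.round(y * 127) / 127
        x = np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu
        # FM-style static: additive broadband noise.
        x = x + 0.05 * np.random.randn(len(x))
        # A cheap "room": convolve with a synthetic decaying impulse response.
        n = sr // 4
        ir = np.random.randn(n) * np.exp(-np.linspace(0.0, 8.0, n))
        x = np.convolve(x, ir / np.abs(ir).sum(), mode="same")
        return x / np.max(np.abs(x))

Whether that matches what real stores, radios, and phone mics actually do to audio is of course the whole question.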


I wonder if you could build a neural network that would do that...


The easy part? If you're Apple, Google, or Microsoft, maybe.


Can you explain?

I mean, you can easily find thousands of hours of music online. Recording background noise is easy (just go to a random bar where they are not playing music). Now simply add the two signals (you can shift them randomly to generate more data). You can also add some linear filtering if you like (just imagine random settings of an equalizer for starters).

This should give you enough data to build at least a proof of concept.
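Roughly like this, as a sketch (assuming numpy/scipy and two mono arrays at the same sample rate, with the noise at least as long as the music; the SNR range and filter choices are arbitrary):

    import numpy as np
    from scipy.signal import butter, lfilter

    def make_example(music, noise, sr=16000):
        # Random circular shift, so the clip can start anywhere in the song.
        music = np.roll(music, np.random.randint(len(music)))
        # Mix at a random signal-to-noise ratio between -5 and 15 dB.
        snr_db = np.random.uniform(-5.0, 15.0)
        gain = np.sqrt(np.mean(music**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
        mixed = music + gain * noise[:len(music)]
        # Random linear filtering: a band-pass standing in for an equalizer.
        low = np.random.uniform(100.0, 400.0)
        high = np.random.uniform(3000.0, 7000.0)
        b, a = butter(2, [low / (sr / 2), high / (sr / 2)], btype="band")
        return lfilter(b, a, mixed)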


For training:

Illegally grabbing thousands of hours of music to train a commercial model hardly qualifies as fair use. Any company built on that would be tainted.

For sustaining:

In addition, you'll need to keep an updated catalog of music to identify new songs against, and most uses of a service like Shazam are to find the names of songs people aren't familiar with, so that catalog needs to be very fresh.

That means you'll have to grab some sort of feed and either engage in large-scale music piracy for commercial gain or get access to a library of songs from many disparate music providers, such as ASCAP.

Background noise:

There are literally hundreds of different background-noise environments you need to train against, dozens of common microphone configurations, plus clipping and other variations.

It's very much a problem where a proof of concept is neat but doesn't really get you anywhere.


Also, I'm not saying it's impossible or not worth doing (obviously, it's possible and worth doing), just that a few minutes of thinking and Hacker News comments are hardly going to touch the breadth of difficulties involved in getting this to work even somewhat reliably.


> use to improve model performance.

Shazam doesn't actually let you improve the answer or report an incorrect guess. They are that confident in their matches, even though they sometimes completely miss the genre and style of the music.


I'd be more curious to see you try to build this in an afternoon.

Also, it does a lot more than find "slightly distorted" versions. It can catch a song in a noisy room where you can barely make out the song to begin with. A couple of months back it found a song while a very loud crowd was yelling over it. It's also able to tell versions of songs apart pretty well; some remixes sound very close to the original.

The other thing you might be missing is just how fast it is, even on a slow mobile connection.


(removed explanation because it seems not appreciated)


This is a heap of nonsense. Not only does this summarily dismiss the enormous challenges in digital signal processing required for removing arbitrary background audio, it exposes some confusion associated with the ideas of correlated random variables, inner products, and affine transformations.


> it exposes some confusion associated with the ideas of correlated random variables, inner products

Read this article, then come back: [1]

[1] https://en.wikipedia.org/wiki/Cross-correlation
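For a single known song, the correlation in question is a few lines (a toy sketch with numpy/scipy; the function name is mine). The point of contention is scale, not the math:

    import numpy as np
    from scipy.signal import fftconvolve

    def best_offset(clip, song):
        # Cross-correlation is convolution with the time-reversed clip;
        # doing it via FFT makes this O(n log n) instead of O(n^2).
        corr = fftconvolve(song, clip[::-1], mode="valid")
        # Normalize by local energy so loud passages don't dominate.
        energy = np.sqrt(fftconvolve(song**2, np.ones(len(clip)), mode="valid"))
        return int(np.argmax(corr / (energy + 1e-9)))

Running that against 40 million full songs per query is exactly what fingerprint indexes exist to avoid.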


> (removed explanation because it seems not appreciated)

I think describing the reaction as a lack of appreciation is a bit misleading. Perhaps disbelief would be a better description.


For the wrong reasons. See my response.


Since you deleted the original comment, it's hard to evaluate it. I think you should have left it there.


So they're doing this vector correlation for every song in their database (40 million)?


A tiny percentage of the time, when a successful company makes an N-hundred-million-dollar purchase of a technology and company and you don't understand why, it's because they have made a mistake.

The smart money, though, is on the main chance: you don't understand the purchase, or the problem domain, or both.

In this case I think you are overestimating the progress in NNs and search, and underestimating the signal processing. Have you tried this with any significant corpus?

"Whack it through an FFT and do correlation" seems like the obvious solution to the toy version of the problem, but this is exactly the sort of thing that usually falls apart in practice.


> isn't it possible to build this yourself in Python in an afternoon?

Is anyone keeping a running list of products that HN commenters have suggested could be built in an afternoon/weekend?

Ones I've seen so far: Facebook, Twitter, Dropbox, and now Shazam.


Building the service is not usually the hard part; building the ecosystem around it is. There were/are countless services similar to Facebook or Twitter, but only a few of them could really succeed, because of herd mentality.


Go build it yourself in an afternoon and come back.

Then we'll talk.


Some insight into the Shazam tech; the article has a link to the algorithm being used.

https://www.toptal.com/algorithms/shazam-it-music-processing...


Thanks. This actually proves my point that the core concepts of Shazam can be implemented in a weekend. Of course, programming the front end etc. is more work, but that is beside the point.


You'd need access to the fingerprint database, but otherwise it should be fairly straightforward.

What's not straightforward is recognizing cover songs and the like. That's not only non-trivial; AFAIK it can't be done.


> What's not straightforward is recognizing cover songs and the like. That's not only non-trivial; AFAIK it can't be done.

Well, you could transcribe the music into actual notes (or musical intervals) and use Smith-Waterman (or any more advanced, more recent technique) to find the song with the lowest edit distance.
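As a toy illustration (the melodies here are made up, and the automatic transcription, which is the hard part, is assumed away):

    import numpy as np

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
        # Local alignment score: higher means more similar subsequences.
        H = np.zeros((len(a) + 1, len(b) + 1))
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                H[i, j] = max(0, H[i - 1, j - 1] + s,
                              H[i - 1, j] + gap, H[i, j - 1] + gap)
        return H.max()

    # Semitone intervals between successive notes are key-invariant,
    # so a cover transposed to another key can still match.
    original = [2, 2, -4, 5, 0, -1]   # hypothetical melody
    cover = [2, 2, -4, 5, -1]         # same tune with one note dropped
    print(smith_waterman(original, cover))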


Converting digital audio to notes is both not as easy as it sounds and not how Shazam works.

https://www.toptal.com/algorithms/shazam-it-music-processing...


Yes, you can look at the frequency with the highest intensity in the FFT. This is the "dumb" version of converting music to notes (and is what I really meant to say, but left out for the sake of brevity).
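That "dumb" version is only a few lines (a sketch with numpy; it breaks down on polyphonic audio, where the loudest partial is often not the melody):

    import numpy as np

    def dominant_notes(audio, sr=16000, frame=4096):
        # Strongest FFT bin per frame, mapped to the nearest MIDI note.
        notes = []
        for start in range(0, len(audio) - frame, frame):
            windowed = audio[start:start + frame] * np.hanning(frame)
            spectrum = np.abs(np.fft.rfft(windowed))
            freq = np.argmax(spectrum) * sr / frame
            if freq > 0:
                notes.append(int(round(69 + 12 * np.log2(freq / 440.0))))
        return notes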


The thing is that the process looks for spectral patterns, let's call it "harmonic content per unit of time," not just notes. Mere notes would result in lots and lots of false positives.


Let's just agree that the process is not too far removed from my initial brief description and should be simple to implement, as the article shows. To any competent signal processing engineer this should all be evident, which was the main point.

Also, even if you get many false positives, you have still narrowed down the search, which lets you do more brute-force matching, like computing cross-correlations.
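For reference, the core of the scheme the article describes is pairing spectrogram peaks into time-delta hashes. A rough sketch (assuming scipy; simplified to one peak per frame, where real systems keep several and prune by neighborhood):

    import numpy as np
    from scipy.signal import stft

    def fingerprint(audio, sr=16000, fan_out=5):
        _, _, spec = stft(audio, fs=sr, nperseg=1024)
        mag = np.abs(spec)
        # One landmark per time frame: the strongest frequency bin.
        peaks = [(t, int(np.argmax(mag[:, t]))) for t in range(mag.shape[1])]
        hashes = []
        for i, (t1, f1) in enumerate(peaks):
            for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
                # The hash depends only on the peaks' relative timing, so it's
                # the same wherever in the song you start listening.
                hashes.append(((f1, f2, t2 - t1), t1))
        return hashes

Matching is then a hash lookup plus a vote on consistent time offsets, which is what makes the search cheap compared to raw correlation.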


Where are you going to get 'the music'? There are millions and millions of hours of music out there; how are you going to gather and fingerprint it all?


Uhh, isn’t creating the “fingerprint” the non-straightforward part? Keep in mind you could start listening at any point in the song as well.


Yeah, you can build the platform, but you won't have any data, no history of searches. You also won't have any users.

I'm not disputing that they overpaid. However, long story short, building the technology is the easy part, and it's just a fraction of the brand/product value.


Y'all realise they run a music service too, right? Having access to cross-platform data that gives them insight into bleeding-edge emerging/trending artists and songs is priceless.


Apple already had a music service (iTunes).


Different signals.

"What's that song?" is a different signal to buying a song. Especially when "what's that song?" isn't restricted by licensing agreements.


It is already working. It is not that simple to start from scratch and get it working properly, plus there's the DB and everything.


It's not only the song recognition technology; Shazam also has video recognition and AR technology.



