
Apple is probably going to kill the app. They probably just bought it for the tech.


Apple would have no problem implementing something similar.

It's the brand, mindshare, and music store/service lead gen that are more difficult to replicate. Why get rid of an icon that's already on everyone's phones and that could be a funnel to Apple Music instead of Spotify?


It's not the tech, it's the data.


It's odd that they didn't try to purchase SoundHound then. That company has more mature tech, and it also offers voice recognition services beyond music through Houndify.


If they bought SoundHound for the tech to bake into their own service, they'd be competing with Shazam. If they bought Shazam for the tech to bake into their own service, SoundHound would just exist as an alternative. It seems like an attempt to buy the "name brand" to get the tech and beat the competition at the same time.

Note: I've never heard of SoundHound, though, so it might be popular in some places. Shazam is the name brand of music recognition, to the extent of being a verb.


That's what they did with MOG, which became Beats Music, which became Apple Music, in 8 months.


It's not just voice recognition.

Shazam has an augmented-reality-based advertising platform.


What basis do you have for that? When they bought Workflow they didn't kill it.


Workflow wasn't a popular Android application that sent users to a competing service, though... I could see the app living on in iOS with the Spotify integration stripped out, but I seriously doubt it has a future on Android.


Apple Music is on Android; it could just redirect to that and live on.


It'll be interesting to see how long the Android version lasts.


Siri has had this feature for a while, so it's unlikely they bought it for the tech.



(removed)


This has the smell of a comment written by someone with limited real-world experience. Simply writing down the list of problems you would have to solve to build Shazam would take an entire afternoon.

Yes, deep neural networks have proven remarkably useful for machine perception, but you would still need to collect a colossal amount of audio data, fingerprint all of it, build a low-latency processing infrastructure for making inferences, and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.


> and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.

That's actually the easy part. You already have the music. Distorting it by superimposing background noise is really not difficult.


Lol. When you superimpose noise, the original data is still there. When you have an FM radio playing staticky, heavily compressed music through crappy speakers in an acoustically terrible store, captured by a terrible microphone, and then compressed again, a significant amount of nonlinear distortion has taken place. That is extremely hard to model, and you would have to model it, or have real data, to train a neural network. Neural networks are extremely hard to train without excellent data.


> playing staticky, heavily compressed music through crappy speakers in an acoustically terrible store

I think you just confirmed how easy (and cheap) it is to actually generate this data.


If by "model" you mean reproduce, why is it so hard to simulate the distortion introduced by poor FM reception? Or by the acoustics of a store?
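For instance, a crude version of that chain might look like this (a sketch only, assuming numpy and a mono float signal in [-1, 1]; the function name is mine and none of the constants are calibrated to real hardware):

    import numpy as np

    def degrade(x, sr=16000):
        # Soft clipping, standing in for an overdriven amp or cheap speaker.
        x = np.tanh(3.0 * x)
        # Mu-law companding with 8-bit quantization, a stand-in for lossy coding.
        mu = 255.0
        y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
        y = np.round(y * 127) / 127
        x = np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu
        # FM-style static: additive broadband noise.
        x = x + 0.05 * np.random.randn(len(x))
        # A cheap "room": convolve with a synthetic decaying impulse response.
        n = sr // 4
        ir = np.random.randn(n) * np.exp(-np.linspace(0.0, 8.0, n))
        x = np.convolve(x, ir / np.abs(ir).sum(), mode="same")
        return x / np.max(np.abs(x))

Whether that matches what real stores, radios, and phone mics actually do to audio is of course the whole question.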


I wonder if you could build a neural network that would do that...


The easy part? If you're Apple, Google, or Microsoft, maybe.


Can you explain?

I mean, you can easily find thousands of hours of music online. Recording background noise is easy (just go to a random bar where they are not playing music). Now simply add the two signals (you can shift them randomly to generate more data). You can also add some linear filtering if you like (just imagine random settings of an equalizer for starters).

This should give you enough data to build at least a proof of concept.
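Roughly like this, as a sketch (assuming numpy/scipy and two mono arrays at the same sample rate, with the noise at least as long as the music; the SNR range and filter choices are arbitrary):

    import numpy as np
    from scipy.signal import butter, lfilter

    def make_example(music, noise, sr=16000):
        # Random circular shift, so the clip can start anywhere in the song.
        music = np.roll(music, np.random.randint(len(music)))
        # Mix at a random signal-to-noise ratio between -5 and 15 dB.
        snr_db = np.random.uniform(-5.0, 15.0)
        gain = np.sqrt(np.mean(music**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
        mixed = music + gain * noise[:len(music)]
        # Random linear filtering: a band-pass standing in for an equalizer.
        low = np.random.uniform(100.0, 400.0)
        high = np.random.uniform(3000.0, 7000.0)
        b, a = butter(2, [low / (sr / 2), high / (sr / 2)], btype="band")
        return lfilter(b, a, mixed)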


For training:

Illegally grabbing thousands of hours of music to train a commercial model hardly qualifies as fair use. Any company built on that would be tainted.

For sustaining:

In addition, you'll need to keep an updated catalog of music to identify new songs against, and most uses of a service like Shazam are to find the names of songs people aren't familiar with, so that catalog needs to be very fresh.

That means you'll have to grab some sort of feed and either engage in large-scale music piracy for commercial gain or get access to a library of songs from many disparate music providers, such as ASCAP.

Background noise:

There are literally hundreds of different background-noise environments you need to train against, dozens of common microphone configurations, plus clipping and other variations.

It's very much a problem where a proof of concept is neat but doesn't really get you anywhere.


Also, I'm not saying it's impossible or not worth doing (obviously, it's possible and worth doing), just that a few minutes of thinking and Hacker News comments are hardly going to touch the breadth of difficulties involved in getting this to work even somewhat reliably.


> use to improve model performance.

Shazam doesn't actually let you improve the answer or report an incorrect guess. They are that confident in their matches, even though they sometimes completely miss the genre and style of the music.


I'd be more curious to see you try to build this in an afternoon.

Also, it does a lot more than find "slightly distorted" versions. It can catch a song in a noisy room where you can barely make out the song to begin with. A couple of months back it found a song while a very loud crowd was yelling over it. It's also able to tell versions of songs apart pretty well; some remixes sound very close to the original.

The other thing you might be missing is just how fast it is, even on a slow mobile connection.


(removed explanation because it seems not appreciated)


This is a heap of nonsense. Not only does this summarily dismiss the enormous challenges in digital signal processing required for removing arbitrary background audio, it exposes some confusion associated with the ideas of correlated random variables, inner products, and affine transformations.


> it exposes some confusion associated with the ideas of correlated random variables, inner products

Read this article, then come back: [1]

[1] https://en.wikipedia.org/wiki/Cross-correlation
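For a single known song, the correlation in question is a few lines (a toy sketch with numpy/scipy; the function name is mine). The point of contention is scale, not the math:

    import numpy as np
    from scipy.signal import fftconvolve

    def best_offset(clip, song):
        # Cross-correlation is convolution with the time-reversed clip;
        # doing it via FFT makes this O(n log n) instead of O(n^2).
        corr = fftconvolve(song, clip[::-1], mode="valid")
        # Normalize by local energy so loud passages don't dominate.
        energy = np.sqrt(fftconvolve(song**2, np.ones(len(clip)), mode="valid"))
        return int(np.argmax(corr / (energy + 1e-9)))

Running that against 40 million full songs per query is exactly what fingerprint indexes exist to avoid.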


> (removed explanation because it seems not appreciated)

I think describing the reaction as a lack of appreciation is a bit misleading. Perhaps disbelief would be a better description.


For the wrong reasons. See my response.


Since you deleted the original comment, it's hard to evaluate it. I think you should have left it there.


So they're doing this vector correlation for every song in their database (40 million)?


A tiny percentage of the time, when a successful company makes an N-hundred-million-dollar purchase of a technology and company and you don't understand why, it's because they have made a mistake.

The smart money, though, is on the main chance: you don't understand the purchase, or the problem domain, or both.

In this case I think you are overestimating the progress in NNs and search, and underestimating the signal processing. Have you tried this with any significant corpus?

"Whack it through an FFT and do correlation" seems like the obvious solution to the toy version of the problem, but this is exactly the sort of thing that usually falls apart in practice.


> isn't it possible to build this yourself in Python in an afternoon?

Is anyone keeping a running list of products that HN commenters have suggested could be built in an afternoon/weekend?

Ones I've seen so far: Facebook, Twitter, Dropbox, and now Shazam.


Building the service is not usually the hard part; building the ecosystem around it is. There were/are countless services similar to Facebook or Twitter, but only a few of them could really succeed, because of herd mentality.


Go build it yourself in an afternoon and come back.

Then we'll talk.


Some insight into the Shazam tech; the article has a link to the algorithm being used.

https://www.toptal.com/algorithms/shazam-it-music-processing...


Thanks. This actually proves my point that the core concepts of Shazam can be implemented in a weekend. Of course, programming the front end etc. is more work, but that is beside the point.


You'd need access to the fingerprint database, but otherwise it should be fairly straightforward.

What's not straightforward is recognizing cover songs and the like. That's not only non-trivial; AFAIK it can't be done.


> What's not straightforward is recognizing cover songs and the like. That's not only non-trivial; AFAIK it can't be done.

Well, you could transcribe the music into actual notes (or musical intervals) and use Smith-Waterman (or any more advanced, more recent technique) to find the song with the lowest edit distance.
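As a toy illustration (the melodies here are made up, and the automatic transcription, which is the hard part, is assumed away):

    import numpy as np

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
        # Local alignment score: higher means more similar subsequences.
        H = np.zeros((len(a) + 1, len(b) + 1))
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                H[i, j] = max(0, H[i - 1, j - 1] + s,
                              H[i - 1, j] + gap, H[i, j - 1] + gap)
        return H.max()

    # Semitone intervals between successive notes are key-invariant,
    # so a cover transposed to another key can still match.
    original = [2, 2, -4, 5, 0, -1]   # hypothetical melody
    cover = [2, 2, -4, 5, -1]         # same tune with one note dropped
    print(smith_waterman(original, cover))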


Converting digital audio to notes is both not as easy as it sounds and not how Shazam works.

https://www.toptal.com/algorithms/shazam-it-music-processing...


Yes, you can look at the frequency with the highest intensity in the FFT. This is the "dumb" version of converting music to notes (and is what I really meant to say, but left out for the sake of brevity).
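That "dumb" version is only a few lines (a sketch with numpy; it breaks down on polyphonic audio, where the loudest partial is often not the melody):

    import numpy as np

    def dominant_notes(audio, sr=16000, frame=4096):
        # Strongest FFT bin per frame, mapped to the nearest MIDI note.
        notes = []
        for start in range(0, len(audio) - frame, frame):
            windowed = audio[start:start + frame] * np.hanning(frame)
            spectrum = np.abs(np.fft.rfft(windowed))
            freq = np.argmax(spectrum) * sr / frame
            if freq > 0:
                notes.append(int(round(69 + 12 * np.log2(freq / 440.0))))
        return notes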


The thing is that the process looks for spectral patterns, let's call it "harmonic content per unit of time," not just notes. Mere notes would result in lots and lots of false positives.


Let's just agree that the process is not too far removed from my initial brief description and should be simple to implement, as the article shows. To any competent signal processing engineer this should all be evident, which was the main point.

Also, even if you get many false positives, you have still narrowed down the search, which lets you do more brute-force matching, like computing cross-correlations.
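For reference, the core of the scheme the article describes is pairing spectrogram peaks into time-delta hashes. A rough sketch (assuming scipy; simplified to one peak per frame, where real systems keep several and prune by neighborhood):

    import numpy as np
    from scipy.signal import stft

    def fingerprint(audio, sr=16000, fan_out=5):
        _, _, spec = stft(audio, fs=sr, nperseg=1024)
        mag = np.abs(spec)
        # One landmark per time frame: the strongest frequency bin.
        peaks = [(t, int(np.argmax(mag[:, t]))) for t in range(mag.shape[1])]
        hashes = []
        for i, (t1, f1) in enumerate(peaks):
            for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
                # The hash depends only on the peaks' relative timing, so it's
                # the same wherever in the song you start listening.
                hashes.append(((f1, f2, t2 - t1), t1))
        return hashes

Matching is then a hash lookup plus a vote on consistent time offsets, which is what makes the search cheap compared to raw correlation.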


Where are you going to get 'the music'? There are millions and millions of hours of music out there; how are you going to gather and fingerprint it all?


Uhh, isn’t creating the “fingerprint” the non-straightforward part? Keep in mind you could start listening at any point in the song as well.


Yeah, you can build the platform, but you won't have any data, no history of searches. You also won't have any users.

I'm not disputing that they overpaid. However, long story short, building the technology is the easy part, and it's just a fraction of the brand/product value.


Y'all realise they run a music service too, right? Having access to cross-platform data that gives them insight into bleeding-edge emerging/trending artists and songs is priceless.


Apple already had a music service (iTunes).


Different signals.

"What's that song?" is a different signal to buying a song. Especially when "what's that song?" isn't restricted by licensing agreements.


It is already working. It is not that simple to start from scratch and get it working properly, plus there's the DB and everything.


It's not only the song recognition technology; Shazam also has video recognition and AR technology.



