
There have been some attempts at doing this, some of which have been moderately successful. But fundamentally you still have the problem that, from the NN's perspective, the magnitude is relatively easy to learn while the phase is very hard. So it'll guess rough sizes for the real and imaginary parts, but it'll have a hard time learning the correct ratio between the two.
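A minimal numpy sketch of why phase matters (illustrative only, not from the thread): a complex spectrum frame splits into magnitude and phase, and discarding the phase loses information that a magnitude-only model would have to recover.

```python
import numpy as np

# Illustrative frame: two sinusoids, windowed (all values are placeholders).
sr, n = 16000, 1024
t = np.arange(n) / sr
frame = (np.sin(2 * np.pi * 440 * t)
         + 0.5 * np.sin(2 * np.pi * 880 * t + 1.0)) * np.hanning(n)

spec = np.fft.rfft(frame)
mag, phase = np.abs(spec), np.angle(spec)

# Exact reconstruction needs both magnitude and phase...
exact = np.fft.irfft(mag * np.exp(1j * phase))
# ...whereas magnitude alone (phase set to zero) yields a different signal.
zero_phase = np.fft.irfft(mag.astype(complex))

print(np.allclose(exact, frame))       # True
print(np.allclose(zero_phase, frame))  # False
```

This is the gap Griffin-Lim-style phase reconstruction and phase-aware losses try to close.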

Models which operate directly on the time domain have generally had a lot more success than models that operate on spectrograms. But because time-domain models essentially have to learn their own filterbank, they end up being larger and more expensive to train.
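A rough sketch of the "learn their own filterbank" point (hypothetical sizes and names, not any specific model): the first layer of a time-domain model is effectively a bank of 1-D convolutions over raw samples, whose weights must be learned rather than fixed like an STFT basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters, kernel, hop = 64, 256, 128   # illustrative: 64 filters, 16 ms at 16 kHz
# In a real model these weights are trained; here they are random placeholders.
filters = rng.standard_normal((n_filters, kernel)) * 0.01

signal = rng.standard_normal(16000)     # 1 s of fake audio at 16 kHz
# Strided framing + matrix multiply == a bank of strided 1-D convolutions.
frames = np.lib.stride_tricks.sliding_window_view(signal, kernel)[::hop]
features = frames @ filters.T           # (n_frames, n_filters), spectrogram-like

print(features.shape)
```

Those `n_filters * kernel` weights (and the layers needed to learn them well) are part of why time-domain models end up larger than ones fed precomputed spectrograms.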



I wonder if there might be room for a hybrid approach, with a time-domain model taking machine-generated spectrograms as input and turning them into sound. (Just a thought, no idea whether it actually makes sense.)



