Love this idea. I had hoped my AirPods with Live Listen would do the same, but was disappointed that they sounded similar to your benchmark example, or worse.
I wonder if you all can use another layer in your ML stack to "fill out" the voices once you've isolated them. Your example leaves voices sounding very thin/hollow and even a bit garbled.