You could probably achieve the same outcome by combining two approaches, though. Use the traditional timing and phase management that existing noise-cancelling headphones already do. Then, using the data from that same set of microphones, use AI to extract the conversation of interest (maybe using left/right timing differences to determine who's "in front" of you) and overlay that on top of the inverted signal. This way there's no risk of AI error in the noise cancellation itself, and you can rely on existing solutions for that part.
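
To make the left/right timing idea a bit more concrete, here's a rough sketch in Python with NumPy. Everything in it is an assumption for illustration: the function names (`delay`, `interaural_lag`, `is_in_front`, `mix_output`), the one-sample tolerance, and especially the mixing step, which just adds the extracted speech to a sign-flipped copy of the ambient signal. Real headphones also have to model the acoustic path and latency, and the "AI extraction" step is left out entirely (the speech signal is assumed to come from somewhere else).

```python
import numpy as np


def delay(signal, n):
    """Delay a signal by n samples, padding the start with zeros (same length out)."""
    if n == 0:
        return signal.copy()
    return np.concatenate([np.zeros(n), signal[:-n]])


def interaural_lag(left, right, max_lag):
    """Estimate how many samples `right` lags behind `left`.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    Positive result: the sound reached the left mic first.
    Negative result: the sound reached the right mic first.
    """
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = left[:len(left) - lag], right[lag:]
        else:
            a, b = left[-lag:], right[:len(right) + lag]
        score = float(np.dot(a, b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def is_in_front(left, right, max_lag=20, tolerance=1):
    """Treat a near-zero left/right lag as 'roughly in front of you'.

    The tolerance is an arbitrary illustrative value, and a near-zero lag
    cannot tell front from directly behind on its own.
    """
    return abs(interaural_lag(left, right, max_lag)) <= tolerance


def mix_output(ambient, speech):
    """Overlay the extracted speech on the inverted ambient signal.

    Idealized: assumes perfect phase inversion with no acoustic path or latency.
    """
    return -ambient + speech
```

With something like this, the classic cancellation path (`-ambient`) never depends on the model; the model's output only feeds the `speech` term, and only when `is_in_front` says the talker is roughly centered.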