Taperssection.com

Gear / Technical Help => Post-Processing, Computer / Streaming / Internet Devices & Related Activity => Topic started by: randy3732 on May 25, 2021, 10:42:23 PM

Title: Machine Learning (AI) to quiet talkers
Post by: randy3732 on May 25, 2021, 10:42:23 PM
Has anyone used or heard of any program(s) or online services to quiet talkers (people talking) that ruin live recordings?

I've successfully used Machine Learning video noise reduction and resolution enhancement and was thinking Machine Learning could be used to suppress the voices of people talking while a band is playing. There are a few programs and online services that claim to separate drums, bass, guitars, and vocals, but none I found focus on keeping all the music and quieting non-musical sounds such as talkers.
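For anyone curious, the separation step in those tools is only a couple of lines of code. A minimal sketch, assuming the open-source Spleeter library (file names are placeholders; nothing here targets talkers directly):

Code: [Select]
# Minimal stem-separation sketch using the open-source Spleeter library.
# This is the separation step only; it does not target talkers directly.
# "show.wav" and the "stems/" output folder are placeholders.
from spleeter.separator import Separator

separator = Separator('spleeter:4stems')          # vocals/drums/bass/other
separator.separate_to_file('show.wav', 'stems/')  # writes one wav per stem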

Things I'd be concerned with:
Keeping the music and vocals the same.
Keeping the ambiance the same.
Not sounding processed or robotic.
Only quieting the people talking.

I have at least 50 recordings I'd like to release where, despite my best efforts (recording right in front of a PA), some people nearby started talking loudly and got onto the recording.

I would be willing to pay $100 for every dB of talker noise reduction. Surely there's a smart group of designers that could make it possible.

Randy3732
Title: Re: Machine Learning (AI) to quiet talkers
Post by: opsopcopolis on May 26, 2021, 03:23:58 PM
I'm sure somebody much smarter than me will have a better answer, but my understanding is that what you're asking for would be very difficult, for a few reasons. From what I understand of the tech used in those AI instrument separators, they use a combination of frequency and spatial separation to capture a sort of 'image' of each instrument, which is then used to pull that instrument out of the mix. General noise reduction uses a similar process to capture an 'image' of the noise, and often requires a pretty good stretch of relative silence to capture it successfully. In both cases you need consistent, clear 'images' of the sound you're trying to separate out, which isn't really available in this scenario.
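As a rough illustration of that 'image' requirement, here's a sketch of profile-based noise reduction, assuming the open-source noisereduce library (file names are placeholders). Note that it needs a noise-only clip, which is exactly what a taper almost never has for chatter:

Code: [Select]
# Profile-based noise reduction needs a clean "image" of the noise.
# Assumes the open-source noisereduce library; mono audio for simplicity.
import soundfile as sf
import noisereduce as nr

audio, rate = sf.read("show.wav")       # the full recording (mono)
noise, _ = sf.read("noise_only.wav")    # chatter with no music underneath
                                        # (rarely obtainable at a show)

# Estimate the noise spectrum from the clip and subtract it everywhere.
cleaned = nr.reduce_noise(y=audio, sr=rate, y_noise=noise, stationary=True)
sf.write("show_cleaned.wav", cleaned, rate)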
Title: Re: Machine Learning (AI) to quiet talkers
Post by: hoserama on May 26, 2021, 04:56:18 PM
You could do a mix of AI plus good old-fashioned spectral repair.

Split the audio into stems and then do spectral repair. I imagine the vocals/drums/bass algorithms won't pick up the chatter (although the vocals might). Then do spectral repair on the remainder and reintegrate everything.

That way, when you do spectral repair, you're leaving the other components alone. Transients like drums would remain relatively unaffected.
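A rough sketch of that pipeline, assuming the open-source Demucs separator for the split (paths and names are illustrative):

Code: [Select]
# Split-repair-reintegrate sketch. Assumes the open-source Demucs
# separator is installed; paths and file names are illustrative.
import subprocess
import soundfile as sf

# 1) Split into vocals/drums/bass/other stems.
subprocess.run(["python", "-m", "demucs", "-n", "htdemucs",
                "-o", "stems", "show.wav"], check=True)

# 2) Do spectral repair (e.g. in RX) on the stems where chatter lands,
#    typically "vocals" and "other", then...

# 3) ...sum the stems back together.
mix, rate = None, None
for stem in ("vocals", "drums", "bass", "other"):
    audio, rate = sf.read(f"stems/htdemucs/show/{stem}.wav")
    mix = audio if mix is None else mix + audio
sf.write("show_repaired.wav", mix, rate)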
Title: Re: Machine Learning (AI) to quiet talkers
Post by: Gutbucket on May 26, 2021, 07:14:58 PM
Moore's law marches ever onward, and many compromised recordings await. Mark my words: we'll get there, just not quite yet. Look at what we can do now that was unimaginable a couple of decades back. Even once we do get there, it's generally good advice not to buy the first model year of any new car.

Title: Re: Machine Learning (AI) to quiet talkers
Post by: Chilly Brioschi on May 27, 2021, 07:29:22 PM
Tasers are illegal in NJ ...
Shock collars, however....     8)
Title: Re: Machine Learning (AI) to quiet talkers
Post by: hoserama on May 28, 2021, 10:29:21 AM
Quote from: Chilly Brioschi on May 27, 2021, 07:29:22 PM
Tasers are illegal in NJ ...
Shock collars, however....     8)

The high-pitched YIP when you shock somebody is easy to remove via iZotope RX, but the broadband bzzzzzzz as the electricity goes through them is a bit more challenging.
Title: Re: Machine Learning (AI) to quiet talkers
Post by: checht on May 29, 2021, 03:08:09 PM
Quote from: hoserama on May 26, 2021, 04:56:18 PM
You could do a mix of AI plus good old-fashioned spectral repair.

Split the audio into stems and then do spectral repair. I imagine the vocals/drums/bass algorithms won't pick up the chatter (although the vocals might). Then do spectral repair on the remainder and reintegrate everything.

That way, when you do spectral repair, you're leaving the other components alone. Transients like drums would remain relatively unaffected.

This is my current workflow using RX 8: split out the vocal stem, use spectral repair on it, mix it back in. It's easy to remove yells and whistles, but talking is really tough.

An additional benefit is that it makes it easy to bring up the vocal on a recording that sounds distant. My 80's km84i > D5 recordings are improved by a bit of this.

At the same time, the best way to quiet talkers is to not let talking get onto the recording in the first place. Currently mastering The Band's 12/31/83 opener for the Dead, recorded from the taper section, and there's not one talk/yell/whistle on the whole thing. Really hitting me how different things were back then. Sigh.
Title: Re: Machine Learning (AI) to quiet talkers
Post by: randy3732 on May 30, 2021, 04:08:00 AM
Spectral repair works well to quiet feedback and the occasional too-loud "YEA!!!". But on the recordings with talkers that I've tried it on, I can't even see the talking in the spectrogram.
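If you want to eyeball why, a quick spectrogram plot makes it obvious: chatter sits in the same band as the vocals coming through the PA, so it rarely shows up as a distinct visual event the way feedback or a lone yell does. A minimal sketch, assuming scipy/matplotlib (the file name is a placeholder):

Code: [Select]
# Quick spectrogram check. Chatter occupies roughly the same band
# (~100 Hz - 4 kHz) as the PA vocals, so it hides in the music.
import numpy as np
import matplotlib.pyplot as plt
import soundfile as sf
from scipy.signal import spectrogram

audio, rate = sf.read("show.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)          # fold stereo to mono for display

f, t, Sxx = spectrogram(audio, fs=rate, nperseg=4096)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()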

Thank you all for the comments. Hopefully I've planted a seed for someone really smart to figure out an AI solution.
Title: Re: Machine Learning (AI) to quiet talkers
Post by: wforwumbo on June 03, 2021, 04:24:56 PM
I did my doctoral thesis on a related topic - AI for reflection identification, treating the reflections as noise.

It’s a complex problem. Even if you try to remove things in the spectral domain (common with iZotope), you need a decent model of the noise (in this case, a talker), and noise is highly varied.
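To make that concrete, here's a toy spectral-subtraction sketch (numpy/scipy assumed; function and variable names are illustrative). The whole method hinges on the noise_estimate input, and with an unpredictable talker there's no reliable way to get one:

Code: [Select]
# Toy spectral subtraction: useless without a decent noise model.
# numpy/scipy assumed; function and variable names are illustrative.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(audio, noise_estimate, rate, floor=0.05):
    f, t, X = stft(audio, fs=rate, nperseg=2048)
    _, _, N = stft(noise_estimate, fs=rate, nperseg=2048)
    noise_mag = np.abs(N).mean(axis=1, keepdims=True)  # average noise spectrum
    mag = np.abs(X) - noise_mag                        # subtract the model
    mag = np.maximum(mag, floor * np.abs(X))           # spectral floor to limit
                                                       # "musical noise" artifacts
    _, cleaned = istft(mag * np.exp(1j * np.angle(X)), fs=rate, nperseg=2048)
    return cleaned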

Video and image processing is a different domain altogether. For one thing, it generally has a lot more funding thrown at it and has been studied in more detail. For another, light and sound behave very differently, especially in the digital domain. I’ve spoken extensively about some of these differences here; I really wish the answer were as simple as “the same concepts in vision apply to audio,” but they really do not.

The tech may get there someday, but I’m under 30 and it’s unlikely to happen in my lifetime.
Title: Re: Machine Learning (AI) to quiet talkers
Post by: Gutbucket on June 03, 2021, 06:08:20 PM
I think a reasonable approach will be similar to a judicious application of noise reduction.  If we expect to achieve absolute elimination of the problem, we're just setting ourselves up for disappointment.  But I suspect a multi-pronged approach will lead to reasonably beneficial results and incremental improvements over time.

It may happen partly through a focus on identifying the noise so as to isolate and reduce it, as problematic as that is for a hard-to-define, variable noise signal, as wforwumbo describes from a high level of expertise.  And partly, probably more fruitfully in the near term, by identifying, extracting, and amplifying the desired signal, using an approach like the Music Rebalance function in iZotope RX 8 that checht describes using to isolate and enhance vocals.  Take something like that and do a multiple-stem extraction of all the desirable elements, perhaps including a baseline isolated ambient/reverberant stem.  That should all be super useful, yet it may sound somewhat over-isolated and artificial, so use the original recording, talking noise and all, as a bed, and seed it with the extracted stem elements, balancing everything to best effect for a net increase in desired signal over unwanted noise.
I think a significantly useful reduction in chatter could be achieved that way, especially as such Rebalance-like signal-extraction tools continue to improve.
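In code terms, the bed-plus-stems idea is just a weighted sum. A minimal sketch, assuming stems already extracted as in the earlier posts (gains, paths, and equal file lengths are all illustrative assumptions):

Code: [Select]
# "Bed + extracted stems" sketch: the original recording, turned down,
# glues the mix together; the separated stems restore the music on top.
# Gains and paths are illustrative; assumes all files are equal length.
import soundfile as sf

bed, rate = sf.read("show.wav")
mix = 0.3 * bed                         # original as ambience/glue
for stem, gain in (("vocals", 1.0), ("drums", 1.0),
                   ("bass", 1.0), ("other", 0.8)):
    audio, _ = sf.read(f"stems/htdemucs/show/{stem}.wav")
    mix = mix + gain * audio
sf.write("show_rebalanced.wav", mix, rate)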

It's just the conceptual difference of identifying and amplifying all the desired signals, if doing that is easier than identifying the problematic noise signal.

With regard to identification and isolation of the talking noise, I see specific talkers (relatively easily identified as such by a human listener, as intelligible speech coming from a specific location) as a different, though related, issue from the general conversational din and murmur of massed talking, which effectively arrives diffusely from all directions.