I think a reasonable approach will be similar to a judicious application of noise reduction. If we expect to eliminate the problem entirely, we're just setting ourselves up for disappointment. But I suspect a multi-pronged approach will lead to reasonably beneficial results and incremental improvements over time.
It may happen partly through a focus on identifying the noise so as to isolate and reduce it, as problematic as that is for a difficult-to-define, variable noise signal, as wforwumbo describes from a high level of expertise. And partly, and probably more fruitfully in the near term I suspect, by identifying, extracting and amplifying the desired signal, using an approach like the Music Rebalance function in iZotope RX 8 that cheht describes using for isolating and enhancing vocals. Take something like that and do a multiple-stem extraction of all desirable elements, perhaps including a baseline isolated ambient/reverberant stem. That should all be super useful, yet the result may sound somewhat over-isolated and artificial. So use the original recording, talking noise included, as a bed and seed it with the extracted stem elements, balancing it all to best effect and achieving an increase in the ratio of desired signal to unwanted noise.
I think a significantly useful reduction in chatter could be achieved that way, especially as such Rebalance-like signal extraction tools continue to improve.
It's just a conceptual shift: identify and amplify all the desired signals, if doing that is easier than identifying the problematic noise signal.
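To put a rough number on the bed-plus-stems idea, here is a minimal Python sketch. It is only a toy under stated assumptions: the "music" and "chatter" are synthetic sine tones, the extractor is idealized (it hands back the music stem perfectly, which no real Rebalance-like tool will do), and the names `mix_bed_with_stems`, `rms` and `db` are made up for illustration. It just shows the arithmetic: adding a stem at gain g on top of the untouched bed raises the desired signal by a factor of (1 + g) while leaving the chatter where it was.

```python
import math

def mix_bed_with_stems(bed, stems, stem_gains):
    """Sum the original recording (the bed) with gain-scaled extracted stems."""
    mixed = list(bed)
    for stem, gain in zip(stems, stem_gains):
        if len(stem) != len(bed):
            raise ValueError("each stem must be the same length as the bed")
        for i, sample in enumerate(stem):
            mixed[i] += gain * sample
    return mixed

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(ratio)

# Synthetic stand-ins (assumptions, not real audio): a 440 Hz "music" tone
# and a quieter "chatter" tone at an unrelated frequency.
sample_rate = 48000
n = 4800
music = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(n)]
chatter = [0.5 * math.sin(2 * math.pi * 1234.5 * i / sample_rate + 1.0) for i in range(n)]

# The original recording is the bed: wanted signal plus talking noise.
bed = [m + c for m, c in zip(music, chatter)]

# Idealized extraction (assumption): the stem is exactly the music.
stem_gain = 1.0
mixed = mix_bed_with_stems(bed, [music], [stem_gain])

# In the mix the music is scaled by (1 + stem_gain); the chatter is unchanged.
snr_before = db(rms(music) / rms(chatter))
snr_after = db((1 + stem_gain) * rms(music) / rms(chatter))
improvement_db = snr_after - snr_before  # equals 20 * log10(1 + stem_gain)

print(f"signal-to-chatter improvement: {improvement_db:.2f} dB")
```

With the stem seeded at equal level to the bed (g = 1), that works out to about 6 dB of improvement in the ideal case. In practice I would expect less, since any chatter that leaks into the extracted stems gets boosted along with the music, and the balancing would be done by ear per stem rather than with one fixed gain.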
With regard to identification and isolation of the talking noise, I see specific talkers (relatively easily identified as such by a human listener, since their diction comes from a specific location) as a different, although related, issue from the general conversational din and murmur of massed talking, which effectively arrives diffusely from all directions.