Suggestions above are good practical advice.
I'm posting not with current practical advice but to spark discussion on a potential remedy I've wondered about for years.
Can we measure the transfer-function relationship between channels of a specific recording configuration, with the intent of using that information to create a pseudo-replacement channel that preserves at least some attributes of the original when needed? I speculate we may be able to do this before or even after a recording has been made.
Prior to the recording-
Think of digitally modeling an analog effect processor by capturing impulse responses through it, or modeling the reverberant behavior of a space by capturing its spatial impulse response. Here, instead, one would measure the actual microphone configuration itself, likely capturing impulse responses so as to isolate just the relationship between the two microphones. One might need to capture impulses both for a direct-arrival frontal source in a pseudo-anechoic environment and for diffuse arrival (from all directions equally) in a fully echoic environment.
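To make the impulse-capture idea concrete, here's a minimal sketch of the standard frequency-domain deconvolution step: play a known broadband test signal, record it through the rig, and divide the spectra to recover the impulse response. Everything here is a toy stand-in (the "measurement" is simulated, and the IR is a single delayed spike), not a real measurement chain:

```python
# Hypothetical sketch: recover an impulse response from a known test signal
# by frequency-domain deconvolution (recorded / stimulus in the FFT domain).
import numpy as np

rng = np.random.default_rng(3)
n = 4096
stimulus = rng.standard_normal(n)            # broadband test signal
true_ir = np.zeros(128)
true_ir[10] = 0.9                            # toy stand-in for the mic-pair relationship

# Circular convolution stands in for the actual acoustic measurement
recorded = np.fft.irfft(np.fft.rfft(stimulus) * np.fft.rfft(true_ir, n))

# Deconvolve: divide out the known stimulus spectrum to get the IR back
ir_est = np.fft.irfft(np.fft.rfft(recorded) / np.fft.rfft(stimulus))
```

In practice one would use a swept sine rather than noise, and regularize the division to avoid blowing up where the stimulus has little energy, but the principle is the same.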
Alternatively, it may work best to have stereo impulses made in the acoustic environments in which one records, through the specific microphone arrangement being measured, for later channel reclamation should the need arise.
For a pair of microphones, I imagine this would measure the specifics of the cross-correlation between channels for a frontal source, as well as the relationship between channels for diffuse reverberant content arriving from all directions. One could subsequently apply that to the "good channel" via convolution to mimic some traits of the missing channel with respect to the existing one.
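The "apply via convolution" step itself is simple once a cross-channel impulse response exists. A minimal sketch, with a toy IR (one attenuated, delayed tap) standing in for a real measured left-to-right response:

```python
# Hypothetical sketch: given a measured left->right impulse response,
# synthesize a stand-in right channel from the surviving left channel.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
fs = 48_000
left = rng.standard_normal(fs)   # stand-in for one second of the surviving channel

ir_l_to_r = np.zeros(256)
ir_l_to_r[12] = 0.8              # toy IR: small inter-channel delay plus attenuation

# Convolve the good channel with the cross-channel IR, trimmed to length
right_est = fftconvolve(left, ir_l_to_r)[: len(left)]
```

With a real measured IR this would reproduce only what the two channels share (delay, level, coloration, common reverberation), which is the point: it mimics the missing channel's relationship to the good one, not its unique content.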
The challenges will be figuring out which impulse measurements are needed, making them, and applying them.
After the recording-
Whole portions of a recording already have the relationship between microphone channels encoded in them. It should be possible to run an analysis of the complete sections of a recording to determine that relationship, then use that information to synthesize at least some of a missing channel from the good one. High-end audio analysis systems (Meyer and such) can do this in real time, with the ability to derive hall-response impulses using just live sound itself as the stimulus. We don't need to do it live in real time; instead, we can simply run the good portions of the recording through such an analysis and let the computer determine which stereophonic attributes of the recording do not vary over time and are specific to that particular recording.
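One standard way to do this kind of after-the-fact analysis is cross-spectral transfer-function estimation: over the intact sections, average the cross-spectrum between channels and divide by the good channel's power spectrum, H(f) = Sxy(f)/Sxx(f), then turn H into a filter. A minimal sketch, with the "recording" simulated as noise passed through a toy inter-channel IR (names and parameters are illustrative, not a finished restoration tool):

```python
# Hypothetical sketch: estimate the left->right transfer function from an
# intact stereo excerpt via Welch-averaged spectra, then reapply it.
import numpy as np
from scipy.signal import csd, welch, fftconvolve

rng = np.random.default_rng(1)
fs, nper = 48_000, 1024
left = rng.standard_normal(10 * fs)          # ten seconds of "good" channel

true_ir = np.zeros(64)
true_ir[5] = 0.7                              # pretend inter-channel relationship
right = fftconvolve(left, true_ir)[: len(left)]   # the channel we pretend to lose

# H(f) = Sxy(f) / Sxx(f), averaged over many segments of the intact material
_, Sxy = csd(left, right, fs=fs, nperseg=nper)
_, Sxx = welch(left, fs=fs, nperseg=nper)
H = Sxy / Sxx

# Turn H into an FIR filter and convolve it with the good channel
est_ir = np.fft.irfft(H)
right_est = fftconvolve(left, est_ir)[: len(left)]
```

The averaging over many segments is what makes this work on ordinary program material rather than a test signal: whatever varies (the music) washes out, and whatever is fixed (the inter-channel relationship) remains.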
Either way, I imagine this should work even better with multichannel recording configurations, where one could tap the interchannel relationships between multiple channels in recreating a single missing channel, making the system more robust. It does require sufficient content "overlap" between channels (indeed, that "overlap" is what it would recreate, rather than whatever is unique to that channel), but there is plenty of that in audience recordings.
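The multichannel case can be framed as a least-squares fit: over the intact sections, fit a short FIR filter per surviving channel so that their filtered sum predicts the missing channel, then reuse those filters on the damaged sections. A toy sketch (the "missing" channel is simulated as a mix of delayed copies of the survivors, so the fit is exact here; real material would leave a residual equal to whatever is unique to that channel):

```python
# Hypothetical sketch: predict one channel from several others with a short
# multichannel FIR filter fitted by least squares on intact material.
import numpy as np

rng = np.random.default_rng(2)
n, taps = 20_000, 8
chans = rng.standard_normal((3, n))          # three surviving channels

# Toy "missing" channel: a mix of delayed copies of the survivors
target = 0.5 * np.roll(chans[0], 2) + 0.3 * np.roll(chans[1], 4)

# Regression matrix: delayed copies of every surviving channel
cols = [np.roll(c, d) for c in chans for d in range(taps)]
A = np.stack(cols, axis=1)

# Solve for the filter taps that best reconstruct the target channel
w, *_ = np.linalg.lstsq(A, target, rcond=None)
target_est = A @ w
```

With more surviving channels the fit has more shared content to draw on, which is why the multichannel version should be more robust than the two-channel one.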