OK, here are the basics we're dealing with here, without too many complex terms:
Stereo is basically level and/or timing differences between channels. Timing differences in this case are equivalent to phase differences. You can have either or both of those types of differences between the two channels of a stereo recording. A recording made with a coincident X/Y pair of mics, or with panned inputs through a sound board, will produce level differences without significant timing (phase) differences. An A-B spaced pair of omnis produces timing (phase) differences without big level differences. A near-spaced pair of directional mics like DIN90 produces both level and timing differences. Level and timing are separate domains. Both affect stereo imaging, and adjusting one can somewhat compensate for, but not specifically correct for, the other. Small adjustments to either will shift the apparent playback imaging around.
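To make the level-versus-timing distinction concrete, here is a rough Python sketch. The function names are made up for illustration, and the spaced-pair formula is the simple far-field approximation (spacing times the sine of the source angle, divided by an assumed 343 m/s speed of sound), so treat it as a ballpark rather than a model of any real mic setup.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly room temperature


def pan_by_level(mono, left_gain, right_gain):
    """Level-difference stereo (like panned board inputs): same timing, different gains."""
    left = [s * left_gain for s in mono]
    right = [s * right_gain for s in mono]
    return left, right


def pan_by_delay(mono, delay_samples):
    """Timing-difference stereo (like spaced omnis): same level, right channel arrives later."""
    pad = [0.0] * delay_samples
    left = list(mono) + pad   # padded at the end so both channels have equal length
    right = pad + list(mono)  # delayed by whole samples
    return left, right


def arrival_time_difference(spacing_m, source_angle_deg):
    """Far-field estimate of the arrival-time difference (seconds) between two
    spaced mics for a source at the given angle off the array's center line."""
    return spacing_m * math.sin(math.radians(source_angle_deg)) / SPEED_OF_SOUND_M_S
```

For example, `arrival_time_difference(0.5, 90)` comes out to roughly 1.46 ms for a source fully off to one side of a 50 cm spaced pair, while a source dead center gives no timing difference at all.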
I hesitated to mention the Mid/Side idea earlier specifically because I didn’t want to overcomplicate the discussion. I mostly threw that out to Page and bombdiggity because it really does fundamentally address the issue here; I’m just uncertain how best to apply it. Mid/Side manipulation is one of the most basic core elements of stereo imaging tools such as the Waves S1. Converting between Left/Right and Mid/Side is powerful because it is a very fundamental way of manipulating the differences between two signals.
You needn't record with a Mid/Side microphone setup to use Mid/Side as a post-production technique. It's just an alternate way of handling 2 channels of differing information, other than Left/Right. Mid/Side and Left/Right are losslessly interchangeable if you make no manipulations before changing back to the other format, and that's how it is used for robust stereo radio and television sound transmission. However, in this case the whole point is that we want to manipulate the sound before switching back to the other format, and that's where both benefits and problems can arise. So we listen closely and do our best to understand what is going on, to keep those pitfalls and obscure complications under control. Mid/Side processing is powerful and one of the most commonly used audio mastering techniques for precisely the reason this recording is problematic: dividing the channel differences into center/sides is often more useful than dividing them into left/right. In this case, the problem is that what we have as left/right differences would preferably be center (vocals) / sides (audience) differences, with the music spread between them.
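For anyone who wants to see what the conversion amounts to, here is a minimal Python sketch using the common sum/difference convention (Mid = (L+R)/2, Side = (L-R)/2). The function names and the gain parameters are my own illustration, not how the Waves S1 or any particular tool does it internally.

```python
def lr_to_ms(left, right):
    """Encode Left/Right samples as Mid/Side: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Decode Mid/Side back to Left/Right: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def adjust_mid_side(left, right, mid_gain=1.0, side_gain=1.0):
    """Hypothetical helper: scale Mid and Side separately, then return to Left/Right.

    With both gains at 1.0 this is just a round trip and changes nothing.
    side_gain < 1.0 narrows the image (0.0 collapses it to mono);
    side_gain > 1.0 widens it.
    """
    mid, side = lr_to_ms(left, right)
    mid = [m * mid_gain for m in mid]
    side = [s * side_gain for s in side]
    return ms_to_lr(mid, side)
```

With no gain change, the output equals the input, which is the lossless round trip described above. The interesting (and risky) part is everything you do to Mid and Side in between: level changes as shown here, or separate EQ and compression on each.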
I haven't listened to the sample, but that leads me to a point that often crosses my mind: an unusual playback image is perfectly OK as long as it isn't overly distracting or disturbing to listen to. We've all become accustomed to a rather rigid concept of the stereo playback imaging ideal over the course of 50 years of recorded stereo music, yet vocals and crowd reaction don't always have to be centered the same way; they really just need to be enjoyable to listen to. We aren't nearly as critical about those things at the actual live event. I think we can easily get too hung up on listening to specific technical aspects of sound when we are turning knobs, and it can be hard to put that out of mind for a moment and listen simply as a 'music enthusiast' on purely enjoyment terms. When I can let myself slip over to 'music listener' mode for a moment, the 'audio engineer' part of my brain recognizes that other things like direct/reverberant ratio and frequency equalization are far more important than playback image. Get those things right and you're most of the way to pleasing the 'music enthusiast'.
I have a few recordings that are oddly balanced, but I actually find them interesting because of that, and attempts to force them towards a more standard/traditional stereo image begin to cause problems with the more important aspects that make them enjoyable.
Oh, and here's a suggestion for addressing the initial root issue of appropriate mic setup, which I find especially useful myself: don't listen to what your eyes tell you when it comes to the final tweak of pointing the mic array. Close them and listen while trying to forget where things are visually, turning your head until the stereo image is centered as you prefer it to be (doing your best to disregard what you know about the actual physical layout of the venue, the placement of musicians, PA speakers and all that), and then point the center of your mic array in the same direction as your head. Sometimes I’m surprised at how differently I point the mics using my ears instead of my eyes.