Yesterday, at my church's Maundy Thursday service, I set up my 170s in 90° X/Y configuration, about 15' behind the director, or 21' from the choir front row. We had the choir, a string quartet, oboe, bassoon, and harpsichord (actually a Clavinova) performing a cantata based on Handel's Messiah; I couldn't monitor since I was singing in the choir. My X/Y setup ends up with about a 1½" vertical spacing between the mics; I might try redoing it to lower the spacing.
Don't worry too much about a vertical spacing of about 1½". Closer is better if you can do it, but in reasonably ambient mic setups such as the one you describe I don't see it causing any serious problems, and certainly not those you describe. 1½" isn't very different from the capsule spacing in some large diaphragm stereo mics, assuming that you mean the capsule centres are 1½" apart, not that there's a 1½" air gap between the two vertically aligned mics. If it's the latter then you should get them closer: as close as possible without them touching. With closer micing techniques and/or smaller ensembles or single instruments/voices, vertical spacing is more of an issue and getting the capsules as physically coincident as possible is much more important, though the same is true of any coincident technique. Closer and smaller makes for reduced margin for error. Usually, moving one source a foot closer to a mic six feet away makes a huge difference compared to moving it a foot closer to a mic 20' away. It's a game of ratios.
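To put rough numbers on the ratio idea, here's a minimal sketch. It assumes an idealised point source in a free field, where direct-sound level follows the inverse-distance law; a real church with reverberation won't behave this cleanly, and the function name is just made up for illustration.

```python
import math

def level_change_db(old_distance_ft, new_distance_ft):
    """Change in direct-sound level (dB) when a source moves from
    old_distance_ft to new_distance_ft from the mic.

    Assumption: ideal point source, free field (inverse-distance law).
    """
    return 20.0 * math.log10(old_distance_ft / new_distance_ft)

# Move a source one foot closer to a mic 6 ft away vs. a mic 20 ft away.
near = level_change_db(6.0, 5.0)
far = level_change_db(20.0, 19.0)

print(f"6 ft -> 5 ft:   {near:+.2f} dB")
print(f"20 ft -> 19 ft: {far:+.2f} dB")
```

Under those assumptions, the same one-foot move changes the level several times more at six feet than at 20'. The same ratio thinking applies to capsule spacing relative to source distance.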
When I got home, and loaded the file into Audacity, I noticed that the instruments were quite clear, while the choir had a strange, distorted sound. Meanwhile, the stereo image left a lot to be desired. However, my purpose in going with X/Y was so that I could convert it to Mid/Side to tinker with the stereo image.
I'm curious what you mean by "the choir had a strange, distorted sound". Without hearing it, it's impossible to be sure but I'm wondering whether it might be a facet of the pair positioning relative to the choir/instruments. At 21' from the front of the choir, I'd guess that unless it's either a large choir or laid out in a very wide arc (which, with rows of 6/7/7, doesn't seem likely), the majority of the singers are being picked up in a relatively narrow portion in the centre of the XY pair acceptance angle. In a 90 degree XY pair of cardioids, the centre of the image is well off axis on both mics, in a region where their frequency responses and polar patterns can be very 'untidy', and I frequently find that the centre stage image produced by such a pair is poor in terms of sounding uneven, too narrow, and 'congested', even distorted. Also I find that it exaggerates the depth and width perspectives between close and far which adds to the perception of distant sources sounding more mono and less clear, when the imaging and clarity on closer sources is proportionally much stronger. This is particularly bad in more reverberant acoustics.
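The geometry can be sketched with a textbook cardioid, sensitivity = 0.5 + 0.5·cos(θ). This is only an idealised model: it shows how far off axis the centre of the image sits and how small the level difference between the two mics is for near-centre sources, but it says nothing about the frequency-dependent 'untidiness' of real capsules off axis. The function names and the example source angles are hypothetical.

```python
import math

def cardioid_gain(off_axis_deg):
    """Sensitivity of an ideal cardioid at a given off-axis angle."""
    return 0.5 + 0.5 * math.cos(math.radians(off_axis_deg))

def level_difference_db(source_angle_deg, mic_half_angle_deg=45.0):
    """Left-minus-right level difference (dB) for a source at
    source_angle_deg (0 = centre, positive = towards the left mic's axis)
    in an XY pair whose mics point at +/- mic_half_angle_deg.
    """
    left = cardioid_gain(source_angle_deg - mic_half_angle_deg)
    right = cardioid_gain(source_angle_deg + mic_half_angle_deg)
    return 20.0 * math.log10(left / right)

# A centre source is 45 degrees off axis on BOTH mics of a 90 degree pair.
centre_gain_db = 20.0 * math.log10(cardioid_gain(45.0))
print(f"Centre source, level on each mic: {centre_gain_db:.2f} dB")

# Hypothetical source angles, just to show how the difference grows.
for angle in (0, 10, 20, 45):
    print(f"Source at {angle:>2} deg: L-R difference "
          f"{level_difference_db(angle):.2f} dB")
```

With a distant ensemble occupying only a small angle either side of centre, the inter-channel differences in this model stay small, which fits the narrow, congested centre image described above.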
Centre stage clarity, a more even left-right image, a smoother/more even transition from front to back, and greater control of perspectives are why I prefer MS for most larger ensemble situations requiring a co-incident pair (though I tend to favour some kind of near-coincident or spaced rig over co-incident and usually prefer omnis where possible for their wider frequency response). Even if the situation calls for a Blumlein pair of fig-8s, I often find that an MS pair of fig-8s produces a more pleasing result, as I prefer the clarity produced by having a mic facing the centre of the image, with the edges getting progressively a little more 'misty', rather than a dollop of varyingly thick fog bang in the centre of the image and comparatively clearer, though often artificially wide, edges.
That said, with the right ensemble playing the right programme in the right acoustic, and the right mic position, a single Blumlein pair can sound absolutely amazing. I just find it rather unforgiving of any compromises. (This after spending three years early in my career working for a small classical label run by an engineer/producer who was a retired maths teacher and adamant that "Mr Blumlein correctly determined the only correct way to record anything is using a single coincident pair of figure of eight microphones at 90 degrees" (in this engineer's case, an AKG stereo mic). Some of his resulting recordings I'd put up there with the very best I've ever heard but many more (imo) are flawed in ways which would've benefitted from some flexibility in terms of mic technique! If only in de-stressing the musicians by saving them from having to spend sometimes hours getting increasingly wound up as ever more implausible setups were tried (some of which were pretty unbelievable and today would be outlawed by health and safety legislation instead of just being counter to common sense!) in the vain pursuit of an unattainable perfection limited by a dogmatic approach.)
I converted the L/R channels to M/S, and was stunned when I boosted the Side track - the stereo image became vastly better, and the distortion went away. I wound up boosting the Side track by +7 dB, which gave me what I felt was the most realistic imaging. The final result was nothing short of fantastic.
When you say "the stereo image became vastly better, and the distortion went away" how did the image get 'better'?
My takeaway on this is that if you're doing X/Y and it doesn't sound right, convert it to Mid/Side and fiddle with it - you might get a big improvement! I'm wondering, though, if the vertical spacing might be introducing a phasing problem, or if the real issue was recording angle. The choir was in three rows, with six in the front row, and seven in the back two rows; sections were split for balance. The instrumentalists were between the choir risers and the director.
Converting XY to MS is a very useful (and often overlooked) tool for fixing stereo image related problems. It can also cause some problems of its own which may or may not be bigger than those which it's employed to fix. For example, with the Side running 7dB higher than the Mid you could improve the centre upstage image but end up with some rather strange things happening in the phase correlation and image stability on downstage sources. That's not to say the result may not sound better, and from your description, in this case, the possible tradeoffs have paid off and you got a worthwhile result.
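For anyone wanting to see what the conversion does to the numbers, here's a minimal sketch of sum/difference M/S processing on plain lists of samples. It assumes the common convention M = (L+R)/2, S = (L-R)/2; editors and plugins may scale these differently, and the function names and sample values are made up for illustration.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def adjust_side_level(left, right, side_gain_db):
    """Encode L/R to Mid/Side, scale the Side signal, decode back to L/R.

    Assumed convention: M = (L + R) / 2, S = (L - R) / 2.
    """
    g = db_to_gain(side_gain_db)
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * (l - r)
        out_left.append(mid + g * side)
        out_right.append(mid - g * side)
    return out_left, out_right

# Two hypothetical samples: one near-centre source, one hard-left source.
left_in = [1.0, 1.0]
right_in = [0.8, 0.0]

left_out, right_out = adjust_side_level(left_in, right_in, 7.0)
for i in range(len(left_in)):
    print(f"in  L={left_in[i]:+.3f} R={right_in[i]:+.3f}   "
          f"out L={left_out[i]:+.3f} R={right_out[i]:+.3f}")
```

In this sketch the near-centre sample simply spreads wider, while the hard-left sample comes out with inverted polarity in the right channel. That is the sort of phase correlation side effect that a large Side boost can produce on sources already well off centre.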