I've been wanting to dive a bit deeper into what is affected by the spacing between mics, and your "wide ORTF + center" question made a good jumping-off point for typing this up earlier this morning.
It's similar to adding some soundboard feed to any taper stereo-pair recording. If including soundboard vocals helps fill the center when using the "wide ORTF" mount, an additional center mic may play much the same role, but ideally the spacing should be increased when introducing that center microphone position into the array itself. That's because adding a center microphone or pair to the array raises issues that mixing in direct soundboard output does not. There is more going on than just reinforcing some sources in the center, which is all that mixing in some soundboard does:
There are at least three things affected by the near-spacing between microphone positions:
1) Constructive/destructive interference
2) Diffuse field correlation/decorrelation
3) Imaging aspects
These things apply to any number of microphones greater than one. Generally, though, the well-known two-channel near-spaced microphone configurations have already been optimized in terms of spacing (the spacings for DIN, NOS, ORTF, etc. were determined long ago), and spacing is of somewhat less significance when the channels aren't going to be mixed together electronically, which includes panning microphone outputs to positions other than hard left/right. With three or more microphone positions, these things become more significant when some of those channels are going to be mixed together, such as a center microphone being mixed into the Left and Right output channels. This is closely related to mono-compatibility, only extended to more than two microphones.
Somewhat more spacing helps with all three of those things.
1) The constructive/destructive interference thing relates the spacing between microphones to wavelength. The phase difference between direct-arriving signals at the microphones shifts with the angle of arrival. That produces comb-filtering to some degree, which differs with source position. Comb-filtering can be very audible, most obviously while actively changing the spacing or effective spacing (or the delay between channels), and especially when the two channels are mixed together. At closer spacings it audibly affects mid or even somewhat higher frequencies, where it tends to be especially noticeable. Wider spacings shift the peaks and valleys downward in frequency. It is very obviously audible when you hear those peaks and valleys shifting around as the spacing is changed, and audible but not nearly as obvious when the spacing remains static; the peaks and valleys are still there though, acting sort of like an EQ with a wavy curve. In my experience, this is something a taper needs to home in on empirically, by trying different spacings and figuring out what sounds right to them in terms of tone and frequency balance. It's certainly the most tweaky aspect, and a relatively small change in spacing will shift the combing significantly, for better or worse. Overall it's probably the most important aspect of the three, yet it is the most difficult to predict beforehand simply by measurement. I know from experience that the spacings I use myself work, but I cannot offer a prediction in this regard for folks using significantly different spacings. The same thing goes on with wider-spaced omnis, except that the greater spacing pushes the peaks and valleys down to lower frequencies.
If you've ever found that you sometimes seem to get less bass rather than more with a pair of spaced omnis, even though the bass sounded right in the room, it is likely that the spacing between the omnis is causing a destructive valley at the frequency where the bass seems weak, related to some lateral or off-axis bass mode. With a different spacing, that weak frequency zone would become neutral or emphasized.
2) Diffuse field correlation/decorrelation is related, but easier to predict because the relationship is more straightforward: it is essentially about whether the phase difference between channels is above or below a certain threshold. We want the direct-arriving sound from the stage and PA to produce a signal relationship between channels that is mostly phase coherent, with clear and predictable phase correlation. This can occur with spaced microphone positions when the line between the mics is perpendicular to the direction the wavefront arrives from, minimizing the difference in distance from the source to either microphone (even though it's these relatively small non-zero differences that contribute to comb-filtering). At the same time, it helps if the indirect-arriving reverberant sound, dominated by the room and audience sound, has a mostly random phase relationship between channels, making it decorrelated. That makes that stuff sound diffuse, open, airy and lush, keeps it from comb-filtering, and perceptually keeps the reverberant room and off-axis audience sound from interfering with the coherent direct sound from the stage and PA. More spacing = more decorrelation for sources that are increasingly off-center with respect to the array. Directional pattern and angle can also achieve this, by having the null of one pattern line up with the on-axis direction of the other, but that means pretty wide angles, which tend to put the mics off-axis from the PA. So in most taping situations, spacing is the better option for achieving good (low) diffuse field correlation.
3) Imaging: this is probably the least important, but the easiest to talk about and predict (along with correlation). It is what the Stereo Zoom and visualization app tools do well. Some of those, the Schoeps Image Assistant in particular, also predict diffuse field correlation, but it's shown on a separate graph from the imaging information, which tends to be the primary focus of the apps. Adding a microphone or coincident pair between an existing stereo pair of microphones will make the stereo recording (pickup) angle wider. That may seem somewhat contradictory to it also solidifying the center, and it is to some extent, but it's more about how wide the recording sounds, sort of "how much is pulled in", rather than how solid the center seems. Another aspect of imaging is how accurate the apparent source positioning is: do sources on stage sound well focused and placed in the same positions they were in live during the performance? This is more of a nice-to-have thing. Only folks who were at the live event and pay attention to this kind of thing will know if the imaging isn't accurate; otherwise it just needs to sound good and plausible, and some recordings are more plausible than others. The previous two aspects are more fundamental and affect enjoyment of a recording far more. They can even be heard and appreciated with one ear or one speaker, while imaging requires two ears and a proper stereo triangle or headphones.
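The first two aspects can at least be roughed out with a little arithmetic. Below is a minimal Python sketch under simplifying assumptions: a distant source (plane-wave arrival), two channels summed at equal level, a fixed speed of sound of 343 m/s, and, for the correlation part, ideal omni pressure mics in a perfectly diffuse field (the classic sin(kd)/(kd) result), which the cardioids in ORTF and most taper arrays are not. The function names are hypothetical, made up for illustration. It shows where the comb-filter notches land for a given spacing and arrival angle, and how the frequency above which two omnis decorrelate moves down as spacing grows.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly air at 20 °C (assumed constant)


def arrival_delay(spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Arrival-time difference (s) between two mics for a distant source.

    angle_deg is measured from the perpendicular to the line joining the mics
    (0 = straight ahead, 90 = fully lateral). Assumes a far-field plane wave.
    """
    return abs(spacing_m * math.sin(math.radians(angle_deg))) / c


def comb_notches(spacing_m, angle_deg, f_max=20000.0, c=SPEED_OF_SOUND):
    """Notch frequencies (Hz) when both channels are summed at equal level.

    x(t) + x(t - tau) cancels wherever the delay is an odd number of half
    periods, i.e. f = (2k + 1) / (2 * tau) for k = 0, 1, 2, ...
    """
    tau = arrival_delay(spacing_m, angle_deg, c)
    if tau == 0.0:
        return []  # no path difference, so nothing to comb
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * tau)
        if f > f_max:
            break
        notches.append(f)
        k += 1
    return notches


def diffuse_coherence(spacing_m, freq_hz, c=SPEED_OF_SOUND):
    """Correlation of two omni pressure mics in an ideal diffuse field.

    Uses the classic sin(kd)/(kd) result; directional mics behave differently.
    """
    kd = 2 * math.pi * freq_hz * spacing_m / c
    return 1.0 if kd == 0 else math.sin(kd) / kd


if __name__ == "__main__":
    # 17 cm (about ORTF spacing) vs a hypothetical wider 30 cm, source 30° off-center
    for d in (0.17, 0.30):
        print(f"{d} m: notches", [round(f) for f in comb_notches(d, 30, f_max=5000)])
        # first frequency where diffuse-field correlation of omnis reaches zero (kd = pi)
        print(f"{d} m: first diffuse-coherence zero at {SPEED_OF_SOUND / (2 * d):.0f} Hz")
    # widely spaced omnis with a fully lateral arrival: where a bass hole could land
    print("1.0 m lateral:", comb_notches(1.0, 90, f_max=500))
```

Run as-is, the 17 cm pair shows its first summing notch near 2 kHz and the 30 cm pair near 1.1 kHz (with a second around 3.4 kHz), while a 1 m omni pair with a fully lateral arrival gets a notch near 170 Hz, roughly the kind of bass hole described for spaced omnis. The diffuse-coherence zero likewise moves down (about 1 kHz for 17 cm, about 570 Hz for 30 cm), so decorrelation sets in at lower frequencies as spacing grows. Real rooms, directional patterns, and unequal mix levels make the notches shallower and less regular, which is why trying spacings by ear still matters.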