In any "coincident" miking arrangement, the transit time of direct sound to the microphones, on average (based on the physical center of where the direct sound is coming from), should be identical for both/all channels. Overall, localization cues should be based only on the directional pattern of the microphones; their positioning in relation to the sound sources should be neutral. For sounds coming from center, you want to avoid any "precedence effect" (the tendency of the brain to localize according to the first-arriving wavefront--even if the same sound immediately follows in the other channel and is louder). Nor do you want any differences in level to occur due to one mike being closer to the sound source than the other.
The "zenith" orientation of the arrangement as a whole is part of this equation. Elevation and tilt of the array are factors to consider, in other words--particularly if you're doing the crazy thing that this thread is talking about, where a pair of microphones (one directly above the other) is separated vertically. However far apart your "coincident" capsules are, imagine a straight line from the center of one capsule's membrane to the center of the other one. Then choose a point in the direct sound source area that you consider to be its center, and imagine a straight line from there to the midpoint of the line between your capsules. The two lines should always be perpendicular, or else the "coincident" pair should be tilted as a pair to make the lines perpendicular.
As with anything else that's wavelength dependent, higher frequencies are affected more than lower ones by imprecisions in physical setup. But "high-ish" (upper midrange) frequencies are the most important ones for stereo localization, so it's really something to consider.
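To see why the same small setup error matters more as frequency rises, here is a quick sketch that converts a path-length difference into a phase offset between the channels. The 1 cm error, the function name and the 343 m/s speed of sound are assumptions for illustration only.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def phase_offset_degrees(path_difference_m, frequency_hz):
    """Phase offset between channels caused by a path-length difference.

    One full wavelength of extra path corresponds to 360 degrees.
    """
    wavelength = SPEED_OF_SOUND / frequency_hz
    return 360.0 * path_difference_m / wavelength


# Hypothetical 1 cm mismatch in distance to the two capsules.
for freq in (100, 1000, 5000, 10000):
    print(f"{freq:>6} Hz: {phase_offset_degrees(0.01, freq):6.1f} degrees")
```

For that 1 cm mismatch, the offset is only about 1 degree at 100 Hz but over 50 degrees at 5 kHz.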
--best regards