Kuba, congratulations on what you're doing. I hope it continues to be interesting and productive for you.
-- Your questions about equivalences between time differences (delay) and level differences can be answered clearly only for abstract, simplified cases such as a single pure test tone at, say, 400 Hz. With actual program material, the correspondence is much harder to estimate because it depends so much on the content of the material. For example, a close-miked recording of someone playing cymbals (with very strong upper-midrange content and lots of transient energy) would be affected more obviously by timing and level shifts than a semi-distant recording of a men's chorus singing a series of sustained "oooh" and "aaah" sounds in a reverberant space. So, while there is at least one "rule of thumb"-type formula that lets you convert a time shift into a roughly equivalent level shift and vice versa, it is only approximate.
-- Your observations about polarity inversion are spot on. In fact, those tests should be done whenever any stereo playback system is set up. You'd be amazed at the number of people, including some audiophiles, who have one speaker wired in opposite polarity to the other one and go for years that way, even though it makes such a big difference in the effect of many recordings.
Note that with many spaced-omni recordings, however, a listener can't tell when one speaker is miswired relative to the other one. Flipping the wiring alters the stereo effect, yes--but usually, neither version sounds clearly right or wrong; they're often just two more or less equally acceptable, alternative renderings of the recording. That's really something to think about--what does it tell us about spaced omnis as a recording method in general?
-- One more concept, if I can sneak it in at this point: as you spread two microphones farther and farther apart, what happens is more subtle and complex than a simple polarity change. If you imagine a single-point sound source in front of two microphones that are equally distant from it, then the sound from that source will reach both microphones simultaneously regardless of how far apart they are. So there's no conflict and no cancellation (for the direct-arriving sound, anyway). However, if you spread the mikes apart and move the sound source off the center line between them, then the direct sound from that source will reach one microphone before the other, by anything from a fraction of a millisecond to several milliseconds depending on the spacing and the source position. As you already know, the earlier arrival in one channel of the recording gives the listener a cue to the position of the sound source relative to the microphones.
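If you'd like to put numbers on that geometry, here's a minimal Python sketch. It assumes a speed of sound of 343 m/s, two mikes on a straight left-right line, and a point source; the name arrival_times is just something made up for illustration, not from any library.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C

def arrival_times(source_xy, mic_spacing_m, c=SPEED_OF_SOUND_M_S):
    """Direct-sound arrival times (seconds) at two mics on the x axis.

    The mics sit at (-d/2, 0) and (+d/2, 0); source_xy = (x, y) in metres,
    with y > 0 meaning "in front". Returns (t_left, t_right, dt) where
    dt = t_left - t_right, so dt > 0 means the right mic hears it first.
    """
    half = mic_spacing_m / 2.0
    x, y = source_xy
    t_left = math.hypot(x + half, y) / c
    t_right = math.hypot(x - half, y) / c
    return t_left, t_right, t_left - t_right

# A source on the center line arrives simultaneously at any spacing...
for spacing in (0.3, 1.0, 3.0):
    print(spacing, "m apart, centered source: dt =", arrival_times((0.0, 3.0), spacing)[2])

# ...but 1 m off-center and 3 m out, with the mics 0.6 m apart, it does not.
dt_ms = arrival_times((1.0, 3.0), 0.6)[2] * 1000
print(f"off-center source: right mic leads by {dt_ms:.2f} ms")
```

With those (hypothetical) positions the right mike leads by roughly 0.55 ms.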
But it isn't exactly polarity that we're talking about here; the signal received by the farther microphone isn't necessarily the inverse of the signal received by the closer one. In fact, the odds are very strong that it won't be. Instead, the difference will be a matter of degree in a continuous parameter known as "phase." Polarity is really a special case of phase: for any single pure tone, flipping polarity is the same as shifting its phase by exactly 180 degrees.
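Here's a small Python check of that distinction, using a hypothetical two-tone test signal (200 Hz plus a weaker 700 Hz tone) sampled at an assumed 48 kHz. A delay that happens to be exactly a polarity flip for one component is not one for the other.

```python
import math

def sine(freq_hz, t, phase_rad=0.0):
    """One pure tone (sinusoid) evaluated at time t seconds."""
    return math.sin(2 * math.pi * freq_hz * t + phase_rad)

def mixture(t, delay_s=0.0):
    """Hypothetical test signal: 200 Hz plus a weaker 700 Hz tone, delayed by delay_s."""
    return sine(200, t - delay_s) + 0.5 * sine(700, t - delay_s)

times = [n / 48000 for n in range(480)]  # 10 ms at an assumed 48 kHz sample rate

# 1) For a single tone, flipping polarity is identical to a 180-degree phase shift.
single_tone_match = all(
    math.isclose(-sine(200, t), sine(200, t, math.pi), abs_tol=1e-9) for t in times
)

# 2) A 2.5 ms delay is half a period at 200 Hz (a 180-degree shift there),
#    but 1.75 periods at 700 Hz, so the delayed mixture is NOT the inverted mixture.
max_diff = max(abs(mixture(t, 0.0025) - (-mixture(t))) for t in times)

print(single_tone_match, round(max_diff, 3))
```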
This is based on the understanding (the foundation of Fourier analysis) that any sound in the real world could, with enough time and effort, be analyzed as some mixture of pure tones of various frequencies and intensities. You might need to postulate hundreds or even thousands of such pure tones to account for a single second of a car crash or a rocket launch or a baby's cry--but the theory says that the analysis can be made as complete and precise as you wish. Add all the (correctly prescribed) pure tones back together and the result would be as close to the original as you like, with time and effort being the determining variables.
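For the curious, here's a toy version of that analysis-and-resynthesis idea in Python, using a naive discrete Fourier transform (DFT). It's only a sketch: the two test tones are chosen to land exactly on the DFT's frequency bins, so the recovery is exact, whereas a real recording would spread its energy across many bins. synthesize and analyze are illustrative names of my own, not a library API.

```python
import cmath
import math

def synthesize(components, n_samples, sample_rate):
    """Add up pure tones given as (freq_hz, amplitude, phase_rad) tuples (cosine convention)."""
    return [
        sum(a * math.cos(2 * math.pi * f * n / sample_rate + p) for f, a, p in components)
        for n in range(n_samples)
    ]

def analyze(signal, sample_rate):
    """Naive DFT over bins 0..N//2, returned as (freq_hz, amplitude, phase_rad).

    Feeding the full list back into synthesize() reproduces the signal
    (up to rounding error).
    """
    n = len(signal)
    bins = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        scale = 1.0 / n if (k == 0 or 2 * k == n) else 2.0 / n  # DC and Nyquist are not doubled
        bins.append((k * sample_rate / n, abs(acc) * scale, cmath.phase(acc)))
    return bins

# Hypothetical "sound": two tones chosen to fall exactly on DFT bins.
rate, length = 1000, 100
tones = [(100.0, 1.0, 0.0), (300.0, 0.5, math.pi / 3)]
sound = synthesize(tones, length, rate)
spectrum = analyze(sound, rate)
rebuilt = synthesize(spectrum, length, rate)
print(max(abs(a - b) for a, b in zip(sound, rebuilt)))  # should be at rounding-error level
```

The analysis hands back each tone's amplitude and phase, which is exactly the extra parameter discussed next.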
But for each of those pure tones (sinusoids), there is one further parameter beyond its amplitude and frequency, and that is its phase. At your chosen time of reference (the onset of the sound, for example, or any particular moment you may choose before or after that), where is that sinusoid within its own regular cycle of oscillation? Is it just beginning to rise from 0 amplitude toward its positive peak, or is it already moving downward from that peak? Has it crossed the 0 line going downward, heading toward its negative peak, or is it coming back up toward the 0 line and the next positive peak? Those questions merely break the cycle of that one pure tone's oscillation down into "quadrants" (as in basic analytic geometry), while in fact phase is an "analog," continuous-value parameter. The phase of the sinusoid at that moment can have any value whatsoever between 0 and 360 degrees (or between 0 and 2π radians, equivalently -π to π, if you're schooled that way).
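To make the "where is it in its cycle" question concrete, here's a small Python sketch that reports the phase of a sine wave at a chosen reference moment and names its quadrant. It assumes a plain sine wave starting at whatever phase you specify; phase_at and describe are hypothetical names.

```python
def phase_at(freq_hz, t_s, start_phase_deg=0.0):
    """Phase, in degrees wrapped to [0, 360), of sin(2*pi*f*t + start_phase) at time t_s."""
    return (360.0 * freq_hz * t_s + start_phase_deg) % 360.0

def describe(phase_deg):
    """Name the 'quadrant' of one sine cycle, in the terms used above."""
    if phase_deg < 90.0:
        return "rising from 0 toward the positive peak"
    if phase_deg < 180.0:
        return "falling from the positive peak toward 0"
    if phase_deg < 270.0:
        return "falling from 0 toward the negative peak"
    return "rising from the negative peak back toward 0"

# A 400 Hz tone that started at 0 degrees, inspected at a few reference moments.
for t_ms in (0.25, 1.0, 1.5, 2.0):
    ph = phase_at(400.0, t_ms / 1000.0)
    print(f"t = {t_ms} ms: {ph:.1f} deg, {describe(ph)}")
```

The quadrant labels are only a coarse summary; the number returned by phase_at is the continuous value that actually matters.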
So what is really affected by moving the source relative to the microphones, or the microphones relative to the source, is the phase of each of those signal components (pure tones) relative to each other, and relative to their counterpart(s) in the other channel of the recording. A given difference in arrival time amounts to a larger fraction of a cycle at higher frequencies, so the phase offset between the two channels is different for every component. When you want to gauge the effect of combining signals in various ways, then, in most cases you have to deal with that level of complexity (or at least a statistical summary of it) rather than the more black-and-white-seeming issue of polarity.
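In numbers: a pure delay of Δt shifts a component at frequency f by 360·f·Δt degrees, whereas a polarity flip shifts every component by the same 180 degrees. A quick Python sketch, assuming a hypothetical 0.5 ms arrival difference between the channels:

```python
def delay_phase_shift_deg(freq_hz, delay_s):
    """Phase lag (degrees, wrapped to [0, 360)) that a pure time delay imposes on one tone."""
    return (360.0 * freq_hz * delay_s) % 360.0

POLARITY_FLIP_DEG = 180.0  # a polarity flip shifts every component by the same 180 degrees

delay = 0.0005  # hypothetical 0.5 ms inter-channel arrival difference
for f in (100.0, 500.0, 1000.0, 1500.0, 3000.0):
    print(f"{f:6.0f} Hz: delay -> {delay_phase_shift_deg(f, delay):5.1f} deg, "
          f"polarity flip -> {POLARITY_FLIP_DEG:.0f} deg")
```

With that delay, 1 kHz and 3 kHz happen to land on 180 degrees (a local "polarity flip"), 500 Hz and 1.5 kHz sit a quarter-cycle off in opposite directions, and 100 Hz is shifted only 18 degrees.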
-- All that being said, you are still exactly right when you say that the farther apart the two main microphones are, the farther into the low frequencies the discrepancies between them will extend. It's just that those discrepancies are a matter of degree (phase) rather than being absolute (polarity). And the corollary is that at higher frequencies, the farther apart the microphones are, the more the phase relationships tend to become random / diffuse / uncorrelated.
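Two simple figures capture the spacing trend. One is the frequency at which the largest possible arrival difference (source fully to one side, delay d/c) equals half a period: c/(2d). The other is the textbook correlation between two omni pressure pickups in an idealized, perfectly diffuse sound field, sin(kd)/(kd) with k = 2πf/c, which falls toward zero (oscillating around it) as frequency or spacing rises. A Python sketch, assuming c = 343 m/s; real rooms are not perfectly diffuse, so treat these as rough guides, and the function names are hypothetical.

```python
import math

C = 343.0  # assumed speed of sound, m/s

def half_wave_freq(spacing_m, c=C):
    """Frequency at which the largest possible inter-mic delay (d/c, source off
    to one side) equals half a period, i.e. a 180-degree phase difference."""
    return c / (2.0 * spacing_m)

def diffuse_coherence(freq_hz, spacing_m, c=C):
    """Idealized diffuse-field correlation between two omni pressure mics: sin(kd)/(kd)."""
    kd = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if kd == 0 else math.sin(kd) / kd

for d in (0.3, 1.0, 3.0):
    print(f"{d} m apart: 180 deg at about {half_wave_freq(d):.0f} Hz; "
          f"diffuse correlation at 1 kHz = {diffuse_coherence(1000.0, d):+.3f}")
```

Both figures move downward in frequency as the spacing grows, which is the trend described above.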
The crux seems to be that those discrepancies can be delicious at low frequencies while being confusing and even nasty-sounding at midrange and upper-midrange frequencies. This is why some people (I don't know whether there are any on this board, but there are some in Europe that I've heard of) record with coincident or closely spaced pairs of directional microphones placed at the center, while mixing in the signals from a spaced pair of omnis, with "crossover" filtering so that the overall pickup comes increasingly from the spaced pair the farther down in frequency you go.
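Here's a minimal sketch of one way to do that frequency split for a single channel, assuming a simple first-order (6 dB/octave) complementary crossover. This is not a description of the filters those engineers actually use; one_pole_lowpass and blend_channel are hypothetical names.

```python
import math

def one_pole_lowpass(x, cutoff_hz, sample_rate):
    """First-order (6 dB/octave) low-pass filter; a deliberately simple choice."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y

def blend_channel(coincident, spaced, cutoff_hz, sample_rate):
    """Lows from the spaced omni, everything above from the coincident pair (one channel).

    The high-pass path is formed as input minus low-pass, so if both inputs
    were identical the two paths would sum back to the original signal.
    """
    low_spaced = one_pole_lowpass(spaced, cutoff_hz, sample_rate)
    low_coinc = one_pole_lowpass(coincident, cutoff_hz, sample_rate)
    return [ls + (c - lc) for ls, c, lc in zip(low_spaced, coincident, low_coinc)]

# Hypothetical usage for the left channel, 200 Hz crossover at 48 kHz:
# left_out = blend_channel(left_coincident, left_spaced, 200.0, 48000)
```

The complementary "input minus low-pass" construction is chosen because it sums flat; steeper crossovers (Linkwitz-Riley, for instance) are common alternatives when a sharper handover is wanted.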
--best regards