This is a fascinating thread. I'd like to add a few notes, if you don't mind - both on the theory behind correlation, and on how that translates in practice to the end product on tape.
A quick preface: I did an undergraduate degree in electrical engineering, specializing in signal processing and communications. I also did a master's and am currently not far off from defending my dissertation in acoustics. I have studied room acoustics, binaural perception, and - notably, for this thread - digital room modeling using filter design.
There are three forms of correlation that should be under consideration. The most generic form, and the one usually meant by the term "correlation," is the correlation function: a sort of running window whose output is a function of the time shift. If we have two stationary signals x(t) and y(t), their correlation is computed by taking one signal, sliding it from negative infinity to infinity along a separate dummy-time axis (often notated as tau), and for each shift tau computing the area of overlap between the two signals (the integral of their product), then dividing by the square root of the product of the two signals' energies so that the result lies between -1 and +1. To illustrate: if the two signals overlap perfectly, their correlation is 1. If they have zero overlap, their correlation is zero. And if they are identical but polarity-inverted (perfectly out of phase with one another), their correlation is -1. It's worth noting that each of these values belongs to one particular shift; the value changes as we slide one function over the other.
The graphic below actually shows convolution (which is what happens when you filter a signal, for example) rather than correlation; the two are the same sliding operation, except that convolution time-reverses one of the signals first. Even so, I always liked using it when introducing the concept of correlation/convolution:
https://en.wikipedia.org/wiki/Convolution#/media/File:Convolution_of_spiky_function_with_box2.gif. This is probably a more concrete explanation of what I mean by the sliding-tau axis.
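For anyone who would rather see the sliding-tau idea as code, here is a minimal sketch in Python with NumPy. It assumes finite, sampled signals rather than the infinite continuous ones in the theory, and the function name normalized_xcorr and the test signals are just my own illustration, not anything standard.

```python
import numpy as np

def normalized_xcorr(x, y):
    """Return (lags, r): the normalized cross-correlation r at each integer lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Overlap (sum of products) for every possible shift of one signal over the other.
    raw = np.correlate(x, y, mode="full")
    # Normalize by the square root of the product of the two energies,
    # which bounds the result to the range [-1, +1].
    norm = np.sqrt(np.sum(x**2) * np.sum(y**2))
    # In "full" mode the first output sample corresponds to lag -(len(y) - 1).
    lags = np.arange(-(len(y) - 1), len(x))
    return lags, raw / norm

# Illustrative signals (assumptions for the demo): a 5 Hz sine sampled at 1 kHz.
t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 5 * t)

lags, r_same = normalized_xcorr(x, x)    # signal against itself
_, r_flip = normalized_xcorr(x, -x)      # signal against its inverted copy
zero = np.flatnonzero(lags == 0)[0]      # index of the tau = 0 sample

# Two pulses that never overlap in time: no overlap at tau = 0.
a = np.zeros(100); a[:10] = 1.0
b = np.zeros(100); b[-10:] = 1.0
lags_ab, r_ab = normalized_xcorr(a, b)
zero_ab = np.flatnonzero(lags_ab == 0)[0]

print(r_same[zero], r_flip[zero], r_ab[zero_ab])
```

The three printed values are the tau = 0 readings for the overlapped, inverted, and non-overlapping cases described above. The rest of each array is what you get as the signals slide past one another.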
The second form of correlation is more concretely known as the correlation coefficient. This is a single number, and is more likely what you are used to seeing on audio gear (i.e. a meter of some sort that slides between -1 and +1 over time). This is simply the maximum value of the above correlation function. Note that it is not an all-inclusive metric that tells the whole story; however, in practice in the field, more often than not the correlation coefficient is still useful when you don't need all of the details given by the full correlation function (for example, lots of radar operates on the correlation coefficient of bit sequences to perform signal extraction in noisy environments).
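As a rough sketch of reducing the full function to one number, the Python/NumPy snippet below takes the peak of the normalized correlation function and also reports the lag where it occurs. Two assumptions of mine: I pick the value with the largest magnitude (so an inverted copy reads close to -1 rather than some small positive number), and the function name, noise signal, and 20-sample delay are purely illustrative.

```python
import numpy as np

def correlation_coefficient(x, y):
    """Return (coefficient, lag) taken at the peak of the normalized cross-correlation.

    A positive lag means y is delayed relative to x by that many samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.correlate(y, x, mode="full") / np.sqrt(np.sum(x**2) * np.sum(y**2))
    lags = np.arange(-(len(x) - 1), len(y))
    # Assumption: use the largest-magnitude value so the sign is preserved.
    k = np.argmax(np.abs(r))
    return r[k], lags[k]

# Illustrative test signal: seeded pseudo-random noise and a delayed copy of it.
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
delay = 20
y = np.concatenate([np.zeros(delay), x[:-delay]])

coeff, lag = correlation_coefficient(x, y)           # delayed copy
coeff_inv, lag_inv = correlation_coefficient(x, -y)  # delayed and inverted copy
print(coeff, lag, coeff_inv, lag_inv)
```

The single number says the two signals are nearly identical once aligned, but it hides everything else the full function would show, which is the "doesn't tell the whole story" caveat.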
And lastly, a bit more complex... is the interaural correlation (IAC) and its corresponding coefficient (IACC). The IAC function is similar to the above correlation, however it is usually taken on binaural signals and the sliding window goes only from -80 ms to +80 ms. This shortened window correlates with the latest early reflections in most traditional concert halls, and as such it's used to judge music. For the world of rock music and live rock taping, it's mostly unimportant as the metric is focused almost solely on the perception of unamplified music. I'm mostly going to ignore this one, as it is less useful in demonstrating how DFC is useful to us.
There are two ways of correlating signals that give us useful information: cross correlation and autocorrelation. Cross correlation is when you perform the correlation calculation on two different signals (for example, the left and right channels of a stereo signal). Autocorrelation is when you correlate a signal with itself. Let's take a look at some simple examples to help illustrate the point:
- The autocorrelation function of an additive white Gaussian noise (AWGN) source is zero for all points, except at tau=0, where the correlation is one. This one's important; I'll talk a bit more about it in a second.
- A sine wave correlated with itself is a cosine wave (at tau = zero shift, the signal is perfectly correlated to itself; at 90 degrees out of phase, it's 0-correlated; at 180 degrees out of phase... you get the idea).
- A true AWGN source correlated with any other signal that is independent of it is theoretically zero at every tau; only when correlated with itself does it reach 1, and then only at tau = 0. In practice we measure some correlation because no noise source is truly AWGN (as one of my professors always noted in class, "I can't wait a thousand years for you to generate true white noise!"). What the theory means is that for a true white noise source, sliding it over itself, there will be zero overlap (not even out of phase) at all points in time, except for when the signal is literally copied right on top of itself.
- Any periodic signal with complex harmonics will have spikes in its autocorrelation at lags equal to multiples of its fundamental period. Likewise, cross-correlating two seemingly uncorrelated signals can reveal a common fundamental; this is particularly useful for things such as modulation, for estimating the timing of reflections, or for estimating the fundamental pitch of recorded sound (this is how "de-glitched" digital pitch shifters work).
- The correlation of a left/right stereo signal will change depending on how "direct" or "diffuse" (more on these terms later, too) the recorded sound is. For more direct sound, the correlation peak will be tighter and taller (closer to a Dirac delta, or a Gaussian with very low variance), like a sharp mountain with high prominence; for more diffuse sound, it will look closer to a small hill, with a lower peak and the tails on either side being more spread out.
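A few of the examples above are easy to check numerically. The Python/NumPy sketch below looks at the autocorrelation of (pseudo-random, finite-length) white noise, of a sine wave, and of a harmonic tone whose fundamental is then estimated from the autocorrelation peak. The sample rate, frequencies, search range, and the autocorr helper are all assumptions I picked for illustration, and finite pseudo-random noise only approximates the theoretical result.

```python
import numpy as np

def autocorr(x):
    """Normalized autocorrelation for non-negative lags (value at lag 0 is 1)."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

fs = 8000                      # assumed sample rate in Hz
n = np.arange(4096)
rng = np.random.default_rng(0)

# 1) White noise: 1 at tau = 0, only small residual values everywhere else.
r_noise = autocorr(rng.standard_normal(n.size))
noise_residual = np.max(np.abs(r_noise[1:]))

# 2) Sine wave: the autocorrelation follows a cosine in tau.
f0 = 200.0                     # one period = fs / f0 = 40 samples
r_sine = autocorr(np.sin(2 * np.pi * f0 * n / fs))
quarter, half = 10, 20         # lags for 90 and 180 degrees of shift

# 3) Harmonic tone: the first strong peak away from tau = 0 sits at the
#    fundamental period, which gives a simple pitch estimate.
tone = (np.sin(2 * np.pi * f0 * n / fs)
        + 0.5 * np.sin(2 * np.pi * 2 * f0 * n / fs)
        + 0.25 * np.sin(2 * np.pi * 3 * f0 * n / fs))
r_tone = autocorr(tone)
lo, hi = int(fs / 500), int(fs / 50)   # only search pitches from 50 to 500 Hz
period = lo + int(np.argmax(r_tone[lo:hi]))
f0_est = fs / period

print(noise_residual, r_sine[quarter], r_sine[half], period, f0_est)
```

Real pitch detectors add a lot on top of this (windowing, interpolation between lags, octave-error handling), so treat it as a demonstration of the principle only.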
Another quick note I want to make, regarding diffusion and room reflections. There are two major types of reflections in a room: specular and diffuse. Specular reflections are those that follow the law of reflection (angle of incidence = angle of reflection) when a wave is incident on a boundary. This will usually happen if the incident sound has lots of energy and the boundary material is hard and smooth. Diffuse reflections scatter according to a cosine law (in the ideal case Lambert's cosine law, where the scattered intensity is proportional to the cosine of the angle from the surface normal), and happen when sound has less energy and/or the surface is very rough and/or soft and/or porous. If you've ever seen a time plot of a room impulse response, you can think of the early reflections as specular and the late reflections as diffuse. A sound field is considered diffuse when, after excitation, you are equally likely to find energy at any point in time and space across the room. Both contribute to our perception of an auditory space: early reflections give us a sense of the geometry of the room - the timing of reflections is a direct measure of the mean free path (or, for simplicity's sake though this isn't precise, the "average shortest path") that sound has to travel from a source, off of one or two walls, and eventually to an ear; this gives us auditory information about how far away side walls and ceilings are from us. Diffuse reflections help to provide a sense of envelopment and arguably the size of a room. I should note, everything contained in this paragraph is VERY rough, up for debate in academic and sound engineering communities, and is not necessarily the word of law - rather, it's my understanding of the terms in academics and in practice.
Yet another feature making this all the more complicated is binaural perception. In hearing science, we VERY carefully measure human responses to stimuli with isolated reflections. This has led to the theory of the Precedence Effect, summing localization, echo thresholds, and so on. In short, and taken with a HEAVY grain of salt, we SOMETIMES fuse the information from a reflection with its direct sound if we receive that reflected information within a certain time window; this window changes depending on the source (speech, opera, classical music, rock music, etc. all have different thresholds). Outside of that window, we perceive the reflected sound as separable information and must process the direct and reflected sound separately. This only holds true for specular reflections; diffuse reflections are interpreted as confusing information and will lower the overall correlation. It's why I have a slight problem with your one-ear vs. both-ears headphone experiment above (though that by NO means invalidates most of your other points, many of which I agree with). There's really WAY too much literature out there regarding binaural perception of reflections and the precedence effect, so I'm mostly going to leave that on the table, and if anyone has questions or wants to discuss further I'm happy to play ball.
I'm probably going to stop rambling here, as I was gearing up for a discussion about narrowband ITDs and ILDs yet recordings are broadband... it's too much to type out in addition to everything else I've mentioned above - maybe another day.
But the major point I want to add to this thread is how it all relates to taping. Basically, from what I've seen and done on the live tapes I've worked on, for live rock music taped from behind the board it's best to point the microphones at the stacks. Yes, this reduces DFC, but mostly because you're maximizing direct sound and rejecting ALL reflections, not just diffuse sound. The correlation goes up because you are cutting down not just the diffuse field but also the early reflections.