Gear / Technical Help > Microphones & Setup

Examples of Diffuse Field Correlation (DFC) - hear what it's about


Gutbucket:
Link to some more recent discussion on Correlation / Decorrelation starting at this post in the Oddball Microphone Techniques thread- https://taperssection.com/index.php?topic=96009.msg2244172#msg2244172, including a link to a technical paper on the topic.

It's somewhat related to direct/diffuse balance, yet different- more about how the direct and diffuse components of a recording are perceived.  I now realize that as recordists we have more direct control over these aspects than I realized in the past, via microphone technique and post-recording mixing methods.

wforwumbo:
This is a fascinating thread. I'd like to add a few notes, if you don't mind - both in the theory behind correlation, and in practice for how that translates to the end product on tape.

A quick preface: I did an undergraduate degree in electrical engineering, specializing in signal processing and communications. I also did a masters and am currently not far off from defending my dissertation in acoustics. I have studied room acoustics, binaural perception, and - notably, for this thread - digital room modeling using filter design.

There are three forms of correlation that should be under consideration. The most generic form, and the one usually meant when people say "correlation," is a sort of running window whose output is a function of lag. If we have two stationary signals x(t) and y(t), their correlation is computed by taking one signal, sliding it from negative infinity to infinity along a dummy lag axis (often notated as tau), and, for each value of tau, computing the area of overlap between the two signals and dividing by the square root of the product of the two signals' energies. To put numbers on it: if the two signals are perfectly overlapped, their correlation is 1; if they have zero overlap, their correlation is zero; and if they are perfectly out of phase with one another, their correlation is -1. It's worth noting that this is a function, not a single number - a value is produced at every lag as one signal slides over the other.
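The slide-and-overlap computation described above can be sketched in a few lines of Python/NumPy. This is a minimal illustration under my own assumptions - the test signals (a 5 Hz sine and its inverted copy) and the function name are mine, not from the post:

```python
# Minimal sketch of the normalized correlation function described above:
# slide y across x along a lag (tau) axis, measure the overlap at each lag,
# and normalize by the square root of the product of the signals' energies.
import numpy as np

def normalized_correlation(x, y):
    """Correlation of x and y as a function of lag, scaled so that
    perfect overlap -> +1, perfect anti-phase -> -1, no overlap -> 0."""
    x = x - np.mean(x)
    y = y - np.mean(y)
    full = np.correlate(x, y, mode="full")        # overlap at every lag
    norm = np.sqrt(np.sum(x**2) * np.sum(y**2))   # energy normalization
    return full / norm

t = np.linspace(0, 1, 1000, endpoint=False)
s = np.sin(2 * np.pi * 5 * t)                     # illustrative test signal

r_same = normalized_correlation(s, s)             # signal against itself
r_anti = normalized_correlation(s, -s)            # perfectly out of phase

print(round(float(np.max(r_same)), 3))            # peak of +1.0 at tau = 0
print(round(float(np.min(r_anti)), 3))            # dip of -1.0 at tau = 0
```

Note that the output is a whole array over lags, one value per tau, which is exactly the "function of time" distinction drawn above.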

The animation I like to use when introducing this concept actually shows convolution (which is what happens when you filter a signal) rather than correlation - convolution is just correlation with one of the signals time-reversed, so the sliding-overlap picture is the same: https://en.wikipedia.org/wiki/Convolution#/media/File:Convolution_of_spiky_function_with_box2.gif. This is probably a more concrete illustration of what I mean by the sliding tau axis.

The second form of correlation is more concretely known as the correlation coefficient. This is a single number, and is more likely what you are used to seeing on audio gear (i.e. a meter of some sort that slides between -1 and +1 over time). It is simply the maximum value of the correlation function above. Note that it is not an all-inclusive metric that tells the whole story; however, in practice in the field, more often than not the correlation coefficient is still useful when you don't need all of the detail given by the full correlation function (for example, lots of radar operates on the correlation coefficient of bit sequences to perform signal extraction in noisy environments).
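The function-versus-coefficient distinction can be shown directly: take the full correlation function from before and collapse it to its peak. A small sketch, with made-up signals (a noise burst and a time-shifted copy of it):

```python
# The correlation coefficient as a single-number summary: the peak of the
# normalized correlation function. Signals here are illustrative only.
import numpy as np

def correlation_coefficient(x, y):
    x = x - np.mean(x)
    y = y - np.mean(y)
    r = np.correlate(x, y, mode="full")
    r = r / np.sqrt(np.sum(x**2) * np.sum(y**2))
    return float(np.max(r))                  # one number, not a function

rng = np.random.default_rng(0)
n = rng.standard_normal(20000)               # noise-like stand-in signal
delayed = np.concatenate([np.zeros(500), n[:-500]])   # same signal, shifted

c = correlation_coefficient(n, delayed)
print(c > 0.9)                               # True: peak finds the alignment
```

Because the peak is taken over all lags, the coefficient stays high even though the two signals are misaligned in time - which is also why it hides detail the full function would show.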

And lastly, a bit more complex... is the interaural correlation (IAC) and its corresponding coefficient (IACC). The IAC function is similar to the above correlation, but it is taken on binaural signals over a restricted window: the lag is limited to roughly the interaural range (about +/-1 ms), and the integration typically covers only the first ~80 ms after the direct sound. That 80 ms figure corresponds to the latest early reflections in most traditional concert halls, which is why the metric is used to judge halls for music. For the world of rock music and live rock taping it's mostly unimportant, as the metric is focused almost solely on the perception of unamplified music. I'm mostly going to ignore this one, as it is less useful in demonstrating how DFC matters to us.

There are two forms of correlating one or a group of signals that give us useful information: cross correlation, and autocorrelation. Cross correlation of two signals is when you perform the correlation calculation on two signals that are unique (for example, a stereo signal). Autocorrelation is when you correlate a signal to itself. Let's take a look at some simple examples to help illustrate the point:


* The autocorrelation function of an additive white gaussian noise (AWGN) source is zero for all points, except at tau=0, where the correlation is one. This one's important, I'll talk a bit more about this in a second.
* A sine wave correlated with itself is a cosine wave (at tau = zero shift, the signal is perfectly correlated to itself; at 90 degrees out of phase, it's 0-correlated; at 180 degrees out of phase... you get the idea).
* A true AWGN source cross-correlated with any unrelated signal is theoretically zero at every lag; correlated with itself, it is zero everywhere except at tau = 0, where the correlation is 1. In practice we measure some residual correlation because no real noise source is truly AWGN (as one of my professors always noted in class, "I can't wait a thousand years for you to generate true white noise!"). What the theory means is that for a true white noise source slid over a copy of itself, there is zero net overlap (not even out of phase) at every lag except when the signal sits exactly on top of itself.
* Any periodic signal with complex harmonics will have spikes in its autocorrelation at lags that are multiples of its fundamental period. Likewise, cross-correlating two seemingly unrelated signals can reveal a common fundamental; this is particularly useful for things such as modulation, or for example estimating the timing of reflections, or estimating the fundamental pitch of recorded sound (this is how "de-glitched" digital pitch shifters work).
* The correlation of a left/right stereo signal will change depending on how "direct" or "diffuse" (more on these terms later, too) the recorded sound is. For more direct sound, the correlation peak will be tighter and more sharply peaked (closer to a Dirac delta, or a Gaussian with very low variance), like a sharp mountain with high prominence; for more diffuse sound, it will look closer to a small hill, with a lower peak and the tails on either side more spread out.
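Two of the bullets above (the noise-autocorrelation spike and the periodic-signal peaks) are easy to verify numerically. A toy sketch, with all signal parameters (sample rate, 100 Hz fundamental, noise length) chosen by me for illustration:

```python
# Toy check of two bullets above: white noise autocorrelates to (nearly)
# a single spike at tau = 0, while a harmonically rich periodic signal
# shows strong repeated peaks spaced one fundamental period apart.
import numpy as np

def autocorr(x):
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")
    return r / r[len(x) - 1]                 # normalize so tau = 0 -> 1.0

rng = np.random.default_rng(1)
noise = rng.standard_normal(50000)
r_noise = autocorr(noise)
center = len(noise) - 1                      # index of tau = 0

print(round(float(r_noise[center]), 3))      # 1.0 at tau = 0
print(bool(np.max(np.abs(r_noise[center + 10:center + 1000])) < 0.05))

fs = 8000
t = np.arange(2 * fs) / fs
period = fs // 100                           # 100 Hz fundamental -> 80 samples
sq = np.sign(np.sin(2 * np.pi * 100 * t))    # square wave: rich in harmonics
r_sq = autocorr(sq)
c = len(sq) - 1

print(bool(r_sq[c + period] > 0.9))          # strong peak one period away
```

The noise autocorrelation isn't exactly zero off-peak - that's the finite-length effect the professor's "thousand years" quip is about.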
Another quick note I want to make, regarding diffusion and room reflections. There are two major types of reflections in a room: specular and diffuse. Specular reflections follow the law of reflection (angle of incidence = angle of reflection) when a wave is incident on a boundary; this will usually happen if the incident sound has lots of energy and the boundary material is hard and smooth. Diffuse reflections scatter according to a cosine law (Lambert's cosine law, or something of that nature), and happen when the sound has less energy and/or the surface is rough, soft, or porous. If you've ever seen a time plot of a room impulse response, you can think of the early reflections as specular and the late reflections as diffuse. A sound field is considered diffuse when, after excitation, you are equally likely to find energy at any point in time and space across the room.

Both contribute to our perception of an auditory space. Early reflections give us a sense of the geometry of the room: the timing of reflections is a direct measure of the mean-free-path (for simplicity's sake, though this isn't precise, "average-shortest-path") distance that sound has to travel from a source, off of one or two walls, to eventually reach an ear, and this gives us auditory information about how far away the side walls and ceiling are. Diffuse reflections help provide a sense of envelopment and, arguably, the size of the room. I should note that everything in this paragraph is VERY rough, up for debate in the academic and sound engineering communities, and not the word of law - rather, it's my understanding of the terms in academics and in practice.
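The "sparse specular early, dense diffuse late" picture of a room impulse response can be caricatured in code. This is a crude synthetic sketch, not a physical model - every delay, gain, and decay constant below is a made-up illustrative value:

```python
# Crude synthetic room impulse response: a direct spike, a few sparse
# "specular" early reflections, then a dense exponentially decaying noise
# tail standing in for the diffuse field. All numbers are illustrative.
import numpy as np

fs = 48000
rir = np.zeros(fs)                           # one second of "room"
rir[0] = 1.0                                 # direct sound

# Sparse early specular reflections: (delay in ms, gain) - invented values
for delay_ms, gain in [(8, 0.6), (13, 0.45), (21, 0.3), (34, 0.2)]:
    rir[int(fs * delay_ms / 1000)] += gain

# Dense diffuse tail starting ~50 ms in: decaying white noise
rng = np.random.default_rng(2)
start = int(0.05 * fs)
n_tail = len(rir) - start
decay = np.exp(-np.arange(n_tail) / (0.15 * fs))   # arbitrary decay rate
rir[start:] += 0.01 * rng.standard_normal(n_tail) * decay

# Early part: a handful of discrete arrivals. Late part: energy everywhere.
print(int(np.count_nonzero(rir[:start])))    # 5 sparse early arrivals
print(bool(np.count_nonzero(rir[start:]) == n_tail))   # dense tail
```

The point of the contrast: in the early region you can count individual arrivals (geometry information), while in the tail energy is present at effectively every sample (envelopment), matching the diffuse-field description above.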

Yet another feature making this all the more complicated is binaural perception. In hearing science, we VERY carefully measure human responses to stimuli with isolated reflections. This has led to the theory of the precedence effect, summing localization, echo thresholds, and so on. In short, and taken with a HEAVY grain of salt: we SOMETIMES fuse the information from a reflection with its direct sound, if we receive that reflected information within a certain time window; this window changes depending on the source (speech, opera, classical music, rock music, etc. all have different thresholds). Outside of that window, we perceive the reflected sound as separate information and must process the direct and reflected sound separately. This only holds true for specular reflections; diffuse reflections are interpreted as confusing information and will lower the overall correlation. It's why I have a slight problem with your one-ear-vs-both-ears headphone experiment above (though that by NO means invalidates most of your other points, many of which I agree with). There's really WAY too much literature out there on binaural perception of reflections and the precedence effect, so I'm mostly going to leave that on the table; if anyone has questions or wants to discuss further, I'm happy to play ball.

I'm probably going to stop rambling here, as I was gearing up for a discussion about narrowband ITDs and ILDs, yet recordings are broadband... it's too much to type out on top of everything else I've mentioned above - maybe another day.

But the major point I want to add to this thread is how it all relates to taping. Basically, from what I've seen and done on live tapes I've worked on, for live rock music it's best behind the board to point microphones at the stacks. Yes this reduces DFC, but mostly because you're maximizing direct sound and rejecting ALL reflections, not just diffuse sounds. This increases the correlation, not just for the diffuse field but also for early reflections.

EmRR:

--- Quote from: Gutbucket on August 04, 2017, 11:23:30 AM ---Here's a link to a page at Helmut Wittek's hauptmikrofon website with sound sample examples of the same source recorded using several different microphone setups- http://www.hauptmikrofon.de/audio/diffusefield.html.   

--- End quote ---

Thanks for that link - it's rare to see good comparative samples like this. The Mic and Room and Stereo Ambience examples are good technique demonstrations as well.

I mostly live in a recording studio (the occupants of which virtually never think about diffuse field recording), but I also do live remotes, which are typically authorized multitrack affairs: mainly stage mic feeds plus whatever ambience I can add. The last small club recording I did added XY KM140's at FOH, plus mid-side Beyerdynamic M130/M160 ribbons as drum overheads, which typically aren't needed in that particular small room. M130/160 used like that allow a lot of choices for ambient control in the stage blend.

My thoughts on concert taping are stone-age and catching up. Even with awareness of the equipment available nowadays, I still think of taping in the 80's with cassette decks, or the minidisc recorder I carried in the late 90's - I have no updated portable battery rig, and the MD just keeps on working in a pinch. Attempting stealth audience recordings in the last 20 years, I eventually dropped back to mono; I was never happy with the decorrelated sound of the two DPA 4060's I was wearing at collar, and was always fighting that in post, especially with body movement accounted for. I've done a few authorized acoustic house party remotes using a Samar MF65 ribbon set and a DPA 4060 as a horizontal surround B-format array placed up close - still well into the learning curve there. I've also added a pair of MKH 30's recently that haven't seen action yet; looking forward to working with those.

Anyway, long rambling first post, thanks again for that link. 

Gutbucket:
Thanks for posting! Great contribution to the thread.  It's really good to get feedback from someone well versed in the mathematical and technical aspects, areas in which I have little more than a layman's understanding.


--- Quote from: wforwumbo on November 29, 2017, 01:09:11 PM ---
* The correlation of a left/right stereo signal will change depending on how "direct" or "diffuse" (more on these terms later, too) the recorded sound is. For more direct sound, the correlation peak will be tighter and more sharply peaked (closer to a Dirac delta, or a Gaussian with very low variance), like a sharp mountain with high prominence; for more diffuse sound, it will look closer to a small hill, with a lower peak and the tails on either side more spread out.
--- End quote ---
^
This aspect is perhaps the most relevant to tapers, and mostly what I've been focusing on.


--- Quote ---Another quick note I want to make, regarding diffusion and room reflections. There are two major types of reflections in a room: specular and diffuse. Specular reflections follow the law of reflection (angle of incidence = angle of reflection) when a wave is incident on a boundary; this will usually happen if the incident sound has lots of energy and the boundary material is hard and smooth. Diffuse reflections scatter according to a cosine law (Lambert's cosine law, or something of that nature), and happen when the sound has less energy and/or the surface is rough, soft, or porous. If you've ever seen a time plot of a room impulse response, you can think of the early reflections as specular and the late reflections as diffuse. A sound field is considered diffuse when, after excitation, you are equally likely to find energy at any point in time and space across the room.

Both contribute to our perception of an auditory space. Early reflections give us a sense of the geometry of the room: the timing of reflections is a direct measure of the mean-free-path (for simplicity's sake, though this isn't precise, "average-shortest-path") distance that sound has to travel from a source, off of one or two walls, to eventually reach an ear, and this gives us auditory information about how far away the side walls and ceiling are. Diffuse reflections help provide a sense of envelopment and, arguably, the size of the room. I should note that everything in this paragraph is VERY rough, up for debate in the academic and sound engineering communities, and not the word of law - rather, it's my understanding of the terms in academics and in practice.
--- End quote ---

Not rigorously defined perhaps, yet these are essential elements in applying all of this to the perceptions generated by listening to an audio recording.


--- Quote ---Yet another feature making this all the more complicated is binaural perception. In hearing science, we VERY carefully measure human responses to stimuli with isolated reflections. This has led to the theory of the precedence effect, summing localization, echo thresholds, and so on. In short, and taken with a HEAVY grain of salt: we SOMETIMES fuse the information from a reflection with its direct sound, if we receive that reflected information within a certain time window; this window changes depending on the source (speech, opera, classical music, rock music, etc. all have different thresholds). Outside of that window, we perceive the reflected sound as separate information and must process the direct and reflected sound separately. This only holds true for specular reflections; diffuse reflections are interpreted as confusing information and will lower the overall correlation. It's why I have a slight problem with your one-ear-vs-both-ears headphone experiment above (though that by NO means invalidates most of your other points, many of which I agree with). There's really WAY too much literature out there on binaural perception of reflections and the precedence effect, so I'm mostly going to leave that on the table; if anyone has questions or wants to discuss further, I'm happy to play ball.
--- End quote ---

Binaural perception is the critical second half of the recording equation (playback), which of course is tied to the particulars of how we set up to make the recording as well.

A better way of doing the "one ear vs both ears with headphones" DFC demonstration would be to listen with both ears, summing the independent left and right channels to mono for the DFC=1 case.  I suggested the "one or both headphone cups" method simply because it is easy for anyone to do, so as to get a practical sense of what all this is about - unfortunately, most software media players don't have easily accessible switching for that sort of thing.  Note that this method would also be imperfect in that it fully correlates all direct sound arrivals as well as the diffuse content, including early specular reflections.  Fortunately those example files were chosen so as to consist predominantly of diffuse pickup, with very little direct sound or early-arriving specular reflections, so most everything heard in them is diffuse.  So yes, an imperfect example, and your comment about the importance of binaural perception is astute, yet I think it illustrates what we're talking about for folks who might otherwise be baffled by all this technical talk.
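The mono-summing idea can be demonstrated numerically: however decorrelated a stereo pair is, summing L and R into one channel fed to both ears is by construction fully correlated. A small sketch, using two independent noise channels as a stand-in for a heavily diffuse stereo recording (my assumption, not the actual sample files):

```python
# Sketch of the mono-sum DFC=1 reference: two decorrelated channels sum
# to a single signal that, sent to both ears, is perfectly correlated.
import numpy as np

rng = np.random.default_rng(3)
left = rng.standard_normal(100000)           # stand-in "diffuse" L channel
right = rng.standard_normal(100000)          # independent -> decorrelated R

def coeff(a, b):
    """Zero-lag correlation coefficient between two equal-length signals."""
    a = a - np.mean(a)
    b = b - np.mean(b)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

mono = left + right                          # sum to mono, feed both ears

print(bool(abs(coeff(left, right)) < 0.05))  # True: near-zero correlation
print(round(coeff(mono, mono), 3))           # 1.0: identical channels
```

This is exactly the contrast the listening demonstration is after, without the confound of removing one ear's input entirely.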


--- Quote ---But the major point I want to add to this thread, is how it all relates to taping. Basically, from what I've seen and done on live tapes I've worked on, is that for live rock music it's best behind the board to point microphones at the stacks. Yes this reduces DFC [my edit and comment here in bold- no, it increases DFC (diffuse pickup becomes more correlated not less as the angle between microphones is reduced) - a typo here I think], but mostly because you're maximizing direct sound and rejecting ALL reflections, not just diffuse sounds. This increases the correlation, not just for the diffuse field but also for early reflections.
--- End quote ---

Yes, from further back it is advantageous to point at stacks to maximize the direct/reverberant ratio as much as possible*.  Yet we needn't accept an overly elevated DFC at the same time.  We can maintain low DFC by increasing the spacing between microphones as the angle between them is reduced.  In that way, correlation of the diffuse pickup (averaged over the entire sphere) is reduced, as is the correlation of all sources arriving from locations away from the medial plane. Granted, diffuse sound arriving along the approximate medial plane will not have that reduced correlation, in the same way that direct sound arriving along the medial plane will not, but that's only a fraction of the total diffuse energy being picked up.
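The spacing argument can be caricatured in code: sound arriving off-axis reaches spaced microphones with a small time offset, and for a noise-like (broadband) signal even a few samples of delay drive the zero-lag correlation toward zero, while coincident capsules stay fully correlated. The delay value and noise source below are illustrative assumptions, not a model of any particular array:

```python
# Toy illustration: inter-mic delay decorrelates broadband off-axis pickup.
# A ~0.5 ms offset at 48 kHz (about 24 samples) stands in for the extra
# path length to the far capsule of a spaced pair - an invented value.
import numpy as np

rng = np.random.default_rng(4)
source = rng.standard_normal(200000)         # broadband, noise-like arrival

def zero_lag_coeff(a, b):
    a = a - np.mean(a)
    b = b - np.mean(b)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

coincident = zero_lag_coeff(source, source)          # zero spacing
spaced = zero_lag_coeff(source[:-24], source[24:])   # 24-sample offset

print(round(coincident, 3))                  # 1.0
print(bool(abs(spaced) < 0.05))              # True: near zero
```

Real program material isn't white noise, so in practice the decorrelation from spacing shows up mainly where the signal is broadband and incoherent (i.e. the diffuse field), which is the point of the technique.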

Good direct/reverberant ratio is the most important thing to my way of thinking.  In that sense it is of more fundamental importance than keeping DFC low, but I'd argue that low DFC is not far behind in the hierarchy of what makes for a good subjective listening experience with live music recordings.

*I try to avoid statements such as "rejecting ALL reflections", as first-order microphones are simply not that directional when used at a significant distance from the source. It's a common misconception to think of a microphone's pickup pattern as analogous to the field of view of a camera, cropping off everything outside the frame.  That's even more the case at a distant recording position, where a predominant portion of the reflections arrive from directions not significantly different from the direct sound itself (from within that poorly analogous "cropped window").

wforwumbo:
A semantic error on my part - I meant that pointing at the stacks minimizes pickup of both specular and diffuse reflections. Your point here (and most of your other points) still stands.

Either way, I'm excited this thread exists and will be contributing more in the near future as I keep chewing over your thoughts in the back of my head. I'm also more than happy to explain any concepts to people who have questions, and I can generate plots and short animations in MATLAB for anyone who wants to see these concepts in a more concrete and easily understood fashion.
