Aaronji
asked a good question and I thought it deserved its own topic...perhaps we can come up with a gold standard for product comparisons, and how some of the criteria may be achieved?
All 'snake oil' is 'essential oil' until it fails a well constructed blind listening test
digifish
Funny, when I made the "snake oil" crack, I was thinking of this thread http://taperssection.com/index.php/topic,109703.0.html, so I didn't really get your response. Then I read the FR2LE thread, and it all made sense...
I am curious what people would consider constitutes a "well constructed blind listening test", though. Seems like it would be difficult to simulate a concert setting at home and even more difficult to run a controlled test at a show. Any thoughts?
A couple of basic things (all intended to improve listener sensitivity to differences)...
1. Levels need to be matched. Louder almost always sounds better. The threshold for detecting a loudness difference is on the order of 0.5 dB under the best listening conditions with an instant A/B back-to-back change; it rises to about 1.5 dB with a 1 minute gap between the examples, again under ideal listening conditions. In the real world, 1 dB is fairly close to the just-noticeable limit.
2. The samples need to be spliced back-to-back with an instant (click-free) transition between the recordings. The duration of each selection should ideally be around 5 seconds (this is open to debate and will depend on the qualities being assessed; generally, short selections presented A/B with no gap produce the most sensitive discriminations).
3. The listener should be allowed repeated presentations of each comparison pairing until they make their decision.
4. There should be an objective response indicator. That is, the listener should be asked to make a specific choice - either the 1st or 2nd half, or pick the modded one, or whatever. They should be given an A/B 'forced choice'.
5. The number of comparisons should exclude chance from playing a significant role in the outcome. It's typical to shoot for 95% confidence in perceptual work, which means a performance of ~7 consistent identifications out of 8. Ideally, 10 randomized A/B or B/A comparisons would make a good standard.
6. The comparison should be done blind. That is, the listener should not know which example they are listening to in any given trial.
7. No feedback should be given during the assessment session. The internet makes this easy, as you can download a file and listen to it without seeing or interacting with the experimenter.
8. The recordings should be made simultaneously from the same set of mics. Alternatively, use the same microphones in the same position recording the exact same sound. That is why recording a hi-fi source, ticking clocks, test signals, or whatever makes a great paradigm.
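To make point 1 concrete, here is a minimal sketch of RMS level matching in plain Python (no audio libraries; the clips, sample rate, and function names are my own illustration, not anyone's tool):

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_level(reference, candidate):
    """Scale `candidate` so its RMS matches `reference`,
    and report the gain that was applied in dB."""
    gain = rms(reference) / rms(candidate)
    gain_db = 20 * math.log10(gain)
    return [s * gain for s in candidate], gain_db

# two hypothetical clips of the same tone, the second one twice as hot
a = [0.5 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
b = [1.0 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
b_matched, applied_db = match_level(a, b)
print(round(applied_db, 2))  # about -6.02 dB of attenuation
```

Given a 0.5-1 dB audibility threshold, levels matched this way remove loudness as a confound before any splicing is done.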
Note: One thing you should not do is compare A vs B with the knowledge of what you are listening to. Perception is just too influenced by expectation.
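The 95% criterion in point 5 is just a binomial calculation, and it's easy to check for yourself. A quick sketch (the function name is mine, standard library only):

```python
from math import comb

def p_by_chance(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    forced-choice A/B comparisons by pure guessing (p = 0.5 each)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(p_by_chance(7, 8), 3))   # 0.035 -- beats the 5% chance level
print(round(p_by_chance(8, 10), 3))  # 0.055 -- just misses significance
print(round(p_by_chance(9, 10), 3))  # 0.011 -- clearly significant
```

So 7 of 8 does clear the 95% bar, and note that with the 10-trial standard a listener would need 9 correct, since 8 of 10 can still happen by chance slightly more than 5% of the time.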
digifish