Aaronji
asked a good question and I thought it deserved its own topic...perhaps we can come up with a gold standard for product comparisons, and how some of the criteria may be achieved?
All 'snake oil' is 'essential oil' until it fails a well constructed blind listening test
digifish
Funny, when I made the "snake oil" crack, I was thinking of this thread http://taperssection.com/index.php/topic,109703.0.html, so I didn't really get your response. Then I read the FR2LE thread, and it all made sense...
I am curious what people would consider constitutes a "well constructed blind listening test", though. Seems like it would be difficult to simulate a concert setting at home and even more difficult to run a controlled test at a show. Any thoughts?
A couple of basic things (all intended to improve listener sensitivity to differences)...
1. Levels need to be matched. Louder almost always sounds better. The threshold for detecting a loudness difference is on the order of 0.5 dB under the best listening conditions with an instant A/B back-to-back change; it rises to about 1.5 dB with a 1 minute gap between the examples, again under ideal listening conditions. In the real world, 1 dB is fairly close to the just-noticeable limit.
2. The samples need to be spliced back-to-back with an instant (click-free) transition between the recordings. The duration of each selection should ideally be around 5 seconds (this is open to debate and will depend on the qualities being assessed; generally, short selections presented A/B with no gap produce the most sensitive discriminations).
3. The listener should be allowed repeated presentations of each comparison pairing until they make their decision.
4. There should be an objective response indicator. That is, the listener should be asked to make a specific choice - either the 1st or 2nd half, or pick the modded one, or whatever. They should be given an A/B 'forced choice'.
5. The number of comparisons should exclude chance from playing a significant role in the outcome. It's typical to shoot for 95% confidence in perceptual work, which means a performance of ~7 consistent identifications out of 8. Ideally, 10 randomized A/B or B/A comparisons would make a good standard.
6. The comparison should be done blind. That is, the listener should not know which example they are listening to in any given trial.
7. No feedback should be given during the assessment session. The internet makes this easy, as you can download a file and listen to it without seeing or interacting with the experimenter.
8. The recordings should be made simultaneously from the same set of mics. Alternatively, use the same microphones in the same position recording the exact same sound. That is why recording a hi-fi source, ticking clocks, test signals, or whatever makes a great paradigm.
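To make point 1 concrete, here is a minimal sketch of RMS level matching in plain Python (no audio libraries; the clips, sample rate, and function names are my own illustration, not anyone's tool):

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_level(reference, candidate):
    """Scale `candidate` so its RMS matches `reference`,
    and report the gain that was applied in dB."""
    gain = rms(reference) / rms(candidate)
    gain_db = 20 * math.log10(gain)
    return [s * gain for s in candidate], gain_db

# two hypothetical clips of the same tone, the second one twice as hot
a = [0.5 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
b = [1.0 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
b_matched, applied_db = match_level(a, b)
print(round(applied_db, 2))  # about -6.02 dB of attenuation
```

Given a 0.5-1 dB audibility threshold, levels matched this way remove loudness as a confound before any splicing is done.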
Note: One thing you should not do is compare A vs B with the knowledge of what you are listening to. Perception is just too influenced by expectation.
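The 95% criterion in point 5 is just a binomial calculation, and it's easy to check for yourself. A quick sketch (the function name is mine, standard library only):

```python
from math import comb

def p_by_chance(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    forced-choice A/B comparisons by pure guessing (p = 0.5 each)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(p_by_chance(7, 8), 3))   # 0.035 -- beats the 5% chance level
print(round(p_by_chance(8, 10), 3))  # 0.055 -- just misses significance
print(round(p_by_chance(9, 10), 3))  # 0.011 -- clearly significant
```

So 7 of 8 does clear the 95% bar, and note that with the 10-trial standard a listener would need 9 correct, since 8 of 10 can still happen by chance slightly more than 5% of the time.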
digifish