Hoping this thread can get steered somewhat back on topic:
1. Levels need to be matched. Louder almost always sounds better. The threshold for loudness differences is on the order of 0.5 dB under the best listening conditions, in a back-to-back A/B change. It rises to about 1.5 dB with 1 minute of separation between the examples, again under ideal listening conditions. In the real world, 1 dB is fairly close to the just-noticeable limit.
This might not be an issue with very accurate recorders or preamps, but should the goal be to match peak or RMS levels?
I was going to whip up a blind comparison using a recording I made with my Audix Micro mics right next to a friend running Neumann KM140s. The idea is that we should pretty much all agree the two setups sound audibly different, so it would serve as a check on whether the sort of blind test you propose is sensitive enough for the majority of posters here to correctly identify the different sources on their playback systems. But evidently one set of mics (or something else in the respective recording chains) must compress transients or some such, as matching peak levels results in fairly different average loudness levels.
Potentially, some preamp/recorder mods might do something similar (perhaps even in an audibly pleasing way), which would make proper comparisons difficult. How to proceed in such a case?
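To make the peak-vs-RMS mismatch concrete, here is a rough Python sketch of the two measurements. It assumes samples are already decoded to floats in the range -1.0 to 1.0. The two "clips" are synthetic stand-ins I made up for illustration (not the actual Audix and Neumann recordings), and the function names are mine. Both clips share the same peak, yet their RMS levels come out about 6 dB apart, far above the roughly 1 dB audibility limit quoted above.

```python
import math

def peak_db(samples):
    """Peak level in dBFS, assuming samples are floats in [-1.0, 1.0]."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """RMS (average) level in dBFS over the whole clip."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

# Hypothetical test signals: one second at 48 kHz.
n = 48000
# Clip A: steady 440 Hz sine with amplitude 0.5.
clip_a = [0.5 * math.sin(2 * math.pi * 440 * i / n) for i in range(n)]
# Clip B: quieter sine (amplitude 0.25) plus a few short transients that
# reach the same 0.5 peak, i.e. a higher crest factor than clip A.
clip_b = [0.25 * math.sin(2 * math.pi * 440 * i / n) for i in range(n)]
for i in range(0, n, 4800):
    clip_b[i] = 0.5

print("peak A/B (dBFS):", round(peak_db(clip_a), 2), round(peak_db(clip_b), 2))
print("RMS  A/B (dBFS):", round(rms_db(clip_a), 2), round(rms_db(clip_b), 2))

# Gain needed on clip B to match clip A's RMS level instead of its peak.
rms_gain_db = rms_db(clip_a) - rms_db(clip_b)
clip_b_matched = apply_gain(clip_b, rms_gain_db)
print("gain for RMS match (dB):", round(rms_gain_db, 2))
print("clip B peak after RMS match (dBFS):", round(peak_db(clip_b_matched), 2))
```

Note that matching RMS pushes clip B's transients right up to full scale, so on real material you would probably have to turn the louder clip down rather than the quieter one up. This only shows how far apart the two matching methods can land. It does not settle which one is the right reference for a listening test.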