Without seeing the actual waveforms, we can only guess, but: Some signals have a lot of internal variation in their moment-to-moment levels; others, not as much. It all depends on the nature of the original sound plus any processing that's been applied.
If the variation in levels within a signal is large, that signal's average levels will sit well below its peak levels; if the variation is smaller, the average levels won't be as far below the peaks. For example, if any compression and/or limiting are applied to a signal, its peak levels become more similar to its average levels (the overall variation is reduced). But naturally occurring (not electronically generated) sound, recorded without compression or limiting, often has peak levels that are much higher than its average levels.
Some people describe this as the "peak-to-average ratio" (also called the "crest factor") of the signal. A piano or a drum set can show a 20+ dB difference between peak and average levels, while the 'cello section of an orchestra might show only 4 or 5 dB, depending on the music and the miking.
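To put numbers on that, here's a minimal Python/NumPy sketch; the signals, frequency, and decay time are invented purely for illustration, and RMS is used as the "average" level. It compares a sustained tone, a percussive hit, and the same hit after a crude tanh soft-limiter squashes its peaks:

    import numpy as np

    def peak_to_average_db(x):
        # Peak level minus RMS ("average") level, in dB.
        peak = np.max(np.abs(x))
        rms = np.sqrt(np.mean(x ** 2))
        return 20 * np.log10(peak / rms)

    fs = 48000
    t = np.arange(fs) / fs                        # one second of samples

    sustained = np.sin(2 * np.pi * 220 * t)       # steady tone, low variation
    percussive = sustained * np.exp(-t / 0.05)    # sharp attack, fast decay
    limited = np.tanh(4 * percussive)             # crude soft limiter on the hit

    for name, x in [("sustained", sustained),
                    ("percussive", percussive),
                    ("limited hit", limited)]:
        print(f"{name:11s}: {peak_to_average_db(x):5.1f} dB peak-to-average")

The steady tone lands near 3 dB (the crest factor of a pure sine), the raw hit somewhere near 19 dB, and the limited hit well below the raw one, which is the compression effect described above.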
The point is, if you have both types of signal on hand and you set their peak levels to be equal, then their average levels will differ considerably: the tracks with greater internal variation (contrast) will have lower average levels. And the perception of loudness depends much more on the average levels than on the peaks.
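To see that directly, the same kind of sketch (same invented signals) can set both peaks to an identical level and then read back the averages:

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs

    steady = np.sin(2 * np.pi * 220 * t)          # low internal variation
    percussive = steady * np.exp(-t / 0.05)       # high internal variation

    for name, x in [("steady", steady), ("percussive", percussive)]:
        x = x / np.max(np.abs(x))                 # peak-normalize to 0 dBFS
        rms_db = 20 * np.log10(np.sqrt(np.mean(x ** 2)))
        print(f"{name:10s}: peak 0.0 dBFS, average {rms_db:+.1f} dBFS")

With the peaks matched, the two averages should come out about 16 dB apart for these made-up signals, so the percussive track will be heard as much quieter even though its meter peaks just as high.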
--best regards