Sometimes 'aligned' just means SOMETHING will inevitably display phasing artifacts, and you're picking the least evil version. This discussion should involve the notion of pre-delay: the time between the dry sound (source/board) and the reverb return (the ambient mics, or, in a reverb, the sound coming back after it has bounced around). A lack of pre-delay implies the room has no size, no matter how long the reverb decay is.

As someone who does this for a living in recording and broadcast, I've never picked the version that's perfectly aligned. Then again, my mix sources are more numerous and varied, and the balance isn't tilted toward the room the way it is when you or I make an audience/board recording like the one being discussed here. Even then, any reduction I make in the timing difference is only enough to clean up any sense of echo and tighten things up, never all the way to matched; it usually stays out in the 15-20 ms range, past flanging territory and toward echo. I just make the room 'smaller' as needed to get clarity and a good sense of space.

There's an additional, next-level argument for measuring the tempo of each song and adjusting the timing so the delay is tempo-related, creating a short on-beat slapback that accentuates the sense of space while staying tucked in musically rather than competing. Overkill, but effective. It's what you do with reverb pre-delay when mixing dry instruments and vocals for a record: the higher the tempo, the shorter the timing difference.
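The tempo-related timing is simple arithmetic: a whole note lasts 4 × 60000 / BPM milliseconds, so any note subdivision follows directly from the tempo. What follows is a hypothetical helper, not the author's actual workflow. It assumes a default subdivision of a 1/128 note only because that lands in the 15-20 ms range mentioned above for tempos of roughly 94-125 BPM. It also converts the delay to a path-length difference (assuming sound travels about 343 m/s at room temperature) and to whole samples for nudging a track in a DAW. The function names are illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, dry air at about 20 °C


def tempo_delay_ms(bpm: float, note_fraction: float = 1 / 128) -> float:
    """Delay in ms for a note value given as a fraction of a whole note.

    A whole note is four beats, i.e. 4 * 60000 / bpm milliseconds.
    The 1/128 default is an assumption chosen to land near 15-20 ms
    at common tempos; it is not a rule.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    whole_note_ms = 4 * 60_000.0 / bpm
    return whole_note_ms * note_fraction


def path_difference_m(delay_ms: float) -> float:
    """Extra distance sound travels during delay_ms (a rough 'room size' cue)."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0


def ms_to_samples(delay_ms: float, sample_rate: int = 48_000) -> int:
    """Round a delay to the nearest whole sample at the given sample rate."""
    return round(delay_ms * sample_rate / 1000.0)


if __name__ == "__main__":
    # Higher tempo -> shorter delay, for a fixed note subdivision.
    for bpm in (90, 100, 120, 140):
        d = tempo_delay_ms(bpm)
        print(f"{bpm:>3} BPM: {d:6.2f} ms, {ms_to_samples(d):4d} samples, "
              f"~{path_difference_m(d):.1f} m of path difference")
```

At 120 BPM the 1/128-note default works out to exactly 15.625 ms, or 750 samples at 48 kHz. Keeping the subdivision fixed per song is what makes the delay shrink as the tempo rises.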
Related tangent: occasionally I get a mono board tape with no ambience, and I can often find a short-decay room ambience that gives a little sense of space without obviously being reverb. It usually sounds more like EQ than ambience. Experimenting with pre-delay is critical to the result. The level that is both effective and non-obvious is often surprisingly low, frequently -20 to -36 dB relative to the dry track, yet you can still clearly hear it come and go when you toggle its mute.
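Those levels are in dB. The linear gain multiplier is 10^(dB/20), so -20 dB is a factor of 0.1 and -36 dB is roughly 0.016. Here is a minimal sketch, assuming the ambience has already been rendered as a separate wet track at the same sample rate (for example, by a room reverb fed from the board tape). `add_ambience` and its parameters are hypothetical names for illustration, not any plugin's API. The sketch delays the wet track by a pre-delay in samples, scales it by a level in dB, and sums it under the dry track.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20.0)


def add_ambience(dry, wet, predelay_samples: int, level_db: float):
    """Return dry + (wet delayed by predelay_samples, scaled by level_db).

    Assumes dry and wet are sequences of float samples at the same rate.
    Wet samples pushed past the end of the dry track are dropped.
    """
    if predelay_samples < 0:
        raise ValueError("predelay_samples must be >= 0")
    gain = db_to_gain(level_db)
    out = [float(x) for x in dry]
    for i, w in enumerate(wet):
        j = i + predelay_samples  # position of this wet sample after pre-delay
        if j >= len(out):
            break
        out[j] += gain * w
    return out


if __name__ == "__main__":
    for level in (-20, -28, -36):
        print(f"{level} dB -> x{db_to_gain(level):.4f}")
    # A single click on both tracks, 2 samples of pre-delay, -20 dB ambience.
    print(add_ambience([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], 2, -20))
```

In a real session the pre-delay would be tens of milliseconds converted to samples rather than 2 samples. Tiny numbers are used here only so the shifted, attenuated click is easy to see in the output.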